[PEAK] Adding early ambiguity warnings to PEAK-Rules?
P.J. Eby
pje at telecommunity.com
Wed Aug 18 12:38:53 EDT 2010
This morning I woke up with an idea: what if PEAK-Rules could warn
you about ambiguities as soon as you defined a method that made it possible?
For example, if you define something like:
when(f, (A, B))(...)
when(f, (B, A))(...)
where B is a subclass of A, it's possible to statically know that the
(B, B) case will be ambiguous.
I've given it some thought, and there's actually a pretty
straightforward approach you can use to determine whether a new
method could have ambiguities with respect to the already-defined
methods, and even to show exactly what circumstances would have to
arise to make it ambiguous.
Had this been in PEAK-Rules from the beginning, my guess is that the
original TurboJSON ambiguity issues would never have arisen in the first place.
Now, there are two fairly big downsides to this idea (and a few
smaller ones as well).
First, it's slow, or at least potentially slow, because it requires
performing an intersection between new cases and all previous cases
-- an O(N) cost for each added case, or O(N**2) overall. If you
define 100 methods (or more precisely, methods covering 100
alternative situations), the system will do 5000 intersections in the
first pass of the checking algorithm, accumulated over the definition
of the methods -- an average cost of 50 intersections per method.
Second, it can produce false positives or negatives, due to multiple
inheritance. Let's look at this case again:
when(f, (A, B))(...)
when(f, (B, A))(...)
If A and B are unrelated classes, this pair is still ambiguous if you
later define class C(A, B), and then call the function with (C,C)!
The algorithm I have in mind will actually predict this as an
ambiguity, unless it can be told that there's no way for A and B to
be merged in a subclass. For incompatible base types and
non-subclassables, this can be detected automatically, but that still
leaves a wide assortment of potential false positives. (Or false
negatives, if the the algorithm doesn't check for multiple inheritance.)
Now, if you knew that all relevant classes were already defined in
the program, and at least one of the two classes was new-style you
could just check for the existence of a multiply-inheriting class,
and warn about that. The catch is, there's no way for PEAK-Rules to
tell whether you have defined "all relevant classes" yet, unless the
checking is done explicitly at a user-specified point in time, rather
than incrementally as the rules are defined.
However, I'd prefer to avoid having to do such checks explicitly,
because how would anyone ever realize they need to do them in the
first place? Documentation would help, of course, and certainly
one's test suite would be an appropriate place to put such a
verification. But I'd really like it to be more discoverable, e.g.
by having a warning show up in the doctests of a tutorial.
Perhaps a two-pronged approach would be useful. There are many
ambiguity conditions that can be warned about without any multiple
inheritance checking, so there could be a simple check that doesn't
bother warning you of such problems, and then a deep verifier that
checks all currently-defined (new-style) classes. There could even
be an option to automatically check all defined generic functions,
for use in test suites.
There is one other downside to immediate ambiguity checks, though,
and that's rule ordering. If our main example is checked
immediately, then you can't write your rules like this:
when(f, (A, B))(...)
when(f, (B, A))(...)
when(f, (B, B))(...)
Because you'll get a warning on the second rule. Instead, you'll
have to do this:
when(f, (B, B))(...)
when(f, (A, B))(...)
when(f, (B, A))(...)
Which seems a bit unfortunate, as it's basically forcing you to put
the special case before the general cases! (Which is bad for both
the reader and writer of the code.)
On the other hand, if we wait until the function is first called
before issuing warnings, then 1) the first call will be very slow,
and 2) it'll probably spit out a lot of warnings.
So... on balance, I have no idea what to do here. ;-) It seems as
though an explicit full analyzer should definitely exist, but that it
wouldn't bring as much benefit as something with faster
feedback. However, ambiguity checking is inherently an O(N**2)
operation, which seems to limit its applicability in large enough programs.
Any thoughts?
(Oh, and one other idea I had was that ambiguous method error
messages could be radically improved by better formatting of the
ambiguous signatures, and also outputting what special case needs
covering, by intersecting the signatures and pretty-printing
that. It wouldn't help discoverability, but it would probably help
people see how to fix their rules! Maybe I should fix that first
before worrying about verifiers or early warnings... ;-) )
More information about the PEAK
mailing list