[PEAK] Adding early ambiguity warnings to PEAK-Rules?

Wed Aug 18 12:38:53 EDT 2010

This morning I woke up with an idea: what if PEAK-Rules could warn 
you about ambiguities as soon as you defined a method that made it possible?

For example, if you define something like:

   when(f, (A, B))(...)
   when(f, (B, A))(...)

where B is a subclass of A, it's possible to statically know that the 
(B, B) case will be ambiguous.

I've given it some thought, and there's actually a pretty 
straightforward approach you can use to determine whether a new 
method could have ambiguities with respect to the already-defined 
methods, and even to show exactly what circumstances would have to 
arise to make it ambiguous.

Had this been in PEAK-Rules from the beginning, my guess is that the 
original TurboJSON ambiguity issues would never have arisen in the first place.

Now, there are two fairly big downsides to this idea (and a few 
smaller ones as well).

First, it's slow, or at least potentially slow, because it requires 
performing an intersection between new cases and all previous cases 
-- an O(N) cost for each added case, or O(N**2) overall.  If you 
define 100 methods (or more precisely, methods covering 100 
alternative situations), the system will do 5000 intersections in the 
first pass of the checking algorithm, accumulated over the definition 
of the methods -- an average cost of 50 intersections per method.

Second, it can produce false positives or negatives, due to multiple 
inheritance.  Let's look at this case again:

   when(f, (A, B))(...)
   when(f, (B, A))(...)

If A and B are unrelated classes, this pair is still ambiguous if you 
later define class C(A, B), and then call the function with (C,C)!

The algorithm I have in mind will actually predict this as an 
ambiguity, unless it can be told that there's no way for A and B to 
be merged in a subclass.  For incompatible base types and 
non-subclassables, this can be detected automatically, but that still 
leaves a wide assortment of potential false positives.  (Or false 
negatives, if the the algorithm doesn't check for multiple inheritance.)

Now, if you knew that all relevant classes were already defined in 
the program, and at least one of the two classes was new-style you 
could just check for the existence of a multiply-inheriting class, 
and warn about that.  The catch is, there's no way for PEAK-Rules to 
tell whether you have defined "all relevant classes" yet, unless the 
checking is done explicitly at a user-specified point in time, rather 
than incrementally as the rules are defined.

However, I'd prefer to avoid having to do such checks explicitly, 
because how would anyone ever realize they need to do them in the 
first place?  Documentation would help, of course, and certainly 
one's test suite would be an appropriate place to put such a 
verification.  But I'd really like it to be more discoverable, e.g. 
by having a warning show up in the doctests of a tutorial.

Perhaps a two-pronged approach would be useful.  There are many 
ambiguity conditions that can be warned about without any multiple 
inheritance checking, so there could be a simple check that doesn't 
bother warning you of such problems, and then a deep verifier that 
checks all currently-defined (new-style) classes.  There could even 
be an option to automatically check all defined generic functions, 
for use in test suites.

There is one other downside to immediate ambiguity checks, though, 
and that's rule ordering.  If our main example is checked 
immediately, then you can't write your rules like this:

   when(f, (A, B))(...)
   when(f, (B, A))(...)
   when(f, (B, B))(...)

Because you'll get a warning on the second rule.  Instead, you'll 
have to do this:

   when(f, (B, B))(...)
   when(f, (A, B))(...)
   when(f, (B, A))(...)

Which seems a bit unfortunate, as it's basically forcing you to put 
the special case before the general cases!  (Which is bad for both 
the reader and writer of the code.)

On the other hand, if we wait until the function is first called 
before issuing warnings, then 1) the first call will be very slow, 
and 2) it'll probably spit out a lot of warnings.

So...  on balance, I have no idea what to do here.  ;-)  It seems as 
though an explicit full analyzer should definitely exist, but that it 
wouldn't bring as much benefit as something with faster 
feedback.  However, ambiguity checking is inherently an O(N**2) 
operation, which seems to limit its applicability in large enough programs.

Any thoughts?

(Oh, and one other idea I had was that ambiguous method error 
messages could be radically improved by better formatting of the 
ambiguous signatures, and also outputting what special case needs 
covering, by intersecting the signatures and pretty-printing 
that.  It wouldn't help discoverability, but it would probably help 
people see how to fix their rules!  Maybe I should fix that first 
before worrying about verifiers or early warnings...  ;-) )