[PEAK] Fwd: Adding C generation to bytecode assembler
Phillip J. Eby
pje at telecommunity.com
Mon Aug 21 22:19:45 EDT 2006
At 07:39 PM 8/20/2006 -0700, Michel Pelletier wrote:
>On 8/20/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>
>
>> >My initial crude benchmark results are good, simple loops with math ops
>> >have a 2x increase in speed, with nested loops gaining even more. This is
>> >without any stack movement reduction yet, so once I implement that I think
>> >the performance will jump even more.
>>
>>I'm impressed, but also very surprised. I'd really like to see what it is
>>that you're doing, because given the goals and steps that you laid out, I
>>wouldn't have expected such a big improvement in performance without at
>>least some inlining or type inference. In fact, I'm impressed that you got
>>any improvements at all; the C interpreter has previously been shown not to
>>really *have* that much overhead. If you're doing this while generating
>>reasonably readable C code (i.e. at least as good as Pyrex's) then I'm in
>>absolute awe.
>
>I think my loop with a few math ops is sufficiently contrived that it
>can't be considered a very reliable benchmark of what's to come, or much
>inspiration for awe. ;)
Fair enough.
> I actually expect a speedup benefit to happen only in certain cases, I
> suspect most functions would not benefit enough or at all to even bother.
This is especially going to be the case where function calls are concerned,
but probably also integer math and certain other operations where the
interpreter has type-specific speedup tricks that can't be obtained by
straight C API translation.
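For concreteness, here's a rough sketch of what I mean (hypothetical
function names, Python 2.x C API): a straight translation of BINARY_ADD can
only emit the generic protocol call, while ceval.c gets to inline a fast
path for exact ints.

    #include <Python.h>

    /* What a naive bytecode-to-C translator can emit for "a + b":
       nothing more than the generic protocol call. */
    static PyObject *
    translated_binary_add(PyObject *a, PyObject *b)
    {
        return PyNumber_Add(a, b);              /* NULL on error */
    }

    /* Roughly the kind of type-specific trick the eval loop uses for
       BINARY_ADD: add exact ints inline, fall back to the generic call
       on overflow or any other type. */
    static PyObject *
    fastpath_binary_add(PyObject *a, PyObject *b)
    {
        if (PyInt_CheckExact(a) && PyInt_CheckExact(b)) {
            long i = PyInt_AS_LONG(a);
            long j = PyInt_AS_LONG(b);
            long x = i + j;
            if ((x ^ i) >= 0 || (x ^ j) >= 0)   /* no overflow */
                return PyInt_FromLong(x);
        }
        return PyNumber_Add(a, b);
    }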
> But I'm surprised you don't think the inner interpreter or the stack
> movement has much overhead, that seems to be pretty low hanging fruit to
> me but the future could prove me dead wrong. I remember one or two
> threads specifically on this subject on python-dev, but I don't remember
> much actual measurement being conclusively shown at the time to prove it
> one way or another.
Well, it's said that p2c (or was it py2c?) achieved only 10-15% speedup
using these techniques.
>I've not studied a lot of Pyrex's output, only some of it through a few
>trials. I would be surprised, and suspicious of my own results, if Pyrex
>didn't generate faster code than I have here,
Hm. Well, Pyrex generates code that does a lot of setting variables to
None and inc/decrefing all over the place, and it doesn't take nearly as
much advantage of the type information it has available as you'd expect.
And until relatively recent versions it used C strings to do attribute
access, which is horrifically slow.
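To be clear about the pattern I mean, here's a schematic sketch -- not
actual Pyrex output; the names are just made up to resemble its style:

    #include <Python.h>

    /* Locals pre-set to None with an extra incref, refcount traffic
       around every temporary, and -- in older versions -- attribute
       lookup via a C string that gets re-hashed on every call. */
    static PyObject *
    sketch_of_generated_func(PyObject *obj)
    {
        PyObject *result = NULL;
        PyObject *__pyx_v_x = Py_None;  Py_INCREF(Py_None);
        PyObject *__pyx_1 = NULL;

        /* x = obj.name */
        __pyx_1 = PyObject_GetAttrString(obj, "name");
        if (__pyx_1 == NULL)
            goto cleanup;
        Py_DECREF(__pyx_v_x);
        __pyx_v_x = __pyx_1;
        __pyx_1 = NULL;

        /* return x */
        Py_INCREF(__pyx_v_x);
        result = __pyx_v_x;

    cleanup:
        Py_DECREF(__pyx_v_x);
        return result;
    }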
In general, simply taking a Python program and compiling it with Pyrex is
very likely to produce code that is *slower* than the original Python
program -- even after you add some type declarations.
> although I don't think Pyrex is as readable,
Heh. If you want better performance than the CPython interpreter, I think
you're going to end up with code that's far less readable than Pyrex, but
that's just a guess.
>it's certainly more mature and probably the closest project in terms of
>goals. I think my experiment is much simpler than Pyrex though, not
>having to maintain a language parser or complex code generator or deal
>with some of the restrictions Pyrex has. One of my main goals is to
>maintain 100% interpreter compatibility.
Unfortunately, I don't think you can do that *and* get more than, say, a
10-15% speedup -- i.e., probably not one worth all the effort. The big
speedups really all come from one of two places:
1. Specialization (via type declarations, inference, or JIT specialization
ala Psyco)
2. Dropping compatibility with frames, function objects, trace/profile
hooks, GIL releasing, etc.
And of the two, my guess is that specialization is the only way to get
triple-digit performance improvements. I could be wrong, of course, but my
assessment of the odds is a significant part of why I haven't tried such
experiments myself.
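To illustrate what specialization buys (a hypothetical sketch, Python 2.x
C API, not anyone's actual generated code): once a couple of locals are
known to be machine integers, a summing loop stops touching the object
protocol entirely, and that's where the big multiples come from.

    #include <Python.h>

    /* Unspecialized: a straight C API rendering of
           total = 0
           for i in range(n):
               total = total + i
       (the loop counter is kept as a C long here just to keep the
       sketch short).  Every iteration allocates an int object and goes
       through generic dispatch. */
    static PyObject *
    sum_generic(long n)
    {
        PyObject *total = PyInt_FromLong(0);
        long i;
        if (total == NULL)
            return NULL;
        for (i = 0; i < n; i++) {
            PyObject *obj_i = PyInt_FromLong(i);
            PyObject *tmp;
            if (obj_i == NULL) {
                Py_DECREF(total);
                return NULL;
            }
            tmp = PyNumber_Add(total, obj_i);
            Py_DECREF(obj_i);
            Py_DECREF(total);
            if (tmp == NULL)
                return NULL;
            total = tmp;
        }
        return total;
    }

    /* Specialized: with "total" and "i" declared (or inferred) as C
       longs, the same loop is plain C arithmetic, boxed once at the
       end. */
    static PyObject *
    sum_specialized(long n)
    {
        long total = 0, i;
        for (i = 0; i < n; i++)
            total += i;
        return PyInt_FromLong(total);
    }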
Meanwhile, it seems to me that PyPy's new extension builder system is
actually the closest thing to your goals, in that it operates on bytecode
and produces C code using the CPython API. I don't know whether it
successfully produces any performance improvements, or whether its primary
goal is just to make it easy to create wrappers for C libraries (ala
Pyrex). However, it seems worth looking into.
Also, is there an SVN repository somewhere of what you're working on? I'd
love to have a look.