[TransWarp] memory leak in peak.running

Phillip J. Eby pje at telecommunity.com
Fri Aug 22 16:45:45 EDT 2003


At 10:43 PM 8/22/03 +0300, alexander smishlajev wrote:
>Phillip J. Eby wrote, at 22.08.2003 22:13:
>
>>>the following script is leaking memory:
>>Have you already tried adding a gc.collect() call to make sure that it's 
>>not just buildup between collections?
>
>yes.  we have tried to do gc.collect() and look at gc.garbage in our 
>application.  this did not help.  the garbage list is always empty, but 
>the application core grows from ~8M to more than 150M by one night - 
>several Kb per second.
>
>i managed to narrow the problem down to the given example.
>
>i have studied the code of MainLoop, UntwistedReactor, TaskQueue and 
>AdaptiveTask, and i must confess that i cannot see any possible leakage 
>sources there.  do you have any clues?
>
>just to be sure, i have just added gc.collect() to the doWork() of this 
>Test class.  this slowed things down a lot (also leakage is much slower), 
>but the memory is still leaking.

I found the problem.  It was in the C version of adapt(), and had nothing 
to do with any of the specific work the app was doing.  Specifically, the 
'pollInterval' attribute binding was trying to suggest a parent component 
for the new poll interval on each execution of the task.  In doing this, it 
tried to adapt the poll interval to IAttachable, which then leaked a new 
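bound method for IAttachable.__adapt__ on each execution.  Unfortunately, 
gc.collect() can't catch refcount bugs: an object whose reference count is 
too high just looks like it's still in use, so it is never freed and never 
shows up in gc.garbage.  :(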
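
To illustrate why the collector is no help here, this is a rough sketch of 
the same kind of bug, simulated from Python.  It is not the actual 
_speedups code: FakeProtocol, leaky_adapt() and refcount_growth() are 
made-up names, and the ctypes call to Py_IncRef merely stands in for a 
missing Py_DECREF in C.  It assumes CPython.

```python
import ctypes
import gc
import sys


class FakeProtocol:
    """Hypothetical stand-in for an interface object such as IAttachable."""

    def __adapt__(self, obj):
        return None


def leaky_adapt(protocol, obj):
    # Each attribute lookup creates a fresh bound method object.
    method = protocol.__adapt__
    # Simulate the C bug: take one extra reference that is never released.
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(method))
    return method(obj)


def refcount_growth(calls=1000):
    """Return (refcount growth of the protocol, new gc.garbage entries)."""
    protocol = FakeProtocol()
    refs_before = sys.getrefcount(protocol)
    garbage_before = len(gc.garbage)
    for _ in range(calls):
        leaky_adapt(protocol, 30)
    gc.collect()  # finds nothing: the leaked methods still look "in use"
    return (sys.getrefcount(protocol) - refs_before,
            len(gc.garbage) - garbage_before)


if __name__ == "__main__":
    growth, new_garbage = refcount_growth()
    print("protocol refcount grew by", growth)
    print("new gc.garbage entries:", new_garbage)
```

Every leaked bound method keeps its own reference to the protocol object, 
so the protocol's refcount should climb with each call while gc.garbage 
stays empty, which matches what alexander saw in the application.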

I found the problem by simple binary search...  first I changed the test's 
run() method to just loop, calling self.doWork().  Since that didn't leak, 
I knew doWork() wasn't the problem.  I then tried getWork() and doWork(), 
which also didn't leak.  That narrowed it down to something in __call__(), 
and a little more testing showed it wasn't lockMe() or unlockMe().  After a 
bit more of this binary searching I narrowed it down to the line where 
__call__ set 'self.pollInterval = pi'.  Looking through the code used by 
binding descriptors to set attributes, I eventually noticed that 
'suggestParentComponent()' was being called, so I changed 'pollInterval' to 
set 'suggestParent=False', and the leak went away.

Finally, once I noticed that all 'suggestParentComponent()' does when it 
is given a number is call adapt() on it, it was just a matter of combing 
through the protocols._speedups source to find the refcount leak.  That, 
of course, took a bit more time.



