Hi. Sorry it took me this long to get back to you. I'll try to reply in
a more timely way if you have more requests. Notes inline.


martin rudalics writes:

> >> IIUC your numbers of Lucid with scrollbars now coincide with the numbers
> >> of Lucid without scrollbars before the "fix".
> >
> > No, that's not right.
> >
> > Lucid with scrollbars post-fix is the blue line: memory usage is stable
> > as frames are created/destroyed: the leak is ~0
> >
> > Lucid without scrollbars pre-fix is the green line. Memory usage is
> > climbing. We aren't leaking as badly as the other cases, but we ARE
> > leaking.
> >
> > So the fix resolved the large leakages in the other cases and also the
> > small leakages that weren't scroll-bar-related.
>
> I can't follow you. before.no.scroll.log has
>
> 16112 19608 S ? 00:00:00 emacs
> ...
> 16112 29700 S ? 00:00:12 emacs
>
> while after.yes.scroll.log has
>
> 17508 19652 S ? 00:00:00 emacs
> ...
> 17508 29160 S ? 00:00:13 emacs
>
> which strike me as very similar (especially given the figures of the two
> other logs). What am I missing?

I wasn't consistent in how long I ran each trial, so by simply looking
at the final memory usage values as you're doing above, you're comparing
different runtimes. The plot I attached last time shows all of the raw
data, and you should clearly see the different slopes of the two trials:
i.e. one is leaking and the other is not.


> >> OTOH the numbers for GTK largely coincide with those of Lucid with
> >> scrollbars before the "fix". So X itself seems much more dominant than
> >> any toolkit particularities.
> >
> > I don't think this is right either. Lucid with scrollbars pre-fix is the
> > purple line. We leak memory at a high, constant rate. GTK memory usage
> > (yellow) is noisy and fragmented (I bet we're invoking malloc/free much
> > more often). The baseline memory consumption is higher, but past that,
> > the leak rate isn't nearly as bad as the purple.
> > The higher fragmentation means that the internals of malloc() matter
> > too: I invoked malloc_trim() just after t=450s, and we see the memory
> > usage dropped sharply as a result.
>
> I can't tell that since it's not in your figures. But again, the raw
> data from before.log and after.yes.scroll.gtk.log seem similar too.

Again, the plot is much more descriptive. The GTK case uses a lot more
memory upfront and then leaks slowly. Conversely, the lucid
no-fix-yes-scrollbars case uses much less upfront, but then leaks much
faster. In the arbitrary-length trials from the previous email, they end
up at roughly the same place. But that's a coincidence.


> >> This does not explain any difference between the GTK (before the "fix")
> >> and Lucid (after the "fix") behaviors. What happens with GTK when you
> >> allow it to delete the terminal by allowing terminal->reference_count
> >> drop to zero? If it does not crash immediately, is the memory leak more
> >> heavy than it is now?
> >
> > I haven't run that experiment, but I could do that. Probably won't get
> > to it for a week, though.
>
> Please try. If you succeed running it, the resulting difference between
> the "before" GTK and Lucid runs should IMHO show the noise introduced by
> malloc/free more clearly. And obviously running the before/after
> experiments for Motif would be interesting too: The Motif build crashed
> in the menu bar section while the Lucid build crashed in the scroll bar
> section.
>
> >> Also, could you try whether changing gc-cons-threshold in either
> >> direction has any impact on the occurrence of the toolkit bug or the
> >> growth of the memory leak? Once I thought that this could affect the
> >> frequency of the error but didn't get any conclusive results.

I ran a few more trials to get the data you asked for. All of these:

- Are at the same baseline revision as before: 979797b9eca0ab.
  This is after your lucid "fix"

- Are 4 minutes long

- Invoke malloc_trim() at the end, which causes the noticeable drop in
  resident memory use visible at the end of each trial

The different trials are

- GTK stock: has the reference-counting logic to prevent
  terminal->reference_count reaching 0

- GTK without the refcount logic: has NO reference-counting logic to
  prevent terminal->reference_count reaching 0

- Lucid stock

- Lucid with no refcount logic and stock 800k gc-cons-threshold

- Lucid with no refcount logic and 80k gc-cons-threshold

- Lucid with no refcount logic and 8000k gc-cons-threshold

Raw data and accompanying plot attached. Plot made thusly:

  cat \
    <(< gtk.stock.log                       awk '{print NR,"gtk.stock.log",                       $2}') \
    <(< gtk.no.refcount.logic.log           awk '{print NR,"gtk.no.refcount.logic.log",           $2}') \
    <(< lucid.no.refcount.logic.gc80k.log   awk '{print NR,"lucid.no.refcount.logic.gc80k.log",   $2}') \
    <(< lucid.no.refcount.logic.gc800k.log  awk '{print NR,"lucid.no.refcount.logic.gc800k.log",  $2}') \
    <(< lucid.no.refcount.logic.gc8000k.log awk '{print NR,"lucid.no.refcount.logic.gc8000k.log", $2}') \
    <(< lucid.yes.refcount.logic.log        awk '{print NR,"lucid.yes.refcount.logic.log",        $2}') \
  | feedgnuplot --lines --dataid --domain --autolegend --xlabel 'frame index (2 seconds per frame)' --ylabel 'Memory consumed (kB)'

As expected, turning off the refcounting logic for the GTK terminal makes
it crash long before the trial ends. And it leaks lots of memory in the
meantime: ~430kB/frame.

For whatever reason, the stock GTK trial is much more consistent this
time. It leaks at ~15kB/frame, although malloc_trim() gives back 1300kB
at the end, so the 15kB/frame is an overestimate, as far as emacs is
concerned at least.

As before, adding the refcounting logic to the lucid terminal drops the
leak rate to 0.

Tweaking gc-cons-threshold does have an effect, but it's not clear what.
The default is 800kB. Looks like the 80k and 800k settings produce
similar leaks at 54kB/frame.
An 8000kB setting produces a sharp climb of about 8000kB above the
baseline, and then a slower leak of ~44kB/frame, although this isn't as
linear as the others.

Does any of this speak to you?
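
In case it's useful for reading the raw logs: the per-frame rates quoted above are slopes of memory against frame index. Below is a minimal sketch of how such a slope could be estimated from a single log with a least-squares fit. It is not necessarily how the numbers above were obtained. The `leak_rate` name is made up for illustration, and the sketch assumes one sample per line with the memory in kB in column 2, which is what the plot command reads.

```sh
# Hypothetical helper (not part of the original trials): estimate the
# leak rate in kB per sample as the least-squares slope of column 2
# against the sample index. Assumes column 2 holds memory in kB.
leak_rate() {
    awk '
        # Only count lines whose second field is a plain integer,
        # so elisions such as "..." are skipped.
        NF >= 2 && $2 ~ /^[0-9]+$/ {
            n++
            sx  += n
            sy  += $2
            sxx += n * n
            sxy += n * $2
        }
        END {
            d = n * sxx - sx * sx
            if (n < 2 || d == 0) {
                print "need at least two samples" > "/dev/stderr"
                exit 1
            }
            # Slope of the best-fit line: kB per sample (i.e. per frame)
            printf "%.1f\n", (n * sxy - sx * sy) / d
        }
    ' "$1"
}
```

Usage would be something like `leak_rate gtk.stock.log`, which prints the slope in kB/frame. The samples taken after malloc_trim() at the end of each trial would pull the fit down, so it is probably worth dropping those lines before fitting.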