On 01/18/2014 06:53 PM, Stefan Monnier wrote:
>> value. dlmalloc's free memory retention seems a bit severe here.
>
> There are several levels at which the memory is "returned to the other level":
> - if a single cons cell is in use in a "cons cell block", that block
>   can't be freed.
> - those blocks are themselves allocated in groups of 16 IIRC, so those
>   groups can only be freed once all 16 of them have been freed at the
>   previous level.
> - malloc/free can itself decide to keep those "freed" blocks for later
>   use, or to return them to the OS. At this level, the behavior depends
>   on the malloc library in use, which depends on the OS.
>   IIUC there are malloc libraries in use which never return memory back
>   to the OS.
>
>> Are we just badly fragmenting the heap?
>
> Could be. For an Emacs that grew to 6GB, I don't find it worrisome
> if it doesn't shrink back below 2GB.

I have no idea what contributed to that 6GB. Shared mappings count
toward virtsize. Of this 6GB, though, dlmalloc has 2GB in its free
lists. This figure is worrisome because this memory waste isn't coming
from a simple leak we can plug.

In the debugger, before I killed Emacs, I called malloc_trim, which
didn't seem to have any effect. (Not that I expected it to.)

dlmalloc is an sbrk-based allocator. It can only return memory to the
system by reducing the data segment size. It can almost never do that in
programs with typical allocation patterns, so in effect, the heap grows
forever. dlmalloc does have code for using mmap for large allocations,
but we've rendered that code inoperative in alloc.c by forcing sbrk
allocation for all lisp objects, however large.

If we allocate a 40MB vector and a cons block (or anything else), then
GC the vector but keep at least one cons cell in that block live, we can
never get that 40MB back. Ordinarily, dlmalloc would have just allocated
that 40MB vector using mmap and expanded the heap only slightly for the
cons block.

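To make the 40MB scenario concrete, here is a toy model of a heap that
can only shrink from the top, the way an sbrk-based heap does. It is a
sketch, not Emacs or dlmalloc code: `toy_heap`, `toy_alloc`, `toy_free`
and `toy_trim` are names invented for this illustration, and the model
ignores free-list reuse entirely.

```c
#include <stddef.h>

/* Toy model (hypothetical, for illustration only): a bump allocator
   whose "break" can be lowered only to the end of the highest block
   that is still live, as with an sbrk-managed data segment. */
enum { TOY_MAX_BLOCKS = 16 };

struct toy_heap {
  size_t start[TOY_MAX_BLOCKS];  /* offset of each block in the heap */
  size_t size[TOY_MAX_BLOCKS];   /* size of each block in bytes */
  int live[TOY_MAX_BLOCKS];      /* nonzero while the block is in use */
  int nblocks;                   /* number of blocks ever allocated */
  size_t brk;                    /* current heap size in bytes */
};

/* Allocate by raising the break.  Returns a block index, or -1 if the
   block table is full. */
static int toy_alloc(struct toy_heap *h, size_t size)
{
  if (h->nblocks == TOY_MAX_BLOCKS)
    return -1;
  int i = h->nblocks++;
  h->start[i] = h->brk;
  h->size[i] = size;
  h->live[i] = 1;
  h->brk += size;
  return i;
}

/* Mark a block as free.  The memory stays inside the heap. */
static void toy_free(struct toy_heap *h, int i)
{
  h->live[i] = 0;
}

/* Lower the break to the end of the highest live block and return the
   number of bytes given back to the "system". */
static size_t toy_trim(struct toy_heap *h)
{
  size_t top = 0;
  for (int i = 0; i < h->nblocks; i++)
    if (h->live[i] && h->start[i] + h->size[i] > top)
      top = h->start[i] + h->size[i];
  size_t released = h->brk - top;
  h->brk = top;
  return released;
}
```

In this model, allocating a 40MB block followed by a 1KB block, then
freeing only the 40MB block, leaves `toy_trim` with nothing to release:
the small live block sits above the free space and pins the break. Only
once the small block is freed too can the heap shrink.
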
We forbid mmap allocation of lisp objects because unexec doesn't restore
the contents of mmaped regions, leaving some lisp objects out of the
dump.

One simple thing we can do to reduce fragmentation is to relax this
restriction. If we know Emacs is already dumped, we can allow malloc to
use mmap to allocate some lisp objects, since we know Emacs won't be
dumped again. Today, Emacs technically supports being dumped multiple
times, but we can safely kill this feature because it is already broken
on several major platforms and almost certainly goes unused. On Cygwin
and NS, dumping an already-dumped Emacs is explicitly forbidden. On my
GTK3 GNU/Linux Emacs, attempting to dump a dumped Emacs results in a
segfault. I haven't tried it in NT Emacs, but I wouldn't be surprised if
the feature were also broken there.

The attached patch allows mmap allocation of large lisp objects in an
Emacs that has been dumped (or that cannot ever be dumped). It could use
more polish (e.g., enforcing the dump-once restriction for all
platforms), but it shows that the basic idea works.

Another simple thing we can do is switch malloc implementations.
jemalloc is a modern mmap-based allocator available on many systems. It
should be close to a drop-in replacement for dlmalloc. Conveniently, it
has both sbrk and mmap modes. We could use it in sbrk mode before
dumping and mmap mode afterward.

Longer-term, it would be nice to be able to compact objects. We could
move objects during the unmark phase of GC by looking for forwarding
pointers to new object locations. (Of course, objects found through
conservative scanning would have to be considered pinned.)

> I'm much more worried about: how
> on earth did it grow to 6GB?

I have no idea; I was just doing normal editing over a few dozen files.