From: Daniel Colascione <dancol@dancol.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Emacs developers <emacs-devel@gnu.org>
Subject: Re: (heap 1024 82721 1933216)
Date: Sat, 18 Jan 2014 19:31:55 -0800
Message-ID: <52DB472B.5060805@dancol.org>
In-Reply-To: <jwvzjmsg023.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3818 bytes --]

On 01/18/2014 06:53 PM, Stefan Monnier wrote:
>> value. dlmalloc's free memory retention seems a bit severe here.
>
> There are several levels at which the memory is "returned to the other level":
> - if a single cons cell is in use in a "cons cell block", that block
>    can't be freed.
> - those blocks are themselves allocated in groups of 16 IIRC, so those
>    groups can only be freed once all 16 of them have been freed at the
>    previous level.
> - malloc/free can itself decide to keep those "freed" blocks for later
>    use, or to return them to the OS.  At this level, the behavior depends
>    on the malloc library in use, which depends on the OS.
>    IIUC there are malloc libraries in use which never return memory back
>    to the OS.
>
>> Are we just badly fragmenting the heap?
>
> Could be.  For an Emacs that grew to 6GB, I don't find it worrisome
> if it doesn't shrink back below 2GB.

I have no idea what contributed to that 6GB. Shared mappings count 
toward virtsize. Of this 6GB, though, dlmalloc has 2GB in its free 
lists. This figure is worrisome because this memory waste isn't coming 
from a simple leak we can plug.

In the debugger, before I killed Emacs, I called malloc_trim, which 
didn't seem to have any effect. (Not that I expected it to.) dlmalloc is 
an sbrk-based allocator. It can only return memory to the system by 
reducing the data segment size. It can almost never do that in programs 
with typical allocation patterns, so in effect, the heap grows forever.

dlmalloc does have code for using mmap for large allocations, but we've
rendered that code inoperative in alloc.c by forcing sbrk allocation for
all Lisp objects, however large. If we allocate a 40MB vector and then a
cons block (or anything else) above it in the heap, then GC the vector
but keep at least one cons cell in that block live, we can never return
that 40MB to the system. Ordinarily, dlmalloc would have just allocated
that 40MB vector using mmap and expanded the heap only slightly for the
cons block.
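
The scenario above can be sketched outside Emacs. This is a minimal
sketch, assuming a glibc-style malloc that honors mallopt and grows its
main heap with sbrk; the function name retained_heap_growth is
hypothetical and the sizes only mirror the 40MB example. It reports how
far the program break is still extended after the large "vector" has
been freed while a small "cons block" stays live.

```c
#define _DEFAULT_SOURCE   /* Assumption: needed for sbrk on glibc.  */
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of Emacs.  Allocate a large "vector"
   and then a small "cons block", free only the vector, and return how
   many bytes the program break is still extended by.  The small block
   is deliberately leaked so that it keeps pinning the heap region it
   landed in.  */
static long
retained_heap_growth (int allow_mmap)
{
  /* Setting the threshold explicitly also switches off glibc's
     dynamic adjustment of it.  */
  mallopt (M_MMAP_THRESHOLD, 64 * 1024);
  /* A maximum of 0 mmap'ed areas forces every request onto the
     sbrk heap, as alloc.c does for Lisp objects.  */
  mallopt (M_MMAP_MAX, allow_mmap ? 65536 : 0);

  char *before = sbrk (0);
  void *vector = malloc (40 * 1024 * 1024);
  void *cons_block = malloc (1024);
  assert (vector != NULL && cons_block != NULL);
  (void) cons_block;

  free (vector);
  char *after = sbrk (0);
  return (long) (after - before);
}
```

Calling it with allow_mmap set to 1 first and 0 second keeps the two
measurements independent, since a freed in-heap chunk would otherwise be
reused by the later call. With mmap disabled, one would expect the
retained growth to be on the order of the vector's size, because the
free chunk sits below the live block and the break cannot be lowered.
Other allocators may behave differently.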

We forbid mmap allocation of Lisp objects because unexec doesn't restore
the contents of mmap'ed regions, which would leave some Lisp objects out
of the dump.

One simple thing we can do to reduce fragmentation is to relax this
restriction. If we know Emacs is already dumped, we can allow malloc to
use mmap to allocate some Lisp objects, since we know Emacs won't be
dumped again.

Today, Emacs technically supports being dumped multiple times, but we
can safely kill this feature because it is broken on several major
platforms already and almost certainly goes unused. On Cygwin and NS,
dumping an already-dumped Emacs is explicitly forbidden. On my GTK3
GNU/Linux Emacs, attempting to dump a dumped Emacs results in a
segfault. I haven't tried it in NT Emacs, but I wouldn't be surprised
if the feature were also broken there.

The attached patch allows mmap allocation of large lisp objects in an 
Emacs that has been dumped (or that cannot ever be dumped). It could use 
more polish (e.g., enforcing the dump-once restriction for all 
platforms), but it shows that the basic idea works.

Another simple thing we can do is switch malloc implementations. 
jemalloc is a modern mmap-based allocator available on many systems. It 
should be close to a drop-in replacement for dlmalloc. Conveniently, it 
has both sbrk and mmap modes. We could use it in sbrk mode before 
dumping and mmap mode afterward.

Longer-term, it would be nice to be able to compact objects. We could 
move objects during the unmark phase of GC by looking for forwarding 
pointers to new object locations. (Of course, objects found through 
conservative scanning would have to be considered pinned.)
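
As a rough illustration of forwarding pointers with pinned objects,
here is a toy sketch. It is an assumption-laden model, not Emacs code:
objects are fixed-size slots in a small array, the struct obj layout
and the compact function are hypothetical, and it uses a separate
three-pass sliding compaction (in the style of the Lisp2 algorithm)
rather than folding the fix-up into the unmark phase as suggested
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { HEAP_SLOTS = 8 };

/* Hypothetical fixed-size object; real Lisp objects vary in size.  */
struct obj
{
  bool live;            /* Set by a mark phase (not shown).  */
  bool pinned;          /* Found by conservative scanning; must not move.  */
  struct obj *forward;  /* Location after compaction, or NULL.  */
  struct obj *ref;      /* A single outgoing reference, or NULL.  */
  int payload;
};

static struct obj heap[HEAP_SLOTS];

/* Slide live, unpinned objects toward the start of HEAP and rewrite
   ROOTS and inter-object references to the new locations.  */
static void
compact (struct obj **roots, size_t nroots)
{
  size_t dest = 0;

  /* Pass 1: record a forwarding pointer for every live object.
     Pinned objects forward to themselves, and their slots are
     skipped when choosing destinations.  */
  for (size_t src = 0; src < HEAP_SLOTS; src++)
    {
      struct obj *o = &heap[src];
      o->forward = NULL;
      if (!o->live)
        continue;
      if (o->pinned)
        {
          o->forward = o;
          continue;
        }
      while (heap[dest].live && heap[dest].pinned)
        dest++;
      o->forward = &heap[dest++];
    }

  /* Pass 2: follow forwarding pointers to update references.  */
  for (size_t i = 0; i < HEAP_SLOTS; i++)
    if (heap[i].live && heap[i].ref && heap[i].ref->forward)
      heap[i].ref = heap[i].ref->forward;
  for (size_t i = 0; i < nroots; i++)
    if (roots[i] && roots[i]->forward)
      roots[i] = roots[i]->forward;

  /* Pass 3: move objects.  Ascending order is safe because a
     destination never lies above its source.  */
  for (size_t src = 0; src < HEAP_SLOTS; src++)
    {
      struct obj *o = &heap[src];
      struct obj *to = o->forward;
      if (o->live && to != o)
        {
          *to = *o;
          o->live = false;
        }
    }

  for (size_t i = 0; i < HEAP_SLOTS; i++)
    heap[i].forward = NULL;
}
```

Computing every forwarding pointer before moving anything avoids
overwriting a vacated slot whose forwarding pointer is still needed.
The roots passed in are exact roots that can be rewritten; anything
found only by conservative scanning would instead set the pinned flag.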

> I'm much more worried about: how on earth did it grow to 6GB?

I have no idea --- I was just doing normal editing over a few dozen files.

[-- Attachment #2: memfrag.patch --]
[-- Type: text/x-patch, Size: 4852 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2014-01-03 06:42:23 +0000
+++ src/alloc.c	2014-01-19 03:12:44 +0000
@@ -95,6 +95,11 @@
 
 #define MMAP_MAX_AREAS 100000000
 
+/* Specify the allocation size over which to request bytes from mmap
+   directly.  */
+
+#define MMAP_THRESHOLD (64*1024)
+
 #endif /* not DOUG_LEA_MALLOC */
 
 /* Mark, unmark, query mark bit of a Lisp string.  S must be a pointer
@@ -204,6 +209,13 @@
 static char *stack_copy;
 static ptrdiff_t stack_copy_size;
 
+/* True if we need to preserve memory regions for dumping.  */
+#ifdef CANNOT_DUMP
+#define might_dump 0
+#else
+static bool might_dump = true;
+#endif
+
 /* Copy to DEST a block of memory from SRC of size SIZE bytes,
    avoiding any address sanitization.  */
 
@@ -963,21 +975,10 @@
 #endif
 
 /* BLOCK_ALIGN has to be a power of 2.  */
-#define BLOCK_ALIGN (1 << 10)
+#define BLOCK_ALIGN (1 << 16)
 
-/* Padding to leave at the end of a malloc'd block.  This is to give
-   malloc a chance to minimize the amount of memory wasted to alignment.
-   It should be tuned to the particular malloc library used.
-   On glibc-2.3.2, malloc never tries to align, so a padding of 0 is best.
-   aligned_alloc on the other hand would ideally prefer a value of 4
-   because otherwise, there's 1020 bytes wasted between each ablocks.
-   In Emacs, testing shows that those 1020 can most of the time be
-   efficiently used by malloc to place other objects, so a value of 0 can
-   still preferable unless you have a lot of aligned blocks and virtually
-   nothing else.  */
-#define BLOCK_PADDING 0
 #define BLOCK_BYTES \
-  (BLOCK_ALIGN - sizeof (struct ablocks *) - BLOCK_PADDING)
+  (BLOCK_ALIGN - sizeof (struct ablocks *))
 
 /* Internal data structures and constants.  */
 
@@ -1001,11 +1002,6 @@
      (if not, the word before the first ablock holds a pointer to the
      real base).  */
   struct ablocks *abase;
-  /* The padding of all but the last ablock is unused.  The padding of
-     the last ablock in an ablocks is not allocated.  */
-#if BLOCK_PADDING
-  char padding[BLOCK_PADDING];
-#endif
 };
 
 /* A bunch of consecutive aligned blocks.  */
@@ -1015,7 +1011,7 @@
 };
 
 /* Size of the block requested from malloc or aligned_alloc.  */
-#define ABLOCKS_BYTES (sizeof (struct ablocks) - BLOCK_PADDING)
+#define ABLOCKS_BYTES (sizeof (struct ablocks))
 
 #define ABLOCK_ABASE(block) \
   (((uintptr_t) (block)->abase) <= (1 + 2 * ABLOCKS_SIZE)	\
@@ -1062,7 +1058,8 @@
       /* Prevent mmap'ing the chunk.  Lisp data may not be mmap'ed
 	 because mapped region contents are not preserved in
 	 a dumped Emacs.  */
-      mallopt (M_MMAP_MAX, 0);
+      if (might_dump)
+        mallopt (M_MMAP_MAX, 0);
 #endif
 
 #ifdef USE_ALIGNED_ALLOC
@@ -1084,7 +1081,8 @@
 
 #ifdef DOUG_LEA_MALLOC
       /* Back to a reasonable maximum of mmap'ed areas.  */
-      mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+      if (might_dump)
+        mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
 #endif
 
 #if ! USE_LSB_TAG
@@ -1728,14 +1726,16 @@
          mmap'ed data typically have an address towards the top of the
          address space, which won't fit into an EMACS_INT (at least on
          32-bit systems with the current tagging scheme).  --fx  */
-      mallopt (M_MMAP_MAX, 0);
+      if (might_dump)
+        mallopt (M_MMAP_MAX, 0);
 #endif
 
       b = lisp_malloc (size + GC_STRING_EXTRA, MEM_TYPE_NON_LISP);
 
 #ifdef DOUG_LEA_MALLOC
       /* Back to a reasonable maximum of mmap'ed areas.  */
-      mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+      if (might_dump)
+        mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
 #endif
 
       b->next_free = b->data;
@@ -3039,7 +3039,8 @@
       /* Prevent mmap'ing the chunk.  Lisp data may not be mmap'ed
 	 because mapped region contents are not preserved in
 	 a dumped Emacs.  */
-      mallopt (M_MMAP_MAX, 0);
+      if (might_dump)
+        mallopt (M_MMAP_MAX, 0);
 #endif
 
       if (nbytes <= VBLOCK_BYTES_MAX)
@@ -3057,7 +3058,8 @@
 
 #ifdef DOUG_LEA_MALLOC
       /* Back to a reasonable maximum of mmap'ed areas.  */
-      mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+      if (might_dump)
+        mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
 #endif
 
       consing_since_gc += nbytes;
@@ -6777,9 +6779,9 @@
 #endif
 
 #ifdef DOUG_LEA_MALLOC
-  mallopt (M_TRIM_THRESHOLD, 128 * 1024); /* Trim threshold.  */
-  mallopt (M_MMAP_THRESHOLD, 64 * 1024);  /* Mmap threshold.  */
-  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);   /* Max. number of mmap'ed areas.  */
+  mallopt (M_TRIM_THRESHOLD, 2 * MMAP_THRESHOLD);
+  mallopt (M_MMAP_THRESHOLD, MMAP_THRESHOLD);
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
 #endif
   init_strings ();
   init_vectors ();
@@ -6804,6 +6806,11 @@
 #if USE_VALGRIND
   valgrind_p = RUNNING_ON_VALGRIND != 0;
 #endif
+
+#ifndef CANNOT_DUMP
+  if (initialized)
+    might_dump = false;
+#endif
 }
 
 void

