GC: cons sweeping and cons block size

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* GC: cons sweeping and cons block size
@ 2007-07-03 15:16 Dmitry Antipov
  0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Antipov @ 2007-07-03 15:16 UTC (permalink / raw)
  To: emacs-devel

Here is another observation I've made around my previous stuff.

For the current cons sweeping method, the size of cons block (i.e. number of
conses per 'cons_block') is orthogonal to sweeping speed since the process
will scan over each bit of each cons block anyway. But this is not true for
the method I'm proposing.

Taking a convenient 32-bit hardware, consider 1K and 4K cons blocks. In the
first case, one cons block has room for 125 conses, and sweeping process for
a full block will do 3 fast and 1 slow bitmap scan. For 100000 conses, we will
allocate 800 cons blocks, and (assuming that all conses are marked) sweeping
will do 2400 fast and 800 slow scans. For the second case, one cons block has
room for ~500 conses, and sweeping process for a full block will do 15 fast and
1 slow bitmap scan. Again, for 100000 conses, we will allocate ~200 blocks and
sweeping will do ~3000 fast and ~200 slow scans.

Well-skilled eye will pay attention on the fact that non-full larger block
tends to be more fragmented than the smaller one, and this fragmentation
really reduces the fast/slow scans ratio for a larger block. This is true. But,
since 50% - 80% of a cons blocks are usually full and fast/slow scans ratio is
much better for a larger block, the potential slowdown for a non-full block is
expiated by the substantial speedup for the full one.

As usual, nothing is ideal. Since cons block must be aligned, malloc()'ing
1K-aligned 1K-block is cheaper than 4K-aligned 4K-block, and the latter allocation
probably tends to a higher heap fragmentation. On the other side, since malloc()
needs to maintain the number of blocks which is 4 times less, it's internal space
overhead is reduced.

Practically, here is a typical cons sweeping times for '(replace-string "a" "__A__")'
in src/ChangeLog, depends on the cons block size:

GC     1k       2k       4k       8k      16k
---------------------------------------------
  0    503      470      477      452      675
  1    581      490      488      448      428
  2    674      572      574      523      500
  3    538      416      418      352      362
  4    514      404      602      361      349
  5    597      466      448      408      353
  6    653      529      643      445      410
  7    697      554      518      458      443
  8    749      586      548      480      465
  9    892      617      564      500      482
  10   821      626      578      492      489
  11   857      644      581      510      495
  12   874      656      595      500      487
  13   906      679      608      501      481
  14   936      705      612      513      500
  15   986      737      648      537      549
  16   1040     761      674      552      539
  17   1106     885      712      671      564
  18   1214     848      746      711      586
  19   1239     910      783      652      616
  20   1317     937      833      684      657
  21   1381     992      875      710      681
  22   1458     1043     902      783      709
  23   1527     1095     960      790      752
  24   1615     1158     1021     838      792
  25   1800     1244     1077     892      844
  26   1822     1294     1145     944      888
  27   1921     1383     1201     987      939
  28   2033     1455     1259     1166     981
  29   2144     1626     1312     1089     1042
  30   2224     1578     1373     1219     1086
  31   2357     1696     1468     1221     1158
  32   2493     1796     1569     1301     1233
  33   3127     1860     1726     1338     1302
  34   2767     2000     1724     1446     1354
  35   2885     2113     1825     1599     1439

This table shows an effect described above quite well. Although the possible heap
fragmentation penalty caused by larger blocks needs to be investigated closer,
I believe that 4K cons blocks will be a better choice for the proposed sweeping
method than 1K ones.

Dmitry

^ permalink raw reply	[flat|nested] 9+ messages in thread

* GC: cons sweeping and cons block size
       [not found] <E1I5ktS-0001uQ-Uc@monty-python.gnu.org>
@ 2007-07-03 20:22 ` Jonathan Yavner
  2007-07-05  1:29   ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Yavner @ 2007-07-03 20:22 UTC (permalink / raw)
  To: emacs-devel

> malloc()'ing 1K-aligned 1K-block is cheaper than 4K-aligned 4K-block,
> and the latter allocation probably tends to a higher heap
> fragmentation.

On some systems, large malloc blocks go directly to MMAP_ANONYMOUS and 
so fragmentation worries are eliminated.  I don't see any support for 
direct use of mmap() in gmalloc.c, though.  Should we consider it?  
OS-specific code is needed -- Windoze calls it VirtualAlloc(), while 
MacOS calls it vm_alloc().

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
@ 2007-07-04  6:43 dmantipov
  2007-07-05 14:06 ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: dmantipov @ 2007-07-04  6:43 UTC (permalink / raw)
  To: emacs-devel; +Cc: jyavner

Jonathan Yavner wrote:

> On some systems, large malloc blocks go directly to MMAP_ANONYMOUS and 
> so fragmentation worries are eliminated.  I don't see any support for 
> direct use of mmap() in gmalloc.c, though.  Should we consider it?  
> OS-specific code is needed -- Windoze calls it VirtualAlloc(), while 
> MacOS calls it vm_alloc().

On some systems, Lisp_Object may be a tagged pointer with 29-bit (on 32-bit system) pointer field. Since mmap() tends to allocate memory at high addresses, this memory is likely to be non-addressable by such pointer. That is why lisp_align_malloc() currently uses mallopt(M_MMAP_MAX, 0) to prevent mapping the region (if underlying malloc() supports this). 

On other systems, like the most commonly used i386 systems with GNU C library :-), all malloc()ed addresses are multiplies of 8, so top 3 bits may be used for tagging and address space may be unrestricted. It's technically possible to use mmap() on such systems, an it would ne an interesting task to implement this. The minor weakness of having everything 8-bytes aligned is a fragmentation - for example, 20-bytes objects must be interleaved with 4-bytes holes, or have spacers to enlarge themselves to 24 bytes.

And, finally, what about the dumper ? "Lisp data may not be mmap()'ed because mapped region contents are not preserved in a dumped Emacs" - quoted from src/alloc.c. I'm not familiar with dumper details enough to answer on this. Of course, we may check if we're dumped or not and use mmap() only if dumped, but I'm not sure this is the most elegant solution.

Dmitry

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
  2007-07-03 20:22 ` Jonathan Yavner
@ 2007-07-05  1:29   ` Richard Stallman
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Stallman @ 2007-07-05  1:29 UTC (permalink / raw)
  To: Jonathan Yavner; +Cc: emacs-devel

    On some systems, large malloc blocks go directly to MMAP_ANONYMOUS and 
    so fragmentation worries are eliminated.  I don't see any support for 
    direct use of mmap() in gmalloc.c, though.

Allocation of buffer contents has code to use mmap.
I don't think anything else Emacs allocates is likely to be very large.

(In GNU we reject the Unix convention of writing `()' after a function
name.  `mmap()' is not a function, it is a function call with no
arguments.)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
  2007-07-04  6:43 GC: cons sweeping and cons block size dmantipov
@ 2007-07-05 14:06 ` Stefan Monnier
  2007-07-05 14:42   ` David Kastrup
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2007-07-05 14:06 UTC (permalink / raw)
  To: dmantipov; +Cc: jyavner, emacs-devel

> On some systems, Lisp_Object may be a tagged pointer with 29-bit (on
> 32-bit system) pointer field. Since mmap() tends to allocate memory at
> high addresses, this memory is likely to be non-addressable by such
> pointer. That is why lisp_align_malloc() currently uses
> mallopt(M_MMAP_MAX, 0) to prevent mapping the region (if underlying
> malloc() supports this). 

I haven't found the time to do it, but it would be good to get rid of this
situation and always place tags in the 3 LSB bits.


        Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
  2007-07-05 14:06 ` Stefan Monnier
@ 2007-07-05 14:42   ` David Kastrup
  2007-07-05 16:02     ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: David Kastrup @ 2007-07-05 14:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: jyavner, dmantipov, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> On some systems, Lisp_Object may be a tagged pointer with 29-bit (on
>> 32-bit system) pointer field. Since mmap() tends to allocate memory at
>> high addresses, this memory is likely to be non-addressable by such
>> pointer. That is why lisp_align_malloc() currently uses
>> mallopt(M_MMAP_MAX, 0) to prevent mapping the region (if underlying
>> malloc() supports this). 
>
> I haven't found the time to do it, but it would be good to get rid of this
> situation and always place tags in the 3 LSB bits.

Would that impact the severity of YAILM (or what the integer/lisp
mixups were called) occurences?

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
  2007-07-05 14:42   ` David Kastrup
@ 2007-07-05 16:02     ` Stefan Monnier
  2007-07-06 11:54       ` Ken Raeburn
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2007-07-05 16:02 UTC (permalink / raw)
  To: David Kastrup; +Cc: jyavner, dmantipov, emacs-devel

>>> On some systems, Lisp_Object may be a tagged pointer with 29-bit (on
>>> 32-bit system) pointer field. Since mmap() tends to allocate memory at
>>> high addresses, this memory is likely to be non-addressable by such
>>> pointer. That is why lisp_align_malloc() currently uses
>>> mallopt(M_MMAP_MAX, 0) to prevent mapping the region (if underlying
>>> malloc() supports this). 
>> 
>> I haven't found the time to do it, but it would be good to get rid of this
>> situation and always place tags in the 3 LSB bits.

> Would that impact the severity of YAILM (or what the integer/lisp
> mixups were called) occurences?

Not at all.  Actually the severity of YAILOM is increased with the use of
LSB bits for tags: since the 000 tag has traditionally been used for
integers, it means that positive integers were represented by themselves
(when placing tags in the MSB), so YAILOM errors would often cause no
problem in practice, whereas when placing tags in the LSB all integers need
shifting, so YAILOM errors lead to lack of those shifts which leads to
integers being "tagged" randomly (depending on their value mod 8), often
causing Emacs to treat those integers as pointers which then cause
segmentation violations.

Luckily, YAILOM problems can be cought by the compiler if you
use -DUSE_LISP_UNION_TYPE (but it incurs a performance hit, which is why
it's not used by default).

        Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
  2007-07-05 16:02     ` Stefan Monnier
@ 2007-07-06 11:54       ` Ken Raeburn
  2007-07-06 12:48         ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Ken Raeburn @ 2007-07-06 11:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Jul 5, 2007, at 12:02, Stefan Monnier wrote:
(quoting David, who was quoting Stefan)
>>> I haven't found the time to do it, but it would be good to get  
>>> rid of this
>>> situation and always place tags in the 3 LSB bits.

It does appear to be the default for most systems, maybe all modern  
ones, but only when we compile with GCC.  Are we confident no one is  
using Emacs on systems where we use a system malloc that doesn't do 8- 
byte alignment?  Or where the native compiler doesn't do 8-byte  
alignment for static objects?  I think I'd be happier with changing  
the default to use USE_LSB_TAG always but check the malloc return  
values and compiler-produced alignment on the configurations where we  
don't already know we're safe, and perhaps delete that code and maybe  
much of the non-USE_LSB_TAG code a while after it's released without  
related bug reports; I'm afraid that temporarily increases the  
ugliness rather than reducing it though.

> Luckily, YAILOM problems can be cought by the compiler if you
> use -DUSE_LISP_UNION_TYPE (but it incurs a performance hit, which  
> is why
> it's not used by default).

Hm, I don't think I ever got around to seeing if __attribute__ 
((transparent_union)) would fix the performance...

We could do a USE_LSB_TAG variant of the union, to get more address  
space back but keep the paranoid type checking capability.

Ken

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GC: cons sweeping and cons block size
  2007-07-06 11:54       ` Ken Raeburn
@ 2007-07-06 12:48         ` Stefan Monnier
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2007-07-06 12:48 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: emacs-devel

> (quoting David, who was quoting Stefan)
>>>> I haven't found the time to do it, but it would be good to get  rid of
>>>> this situation and always place tags in the 3 LSB bits.

> It does appear to be the default for most systems, maybe all modern ones,
> but only when we compile with GCC.  Are we confident no one is using Emacs
> on systems where we use a system malloc that doesn't do 8-
> byte alignment?

No.  IIRC at least Mac OS 9's malloc doesn't do 8-byte alignment.
By getting rid of the MSB tags I meant write the code to handle the case
where malloc doesn't return multiples of 8 and where we can't use GCC's
alignment attribute to force alignment on static (and/or stack structures).

> We could do a USE_LSB_TAG variant of the union, to get more address  space
> back but keep the paranoid type checking capability.

If we can really get the exact same performance, maybe it's a good idea, but
since it's only really useful at compile-time, it's quite sufficient to
compile the source every once in a while with the USE_LISP_UNION_TYPE flag.

        Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-07-06 12:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-04  6:43 GC: cons sweeping and cons block size dmantipov
2007-07-05 14:06 ` Stefan Monnier
2007-07-05 14:42   ` David Kastrup
2007-07-05 16:02     ` Stefan Monnier
2007-07-06 11:54       ` Ken Raeburn
2007-07-06 12:48         ` Stefan Monnier
     [not found] <E1I5ktS-0001uQ-Uc@monty-python.gnu.org>
2007-07-03 20:22 ` Jonathan Yavner
2007-07-05  1:29   ` Richard Stallman
  -- strict thread matches above, loose matches on Subject: below --
2007-07-03 15:16 Dmitry Antipov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).