unofficial mirror of emacs-devel@gnu.org 
* Objects layout and tagging scheme
@ 2012-08-02 16:10 Dmitry Antipov
  2012-08-02 18:09 ` Paul Eggert
  2012-08-03  9:49 ` Stefan Monnier
From: Dmitry Antipov @ 2012-08-02 16:10 UTC
  To: Emacs development discussions

Any non-trivial GC design I can imagine requires extra per-object information.
It may be just a few bits, but this is a problem anyway because some objects
(conses, first of all) already have no free space, so their mark bits are
allocated from separate bitmaps. Symbols and misc objects are free from these
problems, but vectors and strings are affected. So I would like to raise the following
question: if I need, for example, 2 extra bits for each object, where to get them?

Dmitry






* Re: Objects layout and tagging scheme
  2012-08-02 16:10 Objects layout and tagging scheme Dmitry Antipov
@ 2012-08-02 18:09 ` Paul Eggert
  2012-08-03  8:17   ` Dmitry Antipov
  2012-08-03  9:49 ` Stefan Monnier
From: Paul Eggert @ 2012-08-02 18:09 UTC
  To: emacs-devel

On 08/02/2012 09:10 AM, Dmitry Antipov wrote:
> vectors and strings are affected. So I would like to raise the following
> question: if I need, for example, 2 extra bits for each object, where to get them?

For strings, 3 bits are free in the pointers to intervals,
if we can assume intervals are aligned like other lisp
objects, which should be possible to arrange.

For vectors the same trick could be played, with next.buffer
and next.vector.   Presumably we can think of a similar way
to do it with next.nbytes, since nbytes is limited.
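
To illustrate the pointer trick above: a minimal sketch, assuming the pointed-to
structure is 8-byte aligned so that the low 3 bits of its address are always
zero. All names here (interval_stub, pack_tagged and so on) are hypothetical and
are not taken from the Emacs sources; _Alignas requires C11.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; nothing here comes from the Emacs sources.  */
#define TAG_BITS 3
#define TAG_MASK ((uintptr_t) ((1u << TAG_BITS) - 1))

/* Stand-in for the real interval type, forced to 8-byte alignment.  */
struct interval_stub
{
  _Alignas (8) long position;
};

/* Store TAG in the low bits of P.  P must be 8-byte aligned.  */
static uintptr_t
pack_tagged (struct interval_stub *p, unsigned tag)
{
  uintptr_t bits = (uintptr_t) p;
  assert ((bits & TAG_MASK) == 0);   /* alignment keeps the low 3 bits clear */
  assert (tag <= TAG_MASK);
  return bits | tag;
}

/* Recover the original pointer by masking the tag bits off.  */
static struct interval_stub *
tagged_pointer (uintptr_t word)
{
  return (struct interval_stub *) (word & ~TAG_MASK);
}

/* Recover the 3 extra bits.  */
static unsigned
tagged_bits (uintptr_t word)
{
  return (unsigned) (word & TAG_MASK);
}
```

Every access to the pointer then has to go through tagged_pointer, which is
the kind of cost in readability that the follow-up message objects to.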




* Re: Objects layout and tagging scheme
  2012-08-02 18:09 ` Paul Eggert
@ 2012-08-03  8:17   ` Dmitry Antipov
  2012-08-03 23:10     ` Stefan Monnier
From: Dmitry Antipov @ 2012-08-03  8:17 UTC
  To: Paul Eggert, Stefan Monnier, Eli Zaretskii; +Cc: emacs-devel

On 08/02/2012 10:09 PM, Paul Eggert wrote:

> For strings, 3 bits are free in the pointers to intervals,
> if we can assume intervals are aligned like other lisp
> objects, which should be possible to arrange.
>
> For vectors the same trick could be played, with next.buffer
> and next.vector.   Presumably we can think of a similar way
> to do it with next.nbytes, since nbytes is limited.

The more I work on the C part of Emacs, the more I hate such
bit tricks. IMHO they're much more obfuscating than all of the xVAR stuff.
Even worse, packing every possible unused bit turns further extensions into
a nightmare. For example, I can follow your suggestions and hack 2 bits
into free bits of pointers (and add more ugly stuff to Lisp_Cons); next,
someone will ask for 1 more bit (for tricolor marking - why not?), and
the next round of obfuscation will start.

That's why I'm thinking about unified per-object headers. Consider the
following layout: if the LSB (or MSB) of a Lisp_Object is non-zero, the
remaining bits represent a signed integer; otherwise, the remaining bits
represent a pointer to a heap object. Each object has a 4-byte header. In the
header, the mark bit, extra GC information and type information occupy the
same bits for all objects; the rest of the header is object-specific or unused.
For example, the cons header may be:

struct cons_header {
   unsigned type : 6;     /* Lisp_Cons */
   unsigned gcmark : 1;   /* mark bit */
   unsigned gcinfo : 2;   /* extra GC information */
   unsigned unused : 23;
};

The symbol header may be:

struct symbol_header {
   unsigned type : 6;    /* Lisp_Symbol */
   unsigned gcmark : 1;  /* mark bit */
   unsigned gcinfo : 2;  /* extra GC information */
   unsigned redirect : 3;
   unsigned constant : 2;
   unsigned interned : 2;
   unsigned declared_special : 1;
   unsigned unused : 15;
};

etc. The only disadvantage is increased memory consumption (Lisp_Cons is
the big loser here, plus pure objects, which don't need the gcXXX bits). But,
at this cost, we get at least:

- No USE_LSB_TAG hacks - it's enough to ensure that all heap objects
   are aligned to a word boundary;
- No address space limitation, welcome mmap;
- Native limits on vector and string length (size is really just size,
   without ARRAY_MARK_FLAG, PSEUDOVECTOR_FLAG and so on);
- No separate bitmaps for conses and floats, so no alignment restrictions
   on cons and float blocks - say goodbye to lisp_align_malloc;
- Faster marking and faster checks for an existing mark - no more
   switch (XTYPE (obj)), because the type bits are placed identically in
   all objects;
- A simple type system without second-class citizens like the current misc
   family.
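
A minimal sketch of the layout proposed above, assuming the LSB variant of the
integer tag and a compiler that packs the bit-fields into one 4-byte unit (GCC
and Clang do). The names obj_header, lisp_word, mark_object and so on are
hypothetical and only illustrate the idea; this is not code from any Emacs
branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch; none of these names exist in Emacs.  */
typedef intptr_t lisp_word;

enum obj_type { TYPE_CONS = 1, TYPE_SYMBOL = 2 };

/* Header prefix shared by every heap object: same bits everywhere.  */
struct obj_header
{
  unsigned type : 6;
  unsigned gcmark : 1;
  unsigned gcinfo : 2;
  unsigned rest : 23;   /* object-specific or unused */
};

struct cons_obj
{
  struct obj_header h;
  lisp_word car, cdr;
};

/* LSB set: the remaining bits hold a signed integer.  */
static int is_fixnum (lisp_word w) { return (w & 1) != 0; }

static lisp_word
make_fixnum (intptr_t n)
{
  return (lisp_word) (((uintptr_t) n << 1) | 1);
}

/* Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but true for common compilers.  */
static intptr_t fixnum_value (lisp_word w) { return w >> 1; }

/* LSB clear: the word is a pointer to an object that starts with a header.  */
static lisp_word
make_object (void *p)
{
  assert (((uintptr_t) p & 1) == 0);   /* word alignment keeps the LSB clear */
  return (lisp_word) p;
}

static struct obj_header *
header_of (lisp_word w)
{
  assert (!is_fixnum (w));
  return (struct obj_header *) w;
}

static int object_type (lisp_word w) { return header_of (w)->type; }

/* Mark W without dispatching on its type.  Return 1 if W was newly
   marked, 0 if it is an integer or was already marked.  */
static int
mark_object (lisp_word w)
{
  if (is_fixnum (w))
    return 0;
  struct obj_header *h = header_of (w);
  if (h->gcmark)
    return 0;
  h->gcmark = 1;
  return 1;
}
```

The point of the sketch is that mark_object never inspects the type: the mark
bit sits at the same place in every object, so no switch is needed.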

I'm not sure that this layout can coexist with the current one, so it's
a subject for development on a branch; once it is done, we will
have a solid base for further GC improvements.

Dmitry





* Re: Objects layout and tagging scheme
  2012-08-02 16:10 Objects layout and tagging scheme Dmitry Antipov
  2012-08-02 18:09 ` Paul Eggert
@ 2012-08-03  9:49 ` Stefan Monnier
From: Stefan Monnier @ 2012-08-03  9:49 UTC
  To: Dmitry Antipov; +Cc: Emacs development discussions

> So I would like to raise the following question: if I need, for
> example, 2 extra bits for each object, where to get them?

For most object types, it's not too hard to make room for 2 extra bits.
The only really hard ones are conses and floats.

Of course, you can simply extend their current bitmaps which store one
extra bit per cons (and per float) to store 3 extra bits instead of 1.

But depending on the specific use (e.g. how often is it accessed,
whether it can be approximated or shared between objects, ...), it might
be a bad idea, and there might be a better option.
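
A minimal sketch of the "extend the bitmap" option, assuming a block of conses
with a side bitmap that holds 3 bits per cons instead of 1. The block size, the
type names and the accessors are hypothetical and do not match the real
cons_block in alloc.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sizes and names, chosen only for illustration.  */
enum { BLOCK_CONSES = 1000, BITS_PER_CONS = 3 };
#define WORD_BITS (sizeof (unsigned) * CHAR_BIT)

struct cons_stub { void *car, *cdr; };

struct cons_block_stub
{
  struct cons_stub conses[BLOCK_CONSES];
  /* Side bitmap: BITS_PER_CONS consecutive bits for each cons.  */
  unsigned bits[(BLOCK_CONSES * BITS_PER_CONS + WORD_BITS - 1) / WORD_BITS];
};

/* Read bit WHICH (0..2) of the cons at INDEX.  */
static int
cons_bit_get (const struct cons_block_stub *b, size_t index, unsigned which)
{
  assert (index < BLOCK_CONSES && which < BITS_PER_CONS);
  size_t n = index * BITS_PER_CONS + which;
  return (b->bits[n / WORD_BITS] >> (n % WORD_BITS)) & 1;
}

/* Set or clear bit WHICH (0..2) of the cons at INDEX.  */
static void
cons_bit_set (struct cons_block_stub *b, size_t index, unsigned which,
              int value)
{
  assert (index < BLOCK_CONSES && which < BITS_PER_CONS);
  size_t n = index * BITS_PER_CONS + which;
  unsigned mask = 1u << (n % WORD_BITS);
  if (value)
    b->bits[n / WORD_BITS] |= mask;
  else
    b->bits[n / WORD_BITS] &= ~mask;
}
```

The conses themselves keep their size; the price is a bitmap three times as
large and an index computation on every access, which is why the access
pattern matters.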


        Stefan




* Re: Objects layout and tagging scheme
  2012-08-03  8:17   ` Dmitry Antipov
@ 2012-08-03 23:10     ` Stefan Monnier
From: Stefan Monnier @ 2012-08-03 23:10 UTC
  To: Dmitry Antipov; +Cc: Eli Zaretskii, Paul Eggert, emacs-devel

> That's why I'm thinking about unified per-object headers. Consider the
> following layout: if the LSB (or MSB) of a Lisp_Object is non-zero, the
> remaining bits represent a signed integer; otherwise, the remaining bits
> represent a pointer to a heap object. Each object has a 4-byte header. In the
> header,

Yes, alternatives exist which have various advantages.
The advantages of the current system include:
- memory-efficient conses (and floats).
- cpu-efficient CONSP, SYMBOLP, STRINGP.

Also, many systems nowadays like to align their objects on multiples of
8 bytes, so increasing conses and strings by 32 bits might end up
increasing their memory use by 8 bytes unless we handle them specially.
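
The arithmetic behind this can be sketched as follows, assuming an allocator
that rounds every object up to a multiple of 8 bytes. round_up is a
hypothetical helper, not an Emacs function.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE up to the next multiple of ALIGN, which must be a power
   of two.  Hypothetical helper for illustration only.  */
static size_t
round_up (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}
```

With 8-byte words a cons is 16 bytes; adding a 4-byte header gives 20, and
round_up (20, 8) is 24. With 4-byte words a cons is 8 bytes; the header gives
12, and round_up (12, 8) is 16. In both cases 4 bytes of header cost 8 bytes
of memory.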

Also, adding a separate "type+gc" word to vectors would only be
acceptable if we can get rid of the "next" field (which we could already
eliminate in the current code, BTW).

We could begin by trying to turn symbols into a kind of lisp_misc?


        Stefan



