* Objects layout and tagging scheme
From: Dmitry Antipov @ 2012-08-02 16:10 UTC
To: Emacs development discussions

Any non-trivial GC design that I can imagine requires extra per-object information. It may be just a few bits, but this is a problem anyway, because some objects (conses first of all) already have no free space, so their mark bits are allocated from separate bitmaps. Symbols and misc objects are free from this problem, but vectors and strings are affected. So I would like to raise the following question: if I need, for example, 2 extra bits for each object, where do I get them?

Dmitry

* Re: Objects layout and tagging scheme
From: Paul Eggert @ 2012-08-02 18:09 UTC
To: emacs-devel

On 08/02/2012 09:10 AM, Dmitry Antipov wrote:
> vectors and string are affected. So I would like to raise the following
> question: if I need, for example, 2 extra bits for each object, where to get them?

For strings, 3 bits are free in the pointers to intervals, if we can assume intervals are aligned like other lisp objects, which should be possible to arrange.

For vectors the same trick could be played, with next.buffer and next.vector. Presumably we can think of a similar way to do it with next.nbytes, since nbytes is limited.
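
The pointer trick described above can be sketched roughly as follows. This is a minimal illustration, not Emacs code: the names pack_ptr, unpack_ptr, unpack_flags, TAG_BITS and TAG_MASK are hypothetical, and the sketch assumes the pointed-to object is aligned to 8 bytes, so the low 3 bits of its address are always zero and can carry flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of Emacs: keep up to 3 flag bits in
   the low bits of a pointer whose target is aligned to 8 bytes.  */
enum { TAG_BITS = 3, TAG_MASK = (1 << TAG_BITS) - 1 };

/* Return P as an integer with FLAGS stored in its low bits.  */
static uintptr_t
pack_ptr (void *p, unsigned flags)
{
  uintptr_t u = (uintptr_t) p;
  assert ((u & TAG_MASK) == 0);   /* assumption: 8-byte alignment */
  assert (flags <= TAG_MASK);     /* only 3 bits are available */
  return u | flags;
}

/* Recover the original pointer by clearing the flag bits.  */
static void *
unpack_ptr (uintptr_t packed)
{
  return (void *) (packed & ~(uintptr_t) TAG_MASK);
}

/* Recover the flag bits.  */
static unsigned
unpack_flags (uintptr_t packed)
{
  return (unsigned) (packed & TAG_MASK);
}
```

Every read of the pointer has to go through unpack_ptr, which is the kind of masking a later message in the thread objects to as obfuscating.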

* Re: Objects layout and tagging scheme
From: Dmitry Antipov @ 2012-08-03 8:17 UTC
To: Paul Eggert, Stefan Monnier, Eli Zaretskii
Cc: emacs-devel

On 08/02/2012 10:09 PM, Paul Eggert wrote:
> For strings, 3 bits are free in the pointers to intervals,
> if we can assume intervals are aligned like other lisp
> objects, which should be possible to arrange.
>
> For vectors the same trick could be played, with next.buffer
> and next.vector. Presumably we can think of a similar way
> to do it with next.nbytes, since nbytes is limited.

The more I work on the C part of Emacs, the more I hate such bit tricks. IMHO they're much more obfuscating than all of the xVAR stuff. Even worse, packing every possible unused bit turns further extensions into a nightmare. For example, I can follow your suggestions and hack 2 bits into the free bits of pointers (and add more ugly stuff to Lisp_Cons); next, someone will ask for 1 more bit (for tricolor marking, why not?), and the next round of obfuscation will start again.

That's why I'm thinking about unified per-object headers. Consider the following layout: if the LSB (or MSB) of a Lisp_Object is non-zero, the remaining bits represent a signed integer; otherwise, the remaining bits represent a pointer to a heap object. Each object has a 4-byte header. In the header, the mark bit, the extra GC information and the type information always occupy the same bits for all objects; the rest of the header is object-specific or unused.

For example, a cons header may be:

    struct cons_header {
      unsigned type : 6;   /* Lisp_Cons */
      unsigned gcmark : 1;
      unsigned gcinfo : 2;
      unsigned unused : 23;
    };

A symbol header may be:

    struct symbol_header {
      unsigned type : 6;   /* Lisp_Symbol */
      unsigned gcmark : 1;
      unsigned gcinfo : 2;
      unsigned redirect : 3;
      unsigned constant : 2;
      unsigned interned : 2;
      unsigned declared_special : 1;
      unsigned unused : 15;
    };

etc. The only disadvantage is increased memory consumption (Lisp_Cons is the big loser here, plus pure objects, which don't need the gcXXX bits). But, at this cost, we get at least:

- No USE_LSB_TAG hacks: it's enough to ensure that all heap objects are aligned to a word boundary;
- No address space limitation, welcome mmap;
- Native limits on vector and string length (size is really just the size, without ARRAY_MARK_FLAG, PSEUDOVECTOR_FLAG and so on);
- No separate bitmaps for conses and floats, hence no alignment restrictions on cons and float blocks: say goodbye to lisp_align_malloc;
- Faster marking and checking whether the mark is already set: no more switch (XTYPE (obj)), because the type bits are placed identically in all objects;
- A simple type system without second-class citizens like the current misc family.

I'm not sure that this layout can coexist with the current one, so it's a subject for development in a branch; when it is done, we will have a solid base for further GC improvements.

Dmitry
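
To make the proposed layout concrete, here is a minimal sketch of how such a tagging scheme might look in C. It is an illustration under stated assumptions, not the actual Emacs representation: obj_t stands in for Lisp_Object, the LSB variant is chosen, and all names (obj_header, TYPE_CONS, make_fixnum, header_of, consp, mark, markedp and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t obj_t;          /* hypothetical stand-in for Lisp_Object */

enum obj_type { TYPE_CONS = 1, TYPE_SYMBOL = 2 };   /* illustrative values */

/* Common 4-byte header: type and GC bits sit in the same place for
   every heap object; the remaining bits are object-specific.  */
struct obj_header
{
  unsigned type : 6;
  unsigned gcmark : 1;
  unsigned gcinfo : 2;
  unsigned rest : 23;
};

struct cons
{
  struct obj_header h;           /* must be the first member */
  obj_t car, cdr;
};

/* LSB set means the remaining bits hold a signed integer.  */
static int fixnump (obj_t o) { return (o & 1) != 0; }

/* Assumes N fits in one bit less than a machine word.  */
static obj_t make_fixnum (intptr_t n) { return n * 2 + 1; }

/* For an odd word 2n+1, (o - 1) / 2 is exactly n.  */
static intptr_t fixnum_value (obj_t o) { return (o - 1) / 2; }

/* Heap objects only need word alignment, so the LSB is zero.  */
static obj_t
obj_from_ptr (void *p)
{
  obj_t o = (obj_t) p;
  assert ((o & 1) == 0);
  return o;
}

static struct obj_header *
header_of (obj_t o)
{
  assert (!fixnump (o));
  return (struct obj_header *) (void *) o;
}

/* The type test needs one load but no per-type dispatch.  */
static int
consp (obj_t o)
{
  return !fixnump (o) && header_of (o)->type == TYPE_CONS;
}

/* Marking is the same operation for every heap object.  */
static void mark (obj_t o) { if (!fixnump (o)) header_of (o)->gcmark = 1; }

static int markedp (obj_t o) { return !fixnump (o) && header_of (o)->gcmark; }
```

Note the trade-off visible even in this sketch: consp has to dereference the object to read its header, whereas a tag stored in the Lisp_Object word itself can be tested without touching memory.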

* Re: Objects layout and tagging scheme
From: Stefan Monnier @ 2012-08-03 23:10 UTC
To: Dmitry Antipov
Cc: Eli Zaretskii, Paul Eggert, emacs-devel

> That's why I'm thinking about per-object unified headers. Consider the
> following layout: if LSB (or MSB) of Lisp_Object is non-zero, the rest
> bits represents signed integer; otherwise, the rest bits represents the
> pointer to heap object. Each object has 4-byte header. In the header,

Yes, alternatives exist which have various advantages. The advantages of the current system include:

- memory-efficient conses (and floats).
- cpu-efficient CONSP, SYMBOLP, STRINGP.

Also, many systems nowadays like to align their objects on multiples of 8 bytes, so increasing conses and strings by 32 bits might end up increasing their memory use by 8 bytes unless we handle them specially.

Also, adding a separate "type+gc" word to vectors would only be acceptable if we can get rid of the "next" field (which we could already eliminate in the current code, BTW).

We could begin by trying to turn symbols into a kind of lisp_misc?

Stefan
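
The alignment point above is simple arithmetic, and a small sketch may help. The helpers round_up and allocated_size are hypothetical, and the sizes used in the note below (8-byte and 16-byte conses, 8-byte allocation granularity) are assumptions about typical 32-bit and 64-bit builds, not measurements of Emacs.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE up to the next multiple of ALIGN, a power of two.  */
static size_t
round_up (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical cost model: bytes actually consumed by an object with
   PAYLOAD bytes of fields and HEADER bytes of header when every
   object is padded to a multiple of ALIGN.  */
static size_t
allocated_size (size_t payload, size_t header, size_t align)
{
  return round_up (payload + header, align);
}
```

Under these assumptions, a cons made of two 4-byte words grows from allocated_size (8, 0, 8) = 8 bytes to allocated_size (8, 4, 8) = 16 bytes once a 4-byte header is added, and a cons made of two 8-byte words grows from 16 to 24 bytes. In both cases the 4-byte header costs 8 bytes.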

* Re: Objects layout and tagging scheme
From: Stefan Monnier @ 2012-08-03 9:49 UTC
To: Dmitry Antipov
Cc: Emacs development discussions

> So I would like to raise the following question: if I need, for
> example, 2 extra bits for each object, where to get them?

For most object types, it's not too hard to make room for 2 extra bits. The only real hard ones are conses and floats. Of course, you can simply extend their current bitmaps which store one extra bit per cons (and per float) to store 3 extra bits instead of 1. But depending on the specific use (e.g. how often is it accessed, whether it can be approximated or shared between objects, ...), it might be a bad idea, and there might be a better option.

Stefan
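
Extending a side bitmap from 1 to 3 bits per object, as suggested above, could look roughly like this. It is a sketch only: the names side_bits, get_bits and set_bits, and the block size of 1000 objects, are hypothetical and do not reflect how Emacs lays out its cons blocks.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical parameters: 3 side bits (1 mark bit plus 2 extra bits)
   for each of the objects in one block.  */
enum { BITS_PER_OBJ = 3, BLOCK_OBJS = 1000 };

struct side_bits
{
  unsigned char bits[(BLOCK_OBJS * BITS_PER_OBJ + CHAR_BIT - 1) / CHAR_BIT];
};

/* Return the 3-bit field belonging to the object at INDEX.  */
static unsigned
get_bits (const struct side_bits *sb, size_t index)
{
  unsigned value = 0;
  assert (index < BLOCK_OBJS);
  for (int k = 0; k < BITS_PER_OBJ; k++)
    {
      size_t bit = index * BITS_PER_OBJ + k;
      unsigned b = (sb->bits[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1u;
      value |= b << k;
    }
  return value;
}

/* Store VALUE (0..7) in the field belonging to the object at INDEX.
   A field may straddle a byte boundary, so bits are set one by one.  */
static void
set_bits (struct side_bits *sb, size_t index, unsigned value)
{
  assert (index < BLOCK_OBJS && value < (1u << BITS_PER_OBJ));
  for (int k = 0; k < BITS_PER_OBJ; k++)
    {
      size_t bit = index * BITS_PER_OBJ + k;
      unsigned char mask = (unsigned char) (1u << (bit % CHAR_BIT));
      if ((value >> k) & 1u)
        sb->bits[bit / CHAR_BIT] |= mask;
      else
        sb->bits[bit / CHAR_BIT] &= (unsigned char) ~mask;
    }
}
```

The object itself stays the same size, but each access costs an index computation plus a few shifts and masks, which is why the access frequency of the extra bits matters for this choice.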

Thread overview: 5+ messages (newest: 2012-08-03 23:10 UTC)

2012-08-02 16:10 Objects layout and tagging scheme Dmitry Antipov
2012-08-02 18:09 ` Paul Eggert
2012-08-03  8:17   ` Dmitry Antipov
2012-08-03 23:10     ` Stefan Monnier
2012-08-03  9:49 ` Stefan Monnier