Quesition about Lisp

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Quesition about Lisp_Vector
@ 2011-11-11  6:29 Qiang Guo
  2011-11-11  8:59 ` Andreas Schwab
  2011-11-11  9:01 ` Ken Raeburn
  0 siblings, 2 replies; 8+ messages in thread
From: Qiang Guo @ 2011-11-11  6:29 UTC (permalink / raw)
  To: emacs-devel


Hi, I'm new to devel aspect of Emacs. And I'm having a problem reading
Emacs source code: in the definition of Lisp_Vector
 struct Lisp_Vector
  {
    EMACS_UINT size;
    struct Lisp_Vector *next;
    Lisp_Object contents[1];
  };

why contents's size is fixed ? And later when actually allocating space

 nbytes = sizeof *p + (len - 1) * sizeof p->contents[0];

I don't understand how this nbytes is calculated. why only (len-1)
contents[0] ? and how can the size of an array be changed ?

Thanks
Qiang
 




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-11  6:29 Quesition about Lisp_Vector Qiang Guo
@ 2011-11-11  8:59 ` Andreas Schwab
  2011-11-11  9:01 ` Ken Raeburn
  1 sibling, 0 replies; 8+ messages in thread
From: Andreas Schwab @ 2011-11-11  8:59 UTC (permalink / raw)
  To: Qiang Guo; +Cc: emacs-devel

Qiang Guo <qguo@ualberta.ca> writes:

> Hi, I'm new to devel aspect of Emacs. And I'm having a problem reading
> Emacs source code: in the definition of Lisp_Vector
>  struct Lisp_Vector
>   {
>     EMACS_UINT size;
>     struct Lisp_Vector *next;
>     Lisp_Object contents[1];
>   };
>
> why contents's size is fixed ?

Standard C89 doesn't have flexible array members, so this is using the
well known struct hack.

> And later when actually allocating space
>
>  nbytes = sizeof *p + (len - 1) * sizeof p->contents[0];
>
> I don't understand how this nbytes is calculated. why only (len-1)

Because one element is already included in struct Lisp_Vector.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-11  6:29 Quesition about Lisp_Vector Qiang Guo
  2011-11-11  8:59 ` Andreas Schwab
@ 2011-11-11  9:01 ` Ken Raeburn
  2011-11-12  3:42   ` Stephen J. Turnbull
  1 sibling, 1 reply; 8+ messages in thread
From: Ken Raeburn @ 2011-11-11  9:01 UTC (permalink / raw)
  To: Qiang Guo; +Cc: emacs-devel

On Nov 11, 2011, at 01:29, Qiang Guo wrote:
> Hi, I'm new to devel aspect of Emacs. And I'm having a problem reading
> Emacs source code: in the definition of Lisp_Vector
> struct Lisp_Vector
>  {
>    EMACS_UINT size;
>    struct Lisp_Vector *next;
>    Lisp_Object contents[1];
>  };
> 
> why contents's size is fixed ? And later when actually allocating space

This is a common idiom for handling a structure with a variable-sized array at the end.  The latest spec for the C language has support built in for a slightly different mechanism, but this scheme used above has been a common way to get around the absence of variable-sized arrays in structures in traditional C for a long time, and it works on most if not all C compilers.

> 
> nbytes = sizeof *p + (len - 1) * sizeof p->contents[0];
> 
> I don't understand how this nbytes is calculated. why only (len-1)
> contents[0] ? and how can the size of an array be changed ?

The "len-1" means we want an array with "len" elements in it, but the first element is covered by "sizeof *p" because "struct Lisp_Vector" includes one array element already.  If you use an index greater than zero, the value referenced may be stored in extra padding space at the end of the structure, if there is any, or in the additional space we allocated past the end of the structure.  If you think about it, it really doesn't matter which; all that matters is that for array indexes less than "len", we've allocated at least enough extra space.

If you use a compiler or interpreter designed to run C code with extensive, very pedantic run-time checks, it *might* notice if the array index used is greater than 0 and warn you.  But under pretty much any normal compiler this should just work without any complaint.

Ken

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-11  9:01 ` Ken Raeburn
@ 2011-11-12  3:42   ` Stephen J. Turnbull
  2011-11-12  8:49     ` Eli Zaretskii
  2011-11-13  1:02     ` Ken Raeburn
  0 siblings, 2 replies; 8+ messages in thread
From: Stephen J. Turnbull @ 2011-11-12  3:42 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Qiang Guo, emacs-devel

Ken Raeburn writes:

 > This is a common idiom for handling a structure with a
 > variable-sized array at the end.

More precisely, "a concrete struct moduling a (abstract) structure
containing exactly one array whose size is variable at compile time".
It's always possible to move one array to the end of the struct, but
this only works well if there's just one.  (If there's more than one,
I would consider the cognitive load of figuring out which one to
optimize and then later remembering that decision too great to be
worth it.)

 > The latest spec for the C language has support built in for a
 > slightly different mechanism, but this scheme used above has been a
 > common way to get around the absence of variable-sized arrays in
 > structures in traditional C for

That is to say, it's an optimization hack.  It saves the space of one
pointer, and the time of one pointer dereference (plus possible
allocation overhead), versus a member that is a pointer to an array
allocated elsewhere.  The alternative of a fixed-size array would be
unacceptable in Lisp, but obviously in most cases you'd end up with a
huge amount of space wasted since the size distribution of arrays very
likely follows a power law, so the fixed size "should" be "big".

 > a long time, and it works on most if not all C compilers.

clang (llvm) warns about accesses with a constant index greater than
zero, but it works.

 > The "len-1" means we want an array with "len" elements in it, but
 > the first element is covered by "sizeof *p" because "struct
 > Lisp_Vector" includes one array element already.  If you use an
 > index greater than zero, the value referenced may be stored in
 > extra padding space at the end of the structure, if there is any,

Ah, now that interests me.  If there actually is such space, most
constant index accesses are to a[1], and in theory we could allocate
array space to fill out the padding.  However, since the structure
alignment requirement is usually no more than sizeof(pointer), which
is the same as sizeof(Lisp_Object), and IIRC all of the Lisp_Vectors
warned about are general (ie, the elements are Lisp_Objects), we
probably don't buy anything.

I wonder if there is any real loss to allocating two or three elements
in the base structure (ie, how often vectors of length less than three
or four are used).

Aside to OP: In Emacs Lisp although strings (ie, arrays with bytes as
elements and a non-uniform indexing scheme) are vectors in the API,
they don't share an implementation with general vectors.  In fact the
string storage is allocated separately (another optimization, because
strings with len(data) % pointersize != 0 would waste substantial
amounts of memory, at least by 1980 standards), so there's always a
pointer indirection.  This usually is not too inefficient since many
operations on strings are of the form "get string data, do O(len)
operation on it", so the overhead of a pointer dereference to "get
string data" is tolerable.

 > If you use a compiler or interpreter designed to run C code with
 > extensive, very pedantic run-time checks, it *might* notice if the
 > array index used is greater than 0 and warn you.  But under pretty
 > much any normal compiler this should just work without any
 > complaint.

I guess you would consider clang "perverse" then.  I agree, but don't
care:  it saves about 40% of compilation time, and 10% of total build
time for me. :-)

It would also be possible to do dataflow analysis on loops and
determine that for non-degenerate termination values there would be an
access to elements after the declared end.  Ie,

    for (i=0; i < n; i++) foo(lispvec.data[i]);

will be flagged for n > 1.  Since there's no real point to a `for'
loop if n <= 1 always, issue a warning.  This would be stupid in C, of
course, because this idiom is very common (and typically used to
implement a structure that enables the program to check bounds etc at
runtime, as Emacs Lisp does).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-12  3:42   ` Stephen J. Turnbull
@ 2011-11-12  8:49     ` Eli Zaretskii
  2011-11-12 15:45       ` Stephen J. Turnbull
  2011-11-13  1:02     ` Ken Raeburn
  1 sibling, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-11-12  8:49 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: raeburn, qguo, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Date: Sat, 12 Nov 2011 12:42:42 +0900
> Cc: Qiang Guo <qguo@ualberta.ca>, emacs-devel@gnu.org
> 
> Aside to OP: In Emacs Lisp although strings (ie, arrays with bytes as
> elements and a non-uniform indexing scheme) are vectors in the API,
> they don't share an implementation with general vectors.  In fact the
> string storage is allocated separately (another optimization, because
> strings with len(data) % pointersize != 0 would waste substantial
> amounts of memory, at least by 1980 standards), so there's always a
> pointer indirection.  This usually is not too inefficient since many
> operations on strings are of the form "get string data, do O(len)
> operation on it", so the overhead of a pointer dereference to "get
> string data" is tolerable.

Having string data accessible through indirection allows to relocate
strings without affecting callers who have references to Lisp strings
in local variables.  (This is similar to the ability to relocate
buffer text, because buffer text is also accessed through
indirection.)  These abilities are important, because strings, like
buffers, can be large, so being able to relocate them during GC helps
making more efficient use of the available memory.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-12  8:49     ` Eli Zaretskii
@ 2011-11-12 15:45       ` Stephen J. Turnbull
  2011-11-12 15:53         ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen J. Turnbull @ 2011-11-12 15:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: raeburn, qguo, emacs-devel

Eli Zaretskii writes:

 > Having string data accessible through indirection allows to relocate
 > strings without affecting callers who have references to Lisp strings
 > in local variables.  (This is similar to the ability to relocate
 > buffer text, because buffer text is also accessed through
 > indirection.)

The implementation is similar, but the rationale is not.  Buffers
cannot be allocated by the s[1] trick, because they could not grow
bigger than the allocated size (except by reallocating the buffer
metadata along with the final array, but then how would you deal with
fixing up references?)

 > These abilities are important, because strings, like buffers, can
 > be large, so being able to relocate them during GC helps making
 > more efficient use of the available memory.

I wonder if the problem is not more that there are so many of them
allocated and deallocated, leading to fragmentation, rather than the
size.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-12 15:45       ` Stephen J. Turnbull
@ 2011-11-12 15:53         ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2011-11-12 15:53 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: raeburn, qguo, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: raeburn@raeburn.org,
>     qguo@ualberta.ca,
>     emacs-devel@gnu.org
> Date: Sun, 13 Nov 2011 00:45:29 +0900
> 
> Eli Zaretskii writes:
> 
>  > Having string data accessible through indirection allows to relocate
>  > strings without affecting callers who have references to Lisp strings
>  > in local variables.  (This is similar to the ability to relocate
>  > buffer text, because buffer text is also accessed through
>  > indirection.)
> 
> The implementation is similar, but the rationale is not.  Buffers
> cannot be allocated by the s[1] trick, because they could not grow
> bigger than the allocated size

I wasn't implying they could.  I was just explaining why accessing
string through indirection is important on its own right, not just
because of some optimization.  That is very much tangential to the
main issue discussed in this thread, so apologies if it disrupted the
discussion.

>  > These abilities are important, because strings, like buffers, can
>  > be large, so being able to relocate them during GC helps making
>  > more efficient use of the available memory.
> 
> I wonder if the problem is not more that there are so many of them
> allocated and deallocated, leading to fragmentation, rather than the
> size.

Yes, that too.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Quesition about Lisp_Vector
  2011-11-12  3:42   ` Stephen J. Turnbull
  2011-11-12  8:49     ` Eli Zaretskii
@ 2011-11-13  1:02     ` Ken Raeburn
  1 sibling, 0 replies; 8+ messages in thread
From: Ken Raeburn @ 2011-11-13  1:02 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Qiang Guo, emacs-devel

On Nov 11, 2011, at 22:42, Stephen J. Turnbull wrote:
>> a long time, and it works on most if not all C compilers.
> 
> clang (llvm) warns about accesses with a constant index greater than
> zero, but it works.

Good point.  In terms of complaints, I was thinking more about the detection of run-time indexes greater than zero.  I've used one implementation that either interpreted C or compiled with extensive run-time checks (not that it matters which), that probably would've flagged the reference at run time for checking, but I don't recall for sure.  (It was a couple decades ago.)  I expect something like valgrind wouldn't have enough information to detect it.

> Ah, now that interests me.  If there actually is such space, most
> constant index accesses are to a[1], and in theory we could allocate
> array space to fill out the padding.  However, since the structure
> alignment requirement is usually no more than sizeof(pointer), which
> is the same as sizeof(Lisp_Object), and IIRC all of the Lisp_Vectors
> warned about are general (ie, the elements are Lisp_Objects), we
> probably don't buy anything.

Right, in practice in this case it's not likely to save us anything on most platforms.  But an implementation could have rules like "structures over 8 bytes get 16-byte alignment" (assuming 32-bit pointers), in which case there would be wasted space at the end.  On the other hand, the rules could also say that larger arrays get stricter alignment, which (in other use cases, probably not this one) could cause an attempt to utilize that extra space to instead introduce padding before the array.

The C99 spec for flexible array members is even written to give the implementation some flexibility regarding this; it can place the member as if it has some unspecified size chosen by the implementation, with the alignment and padding rules following from that choice.

> I wonder if there is any real loss to allocating two or three elements
> in the base structure (ie, how often vectors of length less than three
> or four are used).

Instrumenting the GC code would be an easy way to find out how many are live at any given GC point, assuming the size fields don't lie.

Ken

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-11-13  1:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-11  6:29 Quesition about Lisp_Vector Qiang Guo
2011-11-11  8:59 ` Andreas Schwab
2011-11-11  9:01 ` Ken Raeburn
2011-11-12  3:42   ` Stephen J. Turnbull
2011-11-12  8:49     ` Eli Zaretskii
2011-11-12 15:45       ` Stephen J. Turnbull
2011-11-12 15:53         ` Eli Zaretskii
2011-11-13  1:02     ` Ken Raeburn

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).