unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: Ken Raeburn <raeburn@raeburn.org>
Cc: Qiang Guo <qguo@ualberta.ca>, emacs-devel@gnu.org
Subject: Re: Quesition about Lisp_Vector
Date: Sat, 12 Nov 2011 12:42:42 +0900	[thread overview]
Message-ID: <87ty6aj9al.fsf@uwakimon.sk.tsukuba.ac.jp> (raw)
In-Reply-To: <553A1AC5-4002-4175-AD1F-CC596356E44C@raeburn.org>

Ken Raeburn writes:

 > This is a common idiom for handling a structure with a
 > variable-sized array at the end.

More precisely, "a concrete struct moduling a (abstract) structure
containing exactly one array whose size is variable at compile time".
It's always possible to move one array to the end of the struct, but
this only works well if there's just one.  (If there's more than one,
I would consider the cognitive load of figuring out which one to
optimize and then later remembering that decision too great to be
worth it.)

 > The latest spec for the C language has support built in for a
 > slightly different mechanism, but this scheme used above has been a
 > common way to get around the absence of variable-sized arrays in
 > structures in traditional C for

That is to say, it's an optimization hack.  It saves the space of one
pointer, and the time of one pointer dereference (plus possible
allocation overhead), versus a member that is a pointer to an array
allocated elsewhere.  The alternative of a fixed-size array would be
unacceptable in Lisp, but obviously in most cases you'd end up with a
huge amount of space wasted since the size distribution of arrays very
likely follows a power law, so the fixed size "should" be "big".

 > a long time, and it works on most if not all C compilers.

clang (llvm) warns about accesses with a constant index greater than
zero, but it works.

 > The "len-1" means we want an array with "len" elements in it, but
 > the first element is covered by "sizeof *p" because "struct
 > Lisp_Vector" includes one array element already.  If you use an
 > index greater than zero, the value referenced may be stored in
 > extra padding space at the end of the structure, if there is any,

Ah, now that interests me.  If there actually is such space, most
constant index accesses are to a[1], and in theory we could allocate
array space to fill out the padding.  However, since the structure
alignment requirement is usually no more than sizeof(pointer), which
is the same as sizeof(Lisp_Object), and IIRC all of the Lisp_Vectors
warned about are general (ie, the elements are Lisp_Objects), we
probably don't buy anything.

I wonder if there is any real loss to allocating two or three elements
in the base structure (ie, how often vectors of length less than three
or four are used).

Aside to OP: In Emacs Lisp although strings (ie, arrays with bytes as
elements and a non-uniform indexing scheme) are vectors in the API,
they don't share an implementation with general vectors.  In fact the
string storage is allocated separately (another optimization, because
strings with len(data) % pointersize != 0 would waste substantial
amounts of memory, at least by 1980 standards), so there's always a
pointer indirection.  This usually is not too inefficient since many
operations on strings are of the form "get string data, do O(len)
operation on it", so the overhead of a pointer dereference to "get
string data" is tolerable.

 > If you use a compiler or interpreter designed to run C code with
 > extensive, very pedantic run-time checks, it *might* notice if the
 > array index used is greater than 0 and warn you.  But under pretty
 > much any normal compiler this should just work without any
 > complaint.

I guess you would consider clang "perverse" then.  I agree, but don't
care:  it saves about 40% of compilation time, and 10% of total build
time for me. :-)

It would also be possible to do dataflow analysis on loops and
determine that for non-degenerate termination values there would be an
access to elements after the declared end.  Ie,

    for (i=0; i < n; i++) foo(lispvec.data[i]);

will be flagged for n > 1.  Since there's no real point to a `for'
loop if n <= 1 always, issue a warning.  This would be stupid in C, of
course, because this idiom is very common (and typically used to
implement a structure that enables the program to check bounds etc at
runtime, as Emacs Lisp does).




  reply	other threads:[~2011-11-12  3:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-11  6:29 Quesition about Lisp_Vector Qiang Guo
2011-11-11  8:59 ` Andreas Schwab
2011-11-11  9:01 ` Ken Raeburn
2011-11-12  3:42   ` Stephen J. Turnbull [this message]
2011-11-12  8:49     ` Eli Zaretskii
2011-11-12 15:45       ` Stephen J. Turnbull
2011-11-12 15:53         ` Eli Zaretskii
2011-11-13  1:02     ` Ken Raeburn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ty6aj9al.fsf@uwakimon.sk.tsukuba.ac.jp \
    --to=stephen@xemacs.org \
    --cc=emacs-devel@gnu.org \
    --cc=qguo@ualberta.ca \
    --cc=raeburn@raeburn.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).