From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: "Stephen J. Turnbull" Newsgroups: gmane.emacs.devel Subject: Re: Quesition about Lisp_Vector Date: Sat, 12 Nov 2011 12:42:42 +0900 Message-ID: <87ty6aj9al.fsf@uwakimon.sk.tsukuba.ac.jp> References: <553A1AC5-4002-4175-AD1F-CC596356E44C@raeburn.org> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 X-Trace: dough.gmane.org 1321069388 19602 80.91.229.12 (12 Nov 2011 03:43:08 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Sat, 12 Nov 2011 03:43:08 +0000 (UTC) Cc: Qiang Guo , emacs-devel@gnu.org To: Ken Raeburn Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 12 04:42:58 2011 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([140.186.70.17]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1RP4Ua-0006Xg-2k for ged-emacs-devel@m.gmane.org; Sat, 12 Nov 2011 04:42:56 +0100 Original-Received: from localhost ([::1]:38951 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RP4UY-0002BO-PG for ged-emacs-devel@m.gmane.org; Fri, 11 Nov 2011 22:42:54 -0500 Original-Received: from eggs.gnu.org ([140.186.70.92]:46979) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RP4UV-0002B5-Vs for emacs-devel@gnu.org; Fri, 11 Nov 2011 22:42:53 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RP4UU-0004Sa-R3 for emacs-devel@gnu.org; Fri, 11 Nov 2011 22:42:51 -0500 Original-Received: from mgmt2.sk.tsukuba.ac.jp ([130.158.97.224]:49659) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RP4UU-0004SQ-B2 for emacs-devel@gnu.org; Fri, 11 Nov 2011 22:42:50 -0500 Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp [130.158.99.156]) by mgmt2.sk.tsukuba.ac.jp (Postfix) with ESMTP id 823EE9706A3; Sat, 12 Nov 2011 12:42:42 +0900 (JST) Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000) id 6E7991A2772; Sat, 12 Nov 2011 12:42:42 +0900 (JST) In-Reply-To: <553A1AC5-4002-4175-AD1F-CC596356E44C@raeburn.org> X-Mailer: VM 8.2.0a1 under 21.5 (beta31) "ginger" 2dbefd79b3d3 XEmacs Lucid (x86_64-unknown-linux) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 130.158.97.224 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:145997 Archived-At: Ken Raeburn writes: > This is a common idiom for handling a structure with a > variable-sized array at the end. More precisely, "a concrete struct moduling a (abstract) structure containing exactly one array whose size is variable at compile time". It's always possible to move one array to the end of the struct, but this only works well if there's just one. (If there's more than one, I would consider the cognitive load of figuring out which one to optimize and then later remembering that decision too great to be worth it.) > The latest spec for the C language has support built in for a > slightly different mechanism, but this scheme used above has been a > common way to get around the absence of variable-sized arrays in > structures in traditional C for That is to say, it's an optimization hack. It saves the space of one pointer, and the time of one pointer dereference (plus possible allocation overhead), versus a member that is a pointer to an array allocated elsewhere. The alternative of a fixed-size array would be unacceptable in Lisp, but obviously in most cases you'd end up with a huge amount of space wasted since the size distribution of arrays very likely follows a power law, so the fixed size "should" be "big". > a long time, and it works on most if not all C compilers. clang (llvm) warns about accesses with a constant index greater than zero, but it works. > The "len-1" means we want an array with "len" elements in it, but > the first element is covered by "sizeof *p" because "struct > Lisp_Vector" includes one array element already. If you use an > index greater than zero, the value referenced may be stored in > extra padding space at the end of the structure, if there is any, Ah, now that interests me. If there actually is such space, most constant index accesses are to a[1], and in theory we could allocate array space to fill out the padding. However, since the structure alignment requirement is usually no more than sizeof(pointer), which is the same as sizeof(Lisp_Object), and IIRC all of the Lisp_Vectors warned about are general (ie, the elements are Lisp_Objects), we probably don't buy anything. I wonder if there is any real loss to allocating two or three elements in the base structure (ie, how often vectors of length less than three or four are used). Aside to OP: In Emacs Lisp although strings (ie, arrays with bytes as elements and a non-uniform indexing scheme) are vectors in the API, they don't share an implementation with general vectors. In fact the string storage is allocated separately (another optimization, because strings with len(data) % pointersize != 0 would waste substantial amounts of memory, at least by 1980 standards), so there's always a pointer indirection. This usually is not too inefficient since many operations on strings are of the form "get string data, do O(len) operation on it", so the overhead of a pointer dereference to "get string data" is tolerable. > If you use a compiler or interpreter designed to run C code with > extensive, very pedantic run-time checks, it *might* notice if the > array index used is greater than 0 and warn you. But under pretty > much any normal compiler this should just work without any > complaint. I guess you would consider clang "perverse" then. I agree, but don't care: it saves about 40% of compilation time, and 10% of total build time for me. :-) It would also be possible to do dataflow analysis on loops and determine that for non-degenerate termination values there would be an access to elements after the declared end. Ie, for (i=0; i < n; i++) foo(lispvec.data[i]); will be flagged for n > 1. Since there's no real point to a `for' loop if n <= 1 always, issue a warning. This would be stupid in C, of course, because this idiom is very common (and typically used to implement a structure that enables the program to check bounds etc at runtime, as Emacs Lisp does).