From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Proposal: immediate strings Date: Thu, 24 May 2012 01:17:31 -0400 Message-ID: References: <4FBB51E7.6080601@yandex.ru> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: dough.gmane.org 1337836672 12660 80.91.229.3 (24 May 2012 05:17:52 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Thu, 24 May 2012 05:17:52 +0000 (UTC) Cc: emacs-devel@gnu.org To: Dmitry Antipov Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu May 24 07:17:51 2012 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SXQQn-0004GX-Pu for ged-emacs-devel@m.gmane.org; Thu, 24 May 2012 07:17:50 +0200 Original-Received: from localhost ([::1]:43622 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SXQQn-0001jU-0E for ged-emacs-devel@m.gmane.org; Thu, 24 May 2012 01:17:49 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:54331) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SXQQc-0001gU-QF for emacs-devel@gnu.org; Thu, 24 May 2012 01:17:47 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SXQQZ-0002sC-6N for emacs-devel@gnu.org; Thu, 24 May 2012 01:17:38 -0400 Original-Received: from ironport-out.teksavvy.com ([206.248.143.162]:18047) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SXQQZ-0002rT-2U for emacs-devel@gnu.org; Thu, 24 May 2012 01:17:35 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAG6Zu09MCpYd/2dsb2JhbABEtBGBCIIVAQEEAScvIwULCw4mEhQYDSSGJYF3BboJkEQDozOBWIMFgUM X-IronPort-AV: E=Sophos;i="4.75,637,1330923600"; d="scan'208";a="182156156" Original-Received: from 76-10-150-29.dsl.teksavvy.com (HELO pastel.home) ([76.10.150.29]) by ironport2-out.teksavvy.com with ESMTP/TLS/ADH-AES256-SHA; 24 May 2012 01:17:32 -0400 Original-Received: by pastel.home (Postfix, from userid 20848) id A438F58C01; Thu, 24 May 2012 01:17:31 -0400 (EDT) In-Reply-To: <4FBB51E7.6080601@yandex.ru> (Dmitry Antipov's message of "Tue, 22 May 2012 12:44:23 +0400") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 206.248.143.162 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:150626 Archived-At: > +#if __GNUC__ > 1 /* Any GCC still in use should support this. */ > +#define PACKED __attribute__((packed)) > +#else > +#define PACKED > +#endif Let's not use "packed" structures: too much trouble. > + if (nbytes >= STRING_IMM_MAX) > + /* Impossible immediate string. */ Actually, we can allocate all the pure strings as "immediate strings" (within the limits of imm.nbytes, of course), in which case we'd have strings with more bytes than STRING_IMM_MAX, so while this assertion may have been useful for debugging, we'll want to get rid of it. > + if (nbytes < STRING_IMM_MAX) > + /* Impossible normal string. */ > + abort (); Please use "eassert (nbytes >= STRING_IMM_MAX);" instead. > +#ifdef GC_STRING_STATS > + total_imm_strings = total_dat_strings = 0; > + total_imm_bytes = total_dat_bytes = 0; > + total_imm_intervals = total_dat_intervals = 0; > +#endif Have you looked at some of those stats? I'd be interested to know which proportion of the strings are "immediate and using intervals", v.s. "just slightly too big to be immediate, but not using intervals". > + /* Fill data with special pattern. Used by > + GC to find dead immediate strings. */ > + memset (s->u.imm.data, 0xff, STRING_IMM_MAX); Is that really 100% reliable? Why not use the `intervals' field with a 0xdeadbeef pointer instead? > +#define SDATA(string) (XSTRING (string)->u.imm.immbit ? \ > + (XSTRING (string)->u.imm.data + 0) : \ > + (XSTRING (string)->u.dat.data + 0)) IIUC the "+ 0" are unneeded here, because using a "..?..:.." already makes sure we have an rvalue. Same for SCHARS. > + This assumes that sizeof (EMACS_INT) is equal to sizeof (void * ). */ > +#define STRING_IMM_MAX (3 * sizeof (EMACS_INT) - 2) Miles already pointed out that this is not a valid assumption. > do { if (EQ (STR, empty_multibyte_string)) \ > (STR) = empty_unibyte_string; \ > - else XSTRING (STR)->size_byte = -1; } while (0) > + else if (XSTRING (STR)->u.imm.immbit) \ > + XSTRING (STR)->u.imm.size_byte = -1; \ > + else \ > + XSTRING (STR)->u.dat.size_byte = -1; } while (0) BTW, TAB in cc-mode will not only properly indent the code, but also nicely align the \ at the end of lines. ;-) > + union { > + > + /* GC mark bit and subtype bit are in IMM just by convention - when > + IMMBIT is 0, the DAT field is used except it's UNUSED field. */ > + > + struct { > + unsigned gcmarkbit : 1; > + unsigned immbit : 1; > + EMACS_INT size : 7; > + EMACS_INT size_byte : 7; > + unsigned char data[STRING_IMM_MAX]; > + } PACKED imm; > + > + struct { > + unsigned unused : 2; /* Do not access this placeholder. */ > + EMACS_INT size : BITS_PER_EMACS_INT - 1; > + EMACS_INT size_byte : BITS_PER_EMACS_INT - 1; > + unsigned char *data; > + } PACKED dat; > + > + } u; > }; Access to your `size' field is inefficient because it will straddle two words. We could move the gcmarkbit as Paul suggests (at the cost of having to check immbit before knowing where gcmarkbit is located), or maybe we can move both the mark bit and the immbit into the `intervals' fields (after all, we know those are aligned on a multiple of at least 4 on all architectures on which Emacs is known to run, so we have 2 bits free for (ab)use there). Stefan