From: Stefan Monnier
Newsgroups: gmane.emacs.devel,gmane.emacs.help
Subject: Re: maximum buffer size exceeded
Date: Wed, 05 Sep 2007 11:00:43 -0400
To: storm@cua.dk (Kim F. Storm)
Cc: Eli Zaretskii, help-gnu-emacs@gnu.org, emacs-devel@gnu.org
References: <87ejhgdux0.fsf@lion.rapttech.com.au> <87veaqr5l2.fsf@kobe.laptop> <873axt1e6d.fsf@kfs-lx.testafd.dk>
In-Reply-To: <873axt1e6d.fsf@kfs-lx.testafd.dk> (Kim F. Storm's message of "Wed, 05 Sep 2007 14:37:14 +0200")

>> I think the current view in Emacs development is that 64-bit platforms
>> solve this problem so easily that its solution for 32-bit machines is
>> much less important than working on other Emacs features.

> Actually, I think a small trick could increase the buffer size to 1 GB
> on 32 bit machines at the cost of a little(?) wasted memory.

> [Note: Assuming USE_LSB_TAG is defined]

> Currently, we have the lowest 3 bits reserved for the Lisp Type,
> meaning that the largest positive Emacs integer is 2^28-1 (256MB).

> Now, consider if we reserve 4 bits for the Lisp Type, but
> in such a way the Lisp_Int == 0, while the other Lisp types
> are odd numbers 1,3,5,7,...

> In this setup, an integer can be recognized by looking at the lowest
> bit alone (== 0), while the other Lisp types are recognized using the
> current methods (looking at all 4 type bits).

> The only drawback I can see is that Lisp_Objects have to be allocated
> on 16 byte boundaries rather than the current 8 byte boundary, so a
> little space may be wasted (and maybe not...).

> I haven't tried this, but given that Lisp_Objects are usually accessed
> via suitable macros, it looks quite doable.

Increasing the alignment from 8 to 16 bytes may be a non-trivial problem:
1 - cons cells use 8 bytes right now, so you'd waste a lot of space for them.
2 - same for floats.
3 - in many places, we rely on malloc to align objects on a multiple of 8,
    so we'd have to use some other approach.

Numbers 1 and 2 can be solved by giving two tags each to cons cells and
floats, so they only need alignment on a multiple of 8.  Number 3 is more
work.  But this work may be the same as the one needed to allow us to use
USE_LSB_TAG everywhere (even on machines where malloc and static-vars do
not guarantee mult-of-8 alignment).

We currently have 7 different types (of the 8 possible tags we only use 7).
My own local Emacs build uses the trick you suggest but on the 3 bits of
tag, so I gave 2 tags to integers to allow them to grow up to 2^29
(i.e. max buffer size = 512MB).  That's a very simple change.

What you suggest would be to use 4 bits, i.e. 16 possible tags:
- 8 tags for integers (i.e. 8 tags left for the 6 other types)
- 2 tags for cons cells (6 tags left for the 5 other types)
- 2 tags for floats
- one tag each for the remaining 4 types (arrays, symbols, strings, misc).

One other problem: currently `misc' objects need 5 32-bit words, which
USE_LSB_TAG forces us to round up to 6 32-bit words, and symbols use 6
32-bit words.  So rounding up to mult-of-16 would round them both up to 8
32-bit words.  The two subtypes of misc which use up 5 words are markers
and overlays.  So with your rounding up, an overlay would use up 3*8=24
words (3 because there's the overlay object plus the two associated marker
objects) instead of 15 (without USE_LSB_TAG) or 18 (with it).

I had plans to try and squeeze `misc' objects down to 4 words (and hence
overlays down to 12 words), but this is a non-trivial change.
One possible approach is to replace the linked lists of overlays and markers
by arrays (managed just like buffer text: with a gap).

Another option is to remove the `symbol' and `string' tags and make symbols
and strings subtypes of `misc'.  Then we could keep 3 tag bits and give 4 of
the 8 tags to integers.  This would simplify the alloc.c code but would also
waste more memory (6 words for string objects) and slow down SYMBOLP and
STRINGP slightly.

Still, the fundamental problem remains the same: files larger than 256MB are
most likely not generated manually.  So they may very likely grow to more
than 4GB tomorrow.  Bumping the limit to 512MB or 1GB (or even 4GB for that
matter) is only going to help in some fraction of the cases.

I think a better approach to handle this problem is to create a special
package to visit arbitrarily large files, which would work by loading only
parts of the file at a time and doing manual "swapping".  This would not
work as smoothly, but then again manipulating 256MB files in Emacs is
currently not that smooth either.


        Stefan


PS: You can supposedly open >4GB files in Emacs on 64-bit systems, but
looking at the C code, it's clear that you'll bump into bugs where we cast
EMACS_INT values to and from `int' (which on many 64-bit systems is only
32 bits wide).  I tend to fix those bugs when I bump into them, but they're
everywhere and I've fixed only a tiny fraction of them.
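
For concreteness, the "two tags for integers on 3 tag bits" variant
mentioned above can be sketched roughly as follows.  This is only an
illustration under stated assumptions, not the code from lisp.h: it models
a 32-bit Lisp word as a uint32_t, it assumes the two integer tags are 0 and
4 (so that an integer is any word whose low 2 bits are zero), and every
name in it (lisp_word, max_int, is_int, make_int, int_value, make_tagged)
is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit Lisp word with LSB tagging.
   All names here are illustrative; they are not the ones in lisp.h.  */
typedef uint32_t lisp_word;

enum {
  TAG_BITS = 3,      /* full tag width, used by the non-integer types */
  INT_TAG_BITS = 2   /* assumed: integers own tags 0 and 4 */
};

/* Largest integer when INT_TAG_BITS low bits are lost to the tag:
   one more bit goes to the sign, leaving 31 - int_tag_bits value bits.  */
static inline int32_t
max_int (int int_tag_bits)
{
  return (int32_t) ((UINT32_C (1) << (31 - int_tag_bits)) - 1);
}

/* An integer is any word whose low INT_TAG_BITS bits are zero.  */
static inline int
is_int (lisp_word w)
{
  return (w & ((UINT32_C (1) << INT_TAG_BITS) - 1)) == 0;
}

static inline lisp_word
make_int (int32_t n)
{
  assert (n >= -max_int (INT_TAG_BITS) - 1 && n <= max_int (INT_TAG_BITS));
  return (lisp_word) ((lisp_word) n << INT_TAG_BITS);
}

static inline int32_t
int_value (lisp_word w)
{
  uint32_t v = w >> INT_TAG_BITS;                       /* zero-extended */
  uint32_t sign = UINT32_C (1) << (31 - INT_TAG_BITS);  /* sign bit of v */
  return (int32_t) (v ^ sign) - (int32_t) sign;         /* sign-extend */
}

/* Non-integer words keep a full TAG_BITS-wide tag (assumed to be one of
   1, 2, 3, 5, 6, 7) on top of an 8-byte-aligned address.  */
static inline lisp_word
make_tagged (uint32_t aligned_addr, unsigned tag)
{
  assert ((aligned_addr & ((UINT32_C (1) << TAG_BITS) - 1)) == 0);
  assert ((tag & ((1u << INT_TAG_BITS) - 1)) != 0 && tag < (1u << TAG_BITS));
  return aligned_addr | tag;
}
```

With this model, max_int (3), max_int (2) and max_int (1) give the three
limits discussed in this thread: 2^28-1 (256MB) for the current scheme,
2^29-1 (512MB) with two integer tags, and 2^30-1 (1GB) when a single low
bit identifies integers.  The non-integer types still need all 3 tag bits,
which is why their objects must stay aligned on a multiple of 8.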