From: Lee Sau Dan
Newsgroups: gmane.emacs.help
Subject: Re: Reading portions of large files
Date: 20 Jan 2003 08:50:31 +0100
Organization: Rechenzentrum der Universitaet Freiburg, Germany
References: <5lbs2mdrxs.fsf@rum.cs.yale.edu>

>>>>> "Stefan" == "Stefan
Monnier" writes:

    Stefan> Since at least 1 bit of tag is needed, that means that to
    Stefan> get 31-bit integers we'd need to move the mark bit
    Stefan> somewhere else.  XEmacs decided to use 3-word cons cells
    Stefan> (and I know they're still regularly wondering whether it
    Stefan> was a good idea).  Another approach is to use a separate
    Stefan> mark-bit array.

I think the separate mark-bit array would be cleaner.  You don't need
to access the mark bits unless you're doing GC, so why let that bit
sit in the _main_ working set all the time?  Wouldn't a separate
mark-bit array also improve locality (important for caching)?

Then, in theory, the tag bits could also be kept separately, giving
the full 32 bits to integers (represented as machine-native words).
I think we only need 1 tag bit per word in the separate tag-bit
array.  Its function is to indicate whether the corresponding memory
word is an integer or not.  If it isn't, the remaining tag bits are
found in the word itself.  And integer arithmetic can certainly be
faster, since no tag bits would have to be stripped or re-added!

Would such an implementation be more efficient, or would it be worse?

    Stefan> Lots of trade offs, a fair bit of coding, even more
    Stefan> testing, ...  Anybody interested is welcome to try it
    Stefan> out.  My opinion is that maybe it would be nice, but since
    Stefan> the only application I'm aware of is "editing files
    Stefan> between 128MB and 1GB on 32-bit systems", I don't think
    Stefan> it's worth the trouble.

Yeah, I agree with you on this last point.  Text files larger than
128MB are simply weird.  And for binary files, a real hex editor (or
'xxd', which I just discovered), or simply 'dd', is a more
appropriate tool.

-- 
Lee Sau Dan    李守敦(Big5)    ~{@nJX6X~}(HZ)

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
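The side-table layout proposed above (a separate mark-bit array plus a separate one-bit-per-word "is this an integer?" array) can be sketched in C, the language Emacs's object representation is written in. This is only a minimal illustration under assumed names: `struct heap`, `heap_store_int`, `gc_mark` and the rest are hypothetical and are not Emacs internals. It also assumes non-integer objects are 8-byte aligned, so 3 low bits are free for the in-word tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not Emacs internals: a heap of machine words
   with two side bitmaps, each holding one bit per word. */
#define HEAP_WORDS 1024
#define BITMAP_BYTES ((HEAP_WORDS + 7) / 8)
#define TAG_MASK ((uintptr_t)7) /* assumes 8-byte-aligned objects */

struct heap {
  uintptr_t words[HEAP_WORDS];     /* the mutator's working set */
  uint8_t mark_bits[BITMAP_BYTES]; /* read and written only by the GC */
  uint8_t int_bits[BITMAP_BYTES];  /* 1 = word holds a full-width integer */
};

static int get_bit(const uint8_t *bits, size_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

static void set_bit(uint8_t *bits, size_t i, int value) {
  uint8_t mask = (uint8_t)(1u << (i % 8));
  if (value)
    bits[i / 8] |= mask;
  else
    bits[i / 8] &= (uint8_t)~mask;
}

static int heap_is_int(const struct heap *h, size_t i) {
  assert(i < HEAP_WORDS);
  return get_bit(h->int_bits, i);
}

/* Integers use every bit of the word; their type lives in int_bits. */
static void heap_store_int(struct heap *h, size_t i, intptr_t n) {
  assert(i < HEAP_WORDS);
  h->words[i] = (uintptr_t)n;
  set_bit(h->int_bits, i, 1);
}

static intptr_t heap_load_int(const struct heap *h, size_t i) {
  assert(heap_is_int(h, i));
  /* Implementation-defined in ISO C; GCC and Clang wrap modulo 2^N. */
  return (intptr_t)h->words[i];
}

/* Non-integers keep their remaining tag bits in the low bits of the word. */
static void heap_store_tagged(struct heap *h, size_t i, void *p, unsigned tag) {
  assert(i < HEAP_WORDS);
  assert(((uintptr_t)p & TAG_MASK) == 0 && tag <= TAG_MASK);
  h->words[i] = (uintptr_t)p | tag;
  set_bit(h->int_bits, i, 0);
}

static unsigned heap_tag(const struct heap *h, size_t i) {
  assert(!heap_is_int(h, i));
  return (unsigned)(h->words[i] & TAG_MASK);
}

static void *heap_ptr(const struct heap *h, size_t i) {
  assert(!heap_is_int(h, i));
  return (void *)(h->words[i] & ~TAG_MASK);
}

/* Marking never writes to h->words, only to the side bitmap. */
static void gc_mark(struct heap *h, size_t i) {
  assert(i < HEAP_WORDS);
  set_bit(h->mark_bits, i, 1);
}

static int gc_is_marked(const struct heap *h, size_t i) {
  assert(i < HEAP_WORDS);
  return get_bit(h->mark_bits, i);
}

static void gc_clear_marks(struct heap *h) {
  memset(h->mark_bits, 0, sizeof h->mark_bits);
}
```

In this sketch an integer such as `INTPTR_MAX` round-trips unchanged, and marking it leaves its word untouched. The trade-off Stefan mentions shows up here too: every type check reads `int_bits` in addition to the word itself, an extra memory access that an in-word tag avoids.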