From: Lee Sau Dan
Newsgroups: gmane.emacs.help
Subject: Re: Reading portions of large files
Date: 20 Jan 2003 08:50:30 +0100
Organization: Rechenzentrum der Universitaet Freiburg, Germany

>>>>> "Benjamin" == Benjamin Riefenstahl writes:

    >> Assuming all editing is within the
    >> first 2000 bytes (not tested):
    >>
    >> head -c2000 bigfile > header-to-be-edited
    >> tail -c+2001 bigfile > the-rest
    >> (edit header-to-be-edited, save)
    >> cat header-to-be-edited the-rest > new-big-file

    Benjamin> This assumes a) Unix, b) that you have the space and
    Benjamin> time ;-) to deal with the large temporary files.

(b) is assumed even if you use another method.  Most *text* editors
save a file by first writing a temp. copy of the new version and then
renaming the new version to the old name.  So, in case of a crash, you
don't lose everything: either the old version or the new version
should survive intact.  So, if you don't have the extra disk space,
you can't do the editing either.

Time?  It doesn't take much time to 'split' and 'cat'.  Moreover,
running the editor on smaller pieces does save time on loading and
saving the file fragments, and the editor doesn't need that much RAM
while editing them.

    Benjamin> If you can assume Unix, dd is a little better, I think.

Why not 'split'?

    Benjamin> I recently had success with using it for extracting and
    Benjamin> later re-inserting a bit in a large file.

That works only when the extracted and re-inserted blocks are of the
same size.  This is the case for hex editing, but not *text* editing.
If you're doing hex editing, you shouldn't be using a text editor in
the first place.  There are hex editors which don't need to load the
whole file into memory.

    Benjamin> Getting the options right is a bit of a pain,

No.  That is true only when you're using 'dd' for the first time.
After a few times, it's easy to remember what options to use.  Most of
the time, I only need "if=", "of=", "bs=", "skip=", "seek=" and
"count=".  These option names are quite easy to remember once you know
the basic principle that 'dd' works by transferring blocks of the
input file to the output file.

    Benjamin> but the main thing was getting the direction (extract
    Benjamin> and re-insert) right and using conv=notrunc for
    Benjamin> re-insertion.
    Benjamin> And then dd is oriented towards blocks of
    Benjamin> bytes, not lines, of course.

This is the downside.  For line-oriented operations, use 'head',
'tail', 'cat', 'sed', or even 'awk' and 'perl'.

    Benjamin> And you cannot change the size of the block to be
    Benjamin> edited, but then large files are usually binary files,
    Benjamin> where you don't want to change byte offsets anyway.

Then, find a hex editor.  *Text* editors are simply not the right tool
to edit huge *binary* files.  In theory, hex editors can be
implemented very efficiently using mmap().

-- 
Lee Sau Dan    李守敦(Big5)    ~{@nJX6X~}(HZ)

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
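
For reference, the extract-and-re-insert use of 'dd' discussed above
might look roughly like the sketch below.  It is only an illustration,
not anything posted in the thread: the file names (bigfile, chunk,
chunk.new), the block size and the offsets are all made up, and the
"edit" step is a stand-in that keeps the chunk length unchanged.

```shell
#!/bin/sh
# Sketch only: all names and numbers here are hypothetical.
set -e

bs=512     # block size in bytes
first=4    # index of the first block to extract (0-based)
count=2    # number of blocks to extract

# Build a throwaway 8 KiB file of zero bytes so the sketch is self-contained.
dd if=/dev/zero of=bigfile bs=$bs count=16 2>/dev/null

# Extract: skip= counts blocks on the INPUT side.
dd if=bigfile of=chunk bs=$bs skip=$first count=$count 2>/dev/null

# Stand-in for editing: replace every zero byte with 'x'.
# The length of the chunk must stay exactly the same.
tr '\0' 'x' < chunk > chunk.new

# Re-insert: seek= counts blocks on the OUTPUT side.  conv=notrunc
# stops dd from truncating bigfile after the blocks it has written.
dd if=chunk.new of=bigfile bs=$bs seek=$first count=$count conv=notrunc 2>/dev/null
```

Note the asymmetry: "skip=" applies when reading the chunk out and
"seek=" when writing it back.  Without conv=notrunc the second 'dd'
would cut bigfile off right after the re-inserted blocks.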