* Reading portions of large files
From: Gerald.Jean @ 2003-01-09 15:45 UTC

Hello,

I have very large files, sometimes over 1G, from which I would like to edit
very small portions, the headers or trailers for example. Emacs won't open
those files; it complains about them being too big. Is it possible to
edit, and save back after editing, only small portions of such files?

Thanks,

Gérald Jean
Consulting Analyst (Statistics), Actuarial
telephone: (418) 835-4900 ext. (7639)
fax: (418) 835-6657
e-mail: gerald.jean@spgdag.ca

"In God we trust all others must bring data" W. Edwards Deming
* Re: Reading portions of large files
From: David Kastrup @ 2003-01-09 18:20 UTC

Gerald.Jean@spgdag.ca writes:

> Hello,
>
> I have very large files, sometimes over 1G, from which I would like to edit
> very small portions, the headers or trailers for example. Emacs won't open
> those files, it complains about them being too big. Is it possible to
> edit, and save back after editing, only small portions of such files.

insert-file-contents is a built-in function.

(insert-file-contents FILENAME &optional VISIT BEG END REPLACE)

Insert contents of file FILENAME after point.
Returns list of absolute file name and number of bytes inserted.
If second argument VISIT is non-nil, the buffer's visited filename
and last save file modtime are set, and it is marked unmodified.
If visiting and the file does not exist, visiting is completed
before the error is signaled.
The optional third and fourth arguments BEG and END
specify what portion of the file to insert.
[...]

As to writing? No idea at the moment.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Reading portions of large files
From: Eli Zaretskii @ 2003-01-10 19:21 UTC

> From: David Kastrup <dak@gnu.org>
> Newsgroups: gnu.emacs.help
> Date: 09 Jan 2003 19:20:06 +0100
>
> > I have very large files, sometimes over 1G, from which I would like to edit
> > very small portions, the headers or trailers for example. Emacs won't open
> > those files, it complains about them being too big. Is it possible to
> > edit, and save back after editing, only small portions of such files.
>
> insert-file-contents is a built-in function.
> (insert-file-contents FILENAME &optional VISIT BEG END REPLACE)

I don't think this will help the OP, since BEG and END need to be
representable as Lisp integers, so they still are subject to the same
128-MB limit.
* Re: Reading portions of large files
From: David Kastrup @ 2003-01-10 20:51 UTC

"Eli Zaretskii" <eliz@is.elta.co.il> writes:

> > From: David Kastrup <dak@gnu.org>
> > Newsgroups: gnu.emacs.help
> > Date: 09 Jan 2003 19:20:06 +0100
> >
> > > I have very large files, sometimes over 1G, from which I would
> > > like to edit very small portions, the headers or trailers for
> > > example. Emacs won't open those files, it complains about them
> > > being too big. Is it possible to edit, and save back after
> > > editing, only small portions of such files.
> >
> > insert-file-contents is a built-in function.
> > (insert-file-contents FILENAME &optional VISIT BEG END REPLACE)
>
> I don't think this will help the OP, since BEG and END need to be
> representable as Lisp integers, so they still are subject to the same
> 128-MB limit.

Oops, I forgot. In that case it would probably be best to run dd from
or to pipes with appropriate options for writing and reading pieces
from a big file.

BTW, would it be terribly complicated to extend the range of Lisp
integers to 31bit? Integers don't need any garbage collection or tag
bits per se. One could still use, say, the upper byte (or a smaller
unit) as a tag byte, only that the first or last 128 values would all
signify "integer".

Emacs has a most-positive-fixnum of 134217727, while XEmacs has
1073741823, more than 8 times as much. So it would appear to be
possible in theory.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
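[Editorial sketch, not part of the original message.] The two
most-positive-fixnum values quoted in this message are 2^27 - 1 and
2^30 - 1, which is where the "128-MB" figure comes from if buffer
positions must fit in a fixnum. The shell arithmetic below only checks
those numbers; the variable names are made up for illustration.

```shell
#!/bin/sh
# Largest value of a signed integer with 28 and with 31 value bits.
emacs_max=$(( (1 << 27) - 1 ))
xemacs_max=$(( (1 << 30) - 1 ))
echo "Emacs  most-positive-fixnum: $emacs_max"
echo "XEmacs most-positive-fixnum: $xemacs_max"

# The same limits expressed in MiB (1 MiB = 1024 * 1024 bytes).
emacs_mib=$(( (emacs_max + 1) / 1024 / 1024 ))
xemacs_mib=$(( (xemacs_max + 1) / 1024 / 1024 ))
echo "Emacs  limit: $emacs_mib MiB"
echo "XEmacs limit: $xemacs_mib MiB"
```

The three extra value bits account for the factor of eight between the
two limits.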
* Re: Reading portions of large files
From: Eli Zaretskii @ 2003-01-11 8:51 UTC

> From: David Kastrup <dak@gnu.org>
> Newsgroups: gnu.emacs.help
> Date: 10 Jan 2003 21:51:49 +0100
>
> BTW, would it be terribly complicated to extend the range of Lisp
> integers to 31bit?

It's not terribly hard, but IIRC the current consensus among the Emacs
maintainers is that it's not important enough to do that because
before long all machines will have 64-bit compilers.

Perhaps this should be discussed again on the developers' list.

> Integers don't need any garbage collection or tag bits per se.

They need to be distinguishable from other Lisp types, so their tag
bitfield cannot have an arbitrary bit pattern.

> Emacs has a most-positive-fixnum of 134217727, while XEmacs has
> 1073741823, more than 8 times as much. So it would appear to be
> possible in theory.

IIRC, the XEmacs way requires extensive changes in how Emacs works,
but I don't remember the details.
* Re: Reading portions of large files
From: David Kastrup @ 2003-01-11 10:42 UTC

"Eli Zaretskii" <eliz@is.elta.co.il> writes:

> > From: David Kastrup <dak@gnu.org>
> > Newsgroups: gnu.emacs.help
> > Date: 10 Jan 2003 21:51:49 +0100
> >
> > BTW, would it be terribly complicated to extend the range of Lisp
> > integers to 31bit?
>
> It's not terribly hard, but IIRC the current consensus among the Emacs
> maintainers is that it's not important enough to do that because
> before long all machines will have 64-bit compilers.
>
> Perhaps this should be discussed again on the developers' list.
>
> > Integers don't need any garbage collection or tag bits per se.
>
> They need to be distinguishable from other Lisp types, so their tag
> bitfield cannot have an arbitrary bit pattern.

Yes, but a single bit is sufficient for that distinction. This could
even speed up operations, since the sign bit is a candidate that can
be rather quickly checked. Something like
  if (x < 0)
will establish that something is an integer,
  (x + 0x40000000)
will yield the value of the integer, and
  (x | 0x80000000)
will convert an integer back to a Lisp number. I don't know whether an
integer Lisp object needs to be identical to an integer. If it does,
then the above needs an offset of 0x40000000 everywhere, of course.

> > Emacs has a most-positive-fixnum of 134217727, while XEmacs has
> > 1073741823, more than 8 times as much. So it would appear to be
> > possible in theory.
>
> IIRC, the XEmacs way requires extensive changes in how Emacs works,
> but I don't remember the details.

No clue about that.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Reading portions of large files
From: Stefan Monnier <foo@acm.com> @ 2003-01-12 20:38 UTC

> BTW, would it be terribly complicated to extend the range of Lisp
> integers to 31bit?

Currently a cons cell takes 2 words. Each word has 3 tag bits and
1 mark bit. When marking a cons cell, the GC sets the mark bit of
the first word of the cell. The mark bit of the second word is unused
(i.e. wasted).

Since at least 1 bit of tag is needed, that means that to get 31bit
integers we'd need to move the mark bit somewhere else. XEmacs decided
to use 3-word cons cells (and I know they're still regularly wondering
whether it was a good idea). Another approach is to use a separate
mark-bit array.

Lots of trade-offs, a fair bit of coding, even more testing, ...
Anybody interested is welcome to try it out.

My opinion is that maybe it would be nice, but since the only
application I'm aware of is "editing files between 128MB and 1GB on
32bit systems", I don't think it's worth the trouble.

Stefan
* Re: Reading portions of large files
From: Miles Bader @ 2003-01-13 7:40 UTC

"Stefan Monnier <foo@acm.com>" <monnier+gnu.emacs.help/news/@flint.cs.yale.edu> writes:

> Since at least 1 bit of tag is needed, that means that to get 31bit
> integers we'd need to move the mark bit somewhere else.

Hmmm? I thought only boxed objects had to have a mark bit, in which case
integers don't need one. [Indeed, looking at the current garbage
collector, it doesn't seem to mark integers]

I'd also like to have low-bit tags so I can stack-allocate lisp objects...

-Miles
--
Is it true that nothing can be known? If so how do we know this? -Woody Allen
* Re: Reading portions of large files
From: Miles Bader @ 2003-01-13 7:42 UTC

Miles Bader <miles@gnu.org> writes:

> Hmmm? I thought only boxed objects had to have a mark bit, in which case
> integers don't need one. [Indeed, looking at the current garbage
> collector, it doesn't seem to mark integers]

Oh wait, I was confused, it does need a mark-bit for cons cells...

Sorry for the noise...

-miles
--
Freedom's just another word, for nothing left to lose --Janis Joplin
* Re: Reading portions of large files
From: David Kastrup @ 2003-01-13 7:55 UTC

Miles Bader <miles@gnu.org> writes:

> Miles Bader <miles@gnu.org> writes:
> > Hmmm? I thought only boxed objects had to have a mark bit, in which case
> > integers don't need one. [Indeed, looking at the current garbage
> > collector, it doesn't seem to mark integers]
>
> Oh wait, I was confused, it does need a mark-bit for cons cells...
>
> Sorry for the noise...

Cons cells are not integers. Care to explain for somebody dull?

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Reading portions of large files
From: Miles Bader @ 2003-01-13 8:05 UTC

David Kastrup <dak@gnu.org> writes:

> Cons cells are not integers. Care to explain for somebody dull?

Cons cells don't have a header, so they need to use the mark-bit of one
of their components, meaning that anything you can store into a
cons-cell needs a mark-bit.

I wonder how feasible it would be to use another sort of GC, like
stop-and-copy, which doesn't need mark-bits...

-Miles
--
"1971 pickup truck; will trade for guns"
* Re: Reading portions of large files
From: Lee Sau Dan @ 2003-01-20 7:50 UTC

>>>>> "Stefan" == "Stefan Monnier <foo@acm.com>" <monnier+gnu.emacs.help/news/@flint.cs.yale.edu> writes:

    Stefan> Since at least 1 bit of tag is needed, that means that to
    Stefan> get 31bit integers we'd need to move the mark bit
    Stefan> somewhere else. XEmacs decided to use 3-word cons cells
    Stefan> (and I know they're still regularly wondering whether it
    Stefan> was a good idea). Another approach is to use a separate
    Stefan> mark-bit array.

I think the separate mark-bit array would be cleaner. You don't need
to access the mark bits unless you're doing gc. Why let that bit
stick there in the _main_ working set all the time? Wouldn't a
separate mark-bit array also improve locality (important for caching)?

Then, in theory, the tag bits can also be kept separately, giving the
full 32 bits to integers (represented as machine-native words). I
think we only need 1 tag bit in the separate tag-bit array. Its
function is to indicate whether the corresponding memory word is an
integer or not. If not, then the remaining tag bits are found in the
word itself. And integer arithmetic can certainly be faster!

Would this implementation be more efficient or worse?

    Stefan> Lots of trade-offs, a fair bit of coding, even more
    Stefan> testing, ... Anybody interested is welcome to try it
    Stefan> out. My opinion is that maybe it would be nice, but since
    Stefan> the only application I'm aware of is "editing files
    Stefan> between 128MB and 1GB on 32bit systems", I don't think
    Stefan> it's worth the trouble.

Yeah. I share this last point with you. >128MB text files are
simply weird. And for binary files, a real hex editor (or 'xxd',
which I just discovered) is a more appropriate tool, or just 'dd'.

--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
* Re: Reading portions of large files
From: Mac @ 2003-01-24 7:55 UTC

On 20 Jan 2003, Lee Sau Dan wrote:

>
>     Stefan> Lots of trade-offs, a fair bit of coding, even more
>     Stefan> testing, ... Anybody interested is welcome to try it
>     Stefan> out. My opinion is that maybe it would be nice, but
>     Stefan> since the only application I'm aware of is "editing
>     Stefan> files between 128MB and 1GB on 32bit systems", I don't
>     Stefan> think it's worth the trouble.
>
> Yeah. I share this last point with you. >128MB text files are
> simply weird. And for binary files, a real hex editor (or 'xxd',
> which I just discovered) is a more appropriate tool, or just 'dd'.

Well, it is a weird world.

When working with hardware development, file sizes over 128MB are very
common (netlists, sdf-files, logfiles...), although what you do with
these huge files is limited. It's mainly search and replace (occur,
query-replace-regexp etc).

/mac
* Re: Reading portions of large files
From: Stefan Monnier <foo@acm.com> @ 2003-01-27 14:44 UTC

> think we only need 1 tag bit in the separate tag-bit array. Its
> function is to indicate whether the corresponding memory word is an
> integer or not. If not, then the remaining tag bits are found in the
> word itself. And integer arithmetic can certainly be faster!

Integer arithmetic performance is a complete non-issue in Emacs
(and most other tagged programming languages, as a matter of fact).

Stefan
* Re: Reading portions of large files
From: Eric Pement @ 2003-01-10 16:27 UTC

Gerald.Jean@spgdag.ca wrote in message news:<mailman.100.1042135372.21513.help-gnu-emacs@gnu.org>...

> Hello,
>
> I have very large files, sometimes over 1G, from which I would like to
> edit
> very small portions, the headers or trailers for example. Emacs won't
> open
> those files, it complains about them being too big.

The Emacs FAQ says that Emacs 20 and above can be compiled "on some
64-bit systems" to handle files of up to 550 million Gigabytes. However,
it looks a bit dated and it would be handier if this section of the GNU
Emacs FAQ were brought more up-to-date.

If you use Windows editors, Vedit (http://www.vedit.com) will edit
files of up to 2 Gigs in size, though it may take some time to load
files of this size.

And for just over-the-top accommodation, PDT-Windows claims to handle
filesizes of up to 18 Exabytes (that's 18 billion Gigs)! I wonder if
that is larger than the aggregate storage of all disks on the Internet?
On a more realistic plane, their website says they have "easily edited
files of 3 - 5 gigabytes". I've downloaded the eval version, and this
editor is intended for editing large databases or binary files. It
won't work well for plaintext or concatenated.tar program code. If
you're interested, the URL is

  http://www.pro-central.com/pdt_win.htm

HTH.

--
Eric Pement
* Re: Reading portions of large files
From: Brendan Halpin @ 2003-01-10 17:16 UTC

Gerald.Jean@spgdag.ca writes:

> I have very large files, sometimes over 1G, from which I would like to edit
> very small portions, the headers or trailers for example. Emacs won't open
> those files, it complains about them being too big. Is it possible to
> edit, and save back after editing, only small portions of such files.

Use head and tail to split the file into the header-to-be-edited and
the-rest. Edit the header-to-be-edited in emacs, save, then
concatenate the-rest onto it.

Assuming all editing is within the first 2000 bytes (not tested):

  head -c2000 bigfile > header-to-be-edited
  tail -c+2001 bigfile > the-rest
  (edit header-to-be-edited, save)
  cat header-to-be-edited the-rest > new-big-file

Even if the file is not too big to fit in Emacs, this should be faster
for very big files where the editing is all in a small section.

Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-390476; Room F2-025 x 3147
<mailto:brendan.halpin@ul.ie> <http://wivenhoe.staff8.ul.ie/~brendan>
* Re: Reading portions of large files
From: Benjamin Riefenstahl @ 2003-01-10 20:35 UTC

Brendan Halpin <brendan.halpin@ul.ie> writes:

> Use head and tail to split the file into the header-to-be-edited and
> the-rest. Edit the header-to-be-edited in emacs, save, then
> concatenate the-rest onto it.
>
> Assuming all editing is within the first 2000 bytes (not tested):
>
>   head -c2000 bigfile > header-to-be-edited
>   tail -c+2001 bigfile > the-rest
>   (edit header-to-be-edited, save)
>   cat header-to-be-edited the-rest > new-big-file

This assumes a) Unix, b) that you have the space and time ;-) to deal
with the large temporary files.

If you can assume Unix, dd is a little better, I think. I recently
had success with using it for extracting and later re-inserting a bit
in a large file. Getting the options right is a bit of a pain, but
the main thing was getting the direction (extract and re-insert) right
and using conv=notrunc for re-insertion. And then dd is oriented
towards blocks of bytes, not lines, of course. And you cannot change
the size of the block to be edited, but then large files are usually
binary files, where you don't want to change byte offsets anyway.

so long, benny
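[Editorial sketch, not part of the original message.] The extract and
re-insert steps described above might look roughly like this. The file
names, the 16-byte header size, and the tiny stand-in file are all
assumptions made so the example can run anywhere; on a real file only
the two dd lines matter, and the edited piece must keep its exact
length.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
big="$work/bigfile"      # hypothetical stand-in for the large file
piece="$work/header"     # hypothetical name for the extracted piece

# Build the stand-in: a 16-byte header followed by some payload.
printf 'HEADER-OLD-0001\n' > "$big"
printf 'payload line 1\npayload line 2\n' >> "$big"
size_before=$(wc -c < "$big")

# Extract: copy only the first 16-byte block out of the big file.
dd if="$big" of="$piece" bs=16 count=1 2>/dev/null

# Edit the piece (done non-interactively here); still exactly 16 bytes.
printf 'HEADER-NEW-0002\n' > "$piece"

# Re-insert: overwrite the first block in place. conv=notrunc stops dd
# from truncating everything after the block it writes.
dd if="$piece" of="$big" bs=16 count=1 conv=notrunc 2>/dev/null

size_after=$(wc -c < "$big")
first_line=$(head -n 1 "$big")
last_line=$(tail -n 1 "$big")
echo "first line now: $first_line"
```

Only the extracted block is ever copied, so the cost does not depend
on the size of the big file.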
* Re: Reading portions of large files
From: Klaus Berndl @ 2003-01-11 10:25 UTC

On 10 Jan 2003, Benjamin Riefenstahl wrote:

> Brendan Halpin <brendan.halpin@ul.ie> writes:
> > Use head and tail to split the file into the header-to-be-edited and
> > the-rest. Edit the header-to-be-edited in emacs, save, then
> > concatenate the-rest onto it.
> >
> > Assuming all editing is within the first 2000 bytes (not tested):
> >
> >   head -c2000 bigfile > header-to-be-edited
> >   tail -c+2001 bigfile > the-rest
> >   (edit header-to-be-edited, save)
> >   cat header-to-be-edited the-rest > new-big-file
>
> This assumes a) Unix, b) that you have the space and time ;-) to deal
> with the large temporary files.

Assumption a) is not necessary or correct because there is the cygwin-suite
for Windows available - IMHO a must for using Emacs on Windows-systems ;-)
Cygwin contains tail and head!

Klaus

--
Klaus Berndl mailto: klaus.berndl@sdm.de
sd&m AG http://www.sdm.de
software design & management
Thomas-Dehler-Str. 27, 81737 München, Germany
Tel +49 89 63812-392, Fax -220
* Re: Reading portions of large files
From: Lee Sau Dan @ 2003-01-20 7:50 UTC

>>>>> "Benjamin" == Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

    >> Assuming all editing is within the first 2000 bytes (not
    >> tested):
    >>
    >> head -c2000 bigfile > header-to-be-edited
    >> tail -c+2001 bigfile > the-rest
    >> (edit header-to-be-edited, save)
    >> cat header-to-be-edited the-rest > new-big-file

    Benjamin> This assumes a) Unix, b) that you have the space and
    Benjamin> time ;-) to deal with the large temporary files.

(b) is assumed even if you use another method. Most *text* editors
would save files by first writing a temp. copy of the new version,
followed by renaming the new version to the old name. So, in case of
a crash, you don't lose everything. Either the old version or the new
version should survive intact. So, if you didn't have the extra disk
space, you can't do the editing either.

Time? It doesn't take much time to 'split' and 'cat'. Moreover,
running the editor on smaller pieces does save time on loading and
saving the file fragments. Moreover, the editor doesn't need that
much RAM when editing the file.

    Benjamin> If you can assume Unix, dd is a little better, I think.

Why not 'split'?

    Benjamin> I recently had success with using it for extracting and
    Benjamin> later re-inserting a bit in a large file.

Only when the extracted and re-inserted blocks are of the same size.
This is the case for hex editing, but not *text* editing. If you're
doing hex editing, you shouldn't be using a text editor in the first
place. There are hex editors which don't need to load the whole file
into memory.

    Benjamin> Getting the options right is a bit of a pain,

No. That is true only when you're using 'dd' for the first time.
After a few times, it's easy to remember what options to use. Most of
the time, I only need "if=", "of=", "bs=", "skip=", "seek=" and
"count=". These option names are quite easy to remember once you know
the basic principle that 'dd' works by transferring blocks of the
input file to the output file.

    Benjamin> but the main thing was getting the direction (extract
    Benjamin> and re-insert) right and using conv=notrunc for
    Benjamin> re-insertion. And then dd is oriented towards blocks of
    Benjamin> bytes, not lines, of course.

This is the down side. For line-oriented operations, use 'head',
'tail', 'cat', 'sed', or even 'awk' and 'perl'.

    Benjamin> And you cannot change the size of the block to be
    Benjamin> edited, but then large files are usually binary files,
    Benjamin> where you don't want to change byte offsets anyway.

Then, find a hex editor. *Text* editors are simply not the right tool
to edit huge *binary* files. In theory, hex editors can be
implemented very efficiently using mmap().

--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
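[Editorial sketch, not part of the original message.] To illustrate
how the "skip=" and "seek=" options named above address a block in the
middle or at the end of a file, here is a small example that rewrites
a fixed-size trailer. The file names, the 8-byte block size, and the
assumption that the file length is an exact multiple of the block size
are all invented for the demonstration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
big="$work/bigfile"      # hypothetical stand-in for the large file
piece="$work/trailer"    # hypothetical name for the extracted piece

# Stand-in file: three 8-byte records, the last one being the trailer.
printf 'record1\nrecord2\nTRAILER\n' > "$big"

bs=8                                   # block size in bytes
blocks=$(( $(wc -c < "$big") / bs ))   # number of whole blocks
last=$(( blocks - 1 ))                 # zero-based index of final block

# skip= is counted in INPUT blocks: pass over them, then read one.
dd if="$big" of="$piece" bs="$bs" skip="$last" count=1 2>/dev/null

# Replace the trailer with another record of exactly the same size.
printf 'trailer\n' > "$piece"

# seek= is counted in OUTPUT blocks: pass over them, then write one.
dd if="$piece" of="$big" bs="$bs" seek="$last" count=1 conv=notrunc 2>/dev/null

result=$(cat "$big")
echo "$result"
```

A larger bs with correspondingly smaller skip/seek counts does the same
job with fewer system calls, as long as the offsets stay multiples of
the block size.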
* Re: Reading portions of large files
From: Benjamin Riefenstahl @ 2003-01-20 12:46 UTC

Hi,

> [attribution cut off]
>     >> head -c2000 bigfile > header-to-be-edited
>     >> tail -c+2001 bigfile > the-rest
>     >> [...]

> >>>>> "Benjamin" == Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
> writes:
>     Benjamin> This assumes a) Unix, b) that you have the space and
>     Benjamin> time ;-) to deal with the large temporary files.

Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:
> (b) is assumed even if you use another method.

With something like the dd method I don't ever have to copy the whole
file. Makes a difference when your file is a CD image of 600 MB and
all you want to do is patch the partition table.

> Time? It doesn't take much time to 'split' and 'cat'.

It takes several minutes on my machine with the mentioned file.

> Why not 'split'?

I didn't think of that one before. But it also copies the whole file.

> There are hex editors which don't need to load the whole file into
> memory.

I'm not aware of a commonly used hex editor on Unix. Do you have a
recommendation?

so long, benny
* Re: Reading portions of large files
From: Lee Sau Dan @ 2003-01-20 7:50 UTC

>>>>> "Brendan" == Brendan Halpin <brendan.halpin@ul.ie> writes:

    Brendan> Use head and tail to split the file into the
    Brendan> header-to-be-edited and the-rest. Edit the
    Brendan> header-to-be-edited in emacs, save, then concatenate
    Brendan> the-rest onto it.

    Brendan> Assuming all editing is within the first 2000 bytes (not
    Brendan> tested):

    Brendan> head -c2000 bigfile > header-to-be-edited
    Brendan> tail -c+2001 bigfile > the-rest
    Brendan> (edit header-to-be-edited, save)
    Brendan> cat header-to-be-edited the-rest > new-big-file

Why not use 'split'? :)

--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee