* Reading portions of large files
@ 2003-01-09 15:45 Gerald.Jean
0 siblings, 0 replies; 21+ messages in thread
From: Gerald.Jean @ 2003-01-09 15:45 UTC
Hello,
I have very large files, sometimes over 1G, from which I would like to edit
very small portions, the headers or trailers for example. Emacs won't open
those files, it complains about them being too big. Is it possible to
edit, and save back after editing, only small portions of such files?
Thanks,
Gérald Jean
Consulting Analyst (Statistics), Actuarial Department
Telephone: (418) 835-4900 ext. 7639
Fax: (418) 835-6657
E-mail: gerald.jean@spgdag.ca
"In God we trust; all others must bring data." W. Edwards Deming
* Re: Reading portions of large files
[not found] <mailman.100.1042135372.21513.help-gnu-emacs@gnu.org>
@ 2003-01-09 18:20 ` David Kastrup
2003-01-10 19:21 ` Eli Zaretskii
[not found] ` <mailman.153.1042230313.21513.help-gnu-emacs@gnu.org>
2003-01-10 16:27 ` Eric Pement
2003-01-10 17:16 ` Brendan Halpin
2 siblings, 2 replies; 21+ messages in thread
From: David Kastrup @ 2003-01-09 18:20 UTC
Gerald.Jean@spgdag.ca writes:
> Hello,
>
> I have very large files, sometimes over 1G, from which I would like to edit
> very small portions, the headers or trailers for example. Emacs won't open
> those files, it complains about them being too big. Is it possible to
> edit, and save back after editing, only small portions of such files?
insert-file-contents is a built-in function.
(insert-file-contents FILENAME &optional VISIT BEG END REPLACE)
Insert contents of file FILENAME after point.
Returns list of absolute file name and number of bytes inserted.
If second argument VISIT is non-nil, the buffer's visited filename
and last save file modtime are set, and it is marked unmodified.
If visiting and the file does not exist, visiting is completed
before the error is signaled.
The optional third and fourth arguments BEG and END
specify what portion of the file to insert.
[...]
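Reading only bytes BEG to END, as the docstring describes, can also be sketched outside Emacs. Below is a minimal C sketch, assuming a POSIX system; read_file_slice is a hypothetical helper (not part of Emacs), and fseeko with _FILE_OFFSET_BITS=64 is used so that offsets are not capped at 2 GB on 32-bit machines. Memory use is proportional to END - BEG, not to the file size.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read bytes [beg, end) of PATH into a freshly malloc'd buffer.
 * Only that slice is read, never the whole file.  Returns NULL on
 * error; otherwise stores the number of bytes actually read in *nread
 * (fewer than end - beg if END lies past the end of the file). */
char *read_file_slice(const char *path, off_t beg, off_t end, size_t *nread)
{
    assert(beg >= 0 && end >= beg && nread != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t want = (size_t)(end - beg);
    char *buf = malloc(want > 0 ? want : 1);
    if (buf == NULL || fseeko(f, beg, SEEK_SET) != 0) {
        free(buf);
        fclose(f);
        return NULL;
    }
    *nread = fread(buf, 1, want, f);   /* short read near end of file */
    fclose(f);
    return buf;
}
```

For example, read_file_slice("bigfile", 0, 2000, &n) would fetch just a 2000-byte header; the caller frees the result.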
As to writing? No idea at the moment.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Reading portions of large files
[not found] <mailman.100.1042135372.21513.help-gnu-emacs@gnu.org>
2003-01-09 18:20 ` Reading portions of large files David Kastrup
@ 2003-01-10 16:27 ` Eric Pement
2003-01-10 17:16 ` Brendan Halpin
2 siblings, 0 replies; 21+ messages in thread
From: Eric Pement @ 2003-01-10 16:27 UTC
Gerald.Jean@spgdag.ca wrote in message news:<mailman.100.1042135372.21513.help-gnu-emacs@gnu.org>...
> Hello,
>
> I have very large files, sometimes over 1G, from which I would like to edit
> very small portions, the headers or trailers for example. Emacs won't open
> those files, it complains about them being too big.
The Emacs FAQ says that Emacs 20 and above can be compiled "on some
64-bit systems" to handle files of up to 550 million gigabytes. However,
that entry looks a bit dated, and it would be handier if this section of
the GNU Emacs FAQ were brought up to date.
If you use Windows editors, Vedit (http://www.vedit.com) will edit
files of up to 2 Gigs in size, though it may take some time to load
files of this size.
And for truly over-the-top capacity, PDT-Windows claims to
handle file sizes of up to 18 Exabytes (that's 18 billion Gigs)! I
wonder if that is larger than the aggregate storage of all disks on
the Internet? On a more realistic plane, their website says they
have "easily edited files of 3 - 5 gigabytes".
I've downloaded the eval version, and this editor is intended
for editing large databases or binary files. It won't work well for
plain text or concatenated .tar program code. If you're interested,
the URL is http://www.pro-central.com/pdt_win.htm
HTH.
--
Eric Pement
* Re: Reading portions of large files
[not found] <mailman.100.1042135372.21513.help-gnu-emacs@gnu.org>
2003-01-09 18:20 ` Reading portions of large files David Kastrup
2003-01-10 16:27 ` Eric Pement
@ 2003-01-10 17:16 ` Brendan Halpin
2003-01-10 20:35 ` Benjamin Riefenstahl
2003-01-20 7:50 ` Lee Sau Dan
2 siblings, 2 replies; 21+ messages in thread
From: Brendan Halpin @ 2003-01-10 17:16 UTC
Gerald.Jean@spgdag.ca writes:
> I have very large files, sometimes over 1G, from which I would like to edit
> very small portions, the headers or trailers for example. Emacs won't open
> those files, it complains about them being too big. Is it possible to
> edit, and save back after editing, only small portions of such files?
Use head and tail to split the file into the header-to-be-edited
and the-rest. Edit the header-to-be-edited in emacs, save, then
concatenate the-rest onto it.
Assuming all editing is within the first 2000 bytes (not tested):
head -c2000 bigfile > header-to-be-edited
tail -c+2001 bigfile > the-rest
(edit header-to-be-edited, save)
cat header-to-be-edited the-rest > new-big-file
Even if the file is not too big to fit in Emacs, this should be
faster for very big files where the editing is all in a small
section.
Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-390476; Room F2-025 x 3147
<mailto:brendan.halpin@ul.ie> <http://wivenhoe.staff8.ul.ie/~brendan>
* Re: Reading portions of large files
2003-01-09 18:20 ` Reading portions of large files David Kastrup
@ 2003-01-10 19:21 ` Eli Zaretskii
[not found] ` <mailman.153.1042230313.21513.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2003-01-10 19:21 UTC
> From: David Kastrup <dak@gnu.org>
> Newsgroups: gnu.emacs.help
> Date: 09 Jan 2003 19:20:06 +0100
>
> > I have very large files, sometimes over 1G, from which I would like to edit
> > very small portions, the headers or trailers for example. Emacs won't open
> > those files, it complains about them being too big. Is it possible to
> > edit, and save back after editing, only small portions of such files?
>
> insert-file-contents is a built-in function.
> (insert-file-contents FILENAME &optional VISIT BEG END REPLACE)
I don't think this will help the OP, since BEG and END need to be
representable as Lisp integers, so they still are subject to the same
128-MB limit.
* Re: Reading portions of large files
2003-01-10 17:16 ` Brendan Halpin
@ 2003-01-10 20:35 ` Benjamin Riefenstahl
2003-01-11 10:25 ` Klaus Berndl
2003-01-20 7:50 ` Lee Sau Dan
2003-01-20 7:50 ` Lee Sau Dan
1 sibling, 2 replies; 21+ messages in thread
From: Benjamin Riefenstahl @ 2003-01-10 20:35 UTC
Brendan Halpin <brendan.halpin@ul.ie> writes:
> Use head and tail to split the file into the header-to-be-edited and
> the-rest. Edit the header-to-be-edited in emacs, save, then
> concatenate the-rest onto it.
>
> Assuming all editing is within the first 2000 bytes (not tested):
>
> head -c2000 bigfile > header-to-be-edited
> tail -c+2001 bigfile > the-rest
> (edit header-to-be-edited, save)
> cat header-to-be-edited the-rest > new-big-file
This assumes a) Unix, b) that you have the space and time ;-) to deal
with the large temporary files.
If you can assume Unix, dd is a little better, I think. I recently
had success with using it for extracting and later re-inserting a bit
in a large file. Getting the options right is a bit of a pain, but
the main thing was getting the direction (extract and re-insert) right
and using conv=notrunc for re-insertion. And then dd is oriented
towards blocks of bytes, not lines, of course. And you cannot change
the size of the block to be edited, but then large files are usually
binary files, where you don't want to change byte offsets anyway.
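The essential point of conv=notrunc is that the output file is opened for update instead of being rewritten from scratch. A minimal C sketch of that re-insertion step, assuming POSIX fseeko; patch_in_place is a hypothetical helper, and like dd it only makes sense when the edited block keeps its original size.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit systems */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Overwrite LEN bytes of PATH starting at OFFSET with DATA.  The rest
 * of the file, and its length, are left untouched, which is the effect
 * `dd ... conv=notrunc` has when re-inserting a block.
 * Returns 0 on success, -1 on error. */
int patch_in_place(const char *path, off_t offset, const void *data, size_t len)
{
    assert(offset >= 0);
    FILE *f = fopen(path, "r+b");      /* "r+" = update, no truncation */
    if (f == NULL)
        return -1;
    int ok = fseeko(f, offset, SEEK_SET) == 0
          && fwrite(data, 1, len, f) == len;
    if (fclose(f) != 0)                /* fclose flushes; failure counts */
        ok = 0;
    return ok ? 0 : -1;
}
```

Nothing but the LEN patched bytes is written, so no temporary copy of the large file is needed.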
so long, benny
* Re: Reading portions of large files
[not found] ` <mailman.153.1042230313.21513.help-gnu-emacs@gnu.org>
@ 2003-01-10 20:51 ` David Kastrup
2003-01-11 8:51 ` Eli Zaretskii
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: David Kastrup @ 2003-01-10 20:51 UTC
"Eli Zaretskii" <eliz@is.elta.co.il> writes:
> > From: David Kastrup <dak@gnu.org>
> > Newsgroups: gnu.emacs.help
> > Date: 09 Jan 2003 19:20:06 +0100
> >
> > > I have very large files, sometimes over 1G, from which I would
> > > like to edit very small portions, the headers or trailers for
> > > example. Emacs won't open those files, it complains about them
> > > being too big. Is it possible to edit, and save back after
> > > editing, only small portions of such files?
> >
> > insert-file-contents is a built-in function.
> > (insert-file-contents FILENAME &optional VISIT BEG END REPLACE)
>
> I don't think this will help the OP, since BEG and END need to be
> representable as Lisp integers, so they still are subject to the same
> 128-MB limit.
Oops, I forgot. In that case it would probably be best to run dd
from or to pipes with appropriate options for writing and reading
pieces from a big file.
BTW, would it be terribly complicated to extend the range of Lisp
integers to 31bit? Integers don't need any garbage collection or tag
bits per se. One could still use, say, the upper byte (or a smaller
unit) as a tag byte, only that the first or last 128 values would all
signify "integer".
Emacs has a most-positive-fixnum of 134217727, while XEmacs has
1073741823, more than 8 times as much. So it would appear to be
possible in theory.
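The two limits quoted above are consistent with 28 versus 31 usable value bits in a 32-bit word: the largest positive value of an n-bit two's-complement field is 2^(n-1) - 1. A small C check of that arithmetic; max_fixnum is a hypothetical helper, and the bit counts are inferred from the numbers rather than read from either code base.

```c
#include <assert.h>
#include <stdint.h>

/* Largest positive value of a two's-complement integer field that is
 * VALUE_BITS wide, i.e. what remains of a word after tag (and mark)
 * bits have been taken out of it. */
int64_t max_fixnum(int value_bits)
{
    assert(value_bits >= 1 && value_bits <= 63);
    return ((int64_t)1 << (value_bits - 1)) - 1;
}
```

With 28 value bits the limit is 2^27 - 1 = 134217727, so 2^27 bytes (exactly 128 MiB) is the first offset such a fixnum cannot hold; with 31 bits it is 2^30 - 1 = 1073741823, just over 8 times larger.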
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Reading portions of large files
2003-01-10 20:51 ` David Kastrup
@ 2003-01-11 8:51 ` Eli Zaretskii
[not found] ` <mailman.169.1042278925.21513.help-gnu-emacs@gnu.org>
2003-01-12 20:38 ` Stefan Monnier <foo@acm.com>
2 siblings, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2003-01-11 8:51 UTC
> From: David Kastrup <dak@gnu.org>
> Newsgroups: gnu.emacs.help
> Date: 10 Jan 2003 21:51:49 +0100
>
> BTW, would it be terribly complicated to extend the range of Lisp
> integers to 31bit?
It's not terribly hard, but IIRC the current consensus among the Emacs
maintainers is that it's not important enough to do that because
before long all machines will have 64-bit compilers.
Perhaps this should be discussed again on the developers' list.
> Integers don't need any garbage collection or tag bits per se.
They need to be distinguishable from other Lisp types, so their tag
bitfield cannot have an arbitrary bit pattern.
> Emacs has a most-positive-fixnum of 134217727, while XEmacs has
> 1073741823, more than 8 times as much. So it would appear to be
> possible in theory.
IIRC, the XEmacs way requires extensive changes in how Emacs works,
but I don't remember the details.
* Re: Reading portions of large files
2003-01-10 20:35 ` Benjamin Riefenstahl
@ 2003-01-11 10:25 ` Klaus Berndl
2003-01-20 7:50 ` Lee Sau Dan
1 sibling, 0 replies; 21+ messages in thread
From: Klaus Berndl @ 2003-01-11 10:25 UTC
On 10 Jan 2003, Benjamin Riefenstahl wrote:
> Brendan Halpin <brendan.halpin@ul.ie> writes:
> > Use head and tail to split the file into the header-to-be-edited and
> > the-rest. Edit the header-to-be-edited in emacs, save, then
> > concatenate the-rest onto it.
> >
> > Assuming all editing is within the first 2000 bytes (not tested):
> >
> > head -c2000 bigfile > header-to-be-edited
> > tail -c+2001 bigfile > the-rest
> > (edit header-to-be-edited, save)
> > cat header-to-be-edited the-rest > new-big-file
>
> This assumes a) Unix, b) that you have the space and time ;-) to deal
> with the large temporary files.
Assumption a) is neither necessary nor correct, because the Cygwin suite
is available for Windows (IMHO a must for using Emacs on Windows systems ;-).
Cygwin contains tail and head!
Klaus
--
Klaus Berndl mailto: klaus.berndl@sdm.de
sd&m AG http://www.sdm.de
software design & management
Thomas-Dehler-Str. 27, 81737 München, Germany
Tel +49 89 63812-392, Fax -220
* Re: Reading portions of large files
[not found] ` <mailman.169.1042278925.21513.help-gnu-emacs@gnu.org>
@ 2003-01-11 10:42 ` David Kastrup
0 siblings, 0 replies; 21+ messages in thread
From: David Kastrup @ 2003-01-11 10:42 UTC
"Eli Zaretskii" <eliz@is.elta.co.il> writes:
> > From: David Kastrup <dak@gnu.org>
> > Newsgroups: gnu.emacs.help
> > Date: 10 Jan 2003 21:51:49 +0100
> >
> > BTW, would it be terribly complicated to extend the range of Lisp
> > integers to 31bit?
>
> It's not terribly hard, but IIRC the current consensus among the Emacs
> maintainers is that it's not important enough to do that because
> before long all machines will have 64-bit compilers.
>
> Perhaps this should be discussed again on the developers' list.
>
> > Integers don't need any garbage collection or tag bits per se.
>
> They need to be distinguishable from other Lisp types, so their tag
> bitfield cannot have an arbitrary bit pattern.
Yes, but a single bit is sufficient for that distinction. This could
even speed up operations, since the sign bit can be checked very
quickly.
Something like
if (x < 0)
will establish that something is an integer,
(x + 0x40000000)
will yield the value of the integer, and
(x | 0x80000000)
will convert an integer back to a Lisp number.
I don't know whether an integer Lisp object needs to be identical to
an integer. If it does, then the above needs an offset of 0x40000000
everywhere, of course.
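One consistent reading of the sign-bit scheme above, written out as a C sketch. It assumes a hypothetical 32-bit lisp_word in which every non-integer object has the sign bit clear. Integers in [-2^30, 2^30 - 1] are stored shifted down by 0x40000000, so they all come out negative and decoding is exactly the x + 0x40000000 step. This illustrates the proposal only; it is not what Emacs implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit Lisp word: "sign bit set" means "integer". */
typedef int32_t lisp_word;

#define FIXNUM_MIN (-0x40000000)   /* -2^30     */
#define FIXNUM_MAX 0x3FFFFFFF      /*  2^30 - 1 */

/* Type test: a single signed comparison. */
static int is_fixnum(lisp_word x) { return x < 0; }

/* Encode: shift [-2^30, 2^30 - 1] down to [-2^31, -1], so every
 * encoded integer has its sign bit set.  No overflow in this range. */
static lisp_word make_fixnum(int32_t v)
{
    assert(v >= FIXNUM_MIN && v <= FIXNUM_MAX);
    return v - 0x40000000;
}

/* Decode: undo the offset. */
static int32_t fixnum_value(lisp_word x)
{
    assert(is_fixnum(x));
    return x + 0x40000000;
}
```

Encoding as v - 0x40000000 gives the same bit pattern as (v + 0x40000000) | 0x80000000 computed in unsigned arithmetic, which is where the OR with the sign bit comes in.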
> > Emacs has a most-positive-fixnum of 134217727, while XEmacs has
> > 1073741823, more than 8 times as much. So it would appear to be
> > possible in theory.
>
> IIRC, the XEmacs way requires extensive changes in how Emacs works,
> but I don't remember the details.
No clue about that.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Reading portions of large files
2003-01-10 20:51 ` David Kastrup
2003-01-11 8:51 ` Eli Zaretskii
[not found] ` <mailman.169.1042278925.21513.help-gnu-emacs@gnu.org>
@ 2003-01-12 20:38 ` Stefan Monnier <foo@acm.com>
2003-01-13 7:40 ` Miles Bader
2003-01-20 7:50 ` Lee Sau Dan
2 siblings, 2 replies; 21+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2003-01-12 20:38 UTC
> BTW, would it be terribly complicated to extend the range of Lisp
> integers to 31bit?
Currently a cons cell takes 2 words. Each word has 3 tag bits and
1 mark bit. When marking a cons cell, the GC sets the mark bit of the
first word of the cell. The mark bit of the second word is unused
(i.e. wasted).
Since at least 1 bit of tag is needed, that means that to get 31bit
integers we'd need to move the mark bit somewhere else. XEmacs decided to
use 3-word cons cells (and I know they're still regularly wondering
whether it was a good idea). Another approach is to use a separate mark-bit
array.
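The separate mark-bit array approach can be sketched as a side bitmap per block of cons cells. Everything here (the struct names, CONSES_PER_BLOCK, 32-bit words) is a hypothetical illustration of the idea, not Emacs's allocator: the GC sets and tests bits in marks[] and never touches the cells' own words, so both words stay fully available to Lisp data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONSES_PER_BLOCK 1024

struct cons { uint32_t car, cdr; };        /* no bit reserved for the GC */

struct cons_block {
    struct cons cells[CONSES_PER_BLOCK];
    uint8_t marks[CONSES_PER_BLOCK / 8];   /* one mark bit per cell */
};

/* Mark cell I of block B as reachable. */
static void mark_cons(struct cons_block *b, size_t i)
{
    assert(i < CONSES_PER_BLOCK);
    b->marks[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Was cell I marked during this collection? */
static int cons_marked(const struct cons_block *b, size_t i)
{
    assert(i < CONSES_PER_BLOCK);
    return (b->marks[i / 8] >> (i % 8)) & 1;
}

/* After the sweep, reset all marks for the next cycle in one pass. */
static void clear_marks(struct cons_block *b)
{
    memset(b->marks, 0, sizeof b->marks);
}
```

One trade-off this makes visible: clearing marks touches only the small bitmap, but every mark operation now needs the block address and cell index instead of just the cell pointer.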
Lots of trade-offs, a fair bit of coding, even more testing, ...
Anybody interested is welcome to try it out. My opinion is that maybe it
would be nice, but since the only application I'm aware of is "editing files
between 128MB and 1GB on 32bit systems", I don't think it's worth
the trouble.
Stefan
* Re: Reading portions of large files
2003-01-12 20:38 ` Stefan Monnier <foo@acm.com>
@ 2003-01-13 7:40 ` Miles Bader
2003-01-13 7:42 ` Miles Bader
2003-01-20 7:50 ` Lee Sau Dan
1 sibling, 1 reply; 21+ messages in thread
From: Miles Bader @ 2003-01-13 7:40 UTC
"Stefan Monnier <foo@acm.com>" <monnier+gnu.emacs.help/news/@flint.cs.yale.edu> writes:
> Since at least 1 bit of tag is needed, that means that to get 31bit
> integers we'd need to move the mark bit somewhere else.
Hmmm? I thought only boxed objects had to have a mark bit, in which case
integers don't need one. [Indeed, looking at the current garbage
collector, it doesn't seem to mark integers]
I'd also like to have low-bit tags so I can stack-allocate lisp objects...
-Miles
--
Is it true that nothing can be known? If so how do we know this? -Woody Allen
* Re: Reading portions of large files
2003-01-13 7:40 ` Miles Bader
@ 2003-01-13 7:42 ` Miles Bader
2003-01-13 7:55 ` David Kastrup
0 siblings, 1 reply; 21+ messages in thread
From: Miles Bader @ 2003-01-13 7:42 UTC
Miles Bader <miles@gnu.org> writes:
> Hmmm? I thought only boxed objects had to have a mark bit, in which case
> integers don't need one. [Indeed, looking at the current garbage
> collector, it doesn't seem to mark integers]
Oh wait, I was confused, it does need a mark-bit for cons cells...
Sorry for the noise...
-miles
--
Freedom's just another word, for nothing left to lose --Janis Joplin
* Re: Reading portions of large files
2003-01-13 7:42 ` Miles Bader
@ 2003-01-13 7:55 ` David Kastrup
2003-01-13 8:05 ` Miles Bader
0 siblings, 1 reply; 21+ messages in thread
From: David Kastrup @ 2003-01-13 7:55 UTC
Miles Bader <miles@gnu.org> writes:
> Miles Bader <miles@gnu.org> writes:
> > Hmmm? I thought only boxed objects had to have a mark bit, in which case
> > integers don't need one. [Indeed, looking at the current garbage
> > collector, it doesn't seem to mark integers]
>
> Oh wait, I was confused, it does need a mark-bit for cons cells...
>
> Sorry for the noise...
Cons cells are not integers. Care to explain for somebody dull?
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Reading portions of large files
2003-01-13 7:55 ` David Kastrup
@ 2003-01-13 8:05 ` Miles Bader
0 siblings, 0 replies; 21+ messages in thread
From: Miles Bader @ 2003-01-13 8:05 UTC
David Kastrup <dak@gnu.org> writes:
> Cons cells are not integers. Care to explain for somebody dull?
Cons cells don't have a header, so they need to use the mark-bit of one
of their components, meaning that anything you can store into a
cons-cell needs a mark-bit.
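To make that concrete: a hypothetical header-less cons cell in C whose mark bit is borrowed from the car word. The layout (32-bit words, mark in the top bit) is an assumption for illustration only; it shows why any value that can be stored in a car, integers included, must leave that bit free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header-less cons cell.  With no header to hold GC
 * state, the collector borrows the top bit of the car word. */
#define MARK_BIT 0x80000000u

struct cons { uint32_t car, cdr; };

static void set_mark(struct cons *c)        { c->car |= MARK_BIT; }
static void clear_mark(struct cons *c)      { c->car &= ~MARK_BIT; }
static int  is_marked(const struct cons *c) { return (c->car & MARK_BIT) != 0; }

/* Readers must mask the mark bit off, so whatever is stored in a car
 * can use at most the remaining 31 bits. */
static uint32_t car_value(const struct cons *c) { return c->car & ~MARK_BIT; }
```

A value that needed all 32 bits would have its top bit mistaken for the mark (and cleared by the sweep), which is why moving the mark bit elsewhere is a precondition for wider integers.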
I wonder how feasible it would be to use another sort of GC, like
stop-and-copy, which doesn't need mark-bits...
-Miles
--
"1971 pickup truck; will trade for guns"
* Re: Reading portions of large files
2003-01-10 20:35 ` Benjamin Riefenstahl
2003-01-11 10:25 ` Klaus Berndl
@ 2003-01-20 7:50 ` Lee Sau Dan
2003-01-20 12:46 ` Benjamin Riefenstahl
1 sibling, 1 reply; 21+ messages in thread
From: Lee Sau Dan @ 2003-01-20 7:50 UTC
>>>>> "Benjamin" == Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
>> Assuming all editing is within the first 2000 bytes (not
>> tested):
>>
>> head -c2000 bigfile > header-to-be-edited
>> tail -c+2001 bigfile > the-rest
>> (edit header-to-be-edited, save)
>> cat header-to-be-edited the-rest > new-big-file
Benjamin> This assumes a) Unix, b) that you have the space and
Benjamin> time ;-) to deal with the large temporary files.
(b) is assumed even if you use another method. Most *text* editors
save files by first writing a temporary copy of the new version and
then renaming it to the old name, so that in case of a crash you don't
lose everything: either the old version or the new version should
survive intact.
So, if you don't have the extra disk space, you can't do the editing
either.
Time? It doesn't take much time to 'split' and 'cat'. Moreover,
running the editor on smaller pieces does save time on loading and
saving the file fragments. And the editor doesn't need that much RAM
when editing the file.
Benjamin> If you can assume Unix, dd is a little better, I think.
Why not 'split'?
Benjamin> I recently had success with using it for extracting and
Benjamin> later re-inserting a bit in a large file.
Only when the extracted and re-inserted blocks are of the same size.
This is the case for hex editing, but not *text* editing. If you're
doing hex editing, you shouldn't be using a text editor in the first
place. There are hex editors which don't need to load the whole
file into memory.
Benjamin> Getting the options right is a bit of a pain,
No. That is true only when you're using 'dd' for the first time.
After a few times, it's easy to remember what options to use. Most of
the time, I only need "if=", "of=", "bs=", "skip=", "seek=" and
"count=". These option names are quite easy to remember once you know
the basic principle that 'dd' works by transferring blocks of the
input file to output file.
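Those six options map directly onto a block copy. Below is a rough C model of `dd if=IN of=OUT bs=BS skip=SKIP seek=SEEK count=COUNT conv=notrunc`, assuming POSIX fseeko; dd_notrunc is a hypothetical name, and the sketch ignores dd's many other options and its handling of short reads from pipes.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* SKIP counts input blocks, SEEK counts output blocks, COUNT limits
 * how many blocks are copied.  OUT must already exist; it is opened
 * for update and never truncated.  Returns bytes copied, or -1. */
off_t dd_notrunc(const char *in, const char *out, size_t bs,
                 off_t skip, off_t seek, off_t count)
{
    assert(bs > 0 && skip >= 0 && seek >= 0 && count >= 0);
    FILE *fi = fopen(in, "rb");
    FILE *fo = fopen(out, "r+b");
    char *buf = malloc(bs);
    off_t total = -1;

    if (fi != NULL && fo != NULL && buf != NULL
        && fseeko(fi, skip * (off_t)bs, SEEK_SET) == 0     /* skip= */
        && fseeko(fo, seek * (off_t)bs, SEEK_SET) == 0) {  /* seek= */
        total = 0;
        for (off_t i = 0; i < count; i++) {               /* count= */
            size_t n = fread(buf, 1, bs, fi);
            if (n == 0)
                break;                                    /* end of input */
            if (fwrite(buf, 1, n, fo) != n) {
                total = -1;
                break;
            }
            total += (off_t)n;
            if (n < bs)
                break;                                    /* short last block */
        }
    }
    free(buf);
    if (fi != NULL)
        fclose(fi);
    if (fo != NULL && fclose(fo) != 0)
        total = -1;
    return total;
}
```

Extracting 4 bytes at offset 7 would then be dd_notrunc(big, piece, 1, 7, 0, 4), and writing them back after editing dd_notrunc(piece, big, 1, 0, 7, 4): the only difference is which side gets the offset, which is the "direction" that is easy to get wrong.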
Benjamin> but the main thing was getting the direction (extract
Benjamin> and re-insert) right and using conv=notrunc for
Benjamin> re-insertion. And then dd is oriented towards blocks of
Benjamin> bytes, not lines, of course.
This is the down side. For line-oriented operations, use 'head',
'tail', 'cat', 'sed', or even 'awk' and 'perl'.
Benjamin> And you cannot change the size of the block to be
Benjamin> edited, but then large files are usually binary files,
Benjamin> where you don't want to change byte offsets anyway.
Then, find a hex editor. *Text* editors are simply not the right tool
to edit huge *binary* files. In theory, hex editors can be
implemented very efficiently using mmap().
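A minimal sketch of the mmap() idea in C, assuming a POSIX system. mmap_poke is a hypothetical helper that maps only the page containing the target byte, overwrites it in place, and flushes it with msync(); a real hex editor would map a window and move it as the user scrolls.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit systems */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the byte at OFFSET in PATH through a shared mapping.
 * Only the page holding OFFSET is mapped, so the file may be far
 * larger than RAM or the address space.  Returns 0 on success. */
int mmap_poke(const char *path, off_t offset, unsigned char value)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }
    off_t base = offset - offset % page;   /* mmap offset must be page-aligned */
    size_t len = (size_t)(offset - base) + 1;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, base);
    close(fd);                             /* mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    p[offset - base] = value;              /* the store lands in the file's page */
    int rc = msync(p, len, MS_SYNC);       /* write it back now */
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}
```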
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
* Re: Reading portions of large files
2003-01-10 17:16 ` Brendan Halpin
2003-01-10 20:35 ` Benjamin Riefenstahl
@ 2003-01-20 7:50 ` Lee Sau Dan
1 sibling, 0 replies; 21+ messages in thread
From: Lee Sau Dan @ 2003-01-20 7:50 UTC
>>>>> "Brendan" == Brendan Halpin <brendan.halpin@ul.ie> writes:
Brendan> Use head and tail to split the file into the
Brendan> header-to-be-edited and the-rest. Edit the
Brendan> header-to-be-edited in emacs, save, then concatenate
Brendan> the-rest onto it.
Brendan> Assuming all editing is within the first 2000 bytes (not
Brendan> tested):
Brendan> head -c2000 bigfile > header-to-be-edited
Brendan> tail -c+2001 bigfile > the-rest
Brendan> (edit header-to-be-edited, save)
Brendan> cat header-to-be-edited the-rest > new-big-file
Why not use 'split'? :)
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
* Re: Reading portions of large files
2003-01-12 20:38 ` Stefan Monnier <foo@acm.com>
2003-01-13 7:40 ` Miles Bader
@ 2003-01-20 7:50 ` Lee Sau Dan
2003-01-24 7:55 ` Mac
2003-01-27 14:44 ` Stefan Monnier <foo@acm.com>
1 sibling, 2 replies; 21+ messages in thread
From: Lee Sau Dan @ 2003-01-20 7:50 UTC
>>>>> "Stefan" == "Stefan Monnier <foo@acm.com>" <monnier+gnu.emacs.help/news/@flint.cs.yale.edu> writes:
Stefan> Since at least 1 bit of tag is needed, that means that to
Stefan> get 31bit integers we'd need to move the mark bit
Stefan> somewhere else. XEmacs decided to use 3-word cons cells
Stefan> (and I know they're still regularly wondering whether it
Stefan> was a good idea). Another approach is to use a separate
Stefan> mark-bit array.
I think the separate mark-bit array would be cleaner. You don't need
to access the mark bits unless you're doing gc. Why let that bit
stick there in the _main_ working set all the time? Wouldn't a
separate mark-bit array also improve locality (important for caching)?
Then, in theory, the tag bits can also be kept separately, giving the
full 32 bits to integers (represented as machine-native words). I
think we only need 1 tag bit in the separate tag-bit array. Its
function is to indicate whether the corresponding memory word is an
integer or not. If not, then the remaining tag bits are found in the
word itself. And integer arithmetic can certainly be faster!
Would this implementation be more efficient or worse?
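The side tag-bit array proposed here can be sketched as follows. The heap layout, NWORDS, and function names are hypothetical: one bit per word says "integer", in which case the word holds a full native int32_t with no bits stolen; any other word keeps its remaining type tag inside the word itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 256

struct heap {
    uint32_t words[NWORDS];
    uint8_t int_bits[NWORDS / 8];       /* 1 bit per word: 1 = integer */
};

static void set_int_bit(struct heap *h, size_t i, int on)
{
    uint8_t bit = (uint8_t)(1u << (i % 8));
    if (on)
        h->int_bits[i / 8] |= bit;
    else
        h->int_bits[i / 8] &= (uint8_t)~bit;
}

static int is_int(const struct heap *h, size_t i)
{
    assert(i < NWORDS);
    return (h->int_bits[i / 8] >> (i % 8)) & 1;
}

static void store_int(struct heap *h, size_t i, int32_t v)
{
    assert(i < NWORDS);
    h->words[i] = (uint32_t)v;          /* full native word, no tag bits */
    set_int_bit(h, i, 1);
}

static void store_other(struct heap *h, size_t i, uint32_t tagged)
{
    assert(i < NWORDS);
    h->words[i] = tagged;               /* type tag lives in the word */
    set_int_bit(h, i, 0);
}

static int32_t load_int(const struct heap *h, size_t i)
{
    assert(is_int(h, i));
    uint32_t w = h->words[i];
    /* Portable two's-complement reinterpretation of the raw word. */
    return w <= INT32_MAX ? (int32_t)w : -(int32_t)~w - 1;
}
```

The cost is visible in the sketch: every store now writes two places (the word and the bitmap), and every type test reads memory outside the word itself.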
Stefan> Lots of trade-offs, a fair bit of coding, even more
Stefan> testing, ... Anybody interested is welcome to try it
Stefan> out. My opinion is that maybe it would be nice, but since
Stefan> the only application I'm aware of is "editing files
Stefan> between 128MB and 1GB on 32bit systems", I don't think
Stefan> it's worth the trouble.
Yeah. I share this last point with you. >128MB text files are simply
weird. And for binary files, a real hex editor (or 'xxd', which I just
discovered) is a more appropriate tool, or just 'dd'.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
* Re: Reading portions of large files
2003-01-20 7:50 ` Lee Sau Dan
@ 2003-01-20 12:46 ` Benjamin Riefenstahl
0 siblings, 0 replies; 21+ messages in thread
From: Benjamin Riefenstahl @ 2003-01-20 12:46 UTC
Hi,
> [attribution cut off]
> >> head -c2000 bigfile > header-to-be-edited
> >> tail -c+2001 bigfile > the-rest
> >> [...]
> >>>>> "Benjamin" == Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
> writes:
> Benjamin> This assumes a) Unix, b) that you have the space and
> Benjamin> time ;-) to deal with the large temporary files.
Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:
> (b) is assumed even if you use another method.
With something like the dd method I don't ever have to copy the whole
file. Makes a difference when your file is a CD image of 600 MB and
all you want to do is patch the partition table.
> Time? It doesn't take much time to 'split' and 'cat'.
It takes several minutes on my machine with the mentioned file.
> Why not 'split'?
I didn't think of that one before. But it also copies the whole
file.
> There are hex editors which don't need to load the whole file into
> memory.
I'm not aware of a commonly used hex editor on Unix. Do you have a
recommendation?
so long, benny
* Re: Reading portions of large files
2003-01-20 7:50 ` Lee Sau Dan
@ 2003-01-24 7:55 ` Mac
2003-01-27 14:44 ` Stefan Monnier <foo@acm.com>
1 sibling, 0 replies; 21+ messages in thread
From: Mac @ 2003-01-24 7:55 UTC
On 20 Jan 2003, Lee Sau Dan wrote:
>
> Stefan> Lots of trade-offs, a fair bit of coding, even more
> Stefan> testing, ... Anybody interested is welcome to try it
> Stefan> out. My opinion is that maybe it would be nice, but
> Stefan> since the only application I'm aware of is "editing
> Stefan> files between 128MB and 1GB on 32bit systems", I don't
> Stefan> think it's worth the trouble.
>
> Yeah. I share this last point with you. >128MB text files are
> simply weird. And for binary files, a real hex editor (or 'xxd',
> which I just discovered) is a more appropriate tool, or just 'dd'.
Well, it is a weird world. When working with hardware development,
file sizes over 128MB are very common (netlists, SDF files,
log files...), although what you do with these huge files is
limited. It's mainly search and replace (occur, query-replace-regexp,
etc.).
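For search-only workloads like these, a file never has to be loaded whole. A C sketch, assuming fixed-string search (not regexps): count_matches is a hypothetical helper that reads `chunk` bytes at a time and carries the last strlen(pat) - 1 bytes over to the next window, so a match spanning two reads is still counted, exactly once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count (possibly overlapping) occurrences of PAT in stream F, using
 * O(chunk + strlen(pat)) memory however large the file is.
 * Returns -1 on error. */
long count_matches(FILE *f, const char *pat, size_t chunk)
{
    assert(f != NULL && pat != NULL);
    size_t m = strlen(pat);
    if (m == 0 || chunk == 0)
        return -1;
    char *buf = malloc(chunk + m - 1);   /* room for the carried tail */
    if (buf == NULL)
        return -1;
    size_t keep = 0;                     /* bytes carried from last window */
    long count = 0;
    size_t n;
    while ((n = fread(buf + keep, 1, chunk, f)) > 0) {
        size_t len = keep + n;
        /* Only matches that fit entirely in this window are counted. */
        for (size_t i = 0; i + m <= len; i++)
            if (memcmp(buf + i, pat, m) == 0)
                count++;
        /* The last m-1 bytes may start a match completed by the next
         * read; none of them began a match counted above. */
        keep = len < m - 1 ? len : m - 1;
        memmove(buf, buf + len - keep, keep);
    }
    int err = ferror(f);
    free(buf);
    return err ? -1 : count;
}
```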
/mac
* Re: Reading portions of large files
2003-01-20 7:50 ` Lee Sau Dan
2003-01-24 7:55 ` Mac
@ 2003-01-27 14:44 ` Stefan Monnier <foo@acm.com>
1 sibling, 0 replies; 21+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2003-01-27 14:44 UTC
> think we only need 1 tag bit in the separate tag-bit array. Its
> function is to indicate whether the corresponding memory word is an
> integer or not. If not, then the remaining tag bits are found in the
> word itself. And integer arithmetic can certainly be faster!
Integer arithmetic performance is a complete non-issue in Emacs (and most
other tagged programming languages, as a matter of fact).
Stefan
Thread overview: 21+ messages
[not found] <mailman.100.1042135372.21513.help-gnu-emacs@gnu.org>
2003-01-09 18:20 ` Reading portions of large files David Kastrup
2003-01-10 19:21 ` Eli Zaretskii
[not found] ` <mailman.153.1042230313.21513.help-gnu-emacs@gnu.org>
2003-01-10 20:51 ` David Kastrup
2003-01-11 8:51 ` Eli Zaretskii
[not found] ` <mailman.169.1042278925.21513.help-gnu-emacs@gnu.org>
2003-01-11 10:42 ` David Kastrup
2003-01-12 20:38 ` Stefan Monnier <foo@acm.com>
2003-01-13 7:40 ` Miles Bader
2003-01-13 7:42 ` Miles Bader
2003-01-13 7:55 ` David Kastrup
2003-01-13 8:05 ` Miles Bader
2003-01-20 7:50 ` Lee Sau Dan
2003-01-24 7:55 ` Mac
2003-01-27 14:44 ` Stefan Monnier <foo@acm.com>
2003-01-10 16:27 ` Eric Pement
2003-01-10 17:16 ` Brendan Halpin
2003-01-10 20:35 ` Benjamin Riefenstahl
2003-01-11 10:25 ` Klaus Berndl
2003-01-20 7:50 ` Lee Sau Dan
2003-01-20 12:46 ` Benjamin Riefenstahl
2003-01-20 7:50 ` Lee Sau Dan
2003-01-09 15:45 Gerald.Jean