* map-file-lines
@ 2009-02-02 17:20 Ted Zlatanov
  2009-02-02 18:54 ` map-file-lines Stefan Monnier
  2009-02-02 20:48 ` map-file-lines Ted Zlatanov
  0 siblings, 2 replies; 49+ messages in thread
From: Ted Zlatanov @ 2009-02-02 17:20 UTC
To: emacs-devel

Emacs Lisp lacks a good way to iterate over all the lines of a file,
especially for a large file.  The following code tries to provide a
solution, concentrating on reading a block of data in one shot and then
processing it line by line.  It may be more efficient to write this in
C.  Also, it does not deal with cases where the first line read is
bigger than the buffer size, and may have other bugs, but it works for
me so I thought I'd post it for comments and criticism.

Thanks
Ted

(defun map-file-lines (file func &optional bufsize)
  (let ((filepos 0)
        (linenum 0)
        (bufsize (or bufsize 4096)))
    (with-temp-buffer
      (while
          (let* ((inserted (insert-file-contents
                            file nil
                            filepos (+ filepos bufsize)
                            t))
                 (numlines (count-lines (point-min) (point-max)))
                 (read (nth 1 inserted))
                 (done (< 1 read)))
            (dotimes (n (count-lines (point-min) (point-max)))
              (goto-char (point-min))
              (funcall func
                       (buffer-substring
                        (line-beginning-position)
                        (line-end-position))
                       (incf linenum))
              (incf filepos (line-end-position))
              (forward-line))
            done)))
    linenum))

;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line)))
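Editorial sketch, not the author's code: as posted, the function has a few problems beyond the one the message mentions. The `goto-char' inside the loop returns to the first line on every iteration, `filepos' advances by a buffer position rather than by the bytes consumed, and a line cut off at a chunk boundary is handed to FUNC as if it were complete. One possible repair is shown below. It assumes raw bytes are acceptable (lines arrive as unibyte strings), and it keeps a partial line in the buffer until the next chunk arrives, which also copes with lines longer than BUFSIZE. The name `map-file-lines-sketch' is made up for illustration.

```elisp
(defun map-file-lines-sketch (file func &optional bufsize)
  "Call FUNC on each line of FILE with the line and its 1-based number.
FILE is read BUFSIZE bytes at a time (default 4096).  Lines are passed
as raw unibyte strings without the trailing newline.  Return the
number of lines seen."
  (let ((filepos 0)
        (linenum 0)
        (bufsize (or bufsize 4096))
        (more t))
    (with-temp-buffer
      ;; In a unibyte buffer one character is one byte, so buffer
      ;; sizes and file offsets can be mixed safely.
      (set-buffer-multibyte nil)
      (while more
        ;; Append the next chunk after any partial line kept from the
        ;; previous round.
        (goto-char (point-max))
        (let ((nread (nth 1 (insert-file-contents-literally
                             file nil filepos (+ filepos bufsize)))))
          (setq filepos (+ filepos nread)
                more (> nread 0)))
        ;; Hand every complete line to FUNC, then drop it.
        (goto-char (point-min))
        (while (search-forward "\n" nil t)
          (funcall func
                   (buffer-substring (point-min) (1- (point)))
                   (setq linenum (1+ linenum)))
          (delete-region (point-min) (point)))
        ;; At end of file, a last line may lack its newline.
        (when (and (not more) (> (point-max) (point-min)))
          (funcall func (buffer-string) (setq linenum (1+ linenum))))))
    linenum))
```

It is called the same way as the original, for example (map-file-lines-sketch "/tmp/test" (lambda (line num) (message "%d: %s" num line))). Memory use stays near BUFSIZE plus the length of the longest line.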
* Re: map-file-lines
From: Stefan Monnier @ 2009-02-02 18:54 UTC
To: Ted Zlatanov; +Cc: emacs-devel

> Emacs Lisp lacks a good way to iterate over all the lines of a file,
> especially for a large file.

I'm not really happy about focusing on "line at a time".  It's a useful
and common case, but Emacs usually is pretty good about being "line
agnostic" (font-lock being an obvious counter example).

Providing some kind of stream-processing functionality might be good,
tho the need doesn't seem terribly high, since we've managed to avoid it
until now.

FWIW, another option is to provide an open-file-stream along the same
lines as open-network-stream.  I.e. the chunks are received via
a process filter.


        Stefan
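Editorial sketch: no `open-file-stream' exists in Emacs, so the following only approximates the idea with tools that do exist. It assumes an external `cat' program is available and lets it feed the file through a pipe, so that the chunks reach an ordinary process filter, much as they would for a network stream. The name `open-file-stream-sketch' is hypothetical.

```elisp
(defun open-file-stream-sketch (name file filter)
  "Start streaming FILE to FILTER in chunks; return the process.
FILTER is called as (FILTER PROCESS STRING) for each chunk of raw
bytes, like the filter of a network process.  This illustration
relies on an external `cat' program being available."
  (let ((process-connection-type nil)      ; use a pipe, not a pty
        (coding-system-for-read 'binary))  ; deliver undecoded bytes
    (let ((proc (start-process name nil "cat" (expand-file-name file))))
      (set-process-filter proc filter)
      proc)))
```

A caller might pass (lambda (_proc chunk) (message "got %d bytes" (length chunk))) as FILTER. The chunks arrive asynchronously and their boundaries are arbitrary, so a line-oriented consumer still has to reassemble lines itself.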
* Re: map-file-lines
From: Ted Zlatanov @ 2009-02-02 19:22 UTC
To: emacs-devel

On Mon, 02 Feb 2009 13:54:30 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> Emacs Lisp lacks a good way to iterate over all the lines of a file,
>> especially for a large file.

SM> I'm not really happy about focusing on "line at a time".  It's a useful
SM> and common case, but Emacs usually is pretty good about being "line
SM> agnostic" (font-lock being an obvious counter example).

SM> Providing some kind of stream-processing functionality might be good,
SM> tho the need doesn't seem terribly high, since we've managed to avoid it
SM> until now.

Without this function, Emacs simply can't handle large files and that's
been requested at least 4 times by users that I can recall.  I think a
general solution to the large file Emacs problem would be better, but
line-oriented processing is a classic approach to processing large files
that many Emacs users will probably find familiar.

SM> FWIW, another option is to provide an open-file-stream along the same
SM> lines as open-network-stream.  I.e. the chunks are received via
SM> a process filter.

How is that better than insert-file-contents as I use it?  Are you
thinking of a stateful back/forward seek capability?  Or do you mean
you'd like it to be asynchronous?

Ted
* Re: map-file-lines
From: joakim @ 2009-02-02 19:52 UTC
To: Ted Zlatanov; +Cc: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Mon, 02 Feb 2009 13:54:30 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>>> Emacs Lisp lacks a good way to iterate over all the lines of a file,
>>> especially for a large file.
>
> SM> I'm not really happy about focusing on "line at a time".  It's a useful
> SM> and common case, but Emacs usually is pretty good about being "line
> SM> agnostic" (font-lock being an obvious counter example).
>
> SM> Providing some kind of stream-processing functionality might be good,
> SM> tho the need doesn't seem terribly high, since we've managed to avoid it
> SM> until now.
>
> Without this function, Emacs simply can't handle large files and that's
> been requested at least 4 times by users that I can recall.  I think a
> general solution to the large file Emacs problem would be better, but
> line-oriented processing is a classic approach to processing large files
> that many Emacs users will probably find familiar.
>
> SM> FWIW, another option is to provide an open-file-stream along the same
> SM> lines as open-network-stream.  I.e. the chunks are received via
> SM> a process filter.
>
> How is that better than insert-file-contents as I use it?  Are you
> thinking of a stateful back/forward seek capability?  Or do you mean
> you'd like it to be asynchronous?

I would like to add my voice in requesting "large file" capability in
Emacs.  I wanted it many times over the years, but always copped out into
some other editor for this.

A common example is hex-editing a large binary file, or finding some
text in some enormous log-file.

>
> Ted
>
>
-- 
Joakim Verona
* Re: map-file-lines
From: Ted Zlatanov @ 2009-02-02 20:54 UTC
To: emacs-devel

On Mon, 02 Feb 2009 20:52:20 +0100 joakim@verona.se wrote:

j> I would like to add my voice in requesting "large file" capability in
j> Emacs.  I wanted it many times over the years, but always copped out into
j> some other editor for this.

j> A common example is hex-editing a large binary file, or finding some
j> text in some enormous log-file.

I plan to implement large file viewing and searching, depending on
Stefan and others' decisions and suggestions.  I don't know the Emacs
file internals well, so a stream-based solution may be better
underneath.

Ted
* Re: map-file-lines
From: Stefan Monnier @ 2009-02-02 22:42 UTC
To: Ted Zlatanov; +Cc: emacs-devel

> I plan to implement large file viewing and searching, depending on
> Stefan and others' decisions and suggestions.  I don't know the Emacs
> file internals well, so a stream-based solution may be better
> underneath.

How would you use map-file-lines to implement the "large file viewing
and searching"?


        Stefan
* Re: map-file-lines
From: Ted Zlatanov @ 2009-02-03 13:57 UTC
To: emacs-devel

On Mon, 02 Feb 2009 17:42:31 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

SM> How would you use map-file-lines to implement the "large file viewing
SM> and searching"?
...
SM> I know about the large-file problem, obviously, but I wonder what kind
SM> of UI you expect to provide, in order for it to be able to work just one
SM> line at a time.
...
SM> I don't see how such a "line-at-a-time" or "chunk-at-a-time" processing
SM> (i.e. stream processing) will enable Emacs to let you conveniently edit
SM> a large binary file.
...
SM> OK, that's indeed how I imagine it as well, but I fail to see how this
SM> relates to map-file-lines.  All you need for that is to use the BEG and
SM> END args of insert-file-contents (and maybe also to extend those args
SM> so they can be floats, in case Emacs's ints are too limited).

On Tue, 03 Feb 2009 08:27:01 +0100 joakim@verona.se wrote:

j> That being said, how would the insert-file-contents solution work in
j> practice?  Has something been done along these lines already?  Would it be
j> possible to make some kind of generic solution that would make for
j> instance hexl mode work on large files?

I tried to condense the various messages into one reply.

map-file-lines as it stands is just a stream processor, and not useful
to *view* a large file.  Unfortunately Emacs does almost everything in
the buffer, so true large-file view and edit requires core-level work to
make the buffer able to address the whole file.  In addition, almost
every Emacs package assumes that the buffer is small and can be scanned
quickly; imagine how slow it will be to open and edit a 20GB gzipped
file.

map-file-lines was much easier to implement, and (just like `grep' and
many other utilities) can be useful on its own despite the single-line
limitations.

So to answer your and Joakim's questions, based on what I know so far,
the best approach is to have a special narrowing mode to view large
files.  It won't use map-file-lines, but like it, it will fetch the next
chunk only when needed.  It needs to be special because the normal
narrow/widen calls should not widen to the whole file.  It's best, in
fact, if the decision to move the "window" into the file back and forth
is only left to the user and to special motion commands, and normal
packages can not move that window.

Writing inserted text, in particular, is very slow with large files.  A
binary editor is not so hard to implement, but inserting text may need
special permission (like the motion commands above) to make the user
experience bearable.

Ted
* Re: map-file-lines
From: Stefan Monnier @ 2009-02-02 22:41 UTC
To: joakim; +Cc: Ted Zlatanov, emacs-devel

> I would like to add my voice in requesting "large file" capability in
> Emacs.  I wanted it many times over the years, but always copped out into
> some other editor for this.

> A common example is hex-editing a large binary file, or finding some
> text in some enormous log-file.

I don't see how such a "line-at-a-time" or "chunk-at-a-time" processing
(i.e. stream processing) will enable Emacs to let you conveniently edit
a large binary file.


        Stefan
* Re: map-file-lines
From: joakim @ 2009-02-02 23:59 UTC
To: Stefan Monnier; +Cc: Ted Zlatanov, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I would like to add my voice in requesting "large file" capability in
>> Emacs.  I wanted it many times over the years, but always copped out into
>> some other editor for this.
>
>> A common example is hex-editing a large binary file, or finding some
>> text in some enormous log-file.
>
> I don't see how such a "line-at-a-time" or "chunk-at-a-time" processing
> (i.e. stream processing) will enable Emacs to let you conveniently edit
> a large binary file.

Maybe like this:
- find where you want to edit, by searching chunk-at-a-time
- display a buffer of this chunk
- edit it
- save the chunk back into the file

This could behave conceptually like narrowing.  Obviously not perfect,
but better than not being able to edit large files at all.

>
>
>         Stefan
>
-- 
Joakim Verona
* Re: map-file-lines
From: Stefan Monnier @ 2009-02-03 4:13 UTC
To: joakim; +Cc: Ted Zlatanov, emacs-devel

> Maybe like this:
> - find where you want to edit, by searching chunk-at-a-time
> - display a buffer of this chunk
> - edit it
> - save the chunk back into the file

> This could behave conceptually like narrowing.  Obviously not perfect,
> but better than not being able to edit large files at all.

OK, that's indeed how I imagine it as well, but I fail to see how this
relates to map-file-lines.  All you need for that is to use the BEG and
END args of insert-file-contents (and maybe also to extend those args
so they can be floats, in case Emacs's ints are too limited).


        Stefan
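Editorial sketch of the BEG/END approach: the helper below loads one byte range of a file into the current buffer. It assumes raw bytes are wanted, so it uses `insert-file-contents-literally' in a unibyte buffer, and it assumes the offsets fit in an Emacs integer, which is exactly the limitation raised above. The function name is made up for illustration.

```elisp
(defun insert-file-chunk-sketch (file beg end)
  "Replace the current buffer's text with bytes BEG..END of FILE.
BEG is inclusive and END exclusive, both counted from 0.
Return the number of bytes actually read."
  (erase-buffer)
  ;; One character per byte, so buffer positions map directly to
  ;; file offsets: position P holds the byte at BEG + P - 1.
  (set-buffer-multibyte nil)
  (nth 1 (insert-file-contents-literally file nil beg end)))
```

For example, (insert-file-chunk-sketch "/tmp/test" 4096 8192) would show the second 4 KB block. Near the end of the file the return value is smaller than END minus BEG.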
* Re: map-file-lines
From: joakim @ 2009-02-03 7:27 UTC
To: Stefan Monnier; +Cc: Ted Zlatanov, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Maybe like this:
>> - find where you want to edit, by searching chunk-at-a-time
>> - display a buffer of this chunk
>> - edit it
>> - save the chunk back into the file
>
>> This could behave conceptually like narrowing.  Obviously not perfect,
>> but better than not being able to edit large files at all.
>
> OK, that's indeed how I imagine it as well, but I fail to see how this
> relates to map-file-lines.  All you need for that is to use the BEG and
> END args of insert-file-contents (and maybe also to extend those args
> so they can be floats, in case Emacs's ints are too limited).

That might indeed be the case.  I don't advocate any particular solution
to the bigfile problem.  I just wanted to explain that some of us
long-term Emacs users would like this functionality in Emacs.

That being said, how would the insert-file-contents solution work in
practice?  Has something been done along these lines already?

Would it be possible to make some kind of generic solution that would
make for instance hexl mode work on large files?

>
>         Stefan
-- 
Joakim Verona
* Re: map-file-lines
From: Stefan Monnier @ 2009-02-03 14:50 UTC
To: joakim; +Cc: Ted Zlatanov, emacs-devel

> That might indeed be the case.  I don't advocate any particular solution
> to the bigfile problem.  I just wanted to explain that some of us
> long-term Emacs users would like this functionality in Emacs.

I'm quite aware of it.  I myself have had such needs.

> That being said, how would the insert-file-contents solution work in
> practice?  Has something been done along these lines already?

Can't remember if someone tried it at all, but I don't think anyone has
gotten very far, no.

> Would it be possible to make some kind of generic solution that would
> make for instance hexl mode work on large files?

That sounds very difficult.  But it should not be too hard to adapt
hexl-mode specifically once the other part is implemented.


        Stefan
* Re: map-file-lines
From: Richard M Stallman @ 2009-02-04 7:04 UTC
To: Stefan Monnier; +Cc: tzz, joakim, emacs-devel

Here's an idea for a UI for editing big files.  First you run M-x grep on
the file, and display the matches for whatever regexp.  In the *grep*
buffer you specify a region, which is a way of choosing two matches,
the ones whose entries contain point and mark.  Then you give a command to edit
the file from one of the matches to the other.  It marks these matches
(and the lines containing them) as read-only so that you can't
spoil the correspondence with the file.  Thus, you can always save this
partial-file buffer.

The beginning and end of the *grep* buffer can be used to specify
that the portion to edit starts or ends at bof or eof.

It would be easy to adapt this to variants such as
(1) using hexl-mode to visit the file,
(2) using methods other than grep to subdivide the file,
(3) providing more friendly front ends to grep.
* Re: map-file-lines
From: Ted Zlatanov @ 2009-02-04 15:38 UTC
To: emacs-devel

On Wed, 04 Feb 2009 02:04:32 -0500 Richard M Stallman <rms@gnu.org> wrote:

RMS> Here's an idea for a UI for editing big files.  First you run M-x grep on
RMS> the file, and display the matches for whatever regexp.  In the *grep*
RMS> buffer you specify a region, which is a way of choosing two matches,
RMS> the ones whose entries contain point and mark.  Then you give a command to edit
RMS> the file from one of the matches to the other.  It marks these matches
RMS> (and the lines containing them) as read-only so that you can't
RMS> spoil the correspondence with the file.  Thus, you can always save this
RMS> partial-file buffer.

RMS> The beginning and end of the *grep* buffer can be used to specify
RMS> that the portion to edit starts or ends at bof or eof.

RMS> It would be easy to adapt this to variants such as
RMS> (1) using hexl-mode to visit the file,
RMS> (2) using methods other than grep to subdivide the file,
RMS> (3) providing more friendly front ends to grep.

This is essentially mapping byte offsets to line positions, with extra
calculations.  As Stefan suggested, it's better to just use byte
offsets.  Your approach requires a lot of tracking of the grep lines,
whereas just using byte offsets requires remembering the two current
offsets and nothing else.

Otherwise I think your suggestions are similar to mine: set up a special
mode where the buffer is a window[1] into the file instead of the whole
file, and create special commands to move the window back and forth.
Saving would only save the buffer contents; the window won't be moveable
until changes are saved (another approach is to remember modifications
outside the window, but that gets hairy with undo).

Ted

[1] I know "window" has meaning in Emacs already, but I can't think of a
better term.
* Re: map-file-lines
From: Richard M Stallman @ 2009-02-05 5:40 UTC
To: Ted Zlatanov; +Cc: emacs-devel

    This is essentially mapping byte offsets to line positions, with extra
    calculations.  As Stefan suggested, it's better to just use byte
    offsets.

The main point of my message is the UI proposal.  You're right that
it's good to use byte offsets.  I had not thought about that, but it
could be done by specifying the -b option for grep.

    Otherwise I think your suggestions are similar to mine: set up a special
    mode where the buffer is a window[1] into the file instead of the whole
    file, and create special commands to move the window back and forth.

I proposed a specific UI for specifying which part of the file to
edit, one I think will be convenient.
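Editorial sketch of the `grep -b' idea: the function below runs grep with the -b option, which prefixes each matching line with the 0-based byte offset of that line, and returns the offsets alongside the matched text. It assumes a `grep' program that supports -b (GNU and BSD grep do) and a single file argument, so no file name prefix appears in the output. The function name is hypothetical.

```elisp
(defun grep-byte-offsets-sketch (regexp file)
  "Return an alist (OFFSET . LINE) for lines of FILE matching REGEXP.
OFFSET is the byte offset of the start of the line, as printed by
`grep -b'.  Return nil when nothing matches."
  (with-temp-buffer
    ;; Output lines look like \"OFFSET:matching line\".
    (call-process "grep" nil t nil "-b" "-e" regexp
                  (expand-file-name file))
    (goto-char (point-min))
    (let (matches)
      (while (re-search-forward "^\\([0-9]+\\):\\(.*\\)$" nil t)
        (push (cons (string-to-number (match-string 1))
                    (match-string 2))
              matches))
      (nreverse matches))))
```

Two offsets picked from the result could then serve as the BEG and END arguments of `insert-file-contents' to load only the part of the file between two matches.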
* view/edit large files (was: map-file-lines)
From: Ted Zlatanov @ 2009-02-06 18:42 UTC
To: emacs-devel

On Thu, 05 Feb 2009 00:40:40 -0500 Richard M Stallman <rms@gnu.org> wrote:

RMS> The main point of my message is the UI proposal.  You're right that
RMS> it's good to use byte offsets.  I had not thought about that, but it
RMS> could be done by specifying the -b option for grep.

RMS> I proposed a specific UI for specifying which part of the file to
RMS> edit, one I think will be convenient.

Could you please explain, with code or text, what using your UI would
look like?  I looked over your suggestions and I still think we have the
same idea, just expressed differently.  You do seem to want `grep'
instead of dynamic offsets, but see my comments later.

Here's the "window"[1] API I'm suggesting, as a detailed list of TODO
items:

1) a buffer-local set of offset variables that indicate the beginning
and the end of the current window into the file.

2) override all write-file functions to write the buffer at the starting
offset.  I don't think there's a write-file-contents analogous to
insert-file-contents.

3) override all insert-file* functions to respect the offsets as well

4) disable insertion, always in overwrite mode for better performance
(maybe allow insert at end of file...).  Force save when the "window" is
moved.

5) "window" management functions: set/get-window-offset,
set/get-window-length, etc.  These operate on the (1) buffer-local
variables.

As you can see, it requires no grep calls to pre-scan the file, and
should be consistent with the existing Emacs code.  Pre-scanning a large
file with grep can be very expensive, and it's inaccurate if the large
file is growing (e.g. a log file).

Thanks to anyone with suggestions...

Ted

[1] I still can't think of a better term than "window."
large-file-window is too verbose.  boffset?  byte-offset?
virtual-buffer?
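Editorial sketch of the "window" API listed above, under several assumptions: all `file-window-sketch-*' names are made up, the buffer holds raw bytes, offsets fit in an Emacs integer, and edits must keep the window the same size. Writing back relies on a documented feature of `write-region': when its APPEND argument is an integer, Emacs seeks to that offset before writing, which covers the role of the missing write-file-contents.

```elisp
(defvar-local file-window-sketch-file nil
  "File that the current buffer is a window into.")
(defvar-local file-window-sketch-offset 0
  "Byte offset in the file where the window starts.")
(defvar-local file-window-sketch-size 4096
  "Requested window size in bytes.")
(defvar-local file-window-sketch-loaded 0
  "Number of bytes actually loaded (smaller near end of file).")

(defun file-window-sketch-load (file offset size)
  "Show SIZE bytes of FILE starting at byte OFFSET in this buffer."
  (let ((inhibit-read-only t))
    (erase-buffer)
    (set-buffer-multibyte nil)         ; one character per byte
    (setq file-window-sketch-loaded
          (nth 1 (insert-file-contents-literally
                  file nil offset (+ offset size)))))
  (setq file-window-sketch-file (expand-file-name file)
        file-window-sketch-offset offset
        file-window-sketch-size size)
  (overwrite-mode 1)                   ; typing replaces, not inserts
  (set-buffer-modified-p nil))

(defun file-window-sketch-save ()
  "Write the buffer back over the bytes it was loaded from."
  (unless (= (buffer-size) file-window-sketch-loaded)
    (error "Window size changed; only overwriting is supported"))
  (let ((coding-system-for-write 'no-conversion))
    ;; An integer APPEND argument makes `write-region' seek to that
    ;; offset instead of truncating the file.
    (write-region (point-min) (point-max) file-window-sketch-file
                  file-window-sketch-offset 'quiet))
  (set-buffer-modified-p nil))

(defun file-window-sketch-move (delta)
  "Move the window DELTA bytes forward, or backward if negative.
Unsaved changes are written first, as item 4 of the list suggests."
  (when (buffer-modified-p)
    (file-window-sketch-save))
  (file-window-sketch-load file-window-sketch-file
                           (max 0 (+ file-window-sketch-offset delta))
                           file-window-sketch-size))
```

The size check in the save function is the simplest way to enforce "no insertion": a window that grew or shrank would shift every byte after it, which is the slow case the list is trying to avoid.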
* Re: view/edit large files
From: Ted Zlatanov @ 2009-02-06 21:06 UTC
To: emacs-devel

On Fri, 6 Feb 2009 14:20:45 +0100 Mathias Dahl <mathias.dahl@gmail.com> wrote:

MD> This is how far I got:
MD> http://www.emacswiki.org/emacs/vlf.el

Thank you, I looked at it and it's almost exactly what I was thinking
originally (but actually implemented :).  I would like it, however, to
be a minor mode rather than a major one so it's more generally useful.
Also writing modifications back is an interesting challenge.

MD> What I do know is that it hits the roof when the file is larger than
MD> that integer limit in Emacs, whatever it is.

Modifying insert-file-contents to take float or list arguments to
specify the file position should not be too hard--I assume that's the
place where it fails.

Using floats bothers me a bit.  I'd really like the offset to be a pair
of integers, similar to the time storage in Emacs.

I also got these comments from Chetan Pandya that I wanted to answer
here:

CP> Is this for editing binary files or file with single byte encoding?
CP> If not, it gets more complicated.

It must be single-byte or binary.  insert-file-contents doesn't handle
multibyte encodings and Emacs doesn't have a way to ensure a random seek
is to a valid sequence.  I believe this is all fixable, but I don't know
enough about multibyte encodings to be helpful.

CP> Is this to be the major mode for the file?  In that case it may be
CP> OK.  Otherwise it wrecks the font lock information and functions that
CP> work with sexp and such syntactic information.

I think as a major mode it's not very useful.  You can use `more' or
`less' from the shell to view a large file in a pager.

hexl-mode would be a good major mode for large files, for example.

I don't think the font-lock information is very useful for large files
over multiple lines.  The most common case (viewing logs) just needs to
examine a single line.  Can you think of large files that have sexps and
other multiline (over 1000 lines) font-lockable data, which Emacs should
handle?  I can't think of any common ones.  In any case, at worst the
user will fall back to fundamental-mode, and that's better than nothing.

Ted
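Editorial sketch of the "pair of integers" representation mentioned above. Emacs time values of that era followed a (HIGH LOW) convention in which LOW holds the low 16 bits, and the helpers below apply the same split to a file offset. The function names are made up, and the conversion helpers assume the whole number fits in an Emacs integer, so they only illustrate the arithmetic; a real implementation for small-integer builds would do every step on the pair itself, as the add function does.

```elisp
(defun offset-to-pair-sketch (n)
  "Split non-negative integer N into (HIGH . LOW), LOW holding 16 bits."
  (cons (ash n -16) (logand n #xffff)))

(defun pair-to-offset-sketch (pair)
  "Combine PAIR, a cons (HIGH . LOW), back into a single integer."
  (+ (ash (car pair) 16) (cdr pair)))

(defun pair-add-sketch (pair delta)
  "Add non-negative integer DELTA to PAIR, carrying overflow into HIGH."
  (let ((low (+ (cdr pair) delta)))
    (cons (+ (car pair) (ash low -16))
          (logand low #xffff))))
```

The carry step is the kind of bookkeeping every caller would have to repeat for addition, subtraction and comparison, which is part of the trade-off against floats discussed in the replies.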
* Re: view/edit large files
From: Miles Bader @ 2009-02-06 21:49 UTC
To: Ted Zlatanov; +Cc: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:
> MD> What I do know is that it hits the roof when the file is larger than
> MD> that integer limit in Emacs, whatever it is.
>
> Using floats bothers me a bit.  I'd really like the offset to be a pair
> of integers, similar to the time storage in Emacs.

Why?  Floats are certainly a bit more convenient for the user...

-Miles

-- 
Generous, adj.  Originally this word meant noble by birth and was rightly
applied to a great multitude of persons.  It now means noble by nature and is
taking a bit of a rest.
[parent not found: <864oz3nyj8.fsf@lifelogs.com>]
* Re: view/edit large files
From: Stefan Monnier @ 2009-02-10 1:58 UTC
To: Ted Zlatanov; +Cc: emacs-devel

MB> Why?  Floats are certainly a bit more convenient for the user...
> By the same logic, time storage could have been done with floats.

Most likely time conses date back to a time when Emacs could be
configured without floats.

> The reason why it bothers me a bit is that it would be inconsistent
> with time storage--now there's two ways of storing large integers.

There are already many inconsistencies in this regard.  FWIW, I believe
that file-attributes can return floats for things like file-size.


        Stefan
* Re: view/edit large files
From: Eli Zaretskii @ 2009-02-10 8:46 UTC
To: Stefan Monnier; +Cc: tzz, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Mon, 09 Feb 2009 20:58:05 -0500
> Cc: emacs-devel@gnu.org
>
> MB> Why?  Floats are certainly a bit more convenient for the user...
> > By the same logic, time storage could have been done with floats.
>
> Most likely time conses date back to a time when Emacs could be
> configured without floats.

Yes, probably.

> > The reason why it bothers me a bit is that it would be inconsistent
> > with time storage--now there's two ways of storing large integers.
>
> There are already many inconsistencies in this regard.  FWIW, I believe
> that file-attributes can return floats for things like file-size.

Yes, we do return a float for size.  But for some attributes, like
inode, floats are not a good idea, because inodes are habitually
compared for exact equality.  I'm not sure time values need that
measure of accuracy, though.
* Re: view/edit large files
From: Miles Bader @ 2009-02-10 9:23 UTC
To: Eli Zaretskii; +Cc: tzz, Stefan Monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
> Yes, we do return a float for size.  But for some attributes, like
> inode, floats are not a good idea, because inodes are habitually
> compared for exact equality.  I'm not sure time values need that
> measure of accuracy, though.

"floats" can exactly represent integers if the integer quantity fits
within the mantissa.  For an IEEE double, that's 52 bits, which is
enough for many uses (for an inode number, I'm not sure -- obviously
it's enough for 32-bit inode numbers, but possibly not some 64-bit
numbers ... OTOH, neither is a cons of integers).

Requiring emacs platforms to support double-precision floats is probably
pretty safe these days, but I suppose it's the sort of thing people
could argue about...

-Miles

-- 
Bacchus, n. A convenient deity invented by the ancients as an excuse for
getting drunk.
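Editorial sketch of the exactness argument: an IEEE double stores 52 fraction bits plus an implicit leading bit, so every integer up to 2^53 (about 9.0e15, or 8 PiB as a byte count) is represented exactly, and above that neighbouring integers start to collapse. The predicate below, whose name is made up, checks whether a float offset can still be told apart from the next byte position.

```elisp
(defun float-offset-exact-p-sketch (offset)
  "Return non-nil if float OFFSET differs from OFFSET plus one byte.
When this returns nil, OFFSET is too large for a double to address
individual bytes."
  (/= (+ offset 1.0) offset))

;; A 20 GB offset is far below 2^53, so stepping by one byte works:
;;   (float-offset-exact-p-sketch 2e10)          ; non-nil
;; At 2^53 the sum rounds back to the same double:
;;   (float-offset-exact-p-sketch (expt 2.0 53)) ; nil
```

For file offsets this leaves a large margin, whereas arbitrary 64-bit identifiers such as some inode numbers can exceed it.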
* Re: view/edit large files
From: Eli Zaretskii @ 2009-02-10 9:54 UTC
To: Miles Bader; +Cc: tzz, monnier, emacs-devel

> From: Miles Bader <miles@gnu.org>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, tzz@lifelogs.com,
>         emacs-devel@gnu.org
> Date: Tue, 10 Feb 2009 18:23:58 +0900
>
> "floats" can exactly represent integers if the integer quantity fits
> within the mantissa.  For an IEEE double, that's 52 bits, which is
> enough for many uses

Right, but is it enough in this case?  I don't know, it all depends on
what kind of time resolution is needed.  Also, time values are
frequently used in arithmetic operations that could lose a few low
bits.

> (for an inode number, I'm not sure -- obviously
> it's enough for 32-bit inode numbers, but possibly not some 64-bit
> numbers

Windows NTFS uses 64-bit numbers for the ``file index'' we use as the
replacement for inode.

> ... OTOH, neither is a cons of integers).

That's why we use a cons of 3 numbers.
* Re: view/edit large files
From: Miles Bader @ 2009-02-10 10:02 UTC
To: Eli Zaretskii; +Cc: tzz, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
>> "floats" can exactly represent integers if the integer quantity fits
>> within the mantissa.  For an IEEE double, that's 52 bits, which is
>> enough for many uses
>
> Right, but is it enough in this case?  I don't know, it all depends on
> what kind of time resolution is needed.  Also, time values are
> frequently used in arithmetic operations that could lose a few low
> bits.

If it's an integer, and it fits, it's exact -- there is no loss of precision.

>> (for an inode number, I'm not sure -- obviously
>> it's enough for 32-bit inode numbers, but possibly not some 64-bit
>> numbers
>
> Windows NTFS uses 64-bit numbers for the ``file index'' we use as the
> replacement for inode.

For traditional style inode numbers, which are allocated sequentially from
zero, it doesn't matter; however, for abstract 64-bit quantities for
which there are no guarantees, it wouldn't work.

-Miles

-- 
Discriminate, v.i. To note the particulars in which one person or thing is,
if possible, more objectionable than another.
* Re: view/edit large files 2009-02-10 10:02 ` Miles Bader @ 2009-02-10 11:50 ` Eli Zaretskii 2009-02-10 15:08 ` Ted Zlatanov 0 siblings, 1 reply; 49+ messages in thread From: Eli Zaretskii @ 2009-02-10 11:50 UTC (permalink / raw) To: Miles Bader; +Cc: tzz, monnier, emacs-devel > From: Miles Bader <miles@gnu.org> > Cc: tzz@lifelogs.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org > Date: Tue, 10 Feb 2009 19:02:55 +0900 > > Eli Zaretskii <eliz@gnu.org> writes: > >> "floats" can exactly represent integers if the integer quantity fits > >> within the mantissa. For an IEEE double, that's 52 bits, which is > >> enough for many uses > > > > Right, but is it enough in this case? I don't know, it all depends on > > what kind of time resolution is needed. Also, time values are > > frequently used in arithmetic operations that could lose a few low > > bits. > > If it's an integer, and it fits, it's exact -- there is no loss of precision. I was talking about arithmetic operations such as multiplication by small factors, such as 2, in case it wasn't clear. ^ permalink raw reply [flat|nested] 49+ messages in thread
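Editorial sketch, not part of the thread: the disagreement about arithmetic on integer-valued doubles can be checked directly. Multiplying by 2 only changes the exponent, so it loses nothing; low bits are lost only when a product needs more than 53 significant bits. The helper name `double_product_is_exact` is hypothetical, and the sketch assumes IEEE 754 doubles and that `a * b` does not overflow `int64_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a * b, computed in double arithmetic,
   equals the exact integer product.
   Assumes IEEE 754 binary64 doubles and that a * b fits in int64_t. */
bool double_product_is_exact(int64_t a, int64_t b)
{
    /* volatile forces the product to be stored as a real 64-bit double,
       even on targets that compute in extended precision. */
    volatile double p = (double) a * (double) b;

    /* Refuse values that cannot be converted back to int64_t safely. */
    if (p >= 9223372036854775808.0 || p < -9223372036854775808.0)
        return false;

    return (int64_t) p == a * b;
}
```

Under those assumptions, doubling 2^52 is reported as exact, while tripling 2^53 - 1 is not, because the true product is odd and lies in a range where doubles are spaced 4 apart.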
* Re: view/edit large files 2009-02-10 11:50 ` Eli Zaretskii @ 2009-02-10 15:08 ` Ted Zlatanov 2009-02-17 19:23 ` Stefan Monnier 0 siblings, 1 reply; 49+ messages in thread From: Ted Zlatanov @ 2009-02-10 15:08 UTC (permalink / raw) To: emacs-devel On Tue, 10 Feb 2009 13:50:46 +0200 Eli Zaretskii <eliz@gnu.org> wrote: >> From: Miles Bader <miles@gnu.org> >> If it's an integer, and it fits, it's exact -- there is no loss of precision. EZ> I was talking about arithmetic operations such as multiplication by EZ> small factors, such as 2, in case it wasn't clear. While time values and file offsets can certainly be represented as floats under some constraints, I think it's an inelegant solution. This is the chance to have a clean design for support of large integers, since I or someone else will be modifying insert-file-contents anyhow. Why not add an int64 type? It doesn't have to be supported everywhere, and it can fail `integerp' as long as simple arithmetic works (in fact, only + - < > need to support it for the file offsets to work). We can have int64p and int-any-size-p as well. The time functions can be modified to support either the old-style conses or an int64. The support for int64 can be gradually grown; when people need it they can implement it. Scratch the itch. I'm definitely not an expert on the Emacs internals, so this may be completely untenable and it's probably been debated to death, but I hope we can at least get started with an int64 implementation I can use for large file support. Thanks Ted ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-10 15:08 ` Ted Zlatanov @ 2009-02-17 19:23 ` Stefan Monnier 2009-02-17 19:47 ` Eli Zaretskii 0 siblings, 1 reply; 49+ messages in thread From: Stefan Monnier @ 2009-02-17 19:23 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > While time values and file offsets can certainly be represented as > floats under some constraints, I think it's an inelegant solution. > This is the chance to have a clean design for support of large integers, > since I or someone else will be modifying insert-file-contents anyhow. Using floats has the major advantage that it only requires changes in insert-file-contents (e.g. try the patch below). Large integers can be added as well, but it's a mostly orthogonal issue. Stefan === modified file 'src/fileio.c' --- src/fileio.c 2009-02-11 20:00:50 +0000 +++ src/fileio.c 2009-02-17 19:21:59 +0000 @@ -3161,6 +3161,7 @@ Lisp_Object old_Vdeactivate_mark = Vdeactivate_mark; int we_locked_file = 0; int deferred_remove_unwind_protect = 0; + off_t beg_offset, end_offset; if (current_buffer->base_buffer && ! NILP (visit)) error ("Cannot do file visiting in an indirect buffer"); @@ -3268,12 +3269,12 @@ } if (!NILP (beg)) - CHECK_NUMBER (beg); + CHECK_NUMBER_OR_FLOAT (beg); else XSETFASTINT (beg, 0); if (!NILP (end)) - CHECK_NUMBER (end); + CHECK_NUMBER_OR_FLOAT (end); else { if (! not_regular) @@ -3408,6 +3409,8 @@ set_coding_system = 1; } + beg_offset = FLOATP (beg) ? (off_t) XFLOAT_DATA (beg) : XINT (beg); + end_offset = FLOATP (end) ? (off_t) XFLOAT_DATA (end) : XINT (end); /* If requested, replace the accessible part of the buffer with the file contents. Avoid replacing text at the beginning or end of the buffer that matches the file contents; @@ -3438,9 +3441,9 @@ give up on handling REPLACE in the optimized way. 
*/ int giveup_match_end = 0; - if (XINT (beg) != 0) + if (beg_offset != 0) { - if (lseek (fd, XINT (beg), 0) < 0) + if (lseek (fd, beg_offset, 0) < 0) report_file_error ("Setting file position", Fcons (orig_filename, Qnil)); } @@ -3487,7 +3490,7 @@ immediate_quit = 0; /* If the file matches the buffer completely, there's no need to replace anything. */ - if (same_at_start - BEGV_BYTE == XINT (end)) + if (same_at_start - BEGV_BYTE == end_offset - beg_offset) { emacs_close (fd); specpdl_ptr--; @@ -3505,7 +3508,7 @@ EMACS_INT total_read, nread, bufpos, curpos, trial; /* At what file position are we now scanning? */ - curpos = XINT (end) - (ZV_BYTE - same_at_end); + curpos = end_offset - (ZV_BYTE - same_at_end); /* If the entire file matches the buffer tail, stop the scan. */ if (curpos == 0) break; @@ -3583,8 +3586,8 @@ same_at_end += overlap; /* Arrange to read only the nonmatching middle part of the file. */ - XSETFASTINT (beg, XINT (beg) + (same_at_start - BEGV_BYTE)); - XSETFASTINT (end, XINT (end) - (ZV_BYTE - same_at_end)); + beg_offset += same_at_start - BEGV_BYTE; + end_offset -= ZV_BYTE - same_at_end; del_range_byte (same_at_start, same_at_end, 0); /* Insert from the file at the proper position. */ @@ -3628,7 +3631,7 @@ /* First read the whole file, performing code conversion into CONVERSION_BUFFER. */ - if (lseek (fd, XINT (beg), 0) < 0) + if (lseek (fd, beg_offset, 0) < 0) report_file_error ("Setting file position", Fcons (orig_filename, Qnil)); @@ -3803,7 +3806,7 @@ { register Lisp_Object temp; - total = XINT (end) - XINT (beg); + total = end_offset - beg_offset; /* Make sure point-max won't overflow after this insertion. 
*/ XSETINT (temp, total); @@ -3830,9 +3833,9 @@ if (GAP_SIZE < total) make_gap (total - GAP_SIZE); - if (XINT (beg) != 0 || !NILP (replace)) + if (beg_offset != 0 || !NILP (replace)) { - if (lseek (fd, XINT (beg), 0) < 0) + if (lseek (fd, beg_offset, 0) < 0) report_file_error ("Setting file position", Fcons (orig_filename, Qnil)); } ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-17 19:23 ` Stefan Monnier @ 2009-02-17 19:47 ` Eli Zaretskii 2009-02-17 20:18 ` Miles Bader 2009-02-18 1:56 ` Stefan Monnier 0 siblings, 2 replies; 49+ messages in thread From: Eli Zaretskii @ 2009-02-17 19:47 UTC (permalink / raw) To: Stefan Monnier; +Cc: tzz, emacs-devel > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Date: Tue, 17 Feb 2009 14:23:32 -0500 > Cc: emacs-devel@gnu.org > > + off_t beg_offset, end_offset; Is off_t guaranteed to be 64-bit wide? If not, we lose the advantage of the floats, no? > + beg_offset = FLOATP (beg) ? (off_t) XFLOAT_DATA (beg) : XINT (beg); > + end_offset = FLOATP (end) ? (off_t) XFLOAT_DATA (end) : XINT (end); Shouldn't we round rather than truncate, when converting to off_t? > - if (XINT (beg) != 0) > + if (beg_offset != 0) Exact equalities might be dangerous with floats. > - if (same_at_start - BEGV_BYTE == XINT (end)) > + if (same_at_start - BEGV_BYTE == end_offset - beg_offset) Likewise. > - if (XINT (beg) != 0 || !NILP (replace)) > + if (beg_offset != 0 || !NILP (replace)) Likewise. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-17 19:47 ` Eli Zaretskii @ 2009-02-17 20:18 ` Miles Bader 2009-02-17 20:51 ` Eli Zaretskii 2009-02-18 1:56 ` Stefan Monnier 1 sibling, 1 reply; 49+ messages in thread From: Miles Bader @ 2009-02-17 20:18 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> + off_t beg_offset, end_offset; > > Is off_t guaranteed to be 64-bit wide? If not, we lose the advantage > of the floats, no? If the system isn't capable of handling large files at all, then there's no point in worrying about it, right? >> + beg_offset = FLOATP (beg) ? (off_t) XFLOAT_DATA (beg) : XINT (beg); >> + end_offset = FLOATP (end) ? (off_t) XFLOAT_DATA (end) : XINT (end); > > Shouldn't we round rather than truncate, when converting to off_t? No. The values being represented are integers. The user almost certainly will not be passing in a non-integral float; if he is doing something weird so that he may end up with non-integral offsets, then it's his job to worry about how such values are interpreted as integer offsets. Maybe it should guard against overflow in the conversion though (and signal an error?). >> - if (XINT (beg) != 0) >> + if (beg_offset != 0) > > Exact equalities might be dangerous with floats. > >> - if (same_at_start - BEGV_BYTE == XINT (end)) >> + if (same_at_start - BEGV_BYTE == end_offset - beg_offset) > Likewise. >> - if (XINT (beg) != 0 || !NILP (replace)) >> + if (beg_offset != 0 || !NILP (replace)) > Likewise. Comparing against zero here is fine -- a float can represent it exactly, and there's no non-integer calculation to lose accuracy. If there was overflow in the conversion to off_t, it probably should have been caught during the conversion. -Miles -- Discriminate, v.i. To note the particulars in which one person or thing is, if possible, more objectionable than another. ^ permalink raw reply [flat|nested] 49+ messages in thread
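Editorial sketch, not part of the thread or of the posted patch: one way the suggested overflow guard might look. The function name `offset_from_double` is hypothetical, and `int64_t` stands in for `off_t` so that the range check is the same on every platform. Fractional parts are truncated, as argued above, and out-of-range input is reported to the caller instead of being converted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert a file offset held in a double to a 64-bit
   integer, truncating any fractional part.
   Returns false, leaving *out untouched, for NaN, negative values, and
   values too large for int64_t (including infinity). */
bool offset_from_double(double d, int64_t *out)
{
    if (d != d)                        /* NaN compares unequal to itself */
        return false;
    if (d < 0.0)                       /* a file offset cannot be negative */
        return false;
    if (d >= 9223372036854775808.0)    /* 2^63, first value that does not fit */
        return false;

    *out = (int64_t) d;                /* C conversion truncates toward zero */
    return true;
}
```

A caller in the spirit of the patch would signal an error when this returns false, rather than passing a garbage offset to `lseek`.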
* Re: view/edit large files 2009-02-17 20:18 ` Miles Bader @ 2009-02-17 20:51 ` Eli Zaretskii 2009-02-17 21:19 ` Miles Bader 0 siblings, 1 reply; 49+ messages in thread From: Eli Zaretskii @ 2009-02-17 20:51 UTC (permalink / raw) To: Miles Bader; +Cc: emacs-devel > From: Miles Bader <miles@gnu.org> > Date: Wed, 18 Feb 2009 05:18:03 +0900 > > Eli Zaretskii <eliz@gnu.org> writes: > >> + off_t beg_offset, end_offset; > > > > Is off_t guaranteed to be 64-bit wide? If not, we lose the advantage > > of the floats, no? > > If the system isn't capable of handling large files at all, then there's > no point in worrying about it, right? Some systems can handle large files, but only if you use something like off64_t. > >> + beg_offset = FLOATP (beg) ? (off_t) XFLOAT_DATA (beg) : XINT (beg); > >> + end_offset = FLOATP (end) ? (off_t) XFLOAT_DATA (end) : XINT (end); > > > > Shouldn't we round rather than truncate, when converting to off_t? > > No. The values being represented are integers. The user almost > certainly will not be passing in a non-integral float I was thinking about 1234.99999 or some such, due to inaccuracies in converting textual representation into a float. > Maybe it should guard against overflow in the conversion though (and > signal an error?). Yes, probably. ^ permalink raw reply [flat|nested] 49+ messages in thread
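Editorial sketch, not part of the thread: the worry about text-to-float conversion can be tested rather than argued. The hypothetical helper below parses the same decimal integer text with `strtod` and `strtoll` and compares the results. It assumes a correctly rounding `strtod`, IEEE 754 doubles, and input that `strtoll` can parse without overflow.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true if decimal integer TEXT parses to a double that
   holds exactly the same integer value that strtoll reports.
   Assumes TEXT is a plain integer within the range of long long. */
bool integer_text_parses_exactly(const char *text)
{
    double d = strtod(text, NULL);
    long long n = strtoll(text, NULL, 10);

    /* Refuse values that cannot be converted back to long long safely. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0)
        return false;

    return (long long) d == n;
}
```

Under those assumptions, integer text only fails this check once it needs more than 53 significant bits; a value such as 1234 does not come out as 1234.99999.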
* Re: view/edit large files 2009-02-17 20:51 ` Eli Zaretskii @ 2009-02-17 21:19 ` Miles Bader 2009-02-17 21:21 ` Miles Bader 0 siblings, 1 reply; 49+ messages in thread From: Miles Bader @ 2009-02-17 21:19 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> > Is off_t guaranteed to be 64-bit wide? If not, we lose the advantage >> > of the floats, no? >> >> If the system isn't capable of handling large files at all, then there's >> no point in worrying about it, right? > > Some systems can handle large files, but only if you use something > like off64_t. Sure, but using variant interfaces for large-file support is a much bigger and more intrusive change. Oh, BTW, of course there's a range of offsets which are still within 32-bits, and are representable by floats but not by emacs integers. A separate question is whether emacs should try to use something like _FILE_OFFSET_BITS=64 by default or not (on linux/solaris/... that causes 64-bit variants of off_t, syscalls, etc, to be used even on 32-bit systems). >> No. The values being represented are integers. The user almost >> certainly will not be passing in a non-integral float > > I was thinking about 1234.99999 or some such, due to inaccuracies in > converting textual representation into a float. This should not happen with integer values (if it does, something is very wrong). -Miles -- Congratulation, n. The civility of envy. ^ permalink raw reply [flat|nested] 49+ messages in thread
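Editorial sketch, not part of the thread: the `_FILE_OFFSET_BITS=64` setting mentioned above is a feature-test macro that has to be defined before any system header is included. The check below is a hypothetical helper that reports whether `off_t` ended up at least 64 bits wide in the current build; it says nothing about how Emacs itself is configured.

```c
/* Must be defined before any system header is included to have an effect. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: true if off_t is at least 64 bits wide in this
   build, i.e. wide enough for offsets beyond 2 GiB. */
bool off_t_is_64bit(void)
{
    return sizeof(off_t) * CHAR_BIT >= 64;
}
```

On systems that honor the macro, this is expected to hold even for 32-bit builds; where it does not hold, converting a large double to `off_t` would overflow regardless of how the Lisp side represents the offset.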
* Re: view/edit large files 2009-02-17 21:19 ` Miles Bader @ 2009-02-17 21:21 ` Miles Bader 2009-02-18 4:09 ` Eli Zaretskii 0 siblings, 1 reply; 49+ messages in thread From: Miles Bader @ 2009-02-17 21:21 UTC (permalink / raw) To: emacs-devel Miles Bader <miles@gnu.org> writes: > Oh, BTW, of course there's a range of offsets which are still within > 32-bits, and are representable by floats but not by emacs integers. When I say "float", btw, I of course mean "double"... :-/ -Miles -- 97% of everything is grunge ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-17 21:21 ` Miles Bader @ 2009-02-18 4:09 ` Eli Zaretskii 0 siblings, 0 replies; 49+ messages in thread From: Eli Zaretskii @ 2009-02-18 4:09 UTC (permalink / raw) To: Miles Bader; +Cc: emacs-devel > From: Miles Bader <miles@gnu.org> > Date: Wed, 18 Feb 2009 06:21:26 +0900 > > Miles Bader <miles@gnu.org> writes: > > Oh, BTW, of course there's a range of offsets which are still within > > 32-bits, and are representable by floats but not by emacs integers. > > When I say "float", btw, I of course mean "double"... :-/ So did I. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-17 19:47 ` Eli Zaretskii 2009-02-17 20:18 ` Miles Bader @ 2009-02-18 1:56 ` Stefan Monnier 2009-02-20 19:23 ` Ted Zlatanov 1 sibling, 1 reply; 49+ messages in thread From: Stefan Monnier @ 2009-02-18 1:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: tzz, emacs-devel > Is off_t guaranteed to be 64-bit wide? If not, we lose the advantage > of the floats, no? I wouldn't worry about it for now. This is just a quick patch, barely tested (I wrote it a while ago, but haven't actually used it). "off_t" is what is used by "lseek", so if it's not enough, we need further changes. In any case, this was a mistake: it was only intended to be sent to Ted. We're in pretest, so we shouldn't waste time on such things. Stefan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-18 1:56 ` Stefan Monnier @ 2009-02-20 19:23 ` Ted Zlatanov 0 siblings, 0 replies; 49+ messages in thread From: Ted Zlatanov @ 2009-02-20 19:23 UTC (permalink / raw) To: emacs-devel On Tue, 17 Feb 2009 20:56:59 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> Is off_t guaranteed to be 64-bit wide? If not, we lose the advantage >> of the floats, no? SM> I wouldn't worry about it for now. This is just a quick patch, barely SM> tested (I wrote it a while ago, but haven't actually used it). "off_t" SM> is what is used by "lseek", so if it's not enough, we need SM> further changes. SM> In any case, this was a mistake: it was only intended to be sent SM> to Ted. We're in pretest, so we shouldn't waste time on such things. Thanks for the information. I will wait until the release to get this discussion going again (I also have the hashtable read support patch waiting for that). Ted ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-10 9:23 ` Miles Bader 2009-02-10 9:54 ` Eli Zaretskii @ 2009-02-10 12:28 ` Eli Zaretskii 2009-02-10 12:46 ` Miles Bader 1 sibling, 1 reply; 49+ messages in thread From: Eli Zaretskii @ 2009-02-10 12:28 UTC (permalink / raw) To: Miles Bader; +Cc: tzz, monnier, emacs-devel > From: Miles Bader <miles@gnu.org> > Date: Tue, 10 Feb 2009 18:23:58 +0900 > Cc: tzz@lifelogs.com, Stefan Monnier <monnier@iro.umontreal.ca>, > emacs-devel@gnu.org > > Eli Zaretskii <eliz@gnu.org> writes: > > Yes, we do return a float for size. But for some attributes, like > > inode, floats are not a good idea, because inodes are habitually > > compared for exact equality. I'm not sure time values need that > > measure of accuracy, though. > > "floats" can exactly represent integers if the integer quantity fits > within the mantissa. On second thought, I don't think I agree. For example an integer number as small and "simple" as 5 does not have an exact representation as an IEEE floating-point number, right? ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-10 12:28 ` Eli Zaretskii @ 2009-02-10 12:46 ` Miles Bader 0 siblings, 0 replies; 49+ messages in thread From: Miles Bader @ 2009-02-10 12:46 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> "floats" can exactly represent integers if the integer quantity fits >> within the mantissa. > > On second thought, I don't think I agree. For example an integer > number as small and "simple" as 5 does not have an exact > representation as an IEEE floating-point number, right? Yes, it does. All integers which fit into the mantissa (plus some others) are exactly representable. -Miles -- `There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.' ^ permalink raw reply [flat|nested] 49+ messages in thread
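Editorial sketch, not part of the thread: the claim about exact representation can be checked with a round trip. An IEEE 754 double has a 53-bit significand (52 stored bits plus an implicit leading bit), so every integer up to 2^53 in magnitude survives conversion to double and back. The helper name `fits_exactly_in_double` is hypothetical, and the sketch assumes IEEE 754 doubles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the 64-bit integer N survives a round trip
   through double. Assumes IEEE 754 binary64 doubles. */
bool fits_exactly_in_double(int64_t n)
{
    /* volatile forces the value to be stored as a real 64-bit double. */
    volatile double d = (double) n;

    /* Values near INT64_MAX round up to 2^63, which cannot be converted
       back to int64_t, so treat them as not fitting. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0)
        return false;

    return (int64_t) d == n;
}
```

Under that assumption, 5 and 2^53 round-trip exactly, 2^53 + 1 does not, and 2^53 + 2 does again, which is the "plus some others" case.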
* Re: view/edit large files (was: map-file-lines) 2009-02-06 18:42 ` view/edit large files (was: map-file-lines) Ted Zlatanov 2009-02-06 21:06 ` view/edit large files Ted Zlatanov @ 2009-02-07 9:14 ` Richard M Stallman 2009-02-09 20:26 ` view/edit large files Ted Zlatanov 1 sibling, 1 reply; 49+ messages in thread From: Richard M Stallman @ 2009-02-07 9:14 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel Could you please explain, with code or text, what using your UI would look like? Describe it in text is what I thought I did. I don't have time to implement it, though. Your API seems to be aimed at a lower level, so the two could work together. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-07 9:14 ` view/edit large files (was: map-file-lines) Richard M Stallman @ 2009-02-09 20:26 ` Ted Zlatanov 2009-02-10 20:02 ` Richard M Stallman 0 siblings, 1 reply; 49+ messages in thread From: Ted Zlatanov @ 2009-02-09 20:26 UTC (permalink / raw) To: emacs-devel On Sat, 07 Feb 2009 04:14:27 -0500 Richard M Stallman <rms@gnu.org> wrote: RMS> Describe it in text is what I thought I did. RMS> I don't have time to implement it, though. I quote your suggestion here: > Here's an idea for a UI for editing big files. First you run M-x grep on > the file, and display the matches for whatever regexp. In the *grep* > buffer you specify a region, which is a way of choosing two matches, > the ones whose entries contain point and mark. Then you give a command to edit > the file from one of the matches to the other. It marks these matches > (and the lines containing them) as read-only so that you can't > spoil the correspondence with the file. Thus, you can always save this > partial-file buffer. > The beginning and end of the *grep* buffer can be used to specify > that the portion to edit starts or ends at bof or eof. I didn't understand your suggestion fully initially, sorry. I think you're suggesting that the user should pick a "window" between two places in the file. Then the user can only edit the file between those two places. That `grep' is producing the list of places is not as important as the idea of having those places. RMS> Your API seems to be aimed at a lower level, RMS> so the two could work together. Yes, I believe so. I'll try to implement the items I listed and perhaps someone can use them productively, to implement the UI you suggested or something else. Thanks for your help and time. Ted ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: view/edit large files 2009-02-09 20:26 ` view/edit large files Ted Zlatanov @ 2009-02-10 20:02 ` Richard M Stallman 0 siblings, 0 replies; 49+ messages in thread From: Richard M Stallman @ 2009-02-10 20:02 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel I didn't understand your suggestion fully initially, sorry. I think you're suggesting that the user should pick a "window" between two places in the file. Then the user can only edit the file between those two places. That `grep' is producing the list of places is not as important as the idea of having those places. Yes, that's it. A further part of the idea is that the matching lines that mark the beginning and the end of the segment would be read-only, to prevent confusion in what it means to save the file back. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-03 14:50 ` map-file-lines Stefan Monnier 2009-02-04 7:04 ` map-file-lines Richard M Stallman @ 2009-02-06 13:20 ` Mathias Dahl 1 sibling, 0 replies; 49+ messages in thread From: Mathias Dahl @ 2009-02-06 13:20 UTC (permalink / raw) To: Stefan Monnier; +Cc: Ted Zlatanov, joakim, emacs-devel > > That being said, how would the insert-file-contents solution work in > > practice? Has something been done along these lines already? > > Can't remember if someone tried it at all, but I don't think anyone has > gotten very far, no. This is how far I got: http://www.emacswiki.org/emacs/vlf.el However, I have seldom had a real need so I cannot really say if it is useful or not. What I do know is that it hits the roof when the file is larger than that integer limit in Emacs, whatever it is. And Richard's UI suggestion could be used together with this I guess. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-02 19:22 ` map-file-lines Ted Zlatanov 2009-02-02 19:52 ` map-file-lines joakim @ 2009-02-02 22:40 ` Stefan Monnier 2009-02-03 4:11 ` map-file-lines Stefan Monnier 1 sibling, 1 reply; 49+ messages in thread From: Stefan Monnier @ 2009-02-02 22:40 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel >>> Emacs Lisp lacks a good way to iterate over all the lines of a file, >>> especially for a large file. SM> I'm not really happy about focusing on "line at a time". It's a useful SM> and common case, but Emacs usually is pretty good about being "line SM> agnostic" (font-lock being an obvious counter example). SM> Providing some kind of stream-processing functionality might be good, SM> tho the need doesn't seem terribly high, since we've managed to avoid it SM> until now. > Without this function, Emacs simply can't handle large files and that's > been requested at least 4 times by users that I can recall. I think a > general solution to the large file Emacs problem would be better, but > line-oriented processing is a classic approach to processing large files > that many Emacs users will probably find familiar. I know about the large-file problem, obviously, but I wonder what kind of UI you expect to provide, in order for it to be able to work just one line at a time. SM> FWIW, another option is to provide an open-file-stream along the same SM> lines as open-network-stream. I.e. the chunks are received via SM> a process filter. > How is that better than insert-file-contents as I use it? It processes just one chunk at a time, with no need to keep the whole file in memory. A naive implementation could look like (start-process "open-file-stream" nil "cat" file). > Are you thinking of a stateful back/forward seek capability? Or do > you mean you'd like it to be asynchronous? Just that the file is received/processed one chunk at a time, so you never need to hold the whole file in memory at any one time. 
Stefan ^ permalink raw reply [flat|nested] 49+ messages in thread
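Editorial sketch, not part of the thread: the chunk-at-a-time processing discussed here, written in C since the original post suggests C might be more efficient. It reads a stream in fixed-size chunks and calls a callback once per line, so the whole file is never held in memory. A line that spans chunk boundaries is accumulated in a growing buffer. All names (`line_fn`, `map_file_lines`, `count_cb`, `count_lines_chunked`) are hypothetical, and this is not how Emacs or the posted Lisp code implements it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Callback: receives one NUL-terminated line (newline stripped), its length
   and its 0-based number. Return 0 to stop the iteration early. */
typedef int (*line_fn)(const char *line, size_t len, long linenum, void *ctx);

/* Read FP in chunks of BUFSIZE bytes and call FN once per line.
   Returns the number of lines passed to FN, or -1 on allocation failure. */
long map_file_lines(FILE *fp, size_t bufsize, line_fn fn, void *ctx)
{
    char *chunk = malloc(bufsize ? bufsize : 1);
    char *line = NULL;               /* partial line carried across chunks */
    size_t linelen = 0, linecap = 0, got;
    long linenum = 0;
    int keep_going = 1;

    if (chunk == NULL)
        return -1;

    while (keep_going && (got = fread(chunk, 1, bufsize, fp)) > 0) {
        size_t start = 0;
        while (keep_going && start < got) {
            char *nl = memchr(chunk + start, '\n', got - start);
            size_t piece = nl ? (size_t) (nl - (chunk + start)) : got - start;

            /* Grow the line buffer to hold the new piece plus a NUL. */
            if (linelen + piece + 1 > linecap) {
                size_t newcap = (linelen + piece + 1) * 2;
                char *tmp = realloc(line, newcap);
                if (tmp == NULL) {
                    free(line);
                    free(chunk);
                    return -1;
                }
                line = tmp;
                linecap = newcap;
            }
            memcpy(line + linelen, chunk + start, piece);
            linelen += piece;
            start += piece;

            if (nl != NULL) {        /* a complete line is available */
                line[linelen] = '\0';
                keep_going = fn(line, linelen, linenum++, ctx);
                linelen = 0;
                start++;             /* skip the newline itself */
            }
        }
    }

    /* Final line without a trailing newline. */
    if (keep_going && linelen > 0) {
        line[linelen] = '\0';
        fn(line, linelen, linenum++, ctx);
    }

    free(line);
    free(chunk);
    return linenum;
}

/* Demo callback: count the lines seen. */
static int count_cb(const char *line, size_t len, long linenum, void *ctx)
{
    (void) line; (void) len; (void) linenum;
    ++*(long *) ctx;
    return 1;
}

/* Demo helper: write TEXT to a temporary file and count its lines while
   reading it back in BUFSIZE-byte chunks. Returns -1 on failure. */
long count_lines_chunked(const char *text, size_t bufsize)
{
    FILE *fp = tmpfile();
    long seen = 0, n;

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    n = map_file_lines(fp, bufsize, count_cb, &seen);
    fclose(fp);
    return n;
}
```

The carry-over buffer is the design choice that matters: it removes the restriction, acknowledged in the original post, that a line must fit inside one chunk.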
* Re: map-file-lines 2009-02-02 22:40 ` map-file-lines Stefan Monnier @ 2009-02-03 4:11 ` Stefan Monnier 0 siblings, 0 replies; 49+ messages in thread From: Stefan Monnier @ 2009-02-03 4:11 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel >> How is that better than insert-file-contents as I use it? > It processes just one chunk at a time, with no need to keep the whole > file in memory. A naive implementation could look like > (start-process "open-file-stream" nil "cat" file). Sorry, I misread. No it's not necessarily better. Stefan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-02 17:20 map-file-lines Ted Zlatanov 2009-02-02 18:54 ` map-file-lines Stefan Monnier @ 2009-02-02 20:48 ` Ted Zlatanov 2009-02-03 8:08 ` map-file-lines Thien-Thi Nguyen 2009-02-03 10:45 ` map-file-lines Thierry Volpiatto 1 sibling, 2 replies; 49+ messages in thread From: Ted Zlatanov @ 2009-02-02 20:48 UTC (permalink / raw) To: emacs-devel

On Mon, 02 Feb 2009 11:20:07 -0600 Ted Zlatanov <tzz@lifelogs.com> wrote:

TZ> Emacs Lisp lacks a good way to iterate over all the lines of a file,
TZ> especially for a large file. The following code tries to provide a
TZ> solution, concentrating on reading a block of data in one shot and then
TZ> processing it line by line. It may be more efficient to write this in
TZ> C. Also, it does not deal with cases where the first line read is
TZ> bigger than the buffer size, and may have other bugs, but it works for
TZ> me so I thought I'd post it for comments and criticism.

Updated:

- line count 0-based now, logic is cleaner
- buffer size 128K by default
- accept start line and count
- abort when the lambda func returns nil
- renamed endline to line-end for clarity

Thanks
Ted

(defun map-file-lines (file func &optional startline count bufsize)
  (let ((filepos 0)
        (linenum 0)
        (bufsize (or bufsize (* 128 1024))))
    (with-temp-buffer
      (while
          (let*
              ((inserted (insert-file-contents
                          file nil
                          filepos (+ filepos bufsize)
                          t))
               (numlines (count-lines (point-min) (point-max)))
               (read (nth 1 inserted))
               (done (< 1 read))
               result line-end)
            (dotimes (n (count-lines (point-min) (point-max)))
              (goto-char (point-min))
              (setq line-end (line-end-position)
                    result (if (and startline (< linenum startline))
                               ()
                             (if (and
                                  count
                                  (>= (- linenum startline) count))
                                 (return)
                               (funcall func
                                        (buffer-substring
                                         (line-beginning-position)
                                         line-end)
                                        linenum)))
                    done (and done result))
              (incf filepos line-end)
              (forward-line)
              (incf linenum))
            done)))
    linenum))

;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line)))
;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line)) 100)
;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line)) 100 10)

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-02 20:48 ` map-file-lines Ted Zlatanov @ 2009-02-03 8:08 ` Thien-Thi Nguyen 2009-02-03 14:00 ` map-file-lines Ted Zlatanov 2009-02-03 10:45 ` map-file-lines Thierry Volpiatto 1 sibling, 1 reply; 49+ messages in thread From: Thien-Thi Nguyen @ 2009-02-03 8:08 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel () Ted Zlatanov <tzz@lifelogs.com> () Mon, 02 Feb 2009 14:48:25 -0600 (numlines (count-lines (point-min) (point-max))) ... (dotimes (n (count-lines (point-min) (point-max))) How about (while (not (zerop (dec numlines))) ...)? thi ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-03 8:08 ` map-file-lines Thien-Thi Nguyen @ 2009-02-03 14:00 ` Ted Zlatanov 2009-02-03 14:17 ` map-file-lines Miles Bader 0 siblings, 1 reply; 49+ messages in thread From: Ted Zlatanov @ 2009-02-03 14:00 UTC (permalink / raw) To: emacs-devel On Tue, 03 Feb 2009 09:08:33 +0100 Thien-Thi Nguyen <ttn@gnuvola.org> wrote: TN> () Ted Zlatanov <tzz@lifelogs.com> TN> () Mon, 02 Feb 2009 14:48:25 -0600 TN> (numlines (count-lines (point-min) (point-max))) TN> ... TN> (dotimes (n (count-lines (point-min) (point-max))) TN> How about (while (not (zerop (dec numlines))) ...)? That's better and no CL dependency, thanks! Ted ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-03 14:00 ` map-file-lines Ted Zlatanov @ 2009-02-03 14:17 ` Miles Bader 0 siblings, 0 replies; 49+ messages in thread From: Miles Bader @ 2009-02-03 14:17 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel Ted Zlatanov <tzz@lifelogs.com> writes: > TN> How about (while (not (zerop (dec numlines))) ...)? > > That's better and no CL dependency, thanks! I presume `decf' was meant, which is in cl, albeit a macro. For non-cl: (setq numlines (1- numlines)) -Miles -- Cat, n. A soft, indestructible automaton provided by nature to be kicked when things go wrong in the domestic circle. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-02 20:48 ` map-file-lines Ted Zlatanov 2009-02-03 8:08 ` map-file-lines Thien-Thi Nguyen @ 2009-02-03 10:45 ` Thierry Volpiatto 2009-02-03 14:06 ` map-file-lines Ted Zlatanov 1 sibling, 1 reply; 49+ messages in thread From: Thierry Volpiatto @ 2009-02-03 10:45 UTC (permalink / raw) To: emacs-devel Hi Ted! Ted Zlatanov <tzz@lifelogs.com> writes: > On Mon, 02 Feb 2009 11:20:07 -0600 Ted Zlatanov <tzz@lifelogs.com> wrote: > > TZ> Emacs Lisp lacks a good way to iterate over all the lines of a file, > TZ> especially for a large file. The following code tries to provide a > TZ> solution, concentrating on reading a block of data in one shot and then > TZ> processing it line by line. It may be more efficient to write this in > TZ> C. Also, it does not deal with cases where the first line read is > TZ> bigger than the buffer size, and may have other bugs, but it works for > TZ> me so I thought I'd post it for comments and criticism. > > Updated: > > - line count 0-based now, logic is cleaner > - buffer size 128K by default > - accept start line and count > - abort when the lambda func returns nil > - renamed endline to line-end for clarity > > Thanks > Ted Can you try `tve-flines-iterator' that work a little like python iterators work. I plan to use it in futures versions of traverselisp if it is faster than actual code (traverselisp don't use this code actually). ,----[ C-h f tve-flines-iterator RET ] | tve-flines-iterator is a Lisp function in `traverselisp.el'. | | (tve-flines-iterator file &optional nlines startpos bufsize) | | Return an iterator on `nlines' lines of file. | `startpos' and `bufsize' are the byte options to give to | `insert-file-contents'. 
| | [back] | | ===*===*===*===*===*===*===*===*===*===*=== | Example: | ,---- | | ;; create an elisp-iterator object that | | ;; record the first 1024 bytes of my .emacs | | (setq A (tve-flines-iterator "~/.emacs.el" nil 0 1024)) | | | | ;; eval as many times as needed or launch it in a loop | | (tve-next A) | `---- `---- You can get it with hg here: hg clone http://freehg.org/u/thiedlecques/traverselisp/ > (defun map-file-lines (file func &optional startline count bufsize) > (let ((filepos 0) > (linenum 0) > (bufsize (or bufsize (* 128 1024)))) > (with-temp-buffer > (while > (let* > ((inserted (insert-file-contents > file nil > filepos (+ filepos bufsize) > t)) > (numlines (count-lines (point-min) (point-max))) > (read (nth 1 inserted)) > (done (< 1 read)) > result line-end) > (dotimes (n (count-lines (point-min) (point-max))) > (goto-char (point-min)) > (setq line-end (line-end-position) > result (if (and startline (< linenum startline)) > () > (if (and > count > (>= (- linenum startline) count)) > (return) > (funcall func > (buffer-substring > (line-beginning-position) > line-end) > linenum))) > done (and done result)) > (incf filepos line-end) > (forward-line) > (incf linenum)) > done))) > linenum)) > > ;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line))) > ;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line)) 100) > ;;(map-file-lines "/tmp/test" (lambda (line num) (message "%d: %s" num line)) 100 10) > -- A + Thierry Volpiatto Location: Saint-Cyr-Sur-Mer - France ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines 2009-02-03 10:45 ` map-file-lines Thierry Volpiatto @ 2009-02-03 14:06 ` Ted Zlatanov 2009-02-03 14:56 ` map-file-lines Thierry Volpiatto 0 siblings, 1 reply; 49+ messages in thread From: Ted Zlatanov @ 2009-02-03 14:06 UTC (permalink / raw) To: emacs-devel On Tue, 03 Feb 2009 11:45:50 +0100 Thierry Volpiatto <thierry.volpiatto@gmail.com> wrote: TV> Can you try `tve-flines-iterator' that work a little like python TV> iterators work. TV> I plan to use it in futures versions of traverselisp if it is faster than TV> actual code (traverselisp don't use this code actually). I like it, and you do several things I should have (e.g. after forward-line check if we've moved at all). I'm not sure about performance either... Thanks Ted ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: map-file-lines
From: Thierry Volpiatto @ 2009-02-03 14:56 UTC
To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Tue, 03 Feb 2009 11:45:50 +0100 Thierry Volpiatto <thierry.volpiatto@gmail.com> wrote:
>
> TV> Can you try `tve-flines-iterator' that work a little like python
> TV> iterators work.
> TV> I plan to use it in futures versions of traverselisp if it is faster than
> TV> actual code (traverselisp don't use this code actually).
>
> I like it, and you do several things I should have (e.g. after
> forward-line check if we've moved at all). I'm not sure about
> performance either...
>
> Thanks
> Ted

Thanks for trying it. I don't know about performance yet either; it
needs to be tested. :)

-- 
A + Thierry Volpiatto
Location: Saint-Cyr-Sur-Mer - France