Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: "Clément Pit--Claudel" <clement.pit@gmail.com>
To: Michael Heerdegen <michael_heerdegen@web.de>
Cc: emacs-devel@gnu.org
Subject: Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams
Date: Wed, 14 Sep 2016 19:26:03 -0400	[thread overview]
Message-ID: <dc917534-7d46-1998-af9b-25a172a8f4c7@gmail.com> (raw)
In-Reply-To: <87vaxywmh9.fsf@web.de>

[-- Attachment #1.1: Type: text/plain, Size: 4196 bytes --]

On 2016-09-14 11:05, Michael Heerdegen wrote:
> Clément Pit--Claudel <clement.pit@gmail.com> writes:
>> … If the file is represented as a stream of lines, then tail makes
>> sense.  Doesn't it?  Converting the file to a list of lines
>> beforehand sounds like a bad idea, memory wise.
> 
> Indeed, in this case, it would be nonsense to convert into a list.

Ok, so we agree on this part :) Note that it may be nonsense to load the file into a buffer, too; holding the full file in a list is more or less as bad as holding the full file in a buffer.  If the file is large, that's very inefficient.

>> I don't see why; isn't it common to implement slyding-window-style 
>> algorithms on data streams? 'tail' is just one such example.
> 
> Do you have an example where this would be useful for streams in
> Emacs, different from the "last n lines" one (see my comment about
> that below).

Sure: if I have a stream of numbers, I may want to calculate the average of the last n numbers.  Or if I have a stream of non-overlapping buffer spans, I may want to remove just the last one.  To continue on the `last n lines' example, I may be interested in only the last n lines of output of a running process (think dmesg | tail).

>> Let me know what you think of the 'last n lines of a file'
>> example.
> 
> I think this is actually a very good example why it is good to
> forbid negative indexes.  If you are interested in the last n lines
> of a file, why would you dissect the complete file (or buffer) into
> lines and throw away nearly all of the result?

Because it's much more memory-efficient, as long as the file's lines are short :)  Note that I was careful to say file, not buffer: I don't need to load a full file in memory before I start processing its lines.  Same for the output of a running process: if I just want the last n lines, then accumulating all of the output before going to the end and looking backwards is extremely inefficient, memory-wise.  Dissecting the output (splitting it on newlines) and using a ring buffer to keep only the last `n` ones is much better.

> In Emacs, if you dissect a buffer into its lines, this can take
> seconds if the buffer is a bit larger (no matter if you collect the
> lines in a data structure or just throw the result away).

I agree; this approach would not be optimal for a buffer.  But I'm not talking about buffers :)

> If you are interested in the last n lines, this is potentially
> unlimited inefficient.

Loading the contents of a file into a buffer costs time linear in the size of the file and memory linear in that size too.  If I have a large file on disk, loading all of it into a buffer just to get the last few lines would be horribly inefficient.  On the other hand, the stream-based approach using subseq with negative indices would take time linear in the size of the buffer, but memory linear in just the number of lines that I want to keep.

> Instead, go to the end of buffer, go back N lines, and start building
> a stream from there.

I don't know of any implementation of the `tail` utility that works this way (loading the entire file, then looking backwards for newlines).  The approach you outline is the obvious one if the file is already loaded in a buffer, but that's not my use case (sorry for not making that clear!).

> So, in this case, in my opinion forbidding negative indexes would
> have saved you from the error of writing bad code.  Negative indexes
> would be an invitation to write bad code.

Could it be that I didn't explain the problem well, and you concluded it was bad code based on that?  The stream-based tail implementation has significant advantages in terms of memory usage.  Forbidding negative indexes would have forced me to either write memory-inefficient code, or more likely to just reimplement the required subseq functionality myself using a circular (ring) buffer.

> Maybe it would be better to provide that semantics in a separate 
> function, so that the programmer is forced to think about what he is 
> doing.  WDYT?

I don't know :)  By default, I'd assume programmers already think about what they are doing.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

next prev parent reply	other threads:[~2016-09-14 23:26 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-13 16:23 [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams Michael Heerdegen
2016-09-13 18:02 ` Clément Pit--Claudel
2016-09-13 21:17   ` Michael Heerdegen
2016-09-14  1:24     ` Clément Pit--Claudel
2016-09-14 15:05       ` Michael Heerdegen
2016-09-14 23:26         ` Clément Pit--Claudel [this message]
2016-09-15  0:51           ` John Mastro
2016-09-15  2:00             ` Clément Pit--Claudel
2016-09-15 17:01               ` John Mastro
2016-09-15 21:07               ` Michael Heerdegen
2016-09-15 22:18                 ` Clément Pit--Claudel
2016-09-15 22:28                   ` Michael Heerdegen
2016-09-15 22:52                     ` Clément Pit--Claudel
2016-09-15  0:58           ` Michael Heerdegen
2016-09-15  3:47             ` Clément Pit--Claudel
2016-09-15  8:42               ` Nicolas Petton
2016-09-15 22:30                 ` Michael Heerdegen
2016-09-15 23:08                   ` Nicolas Petton
2016-09-15 21:29               ` Michael Heerdegen
2016-09-14  1:28     ` John Wiegley
2016-09-14 15:15       ` Michael Heerdegen
2016-09-13 22:20 ` Nicolas Petton
2016-09-13 22:40   ` Michael Heerdegen
2016-09-14  8:25     ` Nicolas Petton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dc917534-7d46-1998-af9b-25a172a8f4c7@gmail.com \
    --to=clement.pit@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=michael_heerdegen@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.