From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: =?UTF-8?Q?Cl=c3=a9ment_Pit--Claudel?= Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams Date: Wed, 14 Sep 2016 19:26:03 -0400 Message-ID: References: <87bmzrahvg.fsf@web.de> <79c6ccd6-9808-f4fd-071a-58559f72ecdc@gmail.com> <8737l3a4ab.fsf@web.de> <27962aa1-ae40-99f8-64ad-ae21012fb36e@gmail.com> <87vaxywmh9.fsf@web.de> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="BCPM4v3T3HvfJ79PCRIo8txbu9m2sB0Ad" X-Trace: blaine.gmane.org 1473898205 5809 195.159.176.226 (15 Sep 2016 00:10:05 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Thu, 15 Sep 2016 00:10:05 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 Cc: emacs-devel@gnu.org To: Michael Heerdegen Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Sep 15 02:10:01 2016 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bkKFZ-0000Sa-90 for ged-emacs-devel@m.gmane.org; Thu, 15 Sep 2016 02:09:57 +0200 Original-Received: from localhost ([::1]:59210 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bkKFX-0006w0-4O for ged-emacs-devel@m.gmane.org; Wed, 14 Sep 2016 20:09:55 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:41004) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bkKEj-0006tw-Ve for emacs-devel@gnu.org; Wed, 14 Sep 2016 20:09:07 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bkKEb-0007y8-NC for emacs-devel@gnu.org; Wed, 14 Sep 2016 20:09:05 -0400 Original-Received: from mout.kundenserver.de ([212.227.126.130]:54328) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bkKEb-0007xe-CD for emacs-devel@gnu.org; Wed, 14 Sep 2016 20:08:57 -0400 Original-Received: from [18.189.118.169] ([18.189.118.169]) by mrelayeu.kundenserver.de (mreue002) with ESMTPSA (Nemesis) id 0LkXdk-1bC4gA0ZWu-00aZPH; Thu, 15 Sep 2016 01:26:10 +0200 In-Reply-To: <87vaxywmh9.fsf@web.de> X-Provags-ID: V03:K0:3u+Kk7lhJiZBlYsGOQhVPcdWH4wEdRE8cZwyyF8gXG7G6UMQxxf XOpeSHFItr1QU4AcDrzeUc69G1EguoeYuy5ZygvxRmPmnh/xtJbHnv1NjadwLhu3fZ2qnFi Zz+0WqLihXy3VVZ2g+Y58aekL/HJx1X9ajliZ7YrN12/kVTun0I723apjeoXWVUWDmJvcIV zz1BQf2p6VZc6pwA+4PgQ== X-UI-Out-Filterresults: notjunk:1;V01:K0:npBO/gE1PnQ=:uV2zbhnHZm6wHfEkxdI8Nv AX61ucTApXnVANnoDCVnN1iT2kXUAifgIxfAGBTUxcfXxHHE0w+Hr4/HziweKQZzHTX2XPJbJ ZvcIPmvhBnIGaJK8pK9GacMq8l8BnwKyhj8o/kfDyRDuPogOxNslM2jQH+AuT5/aTyrBm038R BlGE15rnQaGNxVQBgbaqEGhBHRmq8PdB99xbqU6HAxF8YSFg/v8Gi6gp2ZXQKIK0nkXoL/P4s wD1EodPm8wRvKIdhwwy2sE5e0q750sliQItZUV2Dlp8rP1NaerX+2wInUlSRC+AhM5+qgplX1 BgVD2fiEiwbn7nL6zevtcDHa0GvJeMudyfcVVOKci3rx/C5h0Hrtyx0Vmnn7dnL8j4verVPkL TdlcfYqRfM/4D+66ID+m/2UtHqOytfF5NZRsSyLH6x0NpNsdFjv9wejI85vHEWN1M9Hq174E6 u2SFVzKSEmJJKNnm4RuCa+UieQA5tIodF+HPKiXO/fkUN6iRgadq338oEY7x51k5P1C94GIHk PufCnqclspq15o0FlOu1AKKqrJH3ovw3pjk2Iy1WjEslwoEdAwUI/iLAabyQE+Royn9bU5t1x LBhvYdr0Ipge7RJPKC7xpQEVo1DaY9EXpnMsj+glzsq8ZFJxrJGKdMkqu3FSTjP84GeY4OD0T Jh8IwMJRIlMq5MSskDsEVf9yh5JBw3zWjFw6zPdPJSNdrHLHK47yCNf7UHdranhC0kVg= X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 212.227.126.130 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:207435 Archived-At: This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --BCPM4v3T3HvfJ79PCRIo8txbu9m2sB0Ad Content-Type: multipart/mixed; boundary="i7esLbDf0jKuTSMvH1KjdtoqNqjniRgnE"; protected-headers="v1" From: =?UTF-8?Q?Cl=c3=a9ment_Pit--Claudel?= To: Michael Heerdegen Cc: emacs-devel@gnu.org Message-ID: Subject: Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams References: <87bmzrahvg.fsf@web.de> <79c6ccd6-9808-f4fd-071a-58559f72ecdc@gmail.com> <8737l3a4ab.fsf@web.de> <27962aa1-ae40-99f8-64ad-ae21012fb36e@gmail.com> <87vaxywmh9.fsf@web.de> In-Reply-To: <87vaxywmh9.fsf@web.de> --i7esLbDf0jKuTSMvH1KjdtoqNqjniRgnE Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 2016-09-14 11:05, Michael Heerdegen wrote: > Cl=C3=A9ment Pit--Claudel writes: >> =E2=80=A6 If the file is represented as a stream of lines, then tail m= akes >> sense. Doesn't it? Converting the file to a list of lines >> beforehand sounds like a bad idea, memory wise. >=20 > Indeed, in this case, it would be nonsense to convert into a list. Ok, so we agree on this part :) Note that it may be nonsense to load the = file into a buffer, too; holding the full file in a list is more or less = as bad as holding the full file in a buffer. If the file is large, that'= s very inefficient. >> I don't see why; isn't it common to implement slyding-window-style=20 >> algorithms on data streams? 'tail' is just one such example. >=20 > Do you have an example where this would be useful for streams in > Emacs, different from the "last n lines" one (see my comment about > that below). Sure: if I have a stream of numbers, I may want to calculate the average = of the last n numbers. Or if I have a stream of non-overlapping buffer s= pans, I may want to remove just the last one. To continue on the `last n= lines' example, I may be interested in only the last n lines of output o= f a running process (think dmesg | tail). >> Let me know what you think of the 'last n lines of a file' >> example. >=20 > I think this is actually a very good example why it is good to > forbid negative indexes. If you are interested in the last n lines > of a file, why would you dissect the complete file (or buffer) into > lines and throw away nearly all of the result? Because it's much more memory-efficient, as long as the file's lines are = short :) Note that I was careful to say file, not buffer: I don't need t= o load a full file in memory before I start processing its lines. Same f= or the output of a running process: if I just want the last n lines, then= accumulating all of the output before going to the end and looking backw= ards is extremely inefficient, memory-wise. Dissecting the output (split= ting it on newlines) and using a ring buffer to keep only the last `n` on= es is much better. > In Emacs, if you dissect a buffer into its lines, this can take > seconds if the buffer is a bit larger (no matter if you collect the > lines in a data structure or just throw the result away). I agree; this approach would not be optimal for a buffer. But I'm not ta= lking about buffers :) > If you are interested in the last n lines, this is potentially > unlimited inefficient. Loading the contents of a file into a buffer costs time linear in the siz= e of the file and memory linear in that size too. If I have a large file= on disk, loading all of it into a buffer just to get the last few lines = would be horribly inefficient. On the other hand, the stream-based appro= ach using subseq with negative indices would take time linear in the size= of the buffer, but memory linear in just the number of lines that I want= to keep. > Instead, go to the end of buffer, go back N lines, and start building > a stream from there. I don't know of any implementation of the `tail` utility that works this = way (loading the entire file, then looking backwards for newlines). The = approach you outline is the obvious one if the file is already loaded in = a buffer, but that's not my use case (sorry for not making that clear!). > So, in this case, in my opinion forbidding negative indexes would > have saved you from the error of writing bad code. Negative indexes > would be an invitation to write bad code. Could it be that I didn't explain the problem well, and you concluded it = was bad code based on that? The stream-based tail implementation has sig= nificant advantages in terms of memory usage. Forbidding negative indexe= s would have forced me to either write memory-inefficient code, or more l= ikely to just reimplement the required subseq functionality myself using = a circular (ring) buffer. > Maybe it would be better to provide that semantics in a separate=20 > function, so that the programmer is forced to think about what he is=20 > doing. WDYT? I don't know :) By default, I'd assume programmers already think about w= hat they are doing. --i7esLbDf0jKuTSMvH1KjdtoqNqjniRgnE-- --BCPM4v3T3HvfJ79PCRIo8txbu9m2sB0Ad Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX2dyLAAoJEPqg+cTm90wj4XMQAJNcetNSge4SdiZ+C2Es1qg5 O0f2nnIIlFzsJgOFdosBoLKwjHX3OXX+mi/cLw/7k0/n9KdWQIGqvzY52vpkZkfI yHs5t03Dmc1atq/yfoG7NQHbDnB0tnJ0h/qAznIJJuko5ARrryPHYHRku2QNaGOi PIU2AX66YWiMLDYi+CZ3W7RXVngIJllT1SjqboToHVUdl4Cqm75yHbZs7nPdlytG XlI9V8rTYNf8dsQOVqqerEhiiQc4S5PIJgU3TCNP9uH8waKxFjCt8eaH4PnIgjiH 8L4CzPmXizvokB/ooGZaoCZklxDhJbXGe07C89JVYqukbkaivvYjAZvCPe/dinBc Lvah5ynoqao8TzmCxSwQEWsHf+kEpu5jtX+R9YdIcPeem1KtVT5dn1SOG61rmp1B hCMKxgy08QC4pQz45ndXWTaz8QWYqmz+MMBksSnjtKAWQY7qKv9xLDnVN1/qZRwB 9mS4Kovq3t8+kwUah6MorctFuDUMzAbX3cqtNUNQIMeoHMW/C7+IZYYV4T2SjzeN 3tkUuUprrIufTz4NGZT7lW1mfKptg0kbPNt4LCaOi89TxFwgn0r1MagPSJuLY4An eLKSDF42n7hiM5147SQt/DMfjjHsvstG5ZT4MHbgmdZxHH643aV2CPMGE+1YcQW9 1l1kYVuyrsmVf6/wP7b1 =Zwnb -----END PGP SIGNATURE----- --BCPM4v3T3HvfJ79PCRIo8txbu9m2sB0Ad--