From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: =?UTF-8?Q?Cl=c3=a9ment_Pit--Claudel?= Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams Date: Wed, 14 Sep 2016 22:00:19 -0400 Message-ID: <49083306-0193-3bb0-74cc-ecac6e2d6022@gmail.com> References: <87bmzrahvg.fsf@web.de> <79c6ccd6-9808-f4fd-071a-58559f72ecdc@gmail.com> <8737l3a4ab.fsf@web.de> <27962aa1-ae40-99f8-64ad-ae21012fb36e@gmail.com> <87vaxywmh9.fsf@web.de> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="m3Ueu18MKP2Jf5Qh06PXDbWJAduAv86rC" X-Trace: blaine.gmane.org 1473909077 18885 195.159.176.226 (15 Sep 2016 03:11:17 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Thu, 15 Sep 2016 03:11:17 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 Cc: Michael Heerdegen To: John Mastro , emacs-devel Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Sep 15 05:11:13 2016 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bkN4r-0003VJ-TH for ged-emacs-devel@m.gmane.org; Thu, 15 Sep 2016 05:11:06 +0200 Original-Received: from localhost ([::1]:59524 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bkN4p-0005OR-Qe for ged-emacs-devel@m.gmane.org; Wed, 14 Sep 2016 23:11:03 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:34774) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bkN3q-0005M9-99 for emacs-devel@gnu.org; Wed, 14 Sep 2016 23:10:03 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bkN3m-0001Un-9d for emacs-devel@gnu.org; Wed, 14 Sep 2016 23:10:01 -0400 Original-Received: from mout.kundenserver.de ([212.227.126.131]:55600) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bkN3l-0001Uh-VH for emacs-devel@gnu.org; Wed, 14 Sep 2016 23:09:58 -0400 Original-Received: from [18.189.118.169] ([18.189.118.169]) by mrelayeu.kundenserver.de (mreue001) with ESMTPSA (Nemesis) id 0M7Pmh-1aqDFC38id-00xKS4; Thu, 15 Sep 2016 04:00:21 +0200 In-Reply-To: X-Provags-ID: V03:K0:8WK8Q7SpfFMkpr7YBkAAwisn+g4+qBPF9zmQruztcmg+NjAHh+o OhL4bx3rucsj8AoezncGgB3/5Mj+i61lQG5Jpv8nXYjq7NJeiOu/slFc1VHx51Jdh6+iaLY w0UXyRdAhGnb7B1KOsLDFQIyLYhuOd/UPAB6Yj4LaR7hJFQQhb1Gv7LP7ATBmBW04ebdKaB 4HK5cFihsVMhqfcK3ckhA== X-UI-Out-Filterresults: notjunk:1;V01:K0:nON6lWtpaI4=:3GqobAIoJ5RhTs/S+OqrWy ElAUIqkoj+wRF88Dhgd2ioX+qpY48NZPqQ0Q6bwLTHErlnXfhpQkuFru/kPZlyS1cIYRkt0Te YunDmamfNExv5vbR6m1Do5qXAYtEwRZAjXXmLfzUZdJc7yzypvbowBaumP1xlxgrjPtDyGfQF pfWV7Ns8yrbSaUp14lyURmgRFMYnl1a8npR9Z3uYTASEcS4YQG+qPzf65TgOwe8Q29Ap99Q34 jrze1GnrB8wsYijVu5sPOjXeZeF22KqEcL6acM+N4l70Lz+1sZalXz/UuAd4mkzYnszKGlRhD acn0ro5V9FdEh7m9YLrxWX67bCYr+I9V4uI1T+dR1f6HZSYqsIhFHh7ftnbHTL7FNx10aZM+J sGrf2iBTem6K69tbNzvCR91V0EtM3I8qCHSPBBAKxgKNgSFFvVQz/ylqScTO69HABnMARb2Gm iQpZybXHg66PN4rzIgnMzQ0rC4KVVzCTT8Oew+9fUX3YEXcVA6pmMw79q6uCeajR49oyoejJ1 kKVSR8VLYdQ+Jdte207NX1NGHqJ4YkDtvXSU8y3AQxcUgzG6DBc4na2wrynuWFAunC8ZlTsBz hZP/ocn6Hcq369EmGjV9+frB8ur6cgBWBfLp75uUau5EgB5+OdB1oVqc1oE2czPnXe24UbU4O tE04ltpzch59BCh9WWbCTyyOtUTLkQT+JgM6Ai5knctVyhIze/yee/Kcw+ui0y4VO6so= X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 212.227.126.131 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:207438 Archived-At: This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --m3Ueu18MKP2Jf5Qh06PXDbWJAduAv86rC Content-Type: multipart/mixed; boundary="tRffi3grV0ADG10eBJsolknv6XQcoon0s"; protected-headers="v1" From: =?UTF-8?Q?Cl=c3=a9ment_Pit--Claudel?= To: John Mastro , emacs-devel Cc: Michael Heerdegen Message-ID: <49083306-0193-3bb0-74cc-ecac6e2d6022@gmail.com> Subject: Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams References: <87bmzrahvg.fsf@web.de> <79c6ccd6-9808-f4fd-071a-58559f72ecdc@gmail.com> <8737l3a4ab.fsf@web.de> <27962aa1-ae40-99f8-64ad-ae21012fb36e@gmail.com> <87vaxywmh9.fsf@web.de> In-Reply-To: --tRffi3grV0ADG10eBJsolknv6XQcoon0s Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 2016-09-14 20:51, John Mastro wrote: > Cl=C3=A9ment Pit--Claudel wrote: >>> I think this is actually a very good example why it is good to >>> forbid negative indexes. If you are interested in the last n lines >>> of a file, why would you dissect the complete file (or buffer) into >>> lines and throw away nearly all of the result? >> >> Because it's much more memory-efficient, as long as the file's lines >> are short :) Note that I was careful to say file, not buffer: I don't >> need to load a full file in memory before I start processing its >> lines. Same for the output of a running process: if I just want the >> last n lines, then accumulating all of the output before going to the >> end and looking backwards is extremely inefficient, memory-wise. >> Dissecting the output (splitting it on newlines) and using a ring >> buffer to keep only the last `n` ones is much better. >=20 > (Asking for my own edification) :) Keep in mind that I could be making a mistake, too :) > Wouldn't finding the last N elements require forcing every thunk in the= > stream (to find its end), thus using more memory than a linked list wit= h > the same contents? That's correct, if by more memory you mean "more memory allocated over th= e lifetime of the process". The key is that we don't need to allocate it= all at once. In the simple case where we want, for example, just the las= t element, we only need to hold on to one value at a time. > As long as you don't "hang on to the head" of the > stream, earlier elements could be reclaimed by GC, Exactly :) > but the same applies to a list. Not exactly: the list needs to be fully built before you iterate over it.= That's when the memory problems occur. So yes, during iteration you can discard the cons cells that you've alrea= dy seen in both cases, but in the list case these cells need to all be co= nstructed and kept in memory beforehand. > In short, I find this conversation interesting, but don't quite > understand where the memory savings come in :) Let me try to summarize it in a different way. In the stream case, you b= uild one cons cell at a time, and every time you build a new cons cell th= e previous one is available for garbage collection. With a good GC, ther= e's only a few cells physically present in memory at any time (plus the m= emory it takes to keep the last "n" elements, if you're desired output is= the n-elements tail of the stream). In the list case, on the other hand, the full list exists in memory befor= e you iterate on it. Sure, after you iterate on it, the list can be garb= age collected; but before you iterate on it, all the cons cells need to e= xist at the same time. Here's a concrete bit of code to demo this (I tried to write this in Emac= s Lisp, but Emacs kept segfaulting on me, so I gave up and wrote it in Py= thon): import sys def mkstream(n): for k in range(n): yield "a" * (k % 25) * 10 def mklist(n): return ["a" * (k % 25) * 10 for k in range(n)] def last(seq): lastx =3D None for x in seq: lastx =3D x return lastx def test_list(n): last(mklist(n)) def test_stream(n): last(mkstream(n)) tests =3D {"stream": test_stream, "list": test_list} tests[sys.argv[1]](100*1000*1000) When I run this on my machine as =E2=80=9Cpython stream.py stream=E2=80=9D= , the total memory usage doesn't noticeably change. When I run it as =E2= =80=9Cpython stream.py list=E2=80=9D, python allocates about 25GB of RAM.= This obviously not exactly the same as what would happen on the Emacs L= isp side, but hopefully it's close enough :) Cl=C3=A9ment. --tRffi3grV0ADG10eBJsolknv6XQcoon0s-- --m3Ueu18MKP2Jf5Qh06PXDbWJAduAv86rC Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX2gCzAAoJEPqg+cTm90wjl08P/iSkrVpCyniR2woqOkj8QBQj 2HVLsY7+p0hyFZzH2RfefabqkO3z83OTwWwLbZC3fikcOGgIIHJLM4K92ypH4rHG 6WiF3WHs681yqXXT/sREd/pEIBoWTwhagZL8THfcGAaK5VrdIZ8Fu0C1adoeAM63 2wyWJ2Iue2wAesWETaM0lbdkaVeRqBrwAJuIDAXJvAcHxD2Tla8mmk30CYHngSQK H9xXJ/TXvFLLZELWXBAmwom9uXXLnwrxZnWYURfGD34I4pT/hNv85vh5PXgZd5rz 9hbgjEhGC+6IH/xujq66dORjHa8MEyV4EaV0vac/rOjOvBe83Agt5ssEwvYMVKmP LeqDIkSCHb30wuK5P4Yi+so+f04HbkTdx4WiuRRGyzm4Mc2Mc8PwJD+Bh9N6lqXL C9Go6DXNhMkhMeppJ0/IcrK25kXxRNuwxVW8krUi5VN7JjP9VRlKD9s1LH+APK7B 1JY8P9bbow6CFTQfYQj4yF+Xd5BospHtNFetNiQYDiEQeDz3jc3XNRw59klgbrxu UskmDctRzrjAOzcz8kDuCOAm3kz3h05XqlaxlsxTql7BXzxL9C9MjUo42hmkoY0m B6/3nB1Zqhy3nC0BddFC4QMzB1tAMje4Yq6pMMn+8nu13tXcQtiUppdCKnqSzoRs 4ZWJVoshUHY4O7mkvL8D =aT+1 -----END PGP SIGNATURE----- --m3Ueu18MKP2Jf5Qh06PXDbWJAduAv86rC--