Passing buffers to function in elisp

unofficial mirror of help-gnu-emacs@gnu.org

* Passing buffers to function in elisp
@ 2023-02-21 21:18 Petteri Hintsanen
  2023-02-21 23:21 ` [External] : " Drew Adams
  2023-02-22  5:30 ` tomas
  0 siblings, 2 replies; 22+ messages in thread
From: Petteri Hintsanen @ 2023-02-21 21:18 UTC (permalink / raw)
  To: help-gnu-emacs

Hello list,

Alan J. Perlis said "A LISP programmer knows the value of everything,
but the cost of nothing."

I'm reading some bytes into a temp buffer, like so:

  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally filename nil 0 64000))

then I pass these bytes to functions for processing, like this

    (func1 (buffer-string))

or sometimes just part of them

    (func2 (substring (buffer-string) 100 200))

Now:

. does this generate garbage?  (I believe it does.)
. if there are many funcalls like that, will there be lots of garbage?
  (I guess there will be.)
. is this bad style?  (I'm afraid it is, hence asking.)

Is it better just to assume in functions that the current buffer is the
data buffer and work on that, instead of passing data as function
arguments?
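
A minimal sketch of the two ways to take a slice, assuming the data
buffer is current.  The names `my-slice-via-string' and
`my-slice-via-buffer' are hypothetical.  The first copies the whole
buffer and then the slice; the second copies only the slice.

```elisp
;; Sketch only: both function names are made up for illustration.

(defun my-slice-via-string (from to)
  "Return bytes FROM..TO (0-based, TO exclusive) of the current buffer.
Copies the whole buffer into a string first, then copies the slice."
  (substring (buffer-string) from to))

(defun my-slice-via-buffer (from to)
  "Return the same bytes as `my-slice-via-string', copying only the slice.
Buffer positions are 1-based, hence the offset by `point-min'."
  (buffer-substring-no-properties (+ (point-min) from)
                                  (+ (point-min) to)))
```

Both return a fresh string, so both still allocate; the second just
avoids the intermediate full copy.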

[Why am I doing it like this?  It is /slightly/ easier to write tests when
functions get their data in their arguments.]

Also: is it a good idea to try to limit the number of temp buffers
(with-temp-buffer expressions)?  Or are they somehow recycled within the
elisp interpreter?

Thanks,
Petteri

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [External] : Passing buffers to function in elisp
  2023-02-21 21:18 Passing buffers to function in elisp Petteri Hintsanen
@ 2023-02-21 23:21 ` Drew Adams
  2023-02-22  5:35   ` tomas
  2023-02-22  5:30 ` tomas
  1 sibling, 1 reply; 22+ messages in thread
From: Drew Adams @ 2023-02-21 23:21 UTC (permalink / raw)
  To: Petteri Hintsanen, help-gnu-emacs@gnu.org

> I'm reading some bytes into a temp buffer, like so:
> 
>   (with-temp-buffer
>     (set-buffer-multibyte nil)
>     (insert-file-contents-literally filename nil 0 64000))
> 
> then I pass these bytes to functions for processing, like this
> 
>     (func1 (buffer-string))
> 
> or sometimes just part of them
> 
>     (func2 (substring (buffer-string) 100 200))

Why aren't you passing the buffer itself to func1?
Why aren't you passing the buffer itself and the limits 100 and 200 to func2?

What is it that you're really trying to do?

Yes, if you start manipulating strings instead of buffer text you will pay a performance penalty, in general.
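
As a rough sketch of passing the buffer and a position rather than a
copied string: the hypothetical function `my-read-u32le' below reads a
little-endian 32-bit field straight from a unibyte buffer.  The name
and the choice of field are assumptions for illustration, not part of
the code under discussion.

```elisp
;; Sketch only: `my-read-u32le' is a made-up name.

(defun my-read-u32le (buffer pos)
  "Return the unsigned 32-bit little-endian integer at POS in BUFFER.
BUFFER is assumed to be unibyte, so `char-after' yields raw bytes."
  (with-current-buffer buffer
    (logior (char-after pos)
            (ash (char-after (+ pos 1)) 8)
            (ash (char-after (+ pos 2)) 16)
            (ash (char-after (+ pos 3)) 24))))

;; Example use:
;; (with-temp-buffer
;;   (set-buffer-multibyte nil)
;;   (insert (unibyte-string #x78 #x56 #x34 #x12))
;;   (my-read-u32le (current-buffer) (point-min)))
```

No intermediate string is created; the only allocation is the integer
result.  A test can still feed fixed data by inserting it into a temp
buffer first.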




* Re: [External] : Passing buffers to function in elisp
  2023-02-21 23:21 ` [External] : " Drew Adams
@ 2023-02-22  5:35   ` tomas
  2023-02-24 20:08     ` Petteri Hintsanen
  0 siblings, 1 reply; 22+ messages in thread
From: tomas @ 2023-02-22  5:35 UTC (permalink / raw)
  To: help-gnu-emacs


On Tue, Feb 21, 2023 at 11:21:47PM +0000, Drew Adams wrote:
> > I'm reading some bytes into a temp buffer, like so:
> > 
> >   (with-temp-buffer
> >     (set-buffer-multibyte nil)
> >     (insert-file-contents-literally filename nil 0 64000))
> > 
> > then I pass these bytes to functions for processing, like this
> > 
> >     (func1 (buffer-string))
> > 
> > or sometimes just part of them
> > 
> >     (func2 (substring (buffer-string) 100 200))
> 
> Why aren't you passing the buffer itself to func1?
> Why aren't you passing the buffer itself and the limits 100 and 200 to func2?
> 
> What is it that you're really trying to do?

That's exactly the point, yes.

> Yes, if you start manipulating strings instead of buffer text you will pay a performance penalty, in general.

...the question being whether it's worth it or not.
Sometimes it is, sometimes it isn't :-)

Cheers
-- 
t


* Re: [External] : Passing buffers to function in elisp
  2023-02-22  5:35   ` tomas
@ 2023-02-24 20:08     ` Petteri Hintsanen
  2023-02-25  6:40       ` tomas
  2023-02-25 11:23       ` Michael Heerdegen
  0 siblings, 2 replies; 22+ messages in thread
From: Petteri Hintsanen @ 2023-02-24 20:08 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

> On Tue, Feb 21, 2023 at 11:21:47PM +0000, Drew Adams wrote:
>> What is it that you're really trying to do?
>
> That's exactly the point, yes.

Specifics, as usual, are somewhat messy.  But I'll try to summarize below.

I'm working with Ogg audio files, specifically Vorbis and Opus.  I need
to extract certain metadata from such files.  This code is part of EMMS
(see https://git.savannah.gnu.org/cgit/emms.git/tree/emms-info-native.el
if you're curious, but please note that it's not the same version I'm
working on.)

An Ogg file is basically a sequence of logical "pages", and each page has
zero or more logical "packets".  I need to read and decode the first two
packets and the last page from a given file.  Page size is bounded by
65307 bytes, while packets can be of any size (they can span multiple
pages).

My code extracts the first two packets by repeatedly reading and
decoding a single page along with its packet data ("payload"), until I have
assembled two complete packets.  These packets contain most of the
metadata I'm interested in.

Each page is read and decoded in a somewhat wasteful manner by reading
65307 bytes worth of data from a certain offset (a page boundary) into a
temporary buffer.  So "func1" in my original posting is actually this:

  (defun emms-info-native--read-and-decode-ogg-page (filename offset)
    (with-temp-buffer
      (set-buffer-multibyte nil)
      (insert-file-contents-literally filename
                                      nil
                                      offset
                                      (+ offset
                                         emms-info-native--ogg-page-size))
      (emms-info-native--decode-ogg-page (buffer-string))))

The function emms-info-native--decode-ogg-page uses bindat to do the
actual decoding, and packs the results into a plist, which is then
returned to the caller.  I'm using a separate function here because it
is easy to test -- just supply fixed byte vectors for it and check that
you get correct results.

Calling code looks like this:

  (defun emms-info-native--decode-ogg-packets (filename packets)
    (let ((num-packets 0)
          (offset 0)
          (stream (vector)))
      (while (< num-packets packets)
        (let ((page (emms-info-native--read-and-decode-ogg-page filename
                                                                offset)))
          (cl-incf num-packets (or (plist-get page :num-packets) 0))
          (cl-incf offset (plist-get page :num-bytes))
          (setq stream (vconcat stream (plist-get page :stream)))))
      stream))

This function calls emms-info-native--read-and-decode-ogg-page in a loop
until the desired number of packets has been extracted.  So by
evaluating (emms-info-native--decode-ogg-packets filename 2) I get what
I need.

All data is read-only in the sense it is read from the disk and then
just copied around to alists, plists, vectors and so on.

-----

I added a counter for tracking the number of temp buffers and ran a
benchmark against some 3000+ Ogg files.  This was done on primed cache
so disk I/O should have had minimal effect.

There were 12538 temp buffers created (= 12538 pages decoded).
Benchmark function output was

  "Elapsed time: 23.806966s (18.743661s in 373 GCs)"

So this means that ~78% of the time was spent on garbage collection?  If
so, I think my design sucks.
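
The GC share of a run can be measured directly.  The sketch below
relies on `benchmark-run', which returns a list of elapsed time, number
of GCs, and time spent in GC.  The names `my-make-garbage' and
`my-gc-share' and the workload itself are made up for illustration.

```elisp
(require 'benchmark)

(defun my-make-garbage (n)
  "Allocate N short-lived 1000-element vectors (a made-up workload)."
  (let (last)
    (dotimes (_ n)
      (setq last (make-vector 1000 0)))
    last))

(defun my-gc-share (function)
  "Call FUNCTION once; return the fraction of elapsed time spent in GC.
Return nil when the elapsed time is too small to measure."
  (let* ((result (benchmark-run 1 (funcall function)))
         (elapsed (nth 0 result))
         (gc-time (nth 2 result)))
    (when (> elapsed 0)
      (/ gc-time elapsed))))

;; Example: (my-gc-share (lambda () (my-make-garbage 10000)))
```

For the figures quoted above, (/ 18.743661 23.806966) is about 0.787,
so a little under 79% of the elapsed time went to garbage collection.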

-----

I am well aware that premature optimization is best avoided.
Also, "when in doubt, use brute force."
And even the current performance is good enough.

My problem here is of a more fundamental sort: I don't know what the
right data structures and calling conventions are.  I am still learning
(emacs) lisp, and it shows.

In C or C++ it is "easier": just pass pointers or references and you're
good.  With Lisp and especially Emacs Lisp things are more convoluted --
at least until you learn the necessary idioms.

Thanks,
Petteri


* Re: [External] : Passing buffers to function in elisp
  2023-02-24 20:08     ` Petteri Hintsanen
@ 2023-02-25  6:40       ` tomas
  2023-02-25 11:23       ` Michael Heerdegen
  1 sibling, 0 replies; 22+ messages in thread
From: tomas @ 2023-02-25  6:40 UTC (permalink / raw)
  To: help-gnu-emacs


On Fri, Feb 24, 2023 at 10:08:11PM +0200, Petteri Hintsanen wrote:
> <tomas@tuxteam.de> writes:
> 
> > On Tue, Feb 21, 2023 at 11:21:47PM +0000, Drew Adams wrote:
> >> What is it that you're really trying to do?
> >
> > That's exactly the point, yes.
> 
> Specifics, as usual, are somewhat messy.  But I'll try to summarize below.

[...]

Thanks for this very interesting dive :)

It seems you are so deep in the rabbit hole that my general handwaving
doesn't do justice to it.

I'd suggest calling `garbage-collect' explicitly from some
strategic point in your code: its return value will tell you
what kinds (and how many) of objects are still in use. You could
then at least have a rough idea on where to focus your efforts (are
the many buffers killing you -- or rather loads and loads
of small cons pairs? Or those many vectors?)

There are many knobs and variables to "look into" what the
garbage collector is thinking, see "Garbage Collection" and
"Memory Usage" in Appendix E of the Elisp manual (the Web
version is here [1], if you prefer that).
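
A small sketch of reading that information, assuming the documented
return shape of `garbage-collect' (a list of entries of the form
(TYPE SIZE USED ...)).  The helper name `my-gc-used-counts' is
hypothetical.

```elisp
;; Sketch only: `my-gc-used-counts' is a made-up helper.

(defun my-gc-used-counts ()
  "Run a garbage collection and return an alist of (TYPE . USED).
Each entry returned by `garbage-collect' has the shape
\(TYPE SIZE USED ...), except `heap', which is skipped here."
  (let (counts)
    (dolist (entry (garbage-collect))
      (unless (eq (car entry) 'heap)
        (push (cons (car entry) (nth 2 entry)) counts)))
    (nreverse counts)))

;; Example: (alist-get 'buffers (my-gc-used-counts))
```

The counts describe objects in use after the collection, so comparing
two snapshots taken around a piece of code hints at what it keeps
alive.  The variables `gcs-done' and `gc-elapsed' give the running
totals of collections and GC time.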

Thanks for hacking :-)

Cheers

[1] https://www.gnu.org/software/emacs/manual/html_node/elisp/GNU-Emacs-Internals.html
- 
tomás


* Re: [External] : Passing buffers to function in elisp
  2023-02-24 20:08     ` Petteri Hintsanen
  2023-02-25  6:40       ` tomas
@ 2023-02-25 11:23       ` Michael Heerdegen
  2023-02-25 13:45         ` tomas
  2023-02-25 23:52         ` Stefan Monnier via Users list for the GNU Emacs text editor
  1 sibling, 2 replies; 22+ messages in thread
From: Michael Heerdegen @ 2023-02-25 11:23 UTC (permalink / raw)
  To: help-gnu-emacs

Petteri Hintsanen <petterih@iki.fi> writes:

>   (defun emms-info-native--read-and-decode-ogg-page (filename offset)
>     (with-temp-buffer
>       (set-buffer-multibyte nil)
>       (insert-file-contents-literally filename
>                                       nil
>                                       offset
>                                       (+ offset
>                                          emms-info-native--ogg-page-size))
>       (emms-info-native--decode-ogg-page (buffer-string))))
> [...]
>
>   (defun emms-info-native--decode-ogg-packets (filename packets)
>     (let ((num-packets 0)
>           (offset 0)
>           (stream (vector)))
>       (while (< num-packets packets)
>         (let ((page (emms-info-native--read-and-decode-ogg-page filename
>                                                                 offset)))
>           (cl-incf num-packets (or (plist-get page :num-packets) 0))
>           (cl-incf offset (plist-get page :num-bytes))
>           (setq stream (vconcat stream (plist-get page :stream)))))
>       stream))

If `emms-info-native--read-and-decode-ogg-page' is called very often
(hundreds of times or more), it's probably better to use one single
buffer instead of a fresh temp buffer every single time.  Using temp
buffers creates quite a bunch of garbage IME.

Michael.





* Re: [External] : Passing buffers to function in elisp
  2023-02-25 11:23       ` Michael Heerdegen
@ 2023-02-25 13:45         ` tomas
  2023-02-25 18:31           ` Michael Heerdegen
  2023-02-25 23:52         ` Stefan Monnier via Users list for the GNU Emacs text editor
  1 sibling, 1 reply; 22+ messages in thread
From: tomas @ 2023-02-25 13:45 UTC (permalink / raw)
  To: help-gnu-emacs


On Sat, Feb 25, 2023 at 12:23:22PM +0100, Michael Heerdegen wrote:

[...]

> If `emms-info-native--read-and-decode-ogg-page' is called very often
> (hundreds of times or more), it's probably better to use one single
> buffer instead of a fresh temp buffer every single time.  Using temp
> buffers creates quite a bunch of garbage IME.

And then do (erase-buffer) then (insert-file-contents-literally)?

Cheers
- 
t


* Re: [External] : Passing buffers to function in elisp
  2023-02-25 13:45         ` tomas
@ 2023-02-25 18:31           ` Michael Heerdegen
  2023-02-25 19:05             ` tomas
  0 siblings, 1 reply; 22+ messages in thread
From: Michael Heerdegen @ 2023-02-25 18:31 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

> > If `emms-info-native--read-and-decode-ogg-page' is called very often
> > (hundreds of times or more), it's probably better to use one single
> > buffer instead of a fresh temp buffer every single time.  Using temp
> > buffers creates quite a bunch of garbage IME.
>
> And then do (erase-buffer) then (insert-file-contents-literally)?

Yes.

I had a case in my own personal code where recycling temp buffers made a
big difference wrt garbage.  Not sure if the cases are comparable, but I
would give it a try.
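
A minimal sketch of the reuse pattern discussed here (erase, then
insert), under the assumption that the callers never nest.  The buffer
name and the function `my-call-with-file-bytes' are hypothetical.

```elisp
;; Sketch only: the buffer name and function name are made up.

(defvar my-byte-scratch-buffer-name " *my-byte-scratch*"
  "Name of a reusable scratch buffer.
The leading space keeps it out of buffer lists and turns off undo.")

(defun my-call-with-file-bytes (filename beg end function)
  "Read bytes BEG..END of FILENAME into a reused buffer and call FUNCTION.
FUNCTION runs with the scratch buffer current; its value is returned.
Not reentrant: a nested call would clobber the outer call's data."
  (with-current-buffer (get-buffer-create my-byte-scratch-buffer-name)
    (erase-buffer)
    (set-buffer-multibyte nil)
    (insert-file-contents-literally filename nil beg end)
    (funcall function)))
```

FUNCTION can parse the bytes in place, for instance with `char-after',
so that no copy of the page is made at all.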

Michael.





* Re: [External] : Passing buffers to function in elisp
  2023-02-25 18:31           ` Michael Heerdegen
@ 2023-02-25 19:05             ` tomas
  0 siblings, 0 replies; 22+ messages in thread
From: tomas @ 2023-02-25 19:05 UTC (permalink / raw)
  To: help-gnu-emacs


On Sat, Feb 25, 2023 at 07:31:01PM +0100, Michael Heerdegen wrote:
> <tomas@tuxteam.de> writes:
> 

[reuse buffer]

> > And then do (erase-buffer) then (insert-file-contents-literally)?
> 
> Yes.

Thanks :-)

Cheers
- 
t


* Re: [External] : Passing buffers to function in elisp
  2023-02-25 11:23       ` Michael Heerdegen
  2023-02-25 13:45         ` tomas
@ 2023-02-25 23:52         ` Stefan Monnier via Users list for the GNU Emacs text editor
  2023-02-27 20:44           ` Petteri Hintsanen
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2023-02-25 23:52 UTC (permalink / raw)
  To: help-gnu-emacs

> If `emms-info-native--read-and-decode-ogg-page' is called very often
> (hundreds of times or more), it's probably better to use one single
> buffer instead of a fresh temp buffer every single time.  Using temp
> buffers creates quite a bunch of garbage IME.

That's definitely something to consider.  Another is whether the ELisp
code was byte-compiled (if not, then all bets are off, the interpreter
itself generates a fair bit of garbage, especially if you use a lot of
macros).  Are you using `bindat-type`?


        Stefan





* Re: [External] : Passing buffers to function in elisp
  2023-02-25 23:52         ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2023-02-27 20:44           ` Petteri Hintsanen
  2023-02-28  5:37             ` tomas
  2023-03-03 15:19             ` Stefan Monnier via Users list for the GNU Emacs text editor
  0 siblings, 2 replies; 22+ messages in thread
From: Petteri Hintsanen @ 2023-02-27 20:44 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor
<help-gnu-emacs@gnu.org> writes:

>> If `emms-info-native--read-and-decode-ogg-page' is called very often
>> (hundreds of times or more), it's probably better to use one single
>> buffer instead of a fresh temp buffer every single time.

I tried this and, for a moment, I _think_ it shaved off something like
20-25% of the memory usage (according to the memory profiler).  That
would be a big win.

Sadly enough, it was just for a moment, because I cannot replicate it
anymore.  It wasn't a particularly controlled setup, so probably I just
messed up something at some point.  Nonetheless, using a persistent
buffer seems to be the right thing to do, and seeing how many
" *foo-bar-baz*" buffers there are, it even looks like a pattern.

Also, if I interpreted profiler's hieroglyphs correctly, it told me that
this setq

  (setq stream (vconcat stream (plist-get page :stream)))

is a pig -- well, of course it is.  I'm accumulating a byte vector by
copying its parts.  Similarly bindat consumes a lot of memory.

I think I can replace vectors with strings, which should, according to
the elisp manual, "occupy one-fourth the space of a vector of the same
elements."  And I guess that accumulation would be best done with a
buffer, not with strings or vectors.
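
Accumulating in a buffer could look roughly like this.  The function
name `my-concat-chunks-via-buffer' is hypothetical, and the chunks are
assumed to be unibyte strings.

```elisp
;; Sketch only: `my-concat-chunks-via-buffer' is a made-up name.

(defun my-concat-chunks-via-buffer (chunks)
  "Return the unibyte strings in CHUNKS joined into one unibyte string.
Each chunk is appended to a unibyte temp buffer, and the result is
extracted once at the end."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (dolist (chunk chunks)
      (insert chunk))
    (buffer-string)))

;; Example: (my-concat-chunks-via-buffer (list "Ogg" "S"))
```

Each `insert' appends to the buffer's own storage instead of building a
new sequence from the previous one, so no intermediate results are
thrown away.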

But bindat internals are beyond me.

> That's definitely something to consider.  Another is whether the ELisp
> code was byte-compiled (if not, then all bets are off, the interpreter
> itself generates a fair bit of garbage, especially if you use a lot of
> macros).  

No, it was not byte-compiled.  I don't know how many macros there are.
Just by hand-waving I'd say "not that many".  But again what bindat does
is beyond me.

I'll try byte-compiling after the code is in good enough shape to do
controlled experiments.

> Are you using `bindat-type`?

No, not yet.  I have been thinking about it, not only because the
current implementation is riddled with ugly evals and kludges, but I
want to save the kittens ;-D

I also need to discuss with the EMMS maintainer whether using an Emacs 28+
feature is okay.

Thanks to all for insights, I learned a lot.
Petteri


* Re: [External] : Passing buffers to function in elisp
  2023-02-27 20:44           ` Petteri Hintsanen
@ 2023-02-28  5:37             ` tomas
  2023-03-03 15:19             ` Stefan Monnier via Users list for the GNU Emacs text editor
  1 sibling, 0 replies; 22+ messages in thread
From: tomas @ 2023-02-28  5:37 UTC (permalink / raw)
  To: help-gnu-emacs


On Mon, Feb 27, 2023 at 10:44:43PM +0200, Petteri Hintsanen wrote:

[...]

> Also, if I interpreted profiler's hieroglyphs correctly, it told me that
> this setq
> 
>   (setq stream (vconcat stream (plist-get page :stream)))
> 
> is a pig -- well, of course it is.  I'm accumulating a byte vector by
> copying its parts.  Similarly bindat consumes a lot of memory.
> 
> I think I can replace vectors with strings, which should, according to
> the elisp manual, "occupy one-fourth the space of a vector of the same
> elements."  And I guess that accumulation would be best done with a
> buffer, not with strings or vectors.

I must admit I didn't look too closely into your code, but this one
stuck out too. Not only the copying, but the throwing away of so many
vectors.

I don't know whether it applies in your case, but one "classical"
Lispy pattern when you have to concatenate many things in order is
just consing them (at the front of the list) and nreversing the
list at the end, like so:

  (let ((result '()))
    (while (more)
      (setq result (cons (next) result)))
    (nreverse result))

(nreverse does things "in place", so it reuses the cons pairs:
never do that when someone else is looking ;-)

Another, more functional, approach is of course to arrange things so
you can use map or similar.

Then, at the end you can concatenate the whole list, if need be.

Basically it pays off when the "spine" of the whole thing (i.e.
all those cons pairs you are using) is significantly smaller than
whatever hangs off it.

Cheers
-- 
t


* Re: [External] : Passing buffers to function in elisp
  2023-02-27 20:44           ` Petteri Hintsanen
  2023-02-28  5:37             ` tomas
@ 2023-03-03 15:19             ` Stefan Monnier via Users list for the GNU Emacs text editor
  2023-03-07 21:48               ` Petteri Hintsanen
  2023-09-06 19:05               ` Petteri Hintsanen
  1 sibling, 2 replies; 22+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2023-03-03 15:19 UTC (permalink / raw)
  To: help-gnu-emacs

> Also, if I interpreted profiler's hieroglyphs correctly, it told me that
> this setq
>
>   (setq stream (vconcat stream (plist-get page :stream)))

This is a typical source of unnecessary O(N²) complexity: the above
line takes O(N) time, so if you do it O(N) times, you got your
N² blowup.  You're usually better off doing

    (push (plist-get page :stream) stream-chunks)

and then at the end get the `stream` with

    (mapconcat #'identity (nreverse stream-chunks) nil)
or
    (apply #'vconcat (nreverse stream-chunks))

Of course that depends on what else happens with `stream` (I haven't
really looked at your code, sorry).
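
Put together, the push-then-concatenate pattern might be sketched as
follows.  The function `my-collect-chunks' and its PAGES argument are
stand-ins for the decoding loop, not the actual EMMS code.

```elisp
;; Sketch only: `my-collect-chunks' is a made-up name.

(defun my-collect-chunks (pages)
  "Concatenate the :stream vector of each plist in PAGES, in order.
PAGES stands in for the decoded pages of the original loop."
  (let ((stream-chunks '()))
    (dolist (page pages)
      ;; O(1) per page: nothing collected so far gets copied.
      (push (plist-get page :stream) stream-chunks))
    ;; One O(N) concatenation at the end.  The list is reversed,
    ;; the chunks themselves are left untouched.
    (apply #'vconcat (nreverse stream-chunks))))

;; Example:
;; (my-collect-chunks '((:stream [1 2]) (:stream [3]) (:stream [4 5])))
;; evaluates to [1 2 3 4 5]
```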

> I think I can replace vectors with strings, which should, according to
> the elisp manual, "occupy one-fourth the space of a vector of the same
> elements."

More likely one-eighth nowadays (64 bit machines).
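
The ratio follows from the representation: a vector stores each element
in one Lisp word (8 bytes on a 64-bit build, 4 on a 32-bit one), while
a unibyte string stores one byte per element.  A sketch of converting
byte values to a unibyte string, with the hypothetical name
`my-bytes-to-unibyte-string':

```elisp
;; Sketch only: `my-bytes-to-unibyte-string' is a made-up name.

(defun my-bytes-to-unibyte-string (bytes)
  "Convert BYTES, a vector or list of integers 0..255, to a unibyte string."
  (apply #'unibyte-string (append bytes nil)))

;; Example:
;; (my-bytes-to-unibyte-string [79 103 103 83])   ; evaluates to "OggS"
```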

> Similarly bindat consumes a lot of memory.

Hmm... IIRC it should not use up very much "auxiliary" memory.
IOW its memory usage should be determined by the amount of data it returns.
So, when producing the bytestring it should be quite efficient memorywise.
When reading the bytestring it may be wastefully allocating memory for
all the alists (and also it may be wasteful if you only need some info
because you still need to parse everything and allocate data to
represent its parsed form).

> But bindat internals are beyond me.

I can be of help here :-)

>> That's definitely something to consider.  Another is whether the ELisp
>> code was byte-compiled (if not, then all bets are off, the interpreter
>> itself generates a fair bit of garbage, especially if you use a lot of
>> macros).
> No, it was not byte-compiled.

Then stop right there and fix this problem.  There's absolutely no point
worrying about performance (including memory use) if the code is
not compiled because compilation can change the behavior drastically.

The only reason to run interpreted code nowadays is when you're
Edebugging a piece of code.
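
Before benchmarking, it may help to check that the function under test
is really compiled.  A small sketch, where `my-square' and
`my-compiled-p' are hypothetical names:

```elisp
;; Sketch only: both function names are made up.

(defun my-square (x)
  "Return X squared (a stand-in for the code under test)."
  (* x x))

(defun my-compiled-p (symbol)
  "Return non-nil if SYMBOL's function definition is not interpreted.
That covers byte-compiled functions, natively compiled ones, and
C primitives."
  (let ((def (indirect-function symbol)))
    (or (byte-code-function-p def)
        (subrp def))))

;; Example:
;; (byte-compile 'my-square)     ; compile one function in place
;; (my-compiled-p 'my-square)    ; non-nil afterwards
```

For a whole file, `byte-compile-file' followed by loading the resulting
.elc has the same effect.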

> I'll try byte-compiling after the code is in good enough shape to do
> controlled experiments.

The compiler is your friend.  He can help you get the code in good shape :-)

        Stefan


* Re: [External] : Passing buffers to function in elisp
  2023-03-03 15:19             ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2023-03-07 21:48               ` Petteri Hintsanen
  2023-03-07 22:45                 ` Stefan Monnier
  2023-09-06 19:05               ` Petteri Hintsanen
  1 sibling, 1 reply; 22+ messages in thread
From: Petteri Hintsanen @ 2023-03-07 21:48 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: Stefan Monnier

Stefan Monnier via Users list for the GNU Emacs text editor
<help-gnu-emacs@gnu.org> writes:

> This is a typical source of unnecessary O(N²) complexity: the above
> line takes O(N) time, so if you do it O(N) times, you got your
> N² blowup.  You're usually better off doing
>
>     (push (plist-get page :stream) stream-chunks)
>
> and then at the end get the `stream` with
>
>     (mapconcat #'identity (nreverse stream-chunks) nil)
> or
>     (apply #'vconcat (nreverse stream-chunks))

Right, I see.  Stream chunks are in this case byte vectors, so
just reversing those chunks does not do the trick.
But surely I can get from an order of N² to 2N or so.

> Of course that depends on what else happens with `stream` (I haven't
> really looked at your code, sorry).

It's ok, I'm not expecting any reviews here.  All these comments from
you and others have been valuable already.

>> No, it was not byte-compiled.
>
> Then stop right there and fix this problem.  There's absolutely no point
> worrying about performance (including memory use) if the code is
> not compiled because compilation can change the behavior drastically.
>
> The only reason to run interpreted code nowadays is when you're
> Edebugging a piece of code.

Okay, this is something I did not foresee.  But what about eval-defun
and eval-... in general?  They are very convenient when trying out
things.  Should I bind compile-defun to C-M-x then?  And instead of
eval-buffer use byte-compile-file?  Or emacs-lisp-byte-compile-and-load?
Manual is a bit spotty here; emacs-lisp-byte-compile-... functions are
not mentioned.

>> I'll try byte-compiling after the code is in good enough shape to do
>> controlled experiments.
>
> The compiler is your friend.  He can help you get the code in good shape :-)

I'm afraid that even the compiler cannot help against quadratic
complexity blunders.  But I think I got your point.

Thanks,
Petteri


* Re: [External] : Passing buffers to function in elisp
  2023-03-07 21:48               ` Petteri Hintsanen
@ 2023-03-07 22:45                 ` Stefan Monnier
  2023-03-08  5:38                   ` tomas
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2023-03-07 22:45 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: help-gnu-emacs

>> This is a typical source of unnecessary O(N²) complexity: the above
>> line takes O(N) time, so if you do it O(N) times, you got your
>> N² blowup.  You're usually better off doing
>>
>>     (push (plist-get page :stream) stream-chunks)
>>
>> and then at the end get the `stream` with
>>
>>     (mapconcat #'identity (nreverse stream-chunks) nil)
>> or
>>     (apply #'vconcat (nreverse stream-chunks))
>
> Right, I see.  Stream chunks are in this case byte vectors, so
> just reversing those chunks does not do the trick.
> But surely I can get from an order of N² to 2N or so.

I'm suggesting to build a list of chunks backward and to reverse *the
list*, not the chunks.  So the end result should still be the same.

> Okay, this is something I did not foresee.  But what about eval-defun
> and eval-... in general?  They are very convenient when trying out
> things.

It's OK to use them, of course.  It usually means you still have 98% of
your code compiled.

>> The compiler is your friend.  He can help you get the code in good
>> shape :-)
> I'm afraid that even the compiler cannot help against quadratic
> complexity blunders.

:-)

It's just a friend, yes.


        Stefan





* Re: [External] : Passing buffers to function in elisp
  2023-03-07 22:45                 ` Stefan Monnier
@ 2023-03-08  5:38                   ` tomas
  0 siblings, 0 replies; 22+ messages in thread
From: tomas @ 2023-03-08  5:38 UTC (permalink / raw)
  To: help-gnu-emacs


On Tue, Mar 07, 2023 at 05:45:49PM -0500, Stefan Monnier wrote:
> >> This is a typical source of unnecessary O(N²) complexity: the above
> >> line takes O(N) time, so if you do it O(N) times, you got your
> >> N² blowup.  You're usually better off doing
> >>
> >>     (push (plist-get page :stream) stream-chunks)
> >>
> >> and then at the end get the `stream` with
> >>
> >>     (mapconcat #'identity (nreverse stream-chunks) nil)
> >> or
> >>     (apply #'vconcat (nreverse stream-chunks))
> >
> > Right, I see.  Stream chunks are in this case byte vectors, so
> > just reversing those chunks does not do the trick.
> > But surely I can get from an order of N² to 2N or so.
> 
> I'm suggesting to build a list of chunks backward and to reverse *the
> list*, not the chunks.  So the end result should still be the same.

Judging by the "2N instead of N^2" I guess Petteri had the right
mental model, though.

> > Okay, this is something I did not foresee.  But what about eval-defun
> > and eval-... in general?  They are very convenient when trying out
> > things.
> 
> It's OK to use them, of course.  It usually means you still have 98% of
> your code compiled.
> 
> >> The compiler is your friend.  He can help you get the code in good
> >> shape :-)
> > I'm afraid that even the compiler cannot help against quadratic
> > complexity blunders.
> 
> :-)
> 
> It's just a friend, yes.

:-)

Cheers
-- 
t


* Re: Passing buffers to function in elisp
  2023-03-03 15:19             ` Stefan Monnier via Users list for the GNU Emacs text editor
  2023-03-07 21:48               ` Petteri Hintsanen
@ 2023-09-06 19:05               ` Petteri Hintsanen
  2023-09-06 21:12                 ` Stefan Monnier
  1 sibling, 1 reply; 22+ messages in thread
From: Petteri Hintsanen @ 2023-09-06 19:05 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: Stefan Monnier

Hello all,

It took some time to do these memory optimizations I asked about a few
months ago.  Here are some remarks.

>> Also, if I interpreted profiler's hieroglyphs correctly, it told me that
>> this setq
>>
>>   (setq stream (vconcat stream (plist-get page :stream)))
>
> This is a typical source of unnecessary O(N²) complexity: the above
> line takes O(N) time, so if you do it O(N) times, you got your
> N² blowup.  You're usually better off doing
>
>     (push (plist-get page :stream) stream-chunks)
>
> and then at the end get the `stream` with
>
>     (mapconcat #'identity (nreverse stream-chunks) nil)
> or
>     (apply #'vconcat (nreverse stream-chunks))

I replaced vconcat with push.  However it did not have a significant
effect (measured with Emacs memory profiler).  Perhaps the chunks were
quite small after all.  In complexity speak, with small N one usually
does not need to worry about quadratics.

But it is no worse either, so I left it that way.

>> I think I can replace vectors with strings, which should, according to
>> the elisp manual, "occupy one-fourth the space of a vector of the same
>> elements."
>
> More likely one-eighth nowadays (64 bit machines).

Changing vectors to strings did indeed have a significant effect.  It is
also the right thing to do, because, frankly, much of the data *are*
strings.

>> Similarly bindat consumes a lot of memory.
>
> Hmm... IIRC it should not use up very much "auxiliary" memory.  IOW
> its memory usage should be determined by the amount of data it
> returns.  So, when producing the bytestring it should be quite
> efficient memorywise.

This is correct.  Bindat is very conservative.  I probably misread the
profiler report back then and unjustly put part of the blame on bindat.

>>> That's definitely something to consider.  Another is whether the ELisp
>>> code was byte-compiled (if not, then all bets are off, the interpreter
>>> itself generates a fair bit of garbage, especially if you use a lot of
>>> macros).
>> No, it was not byte-compiled.
>
> Then stop right there and fix this problem.  There's absolutely no point
> worrying about performance (including memory use) if the code is
> not compiled because compilation can change the behavior drastically.

This is also absolutely correct.  There is no point in profiling
non-compiled code.  Non-compiled code gives wildly varying profiles from
run to run.

>> I'll try byte-compiling after the code is in good enough shape to do
>> controlled experiments.
>
> The compiler is your friend.  He can help you get the code in good shape :-)

Truly he does.

I also have native compilation enabled.  I don't know how much effect
it had.

I also tried to replace with-temp-buffer forms (such forms are called
hundreds of times) with a static buffer for holding temporary data.  It
produced mixed results.  In some limited settings, memory savings were
considerable, but in some other cases it blew up memory usage.  I
cannot explain why that happened.  But it seems safest to stick to
with-temp-buffer.
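
For reference, the two variants compared here could look roughly like
the sketch below.  The function names and the buffer name
" *my-scratch*" are assumptions for the example, and the sketch makes no
claim about which variant uses less memory.

```elisp
;; Hypothetical name for the long-lived work buffer.  Undo is not
;; recorded in buffers whose names start with a space.
(defvar my-scratch-buffer-name " *my-scratch*")

(defun my-read-head-temp (file n)
  "Return the first N bytes of FILE, using a fresh temporary buffer."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil 0 n)
    (buffer-string)))

(defun my-read-head-static (file n)
  "Return the first N bytes of FILE, reusing one long-lived buffer."
  (with-current-buffer (get-buffer-create my-scratch-buffer-name)
    ;; Drop whatever the previous call left behind.
    (erase-buffer)
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil 0 n)
    (buffer-string)))
```

Both return a fresh string, so callers cannot tell them apart; the
difference is only in how often a buffer object is created and killed.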

Nonetheless, the code is now much better. 
Thank you all for your insights,
Petteri


* Re: Passing buffers to function in elisp
  2023-09-06 19:05               ` Petteri Hintsanen
@ 2023-09-06 21:12                 ` Stefan Monnier
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Monnier @ 2023-09-06 21:12 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: help-gnu-emacs

>> This is a typical source of unnecessary O(N²) complexity: the above
>> line takes O(N) time, so if you do it O(N) times, you got your
>> N² blowup.  You're usually better off doing
[...]
> I replaced vconcat with push.  However it did not have a significant
> effect (measured with Emacs memory profiler).  Perhaps the chunks were
> quite small after all.

That's usually the case, indeed.

> In complexity speak, with small N one usually
> does not need to worry about quadratics.

But: it's rare to be sure that N will *always* be small :-(

> I also tried to replace with-temp-buffer forms (such forms are called
> hundreds of times) with a static buffer for holding temporary data.  It
> produced mixed results.  In some limited settings, memory savings were
> considerable, but in some other cases it blew up memory usage.  I
> cannot explain why that happened.  But it seems safest to stick to
> with-temp-buffer.

`with-temp-buffer` is fairly costly, but to the extent that it's pretty
much a constant cost it shouldn't [knock on wood] bring surprises in
unexpected circumstances, so if it's fast enough it's a good choice.


        Stefan





* Re: Passing buffers to function in elisp
  2023-02-21 21:18 Passing buffers to function in elisp Petteri Hintsanen
  2023-02-21 23:21 ` [External] : " Drew Adams
@ 2023-02-22  5:30 ` tomas
  2023-02-23  9:34   ` Michael Heerdegen
  1 sibling, 1 reply; 22+ messages in thread
From: tomas @ 2023-02-22  5:30 UTC (permalink / raw)
  To: help-gnu-emacs


On Tue, Feb 21, 2023 at 11:18:25PM +0200, Petteri Hintsanen wrote:
> Hello list,
> 
> 
> Alan J. Perlis said "A LISP programmer knows the value of everything,
> but the cost of nothing."
> 
> 
> I'm reading some bytes into a temp buffer, like so:
> 
>   (with-temp-buffer
>     (set-buffer-multibyte nil)
>     (insert-file-contents-literally filename nil 0 64000))
> 
> then I pass these bytes to functions for processing, like this
> 
>     (func1 (buffer-string))
> 
> or sometimes just part of them
> 
>     (func2 (substring (buffer-string) 100 200))
> 
> Now:
> 
> . does this generate garbage?  (I believe it does.)

It most probably does, but that will depend on the future
history of the buffer and on whatever func1 does with its arg.

If both are needed later, it isn't garbage (yet). They
become garbage once they aren't needed.

> . if there are many funcalls like that, will there be lots of garbage?
>   (I guess there will be.)

See above. See, the documentation of `buffer-string' hints
that it is doing a copy. If you modify the string, the buffer
will stay the same and vice-versa. If that is what you want,
then go for it :-)

> . is this bad style?  (I'm afraid it is, hence asking.)

See above: it depends. If you want func1 to operate on the
buffer content, then you better pass it the buffer itself
(actually a reference to the buffer, but that's "details" ;-)
If you'd be surprised that func1 is able to change the buffer,
then better pass it a copy: `buffer-string' seems a good
way to do that.
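
A small sketch of the copy semantics, and of one way to take a slice
without first copying the whole buffer.  The names `my-slice-via-string'
and `my-slice-direct' are invented for the example.

```elisp
;; Hypothetical helpers; START and END are 0-based offsets, as in
;; the `substring' call from the original question.
(defun my-slice-via-string (start end)
  "Copy the whole buffer to a string, then take START..END of it."
  (substring (buffer-string) start end))

(defun my-slice-direct (start end)
  "Copy only START..END of the accessible portion of the buffer."
  ;; Buffer positions start at `point-min' (normally 1), not at 0.
  (buffer-substring-no-properties (+ (point-min) start)
                                  (+ (point-min) end)))

(with-temp-buffer
  (insert "0123456789")
  (let ((copy (buffer-string)))
    ;; Changing the buffer leaves the earlier copy untouched.
    (erase-buffer)
    (insert "abcdefghij")
    (list copy
          (my-slice-via-string 2 5)
          (my-slice-direct 2 5))))
```

The second helper allocates only the slice, which matters when the
buffer is large and the requested part is small.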

> Is it better just to assume in functions that the current buffer is the
> data buffer and work on that, instead of passing data as function
> arguments?

That depends on your style and on the "contracts" you make
with yourself (and ultimately, of course, on what you are
trying to do: for each different purpose, some style will
be clearer/more efficient -- ideally both, but life and
things).

> [Why am I doing like this?  It is /slightly/ easier to write tests when
> functions get their data in their arguments.]

Then go for it. To accompany your nice Perlis quote above
I offer "Premature optimization is the root of all evil",
which is attributed to Donald Knuth (some say it was Tony
Hoare).

Keep an eye on things and be ready to notice whether it
is creating performance problems.

> Also: is it good idea to try to limit the number of temp buffers
> (with-temp-buffer expressions)?  Or are they somehow recycled within the
> elisp interpreter?

Once the interpreter (well, it's a hybrid these days. Let's
call it the "run time") can prove they aren't needed, they
will get recycled, yes.

If you are curious, just invoke (garbage-collect) after you
have accumulated some. It will tell you what it found.
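
A sketch of reading that report from Lisp, assuming you are only
interested in the entry for buffers.  The function name
`my-buffer-objects-in-use' is made up for the example.

```elisp
;; Hypothetical helper around `garbage-collect'.
(defun my-buffer-objects-in-use ()
  "Run a garbage collection and return the reported buffer count."
  ;; `garbage-collect' normally returns a list of entries of the form
  ;; (NAME SIZE USED ...); the `buffers' entry is (buffers SIZE USED).
  (let ((entry (assq 'buffers (garbage-collect))))
    (nth 2 entry)))

(let ((before (my-buffer-objects-in-use)))
  (dotimes (_ 100)
    (with-temp-buffer (insert "some temporary data")))
  ;; Compare the count before and after creating 100 temp buffers.
  (list before (my-buffer-objects-in-use)))
```

The exact numbers depend on the session, so the sketch only shows how to
get at them.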

Cheers
-- 
t


* Re: Passing buffers to function in elisp
  2023-02-22  5:30 ` tomas
@ 2023-02-23  9:34   ` Michael Heerdegen
  2023-02-23  9:51     ` tomas
  2023-02-23 16:19     ` Marcin Borkowski
  0 siblings, 2 replies; 22+ messages in thread
From: Michael Heerdegen @ 2023-02-23  9:34 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

> > Is it better just to assume in functions that the current buffer is
> > the data buffer and work on that, instead of passing data as
> > function arguments?
>
> That depends on your style and on the "contracts" you make
> with yourself (and ultimately, of course, on what you are
> trying to do: for each different purpose, some style will
> be clearer/more efficient -- ideally both, but life and
> things).

And there is not only garbage, there is also the aspect of speed: many
operations can be performed in buffers and likewise for strings, but
sometimes operations are a lot faster for strings (modifying a buffer is
a more complicated operation).
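
One way to check such a claim for a given operation is to time both
variants.  The sketch below is only an assumption about what a fair
comparison might look like; the names are invented, and the buffer
variant deliberately includes the cost of filling a temp buffer.

```elisp
(require 'benchmark)

;; Hypothetical test data: a long run of filler followed by the needle.
(defvar my-haystack (concat (make-string 50000 ?a) "needle"))

(defun my-find-in-string (needle)
  "Return the 0-based index of NEEDLE in `my-haystack', or nil."
  (string-match (regexp-quote needle) my-haystack))

(defun my-find-in-buffer (needle)
  "Like `my-find-in-string', but search in a temporary buffer."
  (with-temp-buffer
    (insert my-haystack)
    (goto-char (point-min))
    (when (search-forward needle nil t)
      ;; Convert the 1-based buffer position to a 0-based index.
      (- (match-beginning 0) (point-min)))))

;; Each result is (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(list (benchmark-run 100 (my-find-in-string "needle"))
      (benchmark-run 100 (my-find-in-buffer "needle")))
```

As noted elsewhere in the thread, byte-compile the functions before
trusting any numbers from this kind of measurement.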

Michael.





* Re: Passing buffers to function in elisp
  2023-02-23  9:34   ` Michael Heerdegen
@ 2023-02-23  9:51     ` tomas
  2023-02-23 16:19     ` Marcin Borkowski
  1 sibling, 0 replies; 22+ messages in thread
From: tomas @ 2023-02-23  9:51 UTC (permalink / raw)
  To: help-gnu-emacs


On Thu, Feb 23, 2023 at 10:34:32AM +0100, Michael Heerdegen wrote:
> <tomas@tuxteam.de> writes:
> 
> > > Is it better just to assume in functions that the current buffer is
> > > the data buffer and work on that, instead of passing data as
> > > function arguments?
> >
> > That depends on your style and on the "contracts" you make
> > with yourself (and ultimately, of course, on what you are
> > trying to do: for each different purpose, some style will
> > be clearer/more efficient -- ideally both, but life and
> > things).
> 
> And there is not only garbage, there is also the aspect of speed: many
> operations can be performed in buffers and likewise for strings, but
> sometimes operations are a lot faster for strings (modifying a buffer is
> a more complicated operation).

And then, if you have the right garbage collector, creating some
garbage might be faster than modifying things in place (if some
stars align, and you take into account other things and all that
:-)

Cheers
-- 
t


* Re: Passing buffers to function in elisp
  2023-02-23  9:34   ` Michael Heerdegen
  2023-02-23  9:51     ` tomas
@ 2023-02-23 16:19     ` Marcin Borkowski
  1 sibling, 0 replies; 22+ messages in thread
From: Marcin Borkowski @ 2023-02-23 16:19 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: help-gnu-emacs

On 2023-02-23, at 10:34, Michael Heerdegen <michael_heerdegen@web.de> wrote:

> <tomas@tuxteam.de> writes:
>
>> > Is it better just to assume in functions that the current buffer is
>> > the data buffer and work on that, instead of passing data as
>> > function arguments?
>>
>> That depends on your style and on the "contracts" you make
>> with yourself (and ultimately, of course, on what you are
>> trying to do: for each different purpose, some style will
>> be clearer/more efficient -- ideally both, but life and
>> things).
>
> And there is not only garbage, there is also the aspect of speed: many
> operations can be performed in buffers and likewise for strings, but
> sometimes operations are a lot faster for strings (modifying a buffer is
> a more complicated operation).

Well, I am fairly sure there are things which are faster for buffers,
too...  A few years ago I did some experimenting with that:
https://mbork.pl/2019-03-25_Using_benchmark_to_measure_speed_of_Elisp_code

As for testing with buffers, you might be interested in the
`elisp-tests-with-temp-buffer' macro I've written a long time ago (see
emacs/test/lisp/emacs-lisp/lisp-tests.el:317).
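
Even without that macro, a function that works on the current buffer can
be tested with plain `with-temp-buffer'.  A minimal sketch, in which the
function `my-count-occurrences' and the test name are invented for the
example:

```elisp
(require 'ert)

;; Hypothetical function under test: it reads the current buffer
;; instead of taking the data as an argument.
(defun my-count-occurrences (needle)
  "Count occurrences of NEEDLE in the accessible part of the buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      (while (search-forward needle nil t)
        (setq count (1+ count)))
      count)))

(ert-deftest my-count-occurrences-test ()
  ;; The test supplies its data by filling a temporary buffer.
  (with-temp-buffer
    (insert "PK..PK.PK")
    (should (= (my-count-occurrences "PK") 3))
    (should (= (my-count-occurrences "ZZ") 0))))
```

The test can then be run with M-x ert, so the extra effort compared with
passing a string argument is one `with-temp-buffer' and one `insert'.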

The bottom line is probably this: do whatever you prefer, and optimize
when it's needed (as Tomas said).

Hth,

-- 
Marcin Borkowski
http://mbork.pl

