string_char_to_byte and string_byte_to_char micro-optimisation

unofficial mirror of emacs-devel@gnu.org 

* string_char_to_byte and string_byte_to_char micro-optimisation
@ 2019-06-14 12:37 Robert Pluim
  2019-06-14 16:53 ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Pluim @ 2019-06-14 12:37 UTC
  To: emacs-devel

Hi,

in <https://nullprogram.com/blog/2019/05/29/> a benchmark is shown:

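(require 'cl-lib)  ; provides `cl-loop'

;; Walk both strings in parallel and return the first pair of
;; differing characters, or nil if none differ.
(defun compare (string-a string-b)
  (cl-loop for a being the elements of string-a
           for b being the elements of string-b
           unless (eql a b)
           return (cons a b)))
;; Storing character 256 (non-ASCII) as the last element makes both
;; strings multibyte.
(benchmark-run
    (let ((a (make-string 100000 0))
          (b (make-string 100000 0)))
      (setf (aref a (1- (length a))) 256
            (aref b (1- (length b))) 256)
      (compare a b)))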

which runs very slowly because string_char_to_byte and
string_byte_to_char only cache the found values for 1 previous string.

I have a patch which extends this cache to two (count 'em, two!)
previous strings, which fixes this particular benchmark.
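
To illustrate why the benchmark defeats a one-string cache, here is a
toy model in C. It is a sketch under stated assumptions, not the code in
Emacs and not the patch: strings are plain UTF-8 buffers identified by
pointer, and every name (pos_cache, char_to_byte, alternate) is
hypothetical. The cache remembers the last converted position per
string, and "steps" counts how many characters had to be walked.

```c
#include <stddef.h>

/* Toy model only: this is NOT the code in Emacs.  Every name here is
   hypothetical.  Strings are UTF-8 buffers identified by pointer.  */

#define MAX_SLOTS 4

struct slot {
  const unsigned char *str;   /* string this entry belongs to, or NULL */
  size_t charpos, bytepos;    /* last conversion result for that string */
};

struct pos_cache {
  struct slot slots[MAX_SLOTS];
  int nslots;                 /* slots in use; must be 1..MAX_SLOTS */
  int next;                   /* round-robin victim for replacement */
  size_t steps;               /* characters walked so far: the "work" */
};

/* Byte length of the UTF-8 sequence that starts with lead byte C.  */
static size_t seq_len(unsigned char c)
{
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  return 4;
}

/* Return the byte offset of character CHARPOS in S, resuming from a
   cached position for S when one exists at or before CHARPOS.  */
static size_t char_to_byte(struct pos_cache *c, const unsigned char *s,
                           size_t charpos)
{
  struct slot *hit = NULL;
  size_t cp = 0, bp = 0;

  for (int i = 0; i < c->nslots; i++)
    if (c->slots[i].str == s)
      hit = &c->slots[i];

  if (hit && hit->charpos <= charpos) {
    cp = hit->charpos;
    bp = hit->bytepos;
  }

  /* Walk forward one character at a time from the starting point.  */
  for (; cp < charpos; cp++) {
    bp += seq_len(s[bp]);
    c->steps++;
  }

  /* Remember the result, evicting another string's entry if needed.  */
  if (!hit) {
    hit = &c->slots[c->next];
    c->next = (c->next + 1) % c->nslots;
  }
  hit->str = s;
  hit->charpos = charpos;
  hit->bytepos = bp;
  return bp;
}

/* Read characters 0..N-1 of A and B alternately, as the benchmark's
   loop does, and return the total number of characters walked.  */
static size_t alternate(int nslots, const unsigned char *a,
                        const unsigned char *b, size_t n)
{
  struct pos_cache c = {0};
  c.nslots = nslots;
  for (size_t i = 0; i < n; i++) {
    char_to_byte(&c, a, i);
    char_to_byte(&c, b, i);
  }
  return c.steps;
}
```

In this model, calling alternate(1, a, b, n) walks n(n-1) characters in
total, because each lookup evicts the other string's entry and restarts
from the beginning, while alternate(2, a, b, n) walks only 2(n-1). For
n = 1000 that is 999000 steps against 1998.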

What I donʼt have is any intuition on whether such a change actually
makes any difference in real-world Emacs usage. Can anyone suggest any
benchmarks?

Thanks

Robert

* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-14 12:37 string_char_to_byte and string_byte_to_char micro-optimisation Robert Pluim
@ 2019-06-14 16:53 ` Paul Eggert
  2019-06-14 19:00   ` Eli Zaretskii
  2019-06-17  9:37   ` Robert Pluim
  0 siblings, 2 replies; 9+ messages in thread
From: Paul Eggert @ 2019-06-14 16:53 UTC
  To: emacs-devel

On 6/14/19 5:37 AM, Robert Pluim wrote:
> What I donʼt have is any intuition on whether such a change actually
> makes any difference in real-world Emacs usage. Can anyone suggest any
> benchmarks?

My usual benchmark for this sort of thing is 'make compile-always' in 
the lisp directory.




* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-14 16:53 ` Paul Eggert
@ 2019-06-14 19:00   ` Eli Zaretskii
  2019-06-14 20:11     ` Stefan Monnier
  2019-06-17  9:37   ` Robert Pluim
  1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-06-14 19:00 UTC
  To: Paul Eggert; +Cc: emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 14 Jun 2019 09:53:50 -0700
> 
> On 6/14/19 5:37 AM, Robert Pluim wrote:
> > What I donʼt have is any intuition on whether such a change actually
> > makes any difference in real-world Emacs usage. Can anyone suggest any
> > benchmarks?
> 
> My usual benchmark for this sort of thing is 'make compile-always' in 
> the lisp directory.

I don't think that will do for this case.  Strings are used rather
rarely in Emacs.  We need to find a command that uses strings
extensively, and uses non-ASCII text in strings in particular.  Some
JSON processing with non-ASCII strings inside, perhaps?
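
A small C sketch of why only non-ASCII strings are interesting here.
It is an illustration under assumptions, not Emacs code: the buffers
are plain UTF-8 and the function names are hypothetical. When a string
has as many bytes as characters, a character position is already a
byte position and nothing needs scanning or caching; otherwise the
position has to be found by walking the string.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not taken from Emacs.  */

/* Count characters in a UTF-8 buffer: every byte that is not a
   continuation byte (10xxxxxx) starts a character.  */
static size_t utf8_char_count(const unsigned char *s, size_t nbytes)
{
  size_t n = 0;
  for (size_t i = 0; i < nbytes; i++)
    if ((s[i] & 0xC0) != 0x80)
      n++;
  return n;
}

/* Return the byte offset of character CHARPOS (which must be less
   than NCHARS).  *STEPS receives the number of characters walked.  */
static size_t char_to_byte_simple(const unsigned char *s, size_t nbytes,
                                  size_t nchars, size_t charpos,
                                  size_t *steps)
{
  *steps = 0;

  /* Every character is one byte: the conversion is the identity.  */
  if (nchars == nbytes)
    return charpos;

  /* Otherwise skip CHARPOS characters from the start.  */
  size_t bp = 0;
  for (size_t cp = 0; cp < charpos; cp++) {
    bp++;                                   /* lead byte */
    while (bp < nbytes && (s[bp] & 0xC0) == 0x80)
      bp++;                                 /* continuation bytes */
    (*steps)++;
  }
  return bp;
}
```

With the 5-byte buffer "hello", asking for character 4 returns byte 4
after zero steps. With the 6-byte UTF-8 encoding of "héllo", the same
request returns byte 5 after four steps, so a benchmark made of ASCII
strings would not exercise the conversion at all.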



* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-14 19:00   ` Eli Zaretskii
@ 2019-06-14 20:11     ` Stefan Monnier
  2019-06-15  6:22       ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2019-06-14 20:11 UTC
  To: emacs-devel

> I don't think that will do for this case.  Strings are used rather
> rarely in Emacs.  We need to find a command that uses strings
> extensively, and uses non-ASCII text in strings in particular.

... and uses `aref` on it extensively.
Most strings are used via regexp-search in which case the conversion
between charpos and bytepos is generally lost in the noise.


        Stefan




* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-14 20:11     ` Stefan Monnier
@ 2019-06-15  6:22       ` Eli Zaretskii
  2019-06-15  7:48         ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-06-15  6:22 UTC
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 14 Jun 2019 16:11:47 -0400
> 
> > I don't think that will do for this case.  Strings are used rather
> > rarely in Emacs.  We need to find a command that uses strings
> > extensively, and uses non-ASCII text in strings in particular.
> 
> ... and uses `aref` on it extensively.

Right.  And/or 'aset'.  Other candidates are 'string-match' and
'replace-match'.  All that with non-ASCII strings, of course.




* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-15  6:22       ` Eli Zaretskii
@ 2019-06-15  7:48         ` Stefan Monnier
  2019-06-15 11:11           ` Noam Postavsky
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2019-06-15  7:48 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

>> ... and uses `aref` on it extensively.
> Right.  And/or 'aset'.

Right, but `aset` is even more rare on multibyte strings.

> Other candidates are 'string-match' and 'replace-match'.

`replace-match` has to copy the string, so charpos<->bytepos conversion
doesn't slow it down significantly (I'd guess it's at most a factor of 2).

`string-match` is only affected by charpos<->bytepos conversion if you use the
`start` argument, and the time to perform the actual regexp search will
usually dwarf the charpos<->bytepos conversion, so I think it can only
be noticeably slowed down by charpos<->bytepos conversion in
"pathological" cases where we `start` in the middle of a longish string
and we immediately find a short match.

In contrast, `aref` never does much more than the charpos<->bytepos
conversion itself.

        Stefan

* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-15  7:48         ` Stefan Monnier
@ 2019-06-15 11:11           ` Noam Postavsky
  2019-06-16 11:17             ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2019-06-15 11:11 UTC
  To: Stefan Monnier; +Cc: Eli Zaretskii, Emacs developers

On Sat, 15 Jun 2019 at 03:49, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> be noticeably slowed down by charpos<->bytepos conversion in
> "pathological" cases where we `start` in the middle of a longish string
> and we immediately find a short match.

Would this include cases where you iterate through string-match
results in a loop, incrementing the `start` argument each time, as in
replace-regexp-in-string? (I guess if its REP argument is a function
which aref's another multibyte string, then it should miss the cache
each time).

* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-15 11:11           ` Noam Postavsky
@ 2019-06-16 11:17             ` Stefan Monnier
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2019-06-16 11:17 UTC
  To: Noam Postavsky; +Cc: Eli Zaretskii, Emacs developers

>> be noticeably slowed down by charpos<->bytepos conversion in
>> "pathological" cases where we `start` in the middle of a longish string
>> and we immediately find a short match.
> Would this include cases where you iterate through string-match
> results in a loop, incrementing the `start` argument each time, as in
> replace-regexp-in-string?

Yes, that's exactly the case I had in mind.


        Stefan




* Re: string_char_to_byte and string_byte_to_char micro-optimisation
  2019-06-14 16:53 ` Paul Eggert
  2019-06-14 19:00   ` Eli Zaretskii
@ 2019-06-17  9:37   ` Robert Pluim
  1 sibling, 0 replies; 9+ messages in thread
From: Robert Pluim @ 2019-06-17  9:37 UTC
  To: Paul Eggert; +Cc: emacs-devel

>>>>> On Fri, 14 Jun 2019 09:53:50 -0700, Paul Eggert <eggert@cs.ucla.edu> said:

    Paul> On 6/14/19 5:37 AM, Robert Pluim wrote:
    >> What I donʼt have is any intuition on whether such a change actually
    >> makes any difference in real-world Emacs usage. Can anyone suggest any
    >> benchmarks?

    Paul> My usual benchmark for this sort of thing is 'make compile-always' in
    Paul> the lisp directory.

It doesnʼt make a significant difference, so I donʼt think thereʼs any
point in complicating the code:

With patch, run 1:

real	4m21.097s
user	3m39.020s
sys	0m33.267s

With patch, run 2:

real	4m13.649s
user	3m34.102s
sys	0m31.834s

Without patch, run 1:

real	4m15.264s
user	3m34.305s
sys	0m32.719s

Without patch, run 2:

real	4m18.266s
user	3m36.531s
sys	0m33.315s



end of thread, other threads:[~2019-06-17  9:37 UTC | newest]

Thread overview: 9+ messages
2019-06-14 12:37 string_char_to_byte and string_byte_to_char micro-optimisation Robert Pluim
2019-06-14 16:53 ` Paul Eggert
2019-06-14 19:00   ` Eli Zaretskii
2019-06-14 20:11     ` Stefan Monnier
2019-06-15  6:22       ` Eli Zaretskii
2019-06-15  7:48         ` Stefan Monnier
2019-06-15 11:11           ` Noam Postavsky
2019-06-16 11:17             ` Stefan Monnier
2019-06-17  9:37   ` Robert Pluim

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
