master def6fa4246 2/2: Speed up string-lessp for multibyte strings

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* master def6fa4246 2/2: Speed up string-lessp for multibyte strings
@ 2022-10-07 19:25 Eli Zaretskii
  2022-10-08 16:49 ` Mattias Engdegård
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2022-10-07 19:25 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: emacs-devel

> +      /* Two arbitrary multibyte strings: we cannot use memcmp because
> +	 the encoding for raw bytes would sort those between U+007F and U+0080
> +	 which isn't where we want them.
> +	 Instead, we skip the longest common prefix and look at
> +	 what follows.  */

I don't think I understand this; please elaborate.  Didn't you say
that we never need to look beyond the first unequal byte?  Then why
does the order of raw bytes matter here?

Did you consider using memmem?

> +      /* First compare entire machine words.  (String data is allocated
> +	 with word alignment.)  */
> +      typedef size_t word_t;
> +      int ws = sizeof (word_t);

Are you sure about the alignment?  Can you show the details of your
reasoning about it?  At the very least, we should have an assertion
here verifying the alignment.

Also, what about AUTO_STRING -- is that also guaranteed to be aligned
as this code assumes?

And finally: why no tests for this?  We are changing a central part of
our code, and it would be unthinkable to do that without a test suite.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
  2022-10-07 19:25 master def6fa4246 2/2: Speed up string-lessp for multibyte strings Eli Zaretskii
@ 2022-10-08 16:49 ` Mattias Engdegård
  2022-10-08 17:40   ` Stefan Monnier
  2022-10-08 18:25   ` Eli Zaretskii
  0 siblings, 2 replies; 6+ messages in thread
From: Mattias Engdegård @ 2022-10-08 16:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

7 okt. 2022 kl. 21.25 skrev Eli Zaretskii <eliz@gnu.org>:
> 
>> +      /* Two arbitrary multibyte strings: we cannot use memcmp because
>> +	 the encoding for raw bytes would sort those between U+007F and U+0080
>> +	 which isn't where we want them.
>> +	 Instead, we skip the longest common prefix and look at
>> +	 what follows.  */
> 
> I don't think I understand this; please elaborate.  Didn't you say
> that we never need to look beyond the first unequal byte?  Then why
> does the order of raw bytes matter here?

The comment explains why memcmp cannot be used to compare arbitrary multibyte strings and it's exactly as it says: a bytewise comparison would not produce the same order as string-lessp has used in the past because of how we encode raw bytes, that's all.

> Are you sure about the alignment?

Actually I had asked someone about that before and received the answer that string data alignment was guaranteed, and a semi-thorough reading of the code seemed to confirm this -- normal allocation ensures alignment via struct sdata (q.v.) and while AUTO_STRING does not, it only makes unibyte strings which do not concern us in the code path in question.

Of course I was wrong! String data from purespace can be unaligned even for multibyte. Thanks for making me take another look. (Of course, angry SPARC users would have let me know eventually.)

Rather than attempting to find and plug all cases where unaligned strings are produced, this part of the optimisation has now been restricted to platforms where unaligned accesses are safe using a architecture whitelist.

It may still be a good idea to ensure aligned allocation since it allows for more vectorisation of string operations but then again, most commonly used architectures handle unaligned accesses well.

> why no tests for this?

`string-lessp` has much better test coverage than what is typical for Emacs primitives (it had basically none until I added some a while ago) but there are now a few supplementary cases.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
  2022-10-08 16:49 ` Mattias Engdegård
@ 2022-10-08 17:40   ` Stefan Monnier
  2022-10-09  8:42     ` Mattias Engdegård
  2022-10-08 18:25   ` Eli Zaretskii
  1 sibling, 1 reply; 6+ messages in thread
From: Stefan Monnier @ 2022-10-08 17:40 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Eli Zaretskii, emacs-devel

> Of course I was wrong! String data from purespace can be unaligned even for
> multibyte. Thanks for making me take another look. (Of course, angry SPARC
> users would have let me know eventually.)

[ Cue the "kill the purespace" song.  ]

> Rather than attempting to find and plug all cases where unaligned strings
> are produced, this part of the optimisation has now been restricted to
> platforms where unaligned accesses are safe using a architecture whitelist.

I haven't looked in detail at this code, but this reminds me of the code
I wrote to compute the hash of a string (`hash_string` in fns.c), where
I similarly want to use word operations rather than manipulate
individual bytes.  I ended up using `memcpy` which the compiler
helpfully turns into plain word-sized loads.  So we get code without
alignment or architecture assumptions and efficient code (even on
architectures that don't allow unaligned loads since the compiler can
still produce more efficient code than a byte-by-byte loop).

> It may still be a good idea to ensure aligned allocation since it allows for
> more vectorisation of string operations but then again, most commonly used
> architectures handle unaligned accesses well.

[ Over on comp.arch the general mood is that not supporting unaligned
  loads natively is a ridiculous mistake because it's so cheap to
  implement (and the software workarounds are much more costly in
  comparison).  ]

        Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
  2022-10-08 16:49 ` Mattias Engdegård
  2022-10-08 17:40   ` Stefan Monnier
@ 2022-10-08 18:25   ` Eli Zaretskii
  2022-10-08 19:01     ` Mattias Engdegård
  1 sibling, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2022-10-08 18:25 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: emacs-devel

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 8 Oct 2022 18:49:11 +0200
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> 7 okt. 2022 kl. 21.25 skrev Eli Zaretskii <eliz@gnu.org>:
> > 
> >> +      /* Two arbitrary multibyte strings: we cannot use memcmp because
> >> +	 the encoding for raw bytes would sort those between U+007F and U+0080
> >> +	 which isn't where we want them.
> >> +	 Instead, we skip the longest common prefix and look at
> >> +	 what follows.  */
> > 
> > I don't think I understand this; please elaborate.  Didn't you say
> > that we never need to look beyond the first unequal byte?  Then why
> > does the order of raw bytes matter here?
> 
> The comment explains why memcmp cannot be used to compare arbitrary multibyte strings and it's exactly as it says: a bytewise comparison would not produce the same order as string-lessp has used in the past because of how we encode raw bytes, that's all.

As long as memcmp reports equality, we don't care, and once it reports
inequality, you can examine the first unequal bytes "by hand".  Right?
So I still don't understand the comment and how it led you to the
conclusion.

I also asked about memmem -- did you consider using that?

> > Are you sure about the alignment?
> 
> Actually I had asked someone about that before and received the answer that string data alignment was guaranteed, and a semi-thorough reading of the code seemed to confirm this -- normal allocation ensures alignment via struct sdata (q.v.) and while AUTO_STRING does not, it only makes unibyte strings which do not concern us in the code path in question.

AFAIU, AUTO_STRING can also generate stack-allocated multibyte strings.

> > why no tests for this?
> 
> `string-lessp` has much better test coverage than what is typical for Emacs primitives

For non-ASCII strings?



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
  2022-10-08 18:25   ` Eli Zaretskii
@ 2022-10-08 19:01     ` Mattias Engdegård
  0 siblings, 0 replies; 6+ messages in thread
From: Mattias Engdegård @ 2022-10-08 19:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

8 okt. 2022 kl. 20.25 skrev Eli Zaretskii <eliz@gnu.org>:

> As long as memcmp reports equality, we don't care, and once it reports
> inequality, you can examine the first unequal bytes "by hand".  Right?

Unfortunately memcmp doesn't tell us the location of the inequality.

> I also asked about memmem -- did you consider using that?

No, it wouldn't help in any way.

> AUTO_STRING can also generate stack-allocated multibyte strings.

Maybe, but it doesn't look like it. Not that it matters much now.
(No fan of AUTO_STRING myself although I understand why it's used now and then.)

>> `string-lessp` has much better test coverage than what is typical for Emacs primitives
> 
> For non-ASCII strings?

Yes.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
  2022-10-08 17:40   ` Stefan Monnier
@ 2022-10-09  8:42     ` Mattias Engdegård
  0 siblings, 0 replies; 6+ messages in thread
From: Mattias Engdegård @ 2022-10-09  8:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

8 okt. 2022 kl. 19.40 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> I ended up using `memcpy` which the compiler
> helpfully turns into plain word-sized loads.  So we get code without
> alignment or architecture assumptions and efficient code (even on
> architectures that don't allow unaligned loads since the compiler can
> still produce more efficient code than a byte-by-byte loop).

Yes, I considered memcpy but was worried that compilers would generate poor code (maybe a library call) on some platforms making a mockery of what was intended as an optimisation. (memcpy scores fractionally better on the C undefined-behaviour scale but I'm not overly worried.)
I may yet change my mind.

> [ Over on comp.arch the general mood is that not supporting unaligned
>  loads natively is a ridiculous mistake because it's so cheap to
>  implement (and the software workarounds are much more costly in
>  comparison).  ]

There's lots of merit in that, especially for code parsing network protocols where packets in nested layers appear inside and next to each other so that it's impossible to avoid at least one of them being unaligned no matter how well your frames are laid out. Systems people tend to like unaligned-friendly circuitry.

Naturally the standard-bearers of ultra-modern architecture, x86[-64] and s390[x], allow unaligned access!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-10-09  8:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-07 19:25 master def6fa4246 2/2: Speed up string-lessp for multibyte strings Eli Zaretskii
2022-10-08 16:49 ` Mattias Engdegård
2022-10-08 17:40   ` Stefan Monnier
2022-10-09  8:42     ` Mattias Engdegård
2022-10-08 18:25   ` Eli Zaretskii
2022-10-08 19:01     ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).