unofficial mirror of emacs-devel@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: emacs-devel@gnu.org
Subject: Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
Date: Sat, 08 Oct 2022 21:25:29 +0300
Message-ID: <83sfjyjhpi.fsf@gnu.org>
In-Reply-To: <069A384D-4D27-4787-B6BE-84B43FBDF952@acm.org> (message from Mattias Engdegård on Sat, 8 Oct 2022 18:49:11 +0200)

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 8 Oct 2022 18:49:11 +0200
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> On 7 Oct 2022 at 21:25, Eli Zaretskii <eliz@gnu.org> wrote:
> > 
> >> +      /* Two arbitrary multibyte strings: we cannot use memcmp because
> >> +	 the encoding for raw bytes would sort those between U+007F and U+0080
> >> +	 which isn't where we want them.
> >> +	 Instead, we skip the longest common prefix and look at
> >> +	 what follows.  */
> > 
> > I don't think I understand this; please elaborate.  Didn't you say
> > that we never need to look beyond the first unequal byte?  Then why
> > does the order of raw bytes matter here?
> 
> The comment explains why memcmp cannot be used to compare arbitrary multibyte strings and it's exactly as it says: a bytewise comparison would not produce the same order as string-lessp has used in the past because of how we encode raw bytes, that's all.

As long as memcmp reports equality, we don't care, and once it reports
inequality, you can examine the first unequal bytes "by hand".  Right?
So I still don't understand the comment and how it led you to the
conclusion.
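The approach suggested above (let a bytewise comparison skip the equal prefix, then decode and compare only the first differing characters) can be sketched in C. This is a minimal sketch, not the code from commit def6fa4246. It assumes well-formed input in a simplified model of the internal encoding: ASCII, 2- to 4-byte UTF-8 sequences, and the two-byte raw-byte forms with lead byte 0xC0 or 0xC1, with raw bytes given the character codes 0x3FFF80..0x3FFFFF. Emacs's 5-byte forms are not handled. The names `decode_char` and `multibyte_compare` and the 64-byte chunk size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode the character starting at P (AVAIL bytes available).
   P must point at a lead byte of a well-formed sequence.  */
static int
decode_char (const unsigned char *p, size_t avail)
{
  assert (avail >= 1);
  unsigned char b = p[0];
  if (b < 0x80)
    return b;                          /* ASCII */
  if (b < 0xC2)
    {
      /* 0xC0/0xC1 + continuation: raw byte 0x80..0xFF.  */
      assert (b >= 0xC0 && avail >= 2);
      int byte = 0x80 | ((b & 1) << 6) | (p[1] & 0x3F);
      return byte + 0x3FFF00;          /* 0x3FFF80..0x3FFFFF */
    }
  if (b < 0xE0)
    {
      assert (avail >= 2);
      return ((b & 0x1F) << 6) | (p[1] & 0x3F);
    }
  if (b < 0xF0)
    {
      assert (avail >= 3);
      return ((b & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    }
  assert (b < 0xF8 && avail >= 4);     /* 5-byte forms not handled */
  return ((b & 0x07) << 18) | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

/* Compare by character code: negative, zero or positive as S1 sorts
   before, equal to or after S2.  */
static int
multibyte_compare (const unsigned char *s1, size_t n1,
                   const unsigned char *s2, size_t n2)
{
  enum { CHUNK = 64 };                 /* arbitrary chunk size */
  size_t n = n1 < n2 ? n1 : n2;
  size_t i = 0;

  /* While memcmp reports equality, the encoding does not matter.  */
  while (n - i >= CHUNK && memcmp (s1 + i, s2 + i, CHUNK) == 0)
    i += CHUNK;
  /* Locate the first unequal byte.  */
  while (i < n && s1[i] == s2[i])
    i++;
  if (i == n)                          /* one string is a prefix */
    return (n1 > n2) - (n1 < n2);

  /* The prefix is identical, so character boundaries agree; if byte I
     is inside a character, it is a continuation byte in both strings.
     Back up to the lead byte of that character.  */
  while (i > 0 && (s1[i] & 0xC0) == 0x80)
    i--;

  int c1 = decode_char (s1 + i, n1 - i);
  int c2 = decode_char (s2 + i, n2 - i);
  return (c1 > c2) - (c1 < c2);
}
```

With these definitions, raw byte 0x80 (encoded C0 80) sorts before U+00E9 (C3 A9) under memcmp, because 0xC0 < 0xC3, but after it under `multibyte_compare`, because 0x3FFF80 > 0xE9. That is the discrepancy the patch comment describes, and it only affects the first differing characters.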

I also asked about memmem -- did you consider using that?

> > Are you sure about the alignment?
> 
> Actually I had asked someone about that before and received the answer that string data alignment was guaranteed, and a semi-thorough reading of the code seemed to confirm this -- normal allocation ensures alignment via struct sdata (q.v.) and while AUTO_STRING does not, it only makes unibyte strings which do not concern us in the code path in question.

AFAIU, AUTO_STRING can also generate stack-allocated multibyte strings.
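If the fast path loads a machine word at a time (an assumption here, since the patch code is not quoted in this message), one way to avoid depending on string data alignment at all, including for stack-allocated strings, is to load each word through memcpy. Compilers typically turn such a fixed-size memcpy into a single load where the target permits unaligned access. The function name `common_prefix_length` is hypothetical; this is a minimal sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the length of the longest common prefix of the first N bytes
   of A and B.  Words are loaded with memcpy, so neither pointer needs
   any particular alignment.  */
static size_t
common_prefix_length (const unsigned char *a, const unsigned char *b,
                      size_t n)
{
  size_t i = 0;
  while (n - i >= sizeof (uintptr_t))
    {
      uintptr_t wa, wb;
      memcpy (&wa, a + i, sizeof wa);
      memcpy (&wb, b + i, sizeof wb);
      if (wa != wb)
        break;                         /* mismatch is inside this word */
      i += sizeof wa;
    }
  /* Finish byte by byte to find the exact first mismatch.  */
  while (i < n && a[i] == b[i])
    i++;
  return i;
}
```

The word loop only decides where to stop skipping; the byte loop then pins down the first unequal byte, so the result does not depend on byte order.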

> > why no tests for this?
> 
> `string-lessp` has much better test coverage than what is typical for Emacs primitives

For non-ASCII strings?



Thread overview: 6+ messages
2022-10-07 19:25 master def6fa4246 2/2: Speed up string-lessp for multibyte strings Eli Zaretskii
2022-10-08 16:49 ` Mattias Engdegård
2022-10-08 17:40   ` Stefan Monnier
2022-10-09  8:42     ` Mattias Engdegård
2022-10-08 18:25   ` Eli Zaretskii [this message]
2022-10-08 19:01     ` Mattias Engdegård
