From: Eli Zaretskii <eliz@gnu.org>
To: "Mattias Engdegård" <mattias.engdegard@gmail.com>
Cc: 58168@debbugs.gnu.org
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Sat, 08 Oct 2022 10:35:05 +0300
Message-ID: <83lepqlqdy.fsf@gnu.org>
In-Reply-To: <C474D5DC-8D52-4A29-BF8F-B6FB26CBE9B7@gmail.com> (message from Mattias Engdegård on Fri, 7 Oct 2022 16:23:26 +0200)

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 7 Oct 2022 16:23:26 +0200
> Cc: 58168@debbugs.gnu.org
> 
> On 6 Oct 2022, at 13:06, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > Cf. NaN comparisons with numerical values.
> 
> Emacs strings are completely different from floats and NaNs in just about every respect; no meaningful parallels can be drawn. (And do believe me when I say that we should be thankful for that.)

I'm totally aware that NaNs and unibyte strings are completely
different beasts, believe me.  I was just pointing out another
widespread case where comparison results are surprising and order is
not defined.  My point is that it isn't an unimaginable situation.

> > You missed me here.  Why are you suddenly talking about mismatches?
> > And if only mismatches matter here, why is it a problem to use memchr
> > in the first place?
> 
> Any lexicographic comparison is a matter of finding the first point of difference, then interpreting the difference at that point. `memchr` does not help with that, nor does `memcmp` unless we are doing a bytewise string comparison.

We are miscommunicating, because you remove too much of the previous
context.  I suggested using memchr to find whether a string has any
C0 or C1 bytes, _before_ doing the actual comparison, to find out
whether a multibyte string includes any raw bytes, which would then
require slower comparisons.  If there are no C0/C1 bytes, you could
use memcmp, which is always faster than the hand-made word-wise
comparison we have there now.
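
A minimal sketch of that pre-check, not the actual Emacs code.  It
assumes that raw bytes in a multibyte string are stored as two-byte
sequences whose first byte is 0xC0 or 0xC1, and that in their absence
bytewise order agrees with character order.  The names
has_raw_byte_lead and try_fast_less are hypothetical, and the slower
comparison is left to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the N bytes at S contain a 0xC0 or 0xC1 byte, which
   under the assumption above marks an encoded raw byte.  */
bool
has_raw_byte_lead (const unsigned char *s, size_t n)
{
  return memchr (s, 0xC0, n) != NULL || memchr (s, 0xC1, n) != NULL;
}

/* Try the memcmp fast path.  If neither string holds a raw byte,
   store in *LESS whether A sorts before B and return true.
   Otherwise return false without touching *LESS, so the caller can
   fall back to a slower character-wise comparison.  */
bool
try_fast_less (const unsigned char *a, size_t na,
               const unsigned char *b, size_t nb, bool *less)
{
  assert (less != NULL);
  if (has_raw_byte_lead (a, na) || has_raw_byte_lead (b, nb))
    return false;

  size_t n = na < nb ? na : nb;
  int c = memcmp (a, b, n);
  /* On a common prefix, the shorter string sorts first.  */
  *less = c < 0 || (c == 0 && na < nb);
  return true;
}
```

The two memchr scans cost an extra pass over each string, so whether
this wins overall would have to be measured on realistic inputs.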

I also suggested trying memmem as yet another possibility -- not sure
up front whether it can be faster in cases that matter.

> Similar improvements could be made to the comparison between unibyte and non-ASCII multibyte strings. These are less common and not quite as slow; I haven't made up my mind about whether it's worth the trouble.

I don't think it's worth the trouble.

> In any case, the situation is now better than it was before the bug was opened: string< is faster and the remaining problems have at least been charted, whether or not an agreement to remedy them can be reached. Let's be happy about this!

This is me being happy.
