From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Sat, 08 Oct 2022 10:35:05 +0300
Message-ID: <83lepqlqdy.fsf@gnu.org>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@gmail.com> <83czbef6le.fsf@gnu.org> <6CB805F6-89EE-4D7C-A398-F29698733A42@gmail.com> <83h70oce4k.fsf@gnu.org> <83tu4mais1.fsf@gnu.org> <83wn9gw2sp.fsf@gnu.org> <83wn9dp5xp.fsf@gnu.org>
To: Mattias Engdegård
Cc: 58168@debbugs.gnu.org
In-Reply-To: (message from Mattias Engdegård on Fri, 7 Oct 2022 16:23:26 +0200)

> From: Mattias Engdegård
> Date: Fri, 7 Oct 2022 16:23:26 +0200
> Cc: 58168@debbugs.gnu.org
>
> On 6 Oct 2022 at 13:06, Eli Zaretskii wrote:
>
> > Cf. NaN comparisons with numerical values.
>
> Emacs strings are completely different from floats and NaNs in just
> about every respect; no meaningful parallels can be drawn. (And do
> believe me when I say that we should be thankful for that.)

I'm totally aware that NaNs and unibyte strings are completely
different beasts, believe me. I was just pointing out another
widespread case where comparison results are surprising and order is
not defined. My point is that it isn't an unimaginable situation.

> > You missed me here. Why are you suddenly talking about mismatches?
> > And if only mismatches matter here, why is it a problem to use memchr
> > in the first place?
>
> Any lexicographic comparison is a matter of finding the first point
> of difference, then interpreting the difference at that point.
> `memchr` does not help with that, nor does `memcmp` unless we are
> doing a bytewise string comparison.

We are miscommunicating, because you remove too much of the previous
context. I suggested using memchr to find whether a string has any C0
or C1 bytes, _before_ doing the actual comparison, to find out whether
a multibyte string includes any raw bytes, which would then require
slower comparisons.
If there are no C0/C1 bytes, you could use memcmp, which is always
faster than the hand-made word-wise comparison we have there now. I
also suggested trying memmem as yet another possibility -- not sure up
front whether it can be faster in cases that matter.

> Similar improvements could be made to the comparison between unibyte
> and non-ASCII multibyte strings. These are less common and not quite
> as slow; I haven't made up my mind about whether it's worth the
> trouble.

I don't think it's worth the trouble.

> In any case, the situation is now better than it was before the bug
> was opened: string< is faster and the remaining problems have at
> least been charted, whether or not an agreement to remedy them can be
> reached. Let's be happy about this!

This is me being happy.
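
For illustration only, the memchr pre-check suggested above could be sketched roughly as follows. This is not code from the Emacs sources. It assumes that "C0/C1 bytes" means the byte values 0xC0 and 0xC1, taken here to be the lead bytes that mark raw bytes inside a multibyte string, and that without them a bytewise comparison gives the same order as a character-wise one. The names `fast_cmp`, `has_raw_byte_lead` and `fast_string_lessp` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Outcome of the fast path.  Hypothetical names, not from Emacs.  */
enum fast_cmp
{
  FAST_CMP_NEEDS_SLOW_PATH,	/* raw bytes present; caller must fall back */
  FAST_CMP_LESS,		/* A sorts strictly before B */
  FAST_CMP_NOT_LESS		/* A is equal to B or sorts after it */
};

/* Return true if BUF contains a 0xC0 or 0xC1 byte.  Assumption: these
   are the lead bytes that encode raw bytes in a multibyte string.  */
static bool
has_raw_byte_lead (const unsigned char *buf, size_t len)
{
  return (memchr (buf, 0xC0, len) != NULL
	  || memchr (buf, 0xC1, len) != NULL);
}

/* Try to decide "A < B" with memcmp.  Gives up when either string
   contains a raw-byte lead, since bytewise order would then differ
   from character order.  */
static enum fast_cmp
fast_string_lessp (const unsigned char *a, size_t alen,
		   const unsigned char *b, size_t blen)
{
  if (has_raw_byte_lead (a, alen) || has_raw_byte_lead (b, blen))
    return FAST_CMP_NEEDS_SLOW_PATH;

  size_t n = alen < blen ? alen : blen;
  int c = memcmp (a, b, n);
  if (c != 0)
    return c < 0 ? FAST_CMP_LESS : FAST_CMP_NOT_LESS;

  /* Common prefix is identical: the shorter string sorts first.  */
  return alen < blen ? FAST_CMP_LESS : FAST_CMP_NOT_LESS;
}
```

A caller would run the existing character-by-character comparison whenever the result is FAST_CMP_NEEDS_SLOW_PATH. The pre-check costs up to two extra scans per string, so whether it is a net win over the current code would have to be measured.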