From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Mon, 3 Oct 2022 21:48:14 +0200
To: Eli Zaretskii
Cc: 58168@debbugs.gnu.org
On 2 Oct 2022, at 07:36, Eli Zaretskii wrote:

>> Comparison between objects is not only useful when someone cares about their order, as in presenting a sorted list to the user. Often what is important is an ability to impose an order, preferably total, for use in building and searching data structures. I came across this bug when implementing a string set.
>
> Always converting to multibyte handles this case, doesn't it?

I don't think it does: string= treats raw bytes in unibyte and multibyte strings as distinct, so converting to multibyte does not preserve (in)equality.

>> Actually I was talking about multibyte-multibyte comparisons.
>
> Then why did you mention raw bytes? their multibyte representation
> presents no performance problems

In a way they do: because of how raw bytes are represented (their lead byte is C0 or C1), memcmp sorts them between U+007F and U+0080. If we accept that order, comparisons become fast, since memcmp compares many characters per data-dependent branch. The current code needs several data-dependent branches for each character.

We could probably bring down the comparison cost slightly by clever hand-coding, but the result is unlikely to come anywhere near memcmp's speed, and it would be much messier. Since users are unlikely to care much about how raw bytes are ordered relative to other characters (as long as there is an order), accepting the memcmp order would be a cheap way to improve performance while also fixing the string< / string= mismatch.

> You can compare under the assumption that a unibyte string is
> pure-ASCII until you bump into the first non-ASCII one. If that
> happens, abandon the comparison, convert the unibyte string to its
> multibyte representation, and compare again.

I don't quite see how that would improve performance, but I may be missing something.
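To make the raw-byte ordering concrete, here is a minimal C sketch. It assumes that a raw byte B (0x80..0xFF) is stored in a multibyte string as two bytes, 0xC0 | ((B >> 6) & 1) followed by 0x80 | (B & 0x3F), and that its character code is 0x3FFF00 + B. This is my reading of BYTE8_STRING and BYTE8_TO_CHAR in Emacs's src/character.h. The function names below are hypothetical, and bytewise_compare is only the kind of memcmp fast path discussed above, not Emacs's actual string-lessp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the assumed multibyte form of raw byte B
   (0x80..0xFF) into P.  The lead byte is 0xC0 for B < 0xC0 and 0xC1
   otherwise.  Returns the number of bytes written.  */
int encode_raw_byte (unsigned char b, unsigned char *p)
{
  p[0] = 0xC0 | ((b >> 6) & 0x01);
  p[1] = 0x80 | (b & 0x3F);
  return 2;
}

/* Hypothetical helper: the assumed character code of raw byte B,
   0x3FFF00 + B, which lies above every Unicode code point.  */
int raw_byte_char (unsigned char b)
{
  return 0x3FFF00 + b;
}

/* memcmp-style lexicographic comparison of two byte sequences:
   returns negative, zero or positive; a proper prefix sorts first.  */
int bytewise_compare (const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  int c = memcmp (a, b, n);
  if (c != 0)
    return c;
  return (alen > blen) - (alen < blen);
}
```

Under these assumptions, comparing bytes ranks U+007F (0x7F) before raw byte #x80 (C0 80), and ranks every raw byte before U+0080 (C2 80). Comparing by character code instead puts raw bytes after all Unicode characters. For ordinary characters the two orders agree, because UTF-8 byte order matches code-point order. Raw bytes are the only place where they diverge.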