From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Thu, 6 Oct 2022 11:05:51 +0200
Message-ID: <52286A5C-D947-4279-812E-173BB44046E1@gmail.com>
To: Eli Zaretskii
Cc: 58168@debbugs.gnu.org, larsi@gnus.org

On 4 Oct 2022, at 18.24, Eli Zaretskii wrote:

>> This treats unibyte format strings as if they were Latin-1 for the purpose of the error message.
>
> No, it doesn't.
> It shows the problematic characters as raw bytes, as
> in "%\200" (where \200 is a single character).  If you see something
> different, please show the recipe.

  (format-message "%\345" 0)
  => (error "Invalid format operation %å")

where the format string is a unibyte string of two bytes, % and 0xE5, yet the error treats it as the Latin-1 character å. In fact, (format-message "%å" 0) yields the same error string.

>> Not very important, of course, but maybe there should be a UNIBYTE_TO_CHAR in the alternative branch?
>
> No, that would show the multibyte codepoint, and will confuse users,
> because the result would look very different from the problematic
> format spec in this case.

Yes, that's probably right. I suppose the right solution is something like:

  /* p points at the offending conversion character.  */
  unsigned char *p = (unsigned char *) format - 1;
  if (multibyte_format)
    error ("Invalid format operation %%%c", STRING_CHAR (p));
  else
    /* Show non-ASCII bytes by value, not as Latin-1 characters.  */
    error (*p <= 127 ? "Invalid format operation %%%c"
                     : "Invalid format operation char 0x%02x",
           *p);

but perhaps it's a rare error not worth the trouble. (If we don't bother changing it, a short comment noting that we are aware of the glitch may be a good idea.)

> Who said anything about #x3fffc?  The original code had #xfc, the
> unibyte code for #x3ffffc.

There seems to be a misunderstanding. The original (and current) code attempts to display char #x3fffc, which is not a raw byte. It's just a typo for #x3ffffc -- not a big deal.

Of course I could have retained 3fffc under a different label, but everyone else reading the test would just assume it was a typo for 3ffffc, since 3fffc itself is not very interesting. I replaced it with 10abcd, a wide Unicode value deliberately chosen to look arbitrary. We could use another value if you prefer.

> I don't see why we shouldn't test both.
> In the other problematic hunk you replaced \777774 with \374 -- why?
3fffc in octal is 777774; when changed to 3ffffc it becomes a raw byte, = fc, displayed as \374.