From: Vasilij Schneidermann
Newsgroups: gmane.emacs.bugs
Subject: bug#27270: display-raw-bytes-as-hex generates ambiguous output for Emacs strings
Date: Sun, 24 Apr 2022 12:51:58 +0200
To: Andreas Schwab
Cc: Lars Ingebrigtsen, Paul Eggert, 27270@debbugs.gnu.org, npostavs@users.sourceforge.net
In-Reply-To: <87sfq293q2.fsf@igel.home>

> You need to use a wide string:
>
> wcslen(L"\x1234")
>
>> std::string("\x1234").length() // C++: compilation error
>
> Likewise:
>
> std::wstring(L"\x1234").length()

Thank you for pointing this out.
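The point about wide literals can be checked with a short C sketch. The function names are illustrative and do not come from the thread. It shows that L"\x1234" is a one-character wide string. It also shows the usual C way to stop a hex escape from consuming the digits that follow it: end the literal and rely on adjacent-literal concatenation. That is a standard C idiom, not something proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* In a wide literal, "\x1234" is a single hex escape, so the string holds
   one wchar_t with value 0x1234 (assumes wchar_t is at least 16 bits,
   as on common platforms). */
size_t wide_literal_length(void)
{
    return wcslen(L"\x1234");
}

/* In a plain literal, 0x1234 does not fit in a char, and the C standard
   requires a diagnostic for such an escape.  To get byte 0x12 followed by
   the digits '3' and '4', close the literal after the escape: escapes are
   processed before adjacent literals are concatenated. */
size_t split_narrow_literal_length(void)
{
    return strlen("\x12" "34");
}

/* Self-check of the two claims above. */
void check_literal_lengths(void)
{
    assert(wide_literal_length() == 1);
    assert(L"\x1234"[0] == 0x1234);
    assert(split_narrow_literal_length() == 3);
}
```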
Taking this into account, there are three camps:

- Languages where "\x1234" is always one character (Emacs Lisp)
- Languages where "\x1234" is an error, but can become one character if you opt in with wide string literals (C, C++)
- Languages where "\x1234" is always several characters (everything else under the sun)

I propose that Emacs Lisp move into camp 3. There is little point in moving to camp 2, because that would require new syntax for a rarely used feature. As the bug report shows, the current behavior is a footgun. For the rare case where one truly wants a character above #xFF, we already have syntax based on Unicode names and code points (\N{...}, \u, \U).

This would require amending point 3 of `(info "(elisp) General Escape Syntax")`. As with old-style backquotes, a warning could be emitted when a string contains a hex escape with more than two digits.

I've checked the Emacs sources for such hex escapes. The only file I found using them is org-entities.el, which represents the non-breaking space (nbsp) this way, so breakage should be limited.

If there is interest, I could extend the survey to cover whether character syntax (such as ?\x1234) is, or should be, affected the same way, and/or to include more languages.
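To make the difference between camp 1 and camp 3 concrete, here is a minimal C sketch. It counts how many characters the body of a string literal denotes under either rule. It also counts the escapes that change meaning between the two rules, which are the ones a warning would need to flag. This is a simplified model and not Emacs's actual reader:

- Only \x is interpreted.
- Any other backslash escape is treated as one character spelled with two bytes.
- \N{...}, \u, octal escapes and the backslash-space terminator are not handled.

The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the run of hex digits at s, stopping after max_digits
   digits when max_digits > 0 (0 means no limit). */
static size_t hex_run(const char *s, size_t max_digits)
{
    size_t n = 0;
    while (isxdigit((unsigned char)s[n]) && (max_digits == 0 || n < max_digits))
        n++;
    return n;
}

/* Return how many characters a literal body (the text between the quotes)
   denotes under a simplified model.
   max_hex_digits == 0: \x takes every following hex digit (camp 1, current Emacs Lisp).
   max_hex_digits == 2: \x takes at most two digits (camp 3, the proposal).
   If long_escapes is non-NULL, it receives the number of \x escapes followed
   by more than two hex digits, i.e. those whose meaning differs between
   the two rules. */
size_t literal_length(const char *body, size_t max_hex_digits, size_t *long_escapes)
{
    size_t count = 0, flagged = 0;
    assert(body != NULL);
    while (*body != '\0') {
        if (body[0] == '\\' && body[1] == 'x') {
            if (hex_run(body + 2, 0) > 2)
                flagged++;
            body += 2 + hex_run(body + 2, max_hex_digits);
        } else if (body[0] == '\\' && body[1] != '\0') {
            body += 2; /* other escape, e.g. \\ or \n: one character */
        } else {
            body++;    /* ordinary byte */
        }
        count++;
    }
    if (long_escapes != NULL)
        *long_escapes = flagged;
    return count;
}
```

Under the two-digit rule, the literal body \x1234 denotes three characters (U+0012, '3', '4'), as it does in Python. The escape is also counted as one that a migration warning would report. A body such as \xa0 keeps its meaning and is not flagged.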