all messages for Emacs-related lists mirrored at yhetil.org
From: Vasilij Schneidermann <v.schneidermann@gmail.com>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Lars Ingebrigtsen <larsi@gnus.org>,
	27270@debbugs.gnu.org, npostavs@users.sourceforge.net
Subject: bug#27270: display-raw-bytes-as-hex generates ambiguous output for Emacs strings
Date: Sun, 24 Apr 2022 11:56:04 +0200
Message-ID: <CAPGgwWRbGyBTzFQxy1MY_+BfjupPH+ox1B76AuTLBvrVTupBBQ@mail.gmail.com>
In-Reply-To: <d6acc39e-d6c9-1b69-2583-283e6428b38b@cs.ucla.edu>

> > I tend to think that introducing a new syntax just to fix it
> > isn't worth it.
>
> That's fine, so let's fix the problem as originally suggested. That is,
> display the string returned by (format "%c%c" #x9e #x66) as "\x9e\x66"
> (equivalent to (concat "\x9e" "\x66") which is correct) instead of as
> "\x9ef" (equivalent to "\N{BENGALI DIGIT NINE}" which is wrong).
>
> This fixes the problem and doesn't introduce new syntax.

Wait, hold up. Under which conditions exactly does the bug happen? If I
use GUI Emacs, thanks to font-lock it's pretty obvious that the output
is three bytes, the first one displayed using the hex escape syntax and
the remaining two bytes using hex letters.  If I copy-paste those into
another GUI Emacs, it's still the same three bytes. I don't know about
terminal Emacs, but trying to work around terminals being bad doesn't
seem worth the extra effort.

Besides, supposing it is worth it, what exactly should the logic be here?
Detect whether there's a preceding hex-escaped byte and, if so, also
escape any directly following character that would otherwise be read as
a hex digit? That seems too involved for something run in redisplay.

The other proposed alternative, tightening up the read syntax, seems
backward-incompatible, but saner to me overall. Emacs Lisp is the odd
one out here anyway. Apart from Emacs Lisp, only C and C++ treat such
escapes as possibly longer than two hex digits, and both reject the
example below with a compilation error for me.

    len("\x1234") # Python, Go: 3

    "\x1234".length # Ruby, JavaScript: 3

    length("\x1234") # Perl: 3

    (string-length "\x1234") ; Guile, Racket, CHICKEN: 3

    ;; Common Lisp absent because it lacks a lot of string escapes and
    ;; using FORMAT neatly sidesteps these issues

    ;; Clojure only has octal/unicode string escapes
    (count (seq "\u12345678")) ; Clojure: 5

    (length "\x1234") ; Emacs Lisp: 1

    strlen("\x1234") /* C: compilation error */

    std::string("\x1234").length() // C++: compilation error

    "\x1234".len() // Rust: 3
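The differences above come down to how many hex digits `\x` is allowed to consume. The Python sketch below models just that one rule with two hypothetical mini-readers (`read_greedy` and `read_two_digit` are made-up names, and every other escape is ignored): the greedy one behaves like the Emacs Lisp row, the two-digit one like the Python, Go, and JavaScript rows. It also reproduces the example from the quoted message, where "\x9ef" is either U+09EF BENGALI DIGIT NINE or "\x9e" followed by "f".

```python
import re

# Hypothetical mini-readers for the "\x" escape only.  The rest of the
# literal body is taken verbatim; escaped backslashes are not handled.


def read_greedy(body: str) -> str:
    """Emacs Lisp-style: \\x consumes every following hex digit."""
    return re.sub(r"\\x([0-9A-Fa-f]+)",
                  lambda m: chr(int(m.group(1), 16)), body)


def read_two_digit(body: str) -> str:
    """Python/Go/JavaScript-style: \\x consumes exactly two hex digits."""
    return re.sub(r"\\x([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), body)


print(len(read_greedy(r"\x1234")))     # 1
print(len(read_two_digit(r"\x1234")))  # 3
print(read_greedy(r"\x9ef") == "\u09ef")             # True
print(read_two_digit(r"\x9ef") == "\x9e" + "f")      # True
print(read_greedy(r"\x9e\x66") == "\x9e" + "f")      # True
```

The last line shows why the "\x9e\x66" output proposed earlier round-trips even under the greedy rule: the second backslash ends the first escape.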

Before deciding on such a change, someone should check whether anything
could actually break because of it. That means looking for code with
long hex escapes in strings, whether written by hand (unlikely, since
people use either raw bytes or Unicode escapes in strings) or generated
automatically (I can't comment on that; maybe the byte-code compiler
emits such code?). If nothing turns up, this would be an obvious
candidate for the next major release of Emacs.
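One rough way to start that check for hand-written code is to search existing Elisp sources for `\x` followed by three or more hex digits. The Python sketch below is a heuristic, not a reader: the regular expression, the function names `find_long_hex_escapes` and `scan_tree`, and the choice to scan only `*.el` files are all assumptions. Every hit may sit in a comment or a `?\x...` character literal rather than a string, so it needs a human to look at it, and the sketch says nothing about code produced by the byte compiler or at run time.

```python
import re
from pathlib import Path

# Heuristic only: "\x" plus 3+ hex digits, preceded by an even number of
# backslashes (so an escaped backslash followed by "x123" is not flagged).
# It cannot tell strings from comments or ?\x character literals.
LONG_HEX_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\x[0-9A-Fa-f]{3,}")


def find_long_hex_escapes(text: str):
    """Yield (line_number, matched_text) for each suspicious escape."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in LONG_HEX_ESCAPE.finditer(line):
            yield lineno, m.group(0)


def scan_tree(root: str) -> None:
    """Print every suspicious escape in *.el files under ROOT (hypothetical entry point)."""
    for path in Path(root).rglob("*.el"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, match in find_long_hex_escapes(text):
            print(f"{path}:{lineno}: {match}")
```

For example, `scan_tree("path/to/emacs/lisp")` (a placeholder path) would list candidates for manual review; an empty result would be weak evidence, not proof, that the change is safe.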

On Sun, Apr 24, 2022 at 9:10 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 4/23/22 07:00, Lars Ingebrigtsen wrote:
> > we've had this format for half a decade now, and this doesn't
> > really seem to be a problem in practice
>
> Not surprising, since most people don't set display-raw-bytes-as-hex.
> But that doesn't mean it's not a problem. Quoting bugs can be issues
> even if they're unlikely to occur at random. (Think SQL injection. :-)

Thread overview: 39+ messages
2017-06-07  3:57 bug#27270: display-raw-bytes-as-hex generates ambiguous output for Emacs strings Paul Eggert
2017-06-07  5:17 ` Eli Zaretskii
2017-06-08  0:49   ` Paul Eggert
2017-06-08  1:07     ` npostavs
2017-06-08 15:20       ` Eli Zaretskii
2017-06-08 15:56       ` Paul Eggert
2017-06-08 16:11         ` Eli Zaretskii
2017-06-08 16:24           ` Paul Eggert
2017-06-08 18:59             ` Eli Zaretskii
2017-06-08 19:43               ` Paul Eggert
2017-06-08 19:56                 ` Eli Zaretskii
2017-06-08 20:35                   ` Paul Eggert
2017-06-09  6:00                     ` Eli Zaretskii
2017-06-09 23:44                       ` Paul Eggert
2017-06-10  7:24                         ` Eli Zaretskii
2017-06-11  0:04                           ` Paul Eggert
2017-06-11 14:48                             ` Eli Zaretskii
2017-06-11 17:26                               ` Paul Eggert
2017-09-02 13:25                                 ` Eli Zaretskii
2022-04-23 14:00                         ` Lars Ingebrigtsen
2022-04-24  7:10                           ` Paul Eggert
2022-04-24  9:56                             ` Vasilij Schneidermann [this message]
2022-04-24 10:26                               ` Andreas Schwab
2022-04-24 10:51                                 ` Vasilij Schneidermann
2022-04-24 11:01                                   ` Andreas Schwab
2022-04-24 11:29                                     ` Lars Ingebrigtsen
2022-04-24 22:46                               ` Paul Eggert
2022-04-24 11:24                             ` Lars Ingebrigtsen
2022-04-24 22:35                               ` Paul Eggert
2022-04-25  7:40                                 ` Lars Ingebrigtsen
2022-04-25 16:49                                   ` Paul Eggert
2022-04-26 10:06                                     ` Lars Ingebrigtsen
2022-04-26 16:48                                       ` Paul Eggert
2022-04-27 12:13                                         ` Lars Ingebrigtsen
2022-04-27 17:21                                           ` Paul Eggert
2022-04-27 17:22                                             ` Lars Ingebrigtsen
2022-04-28 17:58                                               ` Paul Eggert
2017-06-10 22:52         ` npostavs
2017-06-11  0:10           ` Paul Eggert
