From: Eli Zaretskii <eliz@gnu.org>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Mon, 04 Jul 2022 14:31:01 +0300
Message-ID: <831qv19ksq.fsf@gnu.org>
In-Reply-To: <87wnct88ui.fsf@gnus.org> (message from Lars Ingebrigtsen on Mon, 04 Jul 2022 12:34:29 +0200)

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
> Date: Mon, 04 Jul 2022 12:34:29 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I see that it's actually 6 bytes _including_ the BOM. So I think this
> > is confusing: if we are going to return a string with the BOM, we
> > should not count the BOM as part of the LENGTH bytes. Because if I
> > requested to get characters which fit into N bytes, I should get those
> > N bytes of payload. Or maybe we should have an optional argument to
> > control whether LENGTH includes or excludes the BOM.
>
> If the caller has asked for a max number of bytes in a coding system
> that includes a BOM, then the BOM has to be counted -- otherwise the
> bytes won't fit into whatever field the protocol they're using limits
> the string to.

You obviously have a very specific use case in mind. But there are
others. Moreover, UTF and BOM is a special case, where the prefix is
known in advance. Other encodings, notably from the ISO-2022 family,
are harder because the exact shift-in sequence is not always easy to
guess.

Which is why I thought a way to control this aspect could be needed.
But we could just document the subtlety and wait for someone to come
up with a practical scenario where it would be needed.

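To make the two readings of LENGTH concrete, here is a rough sketch in
Python rather than Emacs Lisp. It is not the implementation under
discussion: limit_bytes and its count_bom flag are hypothetical names
invented for illustration, and Python's utf-8-sig codec merely stands in
for an Emacs coding system that emits a BOM.

```python
def limit_bytes(text, length, encoding="utf-8-sig", count_bom=True):
    """Return the longest prefix of TEXT whose encoded form fits LENGTH bytes.

    Hypothetical helper. With count_bom=True the signature the codec emits
    is charged against LENGTH; with count_bom=False LENGTH covers the
    payload only.
    """
    # Size of whatever the codec writes for an empty string
    # (3 bytes, the UTF-8 BOM, for utf-8-sig).
    prefix_len = len("".encode(encoding))
    budget = length if count_bom else length + prefix_len
    # Shrink character by character so a multibyte character is never split.
    for end in range(len(text), -1, -1):
        if len(text[:end].encode(encoding)) <= budget:
            return text[:end]
    return ""


counted = limit_bytes("héllo", 6, count_bom=True)
payload_only = limit_bytes("héllo", 6, count_bom=False)
```

With a limit of 6 bytes, the first call leaves only 3 bytes for text
once the BOM is counted, while the second returns all 6 bytes of
payload and produces 9 bytes on the wire.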
> (And we don't have a -without-signature variant, do we?)

We do: utf-16le and utf-16be.

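As an analogy only, the same split exists in Python's codecs: utf-16
writes a signature, while utf-16-le and utf-16-be do not. The snippet
below assumes nothing about Emacs internals and only shows the byte
counts involved.

```python
import codecs

with_bom = "ab".encode("utf-16")     # signature + 2 code units
little = "ab".encode("utf-16-le")    # no signature, little-endian
big = "ab".encode("utf-16-be")       # no signature, big-endian

# The signature is 2 bytes; its byte order follows the platform.
has_signature = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
sizes = (len(with_bom), len(little), len(big))
```

A caller who wants every one of the LENGTH bytes to carry payload can
pick one of the signature-less variants.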
> > In any case, we should mention this aspect in the doc string, I think.
>
> Yes. But should we have -without-signature variants for utf-16? Then
> the doc string could recommend using that if the caller wants BOM-less
> bytes.

See above.