From: Lars Ingebrigtsen <larsi@gnus.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sun, 03 Jul 2022 13:08:04 +0200
Message-ID: <87pmimbgiz.fsf@gnus.org>
In-Reply-To: <83k08vbhe4.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 02 Jul 2022 19:37:07 +0300")

Eli Zaretskii <eliz@gnu.org> writes:

> The problem is not just with BOM. The problem will happen with any
> coding-system that produces prefix and/or suffix bytes when it encodes
> strings. The FIXME I added mentions ISO-2022 7-bit encodings as
> another example.
>
> And then there are coding-system's with pre-write-conversion, and
> those can produce any additions they like.
>
>> If we had both, then we could strip the BOM from the individual chars,
>> and add one to the front.
>
> AFAIR, what we have now already handles BOM in coding-system's that
> are known to produce a BOM. See encode-coding-char.

Ah, OK, it uses (coding-system-get coding-system :bom) and then
special-cases utf-8 and -16 to remove the BOM.

Hm... I guess the only reliable solution across all coding systems is
(like your comment in the code says) to drop the encode-every-char and
try encoding strings, and then see whether the result is short enough.
That could be done somewhat efficiently using a binary search. I'll
have a go at it...
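
To make the idea concrete, here is a minimal sketch in Python rather than Emacs Lisp; it is not the hexl-mode code. It uses Python's "utf-8-sig" codec as a stand-in for a coding system that emits a prefix on every encode call. The helper names (encode_per_char, longest_prefix_fitting) and the max_bytes budget are hypothetical, and the binary search assumes the encoded length never shrinks as the prefix grows.

```python
import codecs


def encode_per_char(text, codec):
    """Encode each character separately and concatenate the results.

    With a codec that emits a prefix per call, the prefix is repeated
    once for every character.
    """
    return b"".join(ch.encode(codec) for ch in text)


def longest_prefix_fitting(text, codec, max_bytes):
    """Return the largest n such that text[:n] encodes to <= max_bytes bytes.

    Assumption: len(text[:n].encode(codec)) is non-decreasing in n.
    Returns 0 when no non-empty prefix fits.
    """
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if len(text[:mid].encode(codec)) <= max_bytes:
            lo = mid       # text[:mid] fits, try a longer prefix
        else:
            hi = mid - 1   # text[:mid] is too long, try a shorter one
    return lo


if __name__ == "__main__":
    sample = "ab"
    # Per-character encoding repeats the BOM; whole-string encoding does not.
    print(encode_per_char(sample, "utf-8-sig").count(codecs.BOM_UTF8))
    print(sample.encode("utf-8-sig").count(codecs.BOM_UTF8))
    print(longest_prefix_fitting("a\u00e9\u20acb", "utf-8-sig", 6))
```

Each probe encodes a whole prefix, so any prefix or suffix bytes are counted once per probe, and the search needs roughly log2(length) encode calls instead of one per character.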
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no