all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: larsi@gnus.org
Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sun, 03 Jul 2022 16:26:54 +0300
Message-ID: <83k08u9vj5.fsf@gnu.org>
In-Reply-To: <83pmim9wqo.fsf@gnu.org> (message from Eli Zaretskii on Sun, 03 Jul 2022 16:00:47 +0300)

> Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
> Date: Sun, 03 Jul 2022 16:00:47 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Cc: rgm@gnu.org,  schwab@linux-m68k.org,  48324@debbugs.gnu.org
> > Date: Sun, 03 Jul 2022 14:07:43 +0200
> > 
> > Lars Ingebrigtsen <larsi@gnus.org> writes:
> > 
> > > Hm...  I guess the only reliable solution across all coding systems is
> > > (like your comment in the code says) to drop the encode-every-char and
> > > try encoding strings, and then see whether the result is short enough.
> > > That could be done somewhat efficiently using a binary search.  I'll
> > > have a go at it...
> > 
> > And while I was at it, I changed it to return complete glyphs, not just
> > complete code points.
> > 
> > There's a behavioural change, though.  This: 
> > 
> > (string-limit "foóá" 6 t 'utf-16)
> > 
> > Now returns a string with a BOM, whereas previously it didn't.
> 
> So you get 6 characters + the BOM?

I see that it's actually 6 bytes _including_ the BOM.  I think this
is confusing: if we are going to return a string with the BOM, we
should not count the BOM as part of the LENGTH bytes, because if I
request the characters that fit into N bytes, I should get N bytes
of payload.  Or maybe we should have an optional argument to
control whether LENGTH includes or excludes the BOM.

In any case, we should mention this aspect in the doc string, I think.
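
The two readings of LENGTH discussed above can be illustrated with a
small sketch.  It is written in Python only to make the byte
arithmetic concrete; it is not the Emacs implementation of
string-limit.  The helper name limit_bytes and its count_bom
parameter are hypothetical, the encoding is assumed to be big-endian
UTF-16 with a 2-byte BOM, and the sketch truncates at code points
rather than at complete glyphs.

```python
import codecs


def limit_bytes(text, length, count_bom=True):
    """Hypothetical helper, not the Emacs function.

    Return (prefix, encoded), where PREFIX is the longest prefix of
    TEXT whose UTF-16BE encoding fits the byte budget, and ENCODED is
    that prefix encoded with a leading BOM.  When COUNT_BOM is true
    the BOM is charged against LENGTH; otherwise LENGTH covers only
    the payload.
    """
    bom = codecs.BOM_UTF16_BE  # the 2-byte sequence FE FF
    budget = length - len(bom) if count_bom else length
    prefix = ""
    for ch in text:
        candidate = prefix + ch
        if len(candidate.encode("utf-16-be")) > budget:
            break
        prefix = candidate
    return prefix, bom + prefix.encode("utf-16-be")


# "fo" followed by o-acute and a-acute; each takes 2 bytes in UTF-16.
sample = "fo\u00f3\u00e1"

# LENGTH includes the BOM: 2 bytes of BOM + 4 bytes of payload.
included = limit_bytes(sample, 6, count_bom=True)

# LENGTH excludes the BOM: 6 bytes of payload, 8 bytes in total.
excluded = limit_bytes(sample, 6, count_bom=False)
```

With LENGTH counted against the BOM, only two of the four characters
fit into 6 bytes; with the BOM excluded, three fit and the encoded
result is 8 bytes long.  An optional argument like count_bom is one
way the choice could be exposed to callers.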






Thread overview: 28+ messages
     [not found] <0ed1c9c7-26c1-b801-1910-6d5bb50dec3d.ref@yahoo.de>
2021-05-09 19:14 ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-09 21:38   ` bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-10 14:17     ` Eli Zaretskii
     [not found]       ` <e250f934-6f7b-bec3-9df4-d2b242599a45@yahoo.de>
2021-05-10 16:13         ` Eli Zaretskii
2021-05-10 16:28           ` Lars Ingebrigtsen
2021-05-10 16:50             ` Andreas Schwab
2021-05-10 17:16               ` Eli Zaretskii
2021-05-10 17:43                 ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-10 17:51                   ` Eli Zaretskii
2021-05-10 18:05                     ` Andreas Schwab
2021-05-11 12:04                       ` Eli Zaretskii
2021-05-11 20:37                         ` Glenn Morris
2021-05-12 13:50                           ` Eli Zaretskii
2022-07-02 16:14                             ` Lars Ingebrigtsen
2022-07-02 16:37                               ` Eli Zaretskii
2022-07-03 11:08                                 ` Lars Ingebrigtsen
2022-07-03 12:07                                   ` Lars Ingebrigtsen
2022-07-03 13:00                                     ` Eli Zaretskii
2022-07-03 13:26                                       ` Eli Zaretskii [this message]
2022-07-03 13:48                                         ` Andreas Schwab
2022-07-03 13:51                                           ` Eli Zaretskii
2022-07-04 10:34                                         ` Lars Ingebrigtsen
2022-07-04 11:31                                           ` Eli Zaretskii
2022-07-05 11:08                                             ` Lars Ingebrigtsen
2022-07-03 13:28                                       ` Lars Ingebrigtsen
2021-05-10 17:06             ` Eli Zaretskii
2021-05-11 12:53   ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters Lars Ingebrigtsen
2022-07-02 15:59     ` Lars Ingebrigtsen
