From: Eli Zaretskii <eliz@gnu.org>
To: Robert Pluim <rpluim@gmail.com>
Cc: 60750@debbugs.gnu.org
Subject: bug#60750: 29.0.60; encode-coding-char fails for utf-8-auto coding system
Date: Thu, 12 Jan 2023 14:32:52 +0200
Message-ID: <83fscgaq6j.fsf@gnu.org>
In-Reply-To: <87zgaof7cg.fsf@gmail.com> (message from Robert Pluim on Thu, 12 Jan 2023 10:08:31 +0100)

> From: Robert Pluim <rpluim@gmail.com>
> Date: Thu, 12 Jan 2023 10:08:31 +0100
> 
> 
> src/emacs -Q
> M-x toggle-debug-on-error
> M-: (setq buffer-file-coding-system 'utf-8-auto)
> C-b
> C-u C-x =
> 
> =>
> Debugger entered--Lisp error: (args-out-of-range "))" 3 1)
>   encode-coding-char(41 utf-8-auto ascii)
>   describe-char(189)
>   what-cursor-position((4))
> 
> This is because utf-8-auto has a non-nil :bom property:
> 
> (define-coding-system 'utf-8-auto
>   "UTF-8 (auto-detect signature (BOM))"
>   :coding-type 'utf-8
>   :mnemonic ?U
>   :charset-list '(unicode)
>   :bom '(utf-8-with-signature . utf-8))

Right.  This is a very old bug in encoding with those coding-systems
of the utf-8 family whose :bom property is a cons cell.  The fix is
simple, but I wonder what it will break out there.  So:

> Iʼm not sure if this needs fixing, but it was surprising, and the
> docstring of `define-coding-system' didnʼt make it clear to me whether
> a BOM should have been produced here or not.

Actually, the doc string is clear:

  If the value is a cons cell, on decoding, check the first two bytes.
  If they are 0xFE 0xFF, use the car part coding system of the value.
  If they are 0xFF 0xFE, use the cdr part coding system of the value.
  Otherwise, treat them as bytes for a normal character.  On encoding,
  produce BOM bytes according to the value of ‘:endian’.

Note the last sentence: encoding should unconditionally produce the
BOM, which is what we do in your scenario.
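
For comparison, here is a minimal sketch that can be evaluated with
M-: or in *scratch*.  It only inspects the :bom property and encodes a
one-character string with the two coding-systems named in the cons
cell; the string ")" is borrowed from the recipe above.  It illustrates
the documented difference between the two, not the fix itself, and it
deliberately avoids calling encode-coding-char with utf-8-auto, since
that is the call that signals the error.

```elisp
;; The :bom property of utf-8-auto; per the definition quoted above,
;; this is a cons cell of two coding-systems.
(coding-system-get 'utf-8-auto :bom)

;; Encode the same string with each coding-system from that cons cell.
;; utf-8 emits no signature.
(encode-coding-string ")" 'utf-8)

;; utf-8-with-signature prepends the UTF-8 signature (BOM), i.e. the
;; three bytes #xEF #xBB #xBF, before the encoded text.
(encode-coding-string ")" 'utf-8-with-signature)

;; Turn the result into a list of byte values to make the prefix visible.
(append (encode-coding-string ")" 'utf-8-with-signature) nil)
```

If encoding with utf-8-auto follows the doc string, one would expect
its output to match the utf-8-with-signature case.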

> (Iʼm willing to be told that buffer-file-coding-system shouldnʼt be
> 'utf-8-auto, but I never set that explicitly as far as I know 😀)

Who sets utf-8-auto?  Where did you originally bump into this?
This is an obscure coding-system, and the fix to make it work as
documented will produce an incompatible change in behavior.  So before
I decide whether to make the change and on what branch, I'd like to
know how in the world you encountered this.

Thanks.





