all messages for Emacs-related lists mirrored at yhetil.org
* Coding warning attributes to wrong char
@ 2023-06-17  4:22 Yuchen Pei
  2023-06-17  6:30 ` Eli Zaretskii
  0 siblings, 1 reply; 3+ messages in thread
From: Yuchen Pei @ 2023-06-17  4:22 UTC
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1779 bytes --]

Reproduced in Emacs 28.2 and 29.0.91:

1. Open the attached text file, or save the following in a file and open
   it (hopefully it displays correctly in your email client):
--8<---------------cut here---------------start------------->8---
   The issue is not with ’, but the  (nul, insert with C-q C-@).
--8<---------------cut here---------------end--------------->8---

2. M-x set-buffer-file-coding-system utf-8 <RET>

3. A warning appears that attributes the issue to ’, the quote (in the
   following I have replaced the characters with literal strings):

--8<---------------cut here---------------start------------->8---
These default coding systems were tried to encode the following
problematic characters in the buffer ‘encoding.txt’:
  Coding System           Pos  Codepoint  Char
  utf-8-unix               23  #x3FFFE2   \342
                           24  #x3FFF80   \200
                           25  #x3FFF99   \231

However, each of them encountered characters it couldn’t encode:
  utf-8-unix cannot encode these: \342 \200 \231

Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘C-u C-x =’ will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text no-conversion
--8<---------------cut here---------------end--------------->8---

Despite the warning, the correct fix is to remove the nul character.

This can be quite misleading, especially when one wants to fix encoding
issues in big text files.
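The three flagged bytes are simply the UTF-8 encoding of ’ (U+2019). The Python sketch below, which is not part of the original report, reconstructs the attached file (an assumption: 43 bytes, no trailing newline) and checks two things. First, the file is valid UTF-8 as a whole, NUL included. Second, each codepoint in the warning table equals the raw byte value plus #x3FFF00, which is how Emacs numbers raw "eight-bit" bytes (as the table itself shows for \342 and #x3FFFE2). It also assumes one buffer position per byte, as the Pos column suggests. `flagged_codepoints` is a hypothetical helper name.

```python
# Reconstructed contents of the attached encoding.txt
# (assumption: 43 bytes, no trailing newline).
DATA = b"The issue is not with \xe2\x80\x99, but the \x00 (nul)."

# Offset added to a raw byte to form an Emacs "eight-bit" character,
# inferred from the warning table (\342 -> #x3FFFE2).
EIGHT_BIT_OFFSET = 0x3FFF00


def flagged_codepoints(data: bytes, start: int, end: int) -> list[tuple[int, str, str]]:
    """Hypothetical helper: for each byte in data[start:end], return
    (1-based buffer position, octal escape, Emacs codepoint), assuming
    the buffer holds one position per byte."""
    rows = []
    for i in range(start, end):
        b = data[i]
        rows.append((i + 1, f"\\{b:o}", f"#x{b + EIGHT_BIT_OFFSET:X}"))
    return rows


# The whole file decodes as UTF-8; NUL (U+0000) is an ordinary code point.
text = DATA.decode("utf-8")
print(len(DATA), hex(ord(text[22])))  # size and the quote's code point

# Bytes 22..24 (0-based) are the three bytes of the quote.
for row in flagged_codepoints(DATA, 22, 25):
    print(*row)
```

Running it prints `43 0x2019`, followed by `23 \342 #x3FFFE2`, `24 \200 #x3FFF80` and `25 \231 #x3FFF99`, which are the same rows as the warning. This fits the reading that the NUL presumably led Emacs to read the file as raw bytes, so ’ was never decoded as a single character.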


[-- Attachment #2: encoding.txt --]
[-- Type: text/plain, Size: 43 bytes --]

The issue is not with ’, but the \0 (nul).

[-- Attachment #3: Type: text/plain, Size: 131 bytes --]


Best,
Yuchen

-- 
PGP Key: 47F9 D050 1E11 8879 9040  4941 2126 7E93 EF86 DFD0
          <https://ypei.org/assets/ypei-pubkey.txt>


* Re: Coding warning attributes to wrong char
  2023-06-17  4:22 Coding warning attributes to wrong char Yuchen Pei
@ 2023-06-17  6:30 ` Eli Zaretskii
  2023-06-17  9:20   ` Yuchen Pei
  0 siblings, 1 reply; 3+ messages in thread
From: Eli Zaretskii @ 2023-06-17  6:30 UTC
  To: Yuchen Pei; +Cc: emacs-devel

> From: Yuchen Pei <id@ypei.org>
> Date: Sat, 17 Jun 2023 14:22:18 +1000
> 
> These default coding systems were tried to encode the following
> problematic characters in the buffer ‘encoding.txt’:
>   Coding System           Pos  Codepoint  Char
>   utf-8-unix               23  #x3FFFE2   \342
>                            24  #x3FFF80   \200
>                            25  #x3FFF99   \231
> 
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \342 \200 \231
> 
> Click on a character (or switch to this window by ‘C-x o’
> and select the characters by RET) to jump to the place it appears,
> where ‘C-u C-x =’ will give information about it.
> 
> Select one of the safe coding systems listed below,
> or cancel the writing with C-g and edit the buffer
>    to remove or modify the problematic characters,
> or specify any other coding system (and risk losing
>    the problematic characters).
> 
>   raw-text no-conversion
> --8<---------------cut here---------------end--------------->8---
> 
> Despite the warning, the correct fix is to remove the nul character.
> 
> This can be quite misleading, especially when one wants to fix encoding
> issues in big text files.

What is your proposal for better dealing with this situation?

The basic problem here is that Emacs cannot know whether the null
characters are or aren't supposed to be in the file.  You as the user
do know, presumably because you know where this file came from or what
its purpose is.  But Emacs doesn't know.  It also cannot easily know
that removing the null character would solve all the other problems,
since it examines each such character individually.




* Re: Coding warning attributes to wrong char
  2023-06-17  6:30 ` Eli Zaretskii
@ 2023-06-17  9:20   ` Yuchen Pei
  0 siblings, 0 replies; 3+ messages in thread
From: Yuchen Pei @ 2023-06-17  9:20 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On Sat 2023-06-17 09:30:51 +0300, Eli Zaretskii wrote:
> What is your proposal for better dealing with this situation?
>
> The basic problem here is that Emacs cannot know whether the null
> characters are or aren't supposed to be in the file.  You as the user
> do know, presumably because you know where this file came from or what
> its purpose is.

Huh? I don't know. I had to bisect a 10+MB file to find the offending
character.
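For a file of this size, a byte scan can replace the bisection. The following Python sketch (not from the thread; `find_nul_offsets` is a hypothetical name) streams the file in chunks and reports the 0-based offset of every NUL byte. Assuming the file was visited as raw bytes, as the one-position-per-byte Pos column in the warning suggests, offset + 1 is the buffer position to jump to with `M-g c` (goto-char).

```python
def find_nul_offsets(path: str, chunk_size: int = 1 << 20) -> list[int]:
    """Return the 0-based byte offsets of every NUL byte in the file at PATH.

    The file is read in CHUNK_SIZE pieces so large files need not fit in
    memory. A single-byte pattern cannot straddle a chunk boundary, so no
    overlap handling is needed.
    """
    offsets = []
    base = 0  # file offset of the current chunk's first byte
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            i = chunk.find(b"\x00")
            while i != -1:
                offsets.append(base + i)
                i = chunk.find(b"\x00", i + 1)
            base += len(chunk)
    return offsets
```

For example, `[off + 1 for off in find_nul_offsets("encoding.txt")]` gives the candidate buffer positions; for the 43-byte attachment as reconstructed above, that is a single NUL at offset 35, i.e. position 36.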

Best,
Yuchen




