* Coding warning attributes to wrong char
@ 2023-06-17 4:22 Yuchen Pei
2023-06-17 6:30 ` Eli Zaretskii
0 siblings, 1 reply; 3+ messages in thread
From: Yuchen Pei @ 2023-06-17 4:22 UTC
To: emacs-devel
I can reproduce this in Emacs 28.2 and 29.0.91:
1. Open the attached text file, or save the following in a file and open
it (hopefully displayed correctly here in your email client...)
--8<---------------cut here---------------start------------->8---
The issue is not with ’, but the (nul, insert with C-q C-@).
--8<---------------cut here---------------end--------------->8---
2. M-x set-buffer-file-coding-system utf-8 <RET>
3. A warning appears, attributing the issue to the quote character ’
(in the following I have replaced the raw characters with literal strings):
--8<---------------cut here---------------start------------->8---
These default coding systems were tried to encode the following
problematic characters in the buffer ‘encoding.txt’:
  Coding System   Pos  Codepoint  Char
  utf-8-unix       23  #x3FFFE2   \342
                   24  #x3FFF80   \200
                   25  #x3FFF99   \231
However, each of them encountered characters it couldn’t encode:
utf-8-unix cannot encode these: \342 \200 \231
Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘C-u C-x =’ will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text no-conversion
--8<---------------cut here---------------end--------------->8---
Despite the warning, the correct fix is to remove the nul character.
This can be quite misleading, especially when one wants to fix encoding
issues in big text files.
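
A note on reading the table in the warning: the codepoints #x3FFFE2, #x3FFF80 and #x3FFF99 are how Emacs represents raw, undecoded bytes (a byte B in the range 0x80..0xFF is shown as #x3FFF00 + B). The small Python sketch below, which is an illustration and not part of the original report, recovers the three bytes and shows that together they are simply the UTF-8 encoding of the quote character. This suggests, though the thread does not confirm it, that the buffer was read without UTF-8 decoding, and that the quote bytes are a symptom rather than the cause.

```python
# Offset Emacs adds to a raw byte (0x80..0xFF) to form its internal codepoint.
RAW_BYTE_BASE = 0x3FFF00

def raw_bytes_from_codepoints(codepoints):
    """Map Emacs raw-byte codepoints back to the bytes they stand for."""
    return bytes(cp - RAW_BYTE_BASE for cp in codepoints)

# The three codepoints listed in the warning, at positions 23 to 25.
reported = [0x3FFFE2, 0x3FFF80, 0x3FFF99]

raw = raw_bytes_from_codepoints(reported)   # the bytes E2 80 99
decoded = raw.decode("utf-8")               # a single character

print(raw, hex(ord(decoded)))  # prints: b'\xe2\x80\x99' 0x2019
```

U+2019 is the right single quotation mark, i.e. the ’ in the test file.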
[-- Attachment #2: encoding.txt --]
[-- Type: text/plain, Size: 43 bytes --]
The issue is not with ’, but the \0 (nul).
Best,
Yuchen
--
PGP Key: 47F9 D050 1E11 8879 9040 4941 2126 7E93 EF86 DFD0
<https://ypei.org/assets/ypei-pubkey.txt>
* Re: Coding warning attributes to wrong char
From: Eli Zaretskii @ 2023-06-17 6:30 UTC
To: Yuchen Pei; +Cc: emacs-devel
> From: Yuchen Pei <id@ypei.org>
> Date: Sat, 17 Jun 2023 14:22:18 +1000
>
> These default coding systems were tried to encode the following
> problematic characters in the buffer ‘encoding.txt’:
>   Coding System   Pos  Codepoint  Char
>   utf-8-unix       23  #x3FFFE2   \342
>                    24  #x3FFF80   \200
>                    25  #x3FFF99   \231
>
> However, each of them encountered characters it couldn’t encode:
> utf-8-unix cannot encode these: \342 \200 \231
>
> Click on a character (or switch to this window by ‘C-x o’
> and select the characters by RET) to jump to the place it appears,
> where ‘C-u C-x =’ will give information about it.
>
> Select one of the safe coding systems listed below,
> or cancel the writing with C-g and edit the buffer
> to remove or modify the problematic characters,
> or specify any other coding system (and risk losing
> the problematic characters).
>
> raw-text no-conversion
> --8<---------------cut here---------------end--------------->8---
>
> Despite the warning, the correct fix is to remove the nul character.
>
> This can be quite misleading, especially when one wants to fix encoding
> issues in big text files.
What is your proposal for better dealing with this situation?
The basic problem here is that Emacs cannot know whether the null
characters are or aren't supposed to be in the file. You as the user
do know, presumably because you know where this file came from or what
is its purpose. But Emacs doesn't know. It also cannot easily know
that removing the null character would solve all the other problems,
since it examines each such character individually.
* Re: Coding warning attributes to wrong char
From: Yuchen Pei @ 2023-06-17 9:20 UTC
To: Eli Zaretskii; +Cc: emacs-devel
On Sat 2023-06-17 09:30:51 +0300, Eli Zaretskii wrote:
> What is your proposal for better dealing with this situation?
>
> The basic problem here is that Emacs cannot know whether the null
> characters are or aren't supposed to be in the file. You as the user
> do know, presumably because you know where this file came from or what
> is its purpose.
Huh? I don't know. I had to bisect a 10+MB file to find the offending
character.
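
For illustration only, bisecting by hand can be avoided by scanning the file's bytes outside Emacs. The following is a minimal Python sketch, assuming the culprit is a NUL byte as in this report; the function name `find_byte_offsets` and its parameters are made up for this example. It reads the file in chunks, so a 10+MB file is not a problem, and returns the zero-based byte offset of every match.

```python
import sys

def find_byte_offsets(path, target=0x00, chunk_size=1 << 20):
    """Return the zero-based offsets of every byte equal to `target` in `path`."""
    needle = bytes([target])
    offsets = []
    base = 0  # offset of the current chunk within the file
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                i = chunk.find(needle, start)
                if i == -1:
                    break
                offsets.append(base + i)
                start = i + 1
            base += len(chunk)
    return offsets

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_nul.py FILE
    for offset in find_byte_offsets(sys.argv[1]):
        print(offset)
```

The offsets are byte offsets, not character positions, so in a file containing multibyte characters they will not match the positions Emacs reports for a decoded buffer.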
Best,
Yuchen
--
PGP Key: 47F9 D050 1E11 8879 9040 4941 2126 7E93 EF86 DFD0
<https://ypei.org/assets/ypei-pubkey.txt>