* 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
@ 2018-11-04 8:44 Zhang Haijun
2018-11-04 9:18 ` Eli Zaretskii
0 siblings, 1 reply; 14+ messages in thread
From: Zhang Haijun @ 2018-11-04 8:44 UTC (permalink / raw)
To: emacs-devel@gnu.org
I sent a bug report to bug-gnu-emacs@gnu.org but did not receive the
bug-number mail, so I am sending it here.
I put the attachment file at: http://119.37.194.6/upload/tmp/emacs-26.txt
The bug report follows:
Open the attached text file with "emacs -Q". There are many
unrecognized chars (like \342\200\230). The encoding info of the buffer
is as follows.
-------------------------------------------------
= -- no-conversion (alias: binary)
Do no conversion.
When you visit a file with this coding, the file is read into a
unibyte buffer as is, thus each byte of a file is treated as a
character.
Type: raw-text (text with random binary characters)
EOL type: LF
--------------------------------------------------
But if I run the command revert-buffer, there are no unrecognized
chars anymore. The encoding info of the buffer becomes:
---------------------------------------------------
U -- utf-8-unix (alias: mule-utf-8-unix cp65001-unix)
UTF-8 (no signature (BOM))
Type: utf-8 (UTF-8: Emacs internal multibyte form)
EOL type: LF
This coding system encodes the following charsets:
unicode
---------------------------------------------------
In GNU Emacs 26.1.50 (build 4, x86_64-pc-linux-gnu, GTK+ Version 3.22.26)
of 2018-11-04 built on centos7.home
Repository revision: 7cadb328092e354225149bbc74c2ddaf4b49b638
Windowing system distributor 'The X.Org Foundation', version 11.0.11905000
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Quit [3 times]
user-error: Beginning of history; no preceding item
funcall-interactively: End of buffer
Configured using:
'configure --prefix=/home/jun/apps/emacs-26 --without-makeinfo
--with-x-toolkit=gtk3 --with-modules'
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GSETTINGS GLIB NOTIFY
LIBSELINUX GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11
XDBE XIM MODULES THREADS
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=fcitx
locale-coding-system: utf-8-unix
Major mode: Text
Minor modes in effect:
diff-auto-refine-mode: t
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml mml-sec password-cache epa derived epg epg-config
gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils vc-git diff-mode easymenu
easy-mmode elec-pair time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads dbusbind inotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)
Memory information:
((conses 16 98466 14078)
(symbols 48 20784 1)
(miscs 40 42 154)
(strings 32 29892 1511)
(string-bytes 1 791061)
(vectors 16 14723)
(vector-slots 8 510132 7238)
(floats 8 51 372)
(intervals 56 222 0)
(buffers 992 13)
(heap 1024 24474 3071))
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-04 8:44 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer Zhang Haijun
@ 2018-11-04 9:18 ` Eli Zaretskii
[not found] ` <16B3CA28-C893-4854-AD64-1C224C1EDDB2@outlook.com>
0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2018-11-04 9:18 UTC (permalink / raw)
To: emacs-devel, Zhang Haijun, emacs-devel@gnu.org
On November 4, 2018 10:44:36 AM GMT+02:00, Zhang Haijun <ccsmile2008@outlook.com> wrote:
> I have sent a bug report mail to bug-gnu-emacs@gnu.org, but didn't
> receive the bug number mail. So I send it here.
>
> I put the attachment file to:
> http://119.37.194.6/upload/tmp/emacs-26.txt
>
>
> Following is the bug report mail:
>
> Open the attachment text file with "emacs -Q". There are many
> unrecognized chars(like \342\200\230). Following is the encoding info
> of
> the buffer.
>
> -------------------------------------------------
> = -- no-conversion (alias: binary)
>
> Do no conversion.
>
> When you visit a file with this coding, the file is read into a
> unibyte buffer as is, thus each byte of a file is treated as a
> character.
> Type: raw-text (text with random binary characters)
> EOL type: LF
> --------------------------------------------------
>
> But if I run the command revert-buffer, then there is no unrecognized
> chars. Encoding info of the buffer becomes:
>
> ---------------------------------------------------
> U -- utf-8-unix (alias: mule-utf-8-unix cp65001-unix)
>
> UTF-8 (no signature (BOM))
> Type: utf-8 (UTF-8: Emacs internal multibyte form)
> EOL type: LF
> This coding system encodes the following charsets:
> unicode
> ---------------------------------------------------
This file includes null bytes, which by default cause Emacs to disable all decoding, because such files are deemed to be binary files.
If you don't like this, set inhibit-null-byte-detection to a non-nil value.
This is not a bug, but the intended behavior.
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
[not found] ` <16B3CA28-C893-4854-AD64-1C224C1EDDB2@outlook.com>
@ 2018-11-04 14:49 ` Eli Zaretskii
[not found] ` <B213388B-58E6-4F5B-8CE8-79AC5AD3062B@outlook.com>
0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2018-11-04 14:49 UTC (permalink / raw)
To: Zhang Haijun; +Cc: emacs-devel
> From: Zhang Haijun <ccsmile2008@outlook.com>
> CC: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> Date: Sun, 4 Nov 2018 12:28:52 +0000
>
> > This file includes null bytes, which by default cause Emacs to disable all decoding, because such files are deemed to be binary files.
> > If you don't like this, set inhibit-null-byte-detection to a non-nil value.
> >
> > This is not a bug, but the intended behavior.
>
> Then why the encoding of the buffer changed after revert-buffer?
It's a subtle bug: revert-buffer reads and decodes the file in small
chunks, so by the time it gets to the first null byte, it has already
decided that the encoding is UTF-8. By contrast, find-file decodes
the entire file at once, so it sees the null bytes when it detects the
encoding.
We have had this behavior since Emacs 23.1; Emacs 22 doesn't change the
encoding when this buffer is reverted.
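The difference between deciding from the whole file and deciding from
the first chunk can be illustrated with a toy model in Python. This is
only a sketch and not Emacs's actual detection code: the function names
(guess_coding, detect_whole, detect_first_chunk), the 16-byte chunk
size and the sample bytes are all assumptions made for illustration.

```python
# Toy model only: NOT Emacs's detection code, just an illustration of
# why the amount of data inspected changes the outcome.

def guess_coding(data: bytes) -> str:
    """Guess a coding: a NUL byte means 'binary', otherwise try UTF-8."""
    if b"\x00" in data:
        return "no-conversion"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "raw-text"


def detect_whole(data: bytes) -> str:
    """Inspect the entire file before deciding (find-file, as described)."""
    return guess_coding(data)


def detect_first_chunk(data: bytes, chunk_size: int = 16) -> str:
    """Decide from the first chunk only (revert-buffer, as described)."""
    return guess_coding(data[:chunk_size])


# Hypothetical sample: UTF-8 text with a NUL byte past the first chunk.
sample = "\u2018quoted\u2019 text, then padding ".encode("utf-8") + b"\x00 tail"

print(detect_whole(sample))        # -> no-conversion
print(detect_first_chunk(sample))  # -> utf-8
```

In this model the first 16 bytes are valid UTF-8 and contain no NUL
byte, so the chunked variant commits to UTF-8 before the NUL byte is
ever seen.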
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
[not found] ` <B213388B-58E6-4F5B-8CE8-79AC5AD3062B@outlook.com>
@ 2018-11-04 17:13 ` Eli Zaretskii
2018-11-05 8:59 ` Zhang Haijun
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Eli Zaretskii @ 2018-11-04 17:13 UTC (permalink / raw)
To: Zhang Haijun; +Cc: emacs-devel
> From: Zhang Haijun <ccsmile2008@outlook.com>
> CC: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> Date: Sun, 4 Nov 2018 15:14:07 +0000
>
> > It's a subtle bug: revert-buffer reads and decodes the file in small
> > chunks, so by the time it gets to the first null byte, it already
> > decided that the encoding is UTF-8. By contrast, find-file decodes
> > the entire file at once, so it sees the null bytes when it detects the
> > encoding.
> >
> > We had this behavior since Emacs 23.1; Emacs 22 doesn't change the
> > encoding when this buffer is reverted.
>
> OK. Thanks for your explanation. I like the behavior of revert-buffer.
> It may be useful to print some warning message when there are invalid bytes.
> How to search invalid bytes in buffer?
They are not invalid bytes, they are zero bytes. You can search for
them like this:
C-s C-q C-SPC
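Outside Emacs, the positions of the zero bytes can also be listed with
a few lines of Python. This is a minimal sketch; the helper name
nul_offsets and its limit parameter are hypothetical, and the sample
bytes are made up for the example.

```python
from typing import List


def nul_offsets(data: bytes, limit: int = 10) -> List[int]:
    """Return the byte offsets of up to `limit` NUL bytes in `data`."""
    offsets: List[int] = []
    pos = data.find(b"\x00")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = data.find(b"\x00", pos + 1)
    return offsets


sample = b"abc\x00def\x00\x00"
print(nul_offsets(sample))  # -> [3, 7, 8]
```

For a real file, the bytes would be read in binary mode first, for
example with open(path, "rb").read(), where path is a local copy of the
file in question.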
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-04 17:13 ` Eli Zaretskii
@ 2018-11-05 8:59 ` Zhang Haijun
2018-11-05 9:00 ` Zhang Haijun
2018-11-05 9:00 ` Zhang Haijun
2 siblings, 0 replies; 14+ messages in thread
From: Zhang Haijun @ 2018-11-05 8:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel@gnu.org
On 11/05/2018 01:13 AM, Eli Zaretskii wrote:
>
> They are not invalid bytes, they are zero bytes. You can search for
> them like this:
>
> C-s C-q C-SPC
>
I mean chars like ^@, ^H and \342\200\230. How can I search for them? Is
there a regexp or a function for this?
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-04 17:13 ` Eli Zaretskii
2018-11-05 8:59 ` Zhang Haijun
@ 2018-11-05 9:00 ` Zhang Haijun
2018-11-05 9:00 ` Zhang Haijun
2 siblings, 0 replies; 14+ messages in thread
From: Zhang Haijun @ 2018-11-05 9:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel@gnu.org
On 11/05/2018 01:13 AM, Eli Zaretskii wrote:
>
> They are not invalid bytes, they are zero bytes. You can search for
> them like this:
>
> C-s C-q C-SPC
>
I mean chars like ^@, ^H and \342\200\230. How can I search for them? Is
there a regexp or a function for this?
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-04 17:13 ` Eli Zaretskii
2018-11-05 8:59 ` Zhang Haijun
2018-11-05 9:00 ` Zhang Haijun
@ 2018-11-05 9:00 ` Zhang Haijun
2018-11-05 9:39 ` Phil Sainty
2 siblings, 1 reply; 14+ messages in thread
From: Zhang Haijun @ 2018-11-05 9:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel@gnu.org
On 11/05/2018 01:13 AM, Eli Zaretskii wrote:
>
> They are not invalid bytes, they are zero bytes. You can search for
> them like this:
>
> C-s C-q C-SPC
>
I mean chars like ^@, ^H and \342\200\230. How can I search for them? Is
there a regexp or a function for this?
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-05 9:00 ` Zhang Haijun
@ 2018-11-05 9:39 ` Phil Sainty
2018-11-05 10:10 ` Stephen Berman
2018-11-05 14:08 ` Zhang Haijun
0 siblings, 2 replies; 14+ messages in thread
From: Phil Sainty @ 2018-11-05 9:39 UTC (permalink / raw)
To: Zhang Haijun; +Cc: Eli Zaretskii, emacs-devel@gnu.org
On 5/11/18 10:00 PM, Zhang Haijun wrote:
> On 11/05/2018 01:13 AM, Eli Zaretskii wrote:
>> They are not invalid bytes, they are zero bytes. You can search for
>> them like this:
>>
>> C-s C-q C-SPC
>
> I mean chars like ^@, ^H and \342\200\230. How to search them?
^@ is the null char and Eli just showed you how to search for it.
Similarly, C-s C-q C-h will search for a ^H char.
Assuming \342\200\230 is three octal characters then, I would probably
resort to editing the search string and using `insert-char':
C-s M-e
C-x 8 RET #o342 RET
etc...
If you can *see* an instance of the character already, you might just
move point to that character and use C-s C-w (and maybe a bit of C-M-w
if that grabs too many chars).
Or if you mean "any non-ascii character" then the regexp [^[:ascii:]]
will match those.
-Phil
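As a side note on \342\200\230: read as three raw bytes, these octal
escapes are the UTF-8 encoding of a single character. The short Python
sketch below shows the conversion; the variable names are arbitrary.

```python
import unicodedata

# The three octal escapes from the report, taken as raw byte values.
raw = bytes([0o342, 0o200, 0o230])
print(raw.hex())  # -> e28098

# Decoded as UTF-8, the three bytes form one character.
ch = raw.decode("utf-8")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # -> U+2018 LEFT SINGLE QUOTATION MARK
```

This matches the report: once the buffer is decoded as UTF-8, the
three-byte sequence is displayed as one quotation mark instead of three
octal escapes.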
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-05 9:39 ` Phil Sainty
@ 2018-11-05 10:10 ` Stephen Berman
2018-11-05 14:08 ` Zhang Haijun
1 sibling, 0 replies; 14+ messages in thread
From: Stephen Berman @ 2018-11-05 10:10 UTC (permalink / raw)
To: Phil Sainty; +Cc: Eli Zaretskii, emacs-devel@gnu.org, Zhang Haijun
On Mon, 5 Nov 2018 22:39:45 +1300 Phil Sainty <psainty@orcon.net.nz> wrote:
[...]
> If you can *see* an instance of the character already, you might just
> move point to that character and use C-s C-w (and maybe a bit of C-M-w
> if that grabs too many chars). ^^^^^
[...]
This binding has changed (it keeps biting me too), see /etc/NEWS:
* Changes in Specialized Modes and Packages in Emacs 27.1
[...]
** Search and Replace
[...]
*** New isearch bindings.
'C-M-w' in isearch changed from isearch-del-char to the new function
isearch-yank-symbol-or-char. isearch-del-char is now bound to
'C-M-d'.
Steve Berman
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-05 9:39 ` Phil Sainty
2018-11-05 10:10 ` Stephen Berman
@ 2018-11-05 14:08 ` Zhang Haijun
2018-11-05 15:02 ` Stephen Berman
2018-11-05 16:00 ` Eli Zaretskii
1 sibling, 2 replies; 14+ messages in thread
From: Zhang Haijun @ 2018-11-05 14:08 UTC (permalink / raw)
To: Phil Sainty; +Cc: Eli Zaretskii, emacs-devel@gnu.org
On 11/05/2018 05:39 PM, Phil Sainty wrote:
> On 5/11/18 10:00 PM, Zhang Haijun wrote:
>> On 11/05/2018 01:13 AM, Eli Zaretskii wrote:
>>> They are not invalid bytes, they are zero bytes. You can search for
>>> them like this:
>>>
>>> C-s C-q C-SPC
>>
>> I mean chars like ^@, ^H and \342\200\230. How to search them?
>
> ^@ is the null char and Eli just showed you how to search for it.
>
> Similarly, C-s C-q C-h will search for a ^H char.
>
> Assuming \342\200\230 is three octal characters then, I would probably
> resort to editing the search string and using `insert-char':
>
> C-s M-e
> C-x 8 RET #o342 RET
> etc...
>
> If you can *see* an instance of the character already, you might just
> move point to that character and use C-s C-w (and maybe a bit of C-M-w
> if that grabs too many chars).
>
> Or if you mean "any non-ascii character" then the regexp [^[:ascii:]]
> will match those.
>
>
> -Phil
>
I don't know the specific char to search for. In the original problem,
I opened the text file, Emacs couldn't decode it, and it didn't show any
warning message such as the position of the null byte. What should I do
to find the null byte (or other bytes that can prevent Emacs from
decoding)? How can I search for these unknown bytes?
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-05 14:08 ` Zhang Haijun
@ 2018-11-05 15:02 ` Stephen Berman
2018-11-05 16:00 ` Eli Zaretskii
1 sibling, 0 replies; 14+ messages in thread
From: Stephen Berman @ 2018-11-05 15:02 UTC (permalink / raw)
To: Zhang Haijun; +Cc: Phil Sainty, Eli Zaretskii, emacs-devel@gnu.org
On Mon, 5 Nov 2018 14:08:46 +0000 Zhang Haijun <ccsmile2008@outlook.com> wrote:
> On 11/05/2018 05:39 PM, Phil Sainty wrote:
>> On 5/11/18 10:00 PM, Zhang Haijun wrote:
>>> On 11/05/2018 01:13 AM, Eli Zaretskii wrote:
>>>> They are not invalid bytes, they are zero bytes. You can search for
>>>> them like this:
>>>>
>>>> C-s C-q C-SPC
>>>
>>> I mean chars like ^@, ^H and \342\200\230. How to search them?
>>
>> ^@ is the null char and Eli just showed you how to search for it.
>>
>> Similarly, C-s C-q C-h will search for a ^H char.
>>
>> Assuming \342\200\230 is three octal characters then, I would probably
>> resort to editing the search string and using `insert-char':
>>
>> C-s M-e
>> C-x 8 RET #o342 RET
>> etc...
>>
>> If you can *see* an instance of the character already, you might just
>> move point to that character and use C-s C-w (and maybe a bit of C-M-w
>> if that grabs too many chars).
>>
>> Or if you mean "any non-ascii character" then the regexp [^[:ascii:]]
>> will match those.
>>
>>
>> -Phil
>>
>
> I don't know the specific char to search. As the original problem I met,
> I opened the text file. Emacs can't decode it and it didn't show any
> warning message like position of the null byte. Then what should I do to
> find the null byte(or other bytes which can prevent emacs from decoding)?
>
> How to search these unknown bytes?
All of the above (the ascii control characters ^@ and ^H and octal
characters like \342\200\230) are non-printing characters, so you can
find them with this regexp isearch: `C-M-s [^[:print:]]'.
Steve Berman
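A comparable scan can be sketched in Python on the raw bytes of a file.
This is an approximation and not an exact equivalent of [^[:print:]]:
it flags every byte outside printable ASCII except tab, LF and CR. The
names SUSPICIOUS and suspicious_bytes and the sample data are
hypothetical.

```python
import re
from typing import List, Tuple

# Bytes outside printable ASCII (0x20-0x7e), excluding tab, LF and CR.
SUSPICIOUS = re.compile(rb"[^\x20-\x7e\t\n\r]")


def suspicious_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) pairs for each flagged byte."""
    return [(m.start(), m.group()[0]) for m in SUSPICIOUS.finditer(data)]


sample = b"ok\x00\x08 \xe2\x80\x98hi\n"
for offset, value in suspicious_bytes(sample):
    # Print each byte in the octal notation Emacs uses for raw bytes.
    print(offset, f"\\{value:03o}")
```

On the sample this reports offsets 2 and 3 (^@ and ^H) and offsets 5
to 7 (\342\200\230). Note that it also flags the bytes of valid UTF-8
text, so it only narrows down where to look.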
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-05 14:08 ` Zhang Haijun
2018-11-05 15:02 ` Stephen Berman
@ 2018-11-05 16:00 ` Eli Zaretskii
2018-11-06 1:39 ` Zhang Haijun
1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2018-11-05 16:00 UTC (permalink / raw)
To: Zhang Haijun; +Cc: psainty, emacs-devel
> From: Zhang Haijun <ccsmile2008@outlook.com>
> CC: Eli Zaretskii <eliz@gnu.org>, "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> Date: Mon, 5 Nov 2018 14:08:46 +0000
>
> I don't know the specific char to search.
The _only_ character that can disable decoding is the null byte, so
you need to search only for null bytes, by typing "C-q C-SPC" at the
Isearch prompt.
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-05 16:00 ` Eli Zaretskii
@ 2018-11-06 1:39 ` Zhang Haijun
2018-11-06 3:31 ` Eli Zaretskii
0 siblings, 1 reply; 14+ messages in thread
From: Zhang Haijun @ 2018-11-06 1:39 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: psainty@orcon.net.nz, emacs-devel@gnu.org
On 11/06/2018 12:00 AM, Eli Zaretskii wrote:
>> From: Zhang Haijun <ccsmile2008@outlook.com>
>> CC: Eli Zaretskii <eliz@gnu.org>, "emacs-devel@gnu.org" <emacs-devel@gnu.org>
>> Date: Mon, 5 Nov 2018 14:08:46 +0000
>>
>> I don't know the specific char to search.
>
> The _only_ character that can disable decoding is the null byte, so
> you need to search only for null bytes, by typing "C-q C-SPC" at the
> Isearch prompt.
>
OK. Thank you.
For Chinese, C-SPC is bound by OS to toggle the system input method. Is
"C-q C-SPC" the same as 'C-q C-@'?
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer
2018-11-06 1:39 ` Zhang Haijun
@ 2018-11-06 3:31 ` Eli Zaretskii
0 siblings, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2018-11-06 3:31 UTC (permalink / raw)
To: Zhang Haijun; +Cc: psainty, emacs-devel
> From: Zhang Haijun <ccsmile2008@outlook.com>
> CC: "psainty@orcon.net.nz" <psainty@orcon.net.nz>, "emacs-devel@gnu.org"
> <emacs-devel@gnu.org>
> Date: Tue, 6 Nov 2018 01:39:53 +0000
>
> For Chinese, C-SPC is bound by OS to toggle the system input method. Is
> "C-q C-SPC" the same as 'C-q C-@'?
Yes.