* 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer @ 2018-11-04 8:44 Zhang Haijun 2018-11-04 9:18 ` Eli Zaretskii 0 siblings, 1 reply; 14+ messages in thread From: Zhang Haijun @ 2018-11-04 8:44 UTC (permalink / raw) To: emacs-devel@gnu.org I have sent a bug report mail to bug-gnu-emacs@gnu.org, but didn't receive the bug number mail. So I send it here. I put the attachment file to: http://119.37.194.6/upload/tmp/emacs-26.txt Following is the bug report mail: Open the attachment text file with "emacs -Q". There are many unrecognized chars(like \342\200\230). Following is the encoding info of the buffer. ------------------------------------------------- = -- no-conversion (alias: binary) Do no conversion. When you visit a file with this coding, the file is read into a unibyte buffer as is, thus each byte of a file is treated as a character. Type: raw-text (text with random binary characters) EOL type: LF -------------------------------------------------- But if I run the command revert-buffer, then there is no unrecognized chars. Encoding info of the buffer becomes: --------------------------------------------------- U -- utf-8-unix (alias: mule-utf-8-unix cp65001-unix) UTF-8 (no signature (BOM)) Type: utf-8 (UTF-8: Emacs internal multibyte form) EOL type: LF This coding system encodes the following charsets: unicode --------------------------------------------------- In GNU Emacs 26.1.50 (build 4, x86_64-pc-linux-gnu, GTK+ Version 3.22.26) of 2018-11-04 built on centos7.home Repository revision: 7cadb328092e354225149bbc74c2ddaf4b49b638 Windowing system distributor 'The X.Org Foundation', version 11.0.11905000 Recent messages: For information about GNU Emacs and the GNU system, type C-h C-a. 
Quit [3 times] user-error: Beginning of history; no preceding item funcall-interactively: End of buffer Configured using: 'configure --prefix=/home/jun/apps/emacs-26 --without-makeinfo --with-x-toolkit=gtk3 --with-modules' Configured features: XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GSETTINGS GLIB NOTIFY LIBSELINUX GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS Important settings: value of $LANG: en_US.UTF-8 value of $XMODIFIERS: @im=fcitx locale-coding-system: utf-8-unix Major mode: Text Minor modes in effect: diff-auto-refine-mode: t tooltip-mode: t global-eldoc-mode: t electric-indent-mode: t mouse-wheel-mode: t tool-bar-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t line-number-mode: t transient-mark-mode: t Load-path shadows: None found. Features: (shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs format-spec rfc822 mml mml-sec password-cache epa derived epg epg-config gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils vc-git diff-mode easymenu easy-mmode elec-pair time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript 
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote threads dbusbind inotify dynamic-setting system-font-setting font-render-setting move-toolbar gtk x-toolkit x multi-tty make-network-process emacs) Memory information: ((conses 16 98466 14078) (symbols 48 20784 1) (miscs 40 42 154) (strings 32 29892 1511) (string-bytes 1 791061) (vectors 16 14723) (vector-slots 8 510132 7238) (floats 8 51 372) (intervals 56 222 0) (buffers 992 13) (heap 1024 24474 3071)) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-04 8:44 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer Zhang Haijun @ 2018-11-04 9:18 ` Eli Zaretskii [not found] ` <16B3CA28-C893-4854-AD64-1C224C1EDDB2@outlook.com> 0 siblings, 1 reply; 14+ messages in thread From: Eli Zaretskii @ 2018-11-04 9:18 UTC (permalink / raw) To: emacs-devel, Zhang Haijun, emacs-devel@gnu.org On November 4, 2018 10:44:36 AM GMT+02:00, Zhang Haijun <ccsmile2008@outlook.com> wrote: > I have sent a bug report mail to bug-gnu-emacs@gnu.org, but didn't > receive the bug number mail. So I send it here. > > I put the attachment file to: > http://119.37.194.6/upload/tmp/emacs-26.txt > > > Following is the bug report mail: > > Open the attachment text file with "emacs -Q". There are many > unrecognized chars(like \342\200\230). Following is the encoding info > of > the buffer. > > ------------------------------------------------- > = -- no-conversion (alias: binary) > > Do no conversion. > > When you visit a file with this coding, the file is read into a > unibyte buffer as is, thus each byte of a file is treated as a > character. > Type: raw-text (text with random binary characters) > EOL type: LF > -------------------------------------------------- > > But if I run the command revert-buffer, then there is no unrecognized > chars. Encoding info of the buffer becomes: > > --------------------------------------------------- > U -- utf-8-unix (alias: mule-utf-8-unix cp65001-unix) > > UTF-8 (no signature (BOM)) > Type: utf-8 (UTF-8: Emacs internal multibyte form) > EOL type: LF > This coding system encodes the following charsets: > unicode > --------------------------------------------------- This file includes null bytes, which by default cause Emacs to disable all decoding, because such files are deemed to be binary files. 
If you don't like this, set inhibit-null-byte-detection to a non-nil value. This is not a bug, but the intended behavior. ^ permalink raw reply [flat|nested] 14+ messages in thread
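For readers who want to turn this behavior off, here is a minimal sketch of the setting Eli mentions. The variable name is the one given in the message above for Emacs 26. As far as I know, Emacs 27.1 and later spell it `inhibit-nul-byte-detection` and keep the old name as an obsolete alias, so check the NEWS file of your version before relying on either spelling.

```elisp
;; Init-file fragment: stop null bytes from forcing `no-conversion'
;; when a file is visited.  Name as given in the message above (Emacs 26).
(setq inhibit-null-byte-detection t)

;; Alternative that leaves the default alone and re-reads only the
;; current buffer with an explicit coding system:
;;   C-x RET r utf-8 RET   (revert-buffer-with-coding-system)
```

The global setting affects every file visited afterwards, including genuinely binary ones, so the per-buffer key sequence may be the safer choice for an occasional problem file.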
[parent not found: <16B3CA28-C893-4854-AD64-1C224C1EDDB2@outlook.com>]
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer [not found] ` <16B3CA28-C893-4854-AD64-1C224C1EDDB2@outlook.com> @ 2018-11-04 14:49 ` Eli Zaretskii [not found] ` <B213388B-58E6-4F5B-8CE8-79AC5AD3062B@outlook.com> 0 siblings, 1 reply; 14+ messages in thread From: Eli Zaretskii @ 2018-11-04 14:49 UTC (permalink / raw) To: Zhang Haijun; +Cc: emacs-devel > From: Zhang Haijun <ccsmile2008@outlook.com> > CC: "emacs-devel@gnu.org" <emacs-devel@gnu.org> > Date: Sun, 4 Nov 2018 12:28:52 +0000 > > > This file includes null bytes, which by default cause Emacs to disable all decoding, because such files are deemed to be binary files. > > If you don't like this, set inhibit-null-byte-detection to a non-nil value. > > > > This is not a bug, but the intended behavior. > > Then why the encoding of the buffer changed after revert-buffer? It's a subtle bug: revert-buffer reads and decodes the file in small chunks, so by the time it gets to the first null byte, it has already decided that the encoding is UTF-8. By contrast, find-file decodes the entire file at once, so it sees the null bytes when it detects the encoding. We have had this behavior since Emacs 23.1; Emacs 22 doesn't change the encoding when this buffer is reverted. ^ permalink raw reply [flat|nested] 14+ messages in thread
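The difference Eli describes (a decision made on an early chunk versus one made on the whole file) can be illustrated with a small sketch. This is only a simplified model and not the code Emacs actually runs when decoding. The helper name `my/null-byte-in-prefix-p` is hypothetical, and the chunk size in the usage note is an arbitrary assumption.

```elisp
;; Hypothetical helper, not part of Emacs: report whether FILE has a
;; null byte within its first END bytes, or anywhere when END is nil.
(defun my/null-byte-in-prefix-p (file &optional end)
  "Return t if FILE has a null byte before byte offset END.
When END is nil, the whole file is examined."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; operate on raw bytes
    ;; BEG (0) and END are byte offsets into FILE; no decoding is done.
    (insert-file-contents-literally file nil 0 end)
    (goto-char (point-min))
    (and (search-forward (string 0) nil t) t)))
```

With a file whose first null byte lies beyond the first chunk, `(my/null-byte-in-prefix-p "emacs-26.txt" 4096)` would return nil while `(my/null-byte-in-prefix-p "emacs-26.txt")` returns t, which mirrors why a chunk-based reader can settle on UTF-8 before it ever meets a null byte.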
[parent not found: <B213388B-58E6-4F5B-8CE8-79AC5AD3062B@outlook.com>]
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer [not found] ` <B213388B-58E6-4F5B-8CE8-79AC5AD3062B@outlook.com> @ 2018-11-04 17:13 ` Eli Zaretskii 2018-11-05 8:59 ` Zhang Haijun ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Eli Zaretskii @ 2018-11-04 17:13 UTC (permalink / raw) To: Zhang Haijun; +Cc: emacs-devel > From: Zhang Haijun <ccsmile2008@outlook.com> > CC: "emacs-devel@gnu.org" <emacs-devel@gnu.org> > Date: Sun, 4 Nov 2018 15:14:07 +0000 > > > It's a subtle bug: revert-buffer reads and decodes the file in small > > chunks, so by the time it gets to the first null byte, it already > > decided that the encoding is UTF-8. By contrast, find-file decodes > > the entire file at once, so it sees the null bytes when it detects the > > encoding. > > > > We had this behavior since Emacs 23.1; Emacs 22 doesn't change the > > encoding when this buffer is reverted. > > OK. Thanks for your explanation. I like the behavior of revert-buffer. > It may be useful to print some warning message when there are invalid bytes. > How to search invalid bytes in buffer? They are not invalid bytes, they are zero bytes. You can search for them like this: C-s C-q C-SPC ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-04 17:13 ` Eli Zaretskii @ 2018-11-05 8:59 ` Zhang Haijun 2018-11-05 9:00 ` Zhang Haijun 2018-11-05 9:00 ` Zhang Haijun 2 siblings, 0 replies; 14+ messages in thread From: Zhang Haijun @ 2018-11-05 8:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel@gnu.org On 11/05/2018 01:13 AM, Eli Zaretskii wrote: > > They are not invalid bytes, they are zero bytes. You can search for > them like this: > > C-s C-q C-SPC > I mean chars like ^@, ^H and \342\200\230. How to search them? Is there a regexp or a function for this? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-05 9:00 ` Zhang Haijun @ 2018-11-05 9:39 ` Phil Sainty 2018-11-05 10:10 ` Stephen Berman 2018-11-05 14:08 ` Zhang Haijun 0 siblings, 2 replies; 14+ messages in thread From: Phil Sainty @ 2018-11-05 9:39 UTC (permalink / raw) To: Zhang Haijun; +Cc: Eli Zaretskii, emacs-devel@gnu.org On 5/11/18 10:00 PM, Zhang Haijun wrote: > On 11/05/2018 01:13 AM, Eli Zaretskii wrote: >> They are not invalid bytes, they are zero bytes. You can search for >> them like this: >> >> C-s C-q C-SPC > > I mean chars like ^@, ^H and \342\200\230. How to search them? ^@ is the null char and Eli just showed you how to search for it. Similarly, C-s C-q C-h will search for a ^H char. Assuming \342\200\230 is three octal characters then, I would probably resort to editing the search string and using `insert-char': C-s M-e C-x 8 RET #o342 RET etc... If you can *see* an instance of the character already, you might just move point to that character and use C-s C-w (and maybe a bit of C-M-w if that grabs too many chars). Or if you mean "any non-ascii character" then the regexp [^[:ascii:]] will match those. -Phil ^ permalink raw reply [flat|nested] 14+ messages in thread
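Phil's `[^[:ascii:]]` regexp can also be used from Lisp. The sketch below assumes a normal multibyte buffer, and the function name `my/next-non-ascii-position` is hypothetical. For context, the octal escapes \342\200\230 are the bytes E2 80 98, which is the UTF-8 encoding of U+2018 (left single quotation mark). How the regexp treats raw bytes in a unibyte (no-conversion) buffer is not verified here.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/next-non-ascii-position (&optional start)
  "Return the position of the first non-ASCII character at or after START.
START defaults to the beginning of the buffer.  Return nil if none is found."
  (save-excursion
    (goto-char (or start (point-min)))
    ;; Same character class as in the message above.
    (when (re-search-forward "[^[:ascii:]]" nil t)
      (match-beginning 0))))
```

Interactively, the equivalent is `C-M-s [^[:ascii:]]`, repeated with `C-M-s` or `C-s` to step through the matches.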
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-05 9:39 ` Phil Sainty @ 2018-11-05 10:10 ` Stephen Berman 2018-11-05 14:08 ` Zhang Haijun 1 sibling, 0 replies; 14+ messages in thread From: Stephen Berman @ 2018-11-05 10:10 UTC (permalink / raw) To: Phil Sainty; +Cc: Eli Zaretskii, emacs-devel@gnu.org, Zhang Haijun On Mon, 5 Nov 2018 22:39:45 +1300 Phil Sainty <psainty@orcon.net.nz> wrote: [...] > If you can *see* an instance of the character already, you might just > move point to that character and use C-s C-w (and maybe a bit of C-M-w > if that grabs too many chars). ^^^^^ [...] This binding has changed (it keeps biting me too), see /etc/NEWS: * Changes in Specialized Modes and Packages in Emacs 27.1 [...] ** Search and Replace [...] *** New isearch bindings. 'C-M-w' in isearch changed from isearch-del-char to the new function isearch-yank-symbol-or-char. isearch-del-char is now bound to 'C-M-d'. Steve Berman ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-05 9:39 ` Phil Sainty 2018-11-05 10:10 ` Stephen Berman @ 2018-11-05 14:08 ` Zhang Haijun 2018-11-05 15:02 ` Stephen Berman 2018-11-05 16:00 ` Eli Zaretskii 1 sibling, 2 replies; 14+ messages in thread From: Zhang Haijun @ 2018-11-05 14:08 UTC (permalink / raw) To: Phil Sainty; +Cc: Eli Zaretskii, emacs-devel@gnu.org On 11/05/2018 05:39 PM, Phil Sainty wrote: > On 5/11/18 10:00 PM, Zhang Haijun wrote: >> On 11/05/2018 01:13 AM, Eli Zaretskii wrote: >>> They are not invalid bytes, they are zero bytes. You can search for >>> them like this: >>> >>> C-s C-q C-SPC >> >> I mean chars like ^@, ^H and \342\200\230. How to search them? > > ^@ is the null char and Eli just showed you how to search for it. > > Similarly, C-s C-q C-h will search for a ^H char. > > Assuming \342\200\230 is three octal characters then, I would probably > resort to editing the search string and using `insert-char': > > C-s M-e > C-x 8 RET #o342 RET > etc... > > If you can *see* an instance of the character already, you might just > move point to that character and use C-s C-w (and maybe a bit of C-M-w > if that grabs too many chars). > > Or if you mean "any non-ascii character" then the regexp [^[:ascii:]] > will match those. > > > -Phil > I don't know the specific char to search. As the original problem I met, I opened the text file. Emacs can't decode it and it didn't show any warning message like position of the null byte. Then what should I do to find the null byte(or other bytes which can prevent emacs from decoding)? How to search these unknown bytes? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-05 14:08 ` Zhang Haijun @ 2018-11-05 15:02 ` Stephen Berman 2018-11-05 16:00 ` Eli Zaretskii 1 sibling, 0 replies; 14+ messages in thread From: Stephen Berman @ 2018-11-05 15:02 UTC (permalink / raw) To: Zhang Haijun; +Cc: Phil Sainty, Eli Zaretskii, emacs-devel@gnu.org On Mon, 5 Nov 2018 14:08:46 +0000 Zhang Haijun <ccsmile2008@outlook.com> wrote: > On 11/05/2018 05:39 PM, Phil Sainty wrote: >> On 5/11/18 10:00 PM, Zhang Haijun wrote: >>> On 11/05/2018 01:13 AM, Eli Zaretskii wrote: >>>> They are not invalid bytes, they are zero bytes. You can search for >>>> them like this: >>>> >>>> C-s C-q C-SPC >>> >>> I mean chars like ^@, ^H and \342\200\230. How to search them? >> >> ^@ is the null char and Eli just showed you how to search for it. >> >> Similarly, C-s C-q C-h will search for a ^H char. >> >> Assuming \342\200\230 is three octal characters then, I would probably >> resort to editing the search string and using `insert-char': >> >> C-s M-e >> C-x 8 RET #o342 RET >> etc... >> >> If you can *see* an instance of the character already, you might just >> move point to that character and use C-s C-w (and maybe a bit of C-M-w >> if that grabs too many chars). >> >> Or if you mean "any non-ascii character" then the regexp [^[:ascii:]] >> will match those. >> >> >> -Phil >> > > I don't know the specific char to search. As the original problem I met, > I opened the text file. Emacs can't decode it and it didn't show any > warning message like position of the null byte. Then what should I do to > find the null byte(or other bytes which can prevent emacs from decoding)? > > How to search these unknown bytes? All of the above (the ascii control characters ^@ and ^H and octal characters like \342\200\230) are non-printing characters, so you can find them with this regexp isearch: `C-M-s [^[:print:]]'. 
Steve Berman ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-05 14:08 ` Zhang Haijun 2018-11-05 15:02 ` Stephen Berman @ 2018-11-05 16:00 ` Eli Zaretskii 2018-11-06 1:39 ` Zhang Haijun 1 sibling, 1 reply; 14+ messages in thread From: Eli Zaretskii @ 2018-11-05 16:00 UTC (permalink / raw) To: Zhang Haijun; +Cc: psainty, emacs-devel > From: Zhang Haijun <ccsmile2008@outlook.com> > CC: Eli Zaretskii <eliz@gnu.org>, "emacs-devel@gnu.org" <emacs-devel@gnu.org> > Date: Mon, 5 Nov 2018 14:08:46 +0000 > > I don't know the specific char to search. The _only_ character that can disable decoding is the null byte, so you need to search only for null bytes, by typing "C-q C-SPC" at the Isearch prompt. ^ permalink raw reply [flat|nested] 14+ messages in thread
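Since the null byte is the only character that triggers this, the original question ("where is it?") can be answered with a short function that lists the positions. This is a sketch only. The name `my/null-byte-positions` is hypothetical and not an Emacs command. It does from Lisp what `C-q C-SPC` at the Isearch prompt does interactively.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/null-byte-positions (&optional buffer)
  "Return a list of positions of null bytes in BUFFER.
BUFFER defaults to the current buffer.  Narrowing is ignored."
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (save-restriction
        (widen)
        (goto-char (point-min))
        (let (positions)
          ;; (string 0) is a one-character string holding the null char.
          (while (search-forward (string 0) nil t)
            (push (match-beginning 0) positions))
          (nreverse positions))))))
```

Evaluating `(my/null-byte-positions)` with `M-:` in the affected buffer returns the positions, and `M-g c` (`goto-char`) can then jump to any of them.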
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-05 16:00 ` Eli Zaretskii @ 2018-11-06 1:39 ` Zhang Haijun 2018-11-06 3:31 ` Eli Zaretskii 0 siblings, 1 reply; 14+ messages in thread From: Zhang Haijun @ 2018-11-06 1:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: psainty@orcon.net.nz, emacs-devel@gnu.org On 11/06/2018 12:00 AM, Eli Zaretskii wrote: >> From: Zhang Haijun <ccsmile2008@outlook.com> >> CC: Eli Zaretskii <eliz@gnu.org>, "emacs-devel@gnu.org" <emacs-devel@gnu.org> >> Date: Mon, 5 Nov 2018 14:08:46 +0000 >> >> I don't know the specific char to search. > > The _only_ character that can disable decoding is the null byte, so > you need to search only for null bytes, by typing "C-q C-SPC" at the > Isearch prompt. > OK. Thank you. For Chinese, C-SPC is bound by OS to toggle the system input method. Is "C-q C-SPC" the same as 'C-q C-@'? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 26.1.50; Emacs can't decode the text file on opening the file, but can decode it on revert-buffer 2018-11-06 1:39 ` Zhang Haijun @ 2018-11-06 3:31 ` Eli Zaretskii 0 siblings, 0 replies; 14+ messages in thread From: Eli Zaretskii @ 2018-11-06 3:31 UTC (permalink / raw) To: Zhang Haijun; +Cc: psainty, emacs-devel > From: Zhang Haijun <ccsmile2008@outlook.com> > CC: "psainty@orcon.net.nz" <psainty@orcon.net.nz>, "emacs-devel@gnu.org" > <emacs-devel@gnu.org> > Date: Tue, 6 Nov 2018 01:39:53 +0000 > > For Chinese, C-SPC is bound by OS to toggle the system input method. Is > "C-q C-SPC" the same as 'C-q C-@'? Yes. ^ permalink raw reply [flat|nested] 14+ messages in thread