* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Thierry Volpiatto @ 2020-11-06 15:11 UTC
To: 44486

1) emacs -Q
2) M-x find-file test.el
3) insert this in test.el buffer:

;; ààààà
(foo "^@")

4) save buffer
5) M-x revert-buffer

You should now see the line ;; ààààà corrupted:

NOTE: in 3) Write "^@" with C-q C-@.

In GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.30, cairo version 1.15.10)
 of 2020-08-31 built on IPadS340
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Linux Mint 19.3

Recent messages:
Sending...
Sending via mail...
Decrypting /home/thierry/.authinfo.gpg...done
Sending email
Sending email done
Saving file /home/thierry/Maildir/Posteo/Sent/cur/1604674326.396ddaa78615bfbe.ipads340:2,S...
Wrote /home/thierry/Maildir/Posteo/Sent/cur/1604674326.396ddaa78615bfbe.ipads340:2,S
Sending...done
[mu4e] Message sent
Do you want to exit emacs-w3m? (y or n) y

Configured using:
 'configure CFLAGS=-O3 --without-dbus --without-gconf --without-gsettings
 --with-mailutils --with-cairo'

Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM GLIB NOTIFY INOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS LIBSYSTEMD PDUMPER
LCMS2 GMP

Important settings:
  value of $LANG: fr_FR.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Ilisp

Minor modes in effect:
  global-magit-file-mode: t
  magit-auto-revert-mode: t
  global-git-commit-mode: t
  global-undo-tree-mode: t
  undo-tree-mode: t
  global-ligature-mode: t
  ligature-mode: t
  psession-mode: t
  psession-autosave-mode: t
  psession-savehist-mode: t
  global-git-gutter-mode: t
  eldoc-in-minibuffer-mode: t
  display-time-mode: t
  winner-mode: t
  show-paren-mode: t
  helm-epa-mode: t
  helm-descbinds-mode: t
  override-global-mode: t
  helm-adaptive-mode: t
  helm-mode: t
  helm-ff-cache-mode: t
  shell-dirtrack-mode: t
  async-bytecomp-package-mode: t
  dired-async-mode: t
  minibuffer-depth-indicate-mode: t
  straight-use-package-mode: t
  straight-package-neutering-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  auto-fill-function: do-auto-fill
  transient-mark-mode: t

Load-path shadows:
None found.

Features: (shadow emacsbug w3m-filter w3m-cookie w3m-tabmenu w3m-session w3m-search helm-w3m w3m-bookmark gnutls epa-file network-stream nsm mailalias helm-ring helm-dabbrev autocrypt-message epa-mail helm-firefox magit-extras face-remap magit-bookmark magit-submodule magit-obsolete magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull magit-fetch magit-clone magit-remote magit-commit magit-sequence magit-notes magit-worktree magit-tag magit-merge magit-branch magit-reset magit-files magit-refs magit-status magit magit-repos magit-apply magit-wip magit-log which-func magit-diff smerge-mode magit-core magit-autorevert autorevert filenotify magit-margin magit-transient magit-process magit-mode git-commit transient magit-git magit-section magit-utils crm log-edit add-log with-editor qp view sort gnus-cite smiley w3m-form w3m-symbol w3m timezone w3m-hist w3m-fb bookmark-w3m w3m-ems w3m-favicon w3m-image tab-line w3m-proc w3m-util mm-archive mail-extr autocrypt-gnus autocrypt-mu4e autocrypt rx addressbook-bookmark mu4e-config org-mu4e gnus-art mm-uu mml2015 mm-view mml-smime smime dig gnus-sum gnus-group gnus-undo gnus-start gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int gnus-range gnus-win gnus nnheader mu4e-patch mu4e-contrib eshell esh-cmd esh-ext esh-opt esh-proc esh-io esh-arg esh-module esh-groups esh-util mu4e mu4e-org mu4e-main mu4e-view mu4e-headers mu4e-compose mu4e-context mu4e-draft mu4e-actions ido rfc2368 smtpmail sendmail mu4e-mark mu4e-proc mu4e-utils doc-view image-mode exif mu4e-lists mu4e-message shr svg dom flow-fill hl-line mu4e-vars message rmc puny rfc822 mml mml-sec gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils gmm-utils mailheader mu4e-meta helm-x-files helm-for-files helm-bookmark bookmark text-property-search pp helm-command flymake-proc flymake warnings conf-mode sh-script smie executable jka-compr 
bug-reference naquadah-theme solar cal-dst holidays hol-loaddefs tv-utils undo-tree diff undo-tree-autoloads ligature ligature-autoloads boxquote rect rainbow-mode-autoloads psession wgrep-helm wgrep grep compile wgrep-helm-autoloads wgrep-autoloads log-view pcvs-util pcmpl-git pcmpl-git-autoloads bash-completion-autoloads powerline powerline-separators color powerline-themes powerline-autoloads toc-org-autoloads cl-indent pcase ffap markdown-toc-autoloads markdown-mode-autoloads autocrypt-autoloads config-w3m w3m-autoloads git-gutter git-gutter-autoloads mule-util appt diary-lib diary-loaddefs anaconda-mode xref project pythonic f dash s anaconda-mode-autoloads pythonic-autoloads f-autoloads s-autoloads eldoc-eval emamux-autoloads magit-autoloads git-commit-autoloads with-editor-autoloads transient-autoloads dash-autoloads pcomplete-extension pcmpl-unix pcmpl-gnu iterator iedit-autoloads ledger-mode-autoloads wdired dired-extension org-config ob-gnuplot org-crypt net-utils time winner w3m-wget wget thingatpt wget-sysdep autotest-mode autoconf-mode paren woman man ediff ediff-merg ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util init-helm helm-fd epa derived epg epg-config helm-misc helm-apt helm-imenu imenu helm-elisp-package package url-handlers helm-find helm-org org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-footnote org-src ob-comint org-pcomplete org-list org-faces org-entities noutline outline org-version ob-emacs-lisp ob-core ob-eval org-table ol org-keys org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs helm-external helm-net browse-url xml url url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf url-util url-parse url-vars mailcap helm-descbinds cus-edit wid-edit helm-ls-git vc-git diff-mode vc vc-dispatcher helm-ipython helm-elisp helm-eval edebug backtrace find-func helm-info python tramp-sh use-package-bind-key bind-key helm-adaptive diminish helm-mode helm-files tramp 
tramp-loaddefs trampver tramp-integration files-x tramp-compat shell pcomplete comint ansi-color ring parse-time iso8601 time-date ls-lisp auth-source password-cache json map helm-buffers helm-occur helm-tags helm-locate helm-grep helm-regexp format-spec helm-utils helm-help helm-types use-package-diminish helm-extensions-autoloads helm-config helm-autoloads helm easy-mmode async-bytecomp helm-global-bindings helm-easymenu helm-source eieio-compat eieio eieio-core eieio-loaddefs helm-multi-match helm-lib dired-async advice dired-aux dired dired-loaddefs async emms-autoloads cl-seq use-package-core popup-autoloads finder-inf diminish-autoloads mb-depth server edmacro kmacro avoid cus-start cus-load use-package-autoloads bind-key-autoloads straight-autoloads info cl-extra help-mode easymenu seq byte-opt straight subr-x cl-macs gv bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote threads inotify lcms2 dynamic-setting font-render-setting cairo move-toolbar gtk x-toolkit x multi-tty make-network-process emacs) Memory information: ((conses 16 599805 404657) (symbols 48 41981 3) (strings 32 167881 57520) (string-bytes 1 6426344) 
(vectors 16 82748) (vector-slots 8 1666976 230836) (floats 8 1795 3081) (intervals 56 6849 3252) (buffers 1000 130))

--
Thierry

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Andreas Schwab @ 2020-11-06 15:33 UTC
To: Thierry Volpiatto; +Cc: 44486

The null byte causes the file to be detected as binary. You can use
C-x C-m c to override the detection.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-06 15:40 UTC
To: Andreas Schwab; +Cc: thievol, 44486

> From: Andreas Schwab <schwab@linux-m68k.org>
> Date: Fri, 06 Nov 2020 16:33:04 +0100
> Cc: 44486@debbugs.gnu.org
>
> The null byte causes the file to be detected as binary. You can use C-x
> C-m c to override the detection.

Right. Or set inhibit-nul-byte-detection to a non-nil value before
reverting.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-06 16:17 UTC
To: schwab; +Cc: thievol, 44486

> Date: Fri, 06 Nov 2020 17:40:50 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, 44486@debbugs.gnu.org
>
> Or set inhibit-nul-byte-detection to a non-nil value before
> reverting

Actually, this doesn't seem to work, but it looks like a bug...

Btw, reverting while forcing a particular encoding can be invoked with
"C-x C-m r".

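The workaround mentioned above also has a Lisp-level form. What follows is a minimal sketch, not part of the original messages. It assumes that the current buffer visits the test.el file from the report and that the file really is UTF-8. `revert-buffer-with-coding-system' is the command bound to "C-x C-m r"; the choice of `utf-8' as the coding system is an assumption made for illustration.

```elisp
;; Sketch only: evaluate in the buffer visiting test.el.
;; Re-read the file from disk and decode it as UTF-8 instead of letting
;; Emacs guess the encoding (roughly what "C-x C-m r utf-8 RET" does).
(revert-buffer-with-coding-system 'utf-8)
```

The "C-x C-m c" prefix (`universal-coding-system-argument') should achieve much the same interactively, by forcing a coding system for the next command, e.g. M-x revert-buffer.
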
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-06 20:07 UTC
To: Kenichi Handa; +Cc: thievol, schwab, 44486

> Date: Fri, 06 Nov 2020 18:17:53 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, 44486@debbugs.gnu.org
>
> > Date: Fri, 06 Nov 2020 17:40:50 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: thievol@posteo.net, 44486@debbugs.gnu.org
> >
> > Or set inhibit-nul-byte-detection to a non-nil value before
> > reverting
>
> Actually, this doesn't seem to work, but it looks like a bug...

We don't specify that prefer-utf-8, which is used by default for *.el
files, should heed this variable. Since prefer-utf-8 is a variant of
'undecided', i.e. it performs detection of encoding, I think this is a
bug, because 'undecided' does pay attention to
inhibit-null-byte-detection.

So I propose the change below (for master). Any objections?

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index e6e6135..16cd8cf 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1251,7 +1251,9 @@ 'prefer-utf-8
   :coding-type 'undecided
   :mnemonic ?-
   :charset-list '(emacs)
-  :prefer-utf-8 t)
+  :prefer-utf-8 t
+  :inhibit-null-byte-detection 0
+  :inhibit-iso-escape-detection 0)
 
 (define-coding-system 'raw-text
   "Raw text, which means text contains random 8-bit codes.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Lars Ingebrigtsen @ 2020-11-09 15:44 UTC
To: Eli Zaretskii; +Cc: thievol, schwab, 44486

Eli Zaretskii <eliz@gnu.org> writes:

> So I propose the change below (for master). Any objections?

[...]

> -  :prefer-utf-8 t)
> +  :prefer-utf-8 t
> +  :inhibit-null-byte-detection 0
> +  :inhibit-iso-escape-detection 0)

Makes sense to me, but is there any particular reason to use 0 instead
of t here?

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-09 16:14 UTC
To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Kenichi Handa <handa@gnu.org>, thievol@posteo.net,
>  schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Mon, 09 Nov 2020 16:44:00 +0100
>
> > -  :prefer-utf-8 t)
> > +  :prefer-utf-8 t
> > +  :inhibit-null-byte-detection 0
> > +  :inhibit-iso-escape-detection 0)
>
> Makes sense to me, but is there any particular reason to use 0 instead
> of t here?

0 is different: it says to obey the value of
inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
means inhibit the detection unconditionally, which is not what we
want.

(We could use any non-nil, non-t value, of course; I've chosen to use
zero for consistency with what we do for 'undecided', see coding.c.)

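The tri-state meaning of the property described above can be modelled in a few lines of Lisp. This is an illustrative sketch only, not the actual implementation (which, per the message above, lives in coding.c). The helper name `my/nul-detection-inhibited-p' is made up, and the reading of nil as "always run the detection" is an assumption that the thread does not spell out.

```elisp
;; Illustrative model only; the function name is hypothetical.
(defun my/nul-detection-inhibited-p (property)
  "Return non-nil if NUL-byte detection would be skipped.
PROPERTY stands for the value of a coding system's
`:inhibit-null-byte-detection' property."
  (cond
   ((null property) nil)              ; nil: assumed to mean "always detect"
   ((eq property t) t)                ; t: inhibit detection unconditionally
   (t inhibit-null-byte-detection)))  ; anything else (e.g. 0): obey the variable

;; With the proposed property value 0, the user's setting of the
;; variable decides the outcome:
(let ((inhibit-null-byte-detection t))
  (my/nul-detection-inhibited-p 0))
```

The variable is spelled `inhibit-null-byte-detection' here, as in most of the thread; an earlier message spells it `inhibit-nul-byte-detection'.
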
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Lars Ingebrigtsen @ 2020-11-09 16:27 UTC
To: Eli Zaretskii; +Cc: thievol, schwab, 44486

Eli Zaretskii <eliz@gnu.org> writes:

> 0 is different: it says to obey the value of
> inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> means inhibit the detection unconditionally, which is not what we
> want.
>
> (We could use any non-nil, non-t value, of course; I've chosen to use
> zero for consistency with what we do for 'undecided', see coding.c.)

I see. Perhaps the difference between the various non-nil values should
be mentioned in the doc strings of the two variables? They only mention
nil/non-nil now, as far as I can see.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-09 16:57 UTC
To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: handa@gnu.org, thievol@posteo.net, schwab@linux-m68k.org,
>  44486@debbugs.gnu.org
> Date: Mon, 09 Nov 2020 17:27:06 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > 0 is different: it says to obey the value of
> > inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> > means inhibit the detection unconditionally, which is not what we
> > want.
> >
> > (We could use any non-nil, non-t value, of course; I've chosen to use
> > zero for consistency with what we do for 'undecided', see coding.c.)
>
> I see. Perhaps the difference between the various non-nil values should
> be mentioned in the doc strings of the two variables? They only mention
> nil/non-nil now, as far as I can see.

The _variables_ are simple booleans; it's the value of the
:inhibit-null-byte-detection _property_ of a coding-system that is a
tri-state. And that fact is documented in the doc string of
define-coding-system.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Lars Ingebrigtsen @ 2020-11-10 14:29 UTC
To: Eli Zaretskii; +Cc: thievol, schwab, 44486

Eli Zaretskii <eliz@gnu.org> writes:

> The _variables_ are simple booleans; it's the value of the
> :inhibit-null-byte-detection _property_ of a coding-system that is a
> tri-state. And that fact is documented in the doc string of
> define-coding-system.

Ah; sorry for the noise.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-10 16:04 UTC
To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: handa@gnu.org, thievol@posteo.net, schwab@linux-m68k.org,
>  44486@debbugs.gnu.org
> Date: Tue, 10 Nov 2020 15:29:27 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > The _variables_ are simple booleans; it's the value of the
> > :inhibit-null-byte-detection _property_ of a coding-system that is a
> > tri-state. And that fact is documented in the doc string of
> > define-coding-system.
>
> Ah; sorry for the noise.

No noise heard here ;-)

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Stefan Monnier @ 2020-11-14 14:02 UTC
To: Eli Zaretskii; +Cc: thievol, Lars Ingebrigtsen, schwab, 44486

>> > -  :prefer-utf-8 t)
>> > +  :prefer-utf-8 t
>> > +  :inhibit-null-byte-detection 0
>> > +  :inhibit-iso-escape-detection 0)
>>
>> Makes sense to me, but is there any particular reason to use 0 instead
>> of t here?
>
> 0 is different: it says to obey the value of
> inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> means inhibit the detection unconditionally, which is not what we
> want.

Actually, for prefer-utf-8 files, I think we never want to automatically
fall back to binary.

IOW I think Thierry's situation shows a bug in Emacs rather than
a pilot error.


        Stefan

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-14 15:09 UTC
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, thievol@posteo.net, handa@gnu.org,
>  schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 09:02:16 -0500
>
> > 0 is different: it says to obey the value of
> > inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> > means inhibit the detection unconditionally, which is not what we
> > want.
>
> Actually, for prefer-utf-8 files, I think we never want to automatically
> fall back to binary.

I think you are assuming prefer-utf-8 is something other than what it
is. It is not a variant of UTF-8, it is a variant of 'undecided'
(i.e. it starts by detecting the encoding), which prefers UTF-8 if
that can decode the text. inhibit-null-byte-detection etc. are
relevant to the detection phase, not to the decoding phase.

It is wrong IMO to decide to use UTF-8 for a binary byte stream just
because it includes valid UTF-8 byte sequences. If the input text is
known to be UTF-8, even though it includes null bytes, the user or the
application should either bind coding-system-for-read or
inhibit-null-byte-detection.

> IOW I think Thierry's situation shows a bug in Emacs rather than
> a pilot error.

I disagree.

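The two bindings suggested above could look roughly as follows. This is a minimal sketch rather than anything posted in the thread; the helper names `my/visit-as-utf-8' and `my/visit-ignoring-nul-bytes' are hypothetical. Note that, according to the earlier messages, Emacs 27.1 ignores the variable for prefer-utf-8, so the second helper is only expected to help for *.el files once the change proposed for master is in place.

```elisp
;; Hypothetical helpers, for illustration only.

(defun my/visit-as-utf-8 (file)
  "Visit FILE, decoding it as UTF-8 without any encoding detection."
  (let ((coding-system-for-read 'utf-8))
    (find-file-noselect file)))

(defun my/visit-ignoring-nul-bytes (file)
  "Visit FILE with NUL-byte detection inhibited.
Encoding detection still runs, but NUL bytes are not meant to be
taken as a sign of binary data."
  (let ((inhibit-null-byte-detection t))
    (find-file-noselect file)))
```

Both variables are dynamically bound around the call, so the override applies only to that one visit and does not change the global defaults.
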
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Stefan Monnier @ 2020-11-14 15:19 UTC
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> Actually, for prefer-utf-8 files, I think we never want to automatically
>> fall back to binary.
> I think you are assuming prefer-utf-8 is something other than what it
> is. It is not a variant of UTF-8, it is a variant of 'undecided'
> (i.e. it starts by detecting the encoding), which prefers UTF-8 if
> that can decode the text.

My position is not based on principles but on pragmatic concerns.
AFAIK `prefer-utf-8` is only ever used for files which are known to
contain text and should almost always contain UTF-8 text.

I believe if there's a NUL byte in such a file but it otherwise doesn't
contain any invalid UTF-8 byte sequence, it will result in better
behavior if we treat it as UTF-8 than as binary.


        Stefan

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-14 16:13 UTC
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org, thievol@posteo.net, handa@gnu.org,
>  schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 10:19:57 -0500
>
> >> Actually, for prefer-utf-8 files, I think we never want to automatically
> >> fall back to binary.
> > I think you are assuming prefer-utf-8 is something other than what it
> > is. It is not a variant of UTF-8, it is a variant of 'undecided'
> > (i.e. it starts by detecting the encoding), which prefers UTF-8 if
> > that can decode the text.
>
> My position is not based on principles but on pragmatic concerns.
> AFAIK `prefer-utf-8` is only ever used for files which are known to
> contain text and should almost always contain UTF-8 text.

For those, we should use utf-8, not prefer-utf-8.

> I believe if there's a NUL byte in such a file but it otherwise doesn't
> contain any invalid UTF-8 byte sequence, it will result in better
> behavior if we treat it as UTF-8 than as binary.

We treat null bytes as the _single_ telltale sign of a binary file.
If we disable that in coding-systems that are supposed to _detect_
encoding, we will never be able to detect binary files.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Stefan Monnier @ 2020-11-14 17:55 UTC
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> >> Actually, for prefer-utf-8 files, I think we never want to automatically
>> >> fall back to binary.
>> > I think you are assuming prefer-utf-8 is something other than what it
>> > is. It is not a variant of UTF-8, it is a variant of 'undecided'
>> > (i.e. it starts by detecting the encoding), which prefers UTF-8 if
>> > that can decode the text.
>> My position is not based on principles but on pragmatic concerns.
>> AFAIK `prefer-utf-8` is only ever used for files which are known to
>> contain text and should almost always contain UTF-8 text.
> For those, we should use utf-8, not prefer-utf-8.

No, `utf-8` should be used when other coding systems should be
considered as errors (i.e. not "almost always" but "always"), whereas
`prefer-utf-8` is for use when utf-8 is the most likely one and other
coding systems should be tried only when there's some evidence that the
file actually doesn't use utf-8.

`prefer-utf-8` was introduced specifically for `.el` files (and I don't
know of any other use of that encoding so far). If `utf-8` is
preferable over `prefer-utf-8` for this usage I think the problem is in
`prefer-utf-8` since it was introduced specifically for that.

>> I believe if there's a NUL byte in such a file but it otherwise doesn't
>> contain any invalid UTF-8 byte sequence, it will result in better
>> behavior if we treat it as UTF-8 than as binary.
> We treat null bytes as the _single_ telltale sign of a binary file.

A .el file should *never* be a binary file.

> If we disable that in coding-systems that are supposed to _detect_
> encoding, we will never be able to detect binary files.

In which scenario would it be beneficial to detect a `.el` file as being
binary instead of utf-8?


        Stefan


PS: Especially since NUL bytes can and do occur in ELisp code.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-14 18:08 UTC
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org, thievol@posteo.net, handa@gnu.org,
>  schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 12:55:51 -0500
>
> >> AFAIK `prefer-utf-8` is only ever used for files which are known to
> >> contain text and should almost always contain UTF-8 text.
> > For those, we should use utf-8, not prefer-utf-8.
>
> No, `utf-8` should be used when other coding systems should be
> considered as errors (i.e. not "almost always" but "always")

Why?

> whereas `prefer-utf-8` is for use when utf-8 is the most likely one
> and other coding systems should be tried only when there's some
> evidence that the file actually doesn't use utf-8.
>
> `prefer-utf-8` was introduced specifically for `.el` files (and I don't
> know of any other use of that encoding so far).

Maybe that was the history, but the reality is different.
prefer-utf-8 is the same as 'undecided' with coding-systems'
priorities tampered to prefer UTF-8.

> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
> the problem is in `prefer-utf-8` since it was introduced
> specifically for that.

The implementation doesn't support your POV.

> >> I believe if there's a NUL byte in such a file but it otherwise doesn't
> >> contain any invalid UTF-8 byte sequence, it will result in better
> >> behavior if we treat it as UTF-8 than as binary.
> > We treat null bytes as the _single_ telltale sign of a binary file.
>
> A .el file should *never* be a binary file.

We are not talking about .el files, we are talking about _any_ file
read using prefer-utf-8.

For .el files, we can always bind inhibit-null-byte-detection to t
when we load or visit such files.

> > If we disable that in coding-systems that are supposed to _detect_
> > encoding, we will never be able to detect binary files.
>
> In which scenario would it be beneficial to detect a `.el` file as being
> binary instead of utf-8?

I'm not talking about .el files. The coding-system's applicability is
wider than that.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-14 18:14 UTC
To: monnier; +Cc: thievol, larsi, schwab, 44486

> Date: Sat, 14 Nov 2020 20:08:04 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, larsi@gnus.org, schwab@linux-m68k.org,
>  44486@debbugs.gnu.org
>
> For .el files, we can always bind inhibit-null-byte-detection to t
> when we load or visit such files.

Alternatively, we could introduce a separate coding-system whose
:inhibit-null-byte-detection property is t, and use that for *.el
files.

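A separate coding-system of the kind suggested above might be defined along these lines. This is an untested sketch modelled on the `prefer-utf-8' definition visible in the patch earlier in the thread. The name `my-prefer-utf-8-nul-ok' is made up, and registering it for *.el files through `modify-coding-system-alist' is an assumption about how such a coding-system would be put to use, not something the thread proposes in detail.

```elisp
;; Hypothetical coding system; the name is invented for illustration.
(define-coding-system 'my-prefer-utf-8-nul-ok
  "Like `prefer-utf-8', but never treat NUL bytes as a sign of binary data."
  :coding-type 'undecided
  :mnemonic ?-
  :charset-list '(emacs)
  :prefer-utf-8 t
  ;; t (rather than 0) inhibits NUL-byte detection unconditionally,
  ;; regardless of the user-level variable.
  :inhibit-null-byte-detection t)

;; Assumed way of selecting it when reading and writing *.el files.
(modify-coding-system-alist 'file "\\.el\\'" 'my-prefer-utf-8-nul-ok)
```

With the property set to t, a file containing NUL bytes would still go through encoding detection, but the NUL bytes alone would no longer push the result toward binary.
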
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Stefan Monnier @ 2020-11-14 22:56 UTC
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> For .el files, we can always bind inhibit-null-byte-detection to t
>> when we load or visit such files.
> Alternatively, we could introduce a separate coding-system whose
> :inhibit-null-byte-detection property is t, and use that for *.el
> files.

If you want to go that route, that's fine by me.

AFAIK no one else uses `prefer-utf-8`, so it doesn't seem worth the
trouble, though (especially since we don't have any evidence that
potential other users would favor the current behavior over the
inhibit-null-byte-detection one).


        Stefan

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
From: Eli Zaretskii @ 2020-11-15 15:14 UTC
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: thievol@posteo.net, larsi@gnus.org, schwab@linux-m68k.org,
>  44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 17:56:36 -0500
>
> >> For .el files, we can always bind inhibit-null-byte-detection to t
> >> when we load or visit such files.
> > Alternatively, we could introduce a separate coding-system whose
> > :inhibit-null-byte-detection property is t, and use that for *.el
> > files.
>
> If you want to go that route, that's fine by me.

I actually think that we don't need to do anything. We've lived for 7
years with a reality that is worse than what is now on master, and no
one complained. But if you are very unhappy about this, we _could_
introduce a new coding-system for *.el files.

> (especially since we don't have any evidence that potential other
> users would favor the current behavior over the
> inhibit-null-byte-detection one).

The current behavior on master is to heed inhibit-null-byte-detection;
the current behavior in Emacs 27 is to ignore it, and always consider
a .el file with null bytes as binary. I hope you agree that the
behavior on master is slightly better, at least in that it won't
surprise users.

* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 18:08 ` Eli Zaretskii
  2020-11-14 18:14   ` Eli Zaretskii
@ 2020-11-14 22:53   ` Stefan Monnier
  2020-11-15 15:08     ` Eli Zaretskii
  1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 22:53 UTC
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
>> the problem is in `prefer-utf-8` since it was introduced
>> specifically for that.
> The implementation doesn't support your POV.

Then I think the implementation is in error.

>> >> I believe if there's a NUL byte in such a file but it otherwise doesn't
>> >> contain any invalid UTF-8 byte sequence, it will result in better
>> >> behavior if we treat it as UTF-8 than as binary.
>> > We treat null bytes as the _single_ telltale sign of a binary file.
>>
>> A .el file should *never* be a binary file.
>
> We are not talking about .el files, we are talking about _any_ file
> read using prefer-utf-8.

`prefer-utf-8` was not introduced because it seemed like a good idea and
then we hoped someone would find it useful.  It was introduced to solve
a concrete need, which is that of `.el` files.  It's quite possible that
there are other situations that have the same needs as `.el` files, but
from where I stand it looks like "the needs of .el files (and similar
cases)" should determine the intended behavior of `prefer-utf-8` rather
than its current implementation.

> For .el files, we can always bind inhibit-null-byte-detection to t
> when we load or visit such files.

We could, but I'm having trouble imagining a situation where we'd want
to use `prefer-utf-8` and not inhibit "NUL means binary".

The "NUL means binary" heuristic fundamentally says that `binary` is the
first coding system we try and only if this one fails (for lack of NUL
bytes) we consider others.  But for `prefer-utf-8` we should first
consider utf-8 and only if this fails should we consider others
(potentially including `binary` if you want, my opinion is not as strong
there).

> I'm not talking about .el files.  The coding-system's applicability is
> wider than that.

Could be.  But it's its "raison d'être" (and AFAIK currently still the
sole application), so it should handle this case as best it can.


        Stefan
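The ordering argument in the message above can be sketched with a small toy model. This is only an illustration in Python, not Emacs's decoder; the function names `nul_means_binary` and `utf8_first` are hypothetical, and the "other" result is a stand-in for whatever further detection would really happen.

```python
# Toy model only: the real detection is done by Emacs's own decoder and
# is considerably more involved.  All names here are hypothetical.

def nul_means_binary(data: bytes) -> str:
    """'binary' is effectively tried first: any NUL byte wins outright."""
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"  # stand-in for the rest of the detection logic


def utf8_first(data: bytes) -> str:
    """UTF-8 is tried first; NUL only matters if UTF-8 decoding fails."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "binary" if b"\x00" in data else "other"


# The file from the original report: a comment with non-ASCII text
# followed by a string holding a raw NUL byte.
sample = ';; ààààà\n(foo "\x00")\n'.encode("utf-8")

print(nul_means_binary(sample))  # binary
print(utf8_first(sample))        # utf-8
```

Under the first ordering the non-ASCII comment is never decoded as UTF-8, which would match the corruption described in the report; under the second, the NUL byte is just another character in valid UTF-8 text.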
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 22:53 ` Stefan Monnier
@ 2020-11-15 15:08   ` Eli Zaretskii
  2020-11-15 18:31     ` Stefan Monnier
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-15 15:08 UTC
  To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org, thievol@posteo.net, handa@gnu.org,
>   schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 17:53:57 -0500
>
> >> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
> >> the problem is in `prefer-utf-8` since it was introduced
> >> specifically for that.
> > The implementation doesn't support your POV.
>
> Then I think the implementation is in error.

But that ship sailed 7 years ago.

> > We are not talking about .el files, we are talking about _any_ file
> > read using prefer-utf-8.
>
> `prefer-utf-8` was not introduced because it seemed like a good idea and
> then we hoped someone would find it useful.  It was introduced to solve
> a concrete need, which is that of `.el` files.  It's quite possible that
> there are other situations that have the same needs as `.el` files, but
> from where I stand it looks like "the needs of .el files (and similar
> cases)" should determine the intended behavior of `prefer-utf-8` rather
> than its current implementation.
>
> > For .el files, we can always bind inhibit-null-byte-detection to t
> > when we load or visit such files.
>
> We could, but I'm having trouble imagining a situation where we'd want
> to use `prefer-utf-8` and not inhibit "NUL means binary".
>
> The "NUL means binary" heuristic fundamentally says that `binary` is the
> first coding system we try and only if this one fails (for lack of NUL
> bytes) we consider others.  But for `prefer-utf-8` we should first
> consider utf-8 and only if this fails should we consider others
> (potentially including `binary` if you want, my opinion is not as strong
> there).
>
> > I'm not talking about .el files.  The coding-system's applicability is
> > wider than that.
>
> Could be.  But it's its "raison d'être" (and AFAIK currently still the
> sole application), so it should handle this case as best it can.

We should have been having this discussion 7 years ago.  And guess
what?  We did.  In that discussion, you said, in response to a
question from Kenichi:

  > * What to do with null byte detection.  Previously, if a
  >   *.el file contains a null byte and
  >   inhibit-null-byte-detection is nil (the default), it's
  >   detected as a binary file.  Now utf-8 is forced regardless
  >   of inhibit-null-byte-detection.

  I like the utf-8 better, but I don't know of any concrete case where it
  makes a significant difference, so either way is OK.
                                     ^^^^^^^^^^^^^^^^

Note that what actually got implemented ignored
inhibit-null-byte-detection altogether, and _always_ considered the
file binary if any null byte was found.  My change, which prompted
this present discussion, made prefer-utf-8 heed the variable's value,
which is mid-way between what we had for 7 years and what you thought
we should have.  So, a small step forward ;-)
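To make the "mid-way" remark above concrete, here is a hedged toy model of the three policies mentioned in this message: what was implemented for 7 years, what is now on master, and the "utf-8 is forced" behavior from the older discussion. It is written in Python purely for illustration and is not Emacs code; every function and parameter name is hypothetical, and `inhibit` merely stands in for the value of inhibit-null-byte-detection.

```python
# Toy model of the three policies discussed in this thread.  It is not
# Emacs code; function and parameter names are hypothetical.

def _utf8_or_other(data: bytes) -> str:
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"  # stand-in for further detection


def implemented_for_7_years(data: bytes, inhibit: bool) -> str:
    # The variable is ignored: any NUL byte means binary.
    return "binary" if b"\x00" in data else _utf8_or_other(data)


def on_master(data: bytes, inhibit: bool) -> str:
    # The variable is heeded: NUL means binary only while detection is on.
    if b"\x00" in data and not inhibit:
        return "binary"
    return _utf8_or_other(data)


def utf8_forced(data: bytes, inhibit: bool) -> str:
    # NUL bytes never trigger the binary verdict.
    return _utf8_or_other(data)


sample = b'(foo "\x00")\n'
for policy in (implemented_for_7_years, on_master, utf8_forced):
    print(policy.__name__, policy(sample, False), policy(sample, True))
# implemented_for_7_years binary binary
# on_master binary utf-8
# utf8_forced utf-8 utf-8
```

The middle row differs from the first only when the variable is non-nil, and from the last only when it is nil, which is one way to read "mid-way".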
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-15 15:08 ` Eli Zaretskii
@ 2020-11-15 18:31   ` Stefan Monnier
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Monnier @ 2020-11-15 18:31 UTC
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>   > * What to do with null byte detection.  Previously, if a
>   >   *.el file contains a null byte and
>   >   inhibit-null-byte-detection is nil (the default), it's
>   >   detected as a binary file.  Now utf-8 is forced regardless
>   >   of inhibit-null-byte-detection.
>
>   I like the utf-8 better, but I don't know of any concrete case where it
>   makes a significant difference, so either way is OK.
>                                      ^^^^^^^^^^^^^^^^

I'm glad to see that I now know better ;-)

> we should have.  So, a small step forward ;-)

I'll take what I can get ;-)


        Stefan
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 20:07 ` Eli Zaretskii
  2020-11-09 15:44   ` Lars Ingebrigtsen
@ 2020-11-14 12:43   ` Eli Zaretskii
  1 sibling, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 12:43 UTC
  To: handa; +Cc: thievol, schwab, 44486-done

> Date: Fri, 06 Nov 2020 22:07:10 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, schwab@linux-m68k.org, 44486@debbugs.gnu.org
>
> We don't specify that prefer-utf-8, which is used by default for *.el
> files, should heed this variable.  Since prefer-utf-8 is a variant of
> 'undecided', i.e. it performs detection of encoding, I think this is a
> bug, because 'undecided' does pay attention to
> inhibit-null-byte-detection.
>
> So I propose the change below (for master).  Any objections?

No objections, so I have now installed this on the master branch, and
I'm closing this bug report.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 15:33 ` Andreas Schwab
  2020-11-06 15:40   ` Eli Zaretskii
@ 2020-11-06 19:18   ` Thierry Volpiatto
  1 sibling, 0 replies; 25+ messages in thread
From: Thierry Volpiatto @ 2020-11-06 19:18 UTC
  To: Andreas Schwab; +Cc: 44486

Andreas Schwab <schwab@linux-m68k.org> writes:

> The null byte causes the file to be detected as binary.  You can use C-x
> C-m c to override the detection.

Thanks for the explanation.  I worked around this by using "\0" in my
code instead of "^@".

--
Thierry
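Why the "\0" workaround sidesteps the detection can be shown with a short check. This is a sketch in Python rather than Emacs Lisp, and it assumes (as I understand the Lisp string syntax) that the reader turns the two-character escape `\0` back into a NUL character when the code is read, so only the bytes stored on disk differ. The variable names are hypothetical.

```python
# What ends up on disk in each case (illustrative only).

# Literal NUL inside the string, as typed with C-q C-@.
raw = ';; ààààà\n(foo "\x00")\n'.encode("utf-8")

# The workaround: a backslash followed by the digit 0, two plain ASCII
# characters, so the file itself contains no NUL byte.
escaped = ';; ààààà\n(foo "\\0")\n'.encode("utf-8")

print(b"\x00" in raw)      # True
print(b"\x00" in escaped)  # False
```

With no NUL byte in the file, a "NUL means binary" check has nothing to trigger on, and the non-ASCII comment is decoded normally.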