bug#44486: 27.1; C-@ chars corrupt elisp buffer


* bug#44486: 27.1; C-@ chars corrupt elisp buffer
@ 2020-11-06 15:11 Thierry Volpiatto
  2020-11-06 15:33 ` Andreas Schwab
  0 siblings, 1 reply; 25+ messages in thread
From: Thierry Volpiatto @ 2020-11-06 15:11 UTC (permalink / raw)
  To: 44486


1) emacs -Q
2) M-x find-file test.el
3) insert this in test.el buffer:
;; ààààà
(foo "^@")
4) save buffer
5) M-x revert-buffer

You should now see the line ;; ààààà corrupted.

NOTE: in step 3, type "^@" as C-q C-@.
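
The recipe above can also be approximated non-interactively, which avoids typing the control character by hand.  This is only a sketch: the bug44486-* names are made up for illustration, and which coding system gets reported depends on the Emacs version and on the settings discussed later in this thread.

```elisp
;; Sketch of the recipe above; all bug44486-* names are hypothetical.
(defconst bug44486-sample-text
  (concat ";; " (make-string 5 #xe0) "\n"   ; five U+00E0 (a-grave) characters
          "(foo \"" (string 0) "\")\n")     ; a literal NUL (C-@) inside the string
  "Contents of test.el from the report.")

(defun bug44486-write-sample (file)
  "Write `bug44486-sample-text' to FILE, encoded as UTF-8."
  (let ((coding-system-for-write 'utf-8-unix))
    (with-temp-file file
      (insert bug44486-sample-text))))

(defun bug44486-read-sample (file &optional coding)
  "Read FILE in a temporary buffer and return (CODING-USED . TEXT).
With CODING nil, Emacs chooses the coding system itself."
  (with-temp-buffer
    (let ((coding-system-for-read coding))
      (insert-file-contents file))
    (cons last-coding-system-used (buffer-string))))

;; Usage: report which coding system Emacs picked for a *.el file.
(let ((file (make-temp-file "bug44486-" nil ".el")))
  (bug44486-write-sample file)
  (message "decoded with: %S" (car (bug44486-read-sample file)))
  (delete-file file))
```

Comparing the returned text with `bug44486-sample-text' shows whether the accented characters survived the round trip.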




In GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.30, cairo version 1.15.10)
 of 2020-08-31 built on IPadS340
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Linux Mint 19.3

Recent messages:
Sending...
Sending via mail...
Decrypting /home/thierry/.authinfo.gpg...done
Sending email  
Sending email done
Saving file /home/thierry/Maildir/Posteo/Sent/cur/1604674326.396ddaa78615bfbe.ipads340:2,S...
Wrote /home/thierry/Maildir/Posteo/Sent/cur/1604674326.396ddaa78615bfbe.ipads340:2,S
Sending...done
[mu4e] Message sent
Do you want to exit emacs-w3m? (y or n) y

Configured using:
 'configure CFLAGS=-O3 --without-dbus --without-gconf
 --without-gsettings --with-mailutils --with-cairo'

Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM GLIB NOTIFY INOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS LIBSYSTEMD PDUMPER
LCMS2 GMP

Important settings:
  value of $LANG: fr_FR.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Ilisp

Minor modes in effect:
  global-magit-file-mode: t
  magit-auto-revert-mode: t
  global-git-commit-mode: t
  global-undo-tree-mode: t
  undo-tree-mode: t
  global-ligature-mode: t
  ligature-mode: t
  psession-mode: t
  psession-autosave-mode: t
  psession-savehist-mode: t
  global-git-gutter-mode: t
  eldoc-in-minibuffer-mode: t
  display-time-mode: t
  winner-mode: t
  show-paren-mode: t
  helm-epa-mode: t
  helm-descbinds-mode: t
  override-global-mode: t
  helm-adaptive-mode: t
  helm-mode: t
  helm-ff-cache-mode: t
  shell-dirtrack-mode: t
  async-bytecomp-package-mode: t
  dired-async-mode: t
  minibuffer-depth-indicate-mode: t
  straight-use-package-mode: t
  straight-package-neutering-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  auto-fill-function: do-auto-fill
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow emacsbug w3m-filter w3m-cookie w3m-tabmenu w3m-session
w3m-search helm-w3m w3m-bookmark gnutls epa-file network-stream nsm
mailalias helm-ring helm-dabbrev autocrypt-message epa-mail helm-firefox
magit-extras face-remap magit-bookmark magit-submodule magit-obsolete
magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull
magit-fetch magit-clone magit-remote magit-commit magit-sequence
magit-notes magit-worktree magit-tag magit-merge magit-branch
magit-reset magit-files magit-refs magit-status magit magit-repos
magit-apply magit-wip magit-log which-func magit-diff smerge-mode
magit-core magit-autorevert autorevert filenotify magit-margin
magit-transient magit-process magit-mode git-commit transient magit-git
magit-section magit-utils crm log-edit add-log with-editor qp view sort
gnus-cite smiley w3m-form w3m-symbol w3m timezone w3m-hist w3m-fb
bookmark-w3m w3m-ems w3m-favicon w3m-image tab-line w3m-proc w3m-util
mm-archive mail-extr autocrypt-gnus autocrypt-mu4e autocrypt rx
addressbook-bookmark mu4e-config org-mu4e gnus-art mm-uu mml2015 mm-view
mml-smime smime dig gnus-sum gnus-group gnus-undo gnus-start gnus-cloud
nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int gnus-range
gnus-win gnus nnheader mu4e-patch mu4e-contrib eshell esh-cmd esh-ext
esh-opt esh-proc esh-io esh-arg esh-module esh-groups esh-util mu4e
mu4e-org mu4e-main mu4e-view mu4e-headers mu4e-compose mu4e-context
mu4e-draft mu4e-actions ido rfc2368 smtpmail sendmail mu4e-mark
mu4e-proc mu4e-utils doc-view image-mode exif mu4e-lists mu4e-message
shr svg dom flow-fill hl-line mu4e-vars message rmc puny rfc822 mml
mml-sec gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr
mailabbrev mail-utils gmm-utils mailheader mu4e-meta helm-x-files
helm-for-files helm-bookmark bookmark text-property-search pp
helm-command flymake-proc flymake warnings conf-mode sh-script smie
executable jka-compr bug-reference naquadah-theme solar cal-dst holidays
hol-loaddefs tv-utils undo-tree diff undo-tree-autoloads ligature
ligature-autoloads boxquote rect rainbow-mode-autoloads psession
wgrep-helm wgrep grep compile wgrep-helm-autoloads wgrep-autoloads
log-view pcvs-util pcmpl-git pcmpl-git-autoloads
bash-completion-autoloads powerline powerline-separators color
powerline-themes powerline-autoloads toc-org-autoloads cl-indent pcase
ffap markdown-toc-autoloads markdown-mode-autoloads autocrypt-autoloads
config-w3m w3m-autoloads git-gutter git-gutter-autoloads mule-util appt
diary-lib diary-loaddefs anaconda-mode xref project pythonic f dash s
anaconda-mode-autoloads pythonic-autoloads f-autoloads s-autoloads
eldoc-eval emamux-autoloads magit-autoloads git-commit-autoloads
with-editor-autoloads transient-autoloads dash-autoloads
pcomplete-extension pcmpl-unix pcmpl-gnu iterator iedit-autoloads
ledger-mode-autoloads wdired dired-extension org-config ob-gnuplot
org-crypt net-utils time winner w3m-wget wget thingatpt wget-sysdep
autotest-mode autoconf-mode paren woman man ediff ediff-merg ediff-mult
ediff-wind ediff-diff ediff-help ediff-init ediff-util init-helm helm-fd
epa derived epg epg-config helm-misc helm-apt helm-imenu imenu
helm-elisp-package package url-handlers helm-find helm-org org ob
ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-footnote org-src
ob-comint org-pcomplete org-list org-faces org-entities noutline outline
org-version ob-emacs-lisp ob-core ob-eval org-table ol org-keys
org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs
helm-external helm-net browse-url xml url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse url-vars mailcap helm-descbinds cus-edit wid-edit helm-ls-git
vc-git diff-mode vc vc-dispatcher helm-ipython helm-elisp helm-eval
edebug backtrace find-func helm-info python tramp-sh
use-package-bind-key bind-key helm-adaptive diminish helm-mode
helm-files tramp tramp-loaddefs trampver tramp-integration files-x
tramp-compat shell pcomplete comint ansi-color ring parse-time iso8601
time-date ls-lisp auth-source password-cache json map helm-buffers
helm-occur helm-tags helm-locate helm-grep helm-regexp format-spec
helm-utils helm-help helm-types use-package-diminish
helm-extensions-autoloads helm-config helm-autoloads helm easy-mmode
async-bytecomp helm-global-bindings helm-easymenu helm-source
eieio-compat eieio eieio-core eieio-loaddefs helm-multi-match helm-lib
dired-async advice dired-aux dired dired-loaddefs async emms-autoloads
cl-seq use-package-core popup-autoloads finder-inf diminish-autoloads
mb-depth server edmacro kmacro avoid cus-start cus-load
use-package-autoloads bind-key-autoloads straight-autoloads info
cl-extra help-mode easymenu seq byte-opt straight subr-x cl-macs gv
bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric
uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads inotify lcms2 dynamic-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 599805 404657)
 (symbols 48 41981 3)
 (strings 32 167881 57520)
 (string-bytes 1 6426344)
 (vectors 16 82748)
 (vector-slots 8 1666976 230836)
 (floats 8 1795 3081)
 (intervals 56 6849 3252)
 (buffers 1000 130))

-- 
Thierry






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 15:11 bug#44486: 27.1; C-@ chars corrupt elisp buffer Thierry Volpiatto
@ 2020-11-06 15:33 ` Andreas Schwab
  2020-11-06 15:40   ` Eli Zaretskii
  2020-11-06 19:18   ` Thierry Volpiatto
  0 siblings, 2 replies; 25+ messages in thread
From: Andreas Schwab @ 2020-11-06 15:33 UTC (permalink / raw)
  To: Thierry Volpiatto; +Cc: 44486

The null byte causes the file to be detected as binary.  You can use C-x
C-m c to override the detection.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 15:33 ` Andreas Schwab
@ 2020-11-06 15:40   ` Eli Zaretskii
  2020-11-06 16:17     ` Eli Zaretskii
  2020-11-06 19:18   ` Thierry Volpiatto
  1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-06 15:40 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: thievol, 44486

> From: Andreas Schwab <schwab@linux-m68k.org>
> Date: Fri, 06 Nov 2020 16:33:04 +0100
> Cc: 44486@debbugs.gnu.org
> 
> The null byte causes the file to be detected as binary.  You can use C-x
> C-m c to override the detection.

Right.  Or set inhibit-nul-byte-detection to a non-nil value before
reverting.






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 15:40   ` Eli Zaretskii
@ 2020-11-06 16:17     ` Eli Zaretskii
  2020-11-06 20:07       ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-06 16:17 UTC (permalink / raw)
  To: schwab; +Cc: thievol, 44486

> Date: Fri, 06 Nov 2020 17:40:50 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, 44486@debbugs.gnu.org
> 
> Or set inhibit-nul-byte-detection to a non-nil value before
> reverting

Actually, this doesn't seem to work, but it looks like a bug...

Btw, reverting while forcing a particular encoding can be invoked with
"C-x C-m r".
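
In Lisp terms, both key sequences amount to naming the coding system explicitly so that no detection takes place.  A minimal sketch, assuming the file really is UTF-8; the function name my-visit-as-utf-8 is invented for this example:

```elisp
(defun my-visit-as-utf-8 (file)
  "Visit FILE decoded as UTF-8, skipping encoding detection.
Return the visiting buffer.  This is roughly what prefixing
`find-file' with `C-x C-m c utf-8 RET' does."
  (let ((coding-system-for-read 'utf-8))
    (find-file-noselect file)))

;; For a buffer that is already visiting the file, the command behind
;; `C-x C-m r' can be called directly from Lisp:
;;   (revert-buffer-with-coding-system 'utf-8)
```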






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 16:17     ` Eli Zaretskii
@ 2020-11-06 20:07       ` Eli Zaretskii
  2020-11-09 15:44         ` Lars Ingebrigtsen
  2020-11-14 12:43         ` Eli Zaretskii
  0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-06 20:07 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: thievol, schwab, 44486

> Date: Fri, 06 Nov 2020 18:17:53 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, 44486@debbugs.gnu.org
> 
> > Date: Fri, 06 Nov 2020 17:40:50 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: thievol@posteo.net, 44486@debbugs.gnu.org
> > 
> > Or set inhibit-nul-byte-detection to a non-nil value before
> > reverting
> 
> Actually, this doesn't seem to work, but it looks like a bug...

We don't specify that prefer-utf-8, which is used by default for *.el
files, should heed this variable.  Since prefer-utf-8 is a variant of
'undecided', i.e. it performs detection of encoding, I think this is a
bug, because 'undecided' does pay attention to
inhibit-null-byte-detection.

So I propose the change below (for master).  Any objections?

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index e6e6135..16cd8cf 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1251,7 +1251,9 @@ 'prefer-utf-8
   :coding-type 'undecided
   :mnemonic ?-
   :charset-list '(emacs)
-  :prefer-utf-8 t)
+  :prefer-utf-8 t
+  :inhibit-null-byte-detection 0
+  :inhibit-iso-escape-detection 0)
 
 (define-coding-system 'raw-text
   "Raw text, which means text contains random 8-bit codes.






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 20:07       ` Eli Zaretskii
@ 2020-11-09 15:44         ` Lars Ingebrigtsen
  2020-11-09 16:14           ` Eli Zaretskii
  2020-11-14 12:43         ` Eli Zaretskii
  1 sibling, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-09 15:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, schwab, 44486

Eli Zaretskii <eliz@gnu.org> writes:

> So I propose the change below (for master).  Any objections?

[...]

> -  :prefer-utf-8 t)
> +  :prefer-utf-8 t
> +  :inhibit-null-byte-detection 0
> +  :inhibit-iso-escape-detection 0)

Makes sense to me, but is there any particular reason to use 0 instead
of t here?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-09 15:44         ` Lars Ingebrigtsen
@ 2020-11-09 16:14           ` Eli Zaretskii
  2020-11-09 16:27             ` Lars Ingebrigtsen
  2020-11-14 14:02             ` Stefan Monnier
  0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-09 16:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Kenichi Handa <handa@gnu.org>,  thievol@posteo.net,
>   schwab@linux-m68k.org,  44486@debbugs.gnu.org
> Date: Mon, 09 Nov 2020 16:44:00 +0100
> 
> > -  :prefer-utf-8 t)
> > +  :prefer-utf-8 t
> > +  :inhibit-null-byte-detection 0
> > +  :inhibit-iso-escape-detection 0)
> 
> Makes sense to me, but is there any particular reason to use 0 instead
> of t here?

0 is different: it says to obey the value of
inhibit-null-byte-detection resp. inhibit-iso-escape-detection.  t
means inhibit the detection unconditionally, which is not what we
want.

(We could use any non-nil, non-t value, of course; I've chosen to use
zero for consistency with what we do for 'undecided', see coding.c.)
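
The three cases can be written down as a small truth table.  The function below is a hypothetical model of the behavior described in this message, not the code in coding.c; the nil case is inferred from how prefer-utf-8 behaved before the proposed change.

```elisp
(defun my-null-byte-detection-inhibited-p (property variable)
  "Return non-nil if null bytes would be ignored during detection.
PROPERTY stands for a coding system's `:inhibit-null-byte-detection'
value, VARIABLE for the value of `inhibit-null-byte-detection'.
Hypothetical model for illustration only."
  (cond ((null property) nil)     ; nil: null bytes always count
        ((eq property t) t)       ; t: null bytes never count
        (t (and variable t))))    ; anything else (e.g. 0): obey the variable

(my-null-byte-detection-inhibited-p 0 nil)   ; => nil
(my-null-byte-detection-inhibited-p 0 t)     ; => t
(my-null-byte-detection-inhibited-p t nil)   ; => t
(my-null-byte-detection-inhibited-p nil t)   ; => nil
```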






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-09 16:14           ` Eli Zaretskii
@ 2020-11-09 16:27             ` Lars Ingebrigtsen
  2020-11-09 16:57               ` Eli Zaretskii
  2020-11-14 14:02             ` Stefan Monnier
  1 sibling, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-09 16:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, schwab, 44486

Eli Zaretskii <eliz@gnu.org> writes:

> 0 is different: it says to obey the value of
> inhibit-null-byte-detection resp. inhibit-iso-escape-detection.  t
> means inhibit the detection unconditionally, which is not what we
> want.
>
> (We could use any non-nil, non-t value, of course; I've chosen to use
> zero for consistency with what we do for 'undecided', see coding.c.)

I see.  Perhaps the difference between the various non-nil values should
be mentioned in the doc strings of the two variables?  They only mention
nil/non-nil now, as far as I can see.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-09 16:27             ` Lars Ingebrigtsen
@ 2020-11-09 16:57               ` Eli Zaretskii
  2020-11-10 14:29                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-09 16:57 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: handa@gnu.org,  thievol@posteo.net,  schwab@linux-m68k.org,
>   44486@debbugs.gnu.org
> Date: Mon, 09 Nov 2020 17:27:06 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > 0 is different: it says to obey the value of
> > inhibit-null-byte-detection resp. inhibit-iso-escape-detection.  t
> > means inhibit the detection unconditionally, which is not what we
> > want.
> >
> > (We could use any non-nil, non-t value, of course; I've chosen to use
> > zero for consistency with what we do for 'undecided', see coding.c.)
> 
> I see.  Perhaps the difference between the various non-nil values should
> be mentioned in the doc strings of the two variables?  They only mention
> nil/non-nil now, as far as I can see.

The _variables_ are simple booleans; it's the value of the
:inhibit-null-byte-detection _property_ of a coding-system that is a
tri-state.  And that fact is documented in the doc string of
define-coding-system.






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-09 16:57               ` Eli Zaretskii
@ 2020-11-10 14:29                 ` Lars Ingebrigtsen
  2020-11-10 16:04                   ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-10 14:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, schwab, 44486

Eli Zaretskii <eliz@gnu.org> writes:

> The _variables_ are simple booleans; it's the value of the
> :inhibit-null-byte-detection _property_ of a coding-system that is a
> tri-state.  And that fact is documented in the doc string of
> define-coding-system.

Ah; sorry for the noise.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-10 14:29                 ` Lars Ingebrigtsen
@ 2020-11-10 16:04                   ` Eli Zaretskii
  0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-10 16:04 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: handa@gnu.org,  thievol@posteo.net,  schwab@linux-m68k.org,
>   44486@debbugs.gnu.org
> Date: Tue, 10 Nov 2020 15:29:27 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The _variables_ are simple booleans; it's the value of the
> > :inhibit-null-byte-detection _property_ of a coding-system that is a
> > tri-state.  And that fact is documented in the doc string of
> > define-coding-system.
> 
> Ah; sorry for the noise.

No noise heard here ;-)






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-09 16:14           ` Eli Zaretskii
  2020-11-09 16:27             ` Lars Ingebrigtsen
@ 2020-11-14 14:02             ` Stefan Monnier
  2020-11-14 15:09               ` Eli Zaretskii
  1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 14:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, Lars Ingebrigtsen, schwab, 44486

>> > -  :prefer-utf-8 t)
>> > +  :prefer-utf-8 t
>> > +  :inhibit-null-byte-detection 0
>> > +  :inhibit-iso-escape-detection 0)
>> 
>> Makes sense to me, but is there any particular reason to use 0 instead
>> of t here?
>
> 0 is different: it says to obey the value of
> inhibit-null-byte-detection resp. inhibit-iso-escape-detection.  t
> means inhibit the detection unconditionally, which is not what we
> want.

Actually, for prefer-utf-8 files, I think we never want to automatically
fall back to binary.

IOW I think Thierry's situation shows a bug in Emacs rather than
a pilot error.


        Stefan







* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 14:02             ` Stefan Monnier
@ 2020-11-14 15:09               ` Eli Zaretskii
  2020-11-14 15:19                 ` Stefan Monnier
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 15:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  thievol@posteo.net,  handa@gnu.org,
>   schwab@linux-m68k.org,  44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 09:02:16 -0500
> 
> > 0 is different: it says to obey the value of
> > inhibit-null-byte-detection resp. inhibit-iso-escape-detection.  t
> > means inhibit the detection unconditionally, which is not what we
> > want.
> 
> Actually, for prefer-utf-8 files, I think we never want to automatically
> fall back to binary.

I think you are assuming prefer-utf-8 is something other than what it
is.  It is not a variant of UTF-8, it is a variant of 'undecided'
(i.e. it starts by detecting the encoding), which prefers UTF-8 if
that can decode the text.  inhibit-null-byte-detection etc. are
relevant to the detection phase, not to the decoding phase.  It is
wrong IMO to decide to use UTF-8 for a binary byte stream just because
it includes valid UTF-8 byte sequences.  If the input text is known to
be UTF-8, even though it includes null bytes, the user or the
application should either bind coding-system-for-read or
inhibit-null-byte-detection.
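
For the second option, a caller could bind the variable around the read.  A minimal sketch with an invented function name; note that, per the earlier messages in this thread, whether prefer-utf-8 honors the binding depends on the proposed change being present:

```elisp
(defun my-insert-file-ignoring-nulls (file)
  "Insert FILE at point with null-byte detection inhibited.
The encoding is still auto-detected; only the \"NUL means binary\"
heuristic is switched off for the duration of the call."
  (let ((inhibit-null-byte-detection t))
    (insert-file-contents file)))
```

Binding coding-system-for-read instead skips detection altogether, which is the stronger of the two choices.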

> IOW I think Thierry's situation shows a bug in Emacs rather than
> a pilot error.

I disagree.


* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 15:09               ` Eli Zaretskii
@ 2020-11-14 15:19                 ` Stefan Monnier
  2020-11-14 16:13                   ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 15:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> Actually, for prefer-utf-8 files, I think we never want to automatically
>> fall back to binary.
> I think you are assuming prefer-utf-8 is something other than what it
> is.  It is not a variant of UTF-8, it is a variant of 'undecided'
> (i.e. it starts by detecting the encoding), which prefers UTF-8 if
> that can decode the text.

My position is not based on principles but on pragmatic concerns.
AFAIK `prefer-utf-8` is only ever used for files which are known to
contain text and should almost always contain UTF-8 text.

I believe if there's a NUL byte in such a file but it otherwise doesn't
contain any invalid UTF-8 byte sequence, it will result in better
behavior if we treat it as UTF-8 than as binary.

        Stefan


* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 15:19                 ` Stefan Monnier
@ 2020-11-14 16:13                   ` Eli Zaretskii
  2020-11-14 17:55                     ` Stefan Monnier
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 16:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org,  thievol@posteo.net,  handa@gnu.org,
>   schwab@linux-m68k.org,  44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 10:19:57 -0500
> 
> >> Actually, for prefer-utf-8 files, I think we never want to automatically
> >> fall back to binary.
> > I think you are assuming prefer-utf-8 is something other than what it
> > is.  It is not a variant of UTF-8, it is a variant of 'undecided'
> > (i.e. it starts by detecting the encoding), which prefers UTF-8 if
> > that can decode the text.
> 
> My position is not based on principles but on pragmatic concerns.
> AFAIK `prefer-utf-8` is only ever used for files which are known to
> contain text and should almost always contain UTF-8 text.

For those, we should use utf-8, not prefer-utf-8.

> I believe if there's a NUL byte in such a file but it otherwise doesn't
> contain any invalid UTF-8 byte sequence, it will result in better
> behavior if we treat it as UTF-8 than as binary.

We treat null bytes as the _single_ telltale sign of a binary file.
If we disable that in coding-systems that are supposed to _detect_
encoding, we will never be able to detect binary files.






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 16:13                   ` Eli Zaretskii
@ 2020-11-14 17:55                     ` Stefan Monnier
  2020-11-14 18:08                       ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 17:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> >> Actually, for prefer-utf-8 files, I think we never want to automatically
>> >> fall back to binary.
>> > I think you are assuming prefer-utf-8 is something other than what it
>> > is.  It is not a variant of UTF-8, it is a variant of 'undecided'
>> > (i.e. it starts by detecting the encoding), which prefers UTF-8 if
>> > that can decode the text.
>> My position is not based on principles but on pragmatic concerns.
>> AFAIK `prefer-utf-8` is only ever used for files which are known to
>> contain text and should almost always contain UTF-8 text.
> For those, we should use utf-8, not prefer-utf-8.

No, `utf-8` should be used when other coding systems should be
considered as errors (i.e. not "almost always" but "always"), whereas
`prefer-utf-8` is for use when utf-8 is the most likely one and other
coding systems should be tried only when there's some evidence that the
file actually doesn't use utf-8.

`prefer-utf-8` was introduced specifically for `.el` files (and I don't
know of any other use of that encoding so far).  If `utf-8` is
preferable over `prefer-utf-8` for this usage I think the problem is in
`prefer-utf-8` since it was introduced specifically for that.

>> I believe if there's a NUL byte in such a file but it otherwise doesn't
>> contain any invalid UTF-8 byte sequence, it will result in better
>> behavior if we treat it as UTF-8 than as binary.
> We treat null bytes as the _single_ telltale sign of a binary file.

A .el file should *never* be a binary file.

> If we disable that in coding-systems that are supposed to _detect_
> encoding, we will never be able to detect binary files.

In which scenario would it be beneficial to detect a `.el` file as being
binary instead of utf-8?

        Stefan

PS: Especially since NUL bytes can and do occur in ELisp code.


* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 17:55                     ` Stefan Monnier
@ 2020-11-14 18:08                       ` Eli Zaretskii
  2020-11-14 18:14                         ` Eli Zaretskii
  2020-11-14 22:53                         ` Stefan Monnier
  0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 18:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org,  thievol@posteo.net,  handa@gnu.org,
>   schwab@linux-m68k.org,  44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 12:55:51 -0500
> 
> >> AFAIK `prefer-utf-8` is only ever used for files which are known to
> >> contain text and should almost always contain UTF-8 text.
> > For those, we should use utf-8, not prefer-utf-8.
> 
> No, `utf-8` should be used when other coding systems should be
> considered as errors (i.e. not "almost always" but "always")

Why?

> whereas `prefer-utf-8` is for use when utf-8 is the most likely one
> and other coding systems should be tried only when there's some
> evidence that the file actually doesn't use utf-8.
> 
> `prefer-utf-8` was introduced specifically for `.el` files (and I don't
> know of any other use of that encoding so far).

Maybe that was the history, but the reality is different.
prefer-utf-8 is the same as 'undecided' with the coding-system
priorities adjusted to prefer UTF-8.

> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
> the problem is in `prefer-utf-8` since it was introduced
> specifically for that.

The implementation doesn't support your POV.

> >> I believe if there's a NUL byte in such a file but it otherwise doesn't
> >> contain any invalid UTF-8 byte sequence, it will result in better
> >> behavior if we treat it as UTF-8 than as binary.
> > We treat null bytes as the _single_ telltale sign of a binary file.
> 
> A .el file should *never* be a binary file.

We are not talking about .el files, we are talking about _any_ file
read using prefer-utf-8.

For .el files, we can always bind inhibit-null-byte-detection to t
when we load or visit such files.

> > If we disable that in coding-systems that are supposed to _detect_
> > encoding, we will never be able to detect binary files.
> 
> In which scenario would it be beneficial to detect a `.el` file as being
> binary instead of utf-8?

I'm not talking about .el files.  The coding-system's applicability is
wider than that.






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 18:08                       ` Eli Zaretskii
@ 2020-11-14 18:14                         ` Eli Zaretskii
  2020-11-14 22:56                           ` Stefan Monnier
  2020-11-14 22:53                         ` Stefan Monnier
  1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 18:14 UTC (permalink / raw)
  To: monnier; +Cc: thievol, larsi, schwab, 44486

> Date: Sat, 14 Nov 2020 20:08:04 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, larsi@gnus.org, schwab@linux-m68k.org,
>  44486@debbugs.gnu.org
> 
> For .el files, we can always bind inhibit-null-byte-detection to t
> when we load or visit such files.

Alternatively, we could introduce a separate coding-system whose
:inhibit-null-byte-detection property is t, and use that for *.el
files.
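
Such a coding system might look like the sketch below.  The name my-el-prefer-utf-8 is hypothetical and nothing like it exists in Emacs as far as this thread shows; the properties are copied from the prefer-utf-8 definition quoted earlier, with only the null-byte property changed to t.

```elisp
;; Hypothetical coding system: like prefer-utf-8, but null bytes are
;; ignored unconditionally while detecting the encoding.
(define-coding-system 'my-el-prefer-utf-8
  "Like `prefer-utf-8', but never treat null bytes as a sign of binary data."
  :coding-type 'undecided
  :mnemonic ?-
  :charset-list '(emacs)
  :prefer-utf-8 t
  :inhibit-null-byte-detection t)

;; Use it for files whose names end in .el.
(modify-coding-system-alist 'file "\\.el\\'" 'my-el-prefer-utf-8)
```

The cost of a separate coding system is one more name for users to learn, which is presumably why it is offered only as an alternative here.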






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 18:14                         ` Eli Zaretskii
@ 2020-11-14 22:56                           ` Stefan Monnier
  2020-11-15 15:14                             ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 22:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> For .el files, we can always bind inhibit-null-byte-detection to t
>> when we load or visit such files.
> Alternatively, we could introduce a separate coding-system whose
> :inhibit-null-byte-detection property is t, and use that for *.el
> files.

If you want to go that route, that's fine by me.  AFAIK no one else uses
`prefer-utf-8`, so it doesn't seem worth the trouble, tho (especially
since we don't have any evidence that potential other users would favor
the current behavior over the inhibit-null-byte-detection one).


        Stefan







* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 22:56                           ` Stefan Monnier
@ 2020-11-15 15:14                             ` Eli Zaretskii
  0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-15 15:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: thievol@posteo.net,  larsi@gnus.org,  schwab@linux-m68k.org,
>   44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 17:56:36 -0500
> 
> >> For .el files, we can always bind inhibit-null-byte-detection to t
> >> when we load or visit such files.
> > Alternatively, we could introduce a separate coding-system whose
> > :inhibit-null-byte-detection property is t, and use that for *.el
> > files.
> 
> If you want to go that route, that's fine by me.

I actually think that we don't need to do anything.  We've lived for 7
years with a reality that is worse than what is now on master, and no
one complained.

But if you are very unhappy about this, we _could_ introduce a new
coding-system for *.el files.

> (especially since we don't have any evidence that potential other
> users would favor the current behavior over the
> inhibit-null-byte-detection one).

The current behavior on master is to heed inhibit-null-byte-detection;
the current behavior in Emacs 27 is to ignore it, and always consider
a .el file with null bytes as binary.  I hope you agree that the
behavior on master is slightly better, at least in that it won't
surprise users.


* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 18:08                       ` Eli Zaretskii
  2020-11-14 18:14                         ` Eli Zaretskii
@ 2020-11-14 22:53                         ` Stefan Monnier
  2020-11-15 15:08                           ` Eli Zaretskii
  1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 22:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
>> the problem is in `prefer-utf-8` since it was introduced
>> specifically for that.
> The implementation doesn't support your POV.

Then I think the implementation is in error.

>> >> I believe if there's a NUL byte in such a file but it otherwise doesn't
>> >> contain any invalid UTF-8 byte sequence, it will result in better
>> >> behavior if we treat it as UTF-8 than as binary.
>> > We treat null bytes as the _single_ telltale sign of a binary file.
>> 
>> A .el file should *never* be a binary file.
>
> We are not talking about .el files, we are talking about _any_ file
> read using prefer-utf-8.

`prefer-utf-8` was not introduced because it seemed like a good idea and
then we hoped someone would find it useful.  It was introduced to solve
a concrete need, which is that of `.el` files.  It's quite possible that
there are other situations that have the same needs as `.el` files, but
from where I stand it looks like "the needs of .el files (and similar
cases)" should determine the intended behavior of `prefer-utf-8` rather
than its current implementation.

> For .el files, we can always bind inhibit-null-byte-detection to t
> when we load or visit such files.

We could, but I'm having trouble imagining a situation where we'd want
to use `prefer-utf-8` and not inhibit "NUL means binary".

The "NUL means binary" heuristic fundamentally says that `binary` is the
first coding system we try and only if this one fails (for lack of NUL
bytes) we consider others.  But for `prefer-utf-8` we should first
consider utf-8 and only if this fails should we consider others
(potentially including `binary` if you want, my opinion is not as strong
there).
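
A toy model of the two orderings described above may make the difference concrete.  This is only a sketch in Python, not Emacs's detection code; the function names and the "other" fallback label are invented for illustration.

```python
def detect_nul_first(data: bytes) -> str:
    """'NUL means binary': a NUL byte decides before any decoding is tried."""
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"  # hypothetical label for "fall back to other detection"


def detect_utf8_first(data: bytes) -> str:
    """UTF-8 is tried first; NUL only matters if the bytes are not valid UTF-8."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "binary" if b"\x00" in data else "other"


# The two lines from the original report, with a literal NUL inside the string.
sample = ';; ààààà\n(foo "\x00")\n'.encode("utf-8")
```

For `sample`, `detect_nul_first` answers "binary" while `detect_utf8_first` answers "utf-8", because a NUL byte is itself a valid UTF-8 sequence.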

> I'm not talking about .el files.  The coding-system's applicability is
> wider than that.

Could be.  But it's its "raison d'être" (and AFAIK currently still the
sole application), so it should handle this case as best it can.

        Stefan


* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-14 22:53                         ` Stefan Monnier
@ 2020-11-15 15:08                           ` Eli Zaretskii
  2020-11-15 18:31                             ` Stefan Monnier
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-15 15:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org,  thievol@posteo.net,  handa@gnu.org,
>   schwab@linux-m68k.org,  44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 17:53:57 -0500
> 
> >> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
> >> the problem is in `prefer-utf-8` since it was introduced
> >> specifically for that.
> > The implementation doesn't support your POV.
> 
> Then I think the implementation is in error.

But that ship sailed 7 years ago.

> > We are not talking about .el files, we are talking about _any_ file
> > read using prefer-utf-8.
> 
> `prefer-utf-8` was not introduced because it seemed like a good idea and
> then we hoped someone would find it useful.  It was introduced to solve
> a concrete need, which is that of `.el` files.  It's quite possible that
> there are other situations that have the same needs as `.el` files, but
> from where I stand it looks like "the needs of .el files (and similar
> cases)" should determine the intended behavior of `prefer-utf-8` rather
> than its current implementation.
> 
> > For .el files, we can always bind inhibit-null-byte-detection to t
> > when we load or visit such files.
> 
> We could, but I'm having trouble imagining a situation where we'd want
> to use `prefer-utf-8` and not inhibit "NUL means binary".
> 
> The "NUL means binary" heuristic fundamentally says that `binary` is the
> first coding system we try and only if this one fails (for lack of NUL
> bytes) we consider others.  But for `prefer-utf-8` we should first
> consider utf-8 and only if this fails should we consider others
> (potentially including `binary` if you want, my opinion is not as strong
> there).
> 
> > I'm not talking about .el files.  The coding-system's applicability is
> > wider than that.
> 
> Could be.  But it's its "raison d'être" (and AFAIK currently still the
> sole application), so it should handle this case as best it can.

We should have had this discussion 7 years ago.  And guess
what?  We did.  In that discussion, you said, in response to a question
from Kenichi:

   > * What to do with null byte detection.  Previously, if a
   >   *.el file contains a null byte and
   >   inhibit-null-byte-detection is nil (the default), it's
   >   detected as a binary file.  Now utf-8 is forced regardless
   >   of inhibit-null-byte-detection.

   I like the utf-8 better, but I don't know of any concrete case where it
   makes a significant difference, so either way is OK.
                                      ^^^^^^^^^^^^^^^^
Note that what actually got implemented ignored
inhibit-null-byte-detection altogether, and _always_ considered the
file binary if any null byte was found.  My change, which prompted
this present discussion, made prefer-utf-8 heed the variable's value,
which is mid-way between what we had for 7 years and what you thought
we should have.  So, a small step forward ;-)
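
The three behaviors being compared can be written down as a toy model.  This is a Python sketch of the NUL check only, assuming nothing about how Emacs implements it; the function names are hypothetical and the `inhibit` parameter stands in for inhibit-null-byte-detection.

```python
def binary_as_implemented(data: bytes, inhibit: bool = False) -> bool:
    """What was in place for 7 years: the variable is ignored."""
    return b"\x00" in data


def binary_on_master(data: bytes, inhibit: bool = False) -> bool:
    """After the change on master: the variable is heeded."""
    return (not inhibit) and b"\x00" in data


def binary_forced_utf8(data: bytes, inhibit: bool = False) -> bool:
    """The 'utf-8 is forced regardless' option: NUL never means binary."""
    return False


# A .el-like file containing a literal NUL byte.
el_file = b'(foo "\x00")\n'
```

With `inhibit` left at its default, the first two functions agree that `el_file` is binary; they differ only once `inhibit` is true, which is the "mid-way" position described above.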






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-15 15:08                           ` Eli Zaretskii
@ 2020-11-15 18:31                             ` Stefan Monnier
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Monnier @ 2020-11-15 18:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486

>    > * What to do with null byte detection.  Previously, if a
>    >   *.el file contains a null byte and
>    >   inhibit-null-byte-detection is nil (the default), it's
>    >   detected as a binary file.  Now utf-8 is forced regardless
>    >   of inhibit-null-byte-detection.
>
>    I like the utf-8 better, but I don't know of any concrete case where it
>    makes a significant difference, so either way is OK.
>                                       ^^^^^^^^^^^^^^^^

I'm glad to see that I now know better ;-)

> we should have.  So, a small step forward ;-)

I'll take what I can get ;-)


        Stefan







* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 20:07       ` Eli Zaretskii
  2020-11-09 15:44         ` Lars Ingebrigtsen
@ 2020-11-14 12:43         ` Eli Zaretskii
  1 sibling, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 12:43 UTC (permalink / raw)
  To: handa; +Cc: thievol, schwab, 44486-done

> Date: Fri, 06 Nov 2020 22:07:10 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, schwab@linux-m68k.org, 44486@debbugs.gnu.org
> 
> We don't specify that prefer-utf-8, which is used by default for *.el
> files, should heed this variable.  Since prefer-utf-8 is a variant of
> 'undecided', i.e. it performs detection of encoding, I think this is a
> bug, because 'undecided' does pay attention to
> inhibit-null-byte-detection.
> 
> So I propose the change below (for master).  Any objections?

No objections, so I have now installed this on the master branch, and
I'm closing this bug report.






* bug#44486: 27.1; C-@ chars corrupt elisp buffer
  2020-11-06 15:33 ` Andreas Schwab
  2020-11-06 15:40   ` Eli Zaretskii
@ 2020-11-06 19:18   ` Thierry Volpiatto
  1 sibling, 0 replies; 25+ messages in thread
From: Thierry Volpiatto @ 2020-11-06 19:18 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 44486


Andreas Schwab <schwab@linux-m68k.org> writes:

> The null byte causes the file to be detected as binary.  You can use C-x
> C-m c to override the detection.

Thanks for the explanation; I worked around this by using "\0" in my code
instead of "^@".
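
Why the workaround helps can be seen at the byte level.  This is a small Python illustration, not anything Emacs runs; the variable names are made up.  The escape is written in the file as two ordinary characters, so the file no longer contains a NUL byte for detection to trip over.

```python
# File contents with a literal NUL byte, as inserted with C-q C-@.
raw_nul = b'(foo "\x00")\n'

# File contents using the escape: a backslash followed by the digit 0.
escaped = rb'(foo "\0")' + b"\n"

has_nul_raw = b"\x00" in raw_nul
has_nul_escaped = b"\x00" in escaped
```

Here `has_nul_raw` is true and `has_nul_escaped` is false; the escaped form is one byte longer on disk.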


-- 
Thierry






end of thread, other threads:[~2020-11-15 18:31 UTC | newest]

Thread overview: 25+ messages
2020-11-06 15:11 bug#44486: 27.1; C-@ chars corrupt elisp buffer Thierry Volpiatto
2020-11-06 15:33 ` Andreas Schwab
2020-11-06 15:40   ` Eli Zaretskii
2020-11-06 16:17     ` Eli Zaretskii
2020-11-06 20:07       ` Eli Zaretskii
2020-11-09 15:44         ` Lars Ingebrigtsen
2020-11-09 16:14           ` Eli Zaretskii
2020-11-09 16:27             ` Lars Ingebrigtsen
2020-11-09 16:57               ` Eli Zaretskii
2020-11-10 14:29                 ` Lars Ingebrigtsen
2020-11-10 16:04                   ` Eli Zaretskii
2020-11-14 14:02             ` Stefan Monnier
2020-11-14 15:09               ` Eli Zaretskii
2020-11-14 15:19                 ` Stefan Monnier
2020-11-14 16:13                   ` Eli Zaretskii
2020-11-14 17:55                     ` Stefan Monnier
2020-11-14 18:08                       ` Eli Zaretskii
2020-11-14 18:14                         ` Eli Zaretskii
2020-11-14 22:56                           ` Stefan Monnier
2020-11-15 15:14                             ` Eli Zaretskii
2020-11-14 22:53                         ` Stefan Monnier
2020-11-15 15:08                           ` Eli Zaretskii
2020-11-15 18:31                             ` Stefan Monnier
2020-11-14 12:43         ` Eli Zaretskii
2020-11-06 19:18   ` Thierry Volpiatto
