* bug#44486: 27.1; C-@ chars corrupt elisp buffer
@ 2020-11-06 15:11 Thierry Volpiatto
2020-11-06 15:33 ` Andreas Schwab
0 siblings, 1 reply; 25+ messages in thread
From: Thierry Volpiatto @ 2020-11-06 15:11 UTC (permalink / raw)
To: 44486
1) emacs -Q
2) M-x find-file test.el
3) insert this in test.el buffer:
;; ààààà
(foo "^@")
4) save buffer
5) M-x revert-buffer
You should now see the line ;; ààààà corrupted:
NOTE: in 3) Write "^@" with C-q C-@.
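
The recipe above can be approximated without typing anything. This is only a sketch: the function name and temp-file prefix are made up for illustration, and binding coding-system-for-read to prefer-utf-8 is assumed to mimic what visiting a *.el file does by default. It reports which coding system the decoding ended up with.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-bug44486-detect ()
  "Return the coding system Emacs picks for the file from the recipe."
  (let ((file (make-temp-file "bug44486-" nil ".el")))
    (unwind-protect
        (progn
          ;; Write the two lines from step 3, encoded as UTF-8.
          ;; "\0" in a Lisp string is the same NUL character as C-q C-@.
          (let ((coding-system-for-write 'utf-8-unix))
            (write-region ";; ààààà\n(foo \"\0\")\n" nil file nil 'silent))
          (with-temp-buffer
            ;; Assumption: this matches the default decoding of *.el files.
            (let ((coding-system-for-read 'prefer-utf-8))
              (insert-file-contents file))
            ;; `insert-file-contents' records the coding system it used here.
            last-coding-system-used))
      (delete-file file))))

(message "decoded as: %S" (my-bug44486-detect))
```

If the file is being treated as binary, the reported coding system will not be a UTF-8 one.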
In GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.30, cairo version 1.15.10)
of 2020-08-31 built on IPadS340
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Linux Mint 19.3
Recent messages:
Sending...
Sending via mail...
Decrypting /home/thierry/.authinfo.gpg...done
Sending email
Sending email done
Saving file /home/thierry/Maildir/Posteo/Sent/cur/1604674326.396ddaa78615bfbe.ipads340:2,S...
Wrote /home/thierry/Maildir/Posteo/Sent/cur/1604674326.396ddaa78615bfbe.ipads340:2,S
Sending...done
[mu4e] Message sent
Do you want to exit emacs-w3m? (y or n) y
Configured using:
'configure CFLAGS=-O3 --without-dbus --without-gconf
--without-gsettings --with-mailutils --with-cairo'
Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM GLIB NOTIFY INOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS LIBSYSTEMD PDUMPER
LCMS2 GMP
Important settings:
value of $LANG: fr_FR.UTF-8
locale-coding-system: utf-8-unix
Major mode: Ilisp
Minor modes in effect:
global-magit-file-mode: t
magit-auto-revert-mode: t
global-git-commit-mode: t
global-undo-tree-mode: t
undo-tree-mode: t
global-ligature-mode: t
ligature-mode: t
psession-mode: t
psession-autosave-mode: t
psession-savehist-mode: t
global-git-gutter-mode: t
eldoc-in-minibuffer-mode: t
display-time-mode: t
winner-mode: t
show-paren-mode: t
helm-epa-mode: t
helm-descbinds-mode: t
override-global-mode: t
helm-adaptive-mode: t
helm-mode: t
helm-ff-cache-mode: t
shell-dirtrack-mode: t
async-bytecomp-package-mode: t
dired-async-mode: t
minibuffer-depth-indicate-mode: t
straight-use-package-mode: t
straight-package-neutering-mode: t
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
mouse-wheel-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
column-number-mode: t
line-number-mode: t
auto-fill-function: do-auto-fill
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow emacsbug w3m-filter w3m-cookie w3m-tabmenu w3m-session
w3m-search helm-w3m w3m-bookmark gnutls epa-file network-stream nsm
mailalias helm-ring helm-dabbrev autocrypt-message epa-mail helm-firefox
magit-extras face-remap magit-bookmark magit-submodule magit-obsolete
magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull
magit-fetch magit-clone magit-remote magit-commit magit-sequence
magit-notes magit-worktree magit-tag magit-merge magit-branch
magit-reset magit-files magit-refs magit-status magit magit-repos
magit-apply magit-wip magit-log which-func magit-diff smerge-mode
magit-core magit-autorevert autorevert filenotify magit-margin
magit-transient magit-process magit-mode git-commit transient magit-git
magit-section magit-utils crm log-edit add-log with-editor qp view sort
gnus-cite smiley w3m-form w3m-symbol w3m timezone w3m-hist w3m-fb
bookmark-w3m w3m-ems w3m-favicon w3m-image tab-line w3m-proc w3m-util
mm-archive mail-extr autocrypt-gnus autocrypt-mu4e autocrypt rx
addressbook-bookmark mu4e-config org-mu4e gnus-art mm-uu mml2015 mm-view
mml-smime smime dig gnus-sum gnus-group gnus-undo gnus-start gnus-cloud
nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int gnus-range
gnus-win gnus nnheader mu4e-patch mu4e-contrib eshell esh-cmd esh-ext
esh-opt esh-proc esh-io esh-arg esh-module esh-groups esh-util mu4e
mu4e-org mu4e-main mu4e-view mu4e-headers mu4e-compose mu4e-context
mu4e-draft mu4e-actions ido rfc2368 smtpmail sendmail mu4e-mark
mu4e-proc mu4e-utils doc-view image-mode exif mu4e-lists mu4e-message
shr svg dom flow-fill hl-line mu4e-vars message rmc puny rfc822 mml
mml-sec gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr
mailabbrev mail-utils gmm-utils mailheader mu4e-meta helm-x-files
helm-for-files helm-bookmark bookmark text-property-search pp
helm-command flymake-proc flymake warnings conf-mode sh-script smie
executable jka-compr bug-reference naquadah-theme solar cal-dst holidays
hol-loaddefs tv-utils undo-tree diff undo-tree-autoloads ligature
ligature-autoloads boxquote rect rainbow-mode-autoloads psession
wgrep-helm wgrep grep compile wgrep-helm-autoloads wgrep-autoloads
log-view pcvs-util pcmpl-git pcmpl-git-autoloads
bash-completion-autoloads powerline powerline-separators color
powerline-themes powerline-autoloads toc-org-autoloads cl-indent pcase
ffap markdown-toc-autoloads markdown-mode-autoloads autocrypt-autoloads
config-w3m w3m-autoloads git-gutter git-gutter-autoloads mule-util appt
diary-lib diary-loaddefs anaconda-mode xref project pythonic f dash s
anaconda-mode-autoloads pythonic-autoloads f-autoloads s-autoloads
eldoc-eval emamux-autoloads magit-autoloads git-commit-autoloads
with-editor-autoloads transient-autoloads dash-autoloads
pcomplete-extension pcmpl-unix pcmpl-gnu iterator iedit-autoloads
ledger-mode-autoloads wdired dired-extension org-config ob-gnuplot
org-crypt net-utils time winner w3m-wget wget thingatpt wget-sysdep
autotest-mode autoconf-mode paren woman man ediff ediff-merg ediff-mult
ediff-wind ediff-diff ediff-help ediff-init ediff-util init-helm helm-fd
epa derived epg epg-config helm-misc helm-apt helm-imenu imenu
helm-elisp-package package url-handlers helm-find helm-org org ob
ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-footnote org-src
ob-comint org-pcomplete org-list org-faces org-entities noutline outline
org-version ob-emacs-lisp ob-core ob-eval org-table ol org-keys
org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs
helm-external helm-net browse-url xml url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse url-vars mailcap helm-descbinds cus-edit wid-edit helm-ls-git
vc-git diff-mode vc vc-dispatcher helm-ipython helm-elisp helm-eval
edebug backtrace find-func helm-info python tramp-sh
use-package-bind-key bind-key helm-adaptive diminish helm-mode
helm-files tramp tramp-loaddefs trampver tramp-integration files-x
tramp-compat shell pcomplete comint ansi-color ring parse-time iso8601
time-date ls-lisp auth-source password-cache json map helm-buffers
helm-occur helm-tags helm-locate helm-grep helm-regexp format-spec
helm-utils helm-help helm-types use-package-diminish
helm-extensions-autoloads helm-config helm-autoloads helm easy-mmode
async-bytecomp helm-global-bindings helm-easymenu helm-source
eieio-compat eieio eieio-core eieio-loaddefs helm-multi-match helm-lib
dired-async advice dired-aux dired dired-loaddefs async emms-autoloads
cl-seq use-package-core popup-autoloads finder-inf diminish-autoloads
mb-depth server edmacro kmacro avoid cus-start cus-load
use-package-autoloads bind-key-autoloads straight-autoloads info
cl-extra help-mode easymenu seq byte-opt straight subr-x cl-macs gv
bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric
uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads inotify lcms2 dynamic-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)
Memory information:
((conses 16 599805 404657)
(symbols 48 41981 3)
(strings 32 167881 57520)
(string-bytes 1 6426344)
(vectors 16 82748)
(vector-slots 8 1666976 230836)
(floats 8 1795 3081)
(intervals 56 6849 3252)
(buffers 1000 130))
--
Thierry
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 15:11 bug#44486: 27.1; C-@ chars corrupt elisp buffer Thierry Volpiatto
@ 2020-11-06 15:33 ` Andreas Schwab
2020-11-06 15:40 ` Eli Zaretskii
2020-11-06 19:18 ` Thierry Volpiatto
0 siblings, 2 replies; 25+ messages in thread
From: Andreas Schwab @ 2020-11-06 15:33 UTC (permalink / raw)
To: Thierry Volpiatto; +Cc: 44486
The null byte causes the file to be detected as binary. You can use C-x
C-m c to override the detection.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 15:33 ` Andreas Schwab
@ 2020-11-06 15:40 ` Eli Zaretskii
2020-11-06 16:17 ` Eli Zaretskii
2020-11-06 19:18 ` Thierry Volpiatto
1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-06 15:40 UTC (permalink / raw)
To: Andreas Schwab; +Cc: thievol, 44486
> From: Andreas Schwab <schwab@linux-m68k.org>
> Date: Fri, 06 Nov 2020 16:33:04 +0100
> Cc: 44486@debbugs.gnu.org
>
> The null byte causes the file to be detected as binary. You can use C-x
> C-m c to override the detection.
Right. Or set inhibit-nul-byte-detection to a non-nil value before
reverting.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 15:40 ` Eli Zaretskii
@ 2020-11-06 16:17 ` Eli Zaretskii
2020-11-06 20:07 ` Eli Zaretskii
0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-06 16:17 UTC (permalink / raw)
To: schwab; +Cc: thievol, 44486
> Date: Fri, 06 Nov 2020 17:40:50 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, 44486@debbugs.gnu.org
>
> Or set inhibit-nul-byte-detection to a non-nil value before
> reverting
Actually, this doesn't seem to work, but it looks like a bug...
Btw, reverting while forcing a particular encoding can be invoked with
"C-x C-m r".
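
From Lisp, the same "revert with a forced encoding" step can be wrapped in a small command. A minimal sketch, assuming the file really is UTF-8; the command name is hypothetical, while revert-buffer-with-coding-system is the command behind C-x C-m r.

```elisp
;; Hypothetical convenience command, not part of Emacs.
(defun my-revert-as-utf-8 ()
  "Re-read the visited file, decoding it as UTF-8 without detection."
  (interactive)
  ;; Forcing the coding system bypasses the detection step, so a NUL
  ;; byte can no longer make the file look binary.
  (revert-buffer-with-coding-system 'utf-8))
```

It can be run with M-x my-revert-as-utf-8 in the affected buffer.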
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 15:33 ` Andreas Schwab
2020-11-06 15:40 ` Eli Zaretskii
@ 2020-11-06 19:18 ` Thierry Volpiatto
1 sibling, 0 replies; 25+ messages in thread
From: Thierry Volpiatto @ 2020-11-06 19:18 UTC (permalink / raw)
To: Andreas Schwab; +Cc: 44486
Andreas Schwab <schwab@linux-m68k.org> writes:
> The null byte causes the file to be detected as binary. You can use C-x
> C-m c to override the detection.
Thanks for the explanation. I work around this by using "\0" in my code
instead of "^@".
--
Thierry
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 16:17 ` Eli Zaretskii
@ 2020-11-06 20:07 ` Eli Zaretskii
2020-11-09 15:44 ` Lars Ingebrigtsen
2020-11-14 12:43 ` Eli Zaretskii
0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-06 20:07 UTC (permalink / raw)
To: Kenichi Handa; +Cc: thievol, schwab, 44486
> Date: Fri, 06 Nov 2020 18:17:53 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, 44486@debbugs.gnu.org
>
> > Date: Fri, 06 Nov 2020 17:40:50 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: thievol@posteo.net, 44486@debbugs.gnu.org
> >
> > Or set inhibit-nul-byte-detection to a non-nil value before
> > reverting
>
> Actually, this doesn't seem to work, but it looks like a bug...
We don't specify that prefer-utf-8, which is used by default for *.el
files, should heed this variable. Since prefer-utf-8 is a variant of
'undecided', i.e. it performs detection of encoding, I think this is a
bug, because 'undecided' does pay attention to
inhibit-null-byte-detection.
So I propose the change below (for master). Any objections?
diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index e6e6135..16cd8cf 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1251,7 +1251,9 @@ 'prefer-utf-8
:coding-type 'undecided
:mnemonic ?-
:charset-list '(emacs)
- :prefer-utf-8 t)
+ :prefer-utf-8 t
+ :inhibit-null-byte-detection 0
+ :inhibit-iso-escape-detection 0)
(define-coding-system 'raw-text
"Raw text, which means text contains random 8-bit codes.
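
To see whether a given build carries the property added by the patch above, the coding system's property list can be queried. This is a sketch only: it assumes the keyword passed to define-coding-system is retrievable through coding-system-get, and it prints whatever the running Emacs reports rather than a fixed value.

```elisp
;; Print the value this Emacs has recorded for the property on
;; prefer-utf-8 (the thread proposes 0, meaning "obey the variable").
(message "prefer-utf-8 :inhibit-null-byte-detection => %S"
         (coding-system-get 'prefer-utf-8 :inhibit-null-byte-detection))
```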
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 20:07 ` Eli Zaretskii
@ 2020-11-09 15:44 ` Lars Ingebrigtsen
2020-11-09 16:14 ` Eli Zaretskii
2020-11-14 12:43 ` Eli Zaretskii
1 sibling, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-09 15:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, schwab, 44486
Eli Zaretskii <eliz@gnu.org> writes:
> So I propose the change below (for master). Any objections?
[...]
> - :prefer-utf-8 t)
> + :prefer-utf-8 t
> + :inhibit-null-byte-detection 0
> + :inhibit-iso-escape-detection 0)
Makes sense to me, but is there any particular reason to use 0 instead
of t here?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-09 15:44 ` Lars Ingebrigtsen
@ 2020-11-09 16:14 ` Eli Zaretskii
2020-11-09 16:27 ` Lars Ingebrigtsen
2020-11-14 14:02 ` Stefan Monnier
0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-09 16:14 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Kenichi Handa <handa@gnu.org>, thievol@posteo.net,
> schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Mon, 09 Nov 2020 16:44:00 +0100
>
> > - :prefer-utf-8 t)
> > + :prefer-utf-8 t
> > + :inhibit-null-byte-detection 0
> > + :inhibit-iso-escape-detection 0)
>
> Makes sense to me, but is there any particular reason to use 0 instead
> of t here?
0 is different: it says to obey the value of
inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
means inhibit the detection unconditionally, which is not what we
want.
(We could use any non-nil, non-t value, of course; I've chosen to use
zero for consistency with what we do for 'undecided', see coding.c.)
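
The difference between nil, t and 0 described above can be written down as a tiny model. This is an illustration of the rule, not the code in coding.c; the function name is hypothetical, and the nil case (variable ignored, detection always on) is inferred from the behavior reported in this bug.

```elisp
;; Hypothetical model of the tri-state property, for illustration only.
(defun my-nul-detection-inhibited-p (property variable)
  "Return non-nil if NUL-byte detection would be inhibited.
PROPERTY stands for a coding system's :inhibit-null-byte-detection
value; VARIABLE stands for the value of `inhibit-null-byte-detection'."
  (cond ((null property) nil)        ; nil: never inhibit, variable ignored
        ((eq property t) t)          ; t: inhibit unconditionally
        (t (and variable t))))       ; 0 or other non-nil: obey the variable

(my-nul-detection-inhibited-p 0 t)   ; => t
(my-nul-detection-inhibited-p 0 nil) ; => nil
(my-nul-detection-inhibited-p t nil) ; => t
```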
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-09 16:14 ` Eli Zaretskii
@ 2020-11-09 16:27 ` Lars Ingebrigtsen
2020-11-09 16:57 ` Eli Zaretskii
2020-11-14 14:02 ` Stefan Monnier
1 sibling, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-09 16:27 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, schwab, 44486
Eli Zaretskii <eliz@gnu.org> writes:
> 0 is different: it says to obey the value of
> inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> means inhibit the detection unconditionally, which is not what we
> want.
>
> (We could use any non-nil, non-t value, of course; I've chosen to use
> zero for consistency with what we do for 'undecided', see coding.c.)
I see. Perhaps the difference between the various non-nil values should
be mentioned in the doc strings of the two variables? They only mention
nil/non-nil now, as far as I can see.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-09 16:27 ` Lars Ingebrigtsen
@ 2020-11-09 16:57 ` Eli Zaretskii
2020-11-10 14:29 ` Lars Ingebrigtsen
0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-09 16:57 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: handa@gnu.org, thievol@posteo.net, schwab@linux-m68k.org,
> 44486@debbugs.gnu.org
> Date: Mon, 09 Nov 2020 17:27:06 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > 0 is different: it says to obey the value of
> > inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> > means inhibit the detection unconditionally, which is not what we
> > want.
> >
> > (We could use any non-nil, non-t value, of course; I've chosen to use
> > zero for consistency with what we do for 'undecided', see coding.c.)
>
> I see. Perhaps the difference between the various non-nil values should
> be mentioned in the doc strings of the two variables? They only mention
> nil/non-nil now, as far as I can see.
The _variables_ are simple booleans; it's the value of the
:inhibit-null-byte-detection _property_ of a coding-system that is a
tri-state. And that fact is documented in the doc string of
define-coding-system.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-09 16:57 ` Eli Zaretskii
@ 2020-11-10 14:29 ` Lars Ingebrigtsen
2020-11-10 16:04 ` Eli Zaretskii
0 siblings, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-10 14:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, schwab, 44486
Eli Zaretskii <eliz@gnu.org> writes:
> The _variables_ are simple booleans; it's the value of the
> :inhibit-null-byte-detection _property_ of a coding-system that is a
> tri-state. And that fact is documented in the doc string of
> define-coding-system.
Ah; sorry for the noise.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-10 14:29 ` Lars Ingebrigtsen
@ 2020-11-10 16:04 ` Eli Zaretskii
0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-10 16:04 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: thievol, schwab, 44486
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: handa@gnu.org, thievol@posteo.net, schwab@linux-m68k.org,
> 44486@debbugs.gnu.org
> Date: Tue, 10 Nov 2020 15:29:27 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > The _variables_ are simple booleans; it's the value of the
> > :inhibit-null-byte-detection _property_ of a coding-system that is a
> > tri-state. And that fact is documented in the doc string of
> > define-coding-system.
>
> Ah; sorry for the noise.
No noise heard here ;-)
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-06 20:07 ` Eli Zaretskii
2020-11-09 15:44 ` Lars Ingebrigtsen
@ 2020-11-14 12:43 ` Eli Zaretskii
1 sibling, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 12:43 UTC (permalink / raw)
To: handa; +Cc: thievol, schwab, 44486-done
> Date: Fri, 06 Nov 2020 22:07:10 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, schwab@linux-m68k.org, 44486@debbugs.gnu.org
>
> We don't specify that prefer-utf-8, which is used by default for *.el
> files, should heed this variable. Since prefer-utf-8 is a variant of
> 'undecided', i.e. it performs detection of encoding, I think this is a
> bug, because 'undecided' does pay attention to
> inhibit-null-byte-detection.
>
> So I propose the change below (for master). Any objections?
No objections, so I have now installed this on the master branch, and
I'm closing this bug report.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-09 16:14 ` Eli Zaretskii
2020-11-09 16:27 ` Lars Ingebrigtsen
@ 2020-11-14 14:02 ` Stefan Monnier
2020-11-14 15:09 ` Eli Zaretskii
1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 14:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, Lars Ingebrigtsen, schwab, 44486
>> > - :prefer-utf-8 t)
>> > + :prefer-utf-8 t
>> > + :inhibit-null-byte-detection 0
>> > + :inhibit-iso-escape-detection 0)
>>
>> Makes sense to me, but is there any particular reason to use 0 instead
>> of t here?
>
> 0 is different: it says to obey the value of
> inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> means inhibit the detection unconditionally, which is not what we
> want.
Actually, for prefer-utf-8 files, I think we never want to automatically
fall back to binary.
IOW I think Thierry's situation shows a bug in Emacs rather than
a pilot error.
Stefan
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 14:02 ` Stefan Monnier
@ 2020-11-14 15:09 ` Eli Zaretskii
2020-11-14 15:19 ` Stefan Monnier
0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 15:09 UTC (permalink / raw)
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, thievol@posteo.net, handa@gnu.org,
> schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 09:02:16 -0500
>
> > 0 is different: it says to obey the value of
> > inhibit-null-byte-detection resp. inhibit-iso-escape-detection. t
> > means inhibit the detection unconditionally, which is not what we
> > want.
>
> Actually, for prefer-utf-8 files, I think we never want to automatically
> fall back to binary.
I think you are assuming prefer-utf-8 is something other than what it
is. It is not a variant of UTF-8, it is a variant of 'undecided'
(i.e. it starts by detecting the encoding), which prefers UTF-8 if
that can decode the text. inhibit-null-byte-detection etc. are
relevant to the detection phase, not to the decoding phase. It is
wrong IMO to decide to use UTF-8 for a binary byte stream just because
it includes valid UTF-8 byte sequences. If the input text is known to
be UTF-8, even though it includes null bytes, the user or the
application should either bind coding-system-for-read or
inhibit-null-byte-detection.
> IOW I think Thierry's situation shows a bug in Emacs rather than
> a pilot error.
I disagree.
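
The two options mentioned above (bind coding-system-for-read, or bind inhibit-null-byte-detection) could look roughly like this in application code. Both function names are hypothetical. The second variant only helps for coding systems that heed the variable, which for prefer-utf-8 depends on the change discussed in this thread.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-insert-utf-8-file (file)
  "Insert FILE into the current buffer, decoding it as UTF-8.
No encoding detection takes place, so NUL bytes are kept as text."
  (let ((coding-system-for-read 'utf-8))
    (insert-file-contents file)))

(defun my-insert-file-ignoring-nul (file)
  "Insert FILE with the usual detection, but skip the NUL-byte check."
  (let ((inhibit-null-byte-detection t))
    (insert-file-contents file)))
```

The first form is the stricter one: it states that the caller knows the encoding, so nothing is left to detection.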
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 15:09 ` Eli Zaretskii
@ 2020-11-14 15:19 ` Stefan Monnier
2020-11-14 16:13 ` Eli Zaretskii
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 15:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486
>> Actually, for prefer-utf-8 files, I think we never want to automatically
>> fall back to binary.
> I think you are assuming prefer-utf-8 is something other than what it
> is. It is not a variant of UTF-8, it is a variant of 'undecided'
> (i.e. it starts by detecting the encoding), which prefers UTF-8 if
> that can decode the text.
My position is not based on principles but on pragmatic concerns.
AFAIK `prefer-utf-8` is only ever used for files which are known to
contain text and should almost always contain UTF-8 text.
I believe if there's a NUL byte in such a file but it otherwise doesn't
contain any invalid UTF-8 byte sequence, it will result in better
behavior if we treat it as UTF-8 than as binary.
Stefan
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 15:19 ` Stefan Monnier
@ 2020-11-14 16:13 ` Eli Zaretskii
2020-11-14 17:55 ` Stefan Monnier
0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 16:13 UTC (permalink / raw)
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org, thievol@posteo.net, handa@gnu.org,
> schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 10:19:57 -0500
>
> >> Actually, for prefer-utf-8 files, I think we never want to automatically
> >> fall back to binary.
> > I think you are assuming prefer-utf-8 is something other than what it
> > is. It is not a variant of UTF-8, it is a variant of 'undecided'
> > (i.e. it starts by detecting the encoding), which prefers UTF-8 if
> > that can decode the text.
>
> My position is not based on principles but on pragmatic concerns.
> AFAIK `prefer-utf-8` is only ever used for files which are known to
> contain text and should almost always contain UTF-8 text.
For those, we should use utf-8, not prefer-utf-8.
> I believe if there's a NUL byte in such a file but it otherwise doesn't
> contain any invalid UTF-8 byte sequence, it will result in better
> behavior if we treat it as UTF-8 than as binary.
We treat null bytes as the _single_ telltale sign of a binary file.
If we disable that in coding-systems that are supposed to _detect_
encoding, we will never be able to detect binary files.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 16:13 ` Eli Zaretskii
@ 2020-11-14 17:55 ` Stefan Monnier
2020-11-14 18:08 ` Eli Zaretskii
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 17:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486
>> >> Actually, for prefer-utf-8 files, I think we never want to automatically
>> >> fall back to binary.
>> > I think you are assuming prefer-utf-8 is something other than what it
>> > is. It is not a variant of UTF-8, it is a variant of 'undecided'
>> > (i.e. it starts by detecting the encoding), which prefers UTF-8 if
>> > that can decode the text.
>> My position is not based on principles but on pragmatic concerns.
>> AFAIK `prefer-utf-8` is only ever used for files which are known to
>> contain text and should almost always contain UTF-8 text.
> For those, we should use utf-8, not prefer-utf-8.
No, `utf-8` should be used when other coding systems should be
considered as errors (i.e. not "almost always" but "always"), whereas
`prefer-utf-8` is for use when utf-8 is the most likely one and other
coding systems should be tried only when there's some evidence that the
file actually doesn't use utf-8.
`prefer-utf-8` was introduced specifically for `.el` files (and I don't
know of any other use of that encoding so far). If `utf-8` is
preferable over `prefer-utf-8` for this usage I think the problem is in
`prefer-utf-8` since it was introduced specifically for that.
>> I believe if there's a NUL byte in such a file but it otherwise doesn't
>> contain any invalid UTF-8 byte sequence, it will result in better
>> behavior if we treat it as UTF-8 than as binary.
> We treat null bytes as the _single_ telltale sign of a binary file.
A .el file should *never* be a binary file.
> If we disable that in coding-systems that are supposed to _detect_
> encoding, we will never be able to detect binary files.
In which scenario would it be beneficial to detect a `.el` file as being
binary instead of utf-8?
Stefan
PS: Especially since NUL bytes can and do occur in ELisp code.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 17:55 ` Stefan Monnier
@ 2020-11-14 18:08 ` Eli Zaretskii
2020-11-14 18:14 ` Eli Zaretskii
2020-11-14 22:53 ` Stefan Monnier
0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 18:08 UTC (permalink / raw)
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org, thievol@posteo.net, handa@gnu.org,
> schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 12:55:51 -0500
>
> >> AFAIK `prefer-utf-8` is only ever used for files which are known to
> >> contain text and should almost always contain UTF-8 text.
> > For those, we should use utf-8, not prefer-utf-8.
>
> No, `utf-8` should be used when other coding systems should be
> considered as errors (i.e. not "almost always" but "always")
Why?
> whereas `prefer-utf-8` is for use when utf-8 is the most likely one
> and other coding systems should be tried only when there's some
> evidence that the file actually doesn't use utf-8.
>
> `prefer-utf-8` was introduced specifically for `.el` files (and I don't
> know of any other use of that encoding so far).
Maybe that was the history, but the reality is different.
prefer-utf-8 is the same as 'undecided' with the coding-system
priorities adjusted to prefer UTF-8.
> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
> the problem is in `prefer-utf-8` since it was introduced
> specifically for that.
The implementation doesn't support your POV.
> >> I believe if there's a NUL byte in such a file but it otherwise doesn't
> >> contain any invalid UTF-8 byte sequence, it will result in better
> >> behavior if we treat it as UTF-8 than as binary.
> > We treat null bytes as the _single_ telltale sign of a binary file.
>
> A .el file should *never* be a binary file.
We are not talking about .el files, we are talking about _any_ file
read using prefer-utf-8.
For .el files, we can always bind inhibit-null-byte-detection to t
when we load or visit such files.
> > If we disable that in coding-systems that are supposed to _detect_
> > encoding, we will never be able to detect binary files.
>
> In which scenario would it be beneficial to detect a `.el` file as being
> binary instead of utf-8?
I'm not talking about .el files. The coding-system's applicability is
wider than that.
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 18:08 ` Eli Zaretskii
@ 2020-11-14 18:14 ` Eli Zaretskii
2020-11-14 22:56 ` Stefan Monnier
2020-11-14 22:53 ` Stefan Monnier
1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-14 18:14 UTC (permalink / raw)
To: monnier; +Cc: thievol, larsi, schwab, 44486
> Date: Sat, 14 Nov 2020 20:08:04 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: thievol@posteo.net, larsi@gnus.org, schwab@linux-m68k.org,
> 44486@debbugs.gnu.org
>
> For .el files, we can always bind inhibit-null-byte-detection to t
> when we load or visit such files.
Alternatively, we could introduce a separate coding-system whose
:inhibit-null-byte-detection property is t, and use that for *.el
files.
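
Such a separate coding system might be defined along these lines. This is a sketch under assumptions: the name my-prefer-utf-8-nul-ok is hypothetical, and the remaining properties simply copy the prefer-utf-8 definition shown in the patch earlier in this thread.

```elisp
;; Hypothetical coding system, modeled on the prefer-utf-8 definition.
(define-coding-system 'my-prefer-utf-8-nul-ok
  "Like `prefer-utf-8', but never treat NUL bytes as a sign of binary data."
  :coding-type 'undecided
  :mnemonic ?-
  :charset-list '(emacs)
  :prefer-utf-8 t
  ;; t (rather than 0) means: ignore NUL bytes unconditionally,
  ;; whatever the value of `inhibit-null-byte-detection'.
  :inhibit-null-byte-detection t)
```

Using it for *.el files would still require pointing the relevant file-coding-system-alist entry at the new name, which is not shown here.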
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 18:08 ` Eli Zaretskii
2020-11-14 18:14 ` Eli Zaretskii
@ 2020-11-14 22:53 ` Stefan Monnier
2020-11-15 15:08 ` Eli Zaretskii
1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 22:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486
>> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
>> the problem is in `prefer-utf-8` since it was introduced
>> specifically for that.
> The implementation doesn't support your POV.
Then I think the implementation is in error.
>> >> I believe if there's a NUL byte in such a file but it otherwise doesn't
>> >> contain any invalid UTF-8 byte sequence, it will result in better
>> >> behavior if we treat it as UTF-8 than as binary.
>> > We treat null bytes as the _single_ telltale sign of a binary file.
>>
>> A .el file should *never* be a binary file.
>
> We are not talking about .el files, we are talking about _any_ file
> read using prefer-utf-8.
`prefer-utf-8` was not introduced because it seemed like a good idea and
then we hoped someone would find it useful. It was introduced to solve
a concrete need, which is that of `.el` files. It's quite possible that
there are other situations that have the same needs as `.el` files, but
from where I stand it looks like "the needs of .el files (and similar
cases)" should determine the intended behavior of `prefer-utf-8` rather
than its current implementation.
> For .el files, we can always bind inhibit-null-byte-detection to t
> when we load or visit such files.
We could, but I'm having trouble imagining a situation where we'd want
to use `prefer-utf-8` and not inhibit "NUL means binary".
The "NUL means binary" heuristic fundamentally says that `binary` is the
first coding system we try and only if this one fails (for lack of NUL
bytes) we consider others. But for `prefer-utf-8` we should first
consider utf-8 and only if this fails should we consider others
(potentially including `binary` if you want, my opinion is not as strong
there).
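
[Illustration] The two orderings described above can be sketched as follows. This is only a toy model in Python, not Emacs's actual detection code: the function names and the "other" fallback label are hypothetical, and real coding-system detection considers many more candidates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (a NUL byte is valid UTF-8)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def detect_nul_first(data: bytes) -> str:
    """Toy model of 'NUL means binary': binary is the first candidate tried."""
    if b"\x00" in data:
        return "binary"
    return "utf-8" if is_valid_utf8(data) else "other"


def detect_utf8_first(data: bytes) -> str:
    """Toy model of the ordering argued for `prefer-utf-8`: UTF-8 is tried first."""
    if is_valid_utf8(data):
        return "utf-8"
    return "binary" if b"\x00" in data else "other"


# The file from the bug report: a non-ASCII comment plus a single NUL byte.
sample = ';; ààààà\n(foo "\x00")\n'.encode("utf-8")
print(detect_nul_first(sample))   # binary
print(detect_utf8_first(sample))  # utf-8
```

Under this model the two orderings disagree only for input that is valid UTF-8 and also contains a NUL byte, which is exactly the case in the original report.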
> I'm not talking about .el files. The coding-system's applicability is
> wider than that.
Could be. But it's its "raison d'être" (and AFAIK currently still the
sole application), so it should handle this case as best it can.
Stefan
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 18:14 ` Eli Zaretskii
@ 2020-11-14 22:56 ` Stefan Monnier
2020-11-15 15:14 ` Eli Zaretskii
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2020-11-14 22:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486
>> For .el files, we can always bind inhibit-null-byte-detection to t
>> when we load or visit such files.
> Alternatively, we could introduce a separate coding-system whose
> :inhibit-null-byte-detection property is t, and use that for *.el
> files.
If you want to go that route, that's fine by me. AFAIK no one else uses
`prefer-utf-8`, so it doesn't seem worth the trouble, tho (especially
since we don't have any evidence that potential other users would favor
the current behavior over the inhibit-null-byte-detection one).
Stefan
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 22:53 ` Stefan Monnier
@ 2020-11-15 15:08 ` Eli Zaretskii
2020-11-15 18:31 ` Stefan Monnier
0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-15 15:08 UTC (permalink / raw)
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: larsi@gnus.org, thievol@posteo.net, handa@gnu.org,
> schwab@linux-m68k.org, 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 17:53:57 -0500
>
> >> If `utf-8` is preferable over `prefer-utf-8` for this usage I think
> >> the problem is in `prefer-utf-8` since it was introduced
> >> specifically for that.
> > The implementation doesn't support your POV.
>
> Then I think the implementation is in error.
But that ship sailed 7 years ago.
> > We are not talking about .el files, we are talking about _any_ file
> > read using prefer-utf-8.
>
> `prefer-utf-8` was not introduced because it seemed like a good idea and
> then we hoped someone would find it useful. It was introduced to solve
> a concrete need, which is that of `.el` files. It's quite possible that
> there are other situations that have the same needs as `.el` files, but
> from where I stand it looks like "the needs of .el files (and similar
> cases)" should determine the intended behavior of `prefer-utf-8` rather
> than its current implementation.
>
> > For .el files, we can always bind inhibit-null-byte-detection to t
> > when we load or visit such files.
>
> We could, but I'm having trouble imagining a situation where we'd want
> to use `prefer-utf-8` and not inhibit "NUL means binary".
>
> The "NUL means binary" heuristic fundamentally says that `binary` is the
> first coding system we try and only if this one fails (for lack of NUL
> bytes) we consider others. But for `prefer-utf-8` we should first
> consider utf-8 and only if this fails should we consider others
> (potentially including `binary` if you want, my opinion is not as strong
> there).
>
> > I'm not talking about .el files. The coding-system's applicability is
> > wider than that.
>
> Could be. But it's its "raison d'être" (and AFAIK currently still the
> sole application), so it should handle this case as best it can.
We should have been having this discussion 7 years ago. And guess
what? We did. In that discussion, you said, in response to a question
from Kenichi:
> * What to do with null byte detection. Previously, if a
> *.el file contains a null byte and
> inhibit-null-byte-detection is nil (the default), it's
> detected as a binary file. Now utf-8 is forced regardless
> of inhibit-null-byte-detection.
I like the utf-8 better, but I don't know of any concrete case where it
makes a significant difference, so either way is OK.
^^^^^^^^^^^^^^^^
Note that what actually got implemented ignored
inhibit-null-byte-detection altogether, and _always_ considered the
file binary if any null byte was found. My change, which prompted
this present discussion, made prefer-utf-8 heed the variable's value,
which is mid-way between what we had for 7 years and what you thought
we should have. So, a small step forward ;-)
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-14 22:56 ` Stefan Monnier
@ 2020-11-15 15:14 ` Eli Zaretskii
0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2020-11-15 15:14 UTC (permalink / raw)
To: Stefan Monnier; +Cc: thievol, larsi, schwab, 44486
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: thievol@posteo.net, larsi@gnus.org, schwab@linux-m68k.org,
> 44486@debbugs.gnu.org
> Date: Sat, 14 Nov 2020 17:56:36 -0500
>
> >> For .el files, we can always bind inhibit-null-byte-detection to t
> >> when we load or visit such files.
> > Alternatively, we could introduce a separate coding-system whose
> > :inhibit-null-byte-detection property is t, and use that for *.el
> > files.
>
> If you want to go that route, that's fine by me.
I actually think that we don't need to do anything. We've lived for 7
years with a reality that is worse than what is now on master, and no
one complained.
But if you are very unhappy about this, we _could_ introduce a new
coding-system for *.el files.
> (especially since we don't have any evidence that potential other
> users would favor the current behavior over the
> inhibit-null-byte-detection one).
The current behavior on master is to heed inhibit-null-byte-detection;
the current behavior in Emacs 27 is to ignore it, and always consider
a .el file with null bytes as binary. I hope you agree that the
behavior on master is slightly better, at least in that it won't
surprise users.
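
[Illustration] The difference between the two behaviors described above can be laid out as a small decision function. This is a hedged Python model of the description in this message only, not a transcription of Emacs's code; the function and parameter names are hypothetical.

```python
def treated_as_binary(has_nul: bool, inhibit_nul_detection: bool,
                      heeds_variable: bool) -> bool:
    """Model whether a file read via `prefer-utf-8` ends up treated as binary.

    heeds_variable=False models the Emacs 27 behavior described above
    (the variable is ignored); heeds_variable=True models master.
    """
    if not has_nul:
        return False  # no NUL byte, so the heuristic never fires
    if heeds_variable:
        return not inhibit_nul_detection
    return True  # variable ignored: any NUL byte means binary


# Print the outcome for a file that contains a NUL byte, in each combination.
for name, heeds in (("Emacs 27", False), ("master", True)):
    for inhibit in (False, True):
        result = "binary" if treated_as_binary(True, inhibit, heeds) else "not binary"
        print(f"{name}, inhibit={inhibit}: {result}")
```

In this model the only combination where the outcome changes between the two versions is a file with a NUL byte read while inhibit-null-byte-detection is non-nil.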
* bug#44486: 27.1; C-@ chars corrupt elisp buffer
2020-11-15 15:08 ` Eli Zaretskii
@ 2020-11-15 18:31 ` Stefan Monnier
0 siblings, 0 replies; 25+ messages in thread
From: Stefan Monnier @ 2020-11-15 18:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: thievol, larsi, schwab, 44486
> > * What to do with null byte detection. Previously, if a
> > *.el file contains a null byte and
> > inhibit-null-byte-detection is nil (the default), it's
> > detected as a binary file. Now utf-8 is forced regardless
> > of inhibit-null-byte-detection.
>
> I like the utf-8 better, but I don't know of any concrete case where it
> makes a significant difference, so either way is OK.
> ^^^^^^^^^^^^^^^^
I'm glad to see that I now know better ;-)
> we should have. So, a small step forward ;-)
I'll take what I can get ;-)
Stefan