* bug#29189: 25.3; Dired does not work with binary filenames
[not found] <CADbSrJytKAysg4DRNR4iJD5JJbn7iwi_28Gh_oabaE-rnVyqLw@mail.gmail.com>
@ 2018-09-09 0:31 ` Allen Li
2018-09-09 6:12 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Allen Li @ 2018-09-09 0:31 UTC (permalink / raw)
To: 29189
I believe this bug was fixed in 26?
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2018-09-09 0:31 ` bug#29189: 25.3; Dired does not work with binary filenames Allen Li
@ 2018-09-09 6:12 ` Eli Zaretskii
0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2018-09-09 6:12 UTC (permalink / raw)
To: Allen Li; +Cc: 29189-done
> From: Allen Li <darkfeline@felesatra.moe>
> Date: Sat, 8 Sep 2018 17:31:14 -0700
>
> I believe this bug was fixed in 26?
Yes. I even sent a message to that effect at the time I pushed the
changes, but I see now that I goofed with the bug address, so neither
the message nor the instruction to close the bug made it to the bug
tracker. Reproducing the important part below:
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@gnu.org>, vianchielfaura@gmail.com, 29189@debbugs.gnu.org, schwab@suse.de
> Date: Fri, 05 Jan 2018 17:16:30 -0500
>
> > I found that the alternative patch below solves the original problem
> > without any changes needed in files.el, and without introducing any
> > performance hits. Does anyone see a problem with this proposed patch?
>
> [ With Andreas's adjustment] It looks sane to me.
I'd still want Handa-san's review of the patch, but I went ahead and
pushed it to the emacs-26 branch, and I'm closing the bug.
Thanks for pointing out this blunder.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
@ 2017-11-07 9:03 Allen Li
2017-11-07 10:00 ` Andreas Schwab
0 siblings, 1 reply; 27+ messages in thread
From: Allen Li @ 2017-11-07 9:03 UTC (permalink / raw)
To: 29189
Dired does not work with binary filenames
For example, create such a file with Bash:
touch $'\265'
1. Navigate to the directory containing said file with Dired
2. Mark file for deletion with d
3. x
Expected:
File deleted
Actual:
(file-error Removing old name No such file or directory /home/bob/tmp/\300\265)
In GNU Emacs 25.3.1 (x86_64-pc-linux-gnu, GTK+ Version 3.22.19)
of 2017-09-16 built on juergen
Windowing system distributor 'The X.Org Foundation', version 11.0.11905000
Configured using:
'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
--localstatedir=/var --with-x-toolkit=gtk3 --with-xft --with-modules
'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong
-fno-plt' CPPFLAGS=-D_FORTIFY_SOURCE=2
LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-07 9:03 Allen Li
@ 2017-11-07 10:00 ` Andreas Schwab
2017-11-07 17:08 ` Eli Zaretskii
2017-11-08 6:22 ` Allen Li
0 siblings, 2 replies; 27+ messages in thread
From: Andreas Schwab @ 2017-11-07 10:00 UTC (permalink / raw)
To: Allen Li; +Cc: 29189
On Nov 07 2017, Allen Li <vianchielfaura@gmail.com> wrote:
> Dired does not work with binary filenames
>
> For example, create such a file with Bash:
>
> touch $'\265'
>
> 1. Navigate to the directory containing said file with Dired
> 2. Mark file for deletion with d
> 3. x
>
> Expected:
>
> File deleted
It works if you add b to dired-listing-switches, though the buffer isn't
properly updated afterwards.
Andreas.
--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-07 10:00 ` Andreas Schwab
@ 2017-11-07 17:08 ` Eli Zaretskii
2017-11-08 5:12 ` Allen Li
2017-11-08 6:22 ` Allen Li
1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-07 17:08 UTC (permalink / raw)
To: Andreas Schwab; +Cc: vianchielfaura, 29189
> From: Andreas Schwab <schwab@suse.de>
> Date: Tue, 07 Nov 2017 11:00:13 +0100
> Cc: 29189@debbugs.gnu.org
>
> > touch $'\265'
> >
> > 1. Navigate to the directory containing said file with Dired
> > 2. Mark file for deletion with d
> > 3. x
> >
> > Expected:
> >
> > File deleted
>
> It works if you add b to dired-listing-switches, though the buffer isn't
> properly updated afterwards.
Here (with the latest emacs-26 branch) it doesn't produce any error
messages, and the buffer is updated to remove the name of that file,
but 'g' brings it back, so it wasn't really deleted.
I stepped into Fdelete_file, and saw that the file name is correctly
encoded before calling 'unlink'.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-07 17:08 ` Eli Zaretskii
@ 2017-11-08 5:12 ` Allen Li
0 siblings, 0 replies; 27+ messages in thread
From: Allen Li @ 2017-11-08 5:12 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Andreas Schwab, 29189
On Tue, Nov 7, 2017 at 9:08 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Andreas Schwab <schwab@suse.de>
>> It works if you add b to dired-listing-switches, though the buffer isn't
>> properly updated afterwards.
>
> Here (with the latest emacs-26 branch) it doesn't produce any error
> messages, and the buffer is updated to remove the name of that file,
> but 'g' brings it back, so it wasn't really deleted.
>
> I stepped into Fdelete_file, and saw that the file name is correctly
> encoded before calling 'unlink'.
I can confirm the behavior Andreas describes on Emacs 25. There is a
slight bug where the buffer is not updated after the delete, but the
delete itself works
I do not know about Emacs 26, but it sounds like there's a regression.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-07 10:00 ` Andreas Schwab
2017-11-07 17:08 ` Eli Zaretskii
@ 2017-11-08 6:22 ` Allen Li
2017-11-08 8:44 ` Eli Zaretskii
1 sibling, 1 reply; 27+ messages in thread
From: Allen Li @ 2017-11-08 6:22 UTC (permalink / raw)
To: Andreas Schwab; +Cc: 29189
On Tue, Nov 7, 2017 at 2:00 AM, Andreas Schwab <schwab@suse.de> wrote:
> On Nov 07 2017, Allen Li <vianchielfaura@gmail.com> wrote:
>
>> Dired does not work with binary filenames
>>
>> For example, create such a file with Bash:
>>
>> touch $'\265'
>>
>> 1. Navigate to the directory containing said file with Dired
>> 2. Mark file for deletion with d
>> 3. x
>>
>> Expected:
>>
>> File deleted
>
> It works if you add b to dired-listing-switches, though the buffer isn't
> properly updated afterwards.
I just discovered that -b does not play well with wdired. Should I
file a separate bug for that?
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-08 6:22 ` Allen Li
@ 2017-11-08 8:44 ` Eli Zaretskii
2017-11-11 6:59 ` Allen Li
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-08 8:44 UTC (permalink / raw)
To: 29189, vianchielfaura, schwab
On November 8, 2017 8:22:01 AM GMT+02:00, Allen Li <vianchielfaura@gmail.com> wrote:
> On Tue, Nov 7, 2017 at 2:00 AM, Andreas Schwab <schwab@suse.de> wrote:
> > On Nov 07 2017, Allen Li <vianchielfaura@gmail.com> wrote:
> >
> >> Dired does not work with binary filenames
> >>
> >> For example, create such a file with Bash:
> >>
> >> touch $'\265'
> >>
> >> 1. Navigate to the directory containing said file with Dired
> >> 2. Mark file for deletion with d
> >> 3. x
> >>
> >> Expected:
> >>
> >> File deleted
> >
> > It works if you add b to dired-listing-switches, though the buffer
> isn't
> > properly updated afterwards.
>
> I just discovered that -b does not play well with wdired. Should I
> file a separate bug for that?
I don't understand why -b should be used at all. In my testing, it
wasn't needed. Maybe this depends on the locale? What's yours?
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-08 8:44 ` Eli Zaretskii
@ 2017-11-11 6:59 ` Allen Li
2017-11-11 8:20 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Allen Li @ 2017-11-11 6:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, 29189
On Wed, Nov 8, 2017 at 12:44 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> I don't understand why -b should be used at all. In my testing, it
> wasn't needed. Maybe this depends on the locale? What's yours?
en_US.UTF-8
Without -b, the filename in Dired is two binary characters, \300 and
\265. With -b, the filename in Dired is four characters, \265
I'm using Emacs 25.3.1 and ls (GNU coreutils) 8.28
It sounds like Andreas is seeing the same behavior as me. What
behavior are you seeing when deleting the file with -b vs no -b?
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-11 6:59 ` Allen Li
@ 2017-11-11 8:20 ` Eli Zaretskii
2017-11-11 14:18 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-11 8:20 UTC (permalink / raw)
To: Allen Li; +Cc: schwab, 29189
> From: Allen Li <vianchielfaura@gmail.com>
> Date: Fri, 10 Nov 2017 22:59:56 -0800
> Cc: bug-gnu-emacs@gnu.org, Andreas Schwab <schwab@suse.de>, 29189@debbugs.gnu.org
>
> On Wed, Nov 8, 2017 at 12:44 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> > I don't understand why -b should be used at all. In my testing, it
> > wasn't needed. Maybe this depends on the locale? What's yours?
>
> en_US.UTF-8
>
> Without -b, the filename in Dired is two binary characters, \300 and
> \265. With -b, the filename in Dired is four characters, \265
Sorry, it seems I was confused. You didn't originally say what file
name you expected to see in Dired. If the expected file name is \265,
a single byte, but you see \300\265 instead, then the problem is not
in deletion, the problem is in how Dired prepares file names for
display. I will look into that when I have time, if no one beats me
to it.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-11 8:20 ` Eli Zaretskii
@ 2017-11-11 14:18 ` Eli Zaretskii
2017-11-11 15:21 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-11 14:18 UTC (permalink / raw)
To: vianchielfaura, Kenichi Handa; +Cc: schwab, 29189
> Date: Sat, 11 Nov 2017 10:20:36 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: schwab@suse.de, 29189@debbugs.gnu.org
>
> > Without -b, the filename in Dired is two binary characters, \300 and
> > \265. With -b, the filename in Dired is four characters, \265
>
> Sorry, it seems I was confused. You didn't originally say what file
> name you expected to see in Dired. If the expected file name is \265,
> a single byte, but you see \300\265 instead, then the problem is not
> in deletion, the problem is in how Dired prepares file names for
> display. I will look into that when I have time, if no one beats me
> to it.
The problem is in insert-directory. It manually decodes each file
name which was output by 'ls', and that produces strangely
inconsistent results when the file name includes raw bytes: sometimes
we get the 2-byte sequence starting with \300, sometimes the original
byte survives unchanged, and sometimes I see the sequence \301\200
instead of a lone \300 in the file name. I'm trying to understand
what's going on and find a solution to that.
CC'ing Handa-san in the hope that he could comment on this or provide
some suggestions.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-11 14:18 ` Eli Zaretskii
@ 2017-11-11 15:21 ` Eli Zaretskii
2017-11-16 6:31 ` Allen Li
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-11 15:21 UTC (permalink / raw)
To: vianchielfaura; +Cc: 29189, schwab
> Date: Sat, 11 Nov 2017 16:18:20 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: schwab@suse.de, 29189@debbugs.gnu.org
>
> The problem is in insert-directory. It manually decodes each file
> name which was output by 'ls', and that produces strangely
> inconsistent results when the file name includes raw bytes: sometimes
> we get the 2-byte sequence starting with \300, sometimes the original
> byte survives unchanged, and sometimes I see the sequence \301\200
> instead of a lone \300 in the file name. I'm trying to understand
> what's going on and find a solution to that.
Can you please try the patch below? (You will need to re-dump Emacs
after patching files.el.)
diff --git a/lisp/files.el b/lisp/files.el
index b47411f..43198bc 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -6803,10 +6803,13 @@ insert-directory
val (get-text-property (point) 'dired-filename))
(goto-char (next-single-property-change
(point) 'dired-filename nil (point-max)))
- ;; Force no eol conversion on a file name, so
- ;; that CR is preserved.
- (decode-coding-region pos (point)
- (if val coding-no-eol coding))
+ (let ((fn (buffer-substring-no-properties pos (point))))
+ (delete-region pos (point))
+ (insert
+ ;; Force no eol conversion on a file name, so
+ ;; that CR is preserved.
+ (decode-coding-string (string-make-unibyte fn)
+ (if val coding-no-eol coding))))
(if val
(put-text-property pos (point)
'dired-filename t)))))))
^ permalink raw reply related [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-11 15:21 ` Eli Zaretskii
@ 2017-11-16 6:31 ` Allen Li
2017-11-16 16:00 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Allen Li @ 2017-11-16 6:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 29189, Andreas Schwab
On Sat, Nov 11, 2017 at 7:21 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Sat, 11 Nov 2017 16:18:20 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: schwab@suse.de, 29189@debbugs.gnu.org
>>
>> The problem is in insert-directory. It manually decodes each file
>> name which was output by 'ls', and that produces strangely
>> inconsistent results when the file name includes raw bytes: sometimes
>> we get the 2-byte sequence starting with \300, sometimes the original
>> byte survives unchanged, and sometimes I see the sequence \301\200
>> instead of a lone \300 in the file name. I'm trying to understand
>> what's going on and find a solution to that.
>
> Can you please try the patch below? (You will need to re-dump Emacs
> after patching files.el.)
>
> diff --git a/lisp/files.el b/lisp/files.el
> index b47411f..43198bc 100644
> --- a/lisp/files.el
> +++ b/lisp/files.el
> @@ -6803,10 +6803,13 @@ insert-directory
> val (get-text-property (point) 'dired-filename))
> (goto-char (next-single-property-change
> (point) 'dired-filename nil (point-max)))
> - ;; Force no eol conversion on a file name, so
> - ;; that CR is preserved.
> - (decode-coding-region pos (point)
> - (if val coding-no-eol coding))
> + (let ((fn (buffer-substring-no-properties pos (point))))
> + (delete-region pos (point))
> + (insert
> + ;; Force no eol conversion on a file name, so
> + ;; that CR is preserved.
> + (decode-coding-string (string-make-unibyte fn)
> + (if val coding-no-eol coding))))
> (if val
> (put-text-property pos (point)
> 'dired-filename t)))))))
This patch works for me.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-16 6:31 ` Allen Li
@ 2017-11-16 16:00 ` Eli Zaretskii
2017-11-18 14:42 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-16 16:00 UTC (permalink / raw)
To: Allen Li; +Cc: 29189, schwab
> From: Allen Li <vianchielfaura@gmail.com>
> Date: Wed, 15 Nov 2017 22:31:48 -0800
> Cc: handa@gnu.org, Andreas Schwab <schwab@suse.de>, 29189@debbugs.gnu.org
>
> > diff --git a/lisp/files.el b/lisp/files.el
> > index b47411f..43198bc 100644
> > --- a/lisp/files.el
> > +++ b/lisp/files.el
> > @@ -6803,10 +6803,13 @@ insert-directory
> > val (get-text-property (point) 'dired-filename))
> > (goto-char (next-single-property-change
> > (point) 'dired-filename nil (point-max)))
> > - ;; Force no eol conversion on a file name, so
> > - ;; that CR is preserved.
> > - (decode-coding-region pos (point)
> > - (if val coding-no-eol coding))
> > + (let ((fn (buffer-substring-no-properties pos (point))))
> > + (delete-region pos (point))
> > + (insert
> > + ;; Force no eol conversion on a file name, so
> > + ;; that CR is preserved.
> > + (decode-coding-string (string-make-unibyte fn)
> > + (if val coding-no-eol coding))))
> > (if val
> > (put-text-property pos (point)
> > 'dired-filename t)))))))
>
> This patch works for me.
Thanks for testing. I'm still worried that we need to force text to
be unibyte in order for the decoding to work. So I'd like to dig into
the code to understand why, and maybe try to fix it if I find some
problems there. If I succeed, the result will work faster, because
the above patch is less efficient that decode-coding-region. Let me
look into this and get back to you in a few days.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-16 16:00 ` Eli Zaretskii
@ 2017-11-18 14:42 ` Eli Zaretskii
2017-11-20 9:48 ` Andreas Schwab
2018-01-05 22:16 ` Stefan Monnier
0 siblings, 2 replies; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-18 14:42 UTC (permalink / raw)
To: Kenichi Handa; +Cc: vianchielfaura, 29189, schwab
> Date: Thu, 16 Nov 2017 18:00:55 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 29189@debbugs.gnu.org, schwab@suse.de
>
> > From: Allen Li <vianchielfaura@gmail.com>
> > Date: Wed, 15 Nov 2017 22:31:48 -0800
> > Cc: handa@gnu.org, Andreas Schwab <schwab@suse.de>, 29189@debbugs.gnu.org
> >
> > > diff --git a/lisp/files.el b/lisp/files.el
> > > index b47411f..43198bc 100644
> > > --- a/lisp/files.el
> > > +++ b/lisp/files.el
> > > @@ -6803,10 +6803,13 @@ insert-directory
> > > val (get-text-property (point) 'dired-filename))
> > > (goto-char (next-single-property-change
> > > (point) 'dired-filename nil (point-max)))
> > > - ;; Force no eol conversion on a file name, so
> > > - ;; that CR is preserved.
> > > - (decode-coding-region pos (point)
> > > - (if val coding-no-eol coding))
> > > + (let ((fn (buffer-substring-no-properties pos (point))))
> > > + (delete-region pos (point))
> > > + (insert
> > > + ;; Force no eol conversion on a file name, so
> > > + ;; that CR is preserved.
> > > + (decode-coding-string (string-make-unibyte fn)
> > > + (if val coding-no-eol coding))))
> > > (if val
> > > (put-text-property pos (point)
> > > 'dired-filename t)))))))
> >
> > This patch works for me.
>
> Thanks for testing. I'm still worried that we need to force text to
> be unibyte in order for the decoding to work. So I'd like to dig into
> the code to understand why, and maybe try to fix it if I find some
> problems there. If I succeed, the result will work faster, because
> the above patch is less efficient that decode-coding-region. Let me
> look into this and get back to you in a few days.
I found that the alternative patch below solves the original problem
without any changes needed in files.el, and without introducing any
performance hits. Does anyone see a problem with this proposed patch?
Kenichi?
diff --git a/src/coding.c b/src/coding.c
index d790ad0..eaad0d7 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -7423,10 +7423,21 @@ decode_coding (struct coding_system *coding)
while (nbytes-- > 0)
{
- int c = *src++;
+ int c;
- if (c & 0x80)
- c = BYTE8_TO_CHAR (c);
+ /* Copy raw bytes in their 2-byte forms as single characters. */
+ if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
+ {
+ c = STRING_CHAR_ADVANCE (src);
+ nbytes--;
+ }
+ else
+ {
+ c = *src++;
+
+ if (c & 0x80)
+ c = BYTE8_TO_CHAR (c);
+ }
coding->charbuf[coding->charbuf_used++] = c;
}
produce_chars (coding, Qnil, 1);
^ permalink raw reply related [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-18 14:42 ` Eli Zaretskii
@ 2017-11-20 9:48 ` Andreas Schwab
2017-11-20 18:15 ` Eli Zaretskii
2018-01-05 22:16 ` Stefan Monnier
1 sibling, 1 reply; 27+ messages in thread
From: Andreas Schwab @ 2017-11-20 9:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 29189, vianchielfaura
On Nov 18 2017, Eli Zaretskii <eliz@gnu.org> wrote:
> I found that the alternative patch below solves the original problem
> without any changes needed in files.el, and without introducing any
> performance hits. Does anyone see a problem with this proposed patch?
> Kenichi?
>
> diff --git a/src/coding.c b/src/coding.c
> index d790ad0..eaad0d7 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -7423,10 +7423,21 @@ decode_coding (struct coding_system *coding)
>
> while (nbytes-- > 0)
> {
> - int c = *src++;
> + int c;
>
> - if (c & 0x80)
> - c = BYTE8_TO_CHAR (c);
> + /* Copy raw bytes in their 2-byte forms as single characters. */
> + if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> + {
> + c = STRING_CHAR_ADVANCE (src);
CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
strings. I don't think it makes sense to use them for unibyte strings.
Andreas.
--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-20 9:48 ` Andreas Schwab
@ 2017-11-20 18:15 ` Eli Zaretskii
2017-11-24 8:52 ` Eli Zaretskii
2017-12-02 5:21 ` Allen Li
0 siblings, 2 replies; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-20 18:15 UTC (permalink / raw)
To: Andreas Schwab, handa; +Cc: vianchielfaura, 29189
> From: Andreas Schwab <schwab@suse.de>
> Cc: Kenichi Handa <handa@gnu.org>, vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> Date: Mon, 20 Nov 2017 10:48:09 +0100
>
> > + /* Copy raw bytes in their 2-byte forms as single characters. */
> > + if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > + {
> > + c = STRING_CHAR_ADVANCE (src);
>
> CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> strings. I don't think it makes sense to use them for unibyte strings.
Right you are, thanks. Updated patch below.
diff --git a/src/coding.c b/src/coding.c
index d790ad0..ac55f87 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
while (nbytes-- > 0)
{
- int c = *src++;
+ int c;
- if (c & 0x80)
- c = BYTE8_TO_CHAR (c);
+ /* Copy raw bytes in their 2-byte forms from multibyte
+ text as single characters. */
+ if (coding->src_multibyte
+ && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
+ {
+ c = STRING_CHAR_ADVANCE (src);
+ nbytes--;
+ }
+ else
+ {
+ c = *src++;
+
+ if (c & 0x80)
+ c = BYTE8_TO_CHAR (c);
+ }
coding->charbuf[coding->charbuf_used++] = c;
}
produce_chars (coding, Qnil, 1);
^ permalink raw reply related [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-20 18:15 ` Eli Zaretskii
@ 2017-11-24 8:52 ` Eli Zaretskii
2017-12-01 8:41 ` Eli Zaretskii
2017-12-02 5:21 ` Allen Li
1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-11-24 8:52 UTC (permalink / raw)
To: Kenichi Handa; +Cc: vianchielfaura, 29189
Ping! Kenichi, any comments on this issue or the proposed patch?
> Date: Mon, 20 Nov 2017 20:15:55 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
>
> > From: Andreas Schwab <schwab@suse.de>
> > Cc: Kenichi Handa <handa@gnu.org>, vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > Date: Mon, 20 Nov 2017 10:48:09 +0100
> >
> > > + /* Copy raw bytes in their 2-byte forms as single characters. */
> > > + if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > + {
> > > + c = STRING_CHAR_ADVANCE (src);
> >
> > CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> > strings. I don't think it makes sense to use them for unibyte strings.
>
> Right you are, thanks. Updated patch below.
>
> diff --git a/src/coding.c b/src/coding.c
> index d790ad0..ac55f87 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
>
> while (nbytes-- > 0)
> {
> - int c = *src++;
> + int c;
>
> - if (c & 0x80)
> - c = BYTE8_TO_CHAR (c);
> + /* Copy raw bytes in their 2-byte forms from multibyte
> + text as single characters. */
> + if (coding->src_multibyte
> + && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> + {
> + c = STRING_CHAR_ADVANCE (src);
> + nbytes--;
> + }
> + else
> + {
> + c = *src++;
> +
> + if (c & 0x80)
> + c = BYTE8_TO_CHAR (c);
> + }
> coding->charbuf[coding->charbuf_used++] = c;
> }
> produce_chars (coding, Qnil, 1);
>
>
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-24 8:52 ` Eli Zaretskii
@ 2017-12-01 8:41 ` Eli Zaretskii
2017-12-09 9:03 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-12-01 8:41 UTC (permalink / raw)
To: handa; +Cc: vianchielfaura, 29189
Ping! Ping! I'd _really_ like to fix this for Emacs 26, but I'm
bothered by the potential adverse consequences of making changes in
such a central piece of code.
Still hoping to get comments from Handa-san.
> Date: Fri, 24 Nov 2017 10:52:19 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
>
> Ping! Kenichi, any comments on this issue or the proposed patch?
>
> > Date: Mon, 20 Nov 2017 20:15:55 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> >
> > > From: Andreas Schwab <schwab@suse.de>
> > > Cc: Kenichi Handa <handa@gnu.org>, vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > > Date: Mon, 20 Nov 2017 10:48:09 +0100
> > >
> > > > + /* Copy raw bytes in their 2-byte forms as single characters. */
> > > > + if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > > + {
> > > > + c = STRING_CHAR_ADVANCE (src);
> > >
> > > CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> > > strings. I don't think it makes sense to use them for unibyte strings.
> >
> > Right you are, thanks. Updated patch below.
> >
> > diff --git a/src/coding.c b/src/coding.c
> > index d790ad0..ac55f87 100644
> > --- a/src/coding.c
> > +++ b/src/coding.c
> > @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
> >
> > while (nbytes-- > 0)
> > {
> > - int c = *src++;
> > + int c;
> >
> > - if (c & 0x80)
> > - c = BYTE8_TO_CHAR (c);
> > + /* Copy raw bytes in their 2-byte forms from multibyte
> > + text as single characters. */
> > + if (coding->src_multibyte
> > + && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > + {
> > + c = STRING_CHAR_ADVANCE (src);
> > + nbytes--;
> > + }
> > + else
> > + {
> > + c = *src++;
> > +
> > + if (c & 0x80)
> > + c = BYTE8_TO_CHAR (c);
> > + }
> > coding->charbuf[coding->charbuf_used++] = c;
> > }
> > produce_chars (coding, Qnil, 1);
> >
> >
> >
> >
>
>
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-12-01 8:41 ` Eli Zaretskii
@ 2017-12-09 9:03 ` Eli Zaretskii
2017-12-15 9:09 ` Eli Zaretskii
0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2017-12-09 9:03 UTC (permalink / raw)
To: handa; +Cc: vianchielfaura, 29189
Ping! Ping! Ping!
> Date: Fri, 01 Dec 2017 10:41:47 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
>
> Ping! Ping! I'd _really_ like to fix this for Emacs 26, but I'm
> bothered by the potential adverse consequences of making changes in
> such a central piece of code.
>
> Still hoping to get comments from Handa-san.
>
> > Date: Fri, 24 Nov 2017 10:52:19 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> >
> > Ping! Kenichi, any comments on this issue or the proposed patch?
> >
> > > Date: Mon, 20 Nov 2017 20:15:55 +0200
> > > From: Eli Zaretskii <eliz@gnu.org>
> > > Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > >
> > > > From: Andreas Schwab <schwab@suse.de>
> > > > Cc: Kenichi Handa <handa@gnu.org>, vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > > > Date: Mon, 20 Nov 2017 10:48:09 +0100
> > > >
> > > > > + /* Copy raw bytes in their 2-byte forms as single characters. */
> > > > > + if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > > > + {
> > > > > + c = STRING_CHAR_ADVANCE (src);
> > > >
> > > > CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> > > > strings. I don't think it makes sense to use them for unibyte strings.
> > >
> > > Right you are, thanks. Updated patch below.
> > >
> > > diff --git a/src/coding.c b/src/coding.c
> > > index d790ad0..ac55f87 100644
> > > --- a/src/coding.c
> > > +++ b/src/coding.c
> > > @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
> > >
> > > while (nbytes-- > 0)
> > > {
> > > - int c = *src++;
> > > + int c;
> > >
> > > - if (c & 0x80)
> > > - c = BYTE8_TO_CHAR (c);
> > > + /* Copy raw bytes in their 2-byte forms from multibyte
> > > + text as single characters. */
> > > + if (coding->src_multibyte
> > > + && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > + {
> > > + c = STRING_CHAR_ADVANCE (src);
> > > + nbytes--;
> > > + }
> > > + else
> > > + {
> > > + c = *src++;
> > > +
> > > + if (c & 0x80)
> > > + c = BYTE8_TO_CHAR (c);
> > > + }
> > > coding->charbuf[coding->charbuf_used++] = c;
> > > }
> > > produce_chars (coding, Qnil, 1);
> > >
> > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-12-09 9:03 ` Eli Zaretskii
@ 2017-12-15 9:09 ` Eli Zaretskii
0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2017-12-15 9:09 UTC (permalink / raw)
To: handa; +Cc: vianchielfaura, 29189
Ping! Ping! Ping! Ping!
> Date: Sat, 09 Dec 2017 11:03:57 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
>
> Ping! Ping! Ping!
>
> > Date: Fri, 01 Dec 2017 10:41:47 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> >
> > Ping! Ping! I'd _really_ like to fix this for Emacs 26, but I'm
> > bothered by the potential adverse consequences of making changes in
> > such a central piece of code.
> >
> > Still hoping to get comments from Handa-san.
> >
> > > Date: Fri, 24 Nov 2017 10:52:19 +0200
> > > From: Eli Zaretskii <eliz@gnu.org>
> > > Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > >
> > > Ping! Kenichi, any comments on this issue or the proposed patch?
> > >
> > > > Date: Mon, 20 Nov 2017 20:15:55 +0200
> > > > From: Eli Zaretskii <eliz@gnu.org>
> > > > Cc: vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > > >
> > > > > From: Andreas Schwab <schwab@suse.de>
> > > > > Cc: Kenichi Handa <handa@gnu.org>, vianchielfaura@gmail.com, 29189@debbugs.gnu.org
> > > > > Date: Mon, 20 Nov 2017 10:48:09 +0100
> > > > >
> > > > > > + /* Copy raw bytes in their 2-byte forms as single characters. */
> > > > > > + if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > > > > + {
> > > > > > + c = STRING_CHAR_ADVANCE (src);
> > > > >
> > > > > CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> > > > > strings. I don't think it makes sense to use them for unibyte strings.
> > > >
> > > > Right you are, thanks. Updated patch below.
> > > >
> > > > diff --git a/src/coding.c b/src/coding.c
> > > > index d790ad0..ac55f87 100644
> > > > --- a/src/coding.c
> > > > +++ b/src/coding.c
> > > > @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
> > > >
> > > > while (nbytes-- > 0)
> > > > {
> > > > - int c = *src++;
> > > > + int c;
> > > >
> > > > - if (c & 0x80)
> > > > - c = BYTE8_TO_CHAR (c);
> > > > + /* Copy raw bytes in their 2-byte forms from multibyte
> > > > + text as single characters. */
> > > > + if (coding->src_multibyte
> > > > + && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > > + {
> > > > + c = STRING_CHAR_ADVANCE (src);
> > > > + nbytes--;
> > > > + }
> > > > + else
> > > > + {
> > > > + c = *src++;
> > > > +
> > > > + if (c & 0x80)
> > > > + c = BYTE8_TO_CHAR (c);
> > > > + }
> > > > coding->charbuf[coding->charbuf_used++] = c;
> > > > }
> > > > produce_chars (coding, Qnil, 1);
> > > >
> > > >
>
>
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-20 18:15 ` Eli Zaretskii
2017-11-24 8:52 ` Eli Zaretskii
@ 2017-12-02 5:21 ` Allen Li
2017-12-02 9:01 ` Eli Zaretskii
1 sibling, 1 reply; 27+ messages in thread
From: Allen Li @ 2017-12-02 5:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Andreas Schwab, 29189
On Mon, Nov 20, 2017 at 10:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> Right you are, thanks. Updated patch below.
>
> diff --git a/src/coding.c b/src/coding.c
> index d790ad0..ac55f87 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
>
> while (nbytes-- > 0)
> {
> - int c = *src++;
> + int c;
>
> - if (c & 0x80)
> - c = BYTE8_TO_CHAR (c);
> + /* Copy raw bytes in their 2-byte forms from multibyte
> + text as single characters. */
> + if (coding->src_multibyte
> + && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> + {
> + c = STRING_CHAR_ADVANCE (src);
> + nbytes--;
> + }
> + else
> + {
> + c = *src++;
> +
> + if (c & 0x80)
> + c = BYTE8_TO_CHAR (c);
> + }
> coding->charbuf[coding->charbuf_used++] = c;
> }
> produce_chars (coding, Qnil, 1);
I applied this patch to master (0b6f4f2c60) and it fixes the bug and
doesn't crash Emacs immediately. The code also looks right, but I am
not familiar with Emacs's C code. A few questions.
Why do we have to handle multibyte strings in this function
decode_coding? (I found the answer in the docs)
Can you briefly explain how Emacs internally stores unibyte and
multibyte strings? (I found the answer in character.h)
After doing the above research, I can more confidently say this is
right, but having an expert opinion would be nice.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-12-02 5:21 ` Allen Li
@ 2017-12-02 9:01 ` Eli Zaretskii
0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2017-12-02 9:01 UTC (permalink / raw)
To: Allen Li; +Cc: schwab, 29189
> From: Allen Li <vianchielfaura@gmail.com>
> Date: Fri, 1 Dec 2017 21:21:14 -0800
> Cc: Andreas Schwab <schwab@suse.de>, handa@gnu.org, 29189@debbugs.gnu.org
>
> I applied this patch to master (0b6f4f2c60) and it fixes the bug and
> doesn't crash Emacs immediately. The code also looks right, but I am
> not familiar with Emacs's C code. A few questions.
>
> Why do we have to handle multibyte strings in this function
> decode_coding? (I found the answer in the docs)
decode_coding needs to work when a series of raw bytes is inserted
into a multibyte buffer (which happens in the Dired case).
> Can you briefly explain how Emacs internally stores unibyte and
> multibyte strings? (I found the answer in character.h)
Right, the details are in that header file.
> After doing the above research, I can more confidently say this is
> right, but having an expert opinion would be nice.
Thanks for proofreading the patch and testing it. I'm waiting for
Handa-san to comment on it.
^ permalink raw reply [flat|nested] 27+ messages in thread
* bug#29189: 25.3; Dired does not work with binary filenames
2017-11-18 14:42 ` Eli Zaretskii
2017-11-20 9:48 ` Andreas Schwab
@ 2018-01-05 22:16 ` Stefan Monnier
[not found] ` <83h8rz9x6k.fsf@gnu.org>
1 sibling, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2018-01-05 22:16 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 29189, vianchielfaura, schwab
> I found that the alternative patch below solves the original problem
> without any changes needed in files.el, and without introducing any
> performance hits. Does anyone see a problem with this proposed patch?
[ With Andreas's adjustment] It looks sane to me.
Decoding applied to multibyte text is a rather odd situation (tho I'm
surprised this problem hasn't been noticed until now).
It should very much come with a few tests to verify that
(decode-coding-string (string-to-multibyte (encode-coding-string X)))
is just a more expensive alternative to
(decode-coding-string (encode-coding-string X))).
I'd also be tempted to additionally signal an error if a non-byte
(i.e. a char that's neither ASCII nor eight-bit-byte) is found since
"decoding" in that case is meaningless. Tho this is obviously not for
the emacs-26 branch.
Stefan
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2018-09-09 6:12 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CADbSrJytKAysg4DRNR4iJD5JJbn7iwi_28Gh_oabaE-rnVyqLw@mail.gmail.com>
2018-09-09 0:31 ` bug#29189: 25.3; Dired does not work with binary filenames Allen Li
2018-09-09 6:12 ` Eli Zaretskii
2017-11-07 9:03 Allen Li
2017-11-07 10:00 ` Andreas Schwab
2017-11-07 17:08 ` Eli Zaretskii
2017-11-08 5:12 ` Allen Li
2017-11-08 6:22 ` Allen Li
2017-11-08 8:44 ` Eli Zaretskii
2017-11-11 6:59 ` Allen Li
2017-11-11 8:20 ` Eli Zaretskii
2017-11-11 14:18 ` Eli Zaretskii
2017-11-11 15:21 ` Eli Zaretskii
2017-11-16 6:31 ` Allen Li
2017-11-16 16:00 ` Eli Zaretskii
2017-11-18 14:42 ` Eli Zaretskii
2017-11-20 9:48 ` Andreas Schwab
2017-11-20 18:15 ` Eli Zaretskii
2017-11-24 8:52 ` Eli Zaretskii
2017-12-01 8:41 ` Eli Zaretskii
2017-12-09 9:03 ` Eli Zaretskii
2017-12-15 9:09 ` Eli Zaretskii
2017-12-02 5:21 ` Allen Li
2017-12-02 9:01 ` Eli Zaretskii
2018-01-05 22:16 ` Stefan Monnier
[not found] ` <83h8rz9x6k.fsf@gnu.org>
[not found] ` <jwvbmi72fak.fsf-monnier+emacs@gnu.org>
2018-01-06 16:04 ` Eli Zaretskii
2018-01-07 15:20 ` Stefan Monnier
2018-01-07 17:53 ` Eli Zaretskii
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.