* emacs misbehaves without --unibyte
@ 2002-05-28 20:08 Paul Stoeber
2002-05-28 21:40 ` Eli Zaretskii
0 siblings, 1 reply; 19+ messages in thread
From: Paul Stoeber @ 2002-05-28 20:08 UTC (permalink / raw)
Miles Bader <miles@gnu.org> on Tue, 28 May 2002 16:49:04 GMT:
> It should work fine if you've set the language environment correctly
> (so it knows what the non-ASCII characters are), e.g. by using
> `set-language-environment', or having an appropriate setting of LANG.
That means my LANG=C is inappropriate for Emacs.
I don't care about l10n/i18n (that's what LANG=C should convey to programs),
and I want to steer through alien filesystems and text/binary files gracefully
(e.g. using octal escapes).
In its default usage, emacs doesn't meet this requirement---it doesn't have
the robust 8-bit cleanness of bash, nvi, perl and most other unix tools.
But I guess that's okay, it's just different from the tools I'm used to, and
it was very irritating until I found one possible fix (--unibyte).
Neither (setq set-language-environment "Latin-1") nor
(setq file-name-coding-system 'latin-1) creates an 8-bit clean
environment. Trying to open the file $'gr\200\340ve' in dired opens
$'gr\236\240\201\340ve' instead.
* Re: emacs misbehaves without --unibyte
2002-05-28 20:08 emacs misbehaves without --unibyte Paul Stoeber
@ 2002-05-28 21:40 ` Eli Zaretskii
2002-05-29 0:18 ` Paul Stoeber
2002-05-30 17:00 ` Richard Stallman
0 siblings, 2 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-05-28 21:40 UTC (permalink / raw)
Cc: bug-gnu-emacs
> From: Paul Stoeber <paul.stoeber@stud.tu-ilmenau.de>
> Date: Tue, 28 May 2002 22:08:14 +0200
>
> Neither (setq set-language-environment "Latin-1") nor
> (setq file-name-coding-system 'latin-1) creates an 8-bit clean
> environment.
Emacs is a text editor, not a binary file editor. So 8-bit cleanness
is not the most important goal for it.
There are specialized modes, such as hexl, for editing binary files.
* Re: emacs misbehaves without --unibyte
2002-05-28 21:40 ` Eli Zaretskii
@ 2002-05-29 0:18 ` Paul Stoeber
2002-05-29 6:23 ` Eli Zaretskii
2002-05-30 17:00 ` Richard Stallman
1 sibling, 1 reply; 19+ messages in thread
From: Paul Stoeber @ 2002-05-29 0:18 UTC (permalink / raw)
On Wed, May 29, 2002 at 12:40:42AM +0300, Eli Zaretskii wrote:
> > From: Paul Stoeber <paul.stoeber@stud.tu-ilmenau.de>
> > Date: Tue, 28 May 2002 22:08:14 +0200
> >
> > Neither (setq set-language-environment "Latin-1") nor
> > (setq file-name-coding-system 'latin-1) creates an 8-bit clean
> > environment.
>
> Emacs is a text editor, not a binary file editor. So 8-bit cleanness
> is not the most important goal for it.
>
> There are specialized modes, such as hexl, for editing binary files.
(How is Emacs not a binary file editor when it has hexl mode?)
I started this thread because default emacs wouldn't let me navigate
filesystems that contain funny filenames, so the "8-bit cleanness"
discussion only applies to file name handling (although I had also
mentioned "text/binary files" in a general statement). Is it reasonable
for Emacs to refuse to open existing files and to invent new file names
in place of existing ones?
* Re: emacs misbehaves without --unibyte
2002-05-29 0:18 ` Paul Stoeber
@ 2002-05-29 6:23 ` Eli Zaretskii
2002-05-29 8:56 ` Paul Stoeber
0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2002-05-29 6:23 UTC (permalink / raw)
Cc: bug-gnu-emacs
On Wed, 29 May 2002, Paul Stoeber wrote:
> (How is Emacs not a binary file editor when it has hexl mode?)
It's not a binary file editor if you are in a mode other than hexl.
> I started this thread because default emacs wouldn't let me navigate
> filesystems that contain funny filenames, so the "8-bit cleanness"
> discussion only applies to file name handling (although I had also
> mentioned "text/binary files" in a general statement).
For that, Miles gave the solution: you should set up your language
environment correctly, or set file-name-coding-system explicitly.
I replied in addition to what Miles said, thinking that you really meant
8-bit cleanliness throughout.
> Is it reasonable
> for Emacs to refuse to open existing files and to invent new file names
> in place of existing ones?
No. But the ``reasonable'' thing is hard to implement without hints from
the user's environment. Please remember that Emacs decides where a file
name starts and ends in the Dired buffer by using a set of convoluted
regexps designed to parse the "ls -la" output for file's name, date,
time, attributes, etc. A stray 8-bit byte can cause spurious wrong
matches of those regexps, and the net effect is what you reported.
* Re: emacs misbehaves without --unibyte
2002-05-29 6:23 ` Eli Zaretskii
@ 2002-05-29 8:56 ` Paul Stoeber
2002-05-29 8:58 ` Eli Zaretskii
2002-05-29 9:00 ` Eli Zaretskii
0 siblings, 2 replies; 19+ messages in thread
From: Paul Stoeber @ 2002-05-29 8:56 UTC (permalink / raw)
On Wed, May 29, 2002 at 09:23:11AM +0300, Eli Zaretskii wrote:
> > I started this thread because default emacs wouldn't let me navigate
> > filesystems that contain funny filenames, so the "8-bit cleanness"
> > discussion only applies to file name handling (although I had also
> > mentioned "text/binary files" in a general statement).
>
> For that, Miles gave the solution: you should set up your language
> environment correctly, or set file-name-coding-system explicitly.
Yes. If you simply want to use dired as a robust filesystem browser
(like bash, only more comfortable), regardless of your language
or the language of who created the files, then
(setq file-name-coding-system 'no-conversion)
seems to be a solution. It works in the real life cases I've tried,
but will stop working if someone chooses to put a newline in a name.
> Please remember that Emacs decides where a file
> name starts and ends in the Dired buffer by using a set of convoluted
> regexps designed to parse the "ls -la" output for file's name, date,
> time, attributes, etc. A stray 8-bit byte can cause spurious wrong
> matches of those regexps, and the net effect is what you reported.
"ls -la"'s output is made for users' eyes and trying to use it
as a back-end sacrifices total robustness.
I once had the same problem with smbclient. I wanted to use it in a script,
but didn't want to sacrifice any robustness. So I added a --batch-ouput
option, which was really effortless because all the data was at hand in the
C code, it was just a matter of changing the `printf's. I made the output
so that it was (a) unambiguous and (b) easy to parse. And the script
has been performing nicely without any glitches ever since.
Maybe that's not an option for Emacs because it wants to use whatever
/bin/ls is available on the system.
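The ambiguity described above can be sketched in a few lines of bash. This is only an illustration, not anything Dired does: it assumes a `find` that supports `-print0`, `-mindepth` and `-maxdepth` (GNU and BSD versions do), and the variable names are made up. It compares a line-oriented listing with a NUL-delimited one for a directory that holds a name with an embedded newline.

```shell
#!/usr/bin/env bash
# Sketch only: line-oriented listing versus NUL-delimited listing.
demo=$(mktemp -d)
: > "$demo/a b"              # a name containing a space
: > "$demo"/$'a\nb'          # a name containing a newline

# Line-oriented view: every newline ends a "record", so a name that
# contains a newline is split across two lines.
line_count=$(( $(ls -1 "$demo" | wc -l) ))

# NUL-delimited view: a NUL byte cannot occur in a file name, so each
# record is exactly one name.
names=()
while IFS= read -r -d '' path; do
  names+=("${path##*/}")
done < <(find "$demo" -mindepth 1 -maxdepth 1 -print0)

echo "lines printed by ls -1:   $line_count"
echo "entries via find -print0: ${#names[@]}"
rm -rf "$demo"
```

A parser that consumes the second form never has to guess where a name starts or ends, which is the property the smbclient batch output was after.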
* Re: emacs misbehaves without --unibyte
2002-05-29 8:56 ` Paul Stoeber
@ 2002-05-29 8:58 ` Eli Zaretskii
2002-05-29 9:00 ` Eli Zaretskii
1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-05-29 8:58 UTC (permalink / raw)
Cc: bug-gnu-emacs
On Wed, 29 May 2002, Paul Stoeber wrote:
> I once had the same problem with smbclient. I wanted to use it in a script,
> but didn't want to sacrifice any robustness. So I added a --batch-ouput
> option, which was really effortless because all the data was at hand in the
> C code, it was just a matter of changing the `printf's. I made the output
> so that it was (a) unambiguous and (b) easy to parse. And the script
> has been performing nicely without any glitches ever since.
>
> Maybe that's not an option for Emacs because it wants to use whatever
> /bin/ls is available on the system.
Exactly. GNU `ls' already has such an option, but there are many `ls'
varieties out there that don't support it.
* Re: emacs misbehaves without --unibyte
2002-05-29 8:56 ` Paul Stoeber
2002-05-29 8:58 ` Eli Zaretskii
@ 2002-05-29 9:00 ` Eli Zaretskii
2002-05-29 9:10 ` Miles Bader
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-05-29 9:00 UTC (permalink / raw)
Cc: Paul Stoeber
On Wed, 29 May 2002, Paul Stoeber wrote:
> On Wed, May 29, 2002 at 09:23:11AM +0300, Eli Zaretskii wrote:
> > > I started this thread because default emacs wouldn't let me navigate
> > > filesystems that contain funny filenames, so the "8-bit cleanness"
> > > discussion only applies to file name handling (although I had also
> > > mentioned "text/binary files" in a general statement).
> >
> > For that, Miles gave the solution: you should set up your language
> > environment correctly, or set file-name-coding-system explicitly.
>
> Yes. If you simply want to use dired as a robust filesystem browser
> (like bash, only more comfortable), regardless of your language
> or the language of who created the files, then
>
> (setq file-name-coding-system 'no-conversion)
>
> seems to be a solution.
Should we perhaps make no-conversion be the default value of
file-name-coding-system, instead of nil?
* Re: emacs misbehaves without --unibyte
2002-05-29 9:00 ` Eli Zaretskii
@ 2002-05-29 9:10 ` Miles Bader
2002-05-29 10:06 ` Eli Zaretskii
2002-05-29 13:13 ` Paul Stoeber
2002-05-30 17:04 ` Richard Stallman
2 siblings, 1 reply; 19+ messages in thread
From: Miles Bader @ 2002-05-29 9:10 UTC (permalink / raw)
Cc: emacs-devel, handa, Paul Stoeber
Eli Zaretskii <eliz@is.elta.co.il> writes:
> Should we perhaps make no-conversion be the default value of
> file-name-coding-system, instead of nil?
I presume you mean, if there's no language environment. Then people
will always see non-ascii characters in filenames as octal escapes,
which I guess is the best that can be done...
-Miles
--
`Life is a boundless sea of bitterness'
* Re: emacs misbehaves without --unibyte
2002-05-29 9:10 ` Miles Bader
@ 2002-05-29 10:06 ` Eli Zaretskii
2002-05-29 13:11 ` Robert J. Chassell
0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2002-05-29 10:06 UTC (permalink / raw)
Cc: emacs-devel, handa, Paul Stoeber
On 29 May 2002, Miles Bader wrote:
> Eli Zaretskii <eliz@is.elta.co.il> writes:
> > Should we perhaps make no-conversion be the default value of
> > file-name-coding-system, instead of nil?
>
> I presume you mean, if there's no language environment.
Yes.
For some reason, many systems have file-name-coding-system set to nil by
default (I never had time to find out why), which is bad mantra.
> Then people
> will always see non-ascii characters in filenames as octal escapes,
> which I guess is the best that can be done...
If they use standard-display-8bit, they might even see non-ASCII characters
instead.
* Re: emacs misbehaves without --unibyte
2002-05-29 10:06 ` Eli Zaretskii
@ 2002-05-29 13:11 ` Robert J. Chassell
2002-05-29 17:02 ` Eli Zaretskii
0 siblings, 1 reply; 19+ messages in thread
From: Robert J. Chassell @ 2002-05-29 13:11 UTC (permalink / raw)
Eli Zaretskii <eliz@is.elta.co.il> writes:
    For some reason, many systems have file-name-coding-system set to nil by
    default (I never had time to find out why), which is bad mantra.
This is because nil is the default setting for
`file-name-coding-system' when running
    emacs -q --no-site-file --eval '(blink-cursor-mode 0)'
from today's CVS snapshot, 2002 May 29 12:39 UTC,
    GNU Emacs 21.3.50.7 (i686-pc-linux-gnu, X toolkit)
It turns out that the documentation for `file-name-coding-system' says
    If it is nil, `default-file-name-coding-system' (which see) is used.
And the default value for `default-file-name-coding-system' is also
nil.
The documentation for `default-file-name-coding-system' refers back
to the documentation for `file-name-coding-system' in a circular loop.
What would be a good value to provide `default-file-name-coding-system'?
It seems odd to:
(set-language-environment 'English)
[Hmmm... in my test instance of Emacs, the preceding expression does
not change the reported value of `default-file-name-coding-system'; is
this a bug in Emacs or am I doing something wrong?]
--
Robert J. Chassell bob@rattlesnake.com
Rattlesnake Enterprises http://www.rattlesnake.com
* Re: emacs misbehaves without --unibyte
2002-05-29 13:11 ` Robert J. Chassell
@ 2002-05-29 17:02 ` Eli Zaretskii
2002-05-31 7:04 ` Richard Stallman
0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2002-05-29 17:02 UTC (permalink / raw)
Cc: emacs-devel
> From: "Robert J. Chassell" <bob@rattlesnake.com>
> Date: Wed, 29 May 2002 13:11:23 +0000 (UTC)
>
> It turns out that the documentation for `file-name-coding-system' says
>
> If it is nil, `default-file-name-coding-system' (which see) is used.
>
> And the default value for `default-file-name-coding-system' is also
> nil.
Indeed.
> What would be a good value to provide `default-file-name-coding-system'?
I thought no-conversion would be good, but it sounds like it isn't.
How about raw-text, will that do better?
* Re: emacs misbehaves without --unibyte
2002-05-29 9:00 ` Eli Zaretskii
2002-05-29 9:10 ` Miles Bader
@ 2002-05-29 13:13 ` Paul Stoeber
2002-05-30 17:04 ` Richard Stallman
2 siblings, 0 replies; 19+ messages in thread
From: Paul Stoeber @ 2002-05-29 13:13 UTC (permalink / raw)
On Wed, 29 May 2002, Paul Stoeber wrote:
> If you simply want to use dired as a robust filesystem browser
> (like bash, only more comfortable), regardless of your language
> or the language of who created the files, then
>
> (setq file-name-coding-system 'no-conversion)
>
> seems to be a solution.
Alas, this is no longer true:
q@xyz:~$ mkdir -p $'\340/\350'
q@xyz:~$ echo XXX > $'\340/\350/x'
q@xyz:~$ /e/bin/emacs --eval "(progn (setq file-name-coding-system 'no-conversion) (dired \"~/\"))"
Go to \340. RET. Works. Go to \350. RET.
"File no longer exists; type `g' to update Dired buffer".
Again, --unibyte fixes this, even when not setting file-name-coding-system.
Emacs docs:
dired-listing-switches's value is "-al"
Documentation:
*Switches passed to `ls' for dired. MUST contain the `l' option.
May contain all other options that don't contradict `-l';
may contain even `F', `b', `i' and `s'. See also the variable
`dired-ls-F-marks-symlinks' concerning the `F' switch.
This looks very promising, especially the `b' option.
After
(setq dired-listing-switches "-alb")
, not using --unibyte, the above bug (changing into \340 but not \350) still
happens, but dired will correctly open the files $'a\340b' and even $'a\nb'
(a filename with a newline in it; C-x C-s will even write to the correct
file).
So, using the `b' option of ls seems to be the right
way to go for robustness, but the `b' support is still buggy:
dired won't open files with spaces in their name (for example
q@xyz:~$ echo XXX > 'a b'
). It says "File no longer exists; type `g' to update Dired buffer",
and of course *that* is not fixed by --unibyte.
When using the `b' option, dired-do-shell-command passes some filenames
incorrectly to the program: 'a b' and $'a\nb' are both passed as
'ab', but $'a\tb' and $'a\340b' are passed correctly (without --unibyte).
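For reference, a consumer of `ls -b' output has to undo the escaping before using a name. The bash sketch below shows one way to do that. It assumes the escape style of GNU ls (backslash-space for a space, C-style letters such as \n and \t, and three-digit octal for other bytes). The helper name unescape_ls_b is hypothetical, and this is not Dired's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one file name as printed by `ls -b`
# (GNU escape style assumed) and write the raw bytes to stdout.
unescape_ls_b() {
  local s=$1 c
  while [ -n "$s" ]; do
    c=${s:0:1}; s=${s:1}
    if [ "$c" != "\\" ]; then
      printf '%s' "$c"                 # ordinary character: copy it through
      continue
    fi
    c=${s:0:1}                         # character following the backslash
    case $c in
      [0-7])                           # \NNN: three octal digits -> one byte
        printf "\\${s:0:3}"; s=${s:3} ;;
      [abfnrtv])                       # C-style letter escapes such as \n, \t
        printf "\\$c"; s=${s:1} ;;
      *)                               # "\ " -> space, "\\" -> backslash
        printf '%s' "$c"; s=${s:1} ;;
    esac
  done
}

unescape_ls_b 'a\ b'; echo
unescape_ls_b 'a\nb'; echo
```

The point of the sketch is that the backslash-space and \n cases each need their own handling. A decoder that simply drops the backslash and the character after it would turn both 'a\ b' and 'a\nb' into 'ab'.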
* Re: emacs misbehaves without --unibyte
2002-05-29 9:00 ` Eli Zaretskii
2002-05-29 9:10 ` Miles Bader
2002-05-29 13:13 ` Paul Stoeber
@ 2002-05-30 17:04 ` Richard Stallman
2 siblings, 0 replies; 19+ messages in thread
From: Richard Stallman @ 2002-05-30 17:04 UTC (permalink / raw)
Cc: emacs-devel, handa, paul.stoeber
    Should we perhaps make no-conversion be the default value of
    file-name-coding-system, instead of nil?
That is an interesting idea. Does anyone see any problems with it?
* Re: emacs misbehaves without --unibyte
2002-05-28 21:40 ` Eli Zaretskii
2002-05-29 0:18 ` Paul Stoeber
@ 2002-05-30 17:00 ` Richard Stallman
2002-05-30 18:46 ` Paul Stoeber
1 sibling, 1 reply; 19+ messages in thread
From: Richard Stallman @ 2002-05-30 17:00 UTC (permalink / raw)
Cc: eliz
    There are specialized modes, such as hexl, for editing binary files.
Does M-x find-file-literally do what you want?
* Re: emacs misbehaves without --unibyte
2002-05-30 17:00 ` Richard Stallman
@ 2002-05-30 18:46 ` Paul Stoeber
2002-05-31 21:28 ` Richard Stallman
0 siblings, 1 reply; 19+ messages in thread
From: Paul Stoeber @ 2002-05-30 18:46 UTC (permalink / raw)
Cc: bug-gnu-emacs
On Thu, May 30, 2002 at 11:00:54AM -0600, Richard Stallman wrote:
> There are specialized modes, such as hexl, for editing binary files.
>
> Does M-x find-file-literally do what you want?
Yes, it handles file contents in exactly the way I like best, thanks.
In this thread, I'm concerned with how Emacs (specifically Dired)
handles filenames.
I'm trying to use Dired plus customizations as a replacement for
Midnight Commander (mc).
I've seen that Dired is not as robust as mc when confronted with arbitrary
filenames in {\001, \002, ..., \377}+, so I'm bugging bug.gnu.emacs with
reports about how dired-find-file and dired-do-shell-command fail when exposed
to all sorts of funny filenames. My latest findings are expressed in the
posting that starts with "Alas, " after the quoteblock.
It might well be that I'll continue to post reports of this sort until all
filenames the C library can handle are handled correctly by Dired too, unless
the Emacs developers vote that this is not a worthwhile goal to pursue.
* emacs misbehaves without --unibyte
@ 2002-05-28 16:12 Paul Stoeber
2002-05-28 16:49 ` Miles Bader
0 siblings, 1 reply; 19+ messages in thread
From: Paul Stoeber @ 2002-05-28 16:12 UTC (permalink / raw)
In GNU Emacs 21.2.1 (powerpc-unknown-linux-gnu)
of 2002-05-26 on xyz
configured using `configure --prefix=/e --without-x'
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: C
locale-coding-system: nil
default-enable-multibyte-characters: t
Built without modification from
MD5:f4b58e5c2d923fc92495e0c2f167c5db URL:ftp://ftp.cs.tu-berlin.de/pub/gnu/emacs/emacs-21.2.tar.gz
In bash in my home directory:
q@xyz:~$ echo gr340ve > $'gr\340ve'
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'
Now in dired, I go to the file `gr?ve' and type RET. The message line
says "File no longer exists; type `g' to update Dired buffer". (I think
it should have opened the file.) C-x C-c.
q@xyz:~$ rm gr?ve
q@xyz:~$ mkdir -p $'gr\340ve'/x/y/z
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'
Now in dired, I go to the directory `gr?ve' and type RET. The message line
says "File no longer exists; type `g' to update Dired buffer". (I think
it should have changed into the directory. Now I can't browse the directory
tree below `gr?ve'. This is a grave limitation.) C-x C-c.
q@xyz:~$ rm -rf gr?ve
q@xyz:~$ echo gr340ve > $'gr\340ve'
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'
Now in dired, I go to the file `gr?ve' and type the `a' key. There's an
_empty_ buffer with the name `gr?ve' in its status line. (The file
$'gr\340ve' contains the string "gr340ve", but due to the first
experiment I'm not surprised.) X X X RET C-x C-s C-x C-c.
q@xyz:~$ ls gr* | cat -vet
grM-^AM-`ve$
grM-`ve$
q@xyz:~$ cat $'gr\201\340ve'
XXX
q@xyz:~$
Is that supposed to happen?
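A note on reading the `cat -vet' output above: `M-' marks a byte with the high bit set, so `M-`' is byte \340 and `M-^A' is byte \201. The second name is therefore gr\201\340ve, the name passed to `cat' in the next command. The bash sketch below prints the bytes of a name in octal so the two names can be compared directly. The helper name name_octal is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each byte of a name as an octal number.
name_octal() {
  local o
  o=$(printf '%s' "$1" | od -An -t o1 | tr -s ' \n' ' ')
  o=${o# }; o=${o% }                 # trim the padding od leaves behind
  printf '%s\n' "$o"
}

name_octal $'gr\340ve'        # the file created in the report
name_octal $'gr\201\340ve'    # the name that C-x C-s wrote to
```

The two dumps differ only by the extra byte 201 before 340, which is the byte Emacs inserted into the name.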
Everything is fine if I run emacs as `/e/bin/emacs --unibyte'. If multibyte
support is so intrusive, shouldn't --unibyte be the default? I think all
sites that have some 8-bit filenames will need it. If you don't decide to make
it the default, maybe it should be mentioned in `(efaq) Bugs and problems'.
Last but not least:
q@xyz:~$ rm gr?ve gr??ve
q@xyz:~$ echo gr200340ve > $'gr\200\340ve' # that's two-oh-oh, not 201
q@xyz:~$ echo gr340ve > $'gr\340ve'
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'
The last two lines of an otherwise cleared screen:
Fatal error (11).Segmentation fault
q@xyz:~$
Using --unibyte fixes this too.
* Re: emacs misbehaves without --unibyte
2002-05-28 16:12 Paul Stoeber
@ 2002-05-28 16:49 ` Miles Bader
0 siblings, 0 replies; 19+ messages in thread
From: Miles Bader @ 2002-05-28 16:49 UTC (permalink / raw)
It should work fine if you've set the language environment correctly
(so it knows what the non-ASCII characters are), e.g. by using
`set-language-environment', or having an appropriate setting of LANG.
If you don't want to do that, you can just tell it how to interpret file
names, like:
(setq file-name-coding-system 'latin-1)
-Miles
--
80% of success is just showing up. --Woody Allen
end of thread [~2002-05-31 21:28 UTC]
Thread overview: 19+ messages
2002-05-28 20:08 emacs misbehaves without --unibyte Paul Stoeber
2002-05-28 21:40 ` Eli Zaretskii
2002-05-29 0:18 ` Paul Stoeber
2002-05-29 6:23 ` Eli Zaretskii
2002-05-29 8:56 ` Paul Stoeber
2002-05-29 8:58 ` Eli Zaretskii
2002-05-29 9:00 ` Eli Zaretskii
2002-05-29 9:10 ` Miles Bader
2002-05-29 10:06 ` Eli Zaretskii
2002-05-29 13:11 ` Robert J. Chassell
2002-05-29 17:02 ` Eli Zaretskii
2002-05-31 7:04 ` Richard Stallman
2002-05-29 13:13 ` Paul Stoeber
2002-05-30 17:04 ` Richard Stallman
2002-05-30 17:00 ` Richard Stallman
2002-05-30 18:46 ` Paul Stoeber
2002-05-31 21:28 ` Richard Stallman
-- strict thread matches above, loose matches on Subject: below --
2002-05-28 16:12 Paul Stoeber
2002-05-28 16:49 ` Miles Bader