unofficial mirror of bug-gnu-emacs@gnu.org 
* emacs misbehaves without --unibyte
@ 2002-05-28 16:12 Paul Stoeber
  2002-05-28 16:49 ` Miles Bader
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Stoeber @ 2002-05-28 16:12 UTC (permalink / raw)


In GNU Emacs 21.2.1 (powerpc-unknown-linux-gnu)
 of 2002-05-26 on xyz
configured using `configure  --prefix=/e --without-x'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: C
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Built without modification from
MD5:f4b58e5c2d923fc92495e0c2f167c5db URL:ftp://ftp.cs.tu-berlin.de/pub/gnu/emacs/emacs-21.2.tar.gz


In bash in my home directory:

q@xyz:~$ echo gr340ve > $'gr\340ve'
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'

Now in dired, I go to the file `gr?ve' and type RET.  The message line
says "File no longer exists; type `g' to update Dired buffer".  (I think
it should have opened the file.)   C-x C-c.

q@xyz:~$ rm gr?ve
q@xyz:~$ mkdir -p $'gr\340ve'/x/y/z
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'

Now in dired, I go to the directory `gr?ve' and type RET.  The message line
says "File no longer exists; type `g' to update Dired buffer".  (I think
it should have changed into the directory.  Now I can't browse the directory
tree below `gr?ve'.  This is a grave limitation.)   C-x C-c.

q@xyz:~$ rm -rf gr?ve
q@xyz:~$ echo gr340ve > $'gr\340ve'
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'

Now in dired, I go to the file `gr?ve' and type the `a' key.  There's an
_empty_ buffer with the name `gr?ve' in its status line.  (The file
$'gr\340ve' contains the string "gr340ve", but due to the first
experiment I'm not surprised.)  X X X RET C-x C-s C-x C-c.

q@xyz:~$ ls gr* | cat -vet
grM-^AM-`ve$
grM-`ve$
q@xyz:~$ cat $'gr\201\340ve'
XXX
q@xyz:~$ 

Is that supposed to happen?

Everything is fine if I run emacs as `/e/bin/emacs --unibyte'.  If multibyte
support is so intrusive, shouldn't --unibyte be the default?  I think all
sites that have some 8-bit filenames will need it.  If you don't decide to make
it the default, maybe it should be mentioned in `(efaq) Bugs and problems'.

Last but not least:

q@xyz:~$ rm gr?ve gr??ve
q@xyz:~$ echo gr200340ve > $'gr\200\340ve'  # that's two-oh-oh, not 201
q@xyz:~$ echo gr340ve > $'gr\340ve'
q@xyz:~$ /e/bin/emacs --eval '(dired "~/")'

The last two lines of an otherwise cleared screen:

Fatal error (11).Segmentation fault
q@xyz:~$

Using --unibyte fixes this too.


* Re: emacs misbehaves without --unibyte
  2002-05-28 16:12 Paul Stoeber
@ 2002-05-28 16:49 ` Miles Bader
  0 siblings, 0 replies; 12+ messages in thread
From: Miles Bader @ 2002-05-28 16:49 UTC (permalink / raw)


It should work fine if you've set the language environment correctly
(so it knows what the non-ASCII characters are), e.g. by using
`set-language-environment', or having an appropriate setting of LANG.

If you don't want to do that, you can just tell it how to interpret file
names, like:

   (setq file-name-coding-system 'latin-1)

-Miles
-- 
80% of success is just showing up.  --Woody Allen


* emacs misbehaves without --unibyte
@ 2002-05-28 20:08 Paul Stoeber
  2002-05-28 21:40 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Stoeber @ 2002-05-28 20:08 UTC (permalink / raw)


Miles Bader miles@gnu.org on Tue, 28 May 2002 16:49:04 GMT:
> It should work fine if you've set the language environment correctly
> (so it knows what the non-ASCII characters are), e.g. by using
> `set-language-environment', or having an appropriate setting of LANG.

That means, my LANG=C is inappropriate for emacs.

I don't care about l10n/i18n (that's what LANG=C should convey to programs),
and I want to steer through alien filesystems and text/binary files gracefully
(e.g. using octal escapes).

In its default usage, emacs doesn't meet this requirement---it doesn't have
the robust 8-bit cleanness of bash, nvi, perl and most other unix tools.

But I guess that's okay, it's just different from the tools I'm used to, and
it was very irritating until I found one possible fix (--unibyte).

Neither (setq set-language-environment "Latin-1") nor
(setq file-name-coding-system 'latin-1) creates an 8-bit clean
environment.  Trying to open the file $'gr\200\340ve' in dired opens
$'gr\236\240\201\340ve' instead.
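
The extra bytes in these names look like Emacs's internal multibyte
representation ending up in the file name, rather than random corruption.
The sketch below is an illustration only, not Emacs code.  It assumes that
Emacs 21 stores a Latin-1 character as the lead byte \201 followed by the
original byte, and a raw byte in the range \200-\237 as the lead byte \236
followed by the byte plus 040.  The helper name `mule_bytes' is made up for
this example.

```shell
# mule_bytes OCTAL: print, as octal escapes, the byte sequence that an
# Emacs 21 multibyte buffer is ASSUMED to hold for one raw byte of a file
# name.  Hypothetical helper; the two lead bytes are assumptions.
mule_bytes() {
    b=$((0$1))                       # read the argument as an octal number
    if [ "$b" -ge 128 ] && [ "$b" -le 159 ]; then
        # \200..\237: assumed lead byte \236, then the byte plus 040
        printf '\\236\\%03o' $((b + 32))
    elif [ "$b" -ge 160 ]; then
        # \240..\377: assumed Latin-1 lead byte \201, then the byte itself
        printf '\\201\\%03o' "$b"
    else
        # ASCII bytes are stored unchanged
        printf '\\%03o' "$b"
    fi
}
```

Under those assumptions, `mule_bytes 340' prints \201\340, which matches the
gr\201\340ve file seen earlier in the thread, and `mule_bytes 200' followed
by `mule_bytes 340' prints \236\240\201\340, which matches the name reported
here.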


* Re: emacs misbehaves without --unibyte
  2002-05-28 20:08 emacs misbehaves without --unibyte Paul Stoeber
@ 2002-05-28 21:40 ` Eli Zaretskii
  2002-05-29  0:18   ` Paul Stoeber
  2002-05-30 17:00   ` Richard Stallman
  0 siblings, 2 replies; 12+ messages in thread
From: Eli Zaretskii @ 2002-05-28 21:40 UTC (permalink / raw)
  Cc: bug-gnu-emacs

> From: Paul Stoeber <paul.stoeber@stud.tu-ilmenau.de>
> Date: Tue, 28 May 2002 22:08:14 +0200
> 
> Neither (setq set-language-environment "Latin-1") nor
> (setq file-name-coding-system 'latin-1) creates an 8-bit clean
> environment.

Emacs is a text editor, not a binary file editor.  So 8-bit cleanness
is not the most important goal for it.

There are specialized modes, such as hexl, for editing binary files.


* Re: emacs misbehaves without --unibyte
  2002-05-28 21:40 ` Eli Zaretskii
@ 2002-05-29  0:18   ` Paul Stoeber
  2002-05-29  6:23     ` Eli Zaretskii
  2002-05-30 17:00   ` Richard Stallman
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Stoeber @ 2002-05-29  0:18 UTC (permalink / raw)


On Wed, May 29, 2002 at 12:40:42AM +0300, Eli Zaretskii wrote:
> > From: Paul Stoeber <paul.stoeber@stud.tu-ilmenau.de>
> > Date: Tue, 28 May 2002 22:08:14 +0200
> > 
> > Neither (setq set-language-environment "Latin-1") nor
> > (setq file-name-coding-system 'latin-1) creates an 8-bit clean
> > environment.
> 
> Emacs is a text editor, not a binary file editor.  So 8-bit cleanness
> is not the most important goal for it.
> 
> There are specialized modes, such as hexl, for editing binary files.

(How is Emacs not a binary file editor when it has hexl mode?)

I started this thread because default emacs wouldn't let me navigate
filesystems that contain funny filenames, so the "8-bit cleanness"
discussion only applies to file name handling (although I had also
mentioned "text/binary files" in a general statement).  Is it reasonable
for Emacs to refuse to open existing files and to invent new file names
in place of existing ones?


* Re: emacs misbehaves without --unibyte
  2002-05-29  0:18   ` Paul Stoeber
@ 2002-05-29  6:23     ` Eli Zaretskii
  2002-05-29  8:56       ` Paul Stoeber
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2002-05-29  6:23 UTC (permalink / raw)
  Cc: bug-gnu-emacs


On Wed, 29 May 2002, Paul Stoeber wrote:

> (How is Emacs not a binary file editor when it has hexl mode?)

It's not a binary file editor if you are in a mode other than hexl.

> I started this thread because default emacs wouldn't let me navigate
> filesystems that contain funny filenames, so the "8-bit cleanness"
> discussion only applies to file name handling (although I had also
> mentioned "text/binary files" in a general statement).

For that, Miles gave the solution: you should set up your language 
environment correctly, or set file-name-coding-system explicitly.

I replied in addition to what Miles said, thinking that you really meant 
8-bit cleanliness throughout.

> Is it reasonable
> for Emacs to refuse to open existing files and to invent new file names
> in place of existing ones?

No.  But the ``reasonable'' thing is hard to implement without hints from 
the user's environment.  Please remember that Emacs decides where a file 
name starts and ends in the Dired buffer by using a set of convoluted 
regexps designed to parse the "ls -la" output for file's name, date, 
time, attributes, etc.  A stray 8-bit byte can cause spurious wrong 
matches of those regexps, and the net effect is what you reported.


* Re: emacs misbehaves without --unibyte
  2002-05-29  6:23     ` Eli Zaretskii
@ 2002-05-29  8:56       ` Paul Stoeber
  2002-05-29  8:58         ` Eli Zaretskii
       [not found]         ` <Pine.SUN.3.91.1020529115904.29375B@is>
  0 siblings, 2 replies; 12+ messages in thread
From: Paul Stoeber @ 2002-05-29  8:56 UTC (permalink / raw)


On Wed, May 29, 2002 at 09:23:11AM +0300, Eli Zaretskii wrote:
> > I started this thread because default emacs wouldn't let me navigate
> > filesystems that contain funny filenames, so the "8-bit cleanness"
> > discussion only applies to file name handling (although I had also
> > mentioned "text/binary files" in a general statement).
> 
> For that, Miles gave the solution: you should set up your language 
> environment correctly, or set file-name-coding-system explicitly.

Yes.  If you simply want to use dired as a robust filesystem browser
(like bash, only more comfortable), regardless of your language
or the language of who created the files, then

	(setq file-name-coding-system 'no-conversion)

seems to be a solution.  It works in the real life cases I've tried,
but will stop working if someone chooses to put a newline in a name.

> Please remember that Emacs decides where a file 
> name starts and ends in the Dired buffer by using a set of convoluted 
> regexps designed to parse the "ls -la" output for file's name, date, 
> time, attributes, etc.  A stray 8-bit byte can cause spurious wrong 
> matches of those regexps, and the net effect is what you reported.

"ls -la"'s output is made for users' eyes and trying to use it
as a back-end sacrifices total robustness.

I once had the same problem with smbclient.  I wanted to use it in a script,
but didn't want to sacrifice any robustness.  So I added a --batch-output
option, which was really effortless because all the data was at hand in the
C code, it was just a matter of changing the `printf's.  I made the output
so that it was (a) unambiguous and (b) easy to parse.  And the script
has been performing nicely without any glitches ever since.

Maybe that's not an option for Emacs because it wants to use whatever
/bin/ls is available on the system.
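
One way to get a listing that is unambiguous and easy to parse without
depending on any particular `ls' is to let the shell enumerate the directory
and escape every byte of each name.  A minimal sketch, assuming a POSIX shell
with `od' and `tr'; `list_escaped' is a hypothetical helper, not something
Emacs or Dired provides:

```shell
# list_escaped DIR: print one line per entry of DIR, with every byte of the
# entry's name written as a three-digit octal escape (for example \141 for
# "a").  Dot files are skipped to keep the sketch short.
list_escaped() {
    for f in "$1"/*; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # od dumps the name (without its directory part) as octal bytes;
        # tr joins od's output lines and turns the blanks into backslashes.
        printf '%s' "${f##*/}" | od -An -v -t o1 | tr -d '\n' | tr -s ' ' '\\'
        printf '\n'
    done
}
```

For a directory holding 'a b' and $'a\nb', the output contains the lines
\141\040\142 and \141\012\142, so a space or a newline in a name can no
longer be mistaken for a field or record separator.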


* Re: emacs misbehaves without --unibyte
  2002-05-29  8:56       ` Paul Stoeber
@ 2002-05-29  8:58         ` Eli Zaretskii
       [not found]         ` <Pine.SUN.3.91.1020529115904.29375B@is>
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2002-05-29  8:58 UTC (permalink / raw)
  Cc: bug-gnu-emacs


On Wed, 29 May 2002, Paul Stoeber wrote:

> I once had the same problem with smbclient.  I wanted to use it in a script,
> but didn't want to sacrifice any robustness.  So I added a --batch-output
> option, which was really effortless because all the data was at hand in the
> C code, it was just a matter of changing the `printf's.  I made the output
> so that it was (a) unambiguous and (b) easy to parse.  And the script
> has been performing nicely without any glitches ever since.
> 
> Maybe that's not an option for Emacs because it wants to use whatever
> /bin/ls is available on the system.

Exactly.  GNU `ls' already has such an option, but there are many `ls' 
varieties out there that don't support it.


* Re: emacs misbehaves without --unibyte
       [not found]         ` <Pine.SUN.3.91.1020529115904.29375B@is>
@ 2002-05-29 13:13           ` Paul Stoeber
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Stoeber @ 2002-05-29 13:13 UTC (permalink / raw)


On Wed, 29 May 2002, Paul Stoeber wrote:
> If you simply want to use dired as a robust filesystem browser
> (like bash, only more comfortable), regardless of your language
> or the language of who created the files, then
> 
> 	(setq file-name-coding-system 'no-conversion)
> 
> seems to be a solution.

Alas, this is no longer true:

q@xyz:~$ mkdir -p $'\340/\350'
q@xyz:~$ echo XXX > $'\340/\350/x'
q@xyz:~$ /e/bin/emacs --eval "(progn (setq file-name-coding-system 'no-conversion) (dired \"~/\"))"

Go to \340.  RET.  Works.  Go to \350.  RET.
"File no longer exists; type `g' to update Dired buffer".

Again, --unibyte fixes this, even when not setting file-name-coding-system.

Emacs docs:
	dired-listing-switches's value is "-al"

	Documentation:
	*Switches passed to `ls' for dired.  MUST contain the `l' option.
	May contain all other options that don't contradict `-l';
	may contain even `F', `b', `i' and `s'.  See also the variable
	`dired-ls-F-marks-symlinks' concerning the `F' switch.

This looks very promising, especially the `b' option.

After

	(setq dired-listing-switches "-alb")

and leaving --unibyte off, the above bug (changing into \340 but not \350)
still happens, but dired correctly opens the files $'a\340b' and even $'a\nb'
(a filename with a newline in it), and C-x C-s even writes to the correct
file.

So, using the `b' option of ls seems to be the right
way to go for robustness, but the `b' support is still buggy:
dired won't open files with spaces in their name (for example

q@xyz:~$ echo XXX > 'a b'

). It says "File no longer exists; type `g' to update Dired buffer",
and of course *that* is not fixed by --unibyte.

When using the `b' option, dired-do-shell-command passes some filenames
incorrectly to the program: 'a b' and $'a\nb' are both passed as
'ab', but $'a\tb' and $'a\340b' are passed correctly (without --unibyte).
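
Passing arbitrary names to a shell command needs quoting that survives
spaces, newlines and 8-bit bytes alike.  One common approach is to wrap the
name in single quotes and rewrite each embedded single quote as '\''.  A
minimal sketch of that approach; `shquote' is a hypothetical helper and says
nothing about how dired-do-shell-command is actually implemented:

```shell
# shquote NAME: print NAME wrapped in single quotes, with each embedded
# single quote rewritten as '\'' so that sh reads the result as one word.
# Caveat: command substitution drops trailing newlines, so a name that ends
# in a newline is not handled by this sketch.
shquote() {
    printf "'%s'" "$(printf '%s' "$1" | LC_ALL=C sed "s/'/'\\\\''/g")"
}
```

With this helper, `eval "set -- $(shquote "$name")"' yields exactly one
argument equal to the original name, for 'a b' as well as for $'a\nb'.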


* Re: emacs misbehaves without --unibyte
  2002-05-28 21:40 ` Eli Zaretskii
  2002-05-29  0:18   ` Paul Stoeber
@ 2002-05-30 17:00   ` Richard Stallman
  2002-05-30 18:46     ` Paul Stoeber
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Stallman @ 2002-05-30 17:00 UTC (permalink / raw)
  Cc: eliz

    There are specialized modes, such as hexl, for editing binary files.

Does M-x find-file-literally do what you want?


* Re: emacs misbehaves without --unibyte
  2002-05-30 17:00   ` Richard Stallman
@ 2002-05-30 18:46     ` Paul Stoeber
  2002-05-31 21:28       ` Richard Stallman
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Stoeber @ 2002-05-30 18:46 UTC (permalink / raw)
  Cc: bug-gnu-emacs

On Thu, May 30, 2002 at 11:00:54AM -0600, Richard Stallman wrote:
>     There are specialized modes, such as hexl, for editing binary files.
> 
> Does M-x find-file-literally do what you want?

Yes, it handles file contents in exactly the way I like best, thanks.

In this thread, I'm concerned with how Emacs (specifically Dired)
handles filenames.

I'm trying to use Dired plus customizations as a replacement for
Midnight Commander (mc).

I've seen that Dired is not as robust as mc when confronted with arbitrary
filenames in {\001, \002, ..., \377}+, so I'm bugging bug.gnu.emacs with
reports about how dired-find-file and dired-do-shell-command fail when exposed
to all sorts of funny filenames.  My latest findings are expressed in the
posting that starts with "Alas, " after the quoteblock.

It might well be that I'll continue to post reports of this sort until all
filenames the C library can handle are handled correctly by Dired too, unless
the Emacs developers vote that this is not a worthwhile goal to pursue.


* Re: emacs misbehaves without --unibyte
  2002-05-30 18:46     ` Paul Stoeber
@ 2002-05-31 21:28       ` Richard Stallman
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Stallman @ 2002-05-31 21:28 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    Yes, it handles file contents in exactly the way I like best, thanks.

    In this thread, I'm concerned with how Emacs (specifically Dired)
    handles filenames.

Yes, I saw that later on.  I thought I deleted that message
and I am surprised it was sent out.  Sorry.
