unofficial mirror of emacs-devel@gnu.org 
* dired doesn't work properly with a multibyte locale
@ 2003-01-06  6:04 Miles Bader
  2003-01-11 20:00 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Miles Bader @ 2003-01-06  6:04 UTC (permalink / raw)
  Cc: emacs-devel

I'm now using a multibyte locale (LANG=ja_JP.eucJP), and dired is
screwed up: it can't properly find filenames in the directory listing.

The reason seems to be that dired uses `ls --dired', which encodes the
positions of filenames as byte-offsets into the ls output.  However, my
system's `ls' program sees the non-C LANG, and so the `total' line at the
beginning of the ls output is now a multibyte-encoded word.  Emacs decodes
this fine, but the number of characters in the decoded word is _not_ the
same as the number of bytes in the original ls output, so all the offsets
from --dired are wrong.  [note that if there are multibyte-encoded
filenames, the offsets will get screwed up further later in the listing]
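
A minimal sketch of the mismatch, in Python rather than Emacs Lisp. The listing text, the file name `notes.txt', and the use of 合計 as the localized `total' word are assumptions made up for illustration; only the byte-versus-character effect comes from the description above.

```python
# Hypothetical `ls -l' style output in an EUC-JP locale; the first line
# stands in for the localized `total' line.
listing = "合計 8\n  -rw-r--r-- 1 miles 6 Jan  6 06:04 notes.txt\n"
raw = listing.encode("euc_jp")      # what ls writes: bytes
decoded = raw.decode("euc_jp")      # what the buffer holds: characters

# Byte offsets of the file name, in the way `ls --dired' reports them.
name = b"notes.txt"
byte_start = raw.index(name)
byte_end = byte_start + len(name)

# Each of the two kanji takes two bytes but is one character, so the
# decoded text is shorter than the raw output.
shift = len(raw) - len(decoded)

# Using byte offsets as character positions selects the wrong text.
wrong = decoded[byte_start:byte_end]
# Slicing the raw bytes first and decoding afterwards selects the name.
right = raw[byte_start:byte_end].decode("euc_jp")

print(shift, repr(wrong), repr(right))
```

Every multibyte character before a file name pushes the character position further away from the reported byte offset, which is why the error grows later in the listing.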

It doesn't seem simple to get the byte offset information, so perhaps the
best thing to do is simply not use --dired if `file-name-coding-system' is
a multibyte encoding.  That change is simple to make in dired (and I just
manually set `dired-use-ls-dired' to nil), but I'm not sure how to tell if
a particular coding system is multibyte or not.  It'd be nice if there was
a function like `coding-system-multibyte-p'...

Thanks,

-Miles
-- 
We live, as we dream -- alone....

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-01-06  6:04 dired doesn't work properly with a multibyte locale Miles Bader
@ 2003-01-11 20:00 ` Stefan Monnier
  2003-01-11 20:16   ` Miles Bader
  2003-01-12 11:56 ` Richard Stallman
  2003-01-15 10:43 ` Kenichi Handa
  2 siblings, 1 reply; 38+ messages in thread
From: Stefan Monnier @ 2003-01-11 20:00 UTC (permalink / raw)
  Cc: emacs-devel

> It doesn't seem simple to get the byte offset information, so perhaps the
> best thing to do is simply not use --dired if `file-name-coding-system' is
> a multibyte encoding.  That change is simple to make in dired (and I just
> manually set `dired-use-ls-dired' to nil), but I'm not sure how to tell if
> a particular coding system is multibyte or not.  It'd be nice if there was
> a function like `coding-system-multibyte-p'...

The other solution is to get "ls --dired" output with a "binary"
coding system, then use the byte-offsets to add text-properties, and
then do the decode-coding-region.


	Stefan

* Re: dired doesn't work properly with a multibyte locale
  2003-01-11 20:00 ` Stefan Monnier
@ 2003-01-11 20:16   ` Miles Bader
  0 siblings, 0 replies; 38+ messages in thread
From: Miles Bader @ 2003-01-11 20:16 UTC (permalink / raw)
  Cc: emacs-devel

On Sat, Jan 11, 2003 at 03:00:12PM -0500, Stefan Monnier wrote:
> > It doesn't seem simple to get the byte offset information, so perhaps the
> > best thing to do is simply not use --dired if `file-name-coding-system' is
> > a multibyte encoding.  That change is simple to make in dired (and I just
> > manually set `dired-use-ls-dired' to nil), but I'm not sure how to tell if
> > a particular coding system is multibyte or not.  It'd be nice if there was
> > a function like `coding-system-multibyte-p'...
> 
> The other solution is to get "ls --dired" output with a "binary"
> coding system, then use the byte-offsets to add text-properties, and
> then do the decode-coding-region.

Won't the decode-coding-region smash all the text-properties?

-Miles
-- 
Come now, if we were really planning to harm you, would we be waiting here, 
 beside the path, in the very darkest part of the forest?

* Re: dired doesn't work properly with a multibyte locale
  2003-01-06  6:04 dired doesn't work properly with a multibyte locale Miles Bader
  2003-01-11 20:00 ` Stefan Monnier
@ 2003-01-12 11:56 ` Richard Stallman
  2003-01-15 10:43 ` Kenichi Handa
  2 siblings, 0 replies; 38+ messages in thread
From: Richard Stallman @ 2003-01-12 11:56 UTC (permalink / raw)
  Cc: emacs-devel

    It doesn't seem simple to get the byte offset information, so perhaps the
    best thing to do is simply not use --dired if `file-name-coding-system' is
    a multibyte encoding.

For the moment, this may be the best we can do.  But that means the other bugs
that --dired was meant to fix come back.  For the longer term, we need
a way to make --dired work properly with multibyte decoding in Emacs.

One idea is that decoding could optionally keep a record of every
place it changes the length of text.  Most of the time, this optional
feature would be turned off, but dired would turn it on.  The output
would be a list of elements of the form (POS . CHANGE) where POS is
the buffer position (or perhaps, position relative to the start of the
input), and CHANGE would be the change in size of the text at that
place.  With all this information, it would be straightforward
to correct all the information provided by --dired.
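
One way to picture the (POS . CHANGE) idea is the Python sketch below. It is only an illustration under assumptions: the function names `length_changes' and `byte_to_char' are hypothetical, positions are taken relative to the start of the input, and the coding system is assumed to be stateless and to round-trip each character to the same bytes, as EUC-JP does for ordinary text.

```python
def length_changes(raw, coding):
    """Return a list of (POS, CHANGE) pairs for RAW decoded with CODING.

    POS is the byte position where a character starts and CHANGE is the
    change in length caused by decoding it (negative when several bytes
    become one character).  Characters that keep length 1 are skipped.
    """
    changes = []
    byte_pos = 0
    for ch in raw.decode(coding):
        nbytes = len(ch.encode(coding))
        if nbytes != 1:
            changes.append((byte_pos, 1 - nbytes))
        byte_pos += nbytes
    return changes


def byte_to_char(offset, changes):
    """Convert a byte OFFSET (on a character boundary) to a character offset."""
    return offset + sum(change for pos, change in changes if pos < offset)


# Hypothetical listing with a localized `total' line and a Japanese name.
listing = "合計 8\n  -rw-r--r-- 1 miles 6 Jan 12 11:56 日本語.txt\n"
raw = listing.encode("euc_jp")
name = "日本語.txt"
byte_start = raw.index(name.encode("euc_jp"))

changes = length_changes(raw, "euc_jp")
char_start = byte_to_char(byte_start, changes)
print(listing[char_start:char_start + len(name)])
```

A real implementation would collect the pairs during decoding itself instead of re-encoding each character afterwards; the re-encoding here only keeps the sketch short.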

Handa, do you think this would be feasible?

* Re: dired doesn't work properly with a multibyte locale
  2003-01-06  6:04 dired doesn't work properly with a multibyte locale Miles Bader
  2003-01-11 20:00 ` Stefan Monnier
  2003-01-12 11:56 ` Richard Stallman
@ 2003-01-15 10:43 ` Kenichi Handa
  2003-01-15 23:30   ` Richard Stallman
  2003-01-23  4:31   ` Miles Bader
  2 siblings, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2003-01-15 10:43 UTC (permalink / raw)
  Cc: emacs-devel

Sorry for the late reply.

In article <buok7hibyqd.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:

> I'm now using a multibyte locale (LANG=ja_JP.eucJP), and dired is
> screwed up: it can't properly find filenames in the directory listing.

> The reason seems to be that dired uses `ls --dired', which encodes the
> positions of filenames as byte-offsets into the ls output.  However, my
> system's `ls' program sees the non-C LANG, and so the `total' line at the
> beginning of the ls output is now a multibyte-encoded word.  Emacs decodes
> this fine, but the number of characters in the decoded word is _not_ the
> same as the number of bytes in the original ls output, so all the offsets
> from --dired are wrong.  [note that if there are multibyte-encoded
> filenames, the offsets will get screwed up further later in the listing]

> It doesn't seem simple to get the byte offset information, so perhaps the
> best thing to do is simply not use --dired if `file-name-coding-system' is
> a multibyte encoding.  That change is simple to make in dired (and I just
> manually set `dired-use-ls-dired' to nil), but I'm not sure how to tell if
> a particular coding system is multibyte or not.  It'd be nice if there was
> a function like `coding-system-multibyte-p'...

Even if we have such a function, it's very hard to correct
the byte offset information for a multibyte coding system.

Miles Bader <miles@gnu.org> writes:
> On Sat, Jan 11, 2003 at 03:00:12PM -0500, Stefan Monnier wrote:
>>  > It doesn't seem simple to get the byte offset
>>  > information, so perhaps the best thing to do is simply
>>  > not use --dired if `file-name-coding-system' is a
>>  > multibyte encoding.  That change is simple to make in
>>  > dired (and I just manually set `dired-use-ls-dired' to
>>  > nil), but I'm not sure how to tell if a particular
>>  > coding system is multibyte or not.  It'd be nice if
>>  > there was a function like
>>  > `coding-system-multibyte-p'...
>>  
>>  The other solution is to get "ls --dired" output with a "binary"
>>  coding system, then use the byte-offsets to add text-properties, and
>>  then do the decode-coding-region.

Yes.  I think that is the correct fix.

> Won't the decode-coding-region smash all the text-properties?

It surely removes all text properties.  But, we can preserve
the text-property `dired-filename' by decoding one bunch by
one.  Could you please try the attached patch?  I have not
yet installed it because I don't have such a system at hand
and can't test it.
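
The chunk-by-chunk decoding can be sketched outside Emacs as follows. This is a Python illustration under assumptions, not the patch itself: `decode_marked' is a hypothetical helper, a boolean flag stands in for the `dired-filename' text property, the offsets are assumed to fall on character boundaries, and the coding system is assumed to be stateless (as EUC-JP is) so that each chunk can be decoded on its own.

```python
def decode_marked(raw, offsets, coding="euc_jp"):
    """Decode RAW piece by piece, keeping track of file-name chunks.

    OFFSETS is a flat list of byte positions (start1, end1, start2, ...)
    like the numbers on a //DIRED// line.  The result is a list of
    (text, is_filename) pairs in their original order.
    """
    pieces = []
    pos = 0
    for start, end in zip(offsets[0::2], offsets[1::2]):
        pieces.append((raw[pos:start].decode(coding), False))   # ordinary text
        pieces.append((raw[start:end].decode(coding), True))    # a file name
        pos = end
    pieces.append((raw[pos:].decode(coding), False))             # trailing text
    return pieces


# Hypothetical listing; the byte offsets are computed here rather than
# read from real `ls --dired' output.
listing = ("合計 8\n"
           "  -rw-r--r-- 1 miles 6 Jan 15 10:43 日本語.txt\n"
           "  -rw-r--r-- 1 miles 6 Jan 15 10:43 abc\n")
raw = listing.encode("euc_jp")


def span(name):
    encoded = name.encode("euc_jp")
    start = raw.index(encoded)
    return [start, start + len(encoded)]


offsets = span("日本語.txt") + span("abc")
pieces = decode_marked(raw, offsets)
names = [text for text, is_filename in pieces if is_filename]
print(names)
```

Because the marks are applied while the data is still raw bytes, the byte offsets are used exactly as ls reported them, and decoding never has to translate them.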

---
Ken'ichi HANDA
handa@m17n.org

2003-01-15  Kenichi Handa  <handa@m17n.org>

	* files.el (insert-directory): Read the output of "ls" by
	no-conversion, and decode it later while preserving
	`dired-filename' property.

*** files.el.~1.630.~	Wed Jan 15 13:12:22 2003
--- files.el	Wed Jan 15 17:44:45 2003
***************
*** 4017,4028 ****
  
  	  ;; Read the actual directory using `insert-directory-program'.
  	  ;; RESULT gets the status code.
! 	  (let* ((coding-system-for-read
  		  (and enable-multibyte-characters
  		       (or file-name-coding-system
! 			   default-file-name-coding-system)))
! 		 ;; This is to control encoding the arguments in call-process.
! 		 (coding-system-for-write coding-system-for-read))
  	    (setq result
  		  (if wildcard
  		      ;; Run ls in the directory part of the file pattern
--- 4017,4031 ----
  
  	  ;; Read the actual directory using `insert-directory-program'.
  	  ;; RESULT gets the status code.
! 	  (let* (;; We at first read by no-conversion, then after
! 		 ;; putting text property `dired-filename, decode one
! 		 ;; bunch by one to preserve that property.
! 		 (coding-system-for-read 'no-conversion)
! 		 ;; This is to control encoding the arguments in call-process.
! 		 (coding-system-for-write 
  		  (and enable-multibyte-characters
  		       (or file-name-coding-system
! 			   default-file-name-coding-system))))
  	    (setq result
  		  (if wildcard
  		      ;; Run ls in the directory part of the file pattern
***************
*** 4105,4110 ****
--- 4108,4130 ----
  	      (goto-char end)
  	      (beginning-of-line)
  	      (delete-region (point) (progn (forward-line 2) (point)))))
+ 
+ 	  ;; Now decode what read if necessary.
+ 	  (let ((coding (or coding-system-for-write
+ 			    (detect-coding-region beg (point) t)))
+ 		val pos)
+ 	    (if (not (eq (coding-system-base coding) 'undecided))
+ 		(save-restriction
+ 		  (narrow-to-region beg (point))
+ 		  (goto-char (point-min))
+ 		  (while (not (eobp))
+ 		    (setq pos (point)
+ 			  val (get-text-property (point) 'dired-filename))
+ 		    (goto-char (next-single-property-change
+ 				(point) 'dired-filename nil (point-max)))
+ 		    (decode-coding-region pos (point) coding)
+ 		    (if val
+ 			(put-text-property pos (point) 'dired-filename t))))))
  
  	  (if full-directory-p
  	      ;; Try to insert the amount of free space.

* Re: dired doesn't work properly with a multibyte locale
  2003-01-15 10:43 ` Kenichi Handa
@ 2003-01-15 23:30   ` Richard Stallman
  2003-01-23  4:31   ` Miles Bader
  1 sibling, 0 replies; 38+ messages in thread
From: Richard Stallman @ 2003-01-15 23:30 UTC (permalink / raw)
  Cc: miles

    It surely removes all text properties.  But, we can preserve
    the text-property `dired-filename' by decoding one bunch by
    one.  Could you please try the attached patch?  I have not
    yet installed it because I don't have such a system at hand
    and can't test it.

That is a clever solution.  It might be slow, but perhaps it is fast
enough for the job at hand.  Miles, do you find it fast enough?

* Re: dired doesn't work properly with a multibyte locale
  2003-01-15 10:43 ` Kenichi Handa
  2003-01-15 23:30   ` Richard Stallman
@ 2003-01-23  4:31   ` Miles Bader
  2003-01-23  6:02     ` Kenichi Handa
  2003-01-24  5:42     ` Richard Stallman
  1 sibling, 2 replies; 38+ messages in thread
From: Miles Bader @ 2003-01-23  4:31 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> >  The other solution is to get "ls --dired" output with a "binary"
> >  coding system, then use the byte-offsets to add text-properties,
> >  and then do the decode-coding-region.
> 
> Yes.  I think that is the correct fix. ...  we can preserve
> the text-property `dired-filename' by decoding one bunch by
> one.  Could you please try the attached patch?

This patch seems to work well for me (it correctly parses directories
that are completely screwed up by the old code).

Richard Stallman <rms@gnu.org> writes:
> That is a clever solution.  It might be slow, but perhaps it is fast
> enough for the job at hand.  Miles, do you find it fast enough?

It doesn't seem noticeably slower than the old dired on my system
(both take some time to display a large directory, but I don't notice
any difference between them).

-Miles
-- 
I'm beginning to think that life is just one long Yoko Ono album; no rhyme
or reason, just a lot of incoherent shrieks and then it's over.  --Ian Wolff

* Re: dired doesn't work properly with a multibyte locale
  2003-01-23  4:31   ` Miles Bader
@ 2003-01-23  6:02     ` Kenichi Handa
  2003-01-23  6:12       ` Miles Bader
  2003-01-24  5:42     ` Richard Stallman
  1 sibling, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2003-01-23  6:02 UTC (permalink / raw)
  Cc: emacs-devel

In article <buovg0gpjvb.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
>>  Yes.  I think that is the correct fix. ...  we can preserve
>>  the text-property `dired-filename' by decoding one bunch by
>>  one.  Could you please try the attached patch?

> This patch seems to work well for me (it correctly parses directories
> that are completely screwed up by the old code).

> Richard Stallman <rms@gnu.org> writes:
>>  That is a clever solution.  It might be slow, but perhaps it is fast
>>  enough for the job at hand.  Miles, do you find it fast enough?

> It doesn't seem noticeably slower than the old dired on my system
> (both take some time to display a large directory, but I don't notice
> any difference between them).

Thank you for testing it.  I've just installed that patch in
HEAD.

---
Ken'ichi HANDA
handa@m17n.org

* Re: dired doesn't work properly with a multibyte locale
  2003-01-23  6:02     ` Kenichi Handa
@ 2003-01-23  6:12       ` Miles Bader
  2003-01-25  0:49         ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Miles Bader @ 2003-01-23  6:12 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> > This patch seems to work well for me (it correctly parses directories
> > that are completely screwed up by the old code).
>
> Thank you for testing it.  I've just installed that patch in
> HEAD.

I did find one case where things still don't seem to work properly:

If there's a file containing a newline, then if LANG=C, dired can
correctly deal with it (e.g., I can put the cursor on it and hit RET,
and it visits that file), but if LANG=ja_JP.eucjp, then it correctly
displays all _other_ files, but you can't use RET to visit the
newline-in-the-file-name file (it says `File no longer exists; type `g'
to update Dired buffer').

Since other files work OK in that case, the offsets must be correct, but
perhaps the chunk-decoding screws up the newline somehow?  Does there
need to be some sort of fiddling with eol-type?

-Miles
-- 
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."

* Re: dired doesn't work properly with a multibyte locale
  2003-01-23  4:31   ` Miles Bader
  2003-01-23  6:02     ` Kenichi Handa
@ 2003-01-24  5:42     ` Richard Stallman
  1 sibling, 0 replies; 38+ messages in thread
From: Richard Stallman @ 2003-01-24  5:42 UTC (permalink / raw)
  Cc: handa

    Richard Stallman <rms@gnu.org> writes:
    > That is a clever solution.  It might be slow, but perhaps it is fast
    > enough for the job at hand.  Miles, do you find it fast enough?

    It doesn't seem noticeably slower than the old dired on my system
    (both take some time to display a large directory, but I don't notice
    any difference between them).

That seems to answer the question conclusively.  Please install the
fix.

* Re: dired doesn't work properly with a multibyte locale
  2003-01-23  6:12       ` Miles Bader
@ 2003-01-25  0:49         ` Kenichi Handa
  2003-01-27  4:17           ` Miles Bader
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2003-01-25  0:49 UTC (permalink / raw)
  Cc: emacs-devel

In article <buoel74pf7k.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  > This patch seems to work well for me (it correctly parses directories
>>  > that are completely screwed up by the old code).
>> 
>>  Thank you for testing it.  I've just installed that patch in
>>  HEAD.

> I did find one case where things still don't seem to work properly:

> If there's a file containing a newline, then if LANG=C, dired can
> correctly deal with it (e.g., I can put the cursor on it and hit RET,
> and it visits that file), but if LANG=ja_JP.eucjp, then it correctly
> displays all _other_ files, but you can't use RET to visit the
> newline-in-the-file-name file (it says `File no longer exists; type `g'
> to update Dired buffer').

> Since other files work OK in that case, the offsets must be correct, but
> perhaps the chunk-decoding screws up the newline somehow?  Does there
> need to be some sort of fiddling with eol-type?

I've just installed the ja_JP.eucJP locale on my Debian machine
and made a file "abc\ndef", but it works fine for me even in
the ja_JP.eucJP locale.  I also tried a file whose name
contains Japanese characters and a newline, and that works
too.

So, I have no idea what's wrong with the current code.
Could you debug it?

---
Ken'ichi HANDA
handa@m17n.org

* Re: dired doesn't work properly with a multibyte locale
  2003-01-25  0:49         ` Kenichi Handa
@ 2003-01-27  4:17           ` Miles Bader
  2003-01-27  5:01             ` Kenichi Handa
  2003-01-27 10:56             ` Andreas Schwab
  0 siblings, 2 replies; 38+ messages in thread
From: Miles Bader @ 2003-01-27  4:17 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> > If there's a file containing a newline, then if LANG=C, dired can
> > correctly deal with it (e.g., I can put the cursor on it and hit RET,
> > and it visits that file), but if LANG=ja_JP.eucjp, then it correctly
> > displays all _other_ files, but you can't use RET to visit the
> > newline-in-the-file-name file (it says `File no longer exists; type `g'
> > to update Dired buffer').
> 
> So, I have no idea what's wrong with the current code.
> Could you debug it?

Hmm, it actually seems to be a bug with `ls'!

I created two files, one called `abc\ndef' (where \n is a newline), and
one called `1234567'.  Here's what ls prints if stdout is a tty (I've
indented the output by 3 spaces):

   (tmp) LANG=ja_JP.eucJP ls -l --dired abc* 123*
     -rw-rw----    1 miles           6 2003-01-27 13:03 1234567
     -rw-rw----    1 miles           6 2003-01-27 12:58 abc?def
   //DIRED// 53 60 114 121
   //DIRED-OPTIONS// --quoting-style=literal

[note that the start/end offsets of each filename differ by 7]

But here's what the _same_ command prints if stdout is a pipe (which I
presume is the case for dired):

   (tmp) LANG=ja_JP.eucJP ls -l --dired abc* 123* | cat
     -rw-rw----    1 miles           6 2003-01-27 13:03 1234567
     -rw-rw----    1 miles           6 2003-01-27 12:58 abc
   def
   //DIRED// 53 60 114 120
   //DIRED-OPTIONS// --quoting-style=literal

Here the start/end offsets of `abc\ndef' differ by only 6 (which is
obviously wrong, since the filename is 7 characters long)!  Moreover,
this problem only seems to occur if LANG=ja_JP.eucJP, _not_ if LANG=C.
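
The check on the two `//DIRED//' lines quoted above can be spelled out with a short Python sketch. The helper name `dired_spans' is hypothetical; the numbers are the ones ls printed.

```python
def dired_spans(line):
    """Parse a `//DIRED// n1 n2 ...' line into (start, end) byte-offset pairs."""
    numbers = [int(field) for field in line.split()[1:]]
    return list(zip(numbers[0::2], numbers[1::2]))


tty_spans = dired_spans("//DIRED// 53 60 114 121")
pipe_spans = dired_spans("//DIRED// 53 60 114 120")

# Length of each reported span, in bytes.
tty_lengths = [end - start for start, end in tty_spans]
pipe_lengths = [end - start for start, end in pipe_spans]

# Both file names are 7 bytes long, so every span should have length 7.
expected = [len(b"1234567"), len(b"abc\ndef")]

print(tty_lengths, pipe_lengths, expected)
```

Only the span for the file name containing a newline comes out one byte short, and only when the output goes to a pipe.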

My ls --version says:  ls (coreutils) 4.5.4
What version do you have?

I guess I'll report a bug against ls...

-Miles
-- 
Is it true that nothing can be known?  If so how do we know this?  -Woody Allen

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27  4:17           ` Miles Bader
@ 2003-01-27  5:01             ` Kenichi Handa
  2003-01-27 10:58               ` Andreas Schwab
  2003-01-27 10:56             ` Andreas Schwab
  1 sibling, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2003-01-27  5:01 UTC (permalink / raw)
  Cc: emacs-devel

In article <buoptqjb51e.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> Hmm, it actually seems to be a bug with `ls'!

> I created two files, one called `abc\ndef' (where \n is a newline), and
> one called `1234567'.  Here's what ls prints if stdout is a tty (I've
> indented the output by 3 spaces):

>    (tmp) LANG=ja_JP.eucJP ls -l --dired abc* 123*
>      -rw-rw----    1 miles           6 2003-01-27 13:03 1234567
>      -rw-rw----    1 miles           6 2003-01-27 12:58 abc?def
>    //DIRED// 53 60 114 121
>    //DIRED-OPTIONS// --quoting-style=literal

> [note that the start/end offsets of each filename differ by 7]

> But here's what the _same_ command prints if stdout is a pipe (which I
> presume is the case for dired):

>    (tmp) LANG=ja_JP.eucJP ls -l --dired abc* 123* | cat
>      -rw-rw----    1 miles           6 2003-01-27 13:03 1234567
>      -rw-rw----    1 miles           6 2003-01-27 12:58 abc
>    def
>    //DIRED// 53 60 114 120
>    //DIRED-OPTIONS// --quoting-style=literal

> Here the start/end offsets of `abc\ndef' differ by only 6 (which is
> obviously wrong, since the filename is 7 characters long)!  Moreover,
> this problem only seems to occur if LANG=ja_JP.eucJP, _not_ if LANG=C.

> My ls --version says:  ls (coreutils) 4.5.4
> What version do you have?

Mine is "ls (fileutils) 4.1", and that works correctly even
if stdout is a pipe, in any locale, as shown below.

[~/tmp:709] LANG=ja_JP.eucJP \ls -l --dired abc* 123*|cat
  -rw-rw-r--    1 handa    handa           6 Jan 27 13:50 1234567
  -rw-rw-r--    1 handa    handa           6 Jan 27 13:51 abc
def
//DIRED// 58 65 124 131
//DIRED-OPTIONS// --quoting-style=(null)
[~/tmp:710] LANG=de_DE \ls -l --dired abc* 123*|cat
  -rw-rw-r--    1 handa    handa           6 Jan 27 13:50 1234567
  -rw-rw-r--    1 handa    handa           6 Jan 27 13:51 abc
def
//DIRED// 58 65 124 131
//DIRED-OPTIONS// --quoting-style=(null)
[~/tmp:711] LANG=C \ls -l --dired abc* 123*|cat
  -rw-rw-r--    1 handa    handa           6 Jan 27 13:50 1234567
  -rw-rw-r--    1 handa    handa           6 Jan 27 13:51 abc
def
//DIRED// 58 65 124 131
//DIRED-OPTIONS// --quoting-style=(null)

The difference from your case is in --quoting-style, whose
meaning I don't know.

---
Ken'ichi HANDA
handa@m17n.org

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27  4:17           ` Miles Bader
  2003-01-27  5:01             ` Kenichi Handa
@ 2003-01-27 10:56             ` Andreas Schwab
  2003-01-27 13:35               ` Jim Meyering
  1 sibling, 1 reply; 38+ messages in thread
From: Andreas Schwab @ 2003-01-27 10:56 UTC (permalink / raw)
  Cc: Kenichi Handa

Miles Bader <miles@lsi.nec.co.jp> writes:

|> Kenichi Handa <handa@m17n.org> writes:
|> > > If there's a file containing a newline, then if LANG=C, dired can
|> > > correctly deal with it (e.g., I can put the cursor on it and hit RET,
|> > > and it visits that file), but if LANG=ja_JP.eucjp, then it correctly
|> > > displays all _other_ files, but you can't use RET to visit the
|> > > newline-in-the-file-name file (it says `File no longer exists; type `g'
|> > > to update Dired buffer').
|> > 
|> > So, I have no idea what's wrong with the current code.
|> > Could you debug it?
|> 
|> Hmm, it actually seems to be a bug with `ls'!
|> 
|> I created two files, one called `abc\ndef' (where \n is a newline), and
|> one called `1234567'.  Here's what ls prints if stdout is a tty (I've
|> indented the output by 3 spaces):
|> 
|>    (tmp) LANG=ja_JP.eucJP ls -l --dired abc* 123*
|>      -rw-rw----    1 miles           6 2003-01-27 13:03 1234567
|>      -rw-rw----    1 miles           6 2003-01-27 12:58 abc?def
|>    //DIRED// 53 60 114 121
|>    //DIRED-OPTIONS// --quoting-style=literal
|> 
|> [note that the start/end offsets of each filename differ by 7]
|> 
|> But here's what the _same_ command prints if stdout is a pipe (which I
|> presume is the case for dired):
|> 
|>    (tmp) LANG=ja_JP.eucJP ls -l --dired abc* 123* | cat
|>      -rw-rw----    1 miles           6 2003-01-27 13:03 1234567
|>      -rw-rw----    1 miles           6 2003-01-27 12:58 abc
|>    def
|>    //DIRED// 53 60 114 120
|>    //DIRED-OPTIONS// --quoting-style=literal
|> 
|> Here the start/end offsets of `abc\ndef' differ by only 6 (which is
|> obviously wrong, since the filename is 7 characters long)!  Moreover,
|> this problem only seems to occur if LANG=ja_JP.eucJP, _not_ if LANG=C.

Here is a patch.  The dired offsets are documented as being byte counts,
not character counts.  The bug happens in any multibyte locale.
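
The difference between the two counts can be illustrated in Python. This is a rough sketch under assumptions, not a port of `quote_name': the helpers `display_width' and `byte_length' are hypothetical, unprintable characters are assumed to occupy no columns, East Asian wide characters are assumed to occupy two, and combining characters are not handled.

```python
import unicodedata


def display_width(name):
    """Approximate number of screen columns NAME occupies."""
    width = 0
    for ch in name:
        if not ch.isprintable():
            continue                      # e.g. a newline adds no columns
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                    # wide or fullwidth character
        else:
            width += 1
    return width


def byte_length(name, coding):
    """Number of bytes NAME occupies when written out in CODING."""
    return len(name.encode(coding))


for name in ("1234567", "abc\ndef", "日本語"):
    print(repr(name), display_width(name), byte_length(name, "euc_jp"))
```

For `abc\ndef' the width is 6 columns while the byte count is 7, which is consistent with the off-by-one end offset reported earlier in the thread: advancing the dired position by the width instead of the byte count leaves it one short.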

Andreas.

2003-01-27  Andreas Schwab  <schwab@suse.de>

	* src/ls.c (quote_name): Add fourth parameter width into which to
	store the screen columns and return number of bytes instead.
	(print_dir): Pass NULL as fourth parameter of quote_name.
	(print_name_with_quoting): Likewise.
	(length_of_file_name_and_frills): Get the width from the fourth
	parameter of quote_name instead of return value.

--- src/ls.c	2002/12/16 18:58:01	1.3
+++ src/ls.c	2003/01/27 10:50:51
@@ -255,7 +255,8 @@ char *getgroup ();
 char *getuser ();
 
 static size_t quote_name PARAMS ((FILE *out, const char *name,
-				  struct quoting_options const *options));
+				  struct quoting_options const *options,
+				  size_t *width));
 static char *make_link_path PARAMS ((const char *path, const char *linkname));
 static int decode_switches PARAMS ((int argc, char **argv));
 static int file_interesting PARAMS ((const struct dirent *next));
@@ -2222,7 +2223,7 @@ print_dir (const char *name, const char 
       DIRED_INDENT ();
       PUSH_CURRENT_DIRED_POS (&subdired_obstack);
       dired_pos += quote_name (stdout, realname ? realname : name,
-			       dirname_quoting_options);
+			       dirname_quoting_options, NULL);
       PUSH_CURRENT_DIRED_POS (&subdired_obstack);
       DIRED_FPUTS_LITERAL (":\n", stdout);
     }
@@ -3064,11 +3065,13 @@ print_long_format (const struct fileinfo
 
 /* Output to OUT a quoted representation of the file name NAME,
    using OPTIONS to control quoting.  Produce no output if OUT is NULL.
-   Return the number of screen columns occupied by NAME's quoted
-   representation.  */
+   Store the number of screen columns occupied by NAME's quoted
+   representation into WIDTH, if non-NULL.  Return the number of bytes
+   produced.  */
 
 static size_t
-quote_name (FILE *out, const char *name, struct quoting_options const *options)
+quote_name (FILE *out, const char *name, struct quoting_options const *options,
+	    size_t *width)
 {
   char smallbuf[BUFSIZ];
   size_t len = quotearg_buffer (smallbuf, sizeof smallbuf, name, -1, options);
@@ -3203,20 +3206,32 @@ quote_name (FILE *out, const char *name,
 	  displayed_width = len;
 	}
     }
-  else
+  else if (width != NULL)
     {
-      /* Assume unprintable characters have a displayed_width of 1.  */
 #if HAVE_MBRTOWC
       if (MB_CUR_MAX > 1)
 	displayed_width = mbsnwidth (buf, len, 0);
       else
 #endif
-	displayed_width = len;
+	{
+	  char *p = buf;
+	  char const *plimit = buf + len;
+
+	  displayed_width = 0;
+	  while (p < plimit)
+	    {
+	      if (ISPRINT ((unsigned char) *p))
+		displayed_width++;
+	      p++;
+	    }
+	}
     }
 
   if (out != NULL)
     fwrite (buf, 1, len, out);
-  return displayed_width;
+  if (width != NULL)
+    *width = displayed_width;
+  return len;
 }
 
 static void
@@ -3229,7 +3244,7 @@ print_name_with_quoting (const char *p, 
   if (stack)
     PUSH_CURRENT_DIRED_POS (stack);
 
-  dired_pos += quote_name (stdout, p, filename_quoting_options);
+  dired_pos += quote_name (stdout, p, filename_quoting_options, NULL);
 
   if (stack)
     PUSH_CURRENT_DIRED_POS (stack);
@@ -3395,6 +3410,7 @@ static int
 length_of_file_name_and_frills (const struct fileinfo *f)
 {
   register int len = 0;
+  size_t name_width;
 
   if (print_inode)
     len += INODE_DIGITS + 1;
@@ -3402,7 +3418,8 @@ length_of_file_name_and_frills (const st
   if (print_block_size)
     len += 1 + block_size_size;
 
-  len += quote_name (NULL, f->name, filename_quoting_options);
+  quote_name (NULL, f->name, filename_quoting_options, &name_width);
+  len += name_width;
 
   if (indicator_style != none)
     {

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27  5:01             ` Kenichi Handa
@ 2003-01-27 10:58               ` Andreas Schwab
  2003-01-27 11:09                 ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Schwab @ 2003-01-27 10:58 UTC (permalink / raw)
  Cc: miles

Kenichi Handa <handa@m17n.org> writes:

|> Mine is "ls (fileutils) 4.1", and that works correctly even
|> if stdout is a pipe, in any locale, as shown below.

fileutils 4.1 didn't support multibyte yet, IIRC.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27 10:58               ` Andreas Schwab
@ 2003-01-27 11:09                 ` Kenichi Handa
  2003-01-27 12:15                   ` Andreas Schwab
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2003-01-27 11:09 UTC (permalink / raw)
  Cc: miles

In article <jeu1fun9kw.fsf@sykes.suse.de>, Andreas Schwab <schwab@suse.de> writes:
> Kenichi Handa <handa@m17n.org> writes:
> |> Mine is "ls (fileutils) 4.1", and that works correctly even
> |> if stdout is a pipe, in any locale, as shown below.

> fileutils 4.1 didn't support multibyte yet, IIRC.

What do you mean by "don't support multibyte"?

It seems that it is at least localized.  In the ja_JP.eucJP
locale, ls -l --dired outputs the Japanese text meaning
"total" on the first line, and the offsets to file names are
also correct.

---
Ken'ichi HANDA
handa@m17n.org

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27 11:09                 ` Kenichi Handa
@ 2003-01-27 12:15                   ` Andreas Schwab
  2003-02-03  0:17                     ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Schwab @ 2003-01-27 12:15 UTC (permalink / raw)
  Cc: miles

Kenichi Handa <handa@m17n.org> writes:

|> In article <jeu1fun9kw.fsf@sykes.suse.de>, Andreas Schwab <schwab@suse.de> writes:
|> > Kenichi Handa <handa@m17n.org> writes:
|> > |> Mine is "ls (fileutils) 4.1", and that works correctly even
|> > |> if stdout is a pipe, in any locale, as shown below.
|> 
|> > fileutils 4.1 didn't support multibyte yet, IIRC.
|> 
|> What do you mean by "don't support multibyte"?

I just checked; 4.1 already has full support.  Sorry for the confusion.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27 10:56             ` Andreas Schwab
@ 2003-01-27 13:35               ` Jim Meyering
  0 siblings, 0 replies; 38+ messages in thread
From: Jim Meyering @ 2003-01-27 13:35 UTC (permalink / raw)
  Cc: Miles Bader

> Here is a patch.  The dired offsets are documented as being byte counts,
> not character counts.  The bug happens in any multibyte locale.
>
> Andreas.
>
> 2003-01-27  Andreas Schwab  <schwab@suse.de>
>
> 	* src/ls.c (quote_name): Add fourth parameter width into which to
> 	store the screen columns and return number of bytes instead.
> 	(print_dir): Pass NULL as fourth parameter of quote_name.
> 	(print_name_with_quoting): Likewise.
> 	(length_of_file_name_and_frills): Get the width from the fourth
> 	parameter of quote_name instead of return value.

Hi Andreas!

Thanks for that patch.
I've applied it.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-01-27 12:15                   ` Andreas Schwab
@ 2003-02-03  0:17                     ` Kenichi Handa
  2003-02-03  1:24                       ` Miles Bader
                                         ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Kenichi Handa @ 2003-02-03  0:17 UTC (permalink / raw)
  Cc: miles

In article <jeof62n5zn.fsf@sykes.suse.de>, Andreas Schwab <schwab@suse.de> writes:
> I just checked, 4.1 has already all support.  Sorry for confusion.

I see.  But, anyway, "ls (coreutils) 4.5.4" has a bug.  If
this version of "ls" is already widely spread, shouldn't
Emacs pay special attention to such a buggy "ls"?

Dave Love <d.love@dl.ac.uk> writes:
> It seems more useful to count in characters (assuming they're
> decodable), but the problem remains that it's now broken on current
> systems.  If there are incompatible versions of ls, I think you have
> to check `ls --version' to decide what to do.

I don't know the relation of "ls (coreutils)" and "ls
(fileutils)".  Does anyone know how to detect a buggy "ls"?
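
One way to approach the detection question is to parse the first line of `ls --version`.  The following is a minimal Python sketch, not anything proposed in the thread; the function names are hypothetical, and the cutoff is an assumption drawn only from what is reported here (coreutils 4.5.4 is buggy, 4.5.5 carries the fix).  The thread does not say in which release the bug first appeared, so treating every earlier coreutils release as suspect is a guess.

```python
import re

def parse_ls_version(first_line):
    """Split e.g. 'ls (coreutils) 4.5.4' into ('coreutils', (4, 5, 4))."""
    match = re.match(r"ls \((?:GNU )?(\w+)\) (\d+(?:\.\d+)*)", first_line)
    if match is None:
        return None  # not a GNU-style version banner
    package, version = match.groups()
    return package, tuple(int(part) for part in version.split("."))

def ls_is_suspect(first_line):
    """Assumption: only coreutils releases before 4.5.5 are flagged."""
    parsed = parse_ls_version(first_line)
    if parsed is None:
        return False
    package, version = parsed
    return package == "coreutils" and version < (4, 5, 5)

print(parse_ls_version("ls (coreutils) 4.5.4"))
print(ls_is_suspect("ls (coreutils) 4.5.4"))
print(ls_is_suspect("ls (coreutils) 4.5.5"))
```

A check like this cannot see vendor patches, so it can only ever be a heuristic.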

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  0:17                     ` Kenichi Handa
@ 2003-02-03  1:24                       ` Miles Bader
  2003-02-03  2:11                         ` Kenichi Handa
  2003-02-03 17:44                         ` Dave Love
  2003-02-03  9:37                       ` Jim Meyering
  2003-02-03 17:20                       ` Richard Stallman
  2 siblings, 2 replies; 38+ messages in thread
From: Miles Bader @ 2003-02-03  1:24 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> > I just checked, 4.1 has already all support.  Sorry for confusion.
> 
> I see.  But, anyway, "ls (coreutils) 4.5.4" has a bug.  If
> this version of "ls" is already widely spread, shouldn't
> Emacs pay special attention to such a buggy "ls"?

Is it worth the trouble?  As far as I know the problem only occurs with
newlines in filenames, which is an extremely rare thing; as long as it
gets fixed in the next version, that seems good enough to me...

> Dave Love <d.love@dl.ac.uk> writes:
> > It seems more useful to count in characters (assuming they're
> > decodable), but the problem remains that it's now broken on current
> > systems.

Why is it more useful to count in characters?

Of course that makes things a bit simpler for emacs, but counting in
bytes has the advantage that a tool doesn't have to support the
coding system ls uses in order to grab the filenames.  Since it seems
easier for a `smart' (coding-system aware) tool like emacs to act
`dumb' than for a dumb tool to act smart, that suggests to me that it's
better to use the dumb method (byte counts).
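
The argument for byte counts can be illustrated with a minimal Python sketch.  The listing and the offset pairs below are made up for illustration (they only imitate the start/end pairs that a `//DIRED//` line carries); the point is that a tool slicing raw bytes never needs to know the encoding, whereas character-based offsets applied to the same bytes drift as soon as a multibyte character appears.

```python
# "  À.txt\n  b.txt\n" as raw UTF-8 bytes; "À" (U+00C0) occupies 2 bytes.
listing = "  \u00c0.txt\n  b.txt\n".encode("utf-8")

def extract_names(raw, spans):
    """Slice file names out of raw bytes; no decoding is needed."""
    return [raw[start:end] for start, end in spans]

# Hypothetical (start, end) byte offsets, in the spirit of the //DIRED// line.
byte_spans = [(2, 8), (11, 16)]
names = extract_names(listing, byte_spans)

# The same names counted in characters; applied to raw bytes, they drift.
char_spans = [(2, 7), (10, 15)]
drifted = extract_names(listing, char_spans)

print(names)    # [b'\xc3\x80.txt', b'b.txt']
print(drifted)  # [b'\xc3\x80.tx', b' b.tx']
```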

Of course I suppose you could just argue that --dired is for emacs'
use, and should just do whatever is most convenient for emacs.

[actually, does ls itself even know the actual character counts, or is
it just regurgitating binary chunks that it doesn't interpret?]

> > If there are incompatible versions of ls, I think you have
> > to check `ls --version' to decide what to do.
> 
> I don't know the relation of "ls (coreutils)" and "ls
> (fileutils)".  Does anyone know how to detect a buggy "ls"?

coreutils is just a merge of fileutils + sh-utils + ...

The NEWS file for coreutils says:

   [4.5.1]
     ...
     This package is the union of the following:
     textutils-2.1, fileutils-4.1.11, sh-utils-2.0.15.

-Miles
-- 
I have seen the enemy, and he is us.  -- Pogo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  1:24                       ` Miles Bader
@ 2003-02-03  2:11                         ` Kenichi Handa
  2003-02-03  2:22                           ` Miles Bader
  2003-02-03 17:47                           ` Dave Love
  2003-02-03 17:44                         ` Dave Love
  1 sibling, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2003-02-03  2:11 UTC (permalink / raw)
  Cc: emacs-devel

In article <buoptqarwax.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  > I just checked, 4.1 has already all support.  Sorry for confusion.
>>  
>>  I see.  But, anyway, "ls (coreutils) 4.5.4" has a bug.  If
>>  this version of "ls" is already widely spread, shouldn't
>>  Emacs pay special attention to such a buggy "ls"?

> Is it worth the trouble?  As far as I know the problem only occurs with
> newlines in filenames, which is an extremely rare thing; as long as it
> gets fixed in the next version, that seems good enough to me...

If the bug happens only for such filenames, I agree that
it's not worth working on it anymore.

But...

Andreas Schwab <schwab@suse.de> writes:
> Here is a patch.  The dired offsets are documented as being byte counts,
> not character counts.  The bug happens in any multibyte locale.

This statement reads as if any character encoded by multiple
bytes in a filename causes trouble, for instance any CJK
characters in CJK locales or any non-ASCII chars in a UTF-8
locale.  Since you can use the ja_JP.eucJP locale, could you please
try some Japanese file names?

> coreutils is just a merge of fileutils + sh-utils + ...

> The NEWS file for coreutils says:

>    [4.5.1]
>      ...
>      This package is the union of the following:
>      textutils-2.1, fileutils-4.1.11, sh-utils-2.0.15.

Hmmm, then, it's strange that "ls (fileutils) 4.1" works,
but "ls (coreutils) 4.5" doesn't.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  2:11                         ` Kenichi Handa
@ 2003-02-03  2:22                           ` Miles Bader
  2003-02-03  8:40                             ` Kenichi Handa
  2003-02-03 17:47                           ` Dave Love
  1 sibling, 1 reply; 38+ messages in thread
From: Miles Bader @ 2003-02-03  2:22 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> > Is it worth the trouble?  As far as I know the problem only occurs with
> > newlines in filenames, which is an extremely rare thing; as long as it
> > gets fixed in the next version, that seems good enough to me...
> 
> If the bug happens only for such filenames, I agree that
> it's not worth working on it anymore.
...
> This statement reads that any character encoded by multiple bytes in a
> filename causes a trouble, for instance any CJK characters in CJK
> locales or any non-ASCII chars in UTF-8 locale.  As you can use
> ja_JP.eucJP locale, could you please try some Japanese file name?

They work fine for me.

My impression was that the bug only occurred with newlines, and looking
at Andreas's patch seems to confirm this.  I think that when Andreas
mentioned byte vs character counts it was merely to clarify the issue
(as earlier there was some confusion about that in this thread).

> >    [4.5.1]
> >      ...
> >      This package is the union of the following:
> >      textutils-2.1, fileutils-4.1.11, sh-utils-2.0.15.
> 
> Hmmm, then, it's strange that "ls (fileutils) 4.1" works,
> but "ls (coreutils) 4.5" doesn't.

Why?  I think it just means that the bug was introduced after fileutils-4.1.

-miles
-- 
`Life is a boundless sea of bitterness'

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  2:22                           ` Miles Bader
@ 2003-02-03  8:40                             ` Kenichi Handa
  2003-02-03  9:02                               ` Miles Bader
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2003-02-03  8:40 UTC (permalink / raw)
  Cc: emacs-devel

In article <buobs1urtm3.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
>>  This statement reads that any character encoded by multiple bytes in a
>>  filename causes a trouble, for instance any CJK characters in CJK
>>  locales or any non-ASCII chars in UTF-8 locale.  As you can use
>>  ja_JP.eucJP locale, could you please try some Japanese file name?

> They work fine for me.

> My impression was that the bug only occurred with newlines, and looking
> at Andreas's patch seems to confirm this.  I think that when Andreas
> mentioned byte vs character counts it was merely to clarify the issue
> (as earlier there was some confusion about that in this thread).

I read his mail again, and found that the bug is not in
chars vs bytes, but columns vs bytes.

So, ja_JP.eucJP was not a good example because, in that
locale, column numbers and bytes are equal.

Please try some UTF-8 locale (e.g. en_US.UTF-8) with Latin-1
filenames.  I believe that the current dired will be
confused.
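
The columns-versus-bytes distinction can be checked with a short Python sketch.  It is only an approximation under stated assumptions: the helper names are hypothetical, and the column count is estimated from `unicodedata.east_asian_width` (wide and fullwidth characters count as 2) rather than taken from a real terminal or from `ls`.  For ordinary two-byte kanji in EUC-JP the two measures coincide, while for "À" in UTF-8 they do not.

```python
import unicodedata

def columns(text):
    """Approximate display columns: wide/fullwidth characters take 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def byte_len(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

kanji = "\u65e5\u672c\u8a9e"   # three kanji ("Japanese language")
a_grave = "\u00c0"             # "À"

print(byte_len(kanji, "euc_jp"), columns(kanji))     # 6 6
print(byte_len(a_grave, "utf-8"), columns(a_grave))  # 2 1
```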

>>  >      This package is the union of the following:
>>  >      textutils-2.1, fileutils-4.1.11, sh-utils-2.0.15.
>>  
>>  Hmmm, then, it's strange that "ls (fileutils) 4.1" works,
>>  but "ls (coreutils) 4.5" doesn't.

> Why?  I think it just means that the bug was introduced after fileutils-4.1.

Ah, sure.  I blindly thought that the last digits "11" were
not important.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  8:40                             ` Kenichi Handa
@ 2003-02-03  9:02                               ` Miles Bader
  2003-02-03  9:10                                 ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Miles Bader @ 2003-02-03  9:02 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> I read his mail again, and found that the bug is not in
> chars vs bytes, but columns vs bytes.
> 
> So, ja_JP.eucJP was not a good example because, in that
> locale, column numbers and bytes are equal.
>
> Please try some UTF-8 locale (e.g. en_US.UTF-8) with Latin-1
> filenames.  I believe that the current dired will be
> confused.

Um, yes.  If LANG=en_US.utf-8, then ls's output seems to be correct,
counting by bytes, but dired screws it up for some reason...
I thought the current dired was supposed to also count by bytes?

-Miles
-- 
Yo mama's so fat when she gets on an elevator it HAS to go down.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  9:02                               ` Miles Bader
@ 2003-02-03  9:10                                 ` Kenichi Handa
  2003-02-03  9:22                                   ` Miles Bader
  2003-02-03 11:00                                   ` Andreas Schwab
  0 siblings, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2003-02-03  9:10 UTC (permalink / raw)
  Cc: emacs-devel

In article <buod6m9rb2j.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
>>  Please try some UTF-8 locale (e.g. en_US.UTF-8) with Latin-1
>>  filenames.  I believe that the current dired will be
>>  confused.

> Um, yes.  If LANG=en_US.utf-8, then ls's output seems to be correct,
> counting by bytes, 

Really?  I thought ls's output counts columns, thus, for
instance, the filename "À" is counted as 1, not 2.
Otherwise, the current dired should work well.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  9:10                                 ` Kenichi Handa
@ 2003-02-03  9:22                                   ` Miles Bader
  2003-02-03  9:37                                     ` Jim Meyering
  2003-02-03 11:00                                   ` Andreas Schwab
  1 sibling, 1 reply; 38+ messages in thread
From: Miles Bader @ 2003-02-03  9:22 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> > Um, yes.  If LANG=en_US.utf-8, then ls's output seems to be correct,
> > counting by bytes, 
> 
> Really?  I thought ls's output counts columns, thus, for
> instance, the filename "À" is counted as 1, not 2.
> Otherwise, the current dired should work well.

You seem to be correct: if I create that file, then `ls --dired' says
it has a length of 1, but of course, it actually has a length of 2 bytes.

Will Andreas's patch fix this?

-Miles
-- 
Run away!  Run away!

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  0:17                     ` Kenichi Handa
  2003-02-03  1:24                       ` Miles Bader
@ 2003-02-03  9:37                       ` Jim Meyering
  2003-02-03 17:20                       ` Richard Stallman
  2 siblings, 0 replies; 38+ messages in thread
From: Jim Meyering @ 2003-02-03  9:37 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> wrote:
> In article <jeof62n5zn.fsf@sykes.suse.de>, Andreas Schwab <schwab@suse.de> writes:
>> I just checked, 4.1 has already all support.  Sorry for confusion.
>
> I see.  But, anyway, "ls (coreutils) 4.5.4" has a bug.  If
> this version of "ls" is already widely spread, shouldn't
> Emacs pay special attention to such a buggy "ls"?

FYI, I've just released coreutils-4.5.5.
The announcement went to a couple of lists;
it takes a while to reach the GNU archives

  http://mail.gnu.org/mailman/listinfo/coreutils-announce
  http://mail.gnu.org/archive/html/coreutils-announce/  (archives)

Hmm.  I see that it has already reached this archive:

  http://article.gmane.org/gmane.comp.gnu.fileutils.bugs/561

The sources are here:

  ftp://alpha.gnu.org/gnu/coreutils/coreutils-4.5.5.tar.bz2

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  9:22                                   ` Miles Bader
@ 2003-02-03  9:37                                     ` Jim Meyering
  0 siblings, 0 replies; 38+ messages in thread
From: Jim Meyering @ 2003-02-03  9:37 UTC (permalink / raw)
  Cc: emacs-devel

Miles Bader <miles@lsi.nec.co.jp> wrote:
...
> You seem to be correct: if I create that file, then `ls --dired' says
> it has a length of 1, but of course, it actually has a length of 2 bytes.
>
> Will Andreas's patch fix this?

Yes.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  9:10                                 ` Kenichi Handa
  2003-02-03  9:22                                   ` Miles Bader
@ 2003-02-03 11:00                                   ` Andreas Schwab
  2003-02-03 11:17                                     ` Kenichi Handa
  1 sibling, 1 reply; 38+ messages in thread
From: Andreas Schwab @ 2003-02-03 11:00 UTC (permalink / raw)
  Cc: miles

Kenichi Handa <handa@m17n.org> writes:

|> In article <buod6m9rb2j.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
|> >>  Please try some UTF-8 locale (e.g. en_US.UTF-8) with Latin-1
|> >>  filenames.  I believe that the current dired will be
|> >>  confused.
|> 
|> > Um, yes.  If LANG=en_US.utf-8, then ls's output seems to be correct,
|> > counting by bytes, 
|> 
|> Really?  I thought ls's output counts columns, thus, for
|> instance, the filename "À" is counted as 1, not 2.
|> Otherwise, the current dired should work well.

The dired offsets are explicitly documented as counting bytes, *note
(coreutils)What information is listed::.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03 11:00                                   ` Andreas Schwab
@ 2003-02-03 11:17                                     ` Kenichi Handa
  2003-02-13 13:58                                       ` Dave Love
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2003-02-03 11:17 UTC (permalink / raw)
  Cc: miles

Andreas Schwab <schwab@suse.de> writes:
> |> Really?  I thought ls's output counts columns, thus, for
> |> instance, the filename "À" is counted as 1, not 2.
> |> Otherwise, the current dired should work well.

> The dired offsets are explicitly documented as counting bytes, *note
> (coreutils)What information is listed::.

Ah, yes, I know that.  What I meant was "that buggy ls's
output counts columns".

Miles Bader <miles@lsi.nec.co.jp> writes:
> You seem to be correct: if I create that file, then `ls --dired' says
> it has a length of 1, but of course, it actually has a length of 2 bytes.

Ok.  Then what should we do?   I think checking the version of
ls is too kludgy.

Please try this workaround.  It avoids setting the
`dired-filename' property if the character following the filename
is not a newline.  I think it detects the problem of "ls" in
most cases at a low cost.

---
Ken'ichi HANDA
handa@m17n.org


*** files.el.~1.632.~	2003-02-01 00:16:47.000000000 +0900
--- files.el	2003-02-03 20:03:30.000000000 +0900
***************
*** 4106,4112 ****
  	      (while (< (point) end)
  		(let ((start (+ beg (read (current-buffer))))
  		      (end (+ beg (read (current-buffer)))))
! 		  (put-text-property start end 'dired-filename t)))
  	      (goto-char end)
  	      (beginning-of-line)
  	      (delete-region (point) (progn (forward-line 2) (point)))))
--- 4106,4117 ----
  	      (while (< (point) end)
  		(let ((start (+ beg (read (current-buffer))))
  		      (end (+ beg (read (current-buffer)))))
! 		  (if (= (char-after end) ?\n)
! 		      (put-text-property start end 'dired-filename t)
! 		    ;; It seems that we can't trust ls's output as to
! 		    ;; byte positions of filenames.
! 		    (put-text-property beg (point) 'dired-filename nil)
! 		    (end-of-line))))
  	      (goto-char end)
  	      (beginning-of-line)
  	      (delete-region (point) (progn (forward-line 2) (point)))))
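
The idea behind the workaround can be restated outside Emacs as a small Python sketch.  This is an illustration only, not a translation of the patch: the function name and sample data are hypothetical, and it assumes the listing is still raw bytes when the offsets are applied.  As in the patch, one span that does not end right before a newline is taken as a sign that none of the offsets can be trusted.

```python
def trusted_filename_spans(raw, spans):
    """Return the spans unchanged if each ends just before a newline.

    If any span fails the check, return an empty list, mirroring the
    all-or-nothing behaviour of the workaround above.
    """
    for start, end in spans:
        if raw[end:end + 1] != b"\n":
            return []
    return list(spans)

# "  À.txt\n  b.txt\n" as raw UTF-8 bytes ("À" is 2 bytes).
raw = "  \u00c0.txt\n  b.txt\n".encode("utf-8")

good = trusted_filename_spans(raw, [(2, 8), (11, 16)])   # byte offsets
bad = trusted_filename_spans(raw, [(2, 7), (10, 15)])    # column-based offsets

print(good)  # [(2, 8), (11, 16)]
print(bad)   # []
```

The check is cheap because it inspects one byte per file name, which matches the "low cost" goal stated in the message.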

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  0:17                     ` Kenichi Handa
  2003-02-03  1:24                       ` Miles Bader
  2003-02-03  9:37                       ` Jim Meyering
@ 2003-02-03 17:20                       ` Richard Stallman
  2003-02-03 18:53                         ` Andreas Schwab
  2 siblings, 1 reply; 38+ messages in thread
From: Richard Stallman @ 2003-02-03 17:20 UTC (permalink / raw)
  Cc: emacs-devel

    I see.  But, anyway, "ls (coreutils) 4.5.4" has a bug.  If
    this version of "ls" is already widely spread, shouldn't
    Emacs pay special attention to such a buggy "ls"?

Maybe it should.  It depends how hard that is to do, versus how
hard it is for people to upgrade coreutils.  Is a fixed version
of coreutils available?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  1:24                       ` Miles Bader
  2003-02-03  2:11                         ` Kenichi Handa
@ 2003-02-03 17:44                         ` Dave Love
  2003-02-03 18:45                           ` Michael Livshin
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Love @ 2003-02-03 17:44 UTC (permalink / raw)
  Cc: Kenichi Handa

Miles Bader <miles@lsi.nec.co.jp> writes:

> Is it worth the trouble?

I'm confused by what's being discussed here, but it is surely worth
the trouble to have dired working properly in multibyte locales.

> As far as I know the problem only occurs with
> newlines in filenames,

No.  In locale en_GB.UTF-8 in Debian Woody, create a file with
arbitrary Latin-1 characters in the name and observe that dired
positioning screws up after that filename occurs.

It works in 21.2 (using my utf-8 language definition) and worked in
the development code as it was before Christmas.  It no longer does in
the development code.  [I made some other changes to dired, unrelated
to encoding and not installed, and thought I'd broken it somehow.]

> Why is it more useful to count in characters?

Because that makes it easier for Emacs, which is --dired's stated
intention.

> Of course that makes things a bit simpler for emacs, but counting in
> bytes has the advantage that a tool doesn't have to support the
> coding system ls uses in order to grab the filenames.

That's exactly what I said, but if you don't support the encoding, you
lose anyhow.

[None of this actually helps people with an ls which doesn't support
--dired, of course.  I still think you should consider using a
specified LC_TIME.  If it's a real problem that users won't get the
date in the format they expect, run ls twice, first to find the names
with LC_TIME=C and then to display the results.]
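
The LC_TIME suggestion could look roughly like the following Python sketch.  It is a hedged illustration, not something from the thread: the function names are hypothetical, and the handling of LC_ALL is an added assumption based on the standard rule that LC_ALL overrides every individual LC_* category, so LC_TIME=C would otherwise have no effect when LC_ALL is set.

```python
import os
import subprocess

# Standard locale categories that LC_ALL would otherwise override.
LC_CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_MESSAGES",
                 "LC_MONETARY", "LC_NUMERIC", "LC_TIME")

def env_with_c_time(base_env):
    """Copy the environment, forcing only the date format to the C locale."""
    env = dict(base_env)
    override = env.pop("LC_ALL", None)
    if override:
        # Spread LC_ALL over the individual categories so that the
        # LC_TIME setting below is not shadowed.
        for category in LC_CATEGORIES:
            env[category] = override
    env["LC_TIME"] = "C"
    return env

def list_directory(path):
    """Run the system `ls -l` on path with LC_TIME=C; returns raw bytes."""
    result = subprocess.run(["ls", "-l", path],
                            env=env_with_c_time(os.environ),
                            capture_output=True, check=True)
    return result.stdout
```

Calling `list_directory(".")` would give a listing whose dates follow the C locale while file names and messages keep the user's locale.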

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03  2:11                         ` Kenichi Handa
  2003-02-03  2:22                           ` Miles Bader
@ 2003-02-03 17:47                           ` Dave Love
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Love @ 2003-02-03 17:47 UTC (permalink / raw)
  Cc: miles

Kenichi Handa <handa@m17n.org> writes:

> Hmmm, then, it's strange that "ls (fileutils) 4.1" works,
> but "ls (coreutils) 4.5" doesn't.

No.  The latest (as of a few days ago) Dired used with `ls (fileutils)
4.1' on Debian Woody (the current release of the GNU preferred system)
is broken as I described.

I don't know whether or not ls has had relevant changes applied by
Debian, but anyhow if there are well-established incompatible
versions, Emacs needs to test for them and DTRT.  [I expect you agree
-- I'm just stating the position.]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03 17:44                         ` Dave Love
@ 2003-02-03 18:45                           ` Michael Livshin
  2003-02-03 19:13                             ` Eli Zaretskii
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Livshin @ 2003-02-03 18:45 UTC (permalink / raw)


Dave Love <d.love@dl.ac.uk> writes:

> Miles Bader <miles@lsi.nec.co.jp> writes:
>
>> Is it worth the trouble?
>
> I'm confused by what's being discussed here, but it is surely worth
> the trouble to have dired working properly in multibyte locales.

I'm sorry for possibly asking a stupid question, but why does dired
use `ls' *at* *all*?

it looks like all the needed primitives are there in Emacs.

changing dired to use those sounds (to the naive me) far easier than
debugging Emacs<->ls interaction in all possible values of
(language-environment X ls-version).

just a thought,
--m

-- 
Being really good at C++ is like being really good at using rocks to
sharpen sticks.
                -- Thant Tessman

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03 17:20                       ` Richard Stallman
@ 2003-02-03 18:53                         ` Andreas Schwab
  0 siblings, 0 replies; 38+ messages in thread
From: Andreas Schwab @ 2003-02-03 18:53 UTC (permalink / raw)
  Cc: Kenichi Handa

Richard Stallman <rms@gnu.org> writes:

|>     I see.  But, anyway, "ls (coreutils) 4.5.4" has a bug.  If
|>     this version of "ls" is already widely spread, shouldn't
|>     Emacs pay special attention to such a buggy "ls"?
|> 
|> Maybe it should.  It depends how hard that is to do, versus how
|> hard it is for people to upgrade coreutils.  Is a fixed version
|> of coreutils available?

The fix has just been released as part of coreutils 4.5.5.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03 18:45                           ` Michael Livshin
@ 2003-02-03 19:13                             ` Eli Zaretskii
  0 siblings, 0 replies; 38+ messages in thread
From: Eli Zaretskii @ 2003-02-03 19:13 UTC (permalink / raw)
  Cc: emacs-devel

> From: Michael Livshin <usenet@cmm.kakpryg.net>
> Date: Mon, 03 Feb 2003 20:45:11 +0200
> 
> I'm sorry for possibly asking a stupid question, but why does dired
> use `ls' *at* *all*?
> 
> it looks like all the needed primitives are there in Emacs.

If you mean that ls-lisp.el should be used on all platforms, not only
on those where `ls' might not be installed, then I think the reason
is that ls-lisp is slower.

Also, some of the `ls' options are not yet emulated by ls-lisp.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-03 11:17                                     ` Kenichi Handa
@ 2003-02-13 13:58                                       ` Dave Love
  2003-02-17  6:19                                         ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Love @ 2003-02-13 13:58 UTC (permalink / raw)
  Cc: miles

Kenichi Handa <handa@m17n.org> writes:

> Please try this workaround.  It avoids setting
> `dired-filename' property if the next character of filename
> is not a newline.  I think it detects the problem of "ls" in
> most cases by a low cost.

That appears to work for my case.  I suggest installing it.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: dired doesn't work properly with a multibyte locale
  2003-02-13 13:58                                       ` Dave Love
@ 2003-02-17  6:19                                         ` Kenichi Handa
  0 siblings, 0 replies; 38+ messages in thread
From: Kenichi Handa @ 2003-02-17  6:19 UTC (permalink / raw)
  Cc: miles

In article <rzq65rojn90.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>  Please try this workaround.  It avoids setting
>>  `dired-filename' property if the next character of filename
>>  is not a newline.  I think it detects the problem of "ls" in
>>  most cases by a low cost.

> That appears to work for my case.  I suggest installing it.

Thank you for testing it.  I've just installed that change.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2003-02-17  6:19 UTC | newest]

Thread overview: 38+ messages
2003-01-06  6:04 dired doesn't work properly with a multibyte locale Miles Bader
2003-01-11 20:00 ` Stefan Monnier
2003-01-11 20:16   ` Miles Bader
2003-01-12 11:56 ` Richard Stallman
2003-01-15 10:43 ` Kenichi Handa
2003-01-15 23:30   ` Richard Stallman
2003-01-23  4:31   ` Miles Bader
2003-01-23  6:02     ` Kenichi Handa
2003-01-23  6:12       ` Miles Bader
2003-01-25  0:49         ` Kenichi Handa
2003-01-27  4:17           ` Miles Bader
2003-01-27  5:01             ` Kenichi Handa
2003-01-27 10:58               ` Andreas Schwab
2003-01-27 11:09                 ` Kenichi Handa
2003-01-27 12:15                   ` Andreas Schwab
2003-02-03  0:17                     ` Kenichi Handa
2003-02-03  1:24                       ` Miles Bader
2003-02-03  2:11                         ` Kenichi Handa
2003-02-03  2:22                           ` Miles Bader
2003-02-03  8:40                             ` Kenichi Handa
2003-02-03  9:02                               ` Miles Bader
2003-02-03  9:10                                 ` Kenichi Handa
2003-02-03  9:22                                   ` Miles Bader
2003-02-03  9:37                                     ` Jim Meyering
2003-02-03 11:00                                   ` Andreas Schwab
2003-02-03 11:17                                     ` Kenichi Handa
2003-02-13 13:58                                       ` Dave Love
2003-02-17  6:19                                         ` Kenichi Handa
2003-02-03 17:47                           ` Dave Love
2003-02-03 17:44                         ` Dave Love
2003-02-03 18:45                           ` Michael Livshin
2003-02-03 19:13                             ` Eli Zaretskii
2003-02-03  9:37                       ` Jim Meyering
2003-02-03 17:20                       ` Richard Stallman
2003-02-03 18:53                         ` Andreas Schwab
2003-01-27 10:56             ` Andreas Schwab
2003-01-27 13:35               ` Jim Meyering
2003-01-24  5:42     ` Richard Stallman
