From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: dired doesn't work properly with a multibyte locale
Date: Wed, 15 Jan 2003 19:43:55 +0900 (JST)
Message-ID: <200301151043.TAA09856@etlken.m17n.org>
Original-To: miles@gnu.org
Cc: emacs-devel@gnu.org
Original-cc: emacs-pretest-bug@gnu.org
In-reply-to: (message from Miles Bader on 06 Jan 2003 15:04:26 +0900)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII

Sorry for the late reply.

In article , Miles Bader writes:
> I'm now using a multibyte locale (LANG=ja_JP.eucJP), and dired is
> screwed up: it can't properly find filenames in the directory listing.
> The reason seems to be that dired uses `ls --dired', which encodes the
> positions of filenames as byte-offsets into the ls output.  However, my
> system's `ls' program sees the non-C LANG, and so the `total' line at the
> beginning of the ls output is now a multibyte-encoded word.  Emacs decodes
> this fine, but the number of characters in the decoded word is _not_ the
> same as the number of bytes in the original ls output, so all the offsets
> from --dired are wrong.  [note that if there are multibyte-encoded
> filenames, the offsets will get screwed up further later in the listing]
> It doesn't seem simple to get the byte offset information, so perhaps the
> best thing to do is simply not use --dired if `file-name-coding-system' is
> a multibyte encoding.  That change is simple to make in dired (and I just
> manually set `dired-use-ls-dired' to nil), but I'm not sure how to tell if
> a particular coding system is multibyte or not.  It'd be nice if there was
> a function like `coding-system-multibyte-p'...

Even if we have such a function, it's very hard to correct
the byte offset information for a multibyte coding system.

Miles Bader writes:
> On Sat, Jan 11, 2003 at 03:00:12PM -0500, Stefan Monnier wrote:
>> > It doesn't seem simple to get the byte offset
>> > information, so perhaps the best thing to do is simply
>> > not use --dired if `file-name-coding-system' is a
>> > multibyte encoding.  That change is simple to make in
>> > dired (and I just manually set `dired-use-ls-dired' to
>> > nil), but I'm not sure how to tell if a particular
>> > coding system is multibyte or not.  It'd be nice if
>> > there was a function like
>> > `coding-system-multibyte-p'...
>>
>> The other solution is to get "ls --dired" output with a "binary"
>> coding system, then use the byte-offsets to add text-properties, and
>> then do the decode-coding-region.

Yes.  I think that is the correct fix.

> Won't the decode-coding-region smash all the text-properties?

It surely removes all text properties.  But, we can preserve
the text-property `dired-filename' by decoding one bunch by
one.

Could you please try the attached patch?  I have not yet
installed it because I don't have such a system at hand and
can't test it.

---
Ken'ichi HANDA
handa@m17n.org

2003-01-15  Kenichi Handa

        * files.el (insert-directory): Read the output of "ls" by
        no-conversion, and decode it later while preserving
        `dired-filename' property.

*** files.el.~1.630.~   Wed Jan 15 13:12:22 2003
--- files.el    Wed Jan 15 17:44:45 2003
***************
*** 4017,4028 ****
            ;; Read the actual directory using `insert-directory-program'.
            ;; RESULT gets the status code.
!           (let* ((coding-system-for-read
                    (and enable-multibyte-characters
                         (or file-name-coding-system
!                            default-file-name-coding-system)))
!                  ;; This is to control encoding the arguments in call-process.
!                  (coding-system-for-write coding-system-for-read))
              (setq result
                    (if wildcard
                        ;; Run ls in the directory part of the file pattern
--- 4017,4031 ----
            ;; Read the actual directory using `insert-directory-program'.
            ;; RESULT gets the status code.
!           (let* (;; We at first read by no-conversion, then after
!                  ;; putting text property `dired-filename, decode one
!                  ;; bunch by one to preserve that property.
!                  (coding-system-for-read 'no-conversion)
!                  ;; This is to control encoding the arguments in call-process.
!                  (coding-system-for-write
                    (and enable-multibyte-characters
                         (or file-name-coding-system
!                            default-file-name-coding-system))))
              (setq result
                    (if wildcard
                        ;; Run ls in the directory part of the file pattern
***************
*** 4105,4110 ****
--- 4108,4130 ----
                (goto-char end)
                (beginning-of-line)
                (delete-region (point) (progn (forward-line 2) (point)))))
+ 
+           ;; Now decode what read if necessary.
+           (let ((coding (or coding-system-for-write
+                             (detect-coding-region beg (point) t)))
+                 val pos)
+             (if (not (eq (coding-system-base coding) 'undecided))
+                 (save-restriction
+                   (narrow-to-region beg (point))
+                   (goto-char (point-min))
+                   (while (not (eobp))
+                     (setq pos (point)
+                           val (get-text-property (point) 'dired-filename))
+                     (goto-char (next-single-property-change
+                                 (point) 'dired-filename nil (point-max)))
+                     (decode-coding-region pos (point) coding)
+                     (if val
+                         (put-text-property pos (point) 'dired-filename t))))))
            (if full-directory-p
                ;; Try to insert the amount of free space.