From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Strange behaviour with dired and UTF8
Date: Wed, 7 May 2003 10:08:23 +0900 (JST)
Message-ID: <200305070108.KAA22606@etlken.m17n.org>
References: <6129D384-7FED-11D7-81D0-00039363E640@swipnet.se>
In-reply-to: <6129D384-7FED-11D7-81D0-00039363E640@swipnet.se> (jan.h.d@swipnet.se)
Original-To: jan.h.d@swipnet.se
Cc: emacs-devel@gnu.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
List-Id: Emacs development discussions.
Xref: main.gmane.org gmane.emacs.devel:13728

In article <6129D384-7FED-11D7-81D0-00039363E640@swipnet.se>, "Jan D." writes:

>>> I agree that this is bad, but I am not sure anything can be done
>>> about it.
>>
>> How about my proposal?  Doesn't it solve this problem?

> It depends on what the file-name-coding-system-alist looks like.  If it
> contains full file name path, it could.  Maybe it is best to try it.

It should contain a regular expression matching a directory
or a file name.

> I think it is bad to have multiple information sources that have to
> be consulted to figure out the original file name (the display file
> name, the buffer encoding, file system encoding, and the new alist).
> At some point Emacs must have had the original file name.  It is a
> shame to throw away that knowledge and then try to reconstruct it.

Unless we have a mechanism to always keep that knowledge, it
is not reliable.  For instance, even if we keep the original
filename as a text property of a filename string, a filename
string may be modified in various ways, which makes the
property value obsolete.
And I don't know whether the names listed in the *Completion*
buffer can keep that property.

So, I think keeping the information about the original
filename in an alist is the most reliable way.  In addition,
we can use that information in future Emacs sessions, which
is also an important point.

> Another approach would be to always keep file names as is (i.e.
> the original file name) and put some sort of property on it that is the
> encoding.  This would require that the display engine can display these
> with the right encoding.  That way the manipulations are always done on
> and with the original file name.

I strongly oppose that method.  Emacs should not work on
undecoded raw bytes.  A filename is a kind of text, and thus
a user should be able to handle it as text (edit,
copy&paste, etc).

>>> I am not sure your case covers all cases.  If a file name was
>>> latin-1 and then converted to UTF8 (outside Emacs), Emacs would think
>>> it is still latin-1, no?
>>> It involves a bit of user interaction, making it intrusive.
>>
>> Yes, but I think Emacs doesn't have to care about such a
>> case.

> Why not?  I think this is about as bad as the failure of the
> *Completion* buffer.  Maybe worse, because you can not open the file
> at all.

If that filename is recorded as latin-1 in
file-name-coding-system-alist, we can open that file by
customizing file-name-coding-system-alist.

If that filename is not recorded in the alist, we can open
that file by switching to the utf-8 language environment, or
by setting file-name-coding-system to utf-8, or by
customizing file-name-coding-system-alist.

---
Ken'ichi HANDA
handa@m17n.org
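
The lookup discussed in this message (an alist whose keys are regular
expressions matching a directory or a file name, and whose values are
coding systems) can be sketched roughly as follows.  This is only an
illustration in Python, not Emacs code: the table, the function names,
the sample paths and the fallback value are all hypothetical, and the
first-match-wins order is an assumption.

```python
import re

# Hypothetical stand-in for the proposed file-name-coding-system-alist:
# (regular expression, coding system) pairs, checked in order.
# The directory below is invented for illustration.
FILE_NAME_CODING_ALIST = [
    (re.compile(r"^/home/jan/latin1/"), "latin-1"),
]

# Plays the role of the global file-name-coding-system fallback (assumed).
DEFAULT_FILE_NAME_CODING = "utf-8"


def coding_for(name: str) -> str:
    """Return the coding system of the first entry whose regexp matches."""
    for pattern, coding in FILE_NAME_CODING_ALIST:
        if pattern.search(name):
            return coding
    return DEFAULT_FILE_NAME_CODING


def encode_file_name(name: str) -> bytes:
    """Turn a decoded (text) file name back into the bytes used on disk."""
    return name.encode(coding_for(name))


if __name__ == "__main__":
    print(encode_file_name("/home/jan/latin1/\u00e9.txt"))
    print(encode_file_name("/tmp/\u00e9.txt"))
```

In this sketch the file name stays ordinary text everywhere; the raw
bytes are only produced at the point where the file system is touched,
and correcting a wrong guess means editing the table rather than the
name.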