* Strange behaviour with dired and UTF8 @ 2003-04-24 11:43 Jan D. 2003-04-25 13:20 ` Kai Großjohann 2003-05-01 6:52 ` Kenichi Handa 0 siblings, 2 replies; 28+ messages in thread From: Jan D. @ 2003-04-24 11:43 UTC (permalink / raw) Hello. Maybe I am doing this wrong, but here is what I am trying to do. My language environment is ISO-8859-1. I have a directory that contains files with file names in UTF-8. I start dired on that directory. I want to see the UTF-8 characters, so I do C-x RET r utf-8. File names display OK now. But when trying to operate on a file, say opening it, I get "File no longer exists; type `g' to update Dired buffer". It seems that dired does not keep the original file name around, but tries to open the file using the displayed representation of the file name. When I type g, I lose the UTF-8 coding and the files are displayed as ISO-8859-1 again. Setting the buffer coding to UTF-8 does not help. Do I have to set file-name-coding-system to UTF-8? This solves the problem, but my file-name-coding-system is really ISO-8859-1; it is just this one directory that is UTF-8. Thanks, Jan D. ^ permalink raw reply [flat|nested] 28+ messages in thread
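The failure described above can be reproduced at the byte level. The following is a minimal Python sketch, not Emacs code; the file name is a made-up example, and the three steps are an assumed model of what dired does (decode the listing for display, then encode the displayed name with the file-name coding system).

```python
# Bytes as stored on disk: a hypothetical name, encoded in UTF-8.
on_disk = "blåbär.txt".encode("utf-8")

# Display step, as after C-x RET r utf-8: the listing is decoded as UTF-8.
shown = on_disk.decode("utf-8")

# File operation: the displayed name is encoded again, but with the
# file-name coding system, which is ISO-8859-1 in this setup.
requested = shown.encode("iso-8859-1")

# The bytes handed to the file system differ from the bytes on disk,
# so the lookup fails even though the name looks right on screen.
print(on_disk)
print(requested)
print(requested == on_disk)
```

The name shown to the user is correct, but the byte sequence used for the lookup is not the one stored in the directory.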
* Re: Strange behaviour with dired and UTF8 2003-04-24 11:43 Strange behaviour with dired and UTF8 Jan D. @ 2003-04-25 13:20 ` Kai Großjohann 2003-05-01 6:52 ` Kenichi Handa 1 sibling, 0 replies; 28+ messages in thread From: Kai Großjohann @ 2003-04-25 13:20 UTC (permalink / raw) "Jan D." <jan.h.d@swipnet.se> writes: > But when trying to operate on a file, say opening it, I get > "File no longer exists; type `g' to update Dired buffer" > It seems that dired does not keep the original file name around, but > tries to open with the display name representation of the file name. Yeah, it seems that's how dired operates: it inserts the output from "ls -l" into the buffer and then does operations on that buffer to find the file name and suchlike. Hm. And the "ls -l" output contains not only the file names, it also contains the dates. I was going to suggest to have dired-find-file bind file-name-coding-system to the value used for reading the "ls -l" output, but that will break when the date and the file names use different encodings. -- file-error; Data: (Opening input file no such file or directory ~/.signature) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Strange behaviour with dired and UTF8 2003-04-24 11:43 Strange behaviour with dired and UTF8 Jan D. 2003-04-25 13:20 ` Kai Großjohann @ 2003-05-01 6:52 ` Kenichi Handa 2003-05-02 6:41 ` Kai Großjohann 2003-05-02 8:16 ` Jan D. 1 sibling, 2 replies; 28+ messages in thread From: Kenichi Handa @ 2003-05-01 6:52 UTC (permalink / raw) Cc: emacs-devel In article <200304241235.h3OCZdbL023178@stubby.bodenonline.com>, "Jan D." <jan.h.d@swipnet.se> writes: > Maybe I am doing this wrong, but here is what I try to do. > My language environment is ISO-8859-1. > I have a directory that contains files with file names in UTF-8. > I start dired on that directory. I want to see the UTF-8 characters > so I do C-x RET r utf-8. File names display OK now. > But when trying to operate on a file, say opening it, I get > "File no longer exists; type `g' to update Dired buffer" > It seems that dired does not keep the original file name around, but > tries to open with the display name representation of the file name. > When I type g, I loose the UTF-8 coding and files are now displayed > as ISO-8859-1 again. Setting buffer coding to UTF-8 does not help. > Do I have to set file-name-coding-system to UTF-8? This solves the > problem, but my file-name-coding-system is really ISO-8859-1, it is > just this one directory that is UTF-8. The current Emacs doesn't have a facility to cope with such a situation well. How about this? (1) Make a customizable variable file-name-coding-system-alist; the format is the same as file-coding-system-alist. (2) Make the macro ENCODE_FILE and DECODE_FILE to check that variable before using file-name-coding-system and default-file-name-coding-system. (3) Enhance the function dired-revert to update file-name-coding-system-alist automatically if it is called with coding-system-for-read being bound to non-nil. In that case, it may also have to ask a user to save that modification for the future session (via customize). What do people think? 
Is there any better idea? --- Ken'ichi HANDA handa@m17n.org
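Steps (1) and (2) of the proposal can be sketched in Python as follows. This is only an illustration under stated assumptions: the constants stand in for the Emacs variables named in the proposal, the directory pattern is invented, and `coding_for`, `encode_file` and `decode_name` are hypothetical helpers, not the ENCODE_FILE and DECODE_FILE macros themselves.

```python
import re

# Stand-in for file-name-coding-system (assumption: ISO-8859-1 default).
FILE_NAME_CODING_SYSTEM = "iso-8859-1"

# Stand-in for the proposed file-name-coding-system-alist:
# (regular expression, coding system) pairs. The path is invented.
FILE_NAME_CODING_SYSTEM_ALIST = [
    (r"^/net/share/utf8/", "utf-8"),
]


def coding_for(path):
    """Return the coding system for PATH: first matching entry, else default."""
    for pattern, coding in FILE_NAME_CODING_SYSTEM_ALIST:
        if re.search(pattern, path):
            return coding
    return FILE_NAME_CODING_SYSTEM


def encode_file(path):
    """Encode a decoded file name for the file system (hypothetical helper)."""
    return path.encode(coding_for(path))


def decode_name(directory, raw_name):
    """Decode a raw name read from DIRECTORY (given as a decoded string)."""
    return raw_name.decode(coding_for(directory))


print(encode_file("/home/jan/å.txt"))
print(encode_file("/net/share/utf8/å.txt"))
```

With such a table, both `f' and C-x C-f would go through the same lookup, which is the point raised later in the thread.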
* Re: Strange behaviour with dired and UTF8 2003-05-01 6:52 ` Kenichi Handa @ 2003-05-02 6:41 ` Kai Großjohann 2003-05-02 8:16 ` Jan D. 1 sibling, 0 replies; 28+ messages in thread From: Kai Großjohann @ 2003-05-02 6:41 UTC (permalink / raw) Kenichi Handa <handa@m17n.org> writes: > What do people think? Aren't there any better idea? Your idea sounds good to me. Automatically saving the changes is potentially dangerous¹, but oh, well. ¹ Makes it easy for the user to shoot themselves in the foot: put setq statements for file-name-coding-system-alist after custom-set-variables, bam! -- file-error; Data: (Opening input file no such file or directory ~/.signature) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Strange behaviour with dired and UTF8 2003-05-01 6:52 ` Kenichi Handa 2003-05-02 6:41 ` Kai Großjohann @ 2003-05-02 8:16 ` Jan D. 2003-05-02 8:56 ` Kenichi Handa 1 sibling, 1 reply; 28+ messages in thread From: Jan D. @ 2003-05-02 8:16 UTC (permalink / raw) Cc: emacs-devel > In article <200304241235.h3OCZdbL023178@stubby.bodenonline.com>, "Jan > D." <jan.h.d@swipnet.se> writes: >> Maybe I am doing this wrong, but here is what I try to do. >> My language environment is ISO-8859-1. >> I have a directory that contains files with file names in UTF-8. >> I start dired on that directory. I want to see the UTF-8 characters >> so I do C-x RET r utf-8. File names display OK now. > >> But when trying to operate on a file, say opening it, I get >> "File no longer exists; type `g' to update Dired buffer" >> It seems that dired does not keep the original file name around, but >> tries to open with the display name representation of the file name. > >> When I type g, I loose the UTF-8 coding and files are now displayed >> as ISO-8859-1 again. Setting buffer coding to UTF-8 does not help. > >> Do I have to set file-name-coding-system to UTF-8? This solves the >> problem, but my file-name-coding-system is really ISO-8859-1, it is >> just this one directory that is UTF-8. > > The current Emacs doesn't have a facility to cope with such > a situation well. > > How about this? > > (1) Make a customizable variable > file-name-coding-system-alist; the format is the same as > file-coding-system-alist. > > (2) Make the macro ENCODE_FILE and DECODE_FILE to check that > variable before using file-name-coding-system and > default-file-name-coding-system. > > (3) Enhance the function dired-revert to update > file-name-coding-system-alist automatically if it is > called with coding-system-for-read being bound to > non-nil. In that case, it may also have to ask a user > to save that modification for the future session (via > customize). > > What do people think? 
Is there any better idea? This sounds very complicated. As I understand it, dired first gets the file name from ls (original representation), then converts that to whatever encoding it should use to show it in the buffer (view representation). When dired operates on the file (opening it, for example), it converts back from the view representation, hoping to get the original representation. But this may fail, since the conversion from view back to original is not one-to-one. This round trip (original representation -> view representation -> original representation) should not be needed, IMHO. Why not just keep the original representation around (some kind of text property on the file name?) and always use that when operating on the file? That change would be transparent to users. I do not know how dired works, but I think separating the original representation from the view representation would make it easier for dired to use any encoding to view the files. Jan D.
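The separation suggested here can be sketched in Python. This is an assumed model only: `DiredEntry`, `make_entry` and `name_for_operation` are hypothetical names, and the `raw` field plays the role of the text property mentioned in the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiredEntry:
    raw: bytes      # name exactly as the file system returned it
    display: str    # decoded form shown in the buffer


def make_entry(raw, view_coding):
    """Build an entry; undecodable bytes are replaced in the display only."""
    return DiredEntry(raw=raw, display=raw.decode(view_coding, errors="replace"))


def name_for_operation(entry):
    """File operations always use the original bytes, never the display."""
    return entry.raw


raw = "å.txt".encode("utf-8")
as_utf8 = make_entry(raw, "utf-8")
as_latin1 = make_entry(raw, "iso-8859-1")

# The view changes with the chosen coding, the operand does not.
print(as_utf8.display, as_latin1.display)
print(name_for_operation(as_utf8) == name_for_operation(as_latin1))
```

Switching the view coding only changes `display`, so the name used to open the file never depends on how the listing was decoded.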
* Re: Strange behaviour with dired and UTF8 2003-05-02 8:16 ` Jan D. @ 2003-05-02 8:56 ` Kenichi Handa 2003-05-02 9:59 ` Jan D. 2003-05-03 15:03 ` Richard Stallman 0 siblings, 2 replies; 28+ messages in thread From: Kenichi Handa @ 2003-05-02 8:56 UTC (permalink / raw) Cc: emacs-devel In article <6DDE98F0-7C76-11D7-8080-00039363E640@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes: >> How about this? >> >> (1) Make a customizable variable >> file-name-coding-system-alist; the format is the same as >> file-coding-system-alist. >> >> (2) Make the macro ENCODE_FILE and DECODE_FILE to check that >> variable before using file-name-coding-system and >> default-file-name-coding-system. >> >> (3) Enhance the function dired-revert to update >> file-name-coding-system-alist automatically if it is >> called with coding-system-for-read being bound to >> non-nil. In that case, it may also have to ask a user >> to save that modification for the future session (via >> customize). >> >> What do people think? Aren't there any better idea? > This sounds very complicated. As I understand it, dired first gets > the file name from ls (original representation), then converts that to > whatever encoding it shall use to show it in the buffer (view > representation). When dired operates on the file (opening for example), > it converts back from the view representation, hoping to get the > original representation. But this may fail, since conversion > from view back to original is not one-to-one. It is sure that there's a possibility that encoding a filename can't get the original filename. But, Emacs anyway can't handle such a filename. > This work (original representation -> view representation -> original > representation) should not be needed, IMHO. Why just not keep the > original representation around (some kind of text property on the file > name?) and always use that when operating on the file? That change > would be transparent to users. 
A user may type C-x C-f FILENAME in the dired buffer. With the above method, we don't know how to encode FILENAME. And even if one types `f' to visit a file, in that file buffer we lose the information about the original representation. --- Ken'ichi HANDA handa@m17n.org
* Re: Strange behaviour with dired and UTF8 2003-05-02 8:56 ` Kenichi Handa @ 2003-05-02 9:59 ` Jan D. 2003-05-02 11:22 ` Kenichi Handa 2003-05-03 15:03 ` Richard Stallman 1 sibling, 1 reply; 28+ messages in thread From: Jan D. @ 2003-05-02 9:59 UTC (permalink / raw) Cc: emacs-devel >> This sounds very complicated. As I understand it, dired first gets >> the file name from ls (original representation), then converts that to >> whatever encoding it shall use to show it in the buffer (view >> representation). When dired operates on the file (opening for >> example), >> it converts back from the view representation, hoping to get the >> original representation. But this may fail, since conversion >> from view back to original is not one-to-one. > > It is sure that there's a possibility that encoding a > filename can't get the original filename. But, Emacs anyway > can't handle such a filename. Why not if it has the original filename? >> This work (original representation -> view representation -> original >> representation) should not be needed, IMHO. Why just not keep the >> original representation around (some kind of text property on the file >> name?) and always use that when operating on the file? That change >> would be transparent to users. > > A user may type C-x C-f FILENAME in the dired buffer. With > the above method, we don't know how to encode FILENAME. Why would this change? I am only talking about file names that dired reads from a directory. No need to change C-x C-f. > And, even if one types `f' to visit a file, in that file > buffer, we loose the information of the original > representation. Then Emacs as a whole should change. If I open a file from dired, modify it and save it, I expect it to save to the same file name. Are you saying there are situations where Emacs fails to do this? That sounds like a major bug to me. Maybe the buffer itself also needs to keep the original file name around. Jan D. 
* Re: Strange behaviour with dired and UTF8 2003-05-02 9:59 ` Jan D. @ 2003-05-02 11:22 ` Kenichi Handa 2003-05-02 12:44 ` Jan D. 0 siblings, 1 reply; 28+ messages in thread From: Kenichi Handa @ 2003-05-02 11:22 UTC (permalink / raw) Cc: emacs-devel In article <C667D673-7C84-11D7-B30E-00039363E640@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes: >> It is sure that there's a possibility that encoding a >> filename can't get the original filename. But, Emacs anyway >> can't handle such a filename. > Why not if it has the original filename? I'm talking about the general situation, not restricted to dired. I think this problem must be fixed in the general case, not only for dired. Always carrying the original filename around together with the decoded filename is one way to do that, but it requires a huge change to Emacs. In addition, there are many places that modify a filename as a string. >>> This work (original representation -> view representation -> original >>> representation) should not be needed, IMHO. Why just not keep the >>> original representation around (some kind of text property on the file >>> name?) and always use that when operating on the file? That change >>> would be transparent to users. >> >> A user may type C-x C-f FILENAME in the dired buffer. With >> the above method, we don't know how to encode FILENAME. > Why would this change? I am only talking about file names that dired > reads from a directory. No need to change C-x C-f. Then typing `f' works fine but C-x C-f doesn't, which is not good behaviour. >> And, even if one types `f' to visit a file, in that file >> buffer, we loose the information of the original >> representation. > Then Emacs as a whole should change. Yes, my proposal is to change Emacs' filename handling as a whole, at a fairly low cost. > If I open a file from dired, modify it and save it, I > expect it to save to the same file name. Are you saying > there are situations where Emacs fails to do this? No.
As far as I know, there's no system that allows stateful encoding of filenames. And if Emacs decodes a filename with one of the stateless coding systems (whether or not it is the correct one), the result can be encoded back correctly with the same coding system. For instance, I think you can open and save a file with a utf-8 name in a latin-1 lang. env. in dired correctly (although the filename is not shown correctly). By the way, I've just thought of this weird situation: one has a file with a utf-8 name in a directory with a latin-1 name. :-( I think we can say sorry in such a case. --- Ken'ichi HANDA handa@m17n.org
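The round-trip claim for the latin-1 case can be checked with a short Python sketch. It only covers ISO-8859-1 as implemented by Python's codec, which maps every byte value to exactly one character; it says nothing about other coding systems or about Emacs internals.

```python
# Every possible byte value survives decode + encode with ISO-8859-1.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes

# A UTF-8 name viewed through the "wrong" coding system: it looks
# garbled, but encoding it back with the same coding system restores
# the original bytes, so the file can still be opened and saved.
utf8_name = "å.txt".encode("utf-8")
wrong_view = utf8_name.decode("iso-8859-1")
round_trip = wrong_view.encode("iso-8859-1")

print(lossless)
print(wrong_view)
print(round_trip == utf8_name)
```

The breakage in the original report comes from decoding with one coding system (utf-8, for display) and encoding with another (latin-1, for the file operation), not from the latin-1 round trip itself.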
* Re: Strange behaviour with dired and UTF8 2003-05-02 11:22 ` Kenichi Handa @ 2003-05-02 12:44 ` Jan D. 2003-05-03 15:03 ` Richard Stallman ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Jan D. @ 2003-05-02 12:44 UTC (permalink / raw) Cc: emacs-devel > >> A user may type C-x C-f FILENAME in the dired buffer. With > >> the above method, we don't know how to encode FILENAME. > > > Why would this change? I am only talking about file names that dired > > reads from a directory. No need to change C-x C-f. > > Typing `f' works fine but C-x C-f doesn't, which is not a > good behaviour. I think I understand now. You mean if dired uses UTF8, and file system coding is Latin-1, C-x C-f would then use Latin-1, and possibly fail? I agree that this is bad, but I am not sure anything can be done about it. Both KDE and GNOME file managers and file dialogs fail to open the right file in certain cases. I think it is worse if dired fails on 'f' since in that case the file name is supplied by dired, not the user. For C-x C-f there is always TAB to see what Emacs thinks the file is called. > > >> And, even if one types `f' to visit a file, in that file > >> buffer, we loose the information of the original > >> representation. > > > Then Emacs as a whole should change. > > Yes, my proposal is to change Emacs' behavior as to filename > handing as a whole in a fairly low cost. > I am not sure your case covers all cases. If a file name was latin-1 and then converted to UTF8 (outside Emacs), Emacs would think it is still latin-1, no? It involves a bit of user interaction, making it intrusive. > By the way, I've just thought of this weird situation. One > has a file of utf-8 name in a directly of latin-1 name. :-( > I think we can say sorry in such a case. But then you would be using non-printable latin-1 characters. I don't think this is something one has to handle. Jan D. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Strange behaviour with dired and UTF8 2003-05-02 12:44 ` Jan D. @ 2003-05-03 15:03 ` Richard Stallman 2003-05-03 18:04 ` Jan D. 2003-05-03 15:59 ` Stephen J. Turnbull 2003-05-05 9:20 ` Kenichi Handa 2 siblings, 1 reply; 28+ messages in thread From: Richard Stallman @ 2003-05-03 15:03 UTC (permalink / raw) Cc: handa I think I understand now. You mean if dired uses UTF8, and file system coding is Latin-1, Why would Dired use UTF8 if the file name encoding is Latin-1? Is this because the user set up perverse settings? Or is there some natural, normal set of options for which this would occur? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Strange behaviour with dired and UTF8 2003-05-03 15:03 ` Richard Stallman @ 2003-05-03 18:04 ` Jan D. 2003-05-05 14:32 ` Richard Stallman 0 siblings, 1 reply; 28+ messages in thread From: Jan D. @ 2003-05-03 18:04 UTC (permalink / raw) Cc: emacs-devel On Saturday, 3 May 2003, at 17:03, Richard Stallman wrote: > I think I understand now. You mean if dired uses UTF8, and file system > coding is Latin-1, > > Why would Dired use UTF8 if the file name encoding is Latin-1? > Is this because the user set up perverse settings? > Or is there some natural, normal set of options > for which this would occur? The situation I have is that there are directories with file names in different encodings. Latin-1 is the most frequent, which is why I say the file name encoding is latin-1. But some directories contain other encodings, UTF-8 among them. Some of these are on network file systems, so I have no control over them. But I would like to be able to view them in Emacs. I guess UTF-8 will win out in the end, but there are a lot of old systems around. Jan D.
* Re: Strange behaviour with dired and UTF8 2003-05-03 18:04 ` Jan D. @ 2003-05-05 14:32 ` Richard Stallman 2003-05-07 15:51 ` Jan D. 0 siblings, 1 reply; 28+ messages in thread From: Richard Stallman @ 2003-05-05 14:32 UTC (permalink / raw) Cc: emacs-devel The situiation I have is that there are directories with file names in different encodings. Latin-1 is most frequent, which is why I say file name encoding is latin-1. But some directories contain other encodings, UTF-8 among them. Perhaps what we should do is record the proper coding system to use for a given buffer's file name string. That way, when you visit a buffer from a directory whose names are UTF-8 encoded, the buffer will say "use UTF-8 to encode my file name." We could also conceivably record this info in the file-name string itself; but I have a bad feeling that that will lead to some sort of incoherence that I cannot see at present. ^ permalink raw reply [flat|nested] 28+ messages in thread
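The per-buffer idea above might look roughly like this in Python. It is a sketch under assumptions: `Buffer`, `visit` and `name_for_save` are hypothetical names, and the model ignores everything about a buffer except its file name and the coding system recorded for it.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    file_name: str           # decoded name, editable as ordinary text
    file_name_coding: str    # "use this coding to encode my file name"


def visit(raw_name, directory_coding):
    """Visit a file found in a directory whose names use DIRECTORY_CODING."""
    return Buffer(raw_name.decode(directory_coding), directory_coding)


def name_for_save(buf):
    """Encode the buffer's file name with the coding recorded at visit time."""
    return buf.file_name.encode(buf.file_name_coding)


raw = "å.txt".encode("utf-8")
buf = visit(raw, "utf-8")
print(name_for_save(buf) == raw)

# The name stays plain text, so it can be edited; the recorded coding
# still decides how the new name is written to the file system.
buf.file_name = "ä.txt"
print(name_for_save(buf))
```

Unlike keeping raw bytes, this keeps the file name as text and stores only the coding system next to it.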
* Re: Strange behaviour with dired and UTF8 2003-05-05 14:32 ` Richard Stallman @ 2003-05-07 15:51 ` Jan D. 2003-05-07 16:09 ` Stefan Monnier 0 siblings, 1 reply; 28+ messages in thread From: Jan D. @ 2003-05-07 15:51 UTC (permalink / raw) Cc: handa > The situation I have is that there are directories with file names in > different encodings. Latin-1 is most frequent, which is why I say > file name encoding is latin-1. But some directories contain other > encodings, UTF-8 among them. > > Perhaps what we should do is record the proper coding system to use > for a given buffer's file name string. That way, when you visit a > buffer from a directory whose names are UTF-8 encoded, the > buffer will say "use UTF-8 to encode my file name." This is basically Handa's proposal. > We could also conceivably record this info in the file-name string > itself; but I have a bad feeling that that will lead to some sort > of incoherence that I cannot see at present. This is basically my proposal. I think Handa's proposal is easier to implement. Jan D.
* Re: Strange behaviour with dired and UTF8 2003-05-07 15:51 ` Jan D. @ 2003-05-07 16:09 ` Stefan Monnier 2003-05-09 11:19 ` Richard Stallman 0 siblings, 1 reply; 28+ messages in thread From: Stefan Monnier @ 2003-05-07 16:09 UTC (permalink / raw) Cc: handa I don't exactly understand Handa's proposal, so could someone explain to me how it handles a situation such as /<foo>/<bar>, where <foo> is encoded in latin-1 and <bar> in utf-8? Stefan
* Re: Strange behaviour with dired and UTF8 2003-05-07 16:09 ` Stefan Monnier @ 2003-05-09 11:19 ` Richard Stallman 0 siblings, 0 replies; 28+ messages in thread From: Richard Stallman @ 2003-05-09 11:19 UTC (permalink / raw) Cc: emacs-devel > I don't exactly understand the Handa's proposal, so could someone > explain to me how it handles a situation such as /<foo>/<bar> > where <foo> is encoded in latin-1 and <bar> in utf-8 ? If you literally mean that the absolute file name in the file system consists of a Latin-1 part and a UTF-8 part, my first reaction would have been "give up". But it occurs to me that if Emacs decodes the components one by one, it might be able to handle this case correctly without too much work. Re-encoding such names is more difficult. I think the only possible method is to record the proper coding system in text properties in the string. We would have to make expand-file-name preserve these properties when it makes sense; likewise other functions that operate on file names. It adds up to a fair amount of work--not impossible, but perhaps not worth the trouble. > I mean, of course Emacs should do better than the rest of the crowd, > but if most/all other applications fail miserably, then it's unlikely > that people will use such setups and it would be wrong for Emacs > to make it easier to create such a setup I agree with that point.
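Decoding the components one by one, and recording a coding system per component so the name can be re-encoded, can be sketched in Python as below. The detection rule (try UTF-8 first, fall back to ISO-8859-1) is an assumption made for this example and is not something the message specifies; the path is invented, and the per-component tuples stand in for the text properties mentioned above.

```python
from typing import List, Tuple


def decode_component(raw: bytes) -> Tuple[str, str]:
    """Decode one path component; return (text, coding used).

    Assumed heuristic: valid UTF-8 is taken as UTF-8, anything
    else is taken as ISO-8859-1.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


def decode_path(raw_path: bytes) -> List[Tuple[str, str]]:
    """Decode each '/'-separated component on its own."""
    return [decode_component(part) for part in raw_path.split(b"/")]


def encode_path(parts: List[Tuple[str, str]]) -> bytes:
    """Re-encode using the coding system recorded for each component."""
    return b"/".join(text.encode(coding) for text, coding in parts)


# A latin-1 directory name containing a UTF-8 file name.
raw_path = b"/" + "färg".encode("iso-8859-1") + b"/" + "å.txt".encode("utf-8")
parts = decode_path(raw_path)
print(parts)
print(encode_path(parts) == raw_path)
```

The re-encoding step only works because the coding system travels with each component, which is the bookkeeping the message describes as a fair amount of work.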
* Re: Strange behaviour with dired and UTF8 2003-05-02 12:44 ` Jan D. 2003-05-03 15:03 ` Richard Stallman @ 2003-05-03 15:59 ` Stephen J. Turnbull 2003-05-03 17:59 ` Jan D. 2003-05-05 9:20 ` Kenichi Handa 2 siblings, 1 reply; 28+ messages in thread From: Stephen J. Turnbull @ 2003-05-03 15:59 UTC (permalink / raw) Cc: Kenichi Handa >>>>> "Jan" == Jan D <jan.h.d@swipnet.se> writes: Jan> But then you would be using non-printable latin-1 characters. That's impossible. By definition, all Latin 1 is printable. Jan> I don't think this is something one has to handle. Maybe not you. Emacs has high standards, though. :-) -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Strange behaviour with dired and UTF8 2003-05-03 15:59 ` Stephen J. Turnbull @ 2003-05-03 17:59 ` Jan D. 0 siblings, 0 replies; 28+ messages in thread From: Jan D. @ 2003-05-03 17:59 UTC (permalink / raw) Cc: emacs-devel On Saturday, 3 May 2003, at 17:59, Stephen J. Turnbull wrote: >>>>>> "Jan" == Jan D <jan.h.d@swipnet.se> writes: > > Jan> But then you would be using non-printable latin-1 characters. > > That's impossible. By definition, all Latin 1 is printable. Then every implementation of isprint() is wrong :-). Characters 128-159 are not printable, I think. Nor are the non-printable characters in the ASCII subset. > Jan> I don't think this is something one has to handle. > > Maybe not you. Emacs has high standards, though. :-) Emacs would have to be able to read minds to do this correctly. It is possible to make a buffer contain characters that look fine when viewed as UTF-8, but Emacs cannot know whether the user actually wanted them to be latin-1. It is just an interpretation of how the octets should be viewed. That is why I would like to say to dired "Show me these file names interpreted as UTF-8" and then later "show me these file names interpreted as latin-1", and also be able to operate on the files. Jan D.
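The question of which code points are control characters can be checked against the Unicode character database, whose first 256 code points coincide with ISO-8859-1. This Python sketch uses the general category "Cc" as a stand-in for "not printable", which is an assumption; C's isprint() depends on the locale and may classify some characters differently. The helper names are hypothetical.

```python
import unicodedata


def latin1_controls():
    """Code points 0-255 whose Unicode general category is Cc (control)."""
    return [c for c in range(256) if unicodedata.category(chr(c)) == "Cc"]


def utf8_name_hits_controls(name):
    """True if NAME's UTF-8 bytes, read as ISO-8859-1, include a control."""
    viewed = name.encode("utf-8").decode("iso-8859-1")
    return any(unicodedata.category(ch) == "Cc" for ch in viewed)


controls = latin1_controls()
print(controls[:3], controls[-3:])

# Depending on the character, the UTF-8 bytes may or may not land in
# the 128-159 control range when read as ISO-8859-1.
print(utf8_name_hits_controls("\u00e5"))   # bytes C3 A5
print(utf8_name_hits_controls("\u0100"))   # bytes C4 80
```

So some UTF-8 names contain bytes that are control characters in latin-1, and others consist only of bytes that latin-1 treats as ordinary characters.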
* Re: Strange behaviour with dired and UTF8 2003-05-02 12:44 ` Jan D. 2003-05-03 15:03 ` Richard Stallman 2003-05-03 15:59 ` Stephen J. Turnbull @ 2003-05-05 9:20 ` Kenichi Handa 2003-05-06 18:05 ` Jan D. 2 siblings, 1 reply; 28+ messages in thread From: Kenichi Handa @ 2003-05-05 9:20 UTC (permalink / raw) Cc: emacs-devel In article <200305021336.h42DaHbN022640@stubby.bodenonline.com>, "Jan D." <jan.h.d@swipnet.se> writes: > I think I understand now. You mean if dired uses UTF8, and file system > coding is Latin-1, C-x C-f would then use Latin-1, and possibly fail? Yes. > I agree that this is bad, but I am not sure anything can be done > about it. How about my proposal? Doesn't it solve this problem? > Both KDE and GNOME file managers and file dialogs fail to open > the right file in certain cases. I think it is worse if dired fails on > 'f' since in that case the file name is supplied by dired, not the user. > For C-x C-f there is always TAB to see what Emacs thinks the file is called. But the *Completion* buffer doesn't show the correct file names, because the names are decoded as latin-1. How can one choose the file one wants? In addition, TAB says "[no match]" if one has already typed some non-ASCII characters. > I am not sure your case covers all cases. If a file name was > latin-1 and then converted to UTF8 (outside Emacs), Emacs would think it is > still latin-1, no? > It involves a bit of user interaction, making it intrusive. Yes, but I think Emacs doesn't have to care about such a case. --- Ken'ichi HANDA handa@m17n.org
* Re: Strange behaviour with dired and UTF8 2003-05-05 9:20 ` Kenichi Handa @ 2003-05-06 18:05 ` Jan D. 2003-05-07 1:08 ` Kenichi Handa 0 siblings, 1 reply; 28+ messages in thread From: Jan D. @ 2003-05-06 18:05 UTC (permalink / raw) Cc: emacs-devel >> I agree that this is bad, but I am not sure anything can be done >> about it. > > How about my proposal? Doesn't it solve this problem? It depends on what the file-name-coding-system-alist looks like. If it contains the full file name path, it could. Maybe it is best to try it. I think it is bad to have multiple information sources that have to be consulted to figure out the original file name (the display file name, the buffer encoding, the file system encoding, and the new alist). At some point Emacs must have had the original file name. It is a shame to throw away that knowledge and then try to reconstruct it. >> Both KDE and GNOME file managers and file dialogs fail to open >> the right file in certain cases. I think it is worse if dired fails on >> 'f' since in that case the file name is supplied by dired, not the user. >> For C-x C-f there is always TAB to see what Emacs thinks the file is called. > > But, *Completion* buffer doesn't show correct file names > because there are names encoded by latin-1. How one can > choose what he want? In addition, TAB says "[no match]" if > one has already typed some non-ASCII characters. Another approach would be to always keep file names as is (i.e. the original file name) and put some sort of property on them that gives the encoding. This would require that the display engine can display them with the right encoding. That way the manipulations are always done on and with the original file name. This is of course some work. >> I am not sure your case covers all cases. If a file name was >> latin-1 and then converted to UTF8 (outside Emacs), Emacs would think it is >> still latin-1, no? >> It involves a bit of user interaction, making it intrusive.
> > Yes, but I think Emacs doesn't have to care about such a > case. Why not? I think this is about as bad as the failure of the *Completion* buffer. Maybe worse, because you can not open the file at all. Jan D. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Strange behaviour with dired and UTF8 2003-05-06 18:05 ` Jan D. @ 2003-05-07 1:08 ` Kenichi Handa 2003-05-07 15:43 ` Jan D. 0 siblings, 1 reply; 28+ messages in thread From: Kenichi Handa @ 2003-05-07 1:08 UTC (permalink / raw) Cc: emacs-devel In article <6129D384-7FED-11D7-81D0-00039363E640@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes: >>> I agree that this is bad, but I am not sure anything can be done >>> about it. >> >> How about my proposal? Doesn't it solve this problem? > It depends on what the file-name-coding-system-alist looks like. If it > contains full file name path, it could. Maybe it is best to try it. It should contain a regular expression matching a directory or a file name. > I think it is bad to have multiple information sources that have to > be consulted to figure out the original file name (the display file > name, the buffer encoding, file system encoding, and the new alist). > At some point Emacs must have had the original file name. It is a > shame to throw away that knowledge and then try to reconstruct it. Unless we have a mechanism that always keeps that knowledge, it is not reliable. For instance, even if we keep the original filename as a text property of a filename string, the filename string may be modified in various ways, making the property value obsolete. And I don't know whether the names listed in the *Completion* buffer can keep that property. So I think keeping the information about the original filename in an alist is the most reliable way. In addition, we can use that information in future Emacs sessions, which is also an important point. > An other approach would be to always keep file names as is (i.e. > the original file name) and put some sort of property on it that is the > encoding. This would require that the display engine can display these > with right encoding. That way the manipulations is always done on and > with the original file name. I strongly oppose that method.
Emacs should not work on undecoded raw bytes. A filename is a kind of text, and thus a user should be able to handle it as text (edit, copy&paste, etc). >>> I am not sure your case covers all cases. If a file name was >>> latin-1 and then converted to UTF8 (outside Emacs), Emacs would think it is >>> still latin-1, no? >>> It involves a bit of user interaction, making it intrusive. >> >> Yes, but I think Emacs doesn't have to care about such a >> case. > Why not? I think this is about as bad as the failure of the > *Completion* buffer. Maybe worse, because you can not open the file > at all. If that filename is recorded as latin-1 in file-name-coding-system-alist, we can open that file by customizing file-name-coding-system-alist. If that filename is not recorded in the alist, we can open that file by switching to the utf-8 lang. env., by setting file-name-coding-system to utf-8, or by customizing file-name-coding-system-alist. --- Ken'ichi HANDA handa@m17n.org
* Re: Strange behaviour with dired and UTF8
  2003-05-07  1:08 ` Kenichi Handa
@ 2003-05-07 15:43 ` Jan D.
  0 siblings, 0 replies; 28+ messages in thread
From: Jan D. @ 2003-05-07 15:43 UTC
Cc: emacs-devel

> In article <6129D384-7FED-11D7-81D0-00039363E640@swipnet.se>, "Jan D."
> <jan.h.d@swipnet.se> writes:
>>>> I agree that this is bad, but I am not sure anything can be done
>>>> about it.
>>>
>>> How about my proposal?  Doesn't it solve this problem?
>
>> It depends on what the file-name-coding-system-alist looks like.  If it
>> contains the full file name path, it could.  Maybe it is best to try it.
>
> It should contain a regular expression matching a directory
> or a file name.

Can you give an example?

> So, I think keeping the information about the original
> filename in an alist is the most reliable way.  In addition,
> we can use that information in future Emacs sessions,
> which is also an important point.

Here the danger of the two unrelated information sources getting out
of sync is apparent.

> I strongly oppose that method.  Emacs should not work on
> undecoded raw bytes.  A filename is a kind of text, and thus
> a user should be able to handle it as text (edit,
> copy&paste, etc).

It is more than that: it is an identifier for an entity that is
external to Emacs.  Normal text is not that.  When using it as an
identifier, Emacs should work on undecoded raw bytes (it tries to do
that today, by converting back from the display representation to the
original representation).  There is nothing that prevents editing the
text.

>>>> I am not sure your case covers all cases.  If a file name was
>>>> latin-1 and then converted to UTF8 (outside Emacs), Emacs would
>>>> think it is still latin-1, no?
>>>> It involves a bit of user interaction, making it intrusive.
>>>
>>> Yes, but I think Emacs doesn't have to care about such a
>>> case.
>
>> Why not?  I think this is about as bad as the failure of the
>> *Completion* buffer.  Maybe worse, because you cannot open the file
>> at all.
>
> If that filename is recorded as latin-1 in
> file-name-coding-system-alist, we can open that file by
> customizing file-name-coding-system-alist.  If that filename
> is not recorded in the alist, we can open that file by
> switching to utf-8 lang. env., or by setting
> file-name-coding-system to utf-8, or by customizing
> file-name-coding-system-alist.

Who is "we" that is doing all this?  The user, Emacs, someone else?
It seems like a lot of user interaction, but maybe you have another
mechanism in mind?

	Jan D.
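One possible reading of the alist proposal discussed above, sketched in Python rather than Emacs Lisp. The table name, the directory paths and the helper functions are all hypothetical and only illustrate the idea of mapping a regular expression on the path to the encoding used for file names under it; this is not how Emacs implements anything.

```python
import re

# Hypothetical table: (regular expression matched against the path,
# encoding assumed for file names that match).  Paths are made up.
FILE_NAME_CODING_ALIST = [
    (re.compile(r"^/home/user/utf8-files/"), "utf-8"),
    (re.compile(r"^/mnt/old-archive/"), "latin-1"),
]

# Stands in for the single global file-name coding system.
DEFAULT_FILE_NAME_CODING = "latin-1"


def coding_for(path: str) -> str:
    """Return the encoding of the first matching entry, else the default."""
    for pattern, coding in FILE_NAME_CODING_ALIST:
        if pattern.search(path):
            return coding
    return DEFAULT_FILE_NAME_CODING


def encode_file_name(path: str) -> bytes:
    """Turn a displayed path back into bytes using the per-path encoding."""
    return path.encode(coding_for(path))


print(coding_for("/home/user/utf8-files/notes.txt"))
print(coding_for("/tmp/notes.txt"))
```

The sketch also shows the concern raised in the reply: the table is a second source of information next to the file system itself, so it is only as good as its entries, and someone has to maintain them.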
* Re: Strange behaviour with dired and UTF8
  2003-05-02  8:56 ` Kenichi Handa
  2003-05-02  9:59 ` Jan D.
@ 2003-05-03 15:03 ` Richard Stallman
  2003-05-03 18:11 ` Jan D.
  2003-05-06  5:39 ` Kenichi Handa
  1 sibling, 2 replies; 28+ messages in thread
From: Richard Stallman @ 2003-05-03 15:03 UTC
Cc: jan.h.d

It would be fundamentally clean to make sure that decoding of file
names is never many-one.  Is that possible?  Some of your messages
suggest it is already the case.
* Re: Strange behaviour with dired and UTF8
  2003-05-03 15:03 ` Richard Stallman
@ 2003-05-03 18:11 ` Jan D.
  2003-05-06  5:39 ` Kenichi Handa
  1 sibling, 0 replies; 28+ messages in thread
From: Jan D. @ 2003-05-03 18:11 UTC
Cc: Kenichi Handa

> It would be fundamentally clean to make sure that decoding of file
> names is never many-one.  Is that possible?  Some of your messages
> suggest it is already the case.

I don't think it is possible as long as Emacs only has one file system
encoding (file-name-coding-system).

The original problem is this:

  file-name-coding-system is latin-1.
  Open dired on a directory with UTF-8 file names.
  Do C-x RET r utf-8.
  Trying to operate on a file with non-ASCII characters gives
  "File no longer exists; type `g' to update Dired buffer"

This is because, when encoding the displayed file name back to bytes,
Emacs uses latin-1 and thus doesn't get the original file name back.
As long as there can be file names with different encodings, this
problem can occur.

	Jan D.
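The failure described above can be reproduced outside Emacs. A minimal sketch in Python, assuming a made-up file name; it only models the byte-level round trip (show the name as UTF-8, convert it back with latin-1), not dired itself.

```python
# Hypothetical file name; any name with non-ASCII characters behaves the same.
original_bytes = "sm\u00f6rg\u00e5s.txt".encode("utf-8")  # bytes stored on disk

# Step "C-x RET r utf-8": the listing is decoded as UTF-8 for display.
displayed = original_bytes.decode("utf-8")

# Operating on the file: the displayed name is converted back to bytes
# with the single global file-name coding system, here latin-1.
looked_up_bytes = displayed.encode("latin-1")

print(original_bytes)
print(looked_up_bytes)
# The two byte strings differ, so the lookup names a file that does not exist.
print(looked_up_bytes == original_bytes)
```

Keeping `original_bytes` alongside `displayed` and using it for the lookup avoids the mismatch, which is the "keep the original file name around" idea raised at the start of the thread.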
* Re: Strange behaviour with dired and UTF8
  2003-05-03 15:03 ` Richard Stallman
  2003-05-03 18:11 ` Jan D.
@ 2003-05-06  5:39 ` Kenichi Handa
  2003-05-06 14:41 ` Richard Stallman
  2003-05-07 15:49 ` Jan D.
  1 sibling, 2 replies; 28+ messages in thread
From: Kenichi Handa @ 2003-05-06 5:39 UTC
Cc: jan.h.d

In article <E19ByYD-0000t0-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> It would be fundamentally clean to make sure that decoding of file
> names is never many-one.  Is that possible?

For that, we must prevent file-name-coding-system from being set to a
coding system that does many-to-one decoding (e.g. iso-2022-jp).  But
we don't have a general mechanism to prevent a symbol from being bound
to a specific value.

> Some of your messages suggest it is already the case.

As far as I know, there's no system that allows a coding system that
does many-to-one decoding for filenames.  So, we don't have to care
about such a case.

---
Ken'ichi HANDA
handa@m17n.org
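To make "many-to-one decoding" concrete, here is a small Python sketch using Python's own `iso2022_jp` and `latin-1` codecs. It illustrates the property being discussed and says nothing about how Emacs implements these coding systems; the byte strings are made up.

```python
plain = b"abc"
# ESC ( B designates ASCII, which is already the initial state in
# ISO-2022-JP, so this escape sequence is redundant but still valid.
with_escape = b"\x1b(Babc"

text_a = plain.decode("iso2022_jp")
text_b = with_escape.decode("iso2022_jp")

# Two different byte sequences decode to one string: many-to-one.
print(plain != with_escape, text_a == text_b)

# Encoding yields a single canonical form, so the second byte
# sequence cannot be recovered from the decoded text.
print(text_b.encode("iso2022_jp") == with_escape)

# latin-1, by contrast, maps each of the 256 byte values to its own
# character, so every byte string survives a decode/encode round trip.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin-1").encode("latin-1") == all_bytes)
```

A file name stored with the redundant escape could therefore be displayed but not found again by re-encoding its displayed form, which is why such coding systems are a poor fit for file names.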
* Re: Strange behaviour with dired and UTF8
  2003-05-06  5:39 ` Kenichi Handa
@ 2003-05-06 14:41 ` Richard Stallman
  2003-05-07 15:49 ` Jan D.
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Stallman @ 2003-05-06 14:41 UTC
Cc: jan.h.d

    For that, we must prevent file-name-coding-system from being set
    to a coding system that does many-to-one decoding
    (e.g. iso-2022-jp).  But we don't have a general mechanism to
    prevent a symbol from being bound to a specific value.

That's one way to do it.  Another would be to refuse to use such a
value if the symbol does have it.  Another way is to discourage users
from using such coding systems.

    As far as I know, there's no system that allows a coding
    system that does many-to-one decoding for filenames.  So, we
    don't have to care about such a case.

It sounds like the third method has already been implemented.
That's good.
* Re: Strange behaviour with dired and UTF8
  2003-05-06  5:39 ` Kenichi Handa
  2003-05-06 14:41 ` Richard Stallman
@ 2003-05-07 15:49 ` Jan D.
  2003-05-07 16:31 ` Stefan Monnier
  1 sibling, 1 reply; 28+ messages in thread
From: Jan D. @ 2003-05-07 15:49 UTC
Cc: emacs-devel

> As far as I know, there's no system that allows a coding
> system that does many-to-one decoding for filenames.  So, we
> don't have to care about such a case.

I don't understand what you mean here.  We must be talking about
different things.

Say I have two files, one in UTF-8 and one in latin-1.  Emacs has only
one coding system for file names, say it is latin-1.  Now, since Emacs
only has one coding system, it assumes there is a one-to-one
correspondence between file names and encodings.  Clearly this is not
the case.  In this case there are two separate mappings from display
string to original file name.  That is what I mean by many-to-one.

	Jan D.
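The two-file situation described above can be sketched in Python. The file names are made up for illustration; the point is only that two different byte strings, each decoded with its own encoding, can yield the same display string.

```python
# Two hypothetical directory entries with the "same" name in different encodings.
utf8_name = "\u00e9.txt".encode("utf-8")      # b"\xc3\xa9.txt"
latin1_name = "\u00e9.txt".encode("latin-1")  # b"\xe9.txt"

shown_a = utf8_name.decode("utf-8")
shown_b = latin1_name.decode("latin-1")

# Collect, for each display string, every byte sequence that produced it.
by_display = {}
for raw, shown in ((utf8_name, shown_a), (latin1_name, shown_b)):
    by_display.setdefault(shown, []).append(raw)

# One display string, two candidate originals: going from the display
# string back to bytes needs to know which encoding each entry came from.
print(len(by_display), [len(raws) for raws in by_display.values()])
```

With a single global encoding only one of the two entries can be reached from its display string; the other requires remembering either its bytes or its encoding.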
* Re: Strange behaviour with dired and UTF8
  2003-05-07 15:49 ` Jan D.
@ 2003-05-07 16:31 ` Stefan Monnier
  2003-05-07 17:40 ` Jan D.
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2003-05-07 16:31 UTC
Cc: Kenichi Handa

> Say I have two files, one in UTF-8 and one in latin-1.  Emacs has only
> one coding system for file names, say it is latin-1.

Question: how do other applications deal with such situations?

I mean, of course Emacs should do better than the rest of the crowd,
but if most/all other applications fail miserably, then it's unlikely
that people will use such setups and it would be wrong for Emacs to
make it easier to create such a setup (unless maybe only Emacs
will ever care about those file names, of course).

	Stefan
* Re: Strange behaviour with dired and UTF8
  2003-05-07 16:31 ` Stefan Monnier
@ 2003-05-07 17:40 ` Jan D.
  0 siblings, 0 replies; 28+ messages in thread
From: Jan D. @ 2003-05-07 17:40 UTC
Cc: Kenichi Handa

On Wednesday, 7 May 2003, at 18:31, Stefan Monnier wrote:

>> Say I have two files, one in UTF-8 and one in latin-1.  Emacs has only
>> one coding system for file names, say it is latin-1.
>
> Question: how do other applications deal with such situations?
>
> I mean, of course Emacs should do better than the rest of the crowd,
> but if most/all other applications fail miserably, then it's unlikely
> that people will use such setups and it would be wrong for Emacs to
> make it easier to create such a setup (unless maybe only Emacs
> will ever care about those file names, of course).

I can only say that GNOME (Nautilus) deals with this fine, better than
most.  It can actually display two files, one in latin-1 and the other
in UTF-8, that have the same display representation, so it looks like
the two files have the same name.  When clicking on them (to open, for
example), it opens the correct file (I use the size of the files to
tell them apart).  When renaming a file, it always uses UTF-8.  I think
this is as good as it gets.

I don't know the details, but given that UTF-8 is so fundamental to
GNOME, I think Nautilus first tries UTF-8, and if the name isn't valid
UTF-8, it tries the user's locale.

Actually Nautilus behaves better than most other GNOME applications.
For example, gedit always tries UTF-8 for displaying the file name and
says "invalid UTF-8" if that fails.

KDE (Konqueror) seems to always use the locale character set.

Other systems can change the view character set, much like you can do
in Netscape/Mozilla.  Open up a directory and then you can toggle the
coding system used to display file names (in Mozilla: View -> Character
coding).  This is what I thought Emacs could do, but it lost the
original file name.

	Jan D.
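The behaviour guessed at above for Nautilus (try UTF-8 first, fall back to the locale encoding, and keep operating on the original name) can be sketched as follows. This is a Python illustration under stated assumptions, not Nautilus or Emacs code: the `Entry` type, `make_entry` and the latin-1 locale default are all hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    raw: bytes   # exact bytes from the directory listing; used for file operations
    shown: str   # decoded form, used for display only


def make_entry(raw: bytes, locale_coding: str = "latin-1") -> Entry:
    """Decode as UTF-8 if valid, otherwise fall back to the locale encoding."""
    try:
        shown = raw.decode("utf-8")
    except UnicodeDecodeError:
        shown = raw.decode(locale_coding)
    return Entry(raw, shown)


# Two hypothetical files whose names look identical on screen.
entries = [make_entry(b"\xc3\xa9.txt"), make_entry(b"\xe9.txt")]

# Same display string, different raw names: opening by `raw` is unambiguous.
print(entries[0].shown == entries[1].shown, entries[0].raw != entries[1].raw)
```

Because every operation goes through `raw`, changing how names are displayed never changes which file is opened, which is the property the dired buffer lost after C-x RET r.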