* Encoding for a file containing filenames?
From: Juanma Barranquero @ 2007-11-08 15:05 UTC
To: Emacs Devel

A few days ago I installed in the trunk a patch for ido.el which forces
the ido history file (an elisp file containing several lists of
filenames and directories) to be saved in a given coding system, with
that coding system recorded as a local variable.

At that moment I used

  (or file-name-coding-system default-file-name-coding-system)

because it seems logical to save filenames in the current filename
coding system.

However, I'm having second thoughts. Wouldn't it be safer to use an
encoding that can save anything, like emacs-mule? Should it depend on
the setting of default-enable-multibyte-characters?

Juanma
* Re: Encoding for a file containing filenames?
From: Stefan Monnier @ 2007-11-08 16:32 UTC
To: Juanma Barranquero; +Cc: Emacs Devel

> However, I'm having second thoughts. Wouldn't it be safer to use an
> encoding that can save anything, like emacs-mule? Should it depend on
> the setting of default-enable-multibyte-characters?

Yes.

I'd recommend utf-8 (good for the future) or emacs-mule (good for the
present).

Stefan
* Re: Encoding for a file containing filenames?
From: Juanma Barranquero @ 2007-11-08 16:49 UTC
To: Stefan Monnier; +Cc: Emacs Devel

On 11/8/07, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> Yes.

Thanks.

> I'd recommend utf-8 (good for the future) or emacs-mule (good for the
> present).

The unicode branch can read emacs-mule files. And the ido history is
not read by other programs, so external compatibility is not an issue.

Juanma
* Re: Encoding for a file containing filenames?
From: Eli Zaretskii @ 2007-11-08 20:50 UTC
To: Juanma Barranquero; +Cc: monnier, emacs-devel

> Date: Thu, 8 Nov 2007 17:49:11 +0100
> From: "Juanma Barranquero" <lekktu@gmail.com>
> Cc: Emacs Devel <emacs-devel@gnu.org>
>
> > I'd recommend utf-8 (good for the future) or emacs-mule (good for the
> > present).
>
> The unicode branch can read emacs-mule files. And the ido history is
> not read by other programs, so external compatibility is not an issue.

It is still better to use UTF-8 because that makes the file readable
by other programs.
* Re: Encoding for a file containing filenames?
From: Stefan Monnier @ 2007-11-08 22:38 UTC
To: Eli Zaretskii; +Cc: Juanma Barranquero, emacs-devel

> It is still better to use UTF-8 because that makes the file readable
> by other programs.

But it will fail in Emacs-22 if the file (which contains file names)
contains chars that Emacs-22 doesn't know how to encode to (and decode
from) utf-8.

Stefan
* Re: Encoding for a file containing filenames?
From: Jason Rumney @ 2007-11-08 23:42 UTC
To: Stefan Monnier; +Cc: Juanma Barranquero, Eli Zaretskii, emacs-devel

Stefan Monnier wrote:
> But it will fail in Emacs-22 if the file (which contains file names)
> contains chars that Emacs-22 doesn't know how to encode to (and decode
> from) utf-8.

Are there any such chars that are likely to be used in filenames? Or is it
just the mule specific charsets that Emacs-22 cannot encode as utf-8?
* Re: Encoding for a file containing filenames?
From: Stefan Monnier @ 2007-11-09 4:01 UTC
To: Jason Rumney; +Cc: Juanma Barranquero, Eli Zaretskii, emacs-devel

>> But it will fail in Emacs-22 if the file (which contains file names)
>> contains chars that Emacs-22 doesn't know how to encode to (and decode
>> from) utf-8.

> Are there any such chars that are likely to be used in filenames? Or is it
> just the mule specific charsets that Emacs-22 cannot encode as utf-8?

It's actually a bit worse: it shouldn't just be encodable with utf-8,
but it should also be the case that encoding to utf-8 and back should
return the exact same string (since these are filenames and will be
compared with simple byte-comparison in the kernel).

Stefan
* Re: Encoding for a file containing filenames?
From: Eli Zaretskii @ 2007-11-09 10:03 UTC
To: Stefan Monnier; +Cc: lekktu, jasonr, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>, Juanma Barranquero <lekktu@gmail.com>, emacs-devel@gnu.org
> Date: Thu, 08 Nov 2007 23:01:31 -0500
>
> It's actually a bit worse: it shouldn't just be encodable with utf-8,
> but it should also be the case that encoding to utf-8 and back should
> return the exact same string (since these are filenames and will be
> compared with simple byte-comparison in the kernel).

What kernel are we talking about here? The Windows filesystem, for
example, does not compare bytes, but rather 16-bit words (UTF-16).
And Linux filesystems use UTF-8 for file names anyway, right?
* Re: Encoding for a file containing filenames?
From: Jan Djärv @ 2007-11-09 11:05 UTC
To: Eli Zaretskii; +Cc: lekktu, emacs-devel, Stefan Monnier, jasonr

Eli Zaretskii wrote:
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: Eli Zaretskii <eliz@gnu.org>, Juanma Barranquero <lekktu@gmail.com>, emacs-devel@gnu.org
>> Date: Thu, 08 Nov 2007 23:01:31 -0500
>>
>> It's actually a bit worse: it shouldn't just be encodable with utf-8,
>> but it should also be the case that encoding to utf-8 and back should
>> return the exact same string (since these are filenames and will be
>> compared with simple byte-comparison in the kernel).
>
> What kernel are we talking about here? The Windows filesystem, for
> example, does not compare bytes, but rather 16-bit words (UTF-16).
> And Linux filesystems use UTF-8 for file names anyway, right?

Linux filesystems (and others) don't interpret the file names. They are
just sequences of bytes. It is the user space tools like ls, emacs and
others that put meaning to these bytes.

Jan D.
* Re: Encoding for a file containing filenames?
From: Andreas Schwab @ 2007-11-09 11:07 UTC
To: Eli Zaretskii; +Cc: lekktu, emacs-devel, Stefan Monnier, jasonr

Eli Zaretskii <eliz@gnu.org> writes:

> And Linux filesystems use UTF-8 for file names anyway, right?

Linux filesystems do not use any encoding, they use raw bytes.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Encoding for a file containing filenames?
From: Eli Zaretskii @ 2007-11-09 11:53 UTC
To: Andreas Schwab; +Cc: lekktu, emacs-devel, monnier, jasonr

> From: Andreas Schwab <schwab@suse.de>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, lekktu@gmail.com,
>   jasonr@f2s.com, emacs-devel@gnu.org
> Date: Fri, 09 Nov 2007 12:07:23 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > And Linux filesystems use UTF-8 for file names anyway, right?
>
> Linux filesystems do not use any encoding, they use raw bytes.

Yes, but what kind of bytes would those be on a typical box that uses
non-ASCII characters in file names? I know it can be _anything_, but
I think typically these characters will be encoded in UTF-8.
* Re: Encoding for a file containing filenames?
From: Jan Djärv @ 2007-11-09 12:15 UTC
To: Eli Zaretskii; +Cc: Andreas Schwab, emacs-devel, monnier, jasonr, lekktu

Eli Zaretskii wrote:
>> From: Andreas Schwab <schwab@suse.de>
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, lekktu@gmail.com,
>>   jasonr@f2s.com, emacs-devel@gnu.org
>> Date: Fri, 09 Nov 2007 12:07:23 +0100
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>> And Linux filesystems use UTF-8 for file names anyway, right?
>> Linux filesystems do not use any encoding, they use raw bytes.
>
> Yes, but what kind of bytes would those be on a typical box that uses
> non-ASCII characters in file names? I know it can be _anything_, but
> I think typically these characters will be encoded in UTF-8.

The trend for Gnome and KDE and other desktop environments is to use
UTF-8 (just about everywhere, not just file names), so you can say UTF-8
is most likely.

Jan D.
* Re: Encoding for a file containing filenames?
From: Kenichi Handa @ 2007-11-09 12:16 UTC
To: Eli Zaretskii; +Cc: schwab, emacs-devel, monnier, jasonr, lekktu

In article <usl3fprcf.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Yes, but what kind of bytes would those be on a typical box that uses
> non-ASCII characters in file names? I know it can be _anything_, but
> I think typically these characters will be encoded in UTF-8.

I think many CJK users still use one of the EUC encodings. For
Taiwanese users, Big5 is still popular.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: Encoding for a file containing filenames?
From: Andreas Schwab @ 2007-11-09 12:54 UTC
To: Eli Zaretskii; +Cc: lekktu, emacs-devel, monnier, jasonr

Eli Zaretskii <eliz@gnu.org> writes:

> Yes, but what kind of bytes would those be on a typical box that uses
> non-ASCII characters in file names? I know it can be _anything_, but
> I think typically these characters will be encoded in UTF-8.

This is completely controlled by the locale.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Encoding for a file containing filenames?
From: Eli Zaretskii @ 2007-11-09 14:01 UTC
To: Andreas Schwab; +Cc: lekktu, emacs-devel, monnier, jasonr

> From: Andreas Schwab <schwab@suse.de>
> Cc: monnier@iro.umontreal.ca, lekktu@gmail.com, jasonr@f2s.com,
>   emacs-devel@gnu.org
> Date: Fri, 09 Nov 2007 13:54:18 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Yes, but what kind of bytes would those be on a typical box that uses
> > non-ASCII characters in file names? I know it can be _anything_, but
> > I think typically these characters will be encoded in UTF-8.
>
> This is completely controlled by the locale.

Sure, I know that. That's not what I was asking.
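To make the point of this sub-thread concrete (the kernel stores file names as raw bytes, and the locale decides how user-space tools interpret them), here is a minimal sketch in Python rather than Emacs Lisp. The file name and the two codecs are arbitrary choices for illustration and do not come from the thread.

```python
# Illustration only: "café.txt" and the two codecs are hypothetical examples.
name = "café.txt"

# The same user-visible name becomes different byte sequences under
# different locale encodings, so a byte-comparing kernel treats them
# as two distinct file names.
as_utf8 = name.encode("utf-8")
as_latin1 = name.encode("iso-8859-1")

print(as_utf8)    # b'caf\xc3\xa9.txt'
print(as_latin1)  # b'caf\xe9.txt'

# Interpreting UTF-8 bytes under a Latin-1 locale yields a different string.
misread = as_utf8.decode("iso-8859-1")
print(misread)    # cafÃ©.txt
```

The bytes on disk never change in this sketch; only the interpretation does, which is why a stored name is only meaningful together with the encoding it was decoded from.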
* Re: Encoding for a file containing filenames?
From: Kenichi Handa @ 2007-11-09 10:34 UTC
To: Stefan Monnier; +Cc: lekktu, eliz, jasonr, emacs-devel

In article <jwvprykm5ky.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> But it will fail in Emacs-22 if the file (which contains file names)
>>> contains chars that Emacs-22 doesn't know how to encode to (and decode
>>> from) utf-8.

> > Are there any such chars that are likely to be used in filenames? Or is it
> > just the mule specific charsets that Emacs-22 cannot encode as utf-8?

> It's actually a bit worse: it shouldn't just be encodable with utf-8,
> but it should also be the case that encoding to utf-8 and back should
> return the exact same string (since these are filenames and will be
> compared with simple byte-comparison in the kernel).

I think the important thing is to assure the round-trip of
decode&encode (not encode&decode). And utf-8 should
preserve exact byte sequence on decode&encode.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: Encoding for a file containing filenames?
From: Stefan Monnier @ 2007-11-09 16:25 UTC
To: Kenichi Handa; +Cc: lekktu, eliz, jasonr, emacs-devel

>>>> But it will fail in Emacs-22 if the file (which contains file names)
>>>> contains chars that Emacs-22 doesn't know how to encode to (and decode
>>>> from) utf-8.

>> > Are there any such chars that are likely to be used in filenames? Or is it
>> > just the mule specific charsets that Emacs-22 cannot encode as utf-8?

>> It's actually a bit worse: it shouldn't just be encodable with utf-8,
>> but it should also be the case that encoding to utf-8 and back should
>> return the exact same string (since these are filenames and will be
>> compared with simple byte-comparison in the kernel).

> I think the important thing is to assure the round-trip of
> decode&encode (not encode&decode).

Are you sure? The situation is that we have a file name as an Emacs
string (i.e. decoded say from "locale" coding system) and we need to
store it into a file to load it back in a later Emacs invocation (at
which point we may use it to access the file, using hopefully the same
"locale" coding system).

So what needs to be byte-preserving is really:

   locale-decode -> utf8-encode -> utf8-decode -> locale-encode

So as Eli points out, if locale is utf-8 there shouldn't be any problem.

In any case, I'd go with utf-8.

Stefan
* Re: Encoding for a file containing filenames?
From: Kenichi Handa @ 2007-11-10 1:10 UTC
To: Stefan Monnier; +Cc: lekktu, eliz, jasonr, emacs-devel

In article <jwvk5orgzk2.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > I think the important thing is to assure the round-trip of
> > decode&encode (not encode&decode).

> Are you sure? The situation is that we have a file name as an Emacs
> string (i.e. decoded say from "locale" coding system) and we need to
> store it into a file to load it back in a later Emacs invocation (at
> which point we may use it to access the file, using hopefully the same
> "locale" coding system).

> So what needs to be byte-preserving is really:

>    locale-decode -> utf8-encode -> utf8-decode -> locale-encode

Ah, I misunderstood the situation, sorry. You are right.

---
Kenichi Handa
handa@ni.aist.go.jp
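The four-step chain agreed on in this sub-thread (locale-decode -> utf8-encode -> utf8-decode -> locale-encode) can be sketched as a byte-preservation check. This is a hedged illustration in Python, not Emacs code: the function name `survives_round_trip` and the sample names are hypothetical, and Python's codecs stand in for Emacs coding systems.

```python
def survives_round_trip(raw: bytes, locale_codec: str) -> bool:
    """Check that RAW file-name bytes come back unchanged after being
    decoded from the locale, stored as UTF-8, reloaded, and re-encoded."""
    try:
        name = raw.decode(locale_codec)        # locale-decode
    except UnicodeDecodeError:
        return False                           # not a valid name in this locale
    stored = name.encode("utf-8")              # utf8-encode (written to the file)
    reloaded = stored.decode("utf-8")          # utf8-decode (later session)
    return reloaded.encode(locale_codec) == raw  # locale-encode, then compare


print(survives_round_trip(b"caf\xc3\xa9.txt", "utf-8"))    # True
print(survives_round_trip(b"caf\xe9.txt", "iso-8859-1"))   # True
print(survives_round_trip(b"caf\xe9.txt", "utf-8"))        # False
```

One caveat on scope: Python strings are Unicode internally, so the two middle steps are lossless here and only the locale steps can fail. The concern raised for Emacs 22 is different, since its internal representation was not Unicode and the utf-8 steps themselves could lose information; this sketch does not reproduce that failure mode.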
* Re: Encoding for a file containing filenames?
From: Kenichi Handa @ 2007-11-09 0:41 UTC
To: Juanma Barranquero; +Cc: monnier, emacs-devel

In article <f7ccd24b0711080849g4355e7cet1c7c16116528e804@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes:

> > I'd recommend utf-8 (good for the future) or emacs-mule (good for the
> > present).

> The unicode branch can read emacs-mule files. And the ido history is
> not read by other programs, so external compatibility is not an issue.

But, unicode branch will be merged into the trunk before the
release based on the trunk code. So, I think it is better
to start using utf-8 now.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: Encoding for a file containing filenames?
From: Juanma Barranquero @ 2007-11-09 0:50 UTC
To: Kenichi Handa; +Cc: monnier, emacs-devel

On 11/9/07, Kenichi Handa <handa@ni.aist.go.jp> wrote:

> But, unicode branch will be merged into the trunk before the
> release based on the trunk code. So, I think it is better
> to start using utf-8 now.

This is a bug fix; it should go into the EMACS_22_BASE branch too.

So the question is, are the chars that Emacs 22.2 does not know how to
encode in utf-8 relevant, or, as Jason said, are they unlikely to
appear in filenames?

Juanma
* Re: Encoding for a file containing filenames?
From: Kenichi Handa @ 2007-11-09 1:05 UTC
To: Juanma Barranquero; +Cc: monnier, emacs-devel

In article <f7ccd24b0711081650o17bf1cf9o28b1913065e7708f@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes:

> On 11/9/07, Kenichi Handa <handa@ni.aist.go.jp> wrote:

> > But, unicode branch will be merged into the trunk before the
> > release based on the trunk code. So, I think it is better
> > to start using utf-8 now.

> This is a bug fix; it should go into the EMACS_22_BASE branch too.

Ah.

> So the question is, are the chars that Emacs 22.2 does not
> know how to encode in utf-8 relevant, or, as Jason said,
> are they unlikely to appear in filenames?

I think it's very unlikely that people use a character that is
unencodable by Emacs 22's utf-8 in a filename.

---
Kenichi Handa
handa@ni.aist.go.jp