* UTF-8 in path / filename
From: Grégory SCHMITT @ 2006-08-24 13:59 UTC

Hi everyone,

I'm running emacs 21.4.1 using Linux (Fedora Core 5). When I try to open a file and the path name contains UTF-8 letters, emacs won't be able to find the file.

I create a folder called "Grégory". I put any file in it (let's call it "test") and if I, from a simple xterm, try to do "emacs Grégory/test", emacs won't be able to open the file. However, it will be successful if I manually visit using C-x C-f.

If I use any other editor (such as mcedit), it will open OK.

Any explanation?

-- 
Grégory SCHMITT <mailto:gregory.schmitt@free.fr>
* Re: UTF-8 in path / filename
From: Noah Slater @ 2006-08-24 14:42 UTC
Cc: help-gnu-emacs

Grégory,

What is the command you are using? Perhaps xterm is configured incorrectly and is mangling the file path before passing it to Emacs.

What happens if you tab-complete the file name in the shell? Does the same happen with uxterm?

Thanks,
Noah

-- 
"Creativity can be a social contribution, but only in so far as society is free to use the results." - R. Stallman
* Re: UTF-8 in path / filename
From: Peter Dyballa @ 2006-08-25 12:08 UTC
Cc: help-gnu-emacs

On 24.08.2006 at 15:59, Grégory SCHMITT wrote:

> Hi everyone,
>
> I'm running emacs 21.4.1 using Linux (Fedora Core 5). When I try to open a
> file and the path name contains UTF-8 letters, emacs won't be able to find
> the file.
>
> I create a folder called "Grégory". I put any file in it (let's call it
> "test") and if I, from a simple xterm, try to do "emacs Grégory/test",
> emacs won't be able to open the file. However, it will be successful if I
> manually visit using C-x C-f.
>
> If I use any other editor (such as mcedit), it will open OK.
>
> Any explanation?

Yes: your terminal emulation/shell swallows/hides information.

On Mac OS X in Apple's Terminal (TERM is xterm-color) I can see UTF-8 file names, for example äöüßÜÖÄ€. File name expansion/completion does *not* work on them (although "RGB äöüæÆÜÖÄ.txt" gets expanded to "RGB a?^?o?^?u?^?æ?^?U?^?O?^?A?^?.txt"). And of course it does not work to invoke GNU Emacs with this file name as an argument (nor the 'built-in' vi or nano). It *does* work, though, when I do that from the *shell* buffer in Unicode Emacs 23.0.0 or GNU Emacs 22.0.50 (although without file name completion, and the latter shows the ¨ as empty boxes in the file name).

If I, for example, paste a name with UTF-8 contents from ls output to pass it to vi (it gives the best complaints), I can see that the decomposed UTF-8 characters are strangely interpreted. An ä seems to vanish and become a kind of control character; the ¨ component of A¨, i.e. Ä, is passed as <cc> or such ...

Since in your case mcedit accepts the file name, mcedit and your terminal seem to use the same character encoding, so for both é *is* an é.

GNU Emacs lives in its own world of almost indefinite character encodings. One way to make Emacs work correctly is to set environment variables like LC_ALL, LANG, or LC_CTYPE, which obviously just repeat what your shell and your OS's standard utilities know. Next is *not* to set current-language-environment! From LC_CTYPE etc. Emacs learns what encodings to set for buffer contents, file names, and process data. If it makes mistakes in this, you might consider using

	(prefer-coding-system 'iso-latin-9-unix) ; the one with €

or a few such statements with different codings each. GNU Emacs will then try to apply these encodings first.

Since you're working with a non-Unicode Emacs, you might need to set

	(unify-8859-on-decoding-mode t)
	(unify-8859-on-encoding-mode t)

to make the 8-bit ISO Latin encodings be handled as quite the same, i.e. é would be the same character in every one of these encodings in which it exists, so you could search for it in all buffers after telling isearch only once to look for é.

One important thing is that *you* already messed up your .emacs file. Try to launch it also with --no-init-file and/or --no-site-file and also with -nw, i.e. running inside the terminal without X windows.

-- 
Greetings

Pete

The most exciting phrase to hear in science, the one that heralds new discoveries, is not "Eureka!" (I found it!) but "That's funny..."
		Isaac Asimov
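The advice above about LC_ALL, LANG, and LC_CTYPE can be illustrated with a small sketch. It assumes the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG) and only parses the variable values; it does not query the C library or Emacs. The function names `effective_ctype` and `codeset` and the sample locale values are made up for this illustration.

```python
def effective_ctype(env):
    """Return the locale name that governs character encoding.

    Assumed POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values are treated as unset; the fallback is the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codeset(locale_name):
    """Extract the codeset part of a name such as 'fr_FR.UTF-8'."""
    if "." not in locale_name:
        return None
    # Drop an optional modifier such as '@euro' that follows the codeset.
    return locale_name.split(".", 1)[1].split("@", 1)[0]


# Illustrative environments (hypothetical values).
only_lang = {"LANG": "fr_FR.UTF-8"}
mixed = {"LANG": "fr_FR.UTF-8", "LC_CTYPE": "fr_FR.ISO-8859-15"}

print(codeset(effective_ctype(only_lang)))
print(codeset(effective_ctype(mixed)))
```

The second environment shows how a stray LC_CTYPE could silently change the encoding a program expects for file names even though LANG mentions UTF-8.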
* Re: UTF-8 in path / filename
From: Grégory SCHMITT @ 2006-08-25 13:42 UTC

On Fri, 25 Aug 2006 14:08:11 +0200, Peter Dyballa wrote:

> One important thing is that *you* already messed up your .emacs file. Try
> to launch it also with --no-init-file and/or --no-site-file and also with
> -nw, i.e. running inside the terminal without X windows.

OK, I did that. I moved my .emacs to another place, even though I never really modified it. Still no success.

For info, my locale is set as LANG="fr_FR.UTF-8" (and that's all: no LC_CTYPE or other variables). My terminal is an xterm (like yours); I also tried from the console, with bash only, and got the same result.

If I set emacs to run in unibyte mode (with --unibyte on the command line), it does work, but the file content (which is UTF-8) is parsed as 8859-15.

-- 
Grégory SCHMITT <mailto:gregory.schmitt@free.fr>
* Re: UTF-8 in path / filename
From: Peter Dyballa @ 2006-08-25 18:35 UTC
Cc: help-gnu-emacs

On 25.08.2006 at 15:42, Grégory SCHMITT wrote:

> If I set emacs to run in unibyte mode (with --unibyte on the command
> line), it does work, but the file content (which is UTF-8) is parsed as
> 8859-15.

This looks as if your system does not use UTF-8 ...

Can you create a file with accented characters? If not, can you put a copy of the file from the Grégory directory into your home or some other directory and invoke emacs, with or without --unibyte, with both files? In the first case the accented name would appear in the mode-line of the buffer (and you would see what was passed or received as argument); in the latter case GNU Emacs would put the directory's name in the mode-line, I hope, to distinguish the two files with the same name. Again, you would see what was passed or received as "Grégory" ...

Depending on whether the file names are UTF-8 or not, you can declare this in .emacs with one of:

	(setq default-file-name-coding-system 'utf-8)
	(setq default-file-name-coding-system 'iso-8859-15)

There are a lot more *coding-systems you can set ...

-- 
Greetings

Pete
* Re: UTF-8 in path / filename
From: Grégory SCHMITT @ 2006-08-25 22:06 UTC
Cc: help-gnu-emacs

> ----- Original Message -----
> Date: Fri, 25 Aug 2006 20:35:08 +0200
> From: Peter Dyballa <Peter_Dyballa@Web.DE>
> To: Grégory SCHMITT <gregory.schmitt@free.fr>
> Cc: help-gnu-emacs@gnu.org
> Subject: Re: UTF-8 in path / filename
>
> On 25.08.2006 at 15:42, Grégory SCHMITT wrote:
>
> > If I set emacs to run in unibyte mode (with --unibyte on the command
> > line), it does work, but the file content (which is UTF-8) is parsed as
> > 8859-15.
>
> This looks as if your system does not use UTF-8 ...

I thought Fedora uses UTF-8 by default.

> Can you create a file with accented characters? If not, can you put a
> copy of the file in the Grégory directory into your home or some
> other directory and invoke emacs, with or with no unibytes, with both
> files? In the first case the accented name would appear in the
> mode-line of the buffer (and would see what was passed or received as
> argument), in the latter case GNU Emacs would put the directory's
> name in the mode-line, I hope, to distinguish the two files with the
> same name. Again, you would see what was passed or received as
> "Grégory" ...
>
> If the file names are or are not UTF-8, you can declare this
> in .emacs with:
>
> (setq default-file-name-coding-system 'utf-8)
> (setq default-file-name-coding-system 'iso-8859-15)

OK. So I have two folders, "Greg" and "Grégory", in my home (ext3 filesystem, default options). I now have two files, "test" and "testé", in each of them, plus in the current directory. Those files have the same UTF-8 content, so I'm able to tell whether they're parsed correctly or not.

First case, with multibyte:

- both files in the "Greg" folder are visited correctly: the file is opened and the content looks OK. However, the buffer name for "testé" appears as "testÀ" (or something like that), which in my mind is proof that the file name is actually UTF-8 but displayed as if it were ISO 8859.
- both files in the "Grégory" folder are not visited. Manually visiting the files works fine however, and the buffer name is correct ("testé" is spelled correctly). File content is OK as well.
- both files in the current directory are visited OK, content is OK, buffer name NOT OK.

Second case, with unibyte:

- "Greg" folder: files visited OK, content NOT OK, buffer name NOT OK.
- "Grégory" folder: files visited OK, content NOT OK, buffer name NOT OK.
- both files in the current directory are visited OK, content NOT OK, buffer name NOT OK.

Hope that helps. As for me, I'm stuck...

-- 
Grégory SCHMITT <mailto:gregory.schmitt@free.fr>
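The "UTF-8 name displayed as ISO" symptom reported above can be reproduced outside Emacs. The following is only a sketch of the general effect, not of what Emacs 21 does internally: it encodes the name "testé" as UTF-8 and then decodes those octets as ISO 8859-15, the charset mentioned earlier in the thread. The variable names are chosen for this illustration.

```python
# "testé" with a precomposed é (U+00E9), written with an escape so the
# source file's own encoding does not matter.
name = "test\u00e9"

# What a UTF-8 file system or terminal hands over: é becomes two octets.
utf8_octets = name.encode("utf-8")

# What a program sees if it assumes one octet per character (ISO 8859-15).
misread = utf8_octets.decode("iso8859-15")

# The damage is reversible as long as nothing has been re-encoded lossily.
recovered = misread.encode("iso8859-15").decode("utf-8")

print(utf8_octets)
print(misread)
print(recovered == name)
```

The misread form has one character more than the real name, which resembles the mangled buffer name described in the test results and suggests why a path built from it would not match anything on disk.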
* Re: UTF-8 in path / filename
From: Peter Dyballa @ 2006-08-25 22:55 UTC
Cc: help-gnu-emacs

On 26.08.2006 at 00:06, Grégory SCHMITT wrote:

> Hope that helps. As for me, I'm stuck...

I feel the same! GNU Emacsen are not meant to handle UTF-8 the way other applications can. Unicode Emacs 23.0.0 behaves a bit better.

What you could try is to set default-buffer-file-coding-system to utf-8. It could also be that some entry in file-coding-system-alist does not let you see UTF-8 contents, so check its value. I have in my customisation section

	'(unibyte-display-via-language-environment t)

and avoid set-language-environment.

There won't be a perfect solution with GNU Emacs in the near future ...

Is your Emacs copy installed from an RPM package, or did you configure and compile it yourself? For me UTF-8 and Emacs are too important to get it from somewhere, so I compile it myself.

-- 
Greetings

Pete

»¿ʇı̣ əsnqɐ ʇ,uɐɔ noʎ ɟı̣ ɓuı̣ɥʇʎuɐ sı̣ pooɓ ʇɐɥʍ«
* Re: UTF-8 in path / filename
From: Grégory SCHMITT @ 2006-08-25 23:06 UTC

On Sat, 26 Aug 2006 00:55:31 +0200, Peter Dyballa wrote:

> There won't be a perfect solution with GNU Emacs in the near future ...
>
> Is your Emacs copy installed from an RPM package or did you configure and
> compile yourself? For me UTF-8 and Emacs are too important to get it from
> somewhere, so I compile myself.

It's the standard RPM from Fedora. I will have a look at other versions of emacs.

-- 
Grégory SCHMITT <mailto:gregory.schmitt@free.fr>
* Re: UTF-8 in path / filename
From: Miles Bader @ 2006-08-25 23:09 UTC

Peter Dyballa <Peter_Dyballa@Web.DE> writes:
> There won't be a perfect solution with GNU Emacs in the near future ...

You constantly seem to be having problems with UTF-8, but it works absolutely perfectly for me, filenames, dired, everything (using emacs 22).

[It works perfectly even if I do `emacs -Q' to avoid loading my init file, though I normally use (set-language-environment 'japanese).]

AFAIK the main thing is that your LANG environment variable be set to something mentioning utf-8 -- I use "ja_JP.UTF-8".

If that doesn't work, I dunno, maybe it's something screwy about the mac.

-Miles
-- 
Numeric stability is probably not all that important when you're guessing.
* Re: UTF-8 in path / filename
From: Peter Dyballa @ 2006-08-26 9:36 UTC
Cc: help-gnu-emacs

On 26.08.2006 at 01:09, Miles Bader wrote:

> Peter Dyballa <Peter_Dyballa@Web.DE> writes:
>> There won't be a perfect solution with GNU Emacs in the near
>> future ...
>
> You constantly seem to be having problems with UTF-8, but it works
> absolutely perfectly for me, filenames, dired, everything (using
> emacs 22).
>
> [It works perfectly even if I do `emacs -Q' to avoid loading my init
> file, though I normally use (set-language-environment 'japanese).]
>
> AFAIK the main thing is that your LANG environment variable be set to
> something mentioning utf-8 -- I use "ja_JP.UTF-8".

	pete 39 /\ .
	/Users/pete
	pete 40 /\ env | egrep -i 'LC|LANG'
	LANG=de_DE.UTF-8
	LC_CTYPE=de_DE.UTF-8
	pete 41 /\ /usr/local/bin/emacs-22.0.50 -Q &

Files with UTF-8 characters in their names are shown in dired (which has -u: in the mode-line, i.e. uses UTF-8) à la <vowel><empty box>. Some UTF-8 characters like ß or € show up as themselves. They appear in the same manner in the buffer's mode-line, once visited, and also in the list of buffers (C-x b); they are completely unreadable in the Buffers menu from the menu bar, and unreadable in yet another fashion in the "Buffer Menu" pop-up.

The font used for the vowels, the empty boxes, and the other characters is taken from the Java SDK and is quite rich (1425 mapped characters for mostly European and some Near Eastern scripts):

	-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1 (#x61)
	-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1 (#x308)
	-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1 (#xDF)
	-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1 (#x20AC)

Somehow this looks like a mixture of ISO 8859 characters (#x61, #xDF), Unicode (#x20AC), and something else (#x308). Or are some representations just abbreviations that leave out the leading zeros?

The other information from C-u C-x = on the examples is:

	  character: a (97, #o141, #x61, U+0061)
	    charset: ascii (ASCII (ISO646 IRV))
	 code point: #x61
	     syntax: w   which means: word
	   category: a:ASCII l:Latin
	buffer code: #x61
	  file code: #x61 (encoded by coding system mule-utf-8)

	  character: (332488, #o1211310, #x512c8, U+0308)
	    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
	 code point: #x25 #x48
	     syntax: w   which means: word
	   category: ^:Combining diacritic or mark
	buffer code: #x9C #xF4 #xA5 #xC8
	  file code: #xCC #x88 (encoded by coding system mule-utf-8)

	  character: ß (2271, #o4337, #x8df, U+00DF)
	    charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
	 code point: #x5F
	     syntax: w   which means: word
	   category: l:Latin
	buffer code: #x81 #xDF
	  file code: #xC3 #x9F (encoded by coding system mule-utf-8)

	  character: € (342604, #o1235114, #x53a4c, U+20AC)
	    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
	 code point: #x74 #x4C
	     syntax: w   which means: word
	buffer code: #x9C #xF4 #xF4 #xCC
	  file code: #xE2 #x82 #xAC (encoded by coding system mule-utf-8)

An excerpt from the fontset's description (I am missing ISO 8859-16!):

	Fontset: -*-*-medium-r-*-*-10-*-*-*-m-*-fontset-startup
	CHARSET or CHAR RANGE     FONT NAME
	---------------------     ---------
	ascii                     -b&h-lucidatypewriter-medium-r-normal-sans-10-100-75-75-m-60-iso10646-1
	    [-Adobe-Courier-Medium-R-Normal--10-100-75-75-M-60-ISO10646-1]
	    [-B&H-LucidaTypewriter-Bold-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1]
	    [-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1]
	latin-iso8859-1           -b&h-lucidatypewriter-*-iso10646-1
	    [-B&H-LucidaTypewriter-Bold-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1]
	    [-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1]
	latin-iso8859-2           -*-iso8859-2
	latin-iso8859-3           -*-iso8859-3
	latin-iso8859-4           -*-iso8859-4
	thai-tis620               -*-*-*-tis620-*
	greek-iso8859-7           -*-iso8859-7
	arabic-iso8859-6          -*-iso8859-6
	hebrew-iso8859-8          -*-iso8859-8
	katakana-jisx0201         -*-jisx0201-*
	latin-jisx0201            -*-jisx0201-*
	cyrillic-iso8859-5        -*-iso8859-5
	latin-iso8859-9           -*-iso8859-9
	latin-iso8859-15          -*-iso8859-15
	latin-iso8859-14          -*-iso8859-14
	...
	mule-unicode-2500-33ff    -b&h-lucidatypewriter-*-iso10646-1
	mule-unicode-e000-ffff    -b&h-lucidatypewriter-*-iso10646-1
	mule-unicode-0100-24ff    -b&h-lucidatypewriter-*-iso10646-1
	    [-B&H-LucidaTypewriter-Bold-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1]
	    [-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1]
	...

IMO the display of UTF-8 characters is not sufficient.

> If that doesn't work, I dunno, maybe it's something screwy about
> the mac.

There is something special, possibly screwy, in the way Mac OS X (or better: HFS+, the file system) stores UTF-8 characters in file names: they get decomposed, i.e. an ä becomes a¨, an à becomes a`, etc. (This applies to file names only; a file's contents do not get decomposed. What would such a JPEG picture look like?) So a character that takes two or three octets in precomposed form is stored on disk as a pair: one octet for the base letter and (mostly?) two octets for the combining mark.

GNU Emacs should be able to detect that: if a character is from the category (see above) "Combining diacritic or mark", it can't stand alone by nature, but must be combined with the character on the left in a left-to-right writing system or with the character on the right in a right-to-left writing system. (I have no idea of the rules in a top-to-bottom writing system like Mongolian, or whether such systems have combining characters.) And it should be able to handle the character categories correctly.

-- 
Greetings

Pete

What's the difference between OS X and Vista? Microsoft employees are excited about OS X
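The decomposition described above can be checked with Python's standard `unicodedata` module. This is a sketch of Unicode normalization in general; it assumes HFS+ stores something close to canonical decomposition (NFD) and does not reproduce Apple's exact variant. It also suggests an answer to the "leading zeros" question: #x308 appears to be simply U+0308, the combining diaeresis.

```python
import unicodedata


def hex_octets(text):
    """UTF-8 octets of text as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


composed = "\u00e4"                                   # ä as one code point
decomposed = unicodedata.normalize("NFD", composed)   # 'a' followed by U+0308

print([f"U+{ord(c):04X}" for c in decomposed])
print(hex_octets(composed))      # two octets for the precomposed form
print(hex_octets(decomposed))    # one octet for 'a' plus two for the mark

# A non-zero combining class marks a character that cannot stand alone.
print(unicodedata.combining("\u0308"), unicodedata.combining("a"))

# The euro sign has no decomposition, so it is stored unchanged.
print(hex_octets(unicodedata.normalize("NFD", "\u20ac")))
```

The octets printed for the decomposed form and for the euro sign agree with the "file code" values in the C-u C-x = output quoted above (#xCC #x88 and #xE2 #x82 #xAC).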
* Re: UTF-8 in path / filename
From: James Cloos @ 2006-08-26 22:13 UTC
Cc: help-gnu-emacs, Miles Bader

>>>>> "Peter" == Peter Dyballa <Peter_Dyballa@Web.DE> writes:

Peter> Files with UTF-8 characters in them are shown in dired (has -u: in
Peter> mode-line, i.e. uses UTF-8) à la <vowel><empty box>. Some UTF-8
Peter> characters like ß or € show up as themselves.

Doesn't Apple by default use NFD (Normalization Form Decomposed) for filenames? That would explain the <vowel><box> sequences.

I suspect most others end up with NFC filenames.

And composition seems much better in the emacs-unicode-2 branch than in HEAD. (But still not perfect. I sometimes get bad metrics on composed glyphs, and sometimes they display as intended....)

Can you get at the actual octet-sequence of the filenames?

-JimC
-- 
James Cloos <cloos@jhcloos.com> OpenPGP: 0xED7DAEA6
* Re: UTF-8 in path / filename
From: Peter Dyballa @ 2006-08-27 13:12 UTC
Cc: help-gnu-emacs, Miles Bader

On 27.08.2006 at 00:13, James Cloos wrote:

> Peter> Files with UTF-8 characters in them are shown in dired (has -u: in
> Peter> mode-line, i.e. uses UTF-8) à la <vowel><empty box>. Some UTF-8
> Peter> characters like ß or € show up as themselves.
>
> Doesn't apple by default use NFD (Normalization Form Decomposed) for
> filenames? That would explain the <vowel><box> sequences.

Yes, that's the correct term for the way file names are recorded in HFS+. The font file, LucidaTypewriterRegular.ttf, has no combining diacritical marks defined (only some modifiers), so these empty boxes are displayed instead.

> Can you get at the actual octet-sequence of the filenames?

Do you know a tool that can do that? I can only think of a C programme that reads the inode and then outputs the octets. Doing the same as Harald did, I get different output in Terminal (because UTF-8 characters are substituted with question marks), for example:

	pete 140 /\ l -1 | grep .txt | grep ' ' | grep -v Mac
	RGB äöüæÆÜÖÄ.txt
	pete 141 /\ l -1 | grep .txt | grep ' ' | grep -v Mac | od -t a
	R G B sp a ? 88 o ? 88 u ? 88 ? ? ? 86 U ? 88 O ? 88 A ? 88 . t x t nl

In the shell buffers of the Emacsen I get:

	R G B sp a \314 88 o \314 88 u \314 88 \303 \246 \303 86 U \314 88 O \314 88 A \314 88 . t x t nl

The file name áÛïǓà.txt is interpreted as:

	a \314 81 U \314 82 i \314 88 U \314 8c a \314 80 . t x t nl

-- 
Greetings

Pete

"Isn't vi that text editor with two modes... one that beeps and one that corrupts your file?"
		-- Dan Jacobson, on comp.os.linux.advocacy
* Re: UTF-8 in path / filename
From: James Cloos @ 2006-08-28 15:11 UTC
Cc: help-gnu-emacs, Miles Bader

JimC> Doesn't apple by default use NFD (Normalization Form Decomposed)
JimC> for filenames? That would explain the <vowel><box> sequences.

Peter> Yes, that's the correct term for the way file names are
Peter> recorded in HFS+.

So then the problem is narrowed to support for composition.

I just gave it a test, running the unicode-2 branch on a Linux box, using the en_US.UTF-8 locale. I copied the filename you quoted (äöüæÆÜÖÄ.txt), gave it a prefix to ease globbing (resulting in /tmp/xxx-äöüæÆÜÖÄ.txt), and ran find-file on /tmp.

It worked correctly. (Well, almost; the glyphs composed by emacs have twice the height of pre-composed glyphs. There was a time when emacs didn't do that, but it is doing it again. Including in this buffer. But that looks to be specific to --enable-font-backend and DejaVu Sans Mono. With other fonts I do not get visible accents, even though C-u C-x = claims it is composing. And without --enable-font-backend I get composed glyphs which have correct vertical metrics.)

I also tested this:

	:; echo /tmp/xxx-a*

and got the filename, showing that bash treats the code points as separate characters when globbing. (Which also means I didn't actually need the xxx- prefix, since a* will therefore match the original filename....)

So. Does C-u C-x = claim to be composing for you?

-JimC
-- 
James Cloos <cloos@jhcloos.com> OpenPGP: 0xED7DAEA6
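The globbing observation above can be imitated in a few lines. This sketch uses Python's `fnmatch` module as a stand-in for bash's pattern matching, which is an assumption: bash's real behaviour depends on its own locale handling. The point is only that a pattern starting with a plain "a" matches a decomposed name but not a precomposed one, because matching works on code points without normalizing them.

```python
import fnmatch
import unicodedata

# The same visible name in the two normalization forms.
name_nfc = unicodedata.normalize("NFC", "\u00e4\u00f6\u00fc.txt")  # äöü.txt
name_nfd = unicodedata.normalize("NFD", name_nfc)                  # a, U+0308, o, ...

# fnmatchcase compares code points literally; it performs no normalization.
matches_nfd = fnmatch.fnmatchcase(name_nfd, "a*")
matches_nfc = fnmatch.fnmatchcase(name_nfc, "a*")

print(len(name_nfc), len(name_nfd))
print(matches_nfd, matches_nfc)

# Normalizing both sides to one form before comparing avoids the mismatch.
same_name = unicodedata.normalize("NFC", name_nfd) == name_nfc
print(same_name)
```

A tool that compares or completes file names would need the last step (or its NFD equivalent) to treat the two spellings as the same name.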
* Re: UTF-8 in path / filename
From: Peter Dyballa @ 2006-08-28 15:55 UTC
Cc: help-gnu-emacs, Miles Bader

On 28.08.2006 at 17:11, James Cloos wrote:

> So. Does C-u C-x = claim to be composing for you?

Yes, in GNU Emacs 23:

	        character: U (85, #o125, #x55)
	preferred charset: ascii (ASCII (ISO646 IRV))
	       code point: 0x55
	           syntax: w   which means: word
	         category: a:ASCII l:Latin r:Japanese roman
	      buffer code: #x55
	        file code: not encodable by coding system utf-8-unix
	          display: composed to form "Ü" (see below)

	Unicode data:
	  Name: LATIN CAPITAL LETTER U
	  Category: Letter, Uppercase
	  Combining class: Lu
	  Bidi category: Lu
	  Lowercase: u

	Composed with the following character(s) "¨" by the rule:
	  (?U (tc . bc) ?¨)
	The component character(s) are displayed by these fonts (glyph codes):
	  U: -B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO8859-1 (#x55)
	  ¨: -MUTT-ClearlyU-Medium-R-Normal--17-120-100-100-P-123-ISO10646-1 (#x308)

(Here you can see the reason for the vertically oversized composed characters: a much too big font.)

In GNU Emacs 22.0.50 they are not composed; they are <vowel><accent>.

Instead of composing a character I would first try to find the pre-composed form in the font(set) used. It surely would look much better.

-- 
Greetings

Pete

"We have to expect it, otherwise we would be surprised."
* Re: UTF-8 in path / filename
From: Harald Hanche-Olsen @ 2006-08-27 8:46 UTC

+ James Cloos <cloos@jhcloos.com>:

| Doesn't apple by default use NFD (Normalization Form Decomposed) for
| filenames?

Seems you're right. See below.

| Can you get at the actual octet-sequence of the filenames?

I just now used TextEdit to create a text file with the filename xxx-é-ï-ē-ĭ-ǫḥ.txt (the xxx- prefix only so I could access it using wildcards in my shell):

	; echo xxx-*.txt | od -t a
	0000000    x   x   x   -   e  cc  81   -   i  cc  88   -   e  cc  84   -
	0000020    i  cc  86   -   o  cc  a8   h  cc  a3   .   t   x   t  nl
	0000037

-- 
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell
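The od dump above can be turned back into text to confirm that it is the decomposed (NFD) spelling of the file name. The parser below is a minimal sketch written for this one dump: it assumes single-character tokens are literal ASCII, "nl" and "sp" are od's names for newline and space, and every other two-character token is a hex octet. The helper name `od_tokens_to_bytes` is hypothetical.

```python
import unicodedata

# Tokens copied from the dump, without the offset columns.
DUMP = ("x x x - e cc 81 - i cc 88 - e cc 84 - "
        "i cc 86 - o cc a8 h cc a3 . t x t nl")

NAMED = {"nl": 0x0A, "sp": 0x20}   # only the od names needed here


def od_tokens_to_bytes(tokens):
    """Rebuild the octet string from od -t a style tokens (assumptions above)."""
    out = bytearray()
    for tok in tokens:
        if tok in NAMED:
            out.append(NAMED[tok])
        elif len(tok) == 1:
            out.append(ord(tok))        # literal ASCII character
        else:
            out.append(int(tok, 16))    # high octet shown in hex
    return bytes(out)


raw = od_tokens_to_bytes(DUMP.split())
name = raw.decode("utf-8").rstrip("\n")
composed = unicodedata.normalize("NFC", name)

print(len(raw), oct(len(raw)))          # compare with the final offset 0000037
print(len(name), len(composed))         # six combining marks disappear in NFC
print(unicodedata.is_normalized("NFD", name) if hasattr(unicodedata, "is_normalized")
      else unicodedata.normalize("NFD", name) == name)
```

Each accented letter in the dump occupies three octets (base letter plus a two-octet combining mark), and composing the decoded string yields the six precomposed letters of the name typed into TextEdit.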
* Re: UTF-8 in path / filename
From: Grégory SCHMITT @ 2006-08-25 23:22 UTC

On Sat, 26 Aug 2006 08:09:25 +0900, Miles Bader wrote:

> Peter Dyballa <Peter_Dyballa@Web.DE> writes:
>> There won't be a perfect solution with GNU Emacs in the near future ...
>
> You constantly seem to be having problems with UTF-8, but it works
> absolutely perfectly for me, filenames, dired, everything (using emacs
> 22).

Emacs 22 is said to be a MAJOR improvement for UTF-8. Wish I could get my hands on a packaged release soon...

-- 
Grégory SCHMITT <mailto:gregory.schmitt@free.fr>
* Re: UTF-8 in path / filename
From: Miles Bader @ 2006-08-25 23:25 UTC

Grégory SCHMITT <gregory.schmitt@free.fr> writes:
> Emacs 22 is said to be a MAJOR improvement for Utf. Wish I could get my
> hands on a package release soon...

I'm sure there must be somebody out there maintaining RPMs for the development version (in Debian you can use the "emacs-snapshot" package; there are also nicely packaged Windows binaries out there).

-Miles
-- 
"Most attacks seem to take place at night, during a rainstorm, uphill, where four map sheets join." -- Anon. British Officer in WW I
end of thread (newest message: 2006-08-28 15:55 UTC)

Thread overview: 17+ messages

2006-08-24 13:59 UTF-8 in path / filename Grégory SCHMITT
2006-08-24 14:42 ` Noah Slater
2006-08-25 12:08 ` Peter Dyballa
     [not found] ` <mailman.5606.1156507702.9609.help-gnu-emacs@gnu.org>
2006-08-25 13:42   ` Grégory SCHMITT
2006-08-25 18:35     ` Peter Dyballa
2006-08-25 22:06       ` Grégory SCHMITT
2006-08-25 22:55         ` Peter Dyballa
     [not found]         ` <mailman.5656.1156546542.9609.help-gnu-emacs@gnu.org>
2006-08-25 23:06           ` Grégory SCHMITT
2006-08-25 23:09           ` Miles Bader
2006-08-26  9:36             ` Peter Dyballa
2006-08-26 22:13               ` James Cloos
2006-08-27 13:12                 ` Peter Dyballa
2006-08-28 15:11                   ` James Cloos
2006-08-28 15:55                     ` Peter Dyballa
     [not found]               ` <mailman.5694.1156630455.9609.help-gnu-emacs@gnu.org>
2006-08-27  8:46                 ` Harald Hanche-Olsen
     [not found]           ` <mailman.5657.1156547377.9609.help-gnu-emacs@gnu.org>
2006-08-25 23:22             ` Grégory SCHMITT
2006-08-25 23:25               ` Miles Bader