* opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-02 12:53 UTC
I cannot get emacs to open up a file that has a file name
with Unicode characters in it. I have created these file
names by copy-paste from the Character Map tool in
Windows. As Emacs has good support for reading "Unicode
formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
open these files.
My emacs version:
GNU Emacs 21.3.50.1 (i386-mingw-nt5.1.2600) of 2004-07-09 on FARIBA
OS:
Windows XP
Any suggestions on how I could open these files (other than
renaming them, of course) are appreciated.
* Re: opening files with unicode characters in the file name on windows
From: Kevin Rodgers @ 2004-08-02 16:14 UTC
Mathias Dahl wrote:
> I cannot get emacs to open up a file that has a file name
> with Unicode characters in it. I have created these file
> names by copy-paste from the Character Map tool in
> Windows. As Emacs has good support for reading "Unicode
> formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
> open these files.
>
> My emacs version:
>
> GNU Emacs 21.3.50.1 (i386-mingw-nt5.1.2600) of 2004-07-09 on FARIBA
>
> OS:
>
> Windows XP
>
> Any suggestions on how I could open these files (other than
> renaming them, of course) are appreciated.
Does (setq file-name-coding-system 'utf-8) help?
--
Kevin Rodgers
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-03 6:32 UTC
Kevin Rodgers <ihs_4664@yahoo.com> writes:
> > I cannot get emacs to open up a file that has a file name
> > with Unicode characters in it. I have created these file
> > names by copy-paste from the Character Map tool in
> > Windows. As Emacs has good support for reading "Unicode
> > formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
> > open these files.
> Does (setq file-name-coding-system 'utf-8) help?
No, even though it was a very interesting option. When I set
that variable I can *save* files and the file names look
very cryptic in explorer.exe, probably because Windows uses
UTF-16, but when I set the variable to UTF-16, emacs seems
to lock up and I have to press C-g almost the whole time,
VERY strange...
Anyway, if I used UTF-8 and saved a file containing Swedish
characters, the file name was shown with the correct characters
in, for example, dired, but Windows showed it as garbage.
Is UTF-16 not supported in this case or do I have an emacs
that is buggy (I'm using CVS stuff after all)?
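The "cryptic" names in Explorer are consistent with the UTF-8 bytes being reinterpreted in a single-byte Windows code page. A minimal sketch, assuming the system ANSI code page is cp1252 (Western European) and that this Emacs passes the encoded bytes to a byte-oriented Win32 file API; both are assumptions, not something the thread confirms. The function name is hypothetical.

```python
# Assumption: ANSI code page is cp1252 and the file API reinterprets the
# bytes it receives in that code page.

def utf8_name_as_seen_by_ansi_api(name: str, ansi_codepage: str = "cp1252") -> str:
    """Encode `name` as UTF-8, then decode the bytes with an ANSI code page."""
    raw = name.encode("utf-8")  # bytes produced with file-name-coding-system = utf-8
    # errors="replace": cp1252 leaves a few byte values (e.g. 0x81) unmapped
    return raw.decode(ansi_codepage, errors="replace")


print(utf8_name_as_seen_by_ansi_api("åäö.txt"))
```

Each Swedish letter becomes two bytes in UTF-8 (for example "å" is C3 A5), and cp1252 shows each byte as a separate character, so "åäö.txt" comes out as "Ã¥Ã¤Ã¶.txt". Dired decodes the same bytes back with UTF-8, which is why it still shows the correct name.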
* Re: opening files with unicode characters in the file name on windows
From: Eli Zaretskii @ 2004-08-03 19:19 UTC
> From: Mathias Dahl <brakjoller@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 03 Aug 2004 08:32:15 +0200
>
> > Does (setq file-name-coding-system 'utf-8) help?
>
> No, even though it was a very interesting option. When I set
> that variable I can *save* files and the file names look
> very cryptic in explorer.exe, probably because Windows uses
> UTF-16
Your original message said ``file names with Unicode characters''.
Can you tell what characters are those, and why do you think they are
encoded in some Unicode-related encoding, like UTF-16? Can you look
at the file's name as recorded in the directory with some low-level
tool that actually shows the byte values that encode the file's name?
You see, I suspect that Windows file names are encoded in the system
codepage, not in UTF-16. So perhaps setting file-name-coding-system
to that codepage would solve the problem.
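To see what "the byte values that encode the file's name" look like under the candidate encodings, here is a small sketch that dumps a name as hex in UTF-16LE (NTFS stores names as 16-bit Unicode code units), in the Cyrillic code page cp1251, and in the Western code page cp1252. The sample name and the helper `hex_bytes` are illustrative only; this does not read anything from disk.

```python
def hex_bytes(name: str, codec: str) -> str:
    """Return `name` encoded with `codec` as space-separated hex byte pairs."""
    return " ".join(f"{b:02X}" for b in name.encode(codec))


name = "правда.txt"
# Two bytes per character, low byte first: "п" (U+043F) becomes 3F 04.
print("utf-16-le:", hex_bytes(name, "utf-16-le"))
# One byte per character in the Cyrillic ANSI code page: "п" becomes EF.
print("cp1251:   ", hex_bytes(name, "cp1251"))
# The Western code page has no Cyrillic letters, so encoding fails.
try:
    hex_bytes(name, "cp1252")
except UnicodeEncodeError as err:
    print("cp1252:     no mapping for", repr(name[err.start]))
```

Which of these forms Emacs should hand to the OS depends on which Windows API it calls, which is exactly what the question above is trying to pin down.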
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-04 7:46 UTC
"Eli Zaretskii" <eliz@gnu.org> writes:
> > From: Mathias Dahl <brakjoller@hotmail.com>
> > Newsgroups: gnu.emacs.help
> > Date: 03 Aug 2004 08:32:15 +0200
> >
> > > Does (setq file-name-coding-system 'utf-8) help?
> >
> > No, even though it was a very interesting option. When I set
> > that variable I can *save* files and the file names look
> > very cryptic in explorer.exe, probably because Windows uses
> > UTF-16
> Your original message said ``file names with Unicode
> characters''. Can you tell what characters are those, and
> why do you think they are encoded in some Unicode-related
> encoding, like UTF-16?
Well, a couple of weeks ago I was surfing around quite a bit,
since I had to debug some Unicode issues in our
applications. Everywhere I went I read about how Microsoft uses
Unicode internally for strings, and also in file names. And
as they say that they use UTF-16 for strings and file
content, I just supposed they used it for encoding file names
too. But of course I may be wrong. And I really mean that, as
I am a complete beginner when it comes to Unicode.
> Can you look at the file's name as recorded in the
> directory with some low-level tool that actually shows the
> byte values that encode the file's name?
No, but I would really like to. :)
> You see, I suspect that Windows file names are encoded in
> the system codepage, not in UTF-16. So perhaps setting
> file-name-coding-system to that codepage would solve the
> problem.
Hmm, ok. I will try that, I just have to figure out which
code page I am currently using. Thanks for the tip, I will
report back my findings here.
Btw, is there some more "low-level" way of opening files in
Emacs so that I can open ANY file regardless of how the file
name is encoded?
/Mathias
* Re: opening files with unicode characters in the file name on windows
From: Jason Rumney @ 2004-08-04 7:56 UTC
Mathias Dahl <brakjoller@hotmail.com> writes:
> Hmm, ok. I will try that, I just have to figure out which
> code page I am currently using. Thanks for the tip, I will
> report back my findings here.
Take a look at the value of locale-coding-system. That is the most
likely candidate for file-name-coding-system.
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-04 8:42 UTC
jasonr (Jason Rumney) @ f2s.com writes:
> Mathias Dahl <brakjoller@hotmail.com> writes:
>
> > Hmm, ok. I will try that, I just have to figure out which
> > code page I am currently using. Thanks for the tip, I will
> > report back my findings here.
>
> Take a look at the value of locale-coding-system. That is the most
> likely candidate for file-name-coding-system.
I tried these now (I actually think they are the same):
(setq file-name-coding-system 'cp1252)
(setq file-name-coding-system 'windows-1252)
And I cannot open the files with Cyrillic or Arabic or
Hebrew characters in them. I am almost convinced that
Windows *does* encode them with UTF-16, but when I set UTF-16
as file-name-coding-system emacs freezes whatever I do and I
have to keep pressing C-g to unfreeze it. :(
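One way to see why cp1252 (and any other single ANSI code page) cannot cover all of these names is to test which encodings can represent a given name at all. A minimal sketch; the candidate list (cp1252 Western, cp1251 Cyrillic, cp1255 Hebrew, cp1256 Arabic, plus UTF-16LE) and the second, mixed-script sample name are illustrative assumptions, and `codecs_that_fit` is a hypothetical helper.

```python
# Illustrative candidates: Western, Cyrillic, Hebrew and Arabic ANSI code
# pages, plus a Unicode encoding for comparison.
CANDIDATE_CODECS = ["cp1252", "cp1251", "cp1255", "cp1256", "utf-16-le"]


def codecs_that_fit(name: str, candidates=CANDIDATE_CODECS) -> list:
    """Return the candidate codecs that can encode every character of `name`."""
    fits = []
    for codec in candidates:
        try:
            name.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character has no mapping in this codec
        fits.append(codec)
    return fits


print(codecs_that_fit("pravda_правда.txt"))
print(codecs_that_fit("правда_שלום.txt"))
```

The first name fits cp1251 and UTF-16LE but not cp1252; the mixed Cyrillic/Hebrew name fits only UTF-16LE. So with a code-page value for file-name-coding-system, at most one script family can work at a time.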
* Re: opening files with unicode characters in the file name on windows
From: Eli Zaretskii @ 2004-08-04 16:29 UTC
> From: Mathias Dahl <brakjoller@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 04 Aug 2004 10:42:34 +0200
>
> I am almost convinced that Windows *does* encode them with UTF-16, but
> when I set UTF-16 as file-name-coding-system emacs freezes whatever
> I do and I have to keep pressing C-g to unfreeze it. :(
What value, exactly, did you try to use for file-name-coding-system?
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-05 11:28 UTC
"Eli Zaretskii" <eliz@gnu.org> writes:
> > I am almost convinced that Windows *does* encode them with UTF-16,
> > but when I set UTF-16 as file-name-coding-system emacs freezes
> > whatever I do and I have to keep pressing C-g to unfreeze it. :(
>
> What value, exactly, did you try to use for file-name-coding-system?
I did this:
(setq file-name-coding-system 'utf-16)
I also tested this after starting up with --no-init-file.
If I do
(setq file-name-coding-system 'utf-8)
it works, and my file names look very funny in Explorer.exe if I save
a file with, for example, Swedish characters... :)
I tried something similar with Emacs 21.3 on Mandrake at home, setting
the coding system to mule-utf-16-le, and now my PuTTY window sits
there, frozen... :)
* Re: opening files with unicode characters in the file name on windows
From: Eli Zaretskii @ 2004-08-06 9:38 UTC
> From: Mathias Dahl <brakjoller@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 05 Aug 2004 13:28:26 +0200
>
> > > I am almost convinced that Windows *does* encode them with UTF-16,
> > > but when I set UTF-16 as file-name-coding-system emacs freezes
> > > whatever I do and I have to keep pressing C-g to unfreeze it. :(
> >
> > What value, exactly, did you try to use for file-name-coding-system?
>
> I did this:
>
> (setq file-name-coding-system 'utf-16)
Sounds like a bug, so please take this to gnu.emacs.bug.
Meanwhile, if you set debug-on-quit to t and repeat what you described
above, what traceback do you see after C-g? That traceback should
show which function is inflooping for utf-16.
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-06 11:44 UTC
"Eli Zaretskii" <eliz@gnu.org> writes:
> Sounds like a bug, so please take this to gnu.emacs.bug.
Last time I tried to post there, I got a reply at my e-mail
address saying that the article could not be posted. Is there
another interface than news, maybe an e-mail gateway?
> Meanwhile, if you set debug-on-quit to t and repeat what
> you described above, what traceback do you see after C-g? That
> traceback should show which function is inflooping for
> utf-16.
This is what I get:
Debugger entered--Lisp error: (quit)
utf-8-pre-write-conversion(1 4)
file-exists-p("h:/")
make-directory("h:/.emacs.d/auto-save-list/" t)
This is what I do to get this error:
1. Start emacs using --no-init-file
2. M-: (setq debug-on-quit t)
3. C-g -- to make emacs load debug, which it will not be
able to load after I change the
file-name-coding-system
4. (setq file-name-coding-system 'utf-16)
5. Wait for a while
6. After switching back to emacs it has frozen
7. "Unfreeze" with C-g
The error is not always the one above, even though the
make-directory part seems to be there most of the time.
Also, I'll try setting my home to a local drive to see if
things change.
/Mathias
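One property of UTF-16 worth keeping in mind here: unlike UTF-8 or the Windows code pages, it is not ASCII-compatible, so even a plain path such as "h:/" from the traceback above gets a NUL byte after every character. Code that treats file names as NUL-terminated C strings would see only "h". The sketch below just demonstrates that byte-level fact; whether it is what makes this Emacs loop is an assumption the thread does not establish (the traceback itself points into utf-8-pre-write-conversion). `c_string_view` is a hypothetical helper.

```python
def c_string_view(raw: bytes) -> bytes:
    """Bytes a NUL-terminated C string API would see (up to the first NUL)."""
    return raw.split(b"\x00", 1)[0]


path = "h:/"
utf8 = path.encode("utf-8")       # b'h:/'
utf16 = path.encode("utf-16-le")  # b'h\x00:\x00/\x00'
print(utf8, "->", c_string_view(utf8))
print(utf16, "->", c_string_view(utf16))
```

The UTF-8 bytes pass through unchanged, while the UTF-16LE bytes are cut off after the first character.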
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-06 13:08 UTC
"Eli Zaretskii" <eliz@gnu.org> writes:
> >
> > I did this:
> >
> > (setq file-name-coding-system 'utf-16)
>
> Sounds like a bug, so please take this to gnu.emacs.bug.
I reported the bug via e-mail.
Thanks for all the suggestions!
/Mathias
* Re: opening files with unicode characters in the file name on windows
From: Mathias Dahl @ 2004-08-04 14:27 UTC
"Eli Zaretskii" <eliz@gnu.org> writes:
> Your original message said ``file names with Unicode characters''.
> Can you tell what characters are those, and why do you think they
> are encoded in some Unicode-related encoding, like UTF-16? Can you
> look at the file's name as recorded in the directory with some
> low-level tool that actually shows the byte values that encode the
> file's name?
I have done some investigation and I am pretty sure UTF-16 is the
encoding used. The following VBScript program (sorry for pasting
non-Emacs-related stuff here) loops through all files in a folder and,
if a file name contains characters with values > 255, displays a list
of its characters with their Unicode code point values:
' -- TestUnicodeFileNames.vbs ---
Option Explicit

' --------- Main program starts
Dim sFileName
Dim oFSO
Dim oFile

Set oFSO = CreateObject("Scripting.FileSystemObject")
For Each oFile In oFSO.GetFolder("c:\document\my docs").Files
    checkUnicodeFileName(oFile.Name)
Next
Set oFSO = Nothing
' --------- Main program ends

' Show a message box for the first character with a value > 255.
Private Sub checkUnicodeFileName(fileName)
    Dim i
    Dim c
    Dim n
    For i = 1 to Len(fileName)
        c = Mid(fileName, i, 1)
        n = AscW(c)
        If n > 255 Then
            MsgBox "File name contains unicode characters: " & _
                Chr(10) & Chr(10) & _
                "File name: " & fileName & _
                Chr(10) & Chr(10) & _
                "Characters and their unicode code points:" & _
                Chr(10) & Chr(10) & _
                getStringInfo(fileName)
            Exit Sub
        End If
    Next
End Sub

' Build a "character <TAB> 4-digit hex code" table for a string.
Private Function getStringInfo(s)
    Dim i
    Dim n
    Dim c
    Dim h
    Dim result
    result = "Char" & Chr(9) & "U+NNNN" & Chr(10) & Chr(10)
    For i = 1 to Len(s)
        c = Mid(s, i, 1)
        n = AscW(c)
        h = Hex(n)
        result = result & c & Chr(9) & Right("0000" & h, 4) & Chr(10)
    Next
    getStringInfo = result
End Function
' -- TestUnicodeFileNames.vbs ends here ---
The output looks like this (in the message box itself I only see the
actual characters if I use a "Unicode font" for message boxes):
File name contains unicode characters:
File name: pravda_правда.txt
Characters and their unicode code points:
Char U+NNNN
p 0070
r 0072
a 0061
v 0076
d 0064
a 0061
_ 005F
п 043F
р 0440
а 0430
в 0432
д 0434
а 0430
. 002E
t 0074
x 0078
t 0074
/Mathias
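For comparison, here is a rough Python port of the check in the VBScript program above: the same "any character value > 255" test and the same character/hex table. It is a sketch, not a replacement for looking at raw directory bytes; the function names are hypothetical, and on Windows, passing a str folder to `os.listdir` returns the names as Unicode strings.

```python
import os


def has_char_above_255(name: str) -> bool:
    """Same test as checkUnicodeFileName: any character value > 255."""
    return any(ord(ch) > 255 for ch in name)


def code_point_table(name: str) -> list:
    """(character, 'NNNN') pairs, like getStringInfo in the VBScript."""
    # ord() gives the full code point; characters outside the BMP print with
    # more than 4 hex digits, whereas AscW would see UTF-16 surrogate units.
    return [(ch, f"{ord(ch):04X}") for ch in name]


def wide_names(folder: str) -> list:
    """File names in `folder` that contain a character value > 255."""
    return [n for n in os.listdir(folder) if has_char_above_255(n)]


for ch, code in code_point_table("pravda_правда.txt"):
    print(ch, code, sep="\t")
```

Running the loop reproduces the table above, for example "п" with 043F.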