unofficial mirror of help-gnu-emacs@gnu.org
* opening files with unicode characters in the file name on windows
@ 2004-08-02 12:53 Mathias Dahl
  2004-08-02 16:14 ` Kevin Rodgers
  0 siblings, 1 reply; 13+ messages in thread
From: Mathias Dahl @ 2004-08-02 12:53 UTC (permalink / raw)


I cannot get Emacs to open a file that has a file name
with Unicode characters in it. I have created these file
names by copy-paste from the Character Map tool in
Windows. As Emacs has good support for reading "unicode
formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
open these files.

My emacs version:

GNU Emacs 21.3.50.1 (i386-mingw-nt5.1.2600) of 2004-07-09 on FARIBA

OS:

Windows XP

Any suggestions on how I could open these files (other than
renaming them, of course) are appreciated.


* Re: opening files with unicode characters in the file name on windows
  2004-08-02 12:53 opening files with unicode characters in the file name on windows Mathias Dahl
@ 2004-08-02 16:14 ` Kevin Rodgers
  2004-08-03  6:32   ` Mathias Dahl
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Rodgers @ 2004-08-02 16:14 UTC (permalink / raw)


Mathias Dahl wrote:
 > I cannot get Emacs to open a file that has a file name
 > with Unicode characters in it. I have created these file
 > names by copy-paste from the Character Map tool in
 > Windows. As Emacs has good support for reading "unicode
 > formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
 > open these files.
 >
 > My emacs version:
 >
 > GNU Emacs 21.3.50.1 (i386-mingw-nt5.1.2600) of 2004-07-09 on FARIBA
 >
 > OS:
 >
 > Windows XP
 >
 > Any suggestions on how I could open these files (other than
 > renaming them, of course) are appreciated.

Does (setq file-name-coding-system 'utf-8) help?

-- 
Kevin Rodgers


* Re: opening files with unicode characters in the file name on windows
  2004-08-02 16:14 ` Kevin Rodgers
@ 2004-08-03  6:32   ` Mathias Dahl
  2004-08-03 19:19     ` Eli Zaretskii
       [not found]     ` <mailman.2622.1091561203.1960.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 13+ messages in thread
From: Mathias Dahl @ 2004-08-03  6:32 UTC (permalink / raw)


Kevin Rodgers <ihs_4664@yahoo.com> writes:

>  > I cannot get Emacs to open a file that has a file name
>  > with Unicode characters in it. I have created these file
>  > names by copy-paste from the Character Map tool in
>  > Windows. As Emacs has good support for reading "unicode
>  > formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
>  > open these files.

> Does (setq file-name-coding-system 'utf-8) help?

No, even though it was a very interesting option. When I set
that variable I can *save* files and the file names look
very cryptic in explorer.exe, probably because Windows uses
UTF-16, but when I set the variable to UTF-16, Emacs seems
to lock up and I have to press C-g almost the whole time,
VERY strange...

Anyway, if I used UTF-8 and saved a file containing Swedish
characters, this file was visible with correct characters in,
for example, Dired, and Windows saw them as garbage.

Is UTF-16 not supported in this case or do I have an emacs
that is buggy (I'm using CVS stuff after all)?
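
A note on the "cryptic" names: the effect described above is what one
would expect if Emacs writes the name as UTF-8 bytes and another program
then reads those bytes in an 8-bit code page. The following is a minimal
sketch in Python (not Emacs Lisp), assuming the system code page is
cp1252; the function name and the sample file name are made up for
illustration and do not come from the thread.

```python
def as_seen_by_ansi_program(name: str, ansi_codepage: str = "cp1252") -> str:
    """Encode a file name as UTF-8, then decode those bytes with an 8-bit code page.

    This imitates a program that assumes the bytes are in the system code page.
    """
    utf8_bytes = name.encode("utf-8")
    # errors="replace" avoids an exception for the few byte values
    # that cp1252 leaves undefined.
    return utf8_bytes.decode(ansi_codepage, errors="replace")


if __name__ == "__main__":
    # Each Swedish letter becomes two unrelated Latin-1 characters.
    print(as_seen_by_ansi_program("räksmörgås.txt"))
```

Pure ASCII names pass through unchanged, which is why only names with
non-ASCII characters look wrong.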


* Re: opening files with unicode characters in the file name on windows
  2004-08-03  6:32   ` Mathias Dahl
@ 2004-08-03 19:19     ` Eli Zaretskii
       [not found]     ` <mailman.2622.1091561203.1960.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2004-08-03 19:19 UTC (permalink / raw)


> From: Mathias Dahl <brakjoller@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 03 Aug 2004 08:32:15 +0200
> 
> > Does (setq file-name-coding-system 'utf-8) help?
> 
> No, even though it was a very interesting option. When I set
> that variable I can *save* files and the file names look
> very cryptic in explorer.exe, probably because Windows uses
> UTF-16

Your original message said ``file names with Unicode characters''.
Can you tell what characters are those, and why do you think they are
encoded in some Unicode-related encoding, like UTF-16?  Can you look
at the file's name as recorded in the directory with some low-level
tool that actually shows the byte values that encode the file's name?

You see, I suspect that Windows file names are encoded in the system
codepage, not in UTF-16.  So perhaps setting file-name-coding-system
to that codepage would solve the problem.


* Re: opening files with unicode characters in the file name on windows
       [not found]     ` <mailman.2622.1091561203.1960.help-gnu-emacs@gnu.org>
@ 2004-08-04  7:46       ` Mathias Dahl
  2004-08-04  7:56         ` Jason Rumney
  2004-08-04 14:27       ` Mathias Dahl
  1 sibling, 1 reply; 13+ messages in thread
From: Mathias Dahl @ 2004-08-04  7:46 UTC (permalink / raw)


"Eli Zaretskii" <eliz@gnu.org> writes:

> > From: Mathias Dahl <brakjoller@hotmail.com>
> > Newsgroups: gnu.emacs.help
> > Date: 03 Aug 2004 08:32:15 +0200
> > 
> > > Does (setq file-name-coding-system 'utf-8) help?
> > 
> > No, even though it was a very interesting option. When I set
> > that variable I can *save* files and the file names look
> > very cryptic in explorer.exe, probably because Windows uses
> > UTF-16
 
> Your original message said ``file names with Unicode
> characters''.  Can you tell what characters are those, and
> why do you think they are encoded in some Unicode-related
> encoding, like UTF-16?

Well, I have been surfing around for a couple of weeks
since I have had to debug some Unicode issues in our
applications. Everywhere I go I read about how Microsoft uses
Unicode internally for strings, and also in file names. And
as they say that they use UTF-16 for strings and file
content I just supposed they used it for encoding file names
too. But of course I may be wrong. And I really mean that, as
I am a complete beginner when it comes to Unicode.

> Can you look at the file's name as recorded in the
> directory with some low-level tool that actually shows the
> byte values that encode the file's name?

No, but I would really like to. :)
 
> You see, I suspect that Windows file names are encoded in
> the system codepage, not in UTF-16.  So perhaps setting
> file-name-coding-system to that codepage would solve the
> problem.

Hmm, ok. I will try that, I just have to figure out which
code page I am currently using. Thanks for the tip, I will
report back my findings here.

Btw, is there some more "low-level" way of opening files in
Emacs so that I can open ANY file regardless of how the file
name is encoded?

/Mathias


* Re: opening files with unicode characters in the file name on windows
  2004-08-04  7:46       ` Mathias Dahl
@ 2004-08-04  7:56         ` Jason Rumney
  2004-08-04  8:42           ` Mathias Dahl
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Rumney @ 2004-08-04  7:56 UTC (permalink / raw)


Mathias Dahl <brakjoller@hotmail.com> writes:

> Hmm, ok. I will try that, I just have to figure out which
> code page I am currently using. Thanks for the tip, I will
> report back my findings here.

Take a look at the value of locale-coding-system. That is the most
likely candidate for file-name-coding-system.


* Re: opening files with unicode characters in the file name on windows
  2004-08-04  7:56         ` Jason Rumney
@ 2004-08-04  8:42           ` Mathias Dahl
  2004-08-04 16:29             ` Eli Zaretskii
       [not found]             ` <mailman.2714.1091637376.1960.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 13+ messages in thread
From: Mathias Dahl @ 2004-08-04  8:42 UTC (permalink / raw)


jasonr (Jason Rumney) @  f2s.com writes:

> Mathias Dahl <brakjoller@hotmail.com> writes:
> 
> > Hmm, ok. I will try that, I just have to figure out which
> > code page I am currently using. Thanks for the tip, I will
> > report back my findings here.
> 
> Take a look at the value of locale-coding-system. That is the most
> likely candidate for file-name-coding-system.

I tried these now (I actually think they are the same):

(setq file-name-coding-system 'cp1252)
(setq file-name-coding-system 'windows-1252)

And I cannot open the files with Cyrillic, Arabic or
Hebrew characters in their names. I am almost convinced that
Windows *does* encode them with UTF-16, but when I set UTF-16
as file-name-coding-system Emacs freezes whatever I do and I
have to keep pressing C-g to unfreeze it. :(
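
A note on why cp1252 cannot help here: an 8-bit code page covers only
about 200 characters, and cp1252 has no Cyrillic, Arabic or Hebrew
letters at all. The following is a minimal sketch in Python (not Emacs
Lisp) that checks whether a name can be represented in a given code
page; the function name is hypothetical and the sample names are only
for illustration.

```python
def fits_in_codepage(name: str, codepage: str = "cp1252") -> bool:
    """Return True if every character of `name` can be encoded in `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        # At least one character has no byte value in this code page.
        return False
    return True


if __name__ == "__main__":
    for sample in ("räksmörgås.txt", "pravda_правда.txt"):
        print(sample, fits_in_codepage(sample, "cp1252"))
```

A Cyrillic name would fit in a Cyrillic code page such as cp1251, but no
single 8-bit code page covers Swedish, Cyrillic, Arabic and Hebrew
together.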


* Re: opening files with unicode characters in the file name on windows
       [not found]     ` <mailman.2622.1091561203.1960.help-gnu-emacs@gnu.org>
  2004-08-04  7:46       ` Mathias Dahl
@ 2004-08-04 14:27       ` Mathias Dahl
  1 sibling, 0 replies; 13+ messages in thread
From: Mathias Dahl @ 2004-08-04 14:27 UTC (permalink / raw)


"Eli Zaretskii" <eliz@gnu.org> writes:

> Your original message said ``file names with Unicode characters''.
> Can you tell what characters are those, and why do you think they
> are encoded in some Unicode-related encoding, like UTF-16?  Can you
> look at the file's name as recorded in the directory with some
> low-level tool that actually shows the byte values that encode the
> file's name?

I have done some investigation and I am pretty sure UTF-16 is the
encoding used. The following VBScript program (sorry for pasting
non-emacs related stuff here) loops through all files in a folder and
if the file names contain character values > 255 displays a list with
unicode code point values:

' -- TestUnicodeFileNames.vbs ---

Option Explicit

' --------- Main program starts

Dim sFileName
Dim oFSO
Dim oFile

Set oFSO = CreateObject("Scripting.FileSystemObject")

For Each oFile In oFSO.GetFolder("c:\document\my docs").Files
  checkUnicodeFileName(oFile.Name)
Next

Set oFSO = Nothing

' --------- Main program ends

Private Sub checkUnicodeFileName(fileName)

  Dim i
  Dim c
  Dim n

  For i = 1 to Len(fileName)

    c = Mid(fileName, i, 1)
    n = AscW(c)

    If n > 255 Then
      MsgBox "File name contains unicode characters: " & _
             Chr(10) & Chr(10) & _
             "File name: " & fileName & _
             Chr(10) & Chr(10) & _
             "Characters and their unicode code points:" & _
             Chr(10) & Chr(10) & _
             getStringInfo(fileName)
      Exit Sub
    End If

  Next

End Sub

Private Function getStringInfo(s)
  Dim i
  Dim n
  Dim c
  Dim h
  Dim result

  result = "Char" & Chr(9) & "U+NNNN" & Chr(10) & Chr(10)

  For i = 1 to Len(s)
    c = Mid(s, i, 1)
    n = AscW(c)
    h = Hex(n)
    result = result & c & Chr(9) & Right("0000" & h, 4) & Chr(10)
  Next

  getStringInfo = result

End Function

' -- TestUnicodeFileNames.vbs ends here ---

The output looks like this (you may not see the actual characters here;
I see them when I use a "unicode font" for message boxes):

File name contains unicode characters: 

File name: pravda_правда.txt

Characters and their unicode code points:

Char	U+NNNN

p	0070
r	0072
a	0061
v	0076
d	0064
a	0061
_	005F
п	043F
р	0440
а	0430
в	0432
д	0434
а	0430
.	002E
t	0074
x	0078
t	0074

/Mathias
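
For readers without a Windows Script Host at hand, the same listing can
be approximated in a few lines of Python. This is a sketch, not a
translation of the VBScript above: the function names are made up, it
takes a file name as a string instead of walking a folder, and it
reports code points via ord(), which matches AscW only for characters in
the Basic Multilingual Plane.

```python
def has_char_above_255(name: str) -> bool:
    """Return True if any character has a code point above 255, as the script tests."""
    return any(ord(ch) > 255 for ch in name)


def code_point_table(name: str) -> list[tuple[str, str]]:
    """Return (character, four-digit hex code point) pairs for each character."""
    return [(ch, f"{ord(ch):04X}") for ch in name]


if __name__ == "__main__":
    sample = "pravda_правда.txt"
    if has_char_above_255(sample):
        print("Char\tU+NNNN")
        for ch, hex_value in code_point_table(sample):
            print(f"{ch}\t{hex_value}")
```

Note that this shows which characters the name contains, not the bytes
stored in the directory entry, so by itself it does not prove which
encoding the file system uses.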


* Re: opening files with unicode characters in the file name on windows
  2004-08-04  8:42           ` Mathias Dahl
@ 2004-08-04 16:29             ` Eli Zaretskii
       [not found]             ` <mailman.2714.1091637376.1960.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2004-08-04 16:29 UTC (permalink / raw)


> From: Mathias Dahl <brakjoller@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 04 Aug 2004 10:42:34 +0200
> 
> I am almost convinced that Windows *does* encode them with UTF-16, but
> when I set UTF-16 as file-name-coding-system Emacs freezes whatever
> I do and I have to keep pressing C-g to unfreeze it. :(

What value, exactly, did you try to use for file-name-coding-system?


* Re: opening files with unicode characters in the file name on windows
       [not found]             ` <mailman.2714.1091637376.1960.help-gnu-emacs@gnu.org>
@ 2004-08-05 11:28               ` Mathias Dahl
  2004-08-06  9:38                 ` Eli Zaretskii
       [not found]                 ` <mailman.112.1091785538.2011.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 13+ messages in thread
From: Mathias Dahl @ 2004-08-05 11:28 UTC (permalink / raw)


"Eli Zaretskii" <eliz@gnu.org> writes:

> > I am almost convinced that Windows *does* encode them with UTF-16,
> > but when I set UTF-16 as file-name-coding-system Emacs freezes
> > whatever I do and I have to keep pressing C-g to unfreeze it. :(
> 
> What value, exactly, did you try to use for file-name-coding-system?

I did this:

(setq file-name-coding-system 'utf-16)

I also tested this after starting up with --no-init-file.

If I do

(setq file-name-coding-system 'utf-8)

it works and my file names look very funny in Explorer.exe if I save
a file with, for example, Swedish characters... :)

I tried something similar on Emacs 21.3 on Mandrake at home, setting
the coding-system to mule-utf-16-le and now my Putty-window sits
there, freezing... :)
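
A note on UTF-16 as a file-name encoding: whatever the cause of the
freeze reported above (this note does not claim to identify it), UTF-16
is an awkward fit for any interface that passes file names as
NUL-terminated byte strings, because every ASCII character is encoded
with a zero byte. The following is a minimal sketch in Python; the
function names and the sample name are made up for illustration.

```python
def utf16_le_bytes(name: str) -> bytes:
    """Encode a file name as UTF-16 little-endian, without a byte order mark."""
    return name.encode("utf-16-le")


def contains_nul(data: bytes) -> bool:
    """Return True if the byte string contains a zero byte."""
    return b"\x00" in data


if __name__ == "__main__":
    encoded = utf16_le_bytes("a.txt")
    print(encoded)                # every second byte is zero
    print(contains_nul(encoded))  # a C-style string would end after the first byte
```

UTF-8 and the 8-bit code pages never produce a zero byte for a non-NUL
character, which is one reason they can be used with such interfaces.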


* Re: opening files with unicode characters in the file name on windows
  2004-08-05 11:28               ` Mathias Dahl
@ 2004-08-06  9:38                 ` Eli Zaretskii
       [not found]                 ` <mailman.112.1091785538.2011.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2004-08-06  9:38 UTC (permalink / raw)


> From: Mathias Dahl <brakjoller@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 05 Aug 2004 13:28:26 +0200
> 
> > > I am almost convinced that Windows *does* encode them with UTF-16,
> > > but when I set UTF-16 as file-name-coding-system Emacs freezes
> > > whatever I do and I have to keep pressing C-g to unfreeze it. :(
> > 
> > What value, exactly, did you try to use for file-name-coding-system?
> 
> I did this:
> 
> (setq file-name-coding-system 'utf-16)

Sounds like a bug, so please take this to gnu.emacs.bug.

Meanwhile, if you set debug-on-quit to t and repeat what you told
above, what traceback do you see after C-g?  That traceback should
show what function is inflooping for utf-16.


* Re: opening files with unicode characters in the file name on windows
       [not found]                 ` <mailman.112.1091785538.2011.help-gnu-emacs@gnu.org>
@ 2004-08-06 11:44                   ` Mathias Dahl
  2004-08-06 13:08                   ` Mathias Dahl
  1 sibling, 0 replies; 13+ messages in thread
From: Mathias Dahl @ 2004-08-06 11:44 UTC (permalink / raw)


"Eli Zaretskii" <eliz@gnu.org> writes:

> Sounds like a bug, so please take this to gnu.emacs.bug.

Last time I tried to post there I got a reply to the e-mail
address I use that the article could not be posted. Is there
another interface than news, maybe an e-mail gateway?
 
> Meanwhile, if you set debug-on-quit to t and repeat what
> you told above, what traceback do you see after C-g?  That
> traceback should show what function is inflooping for
> utf-16.

This is what I get:

  Debugger entered--Lisp error: (quit)
    utf-8-pre-write-conversion(1 4)
    file-exists-p("h:/")
    make-directory("h:/.emacs.d/auto-save-list/" t)

This is what I do to get this error:

1. Start emacs using --no-init-file
2. M-: (setq debug-on-quit t)
3. C-g -- to make emacs load debug, which it will not be
          able to load after I change the
          file-name-coding-system
4. (setq file-name-coding-system 'utf-16)
5. Wait for a while
6. After switching back to emacs it has frozen
7. "Unfreeze" with C-g

The error is not always the one above, even though the
make-directory part seems to be there most of the time.

Also, I'll try setting my home to a local drive to see if
things change.

/Mathias


* Re: opening files with unicode characters in the file name on windows
       [not found]                 ` <mailman.112.1091785538.2011.help-gnu-emacs@gnu.org>
  2004-08-06 11:44                   ` Mathias Dahl
@ 2004-08-06 13:08                   ` Mathias Dahl
  1 sibling, 0 replies; 13+ messages in thread
From: Mathias Dahl @ 2004-08-06 13:08 UTC (permalink / raw)


"Eli Zaretskii" <eliz@gnu.org> writes:

> > 
> > I did this:
> > 
> > (setq file-name-coding-system 'utf-16)
> 
> Sounds like a bug, so please take this to gnu.emacs.bug.

I reported the bug via e-mail.

Thanks for all the suggestions!

/Mathias

