unofficial mirror of emacs-devel@gnu.org 
* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
       [not found]             ` <200408101144.UAA13100@etlken.m17n.org>
@ 2004-08-10 12:01               ` Jason Rumney
  2004-08-10 16:49                 ` Mattis
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Rumney @ 2004-08-10 12:01 UTC (permalink / raw)
  Cc: emacs-devel, brakjoller



Kenichi Handa wrote:

>In article <c99f54dd040810043627699b54@mail.gmail.com>, Mattis <brakjoller@gmail.com> writes:
>
>>> Are they real question marks?  Have you checked them with
>>> "C-x ="?
>>
>>Yes, they are real question marks.
>
>Hmmm, then something is different in Windows port.
>
>>I do not know how to proceed. It definitely seems like the low level
>>emacs functions do not support all characters under Windows. Is this
>>considered a bug or a non-feature?
>
>I have no idea because I have almost no knowledge about
>Windows.  Jason, do you know why
>
The functions that deal with file names in the C library on Windows 
return ? for characters that are not supported by the system locale, 
even though NTFS supports Unicode file names. The answer would be to use 
Unicode-aware Win32 API functions instead of the standard C library, but 
such functions are only supported on some versions of Windows, so 
determining when to use them and when not to is a problem.
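
A rough illustration of the effect, not of the Windows C runtime itself: Python's errors="replace" mode substitutes ? for any character the target code page cannot represent, which is analogous to what the locale-bound file functions return. The file names and the choice of cp1252 as the "system locale" are assumptions made for the example.

```python
# Model a locale-bound API: characters missing from the code page become "?".
# cp1252 stands in for the system locale; the file names are made up.
def to_locale_name(name: str, codepage: str = "cp1252") -> str:
    raw = name.encode(codepage, errors="replace")
    return raw.decode(codepage)

print(to_locale_name("résumé.txt"))  # every character exists in cp1252
print(to_locale_name("日本語.txt"))    # none of the first three characters do
```

The second name comes back with real question marks in place of the characters that cp1252 lacks, so the original name cannot be recovered from it.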

It is quite unusual for users to enter filenames in languages other than 
their own, and Emacs is certainly not the only application that has this 
problem (try "dir" in the Windows command prompt, for example), so this
has not been high priority. It is probably appropriate to look at this
in the next version of Emacs that uses Unicode internally. To do so now
is too big a change for feature-freeze, I think, and we already have a
big change to support Unicode on the clipboard to install (which has
more benefits to users than file names, IMHO).


>(let ((file-name-coding-system 'raw-text))
>  (directory-files DIRECTORY_NAME))
>
>returns `?' in file names if they are encoded in utf-16-le?
>  
>



* Re: Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-10 12:01               ` [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs] Jason Rumney
@ 2004-08-10 16:49                 ` Mattis
  2004-08-10 17:15                   ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Mattis @ 2004-08-10 16:49 UTC (permalink / raw)
  Cc: emacs-devel, Kenichi Handa

> The answer would be to use Unicode-aware Win32 API functions 
> instead of the standard C library, but such functions are only 
> supported on some versions of Windows, so determining 
> when to use them and when not to is a problem.

Well, probably, yes. I agree that this might not be a high-priority case,
but IMHO it should be solved sooner or later.
 
> It is quite unusual for users to enter filenames in languages 
> other than their own

Probably. I did run into a real-world problem, though, even if it is
not very common.

Anyway, I do not want to argue about priorities here, but I want
to remind you of the second problem, the "freeze" that happens
on both Windows and GNU/Linux:

1. $ emacs --no-init-file --no-site-file

2. (setq debug-on-quit t)

3. (setq file-name-coding-system 'utf-16-le)

4. Wait for a while

See now how Emacs seems to "freeze". It is possible to unfreeze it
using C-g, and then debug output like this is displayed:

  ange-ftp-completion-hook-function(file-exists-p "/")
  file-exists-p("/")
  make-directory("/home/mathias/.emacs.d/auto-save-list/" t)

Since the primary problem (Unicode support for file names) will not be
solved, I will not be affected by the problem above in a real-world case,
but I wanted you to know that something is fishy here, even on GNU/Linux.

Thanks for the help!

/Mathias


* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-10 16:49                 ` Mattis
@ 2004-08-10 17:15                   ` Stefan Monnier
  2004-08-11 10:59                     ` Mattis
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2004-08-10 17:15 UTC (permalink / raw)
  Cc: emacs-devel, Kenichi Handa, Jason Rumney

> See now how emacs seems to "freeze". It is possible to unfreeze it
> using C-g and then a debug output like this is displayed:

>   ange-ftp-completion-hook-function(file-exists-p "/")
>   file-exists-p("/")
>   make-directory("/home/mathias/.emacs.d/auto-save-list/" t)

Yes, the problem is as follows:

  (file-exists-p "/home/mathias/.emacs.d/auto-save-list/") -> nil

so make-directory decides the dir needs to be created, but it first checks
to see if the parent needs to be created as well:

  (file-exists-p "/home/mathias/.emacs.d/") -> nil

so it tries to create the parent and checks the parent's own parent:

  (file-exists-p "/home/mathias/") -> nil
...
  (file-exists-p "/home/") -> nil
...
  (file-exists-p "/") -> nil
...
  (file-exists-p "/") -> nil
...
  (file-exists-p "/") -> nil
...
because the parent of "/" is "/" and because after encoding in utf-16,
even "/" doesn't exist.
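
The walk described above can be modelled in a few lines of Python. This is a sketch of the control flow only, not of the actual make-directory code: the function name, the `exists` callback, and the `max_steps` guard are all inventions for the demo.

```python
import posixpath

def parents_visited(path, exists, max_steps=10):
    """Walk upward the way a recursive mkdir would, until `exists` says yes.

    `max_steps` is a guard added for the demo; without it the loop below
    never ends when `exists` is false for every path.
    """
    visited = []
    current = path
    while not exists(current) and len(visited) < max_steps:
        visited.append(current)
        # Drop a trailing slash and take the parent; the parent of "/" is "/".
        current = posixpath.dirname(current.rstrip("/")) or "/"
    return visited

# Simulate the broken encoding: no path is ever found, not even "/".
trail = parents_visited("/home/mathias/.emacs.d/auto-save-list/", lambda p: False)
print(trail)
```

Once the walk reaches "/", every further step yields "/" again, so only the guard stops it. With a sane `exists` that recognises "/", the same walk ends after four steps.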

Such an encoding is clearly completely wrong for such a system, so I'm not
sure how important it is to protect oneself against such situations.
After all, there are several other ways to screw oneself and this one is at
least reasonably easy to revert.  Try the following ed-emulator:

	M-: (use-global-map (make-keymap)) RET


-- Stefan


* Re: Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-10 17:15                   ` Stefan Monnier
@ 2004-08-11 10:59                     ` Mattis
  2004-08-11 14:57                       ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Mattis @ 2004-08-11 10:59 UTC (permalink / raw)
  Cc: emacs-devel, Kenichi Handa, Jason Rumney

> Yes, the problem is as follows:
> 
>   (file-exists-p "/home/mathias/.emacs.d/auto-save-list/") -> nil
> 
> so make-directory decides the dir needs to be created, but it first checks
> to see if the parent needs to be created as well:
> 
>   (file-exists-p "/home/mathias/.emacs.d/") -> nil
> 
> so it tries to create the parent and checks the parent's own parent:
> 
>   (file-exists-p "/home/mathias/") -> nil
> ....
>   (file-exists-p "/home/") -> nil
> ....
>   (file-exists-p "/") -> nil
> ....
>   (file-exists-p "/") -> nil
> ...
> because the parent of "/" is "/" and because after encoding in utf-16,
> even "/" doesn't exist.
> 
> Such an encoding is clearly completely wrong for such a system, so I'm not
> sure how important it is to protect oneself against such situations.

I see. But what if the test that file-exists-p does was to encode the
"test-string" first in the same encoding? Or maybe this would be crazy;
this isn't exactly my area of expertise. :)

> After all, there are several other ways to screw oneself and this one is at
> least reasonably easy to revert.

Agree.

Anyway, the conclusion of all this seems to be that Emacs is not 100% ready
for Unicode yet, so I have to avoid torturing it with these in the
meantime... :)

/Mathias


* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-11 10:59                     ` Mattis
@ 2004-08-11 14:57                       ` Stefan Monnier
  2004-08-12  8:22                         ` Mattis
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2004-08-11 14:57 UTC (permalink / raw)
  Cc: emacs-devel, Kenichi Handa, Jason Rumney

> I see.  But what if the test that file-exists-p does was to encode the
> "test-string" first in the same encoding? Or maybe this would be crazy,
> this isn't exactly my area of expertise. :)

I do not understand what you are trying to say here.  The problem is that
(file-exists-p "/") needs to turn the stream of *characters* "/" into
a stream of *bytes* (which is what the encoding is for) and if you specify
the wrong encoding you might get the wrong bytes, so the OS will of course
not find the corresponding file.
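
To make the characters-versus-bytes point concrete, here is what "/" turns into under a few encodings, using Python's codecs as a stand-in for the Emacs coding systems. Whether the coding system in question writes a signature (BOM) is an assumption not settled here, so both variants are shown.

```python
import codecs

name = "/"

as_utf8 = name.encode("utf-8")         # any ASCII-compatible encoding: one byte
as_utf16le = name.encode("utf-16-le")  # Python's utf-16-le writes no signature
as_utf16le_bom = codecs.BOM_UTF16_LE + as_utf16le  # variant with a signature

for label, raw in [("utf-8", as_utf8),
                   ("utf-16-le", as_utf16le),
                   ("utf-16-le + BOM", as_utf16le_bom)]:
    print(f"{label:16} {raw.hex(' ')}")
```

Neither UTF-16 variant is the single byte 0x2f that a system with ASCII-compatible file names uses for the root directory.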

> Anyway, the conclusion of all this seems to be that Emacs is not 100 %
> ready for unicode yet so I have to avoid to try torture it with these in
> the meantime... :)

While the unicode support is not finished (and probably never will be given
the amount of detail you can get into if you care to), it has nothing to do
with the problem at hand.  The only valid conclusion here is that your file
names are not encoded in utf-16 and thus you get what you deserve if you
wrongly tell Emacs to use utf-16 encoding for filenames.


        Stefan


* Re: Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-11 14:57                       ` Stefan Monnier
@ 2004-08-12  8:22                         ` Mattis
  2004-08-12 11:00                           ` Jason Rumney
  2004-08-12 11:36                           ` Kenichi Handa
  0 siblings, 2 replies; 10+ messages in thread
From: Mattis @ 2004-08-12  8:22 UTC (permalink / raw)
  Cc: emacs-devel, Kenichi Handa, Jason Rumney

> The problem is that (file-exists-p "/") needs to turn the stream 
> of *characters* "/" into a stream of *bytes* (which is what 
> the encoding is for) and if you specify the wrong encoding 
> you might get the wrong bytes, so the OS will of course
> not find the corresponding file.

Ah, yes, I understand now.

> > Anyway, the conclusion of all this seems to be that Emacs is not 100 %
> > ready for unicode yet so I have to avoid to try torture it with these in
> > the meantime... :)
> 
> While the unicode support is not finished (and probably never will be given
> the amount of detail you can get into if you care to), it has nothing to do
> with the problem at hand.  The only valid conclusion here is that your file
> names are not encoded in utf-16 and thus you get what you deserve if you
> wrongly tell Emacs to use utf-16 encoding for filenames.

OK. So, the best would be to have the encoding set to "undecided" and then let
Emacs figure out which encoding is used, right? And hopefully it will be able to
do this correctly in later versions. (I'm talking about the problem of
Unicode file names on Windows now, not the "freeze".)

Thanks for all the help.

/Mathias


* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-12  8:22                         ` Mattis
@ 2004-08-12 11:00                           ` Jason Rumney
  2004-08-12 11:36                           ` Kenichi Handa
  1 sibling, 0 replies; 10+ messages in thread
From: Jason Rumney @ 2004-08-12 11:00 UTC (permalink / raw)
  Cc: Kenichi Handa, Stefan Monnier, emacs-devel

Mattis wrote:

>OK. So, the best would be to have the encoding set to "undecided"
>
The best is to leave it at what Emacs determined at startup, i.e. the
value appropriate for your system locale setting.


* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-12  8:22                         ` Mattis
  2004-08-12 11:00                           ` Jason Rumney
@ 2004-08-12 11:36                           ` Kenichi Handa
  2004-08-12 12:31                             ` Jason Rumney
  1 sibling, 1 reply; 10+ messages in thread
From: Kenichi Handa @ 2004-08-12 11:36 UTC (permalink / raw)
  Cc: jasonr, monnier, emacs-devel

In article <c99f54dd0408120122401e9526@mail.gmail.com>, Mattis <brakjoller@gmail.com> writes:

> OK. So, the best would be to have the encoding set to
> "undecided" and then let Emacs figure out which encoding
> is used, right?

Unfortunately, no.  That way, Emacs may be able to decode the
file name correctly, but it doesn't remember the encoding
for the time it has to encode the file name.
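
A small sketch of that round-trip problem, using Python codecs rather than Emacs coding systems. The on-disk name and the choice of Latin-1 as the detected encoding are hypothetical.

```python
raw_on_disk = b"caf\xe9.txt"  # hypothetical name stored as Latin-1

# Detection step (stand-in for "undecided"): the guess succeeds here.
decoded = raw_on_disk.decode("latin-1")

# Later the name must be encoded again, but the guess was not recorded.
reencoded_utf8 = decoded.encode("utf-8")
reencoded_latin1 = decoded.encode("latin-1")

print(reencoded_utf8 == raw_on_disk)    # False: these bytes name a different file
print(reencoded_latin1 == raw_on_disk)  # True: only the original encoding round-trips
```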

I believe that Windows itself has a function to handle such a
file name consistently.  So, I think the best is to make a
special coding system, say windows-file-name, that works
only for Windows.  It decodes file names into a utf-8 sequence
(internal code of Emacs-Unicode), and encodes file names into
the locale encoding or utf-16le-with-signature depending on the
contents.
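
One way to read this proposal, sketched in Python under stated assumptions: the function names are hypothetical, cp1252 stands in for the locale encoding, and the signature is used to tell the two cases apart when decoding. It is not an existing Emacs coding system.

```python
import codecs

def encode_file_name(name: str, locale_coding: str = "cp1252") -> bytes:
    """Hypothetical encoder: locale bytes when possible, signed UTF-16LE otherwise."""
    try:
        return name.encode(locale_coding)
    except UnicodeEncodeError:
        return codecs.BOM_UTF16_LE + name.encode("utf-16-le")

def decode_file_name(raw: bytes, locale_coding: str = "cp1252") -> str:
    """Inverse of the above: the signature marks the UTF-16LE case."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return raw.decode(locale_coding)

for name in ("notes.txt", "日本語.txt"):
    raw = encode_file_name(name)
    print(raw, decode_file_name(raw) == name)
```

A known weakness of this sketch is that a locale-encoded name which happens to start with the bytes 0xFF 0xFE would be misread as UTF-16LE.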

---
Ken'ichi HANDA
handa@m17n.org


* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-12 11:36                           ` Kenichi Handa
@ 2004-08-12 12:31                             ` Jason Rumney
  2004-08-13  1:46                               ` Kenichi Handa
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Rumney @ 2004-08-12 12:31 UTC (permalink / raw)
  Cc: emacs-devel, monnier, brakjoller

Kenichi Handa wrote:

>I believe that Windows itself has a function to handle such a
>file name consistently.  So, I think the best is to make a
>special coding system, say windows-file-name, that works
>only for Windows.
>
Part of this discussion has occurred off list, so you may not have seen
my previous mail on this.

Currently Emacs uses the standard C library functions for file I/O on 
all platforms. In my judgement, changing this would be too much work to 
take on during feature-freeze, especially since it is complicated by the 
fact that the full Unicode API is not supported on all versions of Windows.

In future when this is implemented, I do not see the need for 
"windows-file-name" coding-system. In the versions of Windows where the 
Unicode API is fully supported, there is no need for 
file-name-coding-system, since it is known to be 
utf-16-le-with-signature. In the cases where those APIs are not 
supported, then file-name-coding-system should be used with the standard 
C library as now.
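
The dispatch described in the last paragraph could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the function name, the boolean flag, the use of plain utf-16-le on the Unicode path (the signature question is left aside), and errors="replace" on the fallback path to mimic the ? substitution seen today.

```python
def encode_for_os(name: str, unicode_api_available: bool,
                  file_name_coding_system: str) -> bytes:
    """Hypothetical helper choosing how a file name is handed to the OS."""
    if unicode_api_available:
        # Unicode API: the encoding is fixed, so the user setting is ignored.
        return name.encode("utf-16-le")
    # Fallback: behave as now and use the configured coding system.
    return name.encode(file_name_coding_system, errors="replace")

print(encode_for_os("日本語.txt", True, "cp1252"))
print(encode_for_os("日本語.txt", False, "cp1252"))
```

The point of the design is that file-name-coding-system only matters on the fallback path, which is why a separate "windows-file-name" coding system would not be needed.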


* Re: [brakjoller@gmail.com: setting utf-16 as file-name-coding-system locks up emacs]
  2004-08-12 12:31                             ` Jason Rumney
@ 2004-08-13  1:46                               ` Kenichi Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Kenichi Handa @ 2004-08-13  1:46 UTC (permalink / raw)
  Cc: emacs-devel, monnier, brakjoller

In article <411B631E.4050006@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> Part of this discussion has occurred off list, so you may not have seen
> my previous mail on this.

> Currently Emacs uses the standard C library functions for file I/O on 
> all platforms. In my judgement, changing this would be too much work to 
> take on during feature-freeze, especially since it is complicated by the 
> fact that the full Unicode API is not supported on all versions of Windows.

Of course, I agree that we shouldn't change the current
behaviour now.  I was talking about what to do in
emacs-unicode.

> In future when this is implemented, I do not see the need for 
> "windows-file-name" coding-system. In the versions of Windows where the 
> Unicode API is fully supported, there is no need for 
> file-name-coding-system, since it is known to be 
> utf-16-le-with-signature. In the cases where those APIs are not 
> supported, then file-name-coding-system should be used with the standard 
> C library as now.

Ok, I see.

---
Ken'ichi HANDA
handa@m17n.org

