unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
@ 2013-01-26  5:52 Shigeru Fukaya
  2013-01-26 10:26 ` Stefan Monnier
  2013-01-26 12:56 ` Eli Zaretskii
  0 siblings, 2 replies; 15+ messages in thread
From: Shigeru Fukaya @ 2013-01-26  5:52 UTC (permalink / raw)
  To: 13553

Hi,

`file-attributes' returns nil as stat info for a file, its name ending
with a character whose second byte is 0x5c.

It is because IS_DIRECTORY_SEP is wrongly used.
IS_DIRECTORY_SEP only works when its argument is surely on a start
byte of dbcs characters.
I find incorrect usage, at least in some functions of `w32.c'.
(readdir, stat_worker, read_unc_volume)


I show, as a silly sample, of easy fix.

#define IS_AT_DIRECTORY_SEP(p,i) \
  (p[i] == '/' || (p[i] == '\\' && _mbsbtype(p,i) == _MBC_SINGLE))

and,

  IS_DIRECTORY_SEP(p[i]) 

to

  IS_AT_DIRECTORY_SEP(p, i) 


Regards,
Shigeru





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26  5:52 bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP Shigeru Fukaya
@ 2013-01-26 10:26 ` Stefan Monnier
  2013-01-26 11:16   ` Eli Zaretskii
  2013-01-26 12:56 ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2013-01-26 10:26 UTC (permalink / raw)
  To: Shigeru Fukaya; +Cc: 13553

> It is because IS_DIRECTORY_SEP is wrongly used.
> IS_DIRECTORY_SEP only works when its argument is surely on a start
> byte of dbcs characters.

If the file name is properly decoded (i.e. in its multibyte form), there
can't be a 0x5c byte other than at the beginning of a char.

So this bug can only happens if you pass a unibyte filename.  IOW the
bug is probably in the caller rather than in the file-attributes function.

Please show us the actual situation where this happens, since file names
are normally always multibyte.


        Stefan






^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 10:26 ` Stefan Monnier
@ 2013-01-26 11:16   ` Eli Zaretskii
  2013-01-26 12:40     ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-26 11:16 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 13553, shigeru.fukaya

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 26 Jan 2013 05:26:04 -0500
> Cc: 13553@debbugs.gnu.org
> 
> > It is because IS_DIRECTORY_SEP is wrongly used.
> > IS_DIRECTORY_SEP only works when its argument is surely on a start
> > byte of dbcs characters.
> 
> If the file name is properly decoded (i.e. in its multibyte form), there
> can't be a 0x5c byte other than at the beginning of a char.
> 
> So this bug can only happens if you pass a unibyte filename.  IOW the
> bug is probably in the caller rather than in the file-attributes function.

No, it's not in the caller.  It's in w32.c, whose functions almost
always manipulate encoded file names (because they shadow system
APIs).

> Please show us the actual situation where this happens, since file names
> are normally always multibyte.

You call file-attributes, which encodes the file name and passes it to
'lstat'.  The implementation of 'lstat' in w32.c then looks at the
last byte of the encoded file name to see if there's a slash or
backslash there.  Boom!

I will fix that.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 11:16   ` Eli Zaretskii
@ 2013-01-26 12:40     ` Stefan Monnier
  2013-01-26 13:23       ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2013-01-26 12:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13553, shigeru.fukaya

> You call file-attributes, which encodes the file name and passes it to
> 'lstat'.  The implementation of 'lstat' in w32.c then looks at the
> last byte of the encoded file name to see if there's a slash or
> backslash there.  Boom!

I see.   So, does that meant that w32.c can't faithfully implement lstat
without doing the moral equivalent of re-decoding its argument?


        Stefan






^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26  5:52 bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP Shigeru Fukaya
  2013-01-26 10:26 ` Stefan Monnier
@ 2013-01-26 12:56 ` Eli Zaretskii
  2013-01-26 14:06   ` Eli Zaretskii
  2013-01-26 16:33   ` Shigeru Fukaya
  1 sibling, 2 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-26 12:56 UTC (permalink / raw)
  To: Shigeru Fukaya; +Cc: 13553

> From: Shigeru Fukaya <shigeru.fukaya@gmail.com>
> Date: Sat, 26 Jan 2013 14:52:59 +0900
> 
> `file-attributes' returns nil as stat info for a file, its name ending
> with a character whose second byte is 0x5c.
> 
> It is because IS_DIRECTORY_SEP is wrongly used.
> IS_DIRECTORY_SEP only works when its argument is surely on a start
> byte of dbcs characters.
> I find incorrect usage, at least in some functions of `w32.c'.
> (readdir, stat_worker, read_unc_volume)

Thanks.  I fixed w32.c functions that use IS_DIRECTORY_SEP in revision
111200 on the emacs-24 branch.  As I cannot easily test DBCS file-name
encodings on my system, please test the fixed version and see if the
problem indeed goes away.  The patch relative to emacs-24 branch is
below, for your convenience.

After looking at the code which I fixed, I'm afraid I don't understand
why file-attributes returned nil in your case.  The code that uses
IS_DIRECTORY_SEP there is only a fallback, for when the
GetFileInformationByHandle API fails for some reason.  You don't say
what version of Windows you are using, but I won't expect that
fallback code be executed for any reasonably recent Windows version.

So could you please trace through the code that file-attributes calls,
and see which use of IS_DIRECTORY_SEP caused the problem?  I'm afraid
that it's not in w32.c at all, but elsewhere.


=== modified file 'src/w32.c'
--- src/w32.c	2013-01-25 14:47:37 +0000
+++ src/w32.c	2013-01-26 12:49:34 +0000
@@ -1503,12 +1503,17 @@ parse_root (char * name, char ** pPath)
   else if (IS_DIRECTORY_SEP (name[0]) && IS_DIRECTORY_SEP (name[1]))
     {
       int slashes = 2;
+      int dbcs_p = max_filename_mbslen () > 1;
+
       name += 2;
       do
         {
 	  if (IS_DIRECTORY_SEP (*name) && --slashes == 0)
 	    break;
-	  name++;
+	  if (dbcs_p)
+	    name = CharNextExA (file_name_codepage, name, 0);
+	  else
+	    name++;
 	}
       while ( *name );
       if (IS_DIRECTORY_SEP (name[0]))
@@ -2369,12 +2374,23 @@ get_volume_info (const char * name, cons
     {
       char *str = temp;
       int slashes = 4;
+      int dbcs_p = max_filename_mbslen () > 1;
+
       rootname = temp;
       do
         {
 	  if (IS_DIRECTORY_SEP (*name) && --slashes == 0)
 	    break;
-	  *str++ = *name++;
+	  if (!dbcs_p)
+	    *str++ = *name++;
+	  else
+	    {
+	      const char *p = name;
+
+	      name = CharNextExA (file_name_codepage, name, 0);
+	      memcpy (str, p, name - p);
+	      str += name - p;
+	    }
 	}
       while ( *name );
 
@@ -2610,11 +2626,23 @@ readdir (DIR *dirp)
     {
       char filename[MAXNAMLEN + 3];
       int ln;
+      int dbcs_p = max_filename_mbslen () > 1;
 
       strcpy (filename, dir_pathname);
       ln = strlen (filename) - 1;
-      if (!IS_DIRECTORY_SEP (filename[ln]))
-	strcat (filename, "\\");
+      if (!dbcs_p)
+	{
+	  if (!IS_DIRECTORY_SEP (filename[ln]))
+	    strcat (filename, "\\");
+	}
+      else
+	{
+	  char *end = filename + ln + 1;
+	  char *last_char = CharPrevExA (file_name_codepage, filename, end, 0);
+
+	  if (!IS_DIRECTORY_SEP (*last_char))
+	    strcat (filename, "\\");
+	}
       strcat (filename, "*");
 
       /* Note: No need to resolve symlinks in FILENAME, because
@@ -2719,6 +2747,7 @@ read_unc_volume (HANDLE henum, char *rea
   DWORD bufsize = 512;
   char *buffer;
   char *ptr;
+  int dbcs_p = max_filename_mbslen () > 1;
 
   count = 1;
   buffer = alloca (bufsize);
@@ -2729,7 +2758,13 @@ read_unc_volume (HANDLE henum, char *rea
   /* WNetEnumResource returns \\resource\share...skip forward to "share". */
   ptr = ((LPNETRESOURCE) buffer)->lpRemoteName;
   ptr += 2;
-  while (*ptr && !IS_DIRECTORY_SEP (*ptr)) ptr++;
+  if (!dbcs_p)
+    while (*ptr && !IS_DIRECTORY_SEP (*ptr)) ptr++;
+  else
+    {
+      while (*ptr && !IS_DIRECTORY_SEP (*ptr))
+	ptr = CharNextExA (file_name_codepage, ptr, 0);
+    }
   ptr++;
 
   strncpy (readbuf, ptr, size);
@@ -2766,9 +2801,11 @@ logon_network_drive (const char *path)
 {
   NETRESOURCE resource;
   char share[MAX_PATH];
-  int i, n_slashes;
+  int n_slashes;
   char drive[4];
   UINT drvtype;
+  char *p;
+  int dbcs_p;
 
   if (IS_DIRECTORY_SEP (path[0]) && IS_DIRECTORY_SEP (path[1]))
     drvtype = DRIVE_REMOTE;
@@ -2790,13 +2827,18 @@ logon_network_drive (const char *path)
   n_slashes = 2;
   strncpy (share, path, MAX_PATH);
   /* Truncate to just server and share name.  */
-  for (i = 2; i < MAX_PATH; i++)
+  dbcs_p = max_filename_mbslen () > 1;
+  for (p = share + 2; *p && p < share + MAX_PATH; )
     {
-      if (IS_DIRECTORY_SEP (share[i]) && ++n_slashes > 3)
+      if (IS_DIRECTORY_SEP (*p) && ++n_slashes > 3)
         {
-          share[i] = '\0';
+          *p = '\0';
           break;
         }
+      if (dbcs_p)
+	p = CharNextExA (file_name_codepage, p, 0);
+      else
+	p++;
     }
 
   resource.dwType = RESOURCETYPE_DISK;
@@ -3557,6 +3599,7 @@ stat_worker (const char * path, struct s
   DWORD access_rights = 0;
   DWORD fattrs = 0, serialnum = 0, fs_high = 0, fs_low = 0, nlinks = 1;
   FILETIME ctime, atime, wtime;
+  int dbcs_p;
 
   if (path == NULL || buf == NULL)
     {
@@ -3751,6 +3794,7 @@ stat_worker (const char * path, struct s
 	 did not ask for extra precision, resolving symlinks will fly
 	 in the face of that request, since the user then wants the
 	 lightweight version of the code.  */
+      dbcs_p = max_filename_mbslen () > 1;
       rootdir = (path >= save_name + len - 1
 		 && (IS_DIRECTORY_SEP (*path) || *path == 0));
 
@@ -3778,8 +3822,19 @@ stat_worker (const char * path, struct s
 	}
       else if (rootdir)
 	{
-	  if (!IS_DIRECTORY_SEP (name[len-1]))
-	    strcat (name, "\\");
+	  if (!dbcs_p)
+	    {
+	      if (!IS_DIRECTORY_SEP (name[len-1]))
+		strcat (name, "\\");
+	    }
+	  else
+	    {
+	      char *end = name + len;
+	      char *n = CharPrevExA (file_name_codepage, name, end, 0);
+
+	      if (!IS_DIRECTORY_SEP (*n))
+		strcat (name, "\\");
+	    }
 	  if (GetDriveType (name) < 2)
 	    {
 	      errno = ENOENT;
@@ -3791,15 +3846,37 @@ stat_worker (const char * path, struct s
 	}
       else
 	{
-	  if (IS_DIRECTORY_SEP (name[len-1]))
-	    name[len - 1] = 0;
+	  if (!dbcs_p)
+	    {
+	      if (IS_DIRECTORY_SEP (name[len-1]))
+		name[len - 1] = 0;
+	    }
+	  else
+	    {
+	      char *end = name + len;
+	      char *n = CharPrevExA (file_name_codepage, name, end, 0);
+
+	      if (IS_DIRECTORY_SEP (*n))
+		*n = 0;
+	    }
 
 	  /* (This is hacky, but helps when doing file completions on
 	     network drives.)  Optimize by using information available from
 	     active readdir if possible.  */
 	  len = strlen (dir_pathname);
-	  if (IS_DIRECTORY_SEP (dir_pathname[len-1]))
-	    len--;
+	  if (!dbcs_p)
+	    {
+	      if (IS_DIRECTORY_SEP (dir_pathname[len-1]))
+		len--;
+	    }
+	  else
+	    {
+	      char *end = dir_pathname + len;
+	      char *n = CharPrevExA (file_name_codepage, dir_pathname, end, 0);
+
+	      if (IS_DIRECTORY_SEP (*n))
+		len--;
+	    }
 	  if (dir_find_handle != INVALID_HANDLE_VALUE
 	      && !(is_a_symlink && follow_symlinks)
 	      && strnicmp (save_name, dir_pathname, len) == 0
@@ -4060,6 +4137,7 @@ symlink (char const *filename, char cons
   char linkfn[MAX_PATH], *tgtfn;
   DWORD flags = 0;
   int dir_access, filename_ends_in_slash;
+  int dbcs_p;
 
   /* Diagnostics follows Posix as much as possible.  */
   if (filename == NULL || linkname == NULL)
@@ -4085,6 +4163,8 @@ symlink (char const *filename, char cons
       return -1;
     }
 
+  dbcs_p = max_filename_mbslen () > 1;
+
   /* Note: since empty FILENAME was already rejected, we can safely
      refer to FILENAME[1].  */
   if (!(IS_DIRECTORY_SEP (filename[0]) || IS_DEVICE_SEP (filename[1])))
@@ -4099,8 +4179,21 @@ symlink (char const *filename, char cons
       char tem[MAX_PATH];
       char *p = linkfn + strlen (linkfn);
 
-      while (p > linkfn && !IS_ANY_SEP (p[-1]))
-	p--;
+      if (!dbcs_p)
+	{
+	  while (p > linkfn && !IS_ANY_SEP (p[-1]))
+	    p--;
+	}
+      else
+	{
+	  char *p1 = CharPrevExA (file_name_codepage, linkfn, p, 0);
+
+	  while (p > linkfn && !IS_ANY_SEP (*p1))
+	    {
+	      p = p1;
+	      p1 = CharPrevExA (file_name_codepage, linkfn, p1, 0);
+	    }
+	}
       if (p > linkfn)
 	strncpy (tem, linkfn, p - linkfn);
       tem[p - linkfn] = '\0';
@@ -4115,7 +4208,15 @@ symlink (char const *filename, char cons
      exist, but ends in a slash, we create a symlink to directory.  If
      FILENAME exists and is a directory, we always create a symlink to
      directory.  */
-  filename_ends_in_slash = IS_DIRECTORY_SEP (filename[strlen (filename) - 1]);
+  if (!dbcs_p)
+    filename_ends_in_slash = IS_DIRECTORY_SEP (filename[strlen (filename) - 1]);
+  else
+    {
+      const char *end = filename + strlen (filename);
+      const char *n = CharPrevExA (file_name_codepage, filename, end, 0);
+
+      filename_ends_in_slash = IS_DIRECTORY_SEP (*n);
+    }
   if (dir_access == 0 || filename_ends_in_slash)
     flags = SYMBOLIC_LINK_FLAG_DIRECTORY;
 
@@ -4440,6 +4541,7 @@ chase_symlinks (const char *file)
   char link[MAX_PATH];
   ssize_t res, link_len;
   int loop_count = 0;
+  int dbcs_p;
 
   if (is_windows_9x () == TRUE || !is_symlink (file))
     return (char *)file;
@@ -4447,13 +4549,27 @@ chase_symlinks (const char *file)
   if ((link_len = GetFullPathName (file, MAX_PATH, link, NULL)) == 0)
     return (char *)file;
 
+  dbcs_p = max_filename_mbslen () > 1;
   target[0] = '\0';
   do {
 
     /* Remove trailing slashes, as we want to resolve the last
        non-trivial part of the link name.  */
-    while (link_len > 3 && IS_DIRECTORY_SEP (link[link_len-1]))
-      link[link_len--] = '\0';
+    if (!dbcs_p)
+      {
+	while (link_len > 3 && IS_DIRECTORY_SEP (link[link_len-1]))
+	  link[link_len--] = '\0';
+      }
+    else if (link_len > 3)
+      {
+	char *n = CharPrevExA (file_name_codepage, link, link + link_len, 0);
+
+	while (n >= link + 2 && IS_DIRECTORY_SEP (*n))
+	  {
+	    n[1] = '\0';
+	    n = CharPrevExA (file_name_codepage, link, n, 0);
+	  }
+      }
 
     res = readlink (link, target, MAX_PATH);
     if (res > 0)
@@ -4466,8 +4582,21 @@ chase_symlinks (const char *file)
 	       the symlink, then copy the result back to target.  */
 	    char *p = link + link_len;
 
-	    while (p > link && !IS_ANY_SEP (p[-1]))
-	      p--;
+	    if (!dbcs_p)
+	      {
+		while (p > link && !IS_ANY_SEP (p[-1]))
+		  p--;
+	      }
+	    else
+	      {
+		char *p1 = CharPrevExA (file_name_codepage, link, p, 0);
+
+		while (p > link && !IS_ANY_SEP (*p1))
+		  {
+		    p = p1;
+		    p1 = CharPrevExA (file_name_codepage, link, p1, 0);
+		  }
+	      }
 	    strcpy (p, target);
 	    strcpy (target, link);
 	  }






^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 12:40     ` Stefan Monnier
@ 2013-01-26 13:23       ` Eli Zaretskii
  2013-01-26 22:24         ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-26 13:23 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 13553, shigeru.fukaya

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: shigeru.fukaya@gmail.com,  13553@debbugs.gnu.org
> Date: Sat, 26 Jan 2013 07:40:08 -0500
> 
> > You call file-attributes, which encodes the file name and passes it to
> > 'lstat'.  The implementation of 'lstat' in w32.c then looks at the
> > last byte of the encoded file name to see if there's a slash or
> > backslash there.  Boom!
> 
> I see.   So, does that meant that w32.c can't faithfully implement lstat
> without doing the moral equivalent of re-decoding its argument?

It can, if we limit such support to Windows codepage encodings.  See
the changes I made on the emacs-24 branch revisions 111194 and 111200.

This will still lose if the user binds file-name-coding-system to
something like shift_jis (instead of using its Windows extension
cp932), but there's nothing I can do about that.  I was lucky to find
in Windows APIs 2 functions that can move forward and backward by DBCS
characters, and that accept a codepage as their argument (i.e. do not
limit support to the current system codepage).  Personally, I think
supporting all possible Windows codepages is good enough.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 12:56 ` Eli Zaretskii
@ 2013-01-26 14:06   ` Eli Zaretskii
  2013-01-26 16:33   ` Shigeru Fukaya
  1 sibling, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-26 14:06 UTC (permalink / raw)
  To: shigeru.fukaya; +Cc: 13553

> Date: Sat, 26 Jan 2013 14:56:46 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 13553@debbugs.gnu.org
> 
> After looking at the code which I fixed, I'm afraid I don't understand
> why file-attributes returned nil in your case.  The code that uses
> IS_DIRECTORY_SEP there is only a fallback, for when the
> GetFileInformationByHandle API fails for some reason.  You don't say
> what version of Windows you are using, but I won't expect that
> fallback code be executed for any reasonably recent Windows version.

Actually, that fallback code gets also executed if you customize
w32-get-true-file-attributes to nil, or if the file is on a remote
(i.e. networked) filesystem.  Could one of these be true in your case?





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 12:56 ` Eli Zaretskii
  2013-01-26 14:06   ` Eli Zaretskii
@ 2013-01-26 16:33   ` Shigeru Fukaya
  2013-01-26 17:37     ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Shigeru Fukaya @ 2013-01-26 16:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13553

I greatly appreciate your very quick work, thank you.

I looked through your fix of all functions
(parse_root, get_volume_info, readdir, read_unc_volume, 
logon_network_drive, star_worker, symlink, chase_symlinks)
one by one, and found good at first look.

I'll try build and check it next, anyway.

Yes, my `w32-get-true-file-attributes' is nil. (Isn't it default?)

As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
suppose.  But we need some notice for users.

Regards,
Shigeru


>> From: Shigeru Fukaya <shigeru.fukaya@gmail.com>
>> Date: Sat, 26 Jan 2013 14:52:59 +0900
>> 
>> `file-attributes' returns nil as stat info for a file, its name ending
>> with a character whose second byte is 0x5c.
>> 
>> It is because IS_DIRECTORY_SEP is wrongly used.
>> IS_DIRECTORY_SEP only works when its argument is surely on a start
>> byte of dbcs characters.
>> I find incorrect usage, at least in some functions of `w32.c'.
>> (readdir, stat_worker, read_unc_volume)
>
>Thanks.  I fixed w32.c functions that use IS_DIRECTORY_SEP in revision
>111200 on the emacs-24 branch.  As I cannot easily test DBCS file-name
>encodings on my system, please test the fixed version and see if the
>problem indeed goes away.  The patch relative to emacs-24 branch is
>below, for your convenience.
>
>After looking at the code which I fixed, I'm afraid I don't understand
>why file-attributes returned nil in your case.  The code that uses
>IS_DIRECTORY_SEP there is only a fallback, for when the
>GetFileInformationByHandle API fails for some reason.  You don't say
>what version of Windows you are using, but I won't expect that
>fallback code be executed for any reasonably recent Windows version.
>
>So could you please trace through the code that file-attributes calls,
>and see which use of IS_DIRECTORY_SEP caused the problem?  I'm afraid
>that it's not in w32.c at all, but elsewhere.
>
>
>=== modified file 'src/w32.c'
...





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 16:33   ` Shigeru Fukaya
@ 2013-01-26 17:37     ` Eli Zaretskii
  2013-01-27  6:56       ` Shigeru Fukaya
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-26 17:37 UTC (permalink / raw)
  To: Shigeru Fukaya; +Cc: 13553

> From: Shigeru Fukaya <shigeru.fukaya@gmail.com>
> Cc: 13553@debbugs.gnu.org
> Date: Sun, 27 Jan 2013 01:33:32 +0900
> 
> I'll try build and check it next, anyway.

Thank you.

> Yes, my `w32-get-true-file-attributes' is nil. (Isn't it default?)

The default is 'local', which means your local files use a different
code path, one which doesn't need to use IS_DIRECTORY_SEP.

If your value was nil, then I understand why the code I fixed could
cause trouble.

> As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
> suppose.  But we need some notice for users.

Not sure what you are saying here.  Do you think many Japanese Windows
users will set file-name-coding-system to shift_jis, or that most of
them will set it to cp932?

Normally, users don't customize file-name-coding-system at all, in
which case Emacs will use default-file-name-coding-system, that is
automatically set to cp932, according to the system-wide codepage.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 13:23       ` Eli Zaretskii
@ 2013-01-26 22:24         ` Stefan Monnier
  2013-01-27  7:06           ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2013-01-26 22:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13553, shigeru.fukaya

>> > You call file-attributes, which encodes the file name and passes it to
>> > 'lstat'.  The implementation of 'lstat' in w32.c then looks at the
>> > last byte of the encoded file name to see if there's a slash or
>> > backslash there.  Boom!
>> I see.   So, does that meant that w32.c can't faithfully implement lstat
>> without doing the moral equivalent of re-decoding its argument?
> It can, if we limit such support to Windows codepage encodings.  See
> the changes I made on the emacs-24 branch revisions 111194 and 111200.

CharNextExA and CharPrevExA, in my mind, do perform the moral equivalent
of decoding the argument (just in an incremental way).

> Personally, I think supporting all possible Windows codepages is
> good enough.

Fine by me,


        Stefan






^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 17:37     ` Eli Zaretskii
@ 2013-01-27  6:56       ` Shigeru Fukaya
  2013-01-27  7:15         ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Shigeru Fukaya @ 2013-01-27  6:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13553

Built on migw32, and no trouble, thank you
(I didn't do coverage tests, sorry).

But we must remember, my raise of issue is the incorrect usage of
IS_DIRECTORY_SEP.  There are still more in fileio.c and more.
(I really expect remove the check of '\' from IS_DIRECTORY_SEP, and
handle '\' only where it is necessary, and things wolud be better)

>> As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
>> suppose.  But we need some notice for users.
>
>Not sure what you are saying here.  Do you think many Japanese Windows
>users will set file-name-coding-system to shift_jis, or that most of
>them will set it to cp932?
>
>Normally, users don't customize file-name-coding-system at all, in
>which case Emacs will use default-file-name-coding-system, that is
>automatically set to cp932, according to the system-wide codepage.

Yes, you are right. I mean, maybe, the case of using some remote file
system.  You can check by yourself their (our?) usage if you like.
Some seems still using shift-jis, not cp932.

http://www.google.co.jp/search?q=file-name-coding-system+sjis


Regards,
Shigeru





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-26 22:24         ` Stefan Monnier
@ 2013-01-27  7:06           ` Eli Zaretskii
  2013-01-27  8:36             ` Shigeru Fukaya
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-27  7:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 13553, shigeru.fukaya

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: shigeru.fukaya@gmail.com,  13553@debbugs.gnu.org
> Date: Sat, 26 Jan 2013 17:24:37 -0500
> 
> CharNextExA and CharPrevExA, in my mind, do perform the moral equivalent
> of decoding the argument (just in an incremental way).

No, they are the equivalents of NEXT_CHAR_BOUNDARY and
PREV_CHAR_BOUNDARY, except that they can do this for many different
encodings, not just for UTF-8.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-27  6:56       ` Shigeru Fukaya
@ 2013-01-27  7:15         ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-27  7:15 UTC (permalink / raw)
  To: Shigeru Fukaya; +Cc: 13553-done

> From: Shigeru Fukaya <shigeru.fukaya@gmail.com>
> Cc: 13553@debbugs.gnu.org
> Date: Sun, 27 Jan 2013 15:56:54 +0900
> 
> Built on migw32, and no trouble, thank you
> (I didn't do coverage tests, sorry).

Thanks.  I'm therefore closing this bug.

> But we must remember, my raise of issue is the incorrect usage of
> IS_DIRECTORY_SEP.  There are still more in fileio.c and more.

Yes.  This is being discussed on emacs-devel, and once that discussion
ends, the fileio.c and dired.c functions will be fixed as well.

> >> As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
> >> suppose.  But we need some notice for users.
> >
> >Not sure what you are saying here.  Do you think many Japanese Windows
> >users will set file-name-coding-system to shift_jis, or that most of
> >them will set it to cp932?
> >
> >Normally, users don't customize file-name-coding-system at all, in
> >which case Emacs will use default-file-name-coding-system, that is
> >automatically set to cp932, according to the system-wide codepage.
> 
> Yes, you are right. I mean, maybe, the case of using some remote file
> system.

Accessing remote files doesn't go through functions in w32.c, it goes
through file handlers (in Tramp).

> You can check by yourself their (our?) usage if you like.
> Some seems still using shift-jis, not cp932.
> 
> http://www.google.co.jp/search?q=file-name-coding-system+sjis

If they want 100% solid support, they will have to change their
customizations, sorry.  (We could implement an equivalence table,
whereby, e.g., shift_jis would be mapped to cp932, but these encodings
are slightly different, so I think that would be a kludge.)

In any case, the new code in w32.c is no worse than the previous one,
when file-name-coding-system is set to anything that is not a
recognized Windows codepage.  It is actually slightly better: it uses
the system-wide ANSI codepage in that case.  In most cases, this would
be the right thing anyway.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-27  7:06           ` Eli Zaretskii
@ 2013-01-27  8:36             ` Shigeru Fukaya
  2013-01-27  9:40               ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Shigeru Fukaya @ 2013-01-27  8:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13553

One more, please.

CharNextExA/CharPrevExA work based on dynamically specified codepage.

And, _mbspbrk/_mbslwr are also used in w32.c.  They work based on MS
Window's locale setting.

Isn't it inconsistent?  _mbsinc/_mbsdec would simply be sufficient?
(Besides, I don't know well, _mbslwr seems to me lower latin
characters' case on some locale.  Is it true, or intended behavior?)

Thank you,
Shigeru


>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: shigeru.fukaya@gmail.com,  13553@debbugs.gnu.org
>> Date: Sat, 26 Jan 2013 17:24:37 -0500
>> 
>> CharNextExA and CharPrevExA, in my mind, do perform the moral equivalent
>> of decoding the argument (just in an incremental way).
>
>No, they are the equivalents of NEXT_CHAR_BOUNDARY and
>PREV_CHAR_BOUNDARY, except that they can do this for many different
>encodings, not just for UTF-8.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
  2013-01-27  8:36             ` Shigeru Fukaya
@ 2013-01-27  9:40               ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-27  9:40 UTC (permalink / raw)
  To: Shigeru Fukaya; +Cc: 13553

> From: Shigeru Fukaya <shigeru.fukaya@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  13553@debbugs.gnu.org
> Date: Sun, 27 Jan 2013 17:36:48 +0900
> 
> CharNextExA/CharPrevExA work based on dynamically specified codepage.
> 
> And, _mbspbrk/_mbslwr are also used in w32.c.  They work based on MS
> Window's locale setting.
> 
> Isn't it inconsistent?

It is, but I don't know of any practical way of eliminating that
inconsistency.  In the normal case, where the current
file-name-coding-system is the same as the ANSI codepage, there is no
inconsistency, so using CharNextExA/CharPrevExA is never worse than
using _mbsinc/_mbsdec.  It is sometimes better, because the code which
calls the _mbs* functions is executed only by some functions in w32.c.

> _mbsinc/_mbsdec would simply be sufficient?

That would preclude code like this:

  (let ((file-name-coding-system 'cp932))
    (expand-file-name "SOMETHING" "C:/")

I think it's reasonable to expect this to be supported, even if the
default file-name encoding is not cp932.

> (Besides, I don't know well, _mbslwr seems to me lower latin
> characters' case on some locale.  Is it true, or intended behavior?)

The intended behavior is to downcase letters for which lower-case is
defined.  I don't know if this is limited to Latin characters.





^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-01-27  9:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-26  5:52 bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP Shigeru Fukaya
2013-01-26 10:26 ` Stefan Monnier
2013-01-26 11:16   ` Eli Zaretskii
2013-01-26 12:40     ` Stefan Monnier
2013-01-26 13:23       ` Eli Zaretskii
2013-01-26 22:24         ` Stefan Monnier
2013-01-27  7:06           ` Eli Zaretskii
2013-01-27  8:36             ` Shigeru Fukaya
2013-01-27  9:40               ` Eli Zaretskii
2013-01-26 12:56 ` Eli Zaretskii
2013-01-26 14:06   ` Eli Zaretskii
2013-01-26 16:33   ` Shigeru Fukaya
2013-01-26 17:37     ` Eli Zaretskii
2013-01-27  6:56       ` Shigeru Fukaya
2013-01-27  7:15         ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).