unofficial mirror of emacs-devel@gnu.org 
* expand-file-name problem for eight-bit-control/graphic
@ 2003-03-13  7:47 Kenichi Handa
  2003-03-15  6:54 ` Richard Stallman
  0 siblings, 1 reply; 6+ messages in thread
From: Kenichi Handa @ 2003-03-13  7:47 UTC (permalink / raw)


I've just found that expand-file-name sometimes converts a
unibyte filename to multibyte, and a multibyte filename to
unibyte.

Ex.1  unibyte->multibyte
(expand-file-name "~/\201\300") => "/home/handa/À"

Ex.2 multibyte->unibyte
(expand-file-name "~/À\200") => "/home/handa/\201\300\236\240"

The reason is that it uses make_string and build_string
blindly.  It seems that the attached patch fixes this bug,
but, as expand-file-name is one of the most heavily
system-dependent parts and has lots of "#ifdef"s, I'd like to
ask other people to confirm that this patch doesn't cause any
problems.
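
To illustrate how a constructor that guesses multibyteness from the
bytes alone can flip the type in both directions, here is a minimal
standalone sketch.  It is a simplified model, not the code in Emacs:
the function names are hypothetical, and the two-byte forms are
assumed from the two examples above (\201\300 for À, \236\240 for
\200).  The guessing rule is also an assumption of the model: treat
the bytes as multibyte only when that changes the character count and
no stray 8-bit byte is left over.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, not the real Emacs code.  Assumed two-byte forms,
   taken from the examples in this message:
     0x81 0xA0..0xFF   one Latin-1 character (\201\300 is À)
     0x9E 0xA0..0xBF   one eight-bit-control character (\236\240 is \200)
   The model deliberately does not accept the 0x9E form when guessing.  */

/* Length of the Latin-1 form starting at P, or 0 if there is none.  */
static size_t
latin1_form_length (const unsigned char *p, size_t left)
{
  return (left >= 2 && p[0] == 0x81 && p[1] >= 0xA0) ? 2 : 0;
}

/* Guess multibyteness from the bytes alone.  Return 1 (multibyte) if
   the bytes contain at least one two-byte form and no stray 8-bit
   byte, otherwise 0 (unibyte).  */
int
guess_multibyte (const unsigned char *bytes, size_t nbytes)
{
  size_t i = 0, nchars = 0, bytes_if_multibyte = 0;

  assert (bytes != NULL);
  while (i < nbytes)
    {
      size_t len = latin1_form_length (bytes + i, nbytes - i);

      if (len > 0)
        {
          /* A well-formed two-byte character.  */
          i += len;
          bytes_if_multibyte += len;
        }
      else if (bytes[i] < 0x80)
        {
          /* ASCII is one byte either way.  */
          i += 1;
          bytes_if_multibyte += 1;
        }
      else
        {
          /* A stray 8-bit byte would need two bytes in a multibyte
             string, so the lengths stop matching.  */
          i += 1;
          bytes_if_multibyte += 2;
        }
      nchars++;
    }
  return nchars != nbytes && bytes_if_multibyte == nbytes;
}
```

In this model the unibyte bytes of Ex.1 are guessed to be multibyte,
while the multibyte bytes of Ex.2 are guessed to be unibyte, because
nothing in the byte sequence records what the caller passed in.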

---
Ken'ichi HANDA
handa@m17n.org

2003-03-13  Kenichi Handa  <handa@etlken2>

	* fileio.c (Fexpand_file_name): Preserve multibyteness of NAME in
	the return value.

*** fileio.c.~1.474.~	Mon Feb  3 09:16:21 2003
--- fileio.c	Thu Mar 13 16:28:58 2003
***************
*** 1028,1033 ****
--- 1028,1034 ----
  #endif /* DOS_NT */
    int length;
    Lisp_Object handler;
+   int multibyte;
  
    CHECK_STRING (name);
  
***************
*** 1111,1116 ****
--- 1112,1123 ----
    name = FILE_SYSTEM_CASE (name);
  #endif
  
+   if (STRING_MULTIBYTE (default_directory))
+     default_directory = ENCODE_FILE (default_directory);
+   multibyte = STRING_MULTIBYTE (name);
+   if (multibyte)
+     name = ENCODE_FILE (name);
+ 
    nm = SDATA (name);
  
  #ifdef DOS_NT
***************
*** 1275,1281 ****
  	{
  #ifdef VMS
  	  if (index (nm, '/'))
! 	    return build_string (sys_translate_unix (nm));
  #endif /* VMS */
  #ifdef DOS_NT
  	  /* Make sure directories are all separated with / or \ as
--- 1282,1294 ----
  	{
  #ifdef VMS
  	  if (index (nm, '/'))
! 	    {
! 	      nm = sys_translate_unix (nm);
! 	      name = make_unibyte_string (nm, strlen (nm));
! 	      if (multibyte)
! 		name = DECODE_FILE (name);
! 	      return name;
! 	    }
  #endif /* VMS */
  #ifdef DOS_NT
  	  /* Make sure directories are all separated with / or \ as
***************
*** 1286,1307 ****
  	  if (IS_DIRECTORY_SEP (nm[1]))
  	    {
  	      if (strcmp (nm, SDATA (name)) != 0)
! 		name = build_string (nm);
  	    }
  	  else
  #endif
  	  /* drive must be set, so this is okay */
  	  if (strcmp (nm - 2, SDATA (name)) != 0)
  	    {
! 	      name = make_string (nm - 2, p - nm + 2);
  	      SSET (name, 0, DRIVE_LETTER (drive));
  	      SSET (name, 1, ':');
  	    }
  	  return name;
  #else /* not DOS_NT */
! 	  if (nm == SDATA (name))
! 	    return name;
! 	  return build_string (nm);
  #endif /* not DOS_NT */
  	}
      }
--- 1299,1324 ----
  	  if (IS_DIRECTORY_SEP (nm[1]))
  	    {
  	      if (strcmp (nm, SDATA (name)) != 0)
! 		name = make_unibyte_string (nm, strlen (nm));
  	    }
  	  else
  #endif
  	  /* drive must be set, so this is okay */
  	  if (strcmp (nm - 2, SDATA (name)) != 0)
  	    {
! 	      name = make_unibyte_string (nm - 2, p - nm + 2);
  	      SSET (name, 0, DRIVE_LETTER (drive));
  	      SSET (name, 1, ':');
  	    }
+ 	  if (multibyte)
+ 	    name = DECODE_FILE (name);
  	  return name;
  #else /* not DOS_NT */
! 	  if (nm != SDATA (name))
! 	    name = make_unibyte_string (nm, strlen (nm));
! 	  if (multibyte)
! 	    name = DECODE_FILE (name);
! 	  return name;
  #endif /* not DOS_NT */
  	}
      }
***************
*** 1670,1676 ****
    CORRECT_DIR_SEPS (target);
  #endif /* DOS_NT */
  
!   return make_string (target, o - target);
  }
  
  #if 0
--- 1687,1696 ----
    CORRECT_DIR_SEPS (target);
  #endif /* DOS_NT */
  
!   name = make_unibyte_string (target, o - target);
!   if (multibyte)
!     name = DECODE_FILE (name);
!   return name;
  }
  
  #if 0


* Re: expand-file-name problem for eight-bit-control/graphic
  2003-03-13  7:47 expand-file-name problem for eight-bit-control/graphic Kenichi Handa
@ 2003-03-15  6:54 ` Richard Stallman
  2003-03-18  2:03   ` Kenichi Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Stallman @ 2003-03-15  6:54 UTC (permalink / raw)
  Cc: emacs-devel

I have a lot of doubts about this code, because it seems to encode and
then decode the file name.  Since the arguments and values are both
strings for use within Emacs, I think it is incorrect for
expand-file-name to ever encode or decode a file name.


* Re: expand-file-name problem for eight-bit-control/graphic
  2003-03-15  6:54 ` Richard Stallman
@ 2003-03-18  2:03   ` Kenichi Handa
  2003-03-18 13:24     ` Kenichi Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Kenichi Handa @ 2003-03-18  2:03 UTC (permalink / raw)
  Cc: emacs-devel

In article <E18u5Z4-0004yd-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> I have a lot of doubts about this code, because it seems to encode and
> then decode the file name.  Since the arguments and values are both
> strings for use within Emacs, I think it is incorrect for
> expand-file-name to ever encode or decode a file name.

But, at least, we can't use build_string and make_string
blindly to reconstruct a file name.  So, how about the
attached change?

---
Ken'ichi HANDA
handa@m17n.org

*** fileio.c.~1.474.~	Tue Mar 18 09:59:02 2003
--- fileio.c	Tue Mar 18 10:52:33 2003
***************
*** 992,997 ****
--- 992,1014 ----
  
  
  \f
+ /* Return a string made from NBYTES bytes at P.  If MULTIBYTE is
+    nonzero, the string is multibyte (it is assumed that the bytes are
+    in correct multibyte form).  If MULTIBYTE is zero, the string is
+    unibyte.  */
+ 
+ static Lisp_Object
+ bytes_to_string (unsigned char *p, int nbytes, int multibyte)
+ {
+   int nchars;
+ 
+   if (! multibyte)
+     return make_unibyte_string ((char *) p, nbytes);
+   nchars = multibyte_chars_in_text (p, nbytes);
+   return make_multibyte_string ((char *) p, nchars, nbytes);
+ }
+ 
+ 
  DEFUN ("expand-file-name", Fexpand_file_name, Sexpand_file_name, 1, 2, 0,
         doc: /* Convert filename NAME to absolute, and canonicalize it.
  Second arg DEFAULT-DIRECTORY is directory to start with if NAME is relative
***************
*** 1028,1033 ****
--- 1045,1051 ----
  #endif /* DOS_NT */
    int length;
    Lisp_Object handler;
+   int multibyte;
  
    CHECK_STRING (name);
  
***************
*** 1111,1116 ****
--- 1129,1135 ----
    name = FILE_SYSTEM_CASE (name);
  #endif
  
+   multibyte = STRING_MULTIBYTE (name);
    nm = SDATA (name);
  
  #ifdef DOS_NT
***************
*** 1275,1281 ****
  	{
  #ifdef VMS
  	  if (index (nm, '/'))
! 	    return build_string (sys_translate_unix (nm));
  #endif /* VMS */
  #ifdef DOS_NT
  	  /* Make sure directories are all separated with / or \ as
--- 1294,1304 ----
  	{
  #ifdef VMS
  	  if (index (nm, '/'))
! 	    {
! 	      nm = sys_translate_unix (nm);
! 	      length = strlen (nm);
! 	      return bytes_to_string (nm, length, multibyte);
! 	    }
  #endif /* VMS */
  #ifdef DOS_NT
  	  /* Make sure directories are all separated with / or \ as
***************
*** 1286,1299 ****
  	  if (IS_DIRECTORY_SEP (nm[1]))
  	    {
  	      if (strcmp (nm, SDATA (name)) != 0)
! 		name = build_string (nm);
  	    }
  	  else
  #endif
  	  /* drive must be set, so this is okay */
  	  if (strcmp (nm - 2, SDATA (name)) != 0)
  	    {
! 	      name = make_string (nm - 2, p - nm + 2);
  	      SSET (name, 0, DRIVE_LETTER (drive));
  	      SSET (name, 1, ':');
  	    }
--- 1309,1322 ----
  	  if (IS_DIRECTORY_SEP (nm[1]))
  	    {
  	      if (strcmp (nm, SDATA (name)) != 0)
! 		name = bytes_to_string (nm, strlen (nm), multibyte);
  	    }
  	  else
  #endif
  	  /* drive must be set, so this is okay */
  	  if (strcmp (nm - 2, SDATA (name)) != 0)
  	    {
! 	      name = bytes_to_string (nm, strlen (nm), multibyte);
  	      SSET (name, 0, DRIVE_LETTER (drive));
  	      SSET (name, 1, ':');
  	    }
***************
*** 1301,1307 ****
  #else /* not DOS_NT */
  	  if (nm == SDATA (name))
  	    return name;
! 	  return build_string (nm);
  #endif /* not DOS_NT */
  	}
      }
--- 1324,1330 ----
  #else /* not DOS_NT */
  	  if (nm == SDATA (name))
  	    return name;
! 	  return bytes_to_string (nm, strlen (nm), multibyte);
  #endif /* not DOS_NT */
  	}
      }
***************
*** 1670,1676 ****
    CORRECT_DIR_SEPS (target);
  #endif /* DOS_NT */
  
!   return make_string (target, o - target);
  }
  
  #if 0
--- 1693,1699 ----
    CORRECT_DIR_SEPS (target);
  #endif /* DOS_NT */
  
!   return bytes_to_string (target, o - target, multibyte);
  }
  
  #if 0


* Re: expand-file-name problem for eight-bit-control/graphic
  2003-03-18  2:03   ` Kenichi Handa
@ 2003-03-18 13:24     ` Kenichi Handa
  2003-03-19 13:36       ` Richard Stallman
  2003-03-20 18:49       ` Juanma Barranquero
  0 siblings, 2 replies; 6+ messages in thread
From: Kenichi Handa @ 2003-03-18 13:24 UTC (permalink / raw)
  Cc: emacs-devel

I wrote:
> But, at least, we can't use build_string and make_string
> blindly to reconstruct a file name.  So, how about the
> attached change?

I found that several other functions in fileio.c have the
same problem as expand-file-name.  They all do something
like this:
   str = SDATA (filename);
   ...
   if (STRING_MULTIBYTE (filename))
     return make_string (beg, p - beg);
   return make_unibyte_string (beg, p - beg);
and make_string will return a unibyte string if FILENAME
originally contains eight-bit-control/graphics.
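
The alternative is to let the caller state the multibyteness and only
count characters accordingly.  The following standalone sketch models
that idea; it is not the Emacs implementation.  The struct and
function names are hypothetical, and the toy encoding (lead bytes
0x81 and 0x9E start a two-byte character) is assumed from the
examples earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the properties of a Lisp string.  */
struct str_info
{
  size_t nbytes;
  size_t nchars;
  int multibyte;
};

/* Count characters, assuming the bytes are already in correct
   multibyte form.  In this toy encoding the lead bytes 0x81 and 0x9E
   each start a two-byte character.  */
static size_t
count_chars (const unsigned char *p, size_t nbytes)
{
  size_t i = 0, n = 0;

  while (i < nbytes)
    {
      int two = (p[i] == 0x81 || p[i] == 0x9E) && i + 1 < nbytes;

      i += two ? 2 : 1;
      n++;
    }
  return n;
}

/* Describe the string made from NBYTES bytes at P.  The caller
   decides the multibyteness; the bytes are never used to guess it.  */
struct str_info
describe_with_flag (const unsigned char *p, size_t nbytes, int multibyte)
{
  struct str_info s;

  assert (p != NULL);
  s.nbytes = nbytes;
  s.multibyte = multibyte != 0;
  s.nchars = s.multibyte ? count_chars (p, nbytes) : nbytes;
  return s;
}
```

With the flag passed through, the four bytes \201\300\236\240 are two
characters when the caller says multibyte and four characters when
the caller says unibyte, so the result always matches the argument.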

Another bug is in read-file-name.  It doesn't decode
homedir.

Here's a new patch which replaces the previous one.

I think this fix is important because nowadays people more
often encounter, for instance, utf-8 filenames in a latin-1
locale or vice versa.

---
Ken'ichi HANDA
handa@m17n.org

*** fileio.c.~1.474.~	Tue Mar 18 09:59:02 2003
--- fileio.c	Tue Mar 18 22:09:35 2003
***************
*** 235,240 ****
--- 235,258 ----
  			Lisp_Object *, struct coding_system *));
  static int e_write P_ ((int, Lisp_Object, int, int, struct coding_system *));
  
+ static Lisp_Object build_file_name P_ ((const unsigned char *, int, int));
+ 
+ /* Return a string made from NBYTES bytes at P.  If MULTIBYTE is
+    nonzero, the string is multibyte (it is assumed that the bytes are
+    in correct multibyte form).  If MULTIBYTE is zero, the string is
+    unibyte.  */
+ 
+ static Lisp_Object
+ build_file_name (const unsigned char *p, int nbytes, int multibyte)
+ {
+   int nchars;
+ 
+   if (! multibyte)
+     return make_unibyte_string ((char *) p, nbytes);
+   nchars = multibyte_chars_in_text (p, nbytes);
+   return make_multibyte_string ((char *) p, nchars, nbytes);
+ }
+ 
  \f
  void
  report_file_error (string, data)
***************
*** 447,455 ****
    CORRECT_DIR_SEPS (beg);
  #endif /* DOS_NT */
  
!   if (STRING_MULTIBYTE (filename))
!     return make_string (beg, p - beg);
!   return make_unibyte_string (beg, p - beg);
  }
  
  DEFUN ("file-name-nondirectory", Ffile_name_nondirectory,
--- 465,471 ----
    CORRECT_DIR_SEPS (beg);
  #endif /* DOS_NT */
  
!   return build_file_name (beg, p - beg, STRING_MULTIBYTE (filename));
  }
  
  DEFUN ("file-name-nondirectory", Ffile_name_nondirectory,
***************
*** 488,496 ****
  	 )
      p--;
  
!   if (STRING_MULTIBYTE (filename))
!     return make_string (p, end - p);
!   return make_unibyte_string (p, end - p);
  }
  
  DEFUN ("unhandled-file-name-directory", Funhandled_file_name_directory,
--- 504,510 ----
  	 )
      p--;
  
!   return build_file_name (p, end - p, STRING_MULTIBYTE (filename));
  }
  
  DEFUN ("unhandled-file-name-directory", Funhandled_file_name_directory,
***************
*** 631,637 ****
      return call2 (handler, Qfile_name_as_directory, file);
  
    buf = (char *) alloca (SBYTES (file) + 10);
!   return build_string (file_name_as_directory (buf, SDATA (file)));
  }
  \f
  /*
--- 645,652 ----
      return call2 (handler, Qfile_name_as_directory, file);
  
    buf = (char *) alloca (SBYTES (file) + 10);
!   file_name_as_directory (buf, SDATA (file));
!   return build_file_name (buf, strlen (buf), STRING_MULTIBYTE (file));
  }
  \f
  /*
***************
*** 831,837 ****
    buf = (char *) alloca (SBYTES (directory) + 20);
  #endif
    directory_file_name (SDATA (directory), buf);
!   return build_string (buf);
  }
  
  static char make_temp_name_tbl[64] =
--- 846,852 ----
    buf = (char *) alloca (SBYTES (directory) + 20);
  #endif
    directory_file_name (SDATA (directory), buf);
!   return build_file_name (buf, strlen (buf), STRING_MULTIBYTE (directory));
  }
  
  static char make_temp_name_tbl[64] =
***************
*** 1275,1281 ****
  	{
  #ifdef VMS
  	  if (index (nm, '/'))
! 	    return build_string (sys_translate_unix (nm));
  #endif /* VMS */
  #ifdef DOS_NT
  	  /* Make sure directories are all separated with / or \ as
--- 1290,1300 ----
  	{
  #ifdef VMS
  	  if (index (nm, '/'))
! 	    {
! 	      nm = sys_translate_unix (nm);
! 	      return build_file_name (nm, strlen (nm),
! 				      STRING_MULTIBYTE (name));
! 	    }
  #endif /* VMS */
  #ifdef DOS_NT
  	  /* Make sure directories are all separated with / or \ as
***************
*** 1286,1299 ****
  	  if (IS_DIRECTORY_SEP (nm[1]))
  	    {
  	      if (strcmp (nm, SDATA (name)) != 0)
! 		name = build_string (nm);
  	    }
  	  else
  #endif
  	  /* drive must be set, so this is okay */
  	  if (strcmp (nm - 2, SDATA (name)) != 0)
  	    {
! 	      name = make_string (nm - 2, p - nm + 2);
  	      SSET (name, 0, DRIVE_LETTER (drive));
  	      SSET (name, 1, ':');
  	    }
--- 1305,1320 ----
  	  if (IS_DIRECTORY_SEP (nm[1]))
  	    {
  	      if (strcmp (nm, SDATA (name)) != 0)
! 		name
! 		  = build_file_name (nm, strlen (nm), STRING_MULTIBYTE (name));
  	    }
  	  else
  #endif
  	  /* drive must be set, so this is okay */
  	  if (strcmp (nm - 2, SDATA (name)) != 0)
  	    {
! 	      name
! 		= build_file_name (nm, strlen (nm), STRING_MULTIBYTE (name));
  	      SSET (name, 0, DRIVE_LETTER (drive));
  	      SSET (name, 1, ':');
  	    }
***************
*** 1301,1307 ****
  #else /* not DOS_NT */
  	  if (nm == SDATA (name))
  	    return name;
! 	  return build_string (nm);
  #endif /* not DOS_NT */
  	}
      }
--- 1322,1328 ----
  #else /* not DOS_NT */
  	  if (nm == SDATA (name))
  	    return name;
! 	  return build_file_name (nm, strlen (nm), STRING_MULTIBYTE (name));
  #endif /* not DOS_NT */
  	}
      }
***************
*** 1670,1676 ****
    CORRECT_DIR_SEPS (target);
  #endif /* DOS_NT */
  
!   return make_string (target, o - target);
  }
  
  #if 0
--- 1691,1697 ----
    CORRECT_DIR_SEPS (target);
  #endif /* DOS_NT */
  
!   return build_file_name (target, o - target, STRING_MULTIBYTE (name));
  }
  
  #if 0
***************
*** 2101,2107 ****
      }
  
  #ifdef VMS
!   return build_string (nm);
  #else
  
    /* See if any variables are substituted into the string
--- 2122,2128 ----
      }
  
  #ifdef VMS
!   return build_file_name (nm, strlen (nm), STRING_MULTIBYTE (filename));
  #else
  
    /* See if any variables are substituted into the string
***************
*** 2244,2252 ****
        xnm = p;
  #endif
  
!   if (STRING_MULTIBYTE (filename))
!     return make_string (xnm, x - xnm);
!   return make_unibyte_string (xnm, x - xnm);
  
   badsubst:
    error ("Bad format environment-variable substitution");
--- 2265,2271 ----
        xnm = p;
  #endif
  
!   return build_file_name (xnm, x - xnm, STRING_MULTIBYTE (filename));
  
   badsubst:
    error ("Bad format environment-variable substitution");
***************
*** 6023,6028 ****
--- 6042,6048 ----
    Lisp_Object val, insdef, tem;
    struct gcpro gcpro1, gcpro2;
    register char *homedir;
+   Lisp_Object decoded_homedir;
    int replace_in_history = 0;
    int add_to_history = 0;
    int count;
***************
*** 6045,6069 ****
        CORRECT_DIR_SEPS (homedir);
      }
  #endif
    if (homedir != 0
        && STRINGP (dir)
!       && !strncmp (homedir, SDATA (dir), strlen (homedir))
!       && IS_DIRECTORY_SEP (SREF (dir, strlen (homedir))))
      {
!       dir = make_string (SDATA (dir) + strlen (homedir) - 1,
! 			 SBYTES (dir) - strlen (homedir) + 1);
!       SSET (dir, 0, '~');
      }
    /* Likewise for default_filename.  */
    if (homedir != 0
        && STRINGP (default_filename)
!       && !strncmp (homedir, SDATA (default_filename), strlen (homedir))
!       && IS_DIRECTORY_SEP (SREF (default_filename, strlen (homedir))))
      {
        default_filename
! 	= make_string (SDATA (default_filename) + strlen (homedir) - 1,
! 		       SBYTES (default_filename) - strlen (homedir) + 1);
!       SSET (default_filename, 0, '~');
      }
    if (!NILP (default_filename))
      {
--- 6065,6093 ----
        CORRECT_DIR_SEPS (homedir);
      }
  #endif
+   if (homedir != 0)
+     decoded_homedir
+       = DECODE_FILE (make_unibyte_string (homedir, strlen (homedir)));
    if (homedir != 0
        && STRINGP (dir)
!       && !strncmp (SDATA (decoded_homedir), SDATA (dir),
! 		   SBYTES (decoded_homedir))
!       && IS_DIRECTORY_SEP (SREF (dir, SBYTES (decoded_homedir))))
      {
!       dir = Fsubstring (dir, make_number (SCHARS (decoded_homedir) + 1), Qnil);
!       dir = concat2 (build_string ("~"), dir);
      }
    /* Likewise for default_filename.  */
    if (homedir != 0
        && STRINGP (default_filename)
!       && !strncmp (SDATA (decoded_homedir), SDATA (default_filename),
! 		   SBYTES (decoded_homedir))
!       && IS_DIRECTORY_SEP (SREF (default_filename, SBYTES (decoded_homedir))))
      {
        default_filename
! 	= Fsubstring (default_filename,
! 		      make_number (SCHARS (decoded_homedir) + 1), Qnil);
!       default_filename = concat2 (build_string ("~"), default_filename);
      }
    if (!NILP (default_filename))
      {


* Re: expand-file-name problem for eight-bit-control/graphic
  2003-03-18 13:24     ` Kenichi Handa
@ 2003-03-19 13:36       ` Richard Stallman
  2003-03-20 18:49       ` Juanma Barranquero
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Stallman @ 2003-03-19 13:36 UTC (permalink / raw)
  Cc: emacs-devel

This change looks good.  But it seems that build_file_name
could be useful for other purposes, not only for file names,
so maybe it should have a different name that doesn't say it's
only for file names.


* Re: expand-file-name problem for eight-bit-control/graphic
  2003-03-18 13:24     ` Kenichi Handa
  2003-03-19 13:36       ` Richard Stallman
@ 2003-03-20 18:49       ` Juanma Barranquero
  1 sibling, 0 replies; 6+ messages in thread
From: Juanma Barranquero @ 2003-03-20 18:49 UTC (permalink / raw)
  Cc: emacs-devel

ELISP> (expand-file-name "c:/tmp")
"c:/tmp"
ELISP> (expand-file-name "c://tmp")
"c:mp"
ELISP> (expand-file-name "c:\\tmp")
"c:mp"
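
One plausible reading of these results, offered as an assumption
rather than a confirmed diagnosis: in the DOS_NT branch the patch
builds the string starting at nm where the old code started at
nm - 2, so the two SSET calls that write the drive letter and the
colon overwrite the first two bytes of the path itself.  The
standalone sketch below reproduces that effect with plain C strings;
the function names are hypothetical and none of this is Emacs code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy SRC into OUT, then overwrite the first two bytes with the
   drive prefix, as the two SSET calls in the patch do.  */
static void
copy_and_set_drive (char *out, size_t outsize, const char *src, char drive)
{
  size_t len = strlen (src);

  assert (len >= 2 && len < outsize);
  memcpy (out, src, len + 1);
  out[0] = drive;
  out[1] = ':';
}

/* NM points just past a two-byte drive prefix inside a larger buffer.
   Starting the copy at NM leaves no room for the prefix.  */
void
rebuild_from_nm (char *out, size_t outsize, const char *nm, char drive)
{
  copy_and_set_drive (out, outsize, nm, drive);
}

/* Starting the copy at NM - 2 reserves the two bytes that the drive
   letter and the colon will occupy.  */
void
rebuild_from_nm_minus_2 (char *out, size_t outsize, const char *nm,
                         char drive)
{
  copy_and_set_drive (out, outsize, nm - 2, drive);
}
```

With a buffer holding "x:/tmp" and nm pointing at "/tmp", the first
function produces "c:mp" and the second produces "c:/tmp".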


                                                           /L/e/k/t/u
