* bug#15260: cannot build in a directory with non-ascii characters @ 2013-09-03 17:46 Glenn Morris 2013-10-23 20:48 ` Glenn Morris 0 siblings, 1 reply; 50+ messages in thread From: Glenn Morris @ 2013-09-03 17:46 UTC (permalink / raw) To: 15260 Package: emacs Severity: important Version: 24.3 It seems Emacs (still) cannot be built in a directory whose name contains non-ascii characters. Ref: http://lists.gnu.org/archive/html/help-gnu-emacs/2013-09/msg00033.html If it cannot be made to work, configure should abort with an error in such cases. I have some vague memory that it also might not work with spaces in the names, but did not test. Similar restrictions may apply to the install --prefix as well. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-09-03 17:46 bug#15260: cannot build in a directory with non-ascii characters Glenn Morris @ 2013-10-23 20:48 ` Glenn Morris 2013-10-24 18:25 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Glenn Morris @ 2013-10-23 20:48 UTC (permalink / raw) To: 15260 Glenn Morris wrote: > If it cannot be made to work, configure should abort with an error in > such cases. [non-ascii directories] Done. Leaving this open as a wishlist to make it work. > I have some vague memory that it also might not work with spaces in the > names, but did not test. This works now - http://debbugs.gnu.org/15675 . ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters
  2013-10-23 20:48 ` Glenn Morris
@ 2013-10-24 18:25   ` Eli Zaretskii
  2013-10-24 18:35     ` Glenn Morris
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2013-10-24 18:25 UTC (permalink / raw)
  To: Glenn Morris; +Cc: 15260

> From: Glenn Morris <rgm@gnu.org>
> Date: Wed, 23 Oct 2013 16:48:42 -0400
>
> Glenn Morris wrote:
>
> > If it cannot be made to work, configure should abort with an error in
> > such cases. [non-ascii directories]
>
> Done. Leaving this open as a wishlist to make it work.

  dnl configure sets LC_ALL=C early on, so this range should work.
  case "$var" in
    *[[^\ -~]]*) AC_MSG_ERROR([Emacs cannot be built or installed in a directory whose name contains non-ASCII characters: $var]) ;;
  esac

This is quite drastic.  Do we understand what is the underlying
technical reason for the build failures?  The bug reports didn't give
any explanations, only the fact that moving to a pure-ASCII directory
fixed the problem.

^ permalink raw reply [flat|nested] 50+ messages in thread
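For readers who want to try the check from the message above outside of configure, here is a minimal sketch in plain shell. It is not the code in configure.ac: the function name `is_printable_ascii` is hypothetical, and the negated bracket is spelled `[!...]` (the POSIX sh form), whereas the Autoconf source writes `[[^...]]` because of m4 quoting. Note that the range from space to `~` also rejects control characters such as TAB, not only non-ASCII bytes.

```shell
# Sketch only: is_printable_ascii is a hypothetical helper name.
# Force the C locale so that the range " -~" means bytes 0x20..0x7E.
LC_ALL=C
export LC_ALL

is_printable_ascii () {
  case $1 in
    *[!\ -~]*) return 1 ;;   # at least one byte outside space..tilde
    *)         return 0 ;;   # only printable ASCII (or empty)
  esac
}

# Check the directory given as first argument, or the current one.
dir=${1:-$PWD}
if is_printable_ascii "$dir"; then
  echo "ok: $dir"
else
  echo "rejected (non-ASCII or control character): $dir"
fi
```

Run it from the directory in question, or pass a directory name as the first argument.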
* bug#15260: cannot build in a directory with non-ascii characters
  2013-10-24 18:25 ` Eli Zaretskii
@ 2013-10-24 18:35   ` Glenn Morris
  2013-10-25 14:25     ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Glenn Morris @ 2013-10-24 18:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 15260

Eli Zaretskii wrote:

>   case "$var" in
>     *[[^\ -~]]*) AC_MSG_ERROR([Emacs cannot be built or installed in a directory whose name contains non-ASCII characters: $var]) ;;
>   esac
>
> This is quite drastic.

I don't think so. The alternative is a cryptic failure during the build stage.

> Do we understand what is the underlying technical reason for the
> build failures?

Something to do with failure to find files, just as it was 6 years ago.
http://lists.gnu.org/archive/html/emacs-devel/2007-05/msg00984.html

The immediate problem for me is a dump failure:

  Finding pointers to doc strings...
  Finding pointers to doc strings...done
  Dumping under the name emacs
  emacs: Can't open /path/to/non-ascii/src/temacs for reading: No such file or directory
  make[1]: *** [bootstrap-emacs] Error 1

Why not make a non-ASCII directory and try it yourself...

^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-24 18:35 ` Glenn Morris @ 2013-10-25 14:25 ` Eli Zaretskii 2013-10-25 17:08 ` Glenn Morris 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-25 14:25 UTC (permalink / raw) To: Glenn Morris; +Cc: 15260 > From: Glenn Morris <rgm@gnu.org> > Cc: 15260@debbugs.gnu.org > Date: Thu, 24 Oct 2013 14:35:15 -0400 > > Eli Zaretskii wrote: > > > case "$var" in > > *[[^\ -~]]*) AC_MSG_ERROR([Emacs cannot be built or installed in a directory whose name contains non-ASCII characters: $var]) ;; > > esac > > > > This is quite drastic. > > I don't think so. The alternative is a cryptic failure during the build stage. > > > Do we understand what is the underlying technical reason for the > > build failures? > > Something to do with failure to find files, just as it was 6 years ago. > http://lists.gnu.org/archive/html/emacs-devel/2007-05/msg00984.html > > The immediate problem for me is a dump failure: > > Finding pointers to doc strings... > Finding pointers to doc strings...done > Dumping under the name emacs > emacs: Can't open /path/to/non-ascii/src/temacs for reading: No such file > or directory > make[1]: *** [bootstrap-emacs] Error 1 Does the change below help? > Why not make a non-ASCII directory and try it yourself... It requires too much setup on my part (this cannot be simulated on Windows without too much hassle). But I will do that if there's no easier way. I just thought that some analysis has been done already. 

=== modified file 'src/emacs.c'
--- src/emacs.c	2013-10-20 16:47:42 +0000
+++ src/emacs.c	2013-10-25 14:21:47 +0000
@@ -2044,11 +2044,15 @@ You must run Emacs in batch mode in orde
 
   CHECK_STRING (filename);
   filename = Fexpand_file_name (filename, Qnil);
+  filename = ENCODE_FILE (filename);
   if (!NILP (symfile))
     {
       CHECK_STRING (symfile);
       if (SCHARS (symfile))
-        symfile = Fexpand_file_name (symfile, Qnil);
+        {
+          symfile = Fexpand_file_name (symfile, Qnil);
+          symfile = ENCODE_FILE (symfile);
+        }
     }
 
   tem = Vpurify_flag;

^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters
  2013-10-25 14:25 ` Eli Zaretskii
@ 2013-10-25 17:08   ` Glenn Morris
  2013-10-25 18:31     ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Glenn Morris @ 2013-10-25 17:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 15260

Eli Zaretskii wrote:

> It requires too much setup on my part (this cannot be simulated on
> Windows without too much hassle).

Sorry, I assumed you could build on fencepost. I'm expecting multiple
points of failure, so this might not be an efficient process...

The first time, I was trying an out-of-tree build in a non-ascii build
directory, but still with ascii srcdir. Using an in-place build in a
non-ascii directory fails to even start temacs (this is with or without
your patch):

  Warning: arch-independent data dir `/tmp/EMACS/share/emacs/24.3.50/etc/': No such file or directory
  Error: charsets directory not found:
  /tmp/EMACS/share/emacs/24.3.50/etc/charsets
  Emacs will not function correctly without the character map files.
  Please check your installation!
  make[1]: *** [bootstrap-emacs] Error 1

/tmp/EMACS was my install --prefix. It's not supposed to exist until
after installation, but the code that tries to find etc/ is presumably
mistakenly concluding that srcdir/etc does not exist and that it must be
running installed.

^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-25 17:08 ` Glenn Morris @ 2013-10-25 18:31 ` Eli Zaretskii 2013-10-25 18:40 ` Glenn Morris 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-25 18:31 UTC (permalink / raw) To: Glenn Morris; +Cc: 15260 > From: Glenn Morris <rgm@gnu.org> > Cc: 15260@debbugs.gnu.org > Date: Fri, 25 Oct 2013 13:08:08 -0400 > > Eli Zaretskii wrote: > > > It requires too much setup on my part (this cannot be simulated on > > Windows without too much hassle). > > Sorry, I assumed you could build on fencepost. That's the backup plan, yes. > I'm expecting multiple > points of failure, so this might not be an efficient process... I don't think we should do it this way. I was just asking about the current state of knowledge. I presume that the changes I suggested didn't help? (They are TRT anyway, so I will install them regardless.) > The first time, I was trying an out-of-tree build in a non-ascii build > directory, but still with ascii srcdir. Using an in-place build in a > non-ascii directory fails to even start temacs (this is with or without > your patch): > > Warning: arch-independent data dir > `/tmp/EMACS/share/emacs/24.3.50/etc/': No such file or directory > Error: charsets directory not found: > /tmp/EMACS/share/emacs/24.3.50/etc/charsets > Emacs will not function correctly without the character map files. > Please check your installation! > make[1]: *** [bootstrap-emacs] Error 1 > > /tmp/EMACS was my install --prefix. It's not supposed to exist until > after installation, but the code that tries to find etc/ is presumably > mistakenly concluding that srcdir/etc does not exist and that it must be > running installed. So in the above, /tmp/EMACS/share/emacs/24.3.50/etc/ is pure-ASCII, and the non-ASCII directory is in the source tree, is that right? ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-25 18:31 ` Eli Zaretskii @ 2013-10-25 18:40 ` Glenn Morris 2013-10-25 18:46 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Glenn Morris @ 2013-10-25 18:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 Eli Zaretskii wrote: > I presume that the changes I suggested didn't help? (They are TRT > anyway, so I will install them regardless.) It helps for "ascii srcdir, non-ascii builddir", but there are still problems later on, again related to Emacs mistakenly believing that certain directories do not exist, when they do (Warning: arch-dependent data dir `...' No such file or directory; etc). The "non-ascii srcdir == builddir" case fails even earlier, due to not finding etc. > So in the above, /tmp/EMACS/share/emacs/24.3.50/etc/ is pure-ASCII, > and the non-ASCII directory is in the source tree, is that right? Yes. I literally did (in a non-ascii) directory: ./configure --prefix=/tmp/EMACS ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-25 18:40 ` Glenn Morris @ 2013-10-25 18:46 ` Eli Zaretskii 2013-10-25 19:27 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-25 18:46 UTC (permalink / raw) To: Glenn Morris; +Cc: 15260 > From: Glenn Morris <rgm@gnu.org> > Cc: 15260@debbugs.gnu.org > Date: Fri, 25 Oct 2013 14:40:46 -0400 > > Eli Zaretskii wrote: > > > I presume that the changes I suggested didn't help? (They are TRT > > anyway, so I will install them regardless.) > > It helps for "ascii srcdir, non-ascii builddir" Good, so one down, N - 1 to go ;-) > but there are still > problems later on, again related to Emacs mistakenly believing that > certain directories do not exist, when they do (Warning: arch-dependent > data dir `...' No such file or directory; etc). > > The "non-ascii srcdir == builddir" case fails even earlier, due to not > finding etc. OK, I will take a closer look. Thanks for the info. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters
  2013-10-25 18:46 ` Eli Zaretskii
@ 2013-10-25 19:27   ` Eli Zaretskii
  2013-10-26  7:50     ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2013-10-25 19:27 UTC (permalink / raw)
  To: rgm; +Cc: 15260

> Date: Fri, 25 Oct 2013 21:46:52 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 15260@debbugs.gnu.org
>
> > but there are still
> > problems later on, again related to Emacs mistakenly believing that
> > certain directories do not exist, when they do (Warning: arch-dependent
> > data dir `...' No such file or directory; etc).
> >
> > The "non-ascii srcdir == builddir" case fails even earlier, due to not
> > finding etc.
>
> OK, I will take a closer look.  Thanks for the info.

I think I see the problem.  All those PATH_* variables that come from
epaths.h yield encoded file names (because they were written by the
shell).  But we never decode them before using them in init_callproc
and init_callproc_1.  Similar things happen with decode_env_path: it
calls 'getenv', but never decodes the values it gets from that.

I will take a crack at fixing these.

^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-25 19:27 ` Eli Zaretskii @ 2013-10-26 7:50 ` Eli Zaretskii 2013-10-26 19:15 ` Glenn Morris 2013-10-27 4:28 ` Stefan Monnier 0 siblings, 2 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-26 7:50 UTC (permalink / raw) To: rgm, Stefan Monnier, Kenichi Handa; +Cc: 15260 > Date: Fri, 25 Oct 2013 22:27:19 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > > Date: Fri, 25 Oct 2013 21:46:52 +0300 > > From: Eli Zaretskii <eliz@gnu.org> > > Cc: 15260@debbugs.gnu.org > > > > > but there are still > > > problems later on, again related to Emacs mistakenly believing that > > > certain directories do not exist, when they do (Warning: arch-dependent > > > data dir `...' No such file or directory; etc). > > > > > > The "non-ascii srcdir == builddir" case fails even earlier, due to not > > > finding etc. > > > > OK, I will take a closer look. Thanks for the info. > > I think I see the problem. All those PATH_* variables that come from > epaths.h yield encoded file names (because they were written by the > shell). But we never decode them before using them in init_callproc > and init_callproc_1. Similar things happen with decode_env_path: it > calls 'getenv', but never decodes the values it gets from that. > > I will take a crack on fixing these. We definitely need to decode file names in init_callproc_1, init_callproc, and init_lread. But here's where things get hairy: when temacs starts, preloaded Lisp files are not yet loaded, and consequently file-name-coding-system and default-file-name-coding-system are both nil. In such a case, currently DECODE_FILE is a no-op. So we need some way of getting temacs to know what coding-system to use to decode file names during its initialization phase, without relying on the database we have in locale-language-names. This probably calls for a separate variable, init-file-name-coding-system, say. 

But how to assign a correct value to it?  I understand that most Posix
systems nowadays use UTF-8 for file names, so I guess we can fall back
on that.  On MS-Windows, there's a system call that returns the
necessary information, so there's no problem for MS-Windows.  The
question is what to do for Posix systems that don't use UTF-8?  I see
2 possibilities:

  . Try to parse the value of LANG with some shell or Sed script, and
    come up with a suitable value.

  . Ask the user to specify the encoding as a switch to the configure
    script.

In both cases, communicate the value to temacs via --eval on its
command line.

Comments and opinions are welcome.

^ permalink raw reply [flat|nested] 50+ messages in thread
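The first possibility in the message above (deriving the encoding from the locale with a shell script) could look roughly like the sketch below. Everything here is an assumption for illustration: the helper name `codeset_from_locale` is hypothetical, the UTF-8 fallback only follows the suggestion in the message, and the result is a codeset name as spelled in the locale, which would still need mapping to an Emacs coding-system name. The sketch also consults LC_ALL and LC_CTYPE before LANG, since that is the standard precedence for the character-set category.

```shell
# Sketch only: codeset_from_locale is a hypothetical helper name.
# Extract the codeset from a locale name of the form
# language[_territory][.codeset][@modifier].
codeset_from_locale () {
  case $1 in
    *.*)
      cs=${1#*.}      # drop everything up to the first dot
      cs=${cs%%@*}    # drop an optional @modifier
      printf '%s\n' "$cs"
      ;;
    *)
      # No codeset in the name (e.g. "C" or "en_US"): fall back on
      # UTF-8, as suggested in the message above.
      printf '%s\n' UTF-8
      ;;
  esac
}

# Standard precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
locale_name=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
codeset_from_locale "$locale_name"
```

For example, `codeset_from_locale ru_RU.KOI8-R` prints `KOI8-R`. Falling back on UTF-8 for the plain "C" locale is a guess, not something the thread settles.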
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-26 7:50 ` Eli Zaretskii @ 2013-10-26 19:15 ` Glenn Morris 2013-10-26 20:04 ` Eli Zaretskii 2013-10-27 4:28 ` Stefan Monnier 1 sibling, 1 reply; 50+ messages in thread From: Glenn Morris @ 2013-10-26 19:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 Eli Zaretskii wrote: > Comments and opinions are welcome. Sounds like a fair bit of work, for something that doesn't seem very important. If my testing was correct, the problem only occurs during building, not after Emacs is installed (does that tally with what you found?). And I can't see any reason why anyone _needs_ to build Emacs in a directory with non-ASCII chars. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-26 19:15 ` Glenn Morris @ 2013-10-26 20:04 ` Eli Zaretskii 2013-10-27 3:56 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-26 20:04 UTC (permalink / raw) To: Glenn Morris; +Cc: 15260 > From: Glenn Morris <rgm@gnu.org> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org > Date: Sat, 26 Oct 2013 15:15:06 -0400 > > Sounds like a fair bit of work, for something that doesn't seem very > important. It might be important for people who build Emacs on non-English language systems. > If my testing was correct, the problem only occurs during > building, not after Emacs is installed (does that tally with what you > found?). It definitely happens when building. I didn't look deep enough to see what happens once Emacs is installed. The code is definitely wrong. > And I can't see any reason why anyone _needs_ to build Emacs in > a directory with non-ASCII chars. It might be a natural thing in some quarters. E.g., Emacs sources might be a subdirectory of some parent directory with a non-ASCII name where many other packages are built. Anyway, if the project thinks it's not important enough, I have better things to do. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-26 20:04 ` Eli Zaretskii @ 2013-10-27 3:56 ` Eli Zaretskii 2013-10-27 16:19 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-27 3:56 UTC (permalink / raw) To: rgm; +Cc: 15260 > Date: Sat, 26 Oct 2013 23:04:49 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > > If my testing was correct, the problem only occurs during > > building, not after Emacs is installed (does that tally with what you > > found?). > > It definitely happens when building. I didn't look deep enough to see > what happens once Emacs is installed. The code is definitely wrong. Btw, are you sure the installed Emacs doesn't find the files under the source tree? Did you try to remove or rename it after installing? ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-27 3:56 ` Eli Zaretskii @ 2013-10-27 16:19 ` Eli Zaretskii 2013-10-27 19:02 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-27 16:19 UTC (permalink / raw) To: rgm; +Cc: 15260 > Date: Sun, 27 Oct 2013 05:56:44 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > > Date: Sat, 26 Oct 2013 23:04:49 +0300 > > From: Eli Zaretskii <eliz@gnu.org> > > Cc: 15260@debbugs.gnu.org > > > > > If my testing was correct, the problem only occurs during > > > building, not after Emacs is installed (does that tally with what you > > > found?). > > > > It definitely happens when building. I didn't look deep enough to see > > what happens once Emacs is installed. The code is definitely wrong. > > Btw, are you sure the installed Emacs doesn't find the files under the > source tree? Did you try to remove or rename it after installing? Further testing indicates that it indeed works to install in a non-ASCII directory after building. But it only barely works, at least in my testing: the various files and directories in doc-directory, load-path, etc. are unibyte strings, so using them only works if they are passed to file primitives. If you try to invoke a program with one of these values as a command-line argument, the program will fail (unless your locale encoding is identical to file-name encoding). And even using the unibyte strings in conjunction with files is fragile, as, for example, 'equal' will not compare unibyte and multibyte strings of the same bytes as equal. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters
  2013-10-27 16:19 ` Eli Zaretskii
@ 2013-10-27 19:02   ` Eli Zaretskii
  2013-10-27 19:43     ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2013-10-27 19:02 UTC (permalink / raw)
  To: rgm, Stefan Monnier, Kenichi Handa; +Cc: 15260

The first few problems that pop up when building from the source tree
whose parent has a non-ASCII name are solved by the changes below.
I'm not very fond of these changes, especially the last one: it all
looks very fragile and ad-hoc, and that's still on a system with a
UTF-8 locale, where things should be relatively easy.

After applying these changes, temacs comes up and dumps itself, but
fails to find simple.el and bytecomp.el when it proceeds to compiling
Lisp files.  I guess now load-path is the culprit.  Stay tuned.

=== modified file 'lisp/loadup.el'
--- lisp/loadup.el	2013-10-08 15:11:29 +0000
+++ lisp/loadup.el	2013-10-27 18:26:12 +0000
@@ -150,7 +150,9 @@
 (load "epa-hook")
 ;; Any Emacs Lisp source file (*.el) loaded here after can contain
 ;; multilingual text.
-(load "international/mule-cmds")
+(let ((dfn-coding default-file-name-coding-system))
+  (load "international/mule-cmds")
+  (setq default-file-name-coding-system dfn-coding))
 (load "case-table")
 ;; This file doesn't exist when building a development version of Emacs
 ;; from the repository.  It is generated just after temacs is built.
@@ -163,7 +165,9 @@
 (load "language/cyrillic")
 (load "language/indian")
 (load "language/sinhala")
-(load "language/english")
+(let ((dfn-coding default-file-name-coding-system))
+  (load "language/english")
+  (setq default-file-name-coding-system dfn-coding))
 (load "language/ethiopic")
 (load "language/european")
 (load "language/czech")

=== modified file 'src/emacs.c'
--- src/emacs.c	2013-10-26 13:43:58 +0000
+++ src/emacs.c	2013-10-27 18:48:51 +0000
@@ -2044,14 +2044,22 @@ You must run Emacs in batch mode in orde
 
   CHECK_STRING (filename);
   filename = Fexpand_file_name (filename, Qnil);
-  filename = ENCODE_FILE (filename);
+  if (NILP (Vfile_name_coding_system)
+      && NILP (Vdefault_file_name_coding_system))
+    filename = Fstring_to_unibyte (filename);
+  else
+    filename = ENCODE_FILE (filename);
   if (!NILP (symfile))
     {
       CHECK_STRING (symfile);
       if (SCHARS (symfile))
         {
           symfile = Fexpand_file_name (symfile, Qnil);
-          symfile = ENCODE_FILE (symfile);
+          if (NILP (Vfile_name_coding_system)
+              && NILP (Vdefault_file_name_coding_system))
+            symfile = Fstring_to_unibyte (symfile);
+          else
+            symfile = ENCODE_FILE (symfile);
         }
     }
 

^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-27 19:02 ` Eli Zaretskii @ 2013-10-27 19:43 ` Eli Zaretskii 0 siblings, 0 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-27 19:43 UTC (permalink / raw) To: rgm, monnier, handa; +Cc: 15260 > Date: Sun, 27 Oct 2013 21:02:51 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > After applying these changes, temacs comes up and dumps itself, but > fails to find simple.el and bytecomp.el when it proceeds to compiling > Lisp files. ^^^^^^^^^^^^^^^^ Instead of "it" I should have written "bootstrap-emacs". Sorry. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-26 7:50 ` Eli Zaretskii 2013-10-26 19:15 ` Glenn Morris @ 2013-10-27 4:28 ` Stefan Monnier 2013-10-27 16:11 ` Eli Zaretskii 1 sibling, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-27 4:28 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > But here's where things get hairy: when temacs starts, preloaded Lisp > files are not yet loaded, and consequently file-name-coding-system and > default-file-name-coding-system are both nil. In such a case, > currently DECODE_FILE is a no-op. I don't understand why it wouldn't work to just treat those strings as "binary" (i.e. keep them undecoded in unibyte strings). Then encoding would be a noop and that should hence end up in the right byte-sequence sent to the OS primitives. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-27 4:28 ` Stefan Monnier @ 2013-10-27 16:11 ` Eli Zaretskii 2013-10-28 0:30 ` Stefan Monnier 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-27 16:11 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: rgm@gnu.org, Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org > Date: Sun, 27 Oct 2013 00:28:36 -0400 > > I don't understand why it wouldn't work to just treat those strings as > "binary" (i.e. keep them undecoded in unibyte strings). Then encoding > would be a noop and that should hence end up in the right byte-sequence > sent to the OS primitives. Not sure I'm following you here. I presume you aren't asking why we generally hold file names in decoded form inside Emacs, nor suggesting that we switch to storing them as undecoded unibyte strings. So I guess you are asking why the particular piece of code being discussed here couldn't keep file names as unibyte strings, is that your question? If so, then the answer is "it could, but that would be even more hair." The problem is that the code involved in this (specifically, init_callproc_1, init_callproc, and probably also init_cmdargs and init_lread) is not something specifically written to stat the directories from epaths.h and announce their non-existence. That code populates important variables with names of files and directories and lists of directories that are henceforth used in Emacs all over the place. Notable examples are data-directory, doc-directory, exec-path, and load-path. Without populating these variables, temacs will not work, and the code which uses these variables assumes their values are decoded strings. The error messages are a by-product: as Emacs computes the values of these variables, it checks the files and directories for existence, and complains if they don't. 
The root cause is that unibyte strings get stored in variables used by Emacs on the assumption that they are decoded. Given the above, I'm not sure exactly what you are suggesting in practical terms. Can you elaborate? ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-27 16:11 ` Eli Zaretskii @ 2013-10-28 0:30 ` Stefan Monnier 2013-10-28 3:39 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-28 0:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > So I guess you are asking why the particular piece of code being > discussed here couldn't keep file names as unibyte strings, is that > your question? IIUC the issue is how to encode when we don't yet have the coding-systems loaded/setup. But it seems if we can't encode, then we can't decode either, so we should just fallback on using unibyte strings (which shouldn't be encoded on the way back to the OS) for those file names we create/manipulate before coding-systems are available. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters
  2013-10-28  0:30 ` Stefan Monnier
@ 2013-10-28  3:39   ` Eli Zaretskii
  2013-10-28  4:05     ` Stefan Monnier
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2013-10-28 3:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 15260

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org, handa@gnu.org, 15260@debbugs.gnu.org
> Date: Sun, 27 Oct 2013 20:30:32 -0400
>
> > So I guess you are asking why the particular piece of code being
> > discussed here couldn't keep file names as unibyte strings, is that
> > your question?
>
> IIUC the issue is how to encode when we don't yet have the
> coding-systems loaded/setup.  But it seems if we can't encode, then we
> can't decode either, so we should just fallback on using unibyte strings
> (which shouldn't be encoded on the way back to the OS) for those file
> names we create/manipulate before coding-systems are available.

As I explained, this would be even more hair than what I proposed,
because you are talking about core Emacs data structures and variables
that are involved in every file-related op.

On top of that, using unibyte strings is inherently fragile in Emacs,
as the code is not written to support them too well, as you well
know.  We always advise users to stay away from unibyte strings, and
for a good reason, so doing this ourselves sounds unwise.

^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 3:39 ` Eli Zaretskii @ 2013-10-28 4:05 ` Stefan Monnier 2013-10-28 16:47 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-28 4:05 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > As I explained, this would be even more hair than what I proposed, > because you are talking about core Emacs data structures and variables > that are involved in every file-related op. > On top of that, using unibyte strings is inherently fragile in Emacs, > as the code is not written to support them too well, as you well > know. We always advise users to stay away of unibyte strings, and for > a good reason, so doing this ourselves sounds unwise. I know, but I'm not sure why it doesn't "just work". More specifically, for the bug to appear, you need ENCODE (DECODE (s)) to not be the identity function. Why is not so in the "early" Emacs? Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 4:05 ` Stefan Monnier @ 2013-10-28 16:47 ` Eli Zaretskii 2013-10-28 18:33 ` Eli Zaretskii 2013-10-31 21:45 ` Glenn Morris 0 siblings, 2 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-28 16:47 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: rgm@gnu.org, handa@gnu.org, 15260@debbugs.gnu.org > Date: Mon, 28 Oct 2013 00:05:32 -0400 > > More specifically, for the bug to appear, you need ENCODE (DECODE (s)) > to not be the identity function. Why is not so in the "early" Emacs? Because life's a mess that doesn't easily fit into simple and elegant schemes ;-) For starters, we don't really DECODE_FILE with these file- and directory-names. We just use build_string or make_string, as you can easily see in the init_* functions I mentioned. If you are lucky and your file names are UTF-8 encoded, this produces the same result as DECODE_FILE. If you are less lucky, and your file names are encoded in something else, like Latin-N, you get a unibyte string with the same bytes as in the original. Then we pass these strings to various functions, like file_accessible_directory_p, that _do_ ENCODE_FILE... (Luckily, during most of temacs's run, both file-name-coding-system and its default value are nil, so ENCODE_FILE is a no-op -- except when they aren't, see the next paragraph.) Next, it is quite possible that the file-name-coding-system changes between the time we process and store the file name and the time we encode and pass it to a low-level function. This is especially true during "loadup", when many packages are loaded and their top-level forms are executed. It turns out that 2 of them have side effects that do just that: mule-cmds.el calls reset-language-environment, and language/english.el calls set-language-info-alist; both have the effect of resetting default-file-name-coding-system to latin-1 (!? 
an interesting "default" for a Unicode-era Emacs, perhaps Handa-san could comment why we still do that). When this happens, your symmetry is broken, and ENCODE_FILE (DECODE_FILE (f)) is no longer the identity function. And then there are other players in this game. For example, default-directory, which is used every time we call expand-file-name, IOW "a lot". If you look in init_buffer, you will see that the default-directory of *scratch* is first set to a multibyte representation of the unibyte string we get from getcwd. In a "normal" Emacs session, we promptly fix that in startup.el, after the call to set-locale-environment initializes all the coding-systems. But "temacs -l loadup dump" doesn't run startup.el, so we are left with what init_buffer did, which is a string no file-name API will be able to grok. Another example is the use of 'equal' (and 'member', which calls 'equal') to compare file and directory names, and look them up in lists: as you know, 'equal' will not compare a unibyte and a multibyte string as equal. So having a mix of unibyte and multibyte strings in file names fails some of the code that relies on 'equal', tricking it into doing wrong things, like deciding that Emacs is _not_ run from the source tree. I'm sure there's more to this saga, I'm just half-way through it... ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 16:47 ` Eli Zaretskii @ 2013-10-28 18:33 ` Eli Zaretskii 2013-10-28 22:00 ` Glenn Morris 2013-10-29 1:35 ` Stefan Monnier 2013-10-31 21:45 ` Glenn Morris 1 sibling, 2 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-28 18:33 UTC (permalink / raw) To: monnier, Kenichi Handa; +Cc: 15260 > Date: Mon, 28 Oct 2013 18:47:32 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > I'm sure there's more to this saga, I'm just half-way through it... The next round is here: # The actual Emacs command run in the targets below. emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) ^^^^^^^^ Does anyone know or remember why we set LC_ALL=C while running commands in lisp/ (and the same in leim/)? The following log entry in lisp/ChangeLog.13 is the only clue: 2008-02-01 Kenichi Handa <handa@etl.go.jp> * Makefile.in: Be sure to run emacs with LC_ALL=C. But there's no explanation as to why this is needed. What this does is prevent bootstrap-emacs from finding Lisp files, because LC_ALL=C implies -- you guessed it -- file-name encoding by Latin-1, whereas the file names are really encoded in UTF-8 on this system: cd ../lisp; make -w compile-first EMACS="/home/e/eliz/bzr/emacs/xáéçö/src/bootstrap-emacs" make[2]: Entering directory `/srv/data/home/e/eliz/bzr/emacs/xáéçö/lisp' Compiling emacs-lisp/macroexp.el Warning: Could not find simple.el or simple.elc The EMACSLOADPATH environment variable is set, please check its value Lisp directory /home/e/eliz/bzr/emacs/x<E1><E9><E7><F6>/lisp not readable? If I remove the LC_ALL=C setting from lisp/Makefile.in and leim/Makefile.in, I get past this problem (to the next one ;-). So: any reasons not to remove this setting from lisp/Makefile.in and leim/Makefile.in? ^ permalink raw reply [flat|nested] 50+ messages in thread
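The `x<E1><E9><E7><F6>` in the log above is what one would expect if the decoded directory name were re-encoded as Latin-1 while the name on disk is UTF-8. A minimal C sketch of the two encodings, with hypothetical helper names `encode_latin1` and `encode_utf8` (they are not Emacs functions, and they cover only the small code-point range needed here):

```c
#include <assert.h>
#include <stddef.h>

/* Encode code points below U+0100 as Latin-1: one byte per character.
   Returns the number of bytes written to OUT.  */
size_t
encode_latin1 (const unsigned *cps, size_t n, unsigned char *out)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (cps[i] < 0x100);
      out[i] = (unsigned char) cps[i];
    }
  return n;
}

/* Encode code points below U+0800 as UTF-8: one or two bytes per
   character.  Returns the number of bytes written to OUT.  */
size_t
encode_utf8 (const unsigned *cps, size_t n, unsigned char *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (cps[i] < 0x800);
      if (cps[i] < 0x80)
        out[len++] = (unsigned char) cps[i];
      else
        {
          out[len++] = (unsigned char) (0xC0 | (cps[i] >> 6));
          out[len++] = (unsigned char) (0x80 | (cps[i] & 0x3F));
        }
    }
  return len;
}
```

For the five characters of "xáéçö" (U+0078, U+00E1, U+00E9, U+00E7, U+00F6), the Latin-1 form is the 5 bytes 78 E1 E9 E7 F6, whereas the UTF-8 form is the 9 bytes 78 C3 A1 C3 A9 C3 A7 C3 B6, so a file-system lookup with the Latin-1 bytes cannot find the UTF-8-named directory.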
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 18:33 ` Eli Zaretskii @ 2013-10-28 22:00 ` Glenn Morris 2013-10-29 3:42 ` Eli Zaretskii 2013-10-29 1:35 ` Stefan Monnier 1 sibling, 1 reply; 50+ messages in thread From: Glenn Morris @ 2013-10-28 22:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 Eli Zaretskii wrote: > emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) > ^^^^^^^^ > > Does anyone know or remember why we set LC_ALL=C while running > commands in lisp/ (and the same in leim/)? FWIW, if I change that to use LC_ALL=en_US.UTF8 and bootstrap (after also fixing cl--gensym-counter to a non-random default), all the resulting *.elc files are identical to the LC_ALL=C case. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 22:00 ` Glenn Morris @ 2013-10-29 3:42 ` Eli Zaretskii 0 siblings, 0 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-29 3:42 UTC (permalink / raw) To: Glenn Morris; +Cc: 15260 > From: Glenn Morris <rgm@gnu.org> > Cc: monnier@iro.umontreal.ca, Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org > Date: Mon, 28 Oct 2013 18:00:25 -0400 > > Eli Zaretskii wrote: > > > emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) > > ^^^^^^^^ > > > > Does anyone know or remember why we set LC_ALL=C while running > > commands in lisp/ (and the same in leim/)? > > FWIW, if I change that to use LC_ALL=en_US.UTF8 and bootstrap (after > also fixing cl--gensym-counter to a non-random default), all the > resulting *.elc files are identical to the LC_ALL=C case. Thanks. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 18:33 ` Eli Zaretskii 2013-10-28 22:00 ` Glenn Morris @ 2013-10-29 1:35 ` Stefan Monnier 2013-10-29 3:47 ` Eli Zaretskii 2013-11-01 13:58 ` Kenichi Handa 1 sibling, 2 replies; 50+ messages in thread From: Stefan Monnier @ 2013-10-29 1:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) > ^^^^^^^^ > Does anyone know or remember why we set LC_ALL=C while running > commands in lisp/ (and the same in leim/)? IIRC the issue was to avoid things like misdetecting coding-systems because of the user's locale setting, in the files we load/compile. IOW, it was to work around bugs (e.g. missing coding: cookie) and is likely unneeded nowadays. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-29 1:35 ` Stefan Monnier @ 2013-10-29 3:47 ` Eli Zaretskii 2013-10-29 13:56 ` Stefan Monnier 2013-11-01 13:58 ` Kenichi Handa 1 sibling, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-29 3:47 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org > Date: Mon, 28 Oct 2013 21:35:00 -0400 > > > emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) > > ^^^^^^^^ > > > Does anyone know or remember why we set LC_ALL=C while running > > commands in lisp/ (and the same in leim/)? > > IIRC the issue was to avoid things like misdetecting coding-systems > because of the user's locale setting, in the files we load/compile. Makes sense. > IOW, it was to work around bugs (e.g. missing coding: cookie) and is > likely unneeded nowadays. Right, so the way should be clear to remove these. Thanks. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-29 3:47 ` Eli Zaretskii @ 2013-10-29 13:56 ` Stefan Monnier 2013-10-30 18:19 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-29 13:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 >> IOW, it was to work around bugs (e.g. missing coding: cookie) and is >> likely unneeded nowadays. > Right, so the way should be clear to remove these. There may still be problems lying dormant, but we should be able to fix them if/when they show up, Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-29 13:56 ` Stefan Monnier @ 2013-10-30 18:19 ` Eli Zaretskii 2013-10-31 1:01 ` Stefan Monnier 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-30 18:19 UTC (permalink / raw) To: Kenichi Handa; +Cc: 15260 I bumped into this as part of digging into this bug report: there's a strange inconsistency between make_string and string_to_multibyte (or maybe it's just that the "multibyte" part of the name is overloaded). Specifically, if you pass a unibyte string through string_to_multibyte, it will produce a multibyte string, as expected. But if SDATA of the resulting multibyte string, or any of its derivatives, is passed to make_string, the latter will decide that it must make a unibyte string! This is because parse_str_as_multibyte, called internally by make_string, considers the multibyte representation of 8-bit bytes as a sign that the string produced from these bytes must be unibyte. Why do we have this confusing inconsistency? ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-30 18:19 ` Eli Zaretskii @ 2013-10-31 1:01 ` Stefan Monnier 2013-10-31 3:47 ` Eli Zaretskii 2013-10-31 17:16 ` Eli Zaretskii 0 siblings, 2 replies; 50+ messages in thread From: Stefan Monnier @ 2013-10-31 1:01 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > Why do we have this confusing inconsistency? make_string is a bug. There's no way to know/guess if the string should be unibyte or multibyte. So, it should be removed and replaced by calls to either make_unibyte_string or make_multibyte_string. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
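Why a byte-inspecting constructor cannot reliably choose between unibyte and multibyte can be seen in a short C sketch. The function `guess_multibyte` below is hypothetical and is not how make_string is implemented; it uses well-formed UTF-8 as a stand-in for "looks like multibyte text", which is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guesser: return true if BYTES parses as UTF-8 and
   contains at least one two-byte sequence.  Only one- and two-byte
   sequences are recognized, to keep the sketch short; any other
   non-ASCII byte pattern makes the guess "unibyte" (false).  */
bool
guess_multibyte (const unsigned char *bytes, size_t len)
{
  assert (bytes != NULL || len == 0);
  bool saw_multibyte = false;
  size_t i = 0;

  while (i < len)
    {
      if (bytes[i] < 0x80)
        i++;                    /* ASCII byte */
      else if (bytes[i] >= 0xC2 && bytes[i] <= 0xDF
               && i + 1 < len && (bytes[i + 1] & 0xC0) == 0x80)
        {
          saw_multibyte = true; /* lead byte plus continuation byte */
          i += 2;
        }
      else
        return false;           /* not parseable as UTF-8 here */
    }
  return saw_multibyte;
}
```

The bytes C3 A1 are guessed as multibyte (one character, "á"), yet a caller holding raw Latin-1 bytes could have meant the same two bytes as two separate characters. The guess depends on the data rather than on what the caller knows, which is the argument for calling make_unibyte_string or make_multibyte_string explicitly.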
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 1:01 ` Stefan Monnier @ 2013-10-31 3:47 ` Eli Zaretskii 2013-10-31 13:40 ` Stefan Monnier 2013-10-31 17:59 ` Eli Zaretskii 2013-10-31 17:16 ` Eli Zaretskii 1 sibling, 2 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 3:47 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org > Date: Wed, 30 Oct 2013 21:01:21 -0400 > > > Why do we have this confusing inconsistency? > > make_string is a bug. There's no way to know/guess if the string should > be unibyte or multibyte. Well, there is a way, but it's tricky ;-) Yes, this inconsistency caused me a lot of grief while working on this bug. > So, it should be removed and replaced by calls to either > make_unibyte_string or make_multibyte_string. I presume you think the same about build_string, then. By sheer luck (or maybe something else), this is exactly what I've been doing in every case where it mattered for the non-ASCII build. (The job is almost done, btw, I'm in final testing.) ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 3:47 ` Eli Zaretskii @ 2013-10-31 13:40 ` Stefan Monnier 2013-10-31 16:25 ` Eli Zaretskii 2013-10-31 17:59 ` Eli Zaretskii 1 sibling, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-31 13:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 >> So, it should be removed and replaced by calls to either >> make_unibyte_string or make_multibyte_string. > I presume you think the same about build_string, then. Pretty much, except that I tend to think of build_string as only meant for use on string constants, which are all ASCII, normally, so it doesn't matter nearly as much. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 13:40 ` Stefan Monnier @ 2013-10-31 16:25 ` Eli Zaretskii 2013-10-31 18:04 ` Stefan Monnier 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 16:25 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Cc: handa@gnu.org, 15260@debbugs.gnu.org > Date: Thu, 31 Oct 2013 09:40:07 -0400 > > >> So, it should be removed and replaced by calls to either > >> make_unibyte_string or make_multibyte_string. > > I presume you think the same about build_string, then. > > Pretty much, except that I tend to think of build_string as only meant > for use on string constants, which are all ASCII, normally, so it > doesn't matter nearly as much. About 20% of uses of build_string are not guaranteed to act on pure ASCII strings. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 16:25 ` Eli Zaretskii @ 2013-10-31 18:04 ` Stefan Monnier 0 siblings, 0 replies; 50+ messages in thread From: Stefan Monnier @ 2013-10-31 18:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 >> >> So, it should be removed and replaced by calls to either >> >> make_unibyte_string or make_multibyte_string. >> > I presume you think the same about build_string, then. >> Pretty much, except that I tend to think of build_string as only meant >> for use on string constants, which are all ASCII, normally, so it >> doesn't matter nearly as much. > About 20% of uses of build_string are not guaranteed to act on pure > ASCII strings. These should presumably use something like make_unibyte_string or make_multibyte_string, then. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 3:47 ` Eli Zaretskii 2013-10-31 13:40 ` Stefan Monnier @ 2013-10-31 17:59 ` Eli Zaretskii 2013-10-31 19:24 ` Stefan Monnier 2013-11-04 17:35 ` Eli Zaretskii 1 sibling, 2 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 17:59 UTC (permalink / raw) To: Stefan Monnier, Glenn Morris; +Cc: 15260 > Date: Thu, 31 Oct 2013 05:47:48 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > (The job is almost done, btw, I'm in final testing.) Below is what I came up with. This survived several bootstraps, both on GNU/Linux (in- and out-of-source-tree builds) and on MS-Windows, including "make install" into a non-ASCII directory and invocation from there. To summarize the changes: . ENCODE_FILE now explicitly leaves alone unibyte strings. (It could be that it did this before as well, but I couldn't find the code which had that effect, and doing that early on is TRT anyway.) . All the *-directory and *-path variables we create at startup are forced to be unibyte strings. (Previously, they were sometimes unibyte and sometimes multibyte.) In temacs that dumps or bootstraps, they stay unibyte all the way till program exit. In a session that runs startup.el, they are decoded early in normal-top-level, after setting up the locale environment. The code which decodes these was moved much closer to the beginning of normal-top-level, as its previous place was too late, after some damage was already done. . Several left-overs from working around problems that are long gone are removed. Notable examples: (a) "set LC_ALL=C" when running 'emacs' (NOT 'temacs') in lisp/ and leim/; (b) storing (in init_buffer) the default-directory of *scratch* in the multibyte representation of the original unibyte bytes. . A few related bugs which got in the way were fixed. E.g., in make_temp_name. Please test this, as I could only do that in 2 different locales. 
Please pay specific attention to strings in load-path, default-directory, data-directory, etc., after Emacs comes up in an interactive session: they should all be multibyte strings (you can use multibyte-string-p to test that). If no problems pop up, I will commit this in a few days. Thanks. === modified file 'configure.ac' --- configure.ac 2013-10-27 18:57:20 +0000 +++ configure.ac 2013-10-31 16:57:18 +0000 @@ -73,30 +73,6 @@ dnl Support for --program-prefix, --prog dnl --program-transform-name options AC_ARG_PROGRAM -dnl http://debbugs.gnu.org/15260 -dnl I think we have to check, eg, both exec_prefix and bindir, -dnl because the latter by default is not yet expanded, but the user -dnl may have specified a value for it via --bindir. -dnl At first glance, _installing_ in non-ASCII seems ok, but in fact -dnl it is not; see http://debbugs.gnu.org/15260#61 -dnl Note that abs_srcdir and abs_builddir are not yet defined. :( - -dnl "`cd \"$srcdir\"`" is not portable. -dnl See autoconf manual "Shell Substitutions": -dnl "There is just no portable way to use double-quoted strings inside -dnl double-quoted back-quoted expressions (pfew!)." -temp_srcdir=`cd "$srcdir"; pwd` - -for var in "`pwd`" "$temp_srcdir" "$prefix" "$exec_prefix" \ - "$datarootdir" "$bindir" "$datadir" "$sharedstatedir" "$libexecdir"; do - - dnl configure sets LC_ALL=C early on, so this range should work. - case "$var" in - *[[^\ -~]]*) AC_MSG_ERROR([Emacs cannot be built or installed in a directory whose name contains non-ASCII characters: $var]) ;; - esac - -done - dnl It is important that variables on the RHS not be expanded here, dnl hence the single quotes. This is per the GNU coding standards, see dnl (autoconf) Installation Directory Variables === modified file 'leim/Makefile.in' --- leim/Makefile.in 2013-10-24 02:29:29 +0000 +++ leim/Makefile.in 2013-10-31 16:57:18 +0000 @@ -34,7 +34,7 @@ EMACS = ../src/emacs buildlisppath=${abs_srcdir}/../lisp # How to run Emacs. 
-RUN_EMACS = EMACSLOADPATH="$(buildlisppath)" LC_ALL=C \ +RUN_EMACS = EMACSLOADPATH="$(buildlisppath)" \ "${EMACS}" -batch --no-site-file --no-site-lisp MKDIR_P = @MKDIR_P@ === modified file 'lisp/Makefile.in' --- lisp/Makefile.in 2013-10-31 07:27:35 +0000 +++ lisp/Makefile.in 2013-10-31 16:57:18 +0000 @@ -115,7 +115,7 @@ COMPILE_FIRST = \ # The actual Emacs command run in the targets below. -emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) +emacs = EMACSLOADPATH="$(abs_lisp)" "$(EMACS)" $(EMACSOPT) # Common command to find subdirectories setwins=subdirs=`find . -type d -print`; \ === modified file 'lisp/loadup.el' --- lisp/loadup.el 2013-10-08 15:11:29 +0000 +++ lisp/loadup.el 2013-10-31 16:57:18 +0000 @@ -286,6 +286,20 @@ ;For other systems, you must edit ../src/Makefile.in. (load "site-load" t) +;; Make sure default-directory is unibyte when dumping. This is +;; because we cannot decode and encode it correctly (since the locale +;; environment is not, and should not be, set up). default-directory +;; is used every time we call expand-file-name, which we do in every +;; file primitive. So the only workable solution to support building +;; in non-ASCII directories is to manipulate unibyte strings in the +;; current locale's encoding. +(if (and (or (equal (nth 3 command-line-args) "dump") + (equal (nth 4 command-line-args) "dump") + (equal (nth 3 command-line-args) "bootstrap") + (equal (nth 4 command-line-args) "bootstrap")) + (multibyte-string-p default-directory)) + (setq default-directory (string-to-unibyte default-directory))) + ;; Determine which last version number to use ;; based on the executables that now exist. 
(if (and (or (equal (nth 3 command-line-args) "dump") === modified file 'lisp/startup.el' --- lisp/startup.el 2013-10-30 02:45:53 +0000 +++ lisp/startup.el 2013-10-31 17:04:11 +0000 @@ -489,6 +489,63 @@ It is the default value of the variable (if command-line-processed (message "Back to top level.") (setq command-line-processed t) + + ;; Set the default strings to display in mode line for end-of-line + ;; formats that aren't native to this platform. This should be + ;; done before calling set-locale-environment, as the latter might + ;; use these mnemonics. + (cond + ((memq system-type '(ms-dos windows-nt)) + (setq eol-mnemonic-unix "(Unix)" + eol-mnemonic-mac "(Mac)")) + (t ; this is for Unix/GNU/Linux systems + (setq eol-mnemonic-dos "(DOS)" + eol-mnemonic-mac "(Mac)"))) + + (set-locale-environment nil) + ;; Decode all default-directory's (probably, only *scratch* exists + ;; at this point). default-directory of *scratch* is the basis + ;; for many other file-name variables and directory lists, so it + ;; is important to decode it ASAP. + (when locale-coding-system + (save-excursion + (dolist (elt (buffer-list)) + (set-buffer elt) + (if default-directory + (setq default-directory + (decode-coding-string default-directory + locale-coding-system t))))) + + ;; Decode all the important variables and directory lists, now + ;; that we know the locale's encoding. This is because the + ;; values of these variables are until here unibyte undecoded + ;; strings created by build_unibyte_string. data-directory in + ;; particular is used to construct many other standard directory + ;; names, so it must be decoded ASAP. + ;; Note that charset-map-path cannot be decoded here, since we + ;; could then be trapped in infinite recursion below, when we + ;; load subdirs.el, because encoding a directory name might need + ;; to load a charset map, which will want to encode + ;; charset-map-path, which will want to load the same charset + ;; map... 
So decoding of charset-map-path is delayed until + ;; further down below. + (dolist (pathsym '(load-path exec-path)) + (let ((path (symbol-value pathsym))) + (if (listp path) + (set pathsym (mapcar (lambda (dir) + (decode-coding-string + dir + locale-coding-system t)) + path))))) + (dolist (filesym '(data-directory doc-directory exec-directory + installation-directory + invocation-directory invocation-name + source-directory + shared-game-score-directory)) + (let ((file (symbol-value filesym))) + (if (stringp file) + (set filesym (decode-coding-string file locale-coding-system t)))))) + (let ((dir default-directory)) (with-current-buffer "*Messages*" (messages-buffer-mode) @@ -536,6 +593,16 @@ It is the default value of the variable (setq process-environment (delete (concat "PWD=" pwd) process-environment))))) + ;; Now, that other directories were searched, and any charsets we + ;; need for encoding them are already loaded, we are ready to + ;; decode charset-map-path. + (if (listp charset-map-path) + (setq charset-map-path + (mapcar (lambda (dir) + (decode-coding-string + dir + locale-coding-system t)) + charset-map-path))) (setq default-directory (abbreviate-file-name default-directory)) (let ((old-face-font-rescale-alist face-font-rescale-alist)) (unwind-protect @@ -756,18 +823,6 @@ Amongst another things, it parses the co ;;! ;; Choose a good default value for split-window-keep-point. ;;! (setq split-window-keep-point (> baud-rate 2400)) - ;; Set the default strings to display in mode line for - ;; end-of-line formats that aren't native to this platform. - (cond - ((memq system-type '(ms-dos windows-nt)) - (setq eol-mnemonic-unix "(Unix)" - eol-mnemonic-mac "(Mac)")) - (t ; this is for Unix/GNU/Linux systems - (setq eol-mnemonic-dos "(DOS)" - eol-mnemonic-mac "(Mac)"))) - - (set-locale-environment nil) - ;; Convert preloaded file names in load-history to absolute. 
(let ((simple-file-name ;; Look for simple.el or simple.elc and use their directory @@ -801,7 +856,7 @@ please check its value") load-history)))) ;; Convert the arguments to Emacs internal representation. - (let ((args (cdr command-line-args))) + (let ((args command-line-args)) (while args (setcar args (decode-coding-string (car args) locale-coding-system t)) @@ -1211,19 +1266,6 @@ the `--debug-init' option to view a comp (setq after-init-time (current-time)) (run-hooks 'after-init-hook) - ;; Decode all default-directory. - (if (and (default-value 'enable-multibyte-characters) locale-coding-system) - (save-excursion - (dolist (elt (buffer-list)) - (set-buffer elt) - (if default-directory - (setq default-directory - (decode-coding-string default-directory - locale-coding-system t)))) - (setq command-line-default-directory - (decode-coding-string command-line-default-directory - locale-coding-system t)))) - ;; If *scratch* exists and init file didn't change its mode, initialize it. (if (get-buffer "*scratch*") (with-current-buffer "*scratch*" === modified file 'src/buffer.c' --- src/buffer.c 2013-10-29 14:46:23 +0000 +++ src/buffer.c 2013-10-31 16:57:18 +0000 @@ -5349,13 +5349,10 @@ init_buffer (void) len++; } + /* At this moment, we still don't know how to decode the directory + name. So, we keep the bytes in unibyte form so that file I/O + routines correctly get the original bytes. */ bset_directory (current_buffer, make_unibyte_string (pwd, len)); - if (! NILP (BVAR (&buffer_defaults, enable_multibyte_characters))) - /* At this moment, we still don't know how to decode the - directory name. So, we keep the bytes in multibyte form so - that ENCODE_FILE correctly gets the original bytes. */ - bset_directory - (current_buffer, string_to_multibyte (BVAR (current_buffer, directory))); /* Add /: to the front of the name if it would otherwise be treated as magic. 
*/ === modified file 'src/callproc.c' --- src/callproc.c 2013-08-23 17:57:07 +0000 +++ src/callproc.c 2013-10-31 16:57:18 +0000 @@ -1612,14 +1612,14 @@ init_callproc (void) Lisp_Object tem, tem1, srcdir; srcdir = Fexpand_file_name (build_string ("../src/"), - build_string (PATH_DUMPLOADSEARCH)); + build_unibyte_string (PATH_DUMPLOADSEARCH)); tem = Fexpand_file_name (build_string ("GNU"), Vdata_directory); tem1 = Ffile_exists_p (tem); if (!NILP (Fequal (srcdir, Vinvocation_directory)) || NILP (tem1)) { Lisp_Object newdir; newdir = Fexpand_file_name (build_string ("../etc/"), - build_string (PATH_DUMPLOADSEARCH)); + build_unibyte_string (PATH_DUMPLOADSEARCH)); tem = Fexpand_file_name (build_string ("GNU"), newdir); tem1 = Ffile_exists_p (tem); if (!NILP (tem1)) @@ -1646,7 +1646,7 @@ init_callproc (void) #ifdef DOS_NT Vshared_game_score_directory = Qnil; #else - Vshared_game_score_directory = build_string (PATH_GAME); + Vshared_game_score_directory = build_unibyte_string (PATH_GAME); if (NILP (Ffile_accessible_directory_p (Vshared_game_score_directory))) Vshared_game_score_directory = Qnil; #endif === modified file 'src/coding.h' --- src/coding.h 2013-10-08 06:40:09 +0000 +++ src/coding.h 2013-10-31 16:57:18 +0000 @@ -670,14 +670,16 @@ struct coding_system (code) = (s1 << 8) | s2; \ } while (0) -/* Encode the file name NAME using the specified coding system - for file names, if any. */ -#define ENCODE_FILE(name) \ - (! NILP (Vfile_name_coding_system) \ - ? code_convert_string_norecord (name, Vfile_name_coding_system, 1) \ - : (! NILP (Vdefault_file_name_coding_system) \ - ? code_convert_string_norecord (name, Vdefault_file_name_coding_system, 1) \ - : name)) +/* Encode the file name NAME using the specified coding system for + file names, if any. If NAME is a unibyte string, return NAME. */ +#define ENCODE_FILE(name) \ + (! STRING_MULTIBYTE (name) \ + ? name \ + : (! NILP (Vfile_name_coding_system) \ + ? 
code_convert_string_norecord (name, Vfile_name_coding_system, 1) \ + : (! NILP (Vdefault_file_name_coding_system) \ + ? code_convert_string_norecord (name, Vdefault_file_name_coding_system, 1) \ + : name))) /* Decode the file name NAME using the specified coding system === modified file 'src/emacs.c' --- src/emacs.c 2013-10-31 08:32:42 +0000 +++ src/emacs.c 2013-10-31 17:03:39 +0000 @@ -396,7 +396,7 @@ init_cmdargs (int argc, char **argv, int initial_argv = argv; initial_argc = argc; - raw_name = build_string (argv[0]); + raw_name = build_unibyte_string (argv[0]); /* Add /: to the front of the name if it would otherwise be treated as magic. */ @@ -430,7 +430,9 @@ init_cmdargs (int argc, char **argv, int /* Emacs was started with relative path, like ./emacs. Make it absolute. */ { - Lisp_Object odir = original_pwd ? build_string (original_pwd) : Qnil; + Lisp_Object odir = + original_pwd ? build_unibyte_string (original_pwd) : Qnil; + Vinvocation_directory = Fexpand_file_name (Vinvocation_directory, odir); } @@ -2204,7 +2206,7 @@ decode_env_path (const char *evarname, c p = strchr (path, SEPCHAR); if (!p) p = path + strlen (path); - element = (p - path ? make_string (path, p - path) + element = (p - path ? 
make_unibyte_string (path, p - path) : build_string (".")); #ifdef WINDOWSNT /* Relative file names in the default path are interpreted as @@ -2214,7 +2216,7 @@ decode_env_path (const char *evarname, c element = Fexpand_file_name (Fsubstring (element, make_number (emacs_dir_len), Qnil), - build_string (emacs_dir)); + build_unibyte_string (emacs_dir)); #endif /* Add /: to the front of the name === modified file 'src/fileio.c' --- src/fileio.c 2013-10-17 06:42:21 +0000 +++ src/fileio.c 2013-10-31 16:57:18 +0000 @@ -732,8 +732,8 @@ static unsigned make_temp_name_count, ma Lisp_Object make_temp_name (Lisp_Object prefix, bool base64_p) { - Lisp_Object val; - int len, clen; + Lisp_Object val, encoded_prefix; + int len; printmax_t pid; char *p, *data; char pidbuf[INT_BUFSIZE_BOUND (printmax_t)]; @@ -767,12 +767,11 @@ make_temp_name (Lisp_Object prefix, bool #endif } - len = SBYTES (prefix); clen = SCHARS (prefix); - val = make_uninit_multibyte_string (clen + 3 + pidlen, len + 3 + pidlen); - if (!STRING_MULTIBYTE (prefix)) - STRING_SET_UNIBYTE (val); + encoded_prefix = ENCODE_FILE (prefix); + len = SBYTES (encoded_prefix); + val = make_uninit_string (len + 3 + pidlen); data = SSDATA (val); - memcpy (data, SSDATA (prefix), len); + memcpy (data, SSDATA (encoded_prefix), len); p = data + len; memcpy (p, pidbuf, pidlen); @@ -810,7 +809,7 @@ make_temp_name (Lisp_Object prefix, bool { /* We want to return only if errno is ENOENT. */ if (errno == ENOENT) - return val; + return DECODE_FILE (val); else /* The error here is dubious, but there is little else we can do. The alternatives are to return nil, which is @@ -987,7 +986,26 @@ filesystem tree, not (expand-file-name " if (multibyte != STRING_MULTIBYTE (default_directory)) { if (multibyte) - default_directory = string_to_multibyte (default_directory); + { + unsigned char *p = SDATA (name); + + while (*p && ASCII_BYTE_P (*p)) + p++; + if (*p == '\0') + { + /* NAME is a pure ASCII string, and DEFAULT_DIRECTORY is + unibyte. 
Do not convert DEFAULT_DIRECTORY to + multibyte; instead, convert NAME to a unibyte string, + so that the result of this function is also a unibyte + string. This is needed during bootstraping and + dumping, when Emacs cannot decode file names, because + the locale environment is not set up. */ + name = make_unibyte_string (SSDATA (name), SBYTES (name)); + multibyte = 0; + } + else + default_directory = string_to_multibyte (default_directory); + } else { name = string_to_multibyte (name); === modified file 'src/lread.c' --- src/lread.c 2013-09-26 03:46:47 +0000 +++ src/lread.c 2013-10-31 16:57:18 +0000 @@ -1500,7 +1500,8 @@ openp (Lisp_Object path, Lisp_Object str for (tail = NILP (suffixes) ? list1 (empty_unibyte_string) : suffixes; CONSP (tail); tail = XCDR (tail)) { - ptrdiff_t fnlen, lsuffix = SBYTES (XCAR (tail)); + Lisp_Object suffix = XCAR (tail); + ptrdiff_t fnlen, lsuffix = SBYTES (suffix); Lisp_Object handler; /* Concatenate path element/specified name with the suffix. @@ -1511,7 +1512,7 @@ openp (Lisp_Object path, Lisp_Object str ? 2 : 0); fnlen = SBYTES (filename) - prefixlen; memcpy (fn, SDATA (filename) + prefixlen, fnlen); - memcpy (fn + fnlen, SDATA (XCAR (tail)), lsuffix + 1); + memcpy (fn + fnlen, SDATA (suffix), lsuffix + 1); fnlen += lsuffix; /* Check that the file exists and is not a directory. */ /* We used to only check for handlers on non-absolute file names: @@ -1521,7 +1522,18 @@ openp (Lisp_Object path, Lisp_Object str handler = Ffind_file_name_handler (filename, Qfile_exists_p); It's not clear why that was the case and it breaks things like (load "/bar.el") where the file is actually "/bar.el.gz". */ - string = make_string (fn, fnlen); + /* make_string has its own ideas on when to return a unibyte + string and when a multibyte string, but we know better. 
+ We must have a unibyte string when dumping, since + file-name encoding is shaky at best at that time, and in + particular default-file-name-coding-system is reset + several times during loadup. We therefore don't want to + encode the file before passing it to file I/O library + functions. */ + if (!STRING_MULTIBYTE (filename) && !STRING_MULTIBYTE (suffix)) + string = make_unibyte_string (fn, fnlen); + else + string = make_string (fn, fnlen); handler = Ffind_file_name_handler (string, Qfile_exists_p); if ((!NILP (handler) || !NILP (predicate)) && !NATNUMP (predicate)) { === modified file 'src/xdisp.c' --- src/xdisp.c 2013-10-29 16:11:50 +0000 +++ src/xdisp.c 2013-10-31 16:57:18 +0000 @@ -9728,7 +9728,11 @@ message3_nolog (Lisp_Object m) putc ('\n', stderr); noninteractive_need_newline = 0; if (STRINGP (m)) - fwrite (SDATA (m), SBYTES (m), 1, stderr); + { + Lisp_Object s = ENCODE_SYSTEM (m); + + fwrite (SDATA (s), SBYTES (s), 1, stderr); + } if (cursor_in_echo_area == 0) fprintf (stderr, "\n"); fflush (stderr); @@ -9803,13 +9807,19 @@ message_with_string (const char *m, Lisp { if (m) { + /* ENCODE_SYSTEM below can GC and/or relocate the Lisp + String whose data pointer might be passed to us in M. So + we use a local copy. */ + char *fmt = xstrdup (m); + if (noninteractive_need_newline) putc ('\n', stderr); noninteractive_need_newline = 0; - fprintf (stderr, m, SDATA (string)); + fprintf (stderr, fmt, SDATA (ENCODE_SYSTEM (string))); if (!cursor_in_echo_area) fprintf (stderr, "\n"); fflush (stderr); + xfree (fmt); } } else if (INTERACTIVE) ^ permalink raw reply [flat|nested] 50+ messages in thread
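The decision order of the patched ENCODE_FILE macro in the src/coding.h hunk above can be restated as a small C sketch. This is a toy model only: coding systems are represented as plain strings with NULL standing for nil, and `choose_encode_action`, `check_encode_action` and the enum are hypothetical names that do not exist in Emacs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum encode_action
{
  PASS_THROUGH,          /* return NAME unchanged */
  USE_FILE_NAME_CODING,  /* encode with file-name-coding-system */
  USE_DEFAULT_CODING     /* encode with default-file-name-coding-system */
};

/* Toy model of the checks made by the patched ENCODE_FILE, in the
   same order as in the macro.  NULL models a nil coding system.  */
enum encode_action
choose_encode_action (bool name_is_multibyte,
                      const char *file_name_coding_system,
                      const char *default_file_name_coding_system)
{
  if (!name_is_multibyte)
    return PASS_THROUGH;        /* new in the patch: unibyte left alone */
  if (file_name_coding_system != NULL)
    return USE_FILE_NAME_CODING;
  if (default_file_name_coding_system != NULL)
    return USE_DEFAULT_CODING;
  return PASS_THROUGH;
}

/* Self-check covering the four cases of the model.  */
void
check_encode_action (void)
{
  assert (choose_encode_action (false, NULL, "latin-1") == PASS_THROUGH);
  assert (choose_encode_action (true, NULL, "latin-1") == USE_DEFAULT_CODING);
  assert (choose_encode_action (true, "utf-8", "latin-1")
          == USE_FILE_NAME_CODING);
  assert (choose_encode_action (true, NULL, NULL) == PASS_THROUGH);
}
```

The first case is the one that matters during dumping: even after loadup has reset default-file-name-coding-system to latin-1, a unibyte directory name reaches the file I/O routines with its original bytes.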
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 17:59 ` Eli Zaretskii @ 2013-10-31 19:24 ` Stefan Monnier 2013-10-31 19:33 ` Eli Zaretskii 2013-11-04 17:35 ` Eli Zaretskii 1 sibling, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-31 19:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > Below is what I came up with. This survived several bootstraps, both Thanks, Eli. > +;; Make sure default-directory is unibyte when dumping. This is > +;; because we cannot decode and encode it correctly (since the locale > +;; environment is not, and should not be, set up). default-directory > +;; is used every time we call expand-file-name, which we do in every > +;; file primitive. So the only workable solution to support building > +;; in non-ASCII directories is to manipulate unibyte strings in the > +;; current locale's encoding. > +(if (and (or (equal (nth 3 command-line-args) "dump") > + (equal (nth 4 command-line-args) "dump") > + (equal (nth 3 command-line-args) "bootstrap") > + (equal (nth 4 command-line-args) "bootstrap")) > + (multibyte-string-p default-directory)) > + (setq default-directory (string-to-unibyte default-directory))) I'm not sure I understand this string-to-unibyte. This call seems to only be correct if default-directory holds the undecoded but multibyte name. Why would we have an undecoded yet multibyte name? IOW, I'd expect here to either have default-directory be unibyte already, or be multibyte but encoded in some (arbitrary) encoding (in which case we can't really know how to re-encode it). Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 19:24 ` Stefan Monnier @ 2013-10-31 19:33 ` Eli Zaretskii 2013-11-01 9:27 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 19:33 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Glenn Morris <rgm@gnu.org>, 15260@debbugs.gnu.org, Kenichi Handa <handa@gnu.org> > Date: Thu, 31 Oct 2013 15:24:57 -0400 > > > Below is what I came up with. This survived several bootstraps, both > > Thanks, Eli. > > > +;; Make sure default-directory is unibyte when dumping. This is > > +;; because we cannot decode and encode it correctly (since the locale > > +;; environment is not, and should not be, set up). default-directory > > +;; is used every time we call expand-file-name, which we do in every > > +;; file primitive. So the only workable solution to support building > > +;; in non-ASCII directories is to manipulate unibyte strings in the > > +;; current locale's encoding. > > +(if (and (or (equal (nth 3 command-line-args) "dump") > > + (equal (nth 4 command-line-args) "dump") > > + (equal (nth 3 command-line-args) "bootstrap") > > + (equal (nth 4 command-line-args) "bootstrap")) > > + (multibyte-string-p default-directory)) > > + (setq default-directory (string-to-unibyte default-directory))) > > I'm not sure I understand this string-to-unibyte. > This call seems to only be correct if default-directory holds the > undecoded but multibyte name. > Why would we have an undecoded yet multibyte name? This was a necessity before I removed this quirk from init_buffer: --- src/buffer.c 2013-10-29 14:46:23 +0000 +++ src/buffer.c 2013-10-31 16:57:18 +0000 @@ -5349,13 +5349,10 @@ init_buffer (void) len++; } + /* At this moment, we still don't know how to decode the directory + name. So, we keep the bytes in unibyte form so that file I/O + routines correctly get the original bytes. 
*/ bset_directory (current_buffer, make_unibyte_string (pwd, len)); - if (! NILP (BVAR (&buffer_defaults, enable_multibyte_characters))) - /* At this moment, we still don't know how to decode the - directory name. So, we keep the bytes in multibyte form so - that ENCODE_FILE correctly gets the original bytes. */ - bset_directory - (current_buffer, string_to_multibyte (BVAR (current_buffer, directory))); /* Add /: to the front of the name if it would otherwise be treated as magic. */ After removing that, it's probably not needed anymore, since now default-directory should be a unibyte string from the very beginning. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 19:33 ` Eli Zaretskii @ 2013-11-01 9:27 ` Eli Zaretskii 2013-11-01 12:33 ` Stefan Monnier 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-11-01 9:27 UTC (permalink / raw) To: monnier; +Cc: 15260 > Date: Thu, 31 Oct 2013 21:33:22 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > > > +;; Make sure default-directory is unibyte when dumping. This is > > > +;; because we cannot decode and encode it correctly (since the locale > > > +;; environment is not, and should not be, set up). default-directory > > > +;; is used every time we call expand-file-name, which we do in every > > > +;; file primitive. So the only workable solution to support building > > > +;; in non-ASCII directories is to manipulate unibyte strings in the > > > +;; current locale's encoding. > > > +(if (and (or (equal (nth 3 command-line-args) "dump") > > > + (equal (nth 4 command-line-args) "dump") > > > + (equal (nth 3 command-line-args) "bootstrap") > > > + (equal (nth 4 command-line-args) "bootstrap")) > > > + (multibyte-string-p default-directory)) > > > + (setq default-directory (string-to-unibyte default-directory))) > > > > I'm not sure I understand this string-to-unibyte. > > This call seems to only be correct if default-directory holds the > > undecoded but multibyte name. > > Why would we have an undecoded yet multibyte name? > > This was a necessity before I removed this quirk from init_buffer: > > --- src/buffer.c 2013-10-29 14:46:23 +0000 > +++ src/buffer.c 2013-10-31 16:57:18 +0000 > @@ -5349,13 +5349,10 @@ init_buffer (void) > len++; > } > > + /* At this moment, we still don't know how to decode the directory > + name. So, we keep the bytes in unibyte form so that file I/O > + routines correctly get the original bytes. */ > bset_directory (current_buffer, make_unibyte_string (pwd, len)); > - if (! 
NILP (BVAR (&buffer_defaults, enable_multibyte_characters))) > - /* At this moment, we still don't know how to decode the > - directory name. So, we keep the bytes in multibyte form so > - that ENCODE_FILE correctly gets the original bytes. */ > - bset_directory > - (current_buffer, string_to_multibyte (BVAR (current_buffer, directory))); > > /* Add /: to the front of the name > if it would otherwise be treated as magic. */ > > After removing that, it's probably not needed anymore, since now > default-directory should be a unibyte string from the very beginning. Would you prefer that we error out if default-directory is not a unibyte string at that point in loadup.el? ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-11-01 9:27 ` Eli Zaretskii @ 2013-11-01 12:33 ` Stefan Monnier 2013-11-04 17:37 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-11-01 12:33 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 > Would you prefer that we error out if default-directory is not a > unibyte string at that point in loadup.el? I'd prefer to either not do anything, or issue a warning, or error out, yes. Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-11-01 12:33 ` Stefan Monnier @ 2013-11-04 17:37 ` Eli Zaretskii 0 siblings, 0 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-11-04 17:37 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: 15260@debbugs.gnu.org > Date: Fri, 01 Nov 2013 08:33:04 -0400 > > > Would you prefer that we error out if default-directory is not a > > unibyte string at that point in loadup.el? > > I'd prefer to either not do anything, or issue a warning, or error > out, yes. I eventually opted for erroring out, mostly to be able to catch any unforeseen problems and use cases I missed. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 17:59 ` Eli Zaretskii 2013-10-31 19:24 ` Stefan Monnier @ 2013-11-04 17:35 ` Eli Zaretskii 2013-11-04 18:38 ` Stefan Monnier 1 sibling, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-11-04 17:35 UTC (permalink / raw) To: monnier, rgm; +Cc: 15260-done > Date: Thu, 31 Oct 2013 19:59:52 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > If no problems pop up, I will commit this in a few days. No further comments, and I got fed up with resolving merge conflicts every day, so I committed the changes, and I'm marking this bug done. Thanks to everybody for their feedback and support. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-11-04 17:35 ` Eli Zaretskii @ 2013-11-04 18:38 ` Stefan Monnier 0 siblings, 0 replies; 50+ messages in thread From: Stefan Monnier @ 2013-11-04 18:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260-done > Thanks to everybody for their feedback and support. Thank you, Eli, Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 1:01 ` Stefan Monnier 2013-10-31 3:47 ` Eli Zaretskii @ 2013-10-31 17:16 ` Eli Zaretskii 2013-10-31 18:09 ` Stefan Monnier 1 sibling, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 17:16 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org > Date: Wed, 30 Oct 2013 21:01:21 -0400 > > > Why do we have this confusing inconsistency? > > make_string is a bug. There's no way to know/guess if the string should > be unibyte or multibyte. So, it should be removed and replaced by calls > to either make_unibyte_string or make_multibyte_string. Here's one more gotcha I bumped into while working on this bug. Suppose the filesystem where you build Emacs uses a file-name encoding whose coding-system-category is 'charset'. Example: cpNNNN. Then, when Emacs comes up after dumping, it loads subdirs.el in each directory on load-path. To do this, it calls 'openp' to look for DIR/subdirs.el, which involves calling ENCODE_FILE on "DIR/subdirs.el", in order to pass that to 'faccessat' or 'open'. Now, if the charset that is needed to encode this file name is not yet loaded into Emacs, Emacs will try to load it. To this end, it will look along charset-map-path for the corresponding map file, and for that it will again call 'openp', recursively. That 'openp' call will again want to ENCODE_FILE with the same encoding, which will again cause Emacs to try to load the corresponding map file, etc. etc., until we exhaust the specpdl stack. I worked around this by keeping charset-map-path in unibyte form until later into the startup procedure. Is there a more elegant and less kludgey way? ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 17:16 ` Eli Zaretskii @ 2013-10-31 18:09 ` Stefan Monnier 2013-10-31 18:37 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Stefan Monnier @ 2013-10-31 18:09 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 >>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes: >> From: Stefan Monnier <monnier@iro.umontreal.ca> >> Cc: Kenichi Handa <handa@gnu.org>, 15260@debbugs.gnu.org >> Date: Wed, 30 Oct 2013 21:01:21 -0400 >> >> > Why do we have this confusing inconsistency? >> >> make_string is a bug. There's no way to know/guess if the string should >> be unibyte or multibyte. So, it should be removed and replaced by calls >> to either make_unibyte_string or make_multibyte_string. > Here's one more gotcha I bumped into while working on this bug. > Suppose the filesystem where you build Emacs uses a file-name encoding > whose coding-system-category is 'charset'. Example: cpNNNN. Then, > when Emacs comes up after dumping, it loads subdirs.el in each > directory on load-path. To do this, it calls 'openp' to look for > DIR/subdirs.el, which involves calling ENCODE_FILE on > "DIR/subdirs.el", in order to pass that to 'faccessat' or 'open'. > Now, if the charset that is needed to encode this file name is not yet > loaded into Emacs, Emacs will try to load it. To this end, it will > look along charset-map-path for the corresponding map file, and for > that it will again call 'openp', recursively. That 'openp' call will > again want to ENCODE_FILE with the same encoding, which will again > cause Emacs to try to load the corresponding map file, etc. etc., > until we exhaust the specpdl stack. So you mean that we have: - charset-map-path is a multibyte string. - the file-name encoding uses a charset that's not yet loaded. How do we get into such a state? Stefan ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 18:09 ` Stefan Monnier @ 2013-10-31 18:37 ` Eli Zaretskii 2013-10-31 19:41 ` Eli Zaretskii 0 siblings, 1 reply; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 18:37 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: handa@gnu.org, 15260@debbugs.gnu.org > Date: Thu, 31 Oct 2013 14:09:39 -0400 > > So you mean that we have: > - charset-map-path is a multibyte string. > - the file-name encoding uses a charset that's not yet loaded. Yes. > How do we get into such a state? Not sure about the details, since I don't really understand when Emacs needs to load the charset map. Perhaps the map is needed only when we need to encode a string, not for decoding? Phenomenologically, this happened when charset-map-path was already decoded (as opposed to being a unibyte string) when this part of startup.el runs: ;; Convert preloaded file names in load-history to absolute. (let ((simple-file-name ;; Look for simple.el or simple.elc and use their directory ;; as the place where all Lisp files live. (locate-file "simple" load-path (get-load-suffixes))) lisp-dir) locate-file eventually calls 'openp', which wants to encode directories from load-path concatenated with simple.el etc. ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 18:37 ` Eli Zaretskii @ 2013-10-31 19:41 ` Eli Zaretskii 0 siblings, 0 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-10-31 19:41 UTC (permalink / raw) To: monnier; +Cc: 15260 > Date: Thu, 31 Oct 2013 20:37:52 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15260@debbugs.gnu.org > > > From: Stefan Monnier <monnier@iro.umontreal.ca> > > Cc: handa@gnu.org, 15260@debbugs.gnu.org > > Date: Thu, 31 Oct 2013 14:09:39 -0400 > > > > So you mean that we have: > > - charset-map-path is a multibyte string. > > - the file-name encoding uses a charset that's not yet loaded. > > Yes. > > > How do we get into such a state? > > Not sure about the details, since I don't really understand when Emacs > needs to load the charset map. Perhaps the map is needed only when we > need to encode a string, not for decoding? Actually, as can be seen from load_charset_map, we do different things when the map is needed for decoding and for encoding. So what probably happened was that when the file names in load-path etc. were decoded from cpNNNN, the map file was loaded and load_charset_map did whatever was necessary to set up the decoder for this encoding. Then, when we need to encode a file name using the same cpNNNN, the map file is loaded again, and load_charset_map now sets up the encoder. When the decoder was set up, charset-map-path was still in unibyte form, so the whole thing worked, because ENCODE_FILE doesn't try to encode unibyte strings. But once charset-map-path itself was decoded, the recursive call to 'openp' inside load_charset_map_from_file tried to encode it, and triggered infinite recursion. ^ permalink raw reply [flat|nested] 50+ messages in thread
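Editorial aside on the recursion described above: the loop (openp needs ENCODE_FILE, which needs the charset map, which needs openp again) can be reproduced in a toy model. This is a sketch under stated assumptions, not the Emacs implementation. The names sim_openp, sim_encode_file, MAX_DEPTH and the "MAPDIR" path are hypothetical, and MAX_DEPTH merely stands in for exhausting the specpdl stack. The one property taken from the thread is that unibyte names are passed through without encoding.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 100           /* hypothetical stand-in for the specpdl limit */

struct name
{
  const char *bytes;
  bool multibyte;               /* true once the name has been decoded */
};

struct state
{
  bool encoder_loaded;          /* has the charset's encoder been set up? */
  struct name map_path;         /* plays the role of charset-map-path */
};

int sim_openp (struct state *st, const struct name *file, int depth);

/* Model of encoding a file name.  Unibyte names need no encoding.
   A multibyte name needs the encoder; if it is not set up yet, the
   map file must be opened first, which is another sim_openp call.
   Returns the deepest nesting level reached, or -1 on overflow.  */
int
sim_encode_file (struct state *st, const struct name *file, int depth)
{
  if (!file->multibyte || st->encoder_loaded)
    return depth;

  int deepest = sim_openp (st, &st->map_path, depth + 1);
  if (deepest >= 0)
    st->encoder_loaded = true;
  return deepest;
}

/* Model of openp: give up once nesting exceeds MAX_DEPTH, otherwise
   encode the name before it would be handed to the file system.  */
int
sim_openp (struct state *st, const struct name *file, int depth)
{
  if (depth > MAX_DEPTH)
    return -1;
  return sim_encode_file (st, file, depth);
}
```

With a multibyte map_path and no encoder loaded, every nested call hits the same branch and the model overflows. With a unibyte map_path the nested call returns at once, which mirrors the workaround of keeping charset-map-path unibyte until later in startup.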
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-29 1:35 ` Stefan Monnier 2013-10-29 3:47 ` Eli Zaretskii @ 2013-11-01 13:58 ` Kenichi Handa 1 sibling, 0 replies; 50+ messages in thread From: Kenichi Handa @ 2013-11-01 13:58 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15260 In article <jwveh746euj.fsf-monnier+emacsbugs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes: > > emacs = EMACSLOADPATH="$(abs_lisp)" LC_ALL=C "$(EMACS)" $(EMACSOPT) > > ^^^^^^^^ > > Does anyone know or remember why we set LC_ALL=C while running > > commands in lisp/ (and the same in leim/)? > IIRC the issue was to avoid things like misdetecting coding-systems > because of the user's locale setting, in the files we load/compile. As far as I remember, yes. > IOW, it was to work around bugs (e.g. missing coding: cookie) and is > likely unneeded nowadays. I agree. --- Kenichi Handa handa@gnu.org ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-28 16:47 ` Eli Zaretskii 2013-10-28 18:33 ` Eli Zaretskii @ 2013-10-31 21:45 ` Glenn Morris 2013-11-01 7:45 ` Eli Zaretskii 1 sibling, 1 reply; 50+ messages in thread From: Glenn Morris @ 2013-10-31 21:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15260 Eli Zaretskii wrote: > mule-cmds.el calls reset-language-environment, and language/english.el > calls set-language-info-alist; both have the effect of resetting > default-file-name-coding-system to latin-1 (!? an interesting > "default" for a Unicode-era Emacs, perhaps Handa-san could comment why > we still do that). I know nothing about this, but eg glib defaults to utf-8, which seems like a better default to me these days: https://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html#file-name-encodings ^ permalink raw reply [flat|nested] 50+ messages in thread
* bug#15260: cannot build in a directory with non-ascii characters 2013-10-31 21:45 ` Glenn Morris @ 2013-11-01 7:45 ` Eli Zaretskii 0 siblings, 0 replies; 50+ messages in thread From: Eli Zaretskii @ 2013-11-01 7:45 UTC (permalink / raw) To: Glenn Morris; +Cc: 15260 > From: Glenn Morris <rgm@gnu.org> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, handa@gnu.org, 15260@debbugs.gnu.org > Date: Thu, 31 Oct 2013 17:45:32 -0400 > > Eli Zaretskii wrote: > > > mule-cmds.el calls reset-language-environment, and language/english.el > > calls set-language-info-alist; both have the effect of resetting > > default-file-name-coding-system to latin-1 (!? an interesting > > "default" for a Unicode-era Emacs, perhaps Handa-san could comment why > > we still do that). > > I know nothing about this, but eg glib defaults to utf-8, which seems > like a better default to me these days: Yes, probably. That's why I wrote that comment in parens. Fortunately, the final patch side-steps this issue altogether by keeping all the related file names as unibyte strings, so that the current defaults for encoding file names do not affect anything. So we can reason about the default independently of the issues in this bug. ^ permalink raw reply [flat|nested] 50+ messages in thread
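Editorial aside on why keeping file names unibyte side-steps the default coding system: if a name is decoded under one coding system and later encoded under another, the bytes handed to the file system no longer match the directory on disk. The sketch below illustrates one such mismatch in plain C. It assumes, purely for illustration, that the name is decoded as latin-1 and re-encoded as UTF-8; the thread only says the default is reset several times during loadup, so this particular pairing is an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Decode SRC as latin-1 (one byte is one code point below U+0100) and
   re-encode the result as UTF-8 into DST.  Returns the number of bytes
   written, or 0 if DST is too small.  */
size_t
latin1_decode_utf8_encode (const unsigned char *src, size_t srclen,
                           unsigned char *dst, size_t dstsize)
{
  size_t out = 0;

  for (size_t i = 0; i < srclen; i++)
    {
      unsigned char c = src[i];

      if (c < 0x80)
        {
          /* ASCII is identical in both encodings.  */
          if (out + 1 > dstsize)
            return 0;
          dst[out++] = c;
        }
      else
        {
          /* U+0080..U+00FF take two bytes in UTF-8.  */
          if (out + 2 > dstsize)
            return 0;
          dst[out++] = (unsigned char) (0xC0 | (c >> 6));
          dst[out++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
  return out;
}

/* True if the two byte sequences are identical.  */
bool
same_bytes (const unsigned char *a, size_t alen,
            const unsigned char *b, size_t blen)
{
  return alen == blen && memcmp (a, b, alen) == 0;
}
```

For a pure-ASCII name the output equals the input, which fits the original reports that moving to a pure-ASCII directory made the problem go away. For a name containing bytes above 0x7F the output differs from the input, whereas a unibyte string passed through untouched keeps the original bytes.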