unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 15260@debbugs.gnu.org
Subject: bug#15260: cannot build in a directory with non-ascii characters
Date: Mon, 28 Oct 2013 18:47:32 +0200
Message-ID: <8361shfil7.fsf@gnu.org>
In-Reply-To: <jwvk3gy82jg.fsf-monnier+emacsbugs@gnu.org>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org,  handa@gnu.org,  15260@debbugs.gnu.org
> Date: Mon, 28 Oct 2013 00:05:32 -0400
> 
> More specifically, for the bug to appear, you need ENCODE (DECODE (s))
> to not be the identity function.  Why is not so in the "early" Emacs?

Because life's a mess that doesn't easily fit into simple and elegant
schemes ;-)

For starters, we don't really DECODE_FILE with these file- and
directory-names.  We just use build_string or make_string, as you can
easily see in the init_* functions I mentioned.  If you are lucky and
your file names are UTF-8 encoded, this produces the same result as
DECODE_FILE.  If you are less lucky, and your file names are encoded
in something else, like Latin-N, you get a unibyte string with the
same bytes as in the original.  Then we pass these strings to various
functions, like file_accessible_directory_p, that _do_ ENCODE_FILE...
(Luckily, during most of temacs's run, both file-name-coding-system
and its default value are nil, so ENCODE_FILE is a no-op -- except
when they aren't, see the next paragraph.)
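To make the "lucky vs. less lucky" distinction concrete, here is a toy model in Python. It is only an analogy: Python `str` stands in for a decoded (multibyte) string, `bytes` for a unibyte one. The function names `decode_file` and `build_string_model` are hypothetical and are not Emacs APIs. `build_string_model` is a deliberate simplification of what build_string/make_string do with raw file-name bytes.

```python
def decode_file(raw: bytes, coding: str) -> str:
    """Toy stand-in for DECODE_FILE: decode on-disk bytes with a coding system."""
    return raw.decode(coding)


def build_string_model(raw: bytes):
    """Toy stand-in for build_string/make_string (a simplification).

    If the bytes are valid UTF-8 we get text (think: a multibyte string);
    otherwise the bytes are kept as-is (think: a unibyte string).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


# Lucky case: UTF-8 file names give the same result as a real decode.
utf8_name = "café".encode("utf-8")
assert build_string_model(utf8_name) == decode_file(utf8_name, "utf-8")

# Less lucky: Latin-1 file names come back as raw bytes, not text.
latin1_name = "café".encode("latin-1")
print(build_string_model(latin1_name))     # b'caf\xe9'
print(decode_file(latin1_name, "latin-1"))  # café
```

The second case is the one that matters: nothing was decoded, so a later ENCODE_FILE only behaves if it happens to be a no-op.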

Next, it is quite possible that the file-name-coding-system changes
between the time we process and store the file name and the time we
encode and pass it to a low-level function.  This is especially true
during "loadup", when many packages are loaded and their top-level
forms are executed.  It turns out that two of them have side effects
that do just that: mule-cmds.el calls reset-language-environment, and
language/english.el calls set-language-info-alist; both have the
effect of resetting default-file-name-coding-system to latin-1 (!? an
interesting "default" for a Unicode-era Emacs; perhaps Handa-san could
comment on why we still do that).  When this happens, your symmetry is
broken, and ENCODE_FILE (DECODE_FILE (f)) is no longer the identity
function.
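The broken symmetry can be sketched with Python codecs standing in for coding systems. This is a minimal model, assuming that the name was decoded as UTF-8 when it was stored and that latin-1 is in effect by the time it is encoded again. `round_trip` is a hypothetical helper, not an Emacs function.

```python
def round_trip(raw: bytes, decode_coding: str, encode_coding: str) -> bytes:
    """ENCODE_FILE (DECODE_FILE (raw)), modeled with Python codecs.

    decode_coding is whatever was in effect when the name was stored;
    encode_coding is whatever is in effect when it is finally passed to
    a low-level file API.
    """
    name = raw.decode(decode_coding)
    return name.encode(encode_coding)


raw = "/home/user/café".encode("utf-8")

# Same coding system on both sides: the round trip is the identity.
assert round_trip(raw, "utf-8", "utf-8") == raw

# Coding system switched to latin-1 in between: different bytes come out,
# so the OS is asked about a file name that does not exist on disk.
print(round_trip(raw, "utf-8", "latin-1"))  # b'/home/user/caf\xe9'
```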

And then there are other players in this game.  For example,
default-directory, which is used every time we call expand-file-name,
IOW "a lot".  If you look in init_buffer, you will see that the
default-directory of *scratch* is first set to a multibyte
representation of the unibyte string we get from getcwd.  In a
"normal" Emacs session, we promptly fix that in startup.el, after the
call to set-locale-environment initializes all the coding-systems.
But "temacs -l loadup dump" doesn't run startup.el, so we are left
with what init_buffer did, which is a string no file-name API will be
able to grok.

Another example is the use of 'equal' (and 'member', which calls
'equal') to compare file and directory names, and look them up in
lists: as you know, 'equal' will not compare a unibyte and a multibyte
string as equal.  So a mix of unibyte and multibyte strings in file
names trips up code that relies on 'equal', tricking it into doing the
wrong thing, like deciding that Emacs is _not_ run from the source
tree.
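A rough Python analogy for the 'equal'/'member' problem: `bytes` plays the unibyte string and `str` the multibyte one. Python is stricter than Emacs here, since `bytes` and `str` never compare equal, whereas Emacs only treats the two as different once non-ASCII characters are involved. The helper `same_file_name` is hypothetical and only illustrates normalizing both sides before comparing; note that it must be told which coding the raw bytes are in, which is exactly the information that gets lost.

```python
def same_file_name(a, b, coding: str) -> bool:
    """Hypothetical helper: compare two file names after decoding any
    raw-byte (unibyte-like) operand, instead of relying on plain ==."""
    if isinstance(a, bytes):
        a = a.decode(coding)
    if isinstance(b, bytes):
        b = b.decode(coding)
    return a == b


unibyte = b"caf\xe9"   # Latin-1 bytes that were never decoded
multibyte = "café"     # the same name, properly decoded

# Plain equality and membership (think: 'equal' and 'member') say "different".
assert unibyte != multibyte
assert unibyte not in [multibyte]

# Normalizing both sides first makes the lookup succeed.
assert same_file_name(unibyte, multibyte, "latin-1")
```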

I'm sure there's more to this saga; I'm just half-way through it...