From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Multibyte and unibyte file names
Date: Fri, 25 Jan 2013 22:31:19 +0200
Message-ID: <83vcalj97s.fsf@gnu.org>
References: <83ehhbn680.fsf@gnu.org> <83wqv2ldk1.fsf@gnu.org> <83obgel94c.fsf@gnu.org> <83k3r1lnlb.fsf@gnu.org>
Reply-To: Eli Zaretskii
To: Stefan Monnier
Cc: kzhr@d1.dion.ne.jp, michael.albinus@gmx.de, emacs-devel@gnu.org

> From: Stefan Monnier
> Cc: emacs-devel@gnu.org, kzhr@d1.dion.ne.jp, michael.albinus@gmx.de
> Date: Fri, 25 Jan 2013 06:36:39 -0500
>
> >> That the callers get to see meaningful (decoded) names?
> >> That file-name manipulation functions don't have the side effect of
> >> encoding/decoding file names?
> > If we decode unibyte file names at entry to each primitive, before
> > doing anything else, and thereafter manipulate decoded multibyte
> > strings, this will happen anyway.
>
> I get the impression that we're not talking about the same thing.

Looks like that.

> If you only decode on entry, then Elisp code will first see encoded file
> names returned by directory-files and will then see them converted to
> decoded form after passing the result to a file-name
> manipulation function.

No.  Elisp code will see _decoded_ file names from directory-files,
because we already decode them.  I didn't mean to change that.  What I
meant was to return decoded file names from all file-name primitives,
such as file-name-nondirectory, even if their input was encoded.

> Which is why I suggest to decode right away in the functions that return
> file names (e.g. directory-files).

We already do that, so there's no issue in that department.
The issue is in the file-name primitives that want to support both
encoded and decoded file names, and as I understand from this
discussion, this feature should stay.

> > But since everybody (at least those who spoke) seem to think this is a
> > w32 only problem, I will solve it for w32 only.
>
> I think the specific problems you mentioned are mostly non-issues under
> POSIX, but the general problem of deciding which representation to use
> is more general.

I thought this was already decided in favor of decoded file names,
a.k.a. "multibyte strings".  The few calls that pass encoded file
names are rare exceptions, but since we want to keep support for
encoded file names, fixing those few places is not going to buy us
anything except code reshuffling.

The problem with encoded file names is that we have little support for
them.  E.g., we cannot up-/down-case them (except if we know the
encoding is supported by the current locale).  For multibyte encodings
that are not UTF-8, we also cannot scan them by characters, only by
bytes, so e.g. strchr will not generally work reliably.  We are
crippled.

So some things will never work with encoded file names, but I guess no
one cares, because most of those problems go away if the encoding is
UTF-8.  Fine; if no one cares, neither do I.
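
To illustrate the strchr point, here is a rough sketch, with Python
standing in for the C primitives.  The cp932 encoding, the sample
path, and both helper names are assumptions made up for the example;
none of this is Emacs code.  In a non-UTF-8 multibyte encoding a trail
byte can equal an ASCII separator, so a byte scan splits the name in
the wrong place, while decoding on entry and then scanning by
characters does not.

```python
# Illustration only: hypothetical helpers, not Emacs primitives.
ENCODING = "cp932"  # assumed file-name encoding (Japanese Windows code page)


def nondirectory_bytes(encoded: bytes) -> bytes:
    """Naive byte scan: return everything after the last 0x5C byte."""
    return encoded[encoded.rfind(b"\\") + 1:]


def nondirectory_decoded(name) -> str:
    """Decode on entry if the input is encoded, then scan by characters."""
    if isinstance(name, bytes):
        name = name.decode(ENCODING)
    return name[name.rfind("\\") + 1:]


# U+8868 encodes to the two bytes 0x95 0x5C in cp932; the trail byte
# is the same value as an ASCII backslash.
decoded = "C:\\docs\\\u8868.txt"
encoded = decoded.encode(ENCODING)

print(nondirectory_bytes(encoded))    # byte scan cuts inside the character
print(nondirectory_decoded(encoded))  # encoded input, decoded result
print(nondirectory_decoded(decoded))  # decoded input, same result
```

The byte scan returns only b".txt" because it stops at the trail byte
of the two-byte character, whereas the decode-first variant returns
the full base name for either kind of input.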