From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 12:54:20 +0200
Message-ID: <83r4l8jjtv.fsf@gnu.org>
To: Stefan Monnier
Cc: kzhr@d1.dion.ne.jp, michael.albinus@gmx.de, emacs-devel@gnu.org

> From: Stefan Monnier
> Cc: emacs-devel@gnu.org, kzhr@d1.dion.ne.jp, michael.albinus@gmx.de
> Date: Fri, 25 Jan 2013 17:28:40 -0500
>
> > What I meant was to return decoded file names from all file-name
> > primitives, such as file-name-nondirectory, even if their input was
> > encoded.
>
> It's probably OK to do that, but I wonder why we'd need to do it

It's not a goal in itself but a side effect: if every primitive
decodes any encoded file name on entry, it will manipulate decoded
strings for the rest of its execution, and will therefore return a
decoded string. (We could, of course, encode the result back if we
found the argument encoded, but then it isn't clear what to do when
some arguments are encoded and others aren't; and if some of them are
pure ASCII, they are not easily distinguished from encoded file names.)

> under what circumstances could such a primitive receive an encoded
> file-name, if all the file names returned to Elisp (by things like
> directory-files) are already decoded?

One way is that a primitive gets called from C. I gave one example
of this in my original message.
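The decode-on-entry behavior described earlier in this reply, including the case of an encoded name handed over by C code, can be sketched in Python. This is a hypothetical model, not Emacs's C implementation: an encoded (unibyte) file name is represented as bytes, a decoded (multibyte) one as str, and FILE_NAME_CODING is an assumed stand-in for file-name-coding-system. The Python function file_name_nondirectory only mimics the Lisp primitive of that name.

```python
# Hypothetical model of a file-name primitive that decodes on entry.
# Encoded (unibyte) names are modelled as bytes, decoded (multibyte)
# names as str. FILE_NAME_CODING is an assumed stand-in for Emacs's
# file-name-coding-system; this is not Emacs's actual implementation.
from typing import Union

FILE_NAME_CODING = "utf-8"  # assumption for this sketch

FileName = Union[bytes, str]


def decode_if_needed(name: FileName) -> str:
    """Decode an encoded (bytes) file name; pass decoded names through."""
    if isinstance(name, bytes):
        return name.decode(FILE_NAME_CODING)
    return name


def file_name_nondirectory(name: FileName) -> str:
    """Return the part of NAME after the last '/'.

    The argument is decoded on entry, so the search below runs over
    characters rather than bytes, and the result is always decoded,
    even when the caller (e.g. C code) passed an encoded name.
    """
    decoded = decode_if_needed(name)
    return decoded[decoded.rfind("/") + 1:]


# An encoded name, as C code might pass it, and its decoded twin.
encoded = "/tmp/日本語.txt".encode(FILE_NAME_CODING)
print(file_name_nondirectory(encoded))            # prints 日本語.txt
print(file_name_nondirectory("/tmp/日本語.txt"))  # prints 日本語.txt
```

Encoding the result back when a single argument arrived as bytes would be a small change; the difficulty raised above is deciding what to do when a primitive takes several arguments and only some of them are encoded.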
There aren't many such examples, but if we _want_ to support encoded
file names, the code needs to DTRT (do the right thing) with them,
even if this happens only once in a blue moon.

> > The issue is in the file-name primitives that want to support both
> > encoded and decoded file names, and as I understand from this
> > discussion, this feature should stay.
>
> Of course, we shouldn't just reject encoded filenames, but I don't see
> why we should worry too much about them.

I "worry" because they need separate code, especially with multibyte
encodings; writing that code for an encoding not supported by the
current locale is tricky at best, if not downright impossible, and
certainly inefficient. Are you saying that, since this happens
infrequently, we could process such file names in a broken way,
e.g., finding a directory separator where there is none, as
demonstrated in http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13515#5?

> > So some things will never work with encoded file names, but I guess no
> > one cares, because most of those problems go away if the encoding is
> > UTF-8. Fine; if no one cares, neither do I.
>
> Actually, even with other coding systems, this shouldn't be a serious
> issue since encoded file names should be rare.

The code needs to be there anyway. We cannot remove it, and we cannot
break it, because people will complain.
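The kind of breakage mentioned above, a directory separator found where there is none, can be illustrated with Shift_JIS. The encoding and the character are chosen here only as an example; this is not claimed to be the exact case in bug 13515. In Shift_JIS the second byte of some double-byte characters, such as 表 (encoded as 0x95 0x5C), has the same value as the backslash that MS-Windows uses as a directory separator, so code that scans the encoded bytes finds a false separator. In UTF-8 every byte of a multibyte sequence is 0x80 or above, which is why this particular problem goes away with that encoding. The sketch models encoded names as Python bytes; naive_nondirectory is a hypothetical helper, not an Emacs function.

```python
# Illustrative sketch, not Emacs code: encoded file names are modelled
# as bytes, and the directory separator is the MS-Windows backslash.
SEP = b"\\"  # 0x5C


def naive_nondirectory(encoded: bytes) -> bytes:
    """Byte-wise: return everything after the last 0x5C byte.

    This is only correct if 0x5C can never occur inside a
    multibyte character of the encoding in use.
    """
    return encoded[encoded.rfind(SEP) + 1:]


name = "C:\\dir\\表.txt"  # decoded file name

# Shift_JIS encodes 表 as 0x95 0x5C, so its trail byte looks like a separator.
sjis = name.encode("shift_jis")
print(naive_nondirectory(sjis).decode("shift_jis"))  # prints .txt, not 表.txt

# UTF-8 multibyte sequences contain only bytes >= 0x80: no false separator.
utf8 = name.encode("utf-8")
print(naive_nondirectory(utf8).decode("utf-8"))  # prints 表.txt

# Decoding first and then scanning characters works for either encoding.
decoded = sjis.decode("shift_jis")
print(decoded[decoded.rfind("\\") + 1:])  # prints 表.txt
```

Decoding before scanning, as in the last three lines, is the decode-on-entry approach discussed in this thread; the byte-wise version is safe only for encodings, such as UTF-8, whose multibyte sequences never contain ASCII bytes.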