From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Multibyte and unibyte file names
Date: Wed, 23 Jan 2013 16:09:18 -0500
To: Eli Zaretskii
Cc: Kazuhiro Ito, Michael Albinus, emacs-devel@gnu.org
In-Reply-To: <83ehhbn680.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 23 Jan 2013 19:45:35 +0200")
Xref: news.gmane.org gmane.emacs.devel:156610

> Let me start with a question: do file primitives need to support
> unibyte file names, as well as multibyte ones?

[ Oh no, not this mess! ]

> If dst[] is an encoded string that uses a multibyte encoding, it is
> wrong to look at just the last byte of the string, because it could be
> a trailing byte of some multibyte sequence, right?

In theory, yes.  In practice it doesn't seem to be too much of a problem,
tho it could become more serious if we start using utf-16 for Windows.

Part of the problem is that not all systems agree on whether a file name
is a sequence of bytes or a sequence of characters.

I think that for w32 it makes sense to try and always decode file names
before returning them to Elisp: Most file names passed to Elisp
primitives are derived from file names returned by Elisp primitives, so
if Emacs decodes all the file names it returns to Elisp, we can expect
to see *very* few encoded file names passed to Elisp primitives.


        Stefan
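
To make the trailing-byte concern quoted above concrete, here is a small sketch in Python (not Emacs code). The helper names `ends_with_sep_bytes` and `ends_with_sep_decoded` are hypothetical, and cp932 (Windows Shift-JIS) is picked only as an example of a multibyte encoding whose trail bytes can equal an ASCII backslash. It compares a check on the last byte of an encoded name with a check done after decoding, and repeats the comparison for UTF-16.

```python
def ends_with_sep_bytes(encoded: bytes) -> bool:
    """Naive check: look only at the last byte of the encoded name."""
    return encoded[-1:] in (b"/", b"\\")


def ends_with_sep_decoded(encoded: bytes, coding: str) -> bool:
    """Decode first, then compare whole characters."""
    return encoded.decode(coding).endswith(("/", "\\"))


# Katakana SO (U+30BD) is the two bytes 0x83 0x5C in cp932;
# the trail byte 0x5C has the same value as an ASCII backslash.
sjis_name = "\u30bd".encode("cp932")

# In UTF-16LE, U+5C00 is the two bytes 0x00 0x5C, so the same
# confusion can occur with a 16-bit encoding.
utf16_name = "\u5c00".encode("utf-16-le")

for raw, coding in ((sjis_name, "cp932"), (utf16_name, "utf-16-le")):
    print(coding, raw, ends_with_sep_bytes(raw), ends_with_sep_decoded(raw, coding))
```

In both cases the byte-level check reports a directory separator that is not there, while the check on the decoded string does not. That is one reason decoding file names at the boundary, before they reach Lisp code, avoids this class of mistake.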