From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Encoding for a file containing filenames?
Date: Fri, 09 Nov 2007 11:25:50 -0500
To: Kenichi Handa
Cc: lekktu@gmail.com, eliz@gnu.org, jasonr@f2s.com, emacs-devel@gnu.org

>>>> But it will fail in Emacs-22 if the file (which contains file names)
>>>> contains chars that Emacs-22 doesn't know how to encode to (and decode
>>>> from) utf-8.

>> > Are there any such chars that are likely to be used in filenames? Or is it
>> > just the mule specific charsets that Emacs-22 cannot encode as utf-8.

>> It's actually a bit worse: it shouldn't just be encodable with utf-8,
>> but it should also be the case that encoding to utf-8 and back should
>> return the exact same string (since these are filenames and will be
>> compared with simple byte-comparison in the kernel).

> I think the important thing is to assure the round-trip of
> decode&encode (not encode&decode).

Are you sure? The situation is that we have a file name as an Emacs
string (i.e. decoded say from "locale" coding system) and we need to
store it into a file to load it back in a later Emacs invocation (at
which point we may use it to access the file, using hopefully the same
"locale" coding system).

So what needs to be byte-preserving is really:

   locale-decode -> utf8-encode -> utf8-decode -> locale-encode

So as Eli points out, if locale is utf-8 there shouldn't be any problem.
In any case, I'd go with utf-8.


        Stefan
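
To make the chain above concrete, here is a rough sketch of the round-trip check. It is only an illustration: it uses Python's codecs as a stand-in for Emacs coding systems (which treat undecodable bytes differently), and the function name `survives_round_trip` and its `locale_coding` parameter are made up for this example.

```python
def survives_round_trip(raw: bytes, locale_coding: str) -> bool:
    """Check whether the bytes of a file name come back unchanged after
    locale-decode -> utf8-encode -> utf8-decode -> locale-encode.

    Assumption: Python codecs stand in for Emacs coding systems here.
    """
    try:
        name = raw.decode(locale_coding)      # locale-decode: bytes from the OS -> string
        stored = name.encode("utf-8")         # utf8-encode: what gets written to the file
        reloaded = stored.decode("utf-8")     # utf8-decode: read back in a later session
        result = reloaded.encode(locale_coding)  # locale-encode: bytes handed to the kernel
    except UnicodeError:
        # Any step that cannot represent the name means the chain is not safe.
        return False
    return result == raw


if __name__ == "__main__":
    print(survives_round_trip("café".encode("utf-8"), "utf-8"))
    print(survives_round_trip("café".encode("latin-1"), "latin-1"))
    print(survives_round_trip(b"caf\xe9", "utf-8"))
```

The third call feeds latin-1 bytes to a utf-8 locale, so the very first decode step fails in this sketch. That is the kind of name for which the byte-preserving property is at risk.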