From: John Darrington
Newsgroups: gmane.lisp.guile.user
Subject: Re: Filename encoding
Date: Thu, 16 Jan 2014 15:07:43 +0100
Message-ID: <20140116140743.GA16999@jocasta.intra>
References: <20140115125246.53fc72c7@bother.homenet> <87bnzdun74.fsf@netris.org>
 <20140115195051.3272023c@bother.homenet> <87a9ewu8n1.fsf@gnu.org>
 <83ha9437al.fsf@gnu.org> <8738koaxkm.fsf@gnu.org>
In-Reply-To: <8738koaxkm.fsf@gnu.org>
To: Ludovic Courtès
Cc: guile-user@gnu.org
User-Agent: Mutt/1.5.21 (2010-09-15)

On Thu, Jan 16, 2014 at 02:03:05PM +0100, Ludovic Courtès wrote:
     Eli Zaretskii skribis:

     >> From: ludo@gnu.org (Ludovic Courtès)
     >> Date: Thu, 16 Jan 2014 00:29:06 +0100
     >>
     >> Does anyone know of systems where the file name encoding is commonly
     >> different from locale encoding? Is it the case on Windows?
     >
     > Windows stores file names on disk encoded in UTF-16, but converts them
     > to the current codepage if you use Posix-style interfaces like 'open'
     > and 'rename'.

     So in practice, given that Guile uses the POSIX interfaces, the
     assumption that file names are in the locale encoding is valid on
     Windows.

If you know that the filename was always obtained using Guile's
interface, then the issue is never pertinent. The problem comes when a
function is asked to open a file with a non-ASCII name, without any
information about where that filename came from.

There is no answer to this general problem. We've encountered it over
the years in PSPP. What we are doing now is to pass the filename around
in a structure, along with a variable indicating the encoding in which
that filename should be interpreted.
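To make the idea of a filename that carries its own encoding concrete, here is a minimal sketch. It is written in Python for brevity and is not PSPP's actual code (PSPP is written in C); the class name `EncodedFileName` and its methods are hypothetical, chosen only for illustration.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedFileName:
    """Hypothetical pairing of a file name's raw bytes with their encoding."""

    raw: bytes     # the bytes exactly as they were received
    encoding: str  # the encoding in which `raw` should be interpreted

    def text(self) -> str:
        # Decode using the recorded encoding. Raises UnicodeDecodeError if
        # the recorded encoding does not actually match the bytes.
        return self.raw.decode(self.encoding)

    def for_os(self, target=None) -> bytes:
        # Re-encode for the interface that will open the file. By default
        # this uses the encoding Python itself applies to file names.
        return self.text().encode(target or sys.getfilesystemencoding())


# The same name arriving from two sources with different encodings.
a = EncodedFileName(b"r\xe9sum\xe9.txt", "latin-1")
b = EncodedFileName(b"r\xc3\xa9sum\xc3\xa9.txt", "utf-8")
print(a.text() == b.text())  # prints True
```

The two byte strings differ, yet they name the same file once each is decoded with its recorded encoding. Without the `encoding` field, a function handed only the bytes would have to guess.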
Carrying the encoding alongside the name works up to a point, but
eventually there comes an interface where the crucial information is
missing. For example, what happens if the filename is read from a text
file? We have heuristics which can guess the encoding of a file, but
they are of course not completely reliable. One has to decide on the
approach which gives the lowest probability of surprises.

J'

-- 
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.