From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Vine
Newsgroups: gmane.lisp.guile.user
Subject: Re: Filename encoding
Date: Wed, 15 Jan 2014 19:50:51 +0000
Message-ID: <20140115195051.3272023c@bother.homenet>
References: <20140115125246.53fc72c7@bother.homenet> <87bnzdun74.fsf@netris.org>
In-Reply-To: <87bnzdun74.fsf@netris.org>
To: Mark H Weaver
Cc: guile-user@gnu.org

On Wed, 15 Jan 2014 13:14:39 -0500
Mark H Weaver wrote:
> Chris Vine writes:
>
> > A number of guile's scheme procedures look-up or reference files on
> > a file system (open-file, load and so forth).
> >
> > How does guile translate filenames from its internal string
> > representation (ISO-8859-1/UTF-32) to narrow string filename
> > encoding when looking up the file?  Does it assume filenames are in
> > locale encoding (not particularly safe on networked file systems)
> > or does it provide a fluid for this?  (glib caters for this with the
> > G_FILENAME_ENCODING environmental variable.)
>
> It assumes filenames are in locale encoding.  Ditto for virtually
> everything that interfaces with POSIX-style byte strings, including
> environment variables, command-line arguments, etc.  Encoding errors
> will raise exceptions by default.
>
> My hope is that this will become less of an issue over time, as
> systems increasingly standardize on UTF-8.  I see no other good
> solution.
>
> Thoughts?

POSIX system calls are encoding agnostic.  The filename is just a
series of bytes terminated by a NUL byte.  All guile needs to know is
which encoding whoever created the filesystem adopted for naming
files, so that it can map its strings to and from that encoding.
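
To make the "just a series of bytes" point concrete, here is a rough
sketch in Python rather than guile, purely for illustration.  The
function names and the filename_encoding parameter are hypothetical;
they are not an existing guile or glib interface.  It assumes a
filesystem that accepts arbitrary bytes in names, as typical Linux
filesystems do.

```python
import os
import tempfile


def encode_filename(name: str, filename_encoding: str) -> bytes:
    """Map text to the raw bytes that would be handed to the system call."""
    return name.encode(filename_encoding)


def decode_filename(raw_name: bytes, filename_encoding: str) -> str:
    """Map raw filename bytes back to text using an explicitly chosen encoding.

    filename_encoding is a hypothetical per-filesystem setting.
    """
    return raw_name.decode(filename_encoding)


def list_raw_names(directory: str) -> list[bytes]:
    """Return directory entries as the raw bytes the kernel stores."""
    # Passing bytes to os.listdir makes it return bytes, undecoded.
    return os.listdir(os.fsencode(directory))


if __name__ == "__main__":
    # A name written under an ISO-8859-1 convention: b'caf\xe9'.
    raw = encode_filename("caf\u00e9", "iso-8859-1")
    with tempfile.TemporaryDirectory() as tmp:
        # The system call receives bytes only; no encoding is declared.
        with open(os.path.join(os.fsencode(tmp), raw), "wb"):
            pass
        for entry in list_raw_names(tmp):
            print(entry, "->", decode_filename(entry, "iso-8859-1"))
```

The same bytes decoded as UTF-8 would raise an error, which is the
failure you get when the locale encoding and the filesystem's naming
convention disagree.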

So far as filenames are concerned, this seems to me to be just the
thing for a fluid: it could default to the locale encoding, but a user
could set it to something else.  I suppose command lines and
environment variables are less problematic because they are usually
local to a particular machine, although that may be less true these
days for command lines.

Fluids would have a substantial advantage over glib's approach of an
environment variable: fluids can be thread safe, whereas environment
variables are not.

(Incidentally, with glib you can set the environment variable
G_BROKEN_FILENAMES instead of G_FILENAME_ENCODING, which causes the
glib file functions to use the locale encoding.  I guess the name
expresses their view on the issue.  However, their solution of using
environment variables is not ideal.)

Chris
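
P.S.  A rough illustration of the thread-safety point above, again in
Python rather than guile.  Python's contextvars is only a loose
analogue of a fluid, and filename_encoding, the "locale" placeholder
default and DEMO_FILENAME_ENCODING are names made up for this sketch.
It shows only that a per-thread binding stays local while the
environment is shared by the whole process; it does not reproduce the
data races that setenv() can cause in C.

```python
import contextvars
import os
import threading

# Hypothetical setting standing in for a fluid.  "locale" is a
# placeholder default meaning "use the locale encoding".
filename_encoding = contextvars.ContextVar("filename_encoding", default="locale")


def call_with_filename_encoding(encoding, thunk):
    """Run thunk with the setting rebound, then restore the old value."""
    token = filename_encoding.set(encoding)
    try:
        return thunk()
    finally:
        filename_encoding.reset(token)


def worker_sets_both(encoding):
    """In a separate thread, change both the context variable and the environment."""
    def body():
        # Affects only the context this thread runs in.
        filename_encoding.set(encoding)
        # Affects every thread in the process.
        os.environ["DEMO_FILENAME_ENCODING"] = encoding

    thread = threading.Thread(target=body)
    thread.start()
    thread.join()


if __name__ == "__main__":
    worker_sets_both("iso-8859-1")
    print(filename_encoding.get())               # main thread keeps its own value
    print(os.environ["DEMO_FILENAME_ENCODING"])  # changed process-wide
```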