From: Chris Vine
Newsgroups: gmane.lisp.guile.user
Subject: Re: Filename encoding
Date: Wed, 15 Jan 2014 21:42:57 +0000
Message-ID: <20140115214257.45af1076@bother.homenet>
In-Reply-To: <83sisp2c65.fsf@gnu.org>
To: Eli Zaretskii
Cc: guile-user@gnu.org

On Wed, 15 Jan 2014 23:00:18 +0200
Eli Zaretskii wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine
> > Cc: guile-user@gnu.org
> >
> > POSIX system calls are encoding agnostic.  The filename is just a
> > series of bytes terminating with a NUL character.  All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
>
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways.  For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
>
> So I think using UTF-8 internally is a much better way.

I am not sure what you mean, as I am not talking about internal use.
Guile uses ISO-8859-1 and UTF-32 internally for all its strings, which
is fine.  glib uses UTF-32 and UTF-8 internally for most purposes.  It
is the external representation which is in issue.
This is just an encoding transformation for the library when looking up
a file (be it guile, glib or anything else).

As it happens (although this is beside the point) using a byte value or
sequence in a filename which the operating system reserves as the '/'
character, for a purpose other than designating a pathname, or a NUL
character for designating anything other than end of filename, is not
POSIX compliant and will not work on any operating system I know of,
including Windows.  (As for POSIX, see SUS, Base Definitions, sections
3.170 (Filename) and 3.267 (Pathname).)

But as I say, that is irrelevant.  Whatever the filesystem encoding
happens to be, it happens to be.  It might not be a narrow encoding at
all.

Chris
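
To make the two points above concrete, here is a minimal Python sketch. It is not how guile or glib do it: the helper names (to_external, has_reserved_bytes, split_components) are hypothetical, and Shift_JIS and UTF-8 are merely my choice of sample filesystem encodings. It shows the lookup step as a plain transcoding from the internal string to external bytes, and it checks which reserved bytes turn up in the result.

```python
from typing import List


def to_external(name: str, fs_encoding: str) -> bytes:
    """Transcode an internal (Unicode) name into the bytes handed to the OS.

    fs_encoding is an assumption supplied by the caller: whatever encoding
    was adopted when the files were named.
    """
    return name.encode(fs_encoding)


def has_reserved_bytes(component: str, fs_encoding: str) -> bool:
    """True if a single filename component would contain a NUL or '/' byte."""
    raw = to_external(component, fs_encoding)
    return b"\x00" in raw or b"/" in raw


def split_components(path: bytes) -> List[bytes]:
    """Split an external pathname on the '/' byte, with no decoding at all."""
    return [part for part in path.split(b"/") if part]


# U+8868 in Shift_JIS has 0x5C (ASCII backslash) as its second byte,
# but no 0x2F ('/') byte, so splitting on b"/" is still safe.
sjis_name = to_external("\u8868", "shift_jis")
sjis_parts = split_components(to_external("dir/\u8868.txt", "shift_jis"))

# In UTF-8 every byte of a multibyte sequence is >= 0x80.
utf8_parts = split_components(to_external("dir/\u00e9.txt", "utf-8"))
```

With these sample inputs the Shift_JIS name does contain a backslash byte, which is the case Eli describes for '\', while neither encoding produces a stray '/' or NUL byte inside a component. A wide encoding such as UTF-16 would fail has_reserved_bytes for almost any ASCII name, since each such character carries a zero byte.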