From mboxrd@z Thu Jan 1 00:00:00 1970
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 10:18:32 +0100
Message-ID: <20170215091832.GA28017@tuxteam.de>
References: <87h94gqz34.fsf@fencepost.gnu.org>
 <87fuk0ctve.fsf@elektro.pacujo.net>
 <878tpsqtzl.fsf@fencepost.gnu.org>
 <87zii8bcdw.fsf@elektro.pacujo.net>
 <87y3xspcux.fsf@fencepost.gnu.org>
 <578885360.4452806.1487105647708@mail.yahoo.com>
 <87r330cwhj.fsf@elektro.pacujo.net>
 <191859705.4469709.1487109121157@mail.yahoo.com>
 <20170214221914.1483ddb1@bother.homenet>
In-Reply-To: <20170214221914.1483ddb1@bother.homenet>
To: guile-user@gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; x-action=pgp-signed
Content-Transfer-Encoding: 8bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Feb 14, 2017 at 10:19:14PM +0000, Chris Vine wrote:
> On Tue, 14 Feb 2017 21:52:01 +0000 (UTC)
> Mike Gran wrote:
> [snip]
> > > In particular, filenames are *not*, nor can they be mapped to,
> > > Unicode strings in Linux.
> >
> > True.  Linux should follow OpenBSD and make all locales UTF-8.
>
> Filenames and locales are not necessarily related.  When you access a
> networked file system, you get the filename encoding you are given,
> which may or may not be the same as the particular locale encoding on
> your particular machine on one particular day, and may or may not be a
> unicode encoding.  Glib, for example, enables you to set this with the
> G_FILENAME_ENCODING environmental variable [...]

which is, by the way, "just a better approximation", but still wrong:
the application creating a directory might have been "in" a different
locale (and thus have used a different encoding) than the one creating
the file within that directory.
Most notably, a path might cross several mount points, so it can well
have fragments coming from several file systems, possibly each with its
own encoding.

I think the only sane way to see a Linux file system path is the way
Linux sees it: as a byte string. Sure, some helper infrastructure to
try to make characters of that mess will be welcome, but it should be
absolutely robust wrt. unexpected input (e.g. bad UTF-8) and leave
control to the application. Not easy.

> g_filename_to_utf8() and g_filename_from_utf8() functions for this
> purpose.

To me, that seems insufficient, unless this just applies to one (e.g.
the last) path element. Skimming the docs, I can't see whether you are
only supposed to do that or whether you can dump whole paths (or path
fragments) into those functions.

> You can tie the filename encoding to the locale encoding by
> defining the G_BROKEN_FILENAMES environmental variable but that is
> deprecated (the name suggests what they thing about that idea).
>
> You may possibly agree with this: I am not clear from your post what
> connection you were making between locales and filenames.  But if
> OpenBSD requires all _filenames_ to be in valid UTF-8, that is a bad
> decision in my view.

NT has done something similar (with UTF-16 rather than UTF-8). I don't
know: there are arguments for both approaches -- it depends on whether
you think file names are composed of characters (makes sense, no?) or
whether the OS doesn't care what's in them (just leave null and slash
alone!). It's moving between those two views that's hard.

Personally, I'd tend to have Guile be agnostic (i.e. byte arrays) at
the lowest level (no conversions), and offer the application what it
knows (on BSD or "modern" Windows say: "yes, that's Unicode" and on
Linux say "No idea, but you can try to convert"). The current locale is
just a weak hint one might use in heuristics. For things like
environment variables and command line arguments, the locale is a
stronger hint (but not 100%).
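
To make the "byte arrays at the lowest level, conversion as an optional
extra" idea a bit more concrete, here is a minimal sketch. It is in
Python rather than Guile, purely for illustration. The helper names
decode_components and to_text_lossless and the example path are made up
for this sketch; they are not part of any Guile or Glib API. The path
stays a byte string, each path element gets its own strict UTF-8 decode
attempt, and a failure is reported to the caller instead of being
guessed around. The second helper assumes Python's "surrogateescape"
error handler, which carries undecodable bytes through a string so that
the original bytes can be recovered later.

```python
def decode_components(path: bytes, encoding: str = "utf-8"):
    """Split a raw byte path on b'/' and decode each element separately.

    Returns a list of (raw_bytes, text_or_None) pairs. The text is None
    when the element is not valid in the given encoding, so the caller
    decides what to do with it.
    """
    result = []
    for part in path.split(b"/"):
        if not part:
            continue  # empty pieces come from leading or doubled slashes
        try:
            text = part.decode(encoding)  # strict: raises on bad input
        except UnicodeDecodeError:
            text = None
        result.append((part, text))
    return result


def to_text_lossless(path: bytes) -> str:
    """Decode for display while keeping every original byte recoverable.

    Undecodable bytes become lone surrogates; encoding the string again
    with the same error handler gives back the original bytes.
    """
    return path.decode("utf-8", "surrogateescape")


# Hypothetical path: a UTF-8 directory name holding a Latin-1 file name.
raw_path = b"/mnt/" + "目录".encode("utf-8") + b"/" + "café".encode("latin-1")

for raw, text in decode_components(raw_path):
    # ascii() keeps the output printable whatever the terminal encoding is
    print(raw, "->", ascii(text))

round_trip = to_text_lossless(raw_path).encode("utf-8", "surrogateescape")
print(round_trip == raw_path)
```

In every case the application still holds the raw bytes, so it stays in
control of elements that do not decode: it can try another encoding,
show a placeholder, or just pass the bytes back to the OS untouched.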

> Linux is capable of treating filenames as just a null-terminated array
> of bytes with '/' as the directory separator.  It is encoding agnostic,
> and that works just fine.

Or not. For the OS all is fine; for the applications it's a small hell
-- see those Glib functions you quoted, which -- given their interfaces
-- can't possibly do the right thing (dropping their names in a search
engine to skim their documentation turns up quite a lot of failure
modes, if you know what I mean).

regards
- -- 
tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlikHOgACgkQBcgs9XrR2kYBLACggihOlLCNLcUjlrsWh0vQMuH8
JxEAnRye7C4d1GNDJi7x6nLgI1PMamex
=+A5K
-----END PGP SIGNATURE-----