From: Eli Zaretskii
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 18:59:14 +0200
Message-ID: <83inobz8yl.fsf@gnu.org>
In-reply-to: <20170215091832.GA28017@tuxteam.de> (tomas@tuxteam.de)
Cc: guile-user@gnu.org

> Date: Wed, 15 Feb 2017 10:18:32 +0100
> From:
>
> > Filenames and locales are not necessarily related.  When you access a
> > networked file system, you get the filename encoding you are given,
> > which may or may not be the same as the particular locale encoding on
> > your particular machine on one particular day, and may or may not be a
> > unicode encoding.  Glib, for example, enables you to set this with the
> > G_FILENAME_ENCODING environmental variable [...]
>
> which is, btw., "just a better approximation", but still wrong: the
> application creating a directory might have been "in" a different
> locale (and thus having a different encoding) than the one creating
> the file within that directory.
>
> Most notably, the whole path might cross several mount points, thus
> the whole path can well have fragments coming from several file systems.

A possible solution would be to decode each mount point's part as it
is being resolved.

> I think the only sane way to see a Linux file system path is the way
> Linux sees it: as a byte string.

This would lose a lot in 99% of use cases.
You are, in effect, suggesting a "reverse optimization", whereby the
majority of use cases is punished in favor of a small minority, based
on theoretical intractability.

> Sure, some helper infrastructure to try to make characters of that
> mess will be welcome, but that should be absolutely robust wrt.
> unexpected input (e.g. bad UTF-8) and leave control to the application.

Most applications won't like this burden, because most application
programmers don't know enough about these issues to solve them
correctly, especially for users of other OSes and locales.

> > But if OpenBSD requires all _filenames_ to be in valid UTF-8, that
> > is a bad decision in my view.
>
> NT has done that too.

Windows can do that because it also transparently translates file
names to the locale's encoding when files are accessed with ANSI APIs.
Without such translation, this kind of decision is unwise, IMO.
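
For illustration only, here is a minimal sketch of the two ideas
discussed above: a decoding helper that stays robust against bad UTF-8
while still round-tripping to the original bytes, and decoding each
part of a path with its own encoding.  It is written in Python rather
than Guile and is not something proposed in this thread.  The
"surrogateescape" error handler is Python's own (PEP 383); the names
decode_filename, encode_filename and decode_path, the sample path, and
the idea of passing one encoding per path component are assumptions
made up for this example.

```python
def decode_filename(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a file name; undecodable bytes become lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_filename(name: str, encoding: str = "utf-8") -> bytes:
    """Inverse of decode_filename: restores the original bytes exactly."""
    return name.encode(encoding, errors="surrogateescape")


def decode_path(raw_path: bytes, encodings: list[str]) -> str:
    """Hypothetical helper: decode each '/'-separated component with the
    encoding the caller believes applies to it (e.g. one per mount point)."""
    parts = raw_path.split(b"/")
    if len(parts) != len(encodings):
        raise ValueError("need exactly one encoding per path component")
    return "/".join(decode_filename(p, e) for p, e in zip(parts, encodings))


# Made-up path: a UTF-8 directory name followed by a Latin-1 file name.
raw = b"/mnt/\xe4\xb8\xad\xe6\x96\x87/caf\xe9.txt"

# Decoding everything as UTF-8 does not fail on the stray 0xE9 byte ...
as_utf8 = decode_filename(raw)
# ... and encoding the result gives back the exact byte string.
round_trip = encode_filename(as_utf8)

# Decoding per component yields readable text for both parts.
# The leading "/" produces an empty first component, hence four encodings.
mixed = decode_path(raw, ["utf-8", "utf-8", "utf-8", "latin-1"])
```

The first helper keeps the byte string recoverable, which is what an
application needs in order to reopen the file; the per-component
variant only helps when something actually knows which encoding each
part uses, and that is the hard part in practice.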