From: Eli Zaretskii
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 22:32:57 +0200
Message-ID: <83d1ejyz2e.fsf@gnu.org>
To: tomas@tuxteam.de
Cc: guile-user@gnu.org

> Date: Wed, 15 Feb 2017 21:20:56 +0100
> From: tomas@tuxteam.de
> Cc: guile-user@gnu.org
>
> > > Most notably, the whole path might cross several mount points, thus
> > > the whole path can well have fragments coming from several file systems.
> >
> > A possible solution would be to decode each mount point's part as it
> > is being resolved.
>
> ...which can only be based on guesswork: there's no reliable info on
> the encoding used for that file system (if it's consistent at all).

You could maintain a database of encodings per file system, perhaps
user-defined, or derived by some other means. E.g., for volumes that
physically reside on Windows or macOS, the encoding is pretty much
known in advance.

> > > I think the only sane way to see a Linux file system path is the way
> > > Linux sees it: as a byte string.
> >
> > This would lose a lot in 99% of use cases. You are, in effect,
> > suggesting a "reverse optimization", whereby the majority of use cases
> > is punished in favor of a small minority, based on theoretical
> > intractability.
>
> I feel queasy doing some voodoo without the application having
> a word on it. In the Emacs context it's a bit easier, because in
> the "normal" case things are pretty quickly deferred to the user
> (usually).

Not really: there are a lot of internal operations that access files
and directories, and they would wreak major havoc if they didn't
succeed, silently, in the vast majority of uses.

> > > NT has done that too.
> >
> > Windows can do that because it also transparently translates file
> > names to the locale's encoding when files are accessed with ANSI APIs.
> > Without such translation, this kind of decision is unwise, IMO.
>
> I guess (I don't *know*) Windows stores information about the encoding
> at file system level (and keeps that consistent).

No. At the file system level (for NTFS volumes at least), Windows file
names are always UTF-16 encoded, and Windows just "knows" that.
Windows converts them to the locale's codepage when you access files
via an API that communicates file names encoded in that codepage. (If
the conversion fails, you get question marks instead of the characters
that couldn't be converted.)

> Linux hasn't that, it just keeps out of it. It hasn't even a place
> to state the encoding used.

Exactly. Which is why forcing a single file-name encoding on
Linux/Unix file systems is IMO a bad idea.
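A minimal Python sketch of the per-file-system encoding database suggested in this thread, decoding each path component according to the mount point it lies under. Everything here is hypothetical: the table, its mount points and encodings, and the names `MOUNT_ENCODINGS`, `encoding_for` and `decode_path`. Linux records no such information, so the entries would have to come from the user or from some other heuristic. Bytes that don't decode are kept via Python's `surrogateescape` error handler rather than dropped; that is one possible policy, not the only one.

```python
import os

# Hypothetical, user-maintained table: mount point -> encoding its file
# names are assumed to use. Linux itself records nothing like this.
MOUNT_ENCODINGS = {
    "/": "utf-8",
    "/mnt/legacy": "gbk",   # assumption: a volume written under a GBK locale
    "/mnt/win": "utf-8",    # assumption: a Windows volume whose names appear as UTF-8
}


def encoding_for(path: bytes, table=MOUNT_ENCODINGS) -> str:
    """Encoding of the longest mount point in TABLE that contains PATH."""
    best, best_len = "utf-8", -1
    for mount, enc in table.items():
        m = os.fsencode(mount)
        # PATH is under M if it equals M or continues with a "/" after it.
        if path == m or path.startswith(m.rstrip(b"/") + b"/"):
            if len(m) > best_len:
                best, best_len = enc, len(m)
    return best


def decode_path(path: bytes, table=MOUNT_ENCODINGS) -> str:
    """Decode an absolute byte path one component at a time.

    A component is a directory entry stored on the file system that holds
    its parent directory, so it is decoded with the parent's encoding.
    Undecodable bytes are kept as lone surrogates ('surrogateescape').
    """
    if not path.startswith(b"/"):
        raise ValueError("expected an absolute path")
    parent, parts = b"/", []
    for comp in path.split(b"/"):
        if not comp:
            continue  # skip empty pieces from leading, doubled or trailing "/"
        enc = encoding_for(parent, table)
        parts.append(comp.decode(enc, errors="surrogateescape"))
        parent = parent.rstrip(b"/") + b"/" + comp
    return "/" + "/".join(parts)


legacy = b"/mnt/legacy/" + "中文.txt".encode("gbk")
print(decode_path(legacy))  # /mnt/legacy/中文.txt
```

The GBK bytes under `/mnt/legacy` come back as `中文.txt`, while the same bytes anywhere else in this table would be treated as (invalid) UTF-8 and surrogate-escaped. The sketch matches mount points by byte prefix only; it does not consult `/proc/self/mountinfo` or resolve symlinks, both of which a real implementation would have to deal with.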
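The Windows behavior described in this thread (names stored as UTF-16 on NTFS, converted to the locale's codepage for ANSI APIs, with question marks for characters that can't be converted) can be modeled in a few lines of Python. This is only an approximation: `ansi_view` is a hypothetical helper that calls no Win32 API, Python codec names stand in for Windows codepages, and the real conversion (`WideCharToMultiByte`) may substitute "best-fit" characters instead of `?` for some inputs.

```python
# Rough model of NTFS storage plus an "ANSI" API view of a file name.
# ansi_view is a hypothetical helper; it does not call any Win32 API.

def ansi_view(stored_utf16: bytes, codepage: str) -> bytes:
    """Return the name as an ANSI-style API would hand it out in CODEPAGE."""
    name = stored_utf16.decode("utf-16-le")  # NTFS keeps names as UTF-16
    # Characters with no mapping in the codepage become b"?" (lossy).
    return name.encode(codepage, errors="replace")


stored = "中文.txt".encode("utf-16-le")
print(ansi_view(stored, "cp1252"))  # b'??.txt' on a Western-European locale
print(ansi_view(stored, "cp936"))   # b'\xd6\xd0\xce\xc4.txt' on a Simplified Chinese locale
```

A program running under the cp1252 locale receives `??.txt`, which no longer names the file; the translation only helps when the locale's codepage can represent the name.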