From: Chris Vine
Newsgroups: gmane.lisp.guile.user
To: guile-user@gnu.org
Subject: Re: Running script from directory with UTF-8 characters
Date: Tue, 22 Dec 2015 14:21:25 +0000
Message-ID: <20151222142125.17ba7368@bother.homenet>
In-Reply-To: <87io3rffo5.fsf@elektro.pacujo.net>
References: <87twnbfkzb.fsf@elektro.pacujo.net> <20151222003447.198ea945@bother.homenet> <87io3rffo5.fsf@elektro.pacujo.net>

On Tue, 22 Dec 2015 03:14:18 +0200
Marko Rauhamaa wrote:
> Chris Vine:
>
> > I think the problem is that calling the native 'primitive-load'
> > procedure on a filename with UTF-8 encoding with a character outside
> > the ASCII range (when the locale encoding is also UTF-8) fails to
> > work unless you call '(set-locale LC_ALL "")' in the program first.
> >
> > Of course you can't do that when passing guile a filename as a
> > program argument. It does seem like a weakness, even if not a bug.
>
> How can it not be a bug?
>
> Also, Linux pathnames can contain any bytes other than NUL regardless
> of the locale (and quite often do) so I hope Guile doesn't paint
> itself too deep in the Unicode corner. Python is struggling with
> analogous issues but has been careful to at least make it possible to
> deal with bytevector pathnames and bytevector standard ports.
>
> For example,
>
> scheme@(guile-user)> (opendir ".")
> $1 = #
> [...]
> scheme@(guile-user)> (readdir $1)
> $4 = "?9t\x1b["
> scheme@(guile-user)> (open-file $4 "r")
> ERROR: In procedure open-file:
> ERROR: In procedure open-file: No such file or directory:
> "?9t\x1b["

You can set the locale in the REPL, if that is where you are working
from (as in your example), and then UTF-8 pathnames will work fine.

The problem with this is that, although a developer might use the REPL
(and so perhaps assume that everyone else does too), the end user
probably just wants to run the script by passing guile a file name on
the command line. To that extent I agree it is a bug. But the response
to filing such a bug might be that this is how it is meant to work.

Chris
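
For reference, a minimal sketch of the "set the locale first" workaround
discussed above. It is an illustration under stated assumptions, not a
tested fix for the command-line case. Guile spells the procedure
`setlocale`; it, LC_ALL, file-exists? and primitive-load are standard
Guile, whereas the helper name load-with-locale and the path are
hypothetical and appear only for this example.

```scheme
;; -*- coding: utf-8 -*-
;; Sketch only: load-with-locale and the path below are hypothetical.

(define (load-with-locale path)
  ;; Install the locale named by the environment (LANG, LC_ALL, ...) so
  ;; that Guile converts file-name strings using that locale's encoding.
  (setlocale LC_ALL "")
  ;; Return #t after loading the file, or #f if it cannot be found.
  (if (file-exists? path)
      (begin (primitive-load path) #t)
      #f))

;; Hypothetical script living in a directory with a non-ASCII name.
(load-with-locale "/tmp/käyttäjä/hello.scm")
```

The same form can be typed at the REPL. Saved in a launcher file with an
ASCII-only name, it might also serve an end user who cannot pass the
non-ASCII path to guile directly, though that use is an assumption
rather than something confirmed in this thread.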