From: David Kastrup
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Mon, 30 Jan 2017 20:27:34 +0100
Message-ID: <87y3xspcux.fsf@fencepost.gnu.org>
In-Reply-To: <87zii8bcdw.fsf@elektro.pacujo.net> (Marko Rauhamaa's message of "Mon, 30 Jan 2017 21:01:31 +0200")
To: Marko Rauhamaa
Cc: guile-user@gnu.org

Marko Rauhamaa writes:

> David Kastrup:
>
>> Marko Rauhamaa writes:
>>> Guile's mistake was to move to Unicode strings in the operating system
>>> interface.
>>
>> Emacs uses an UTF-8 based encoding internally [...]
>
> C uses 8-bit characters. That is a model worth emulating.

That's Guile-1.8. Guile-2 uses either Latin-1 or UTF-32 in its string
internals, either Latin-1 or UTF-8 in its string API, and UTF-8 in its
string port internals.

> UTF-8 beautifully bridges the interpretation gap between 8-bit
> character strings and text. However, the interpretation step should be
> done in the application and not in the programming language.

Elisp is focused enough on text that I think its choice of going UTF-8
internally with a Unicode character type is reasonably sane. Its strings
(the quirky unibyte strings excluded) are its own variant of UTF-8
internally, and its string port equivalent (buffers) uses that same
variant of UTF-8. Its API talks UTF-8 for strings and Unicode (or
higher) for characters, and it indexes strings and buffers via Unicode
character counts. That is not O(1), but with enough trickery it works
well enough in practice.
If strings are to be implemented strictly Scheme-standard-conforming,
they need to be O(1) indexable. The Scheme standard is rather silent
about Unicode, however. I am not sure that sticking to the standard
where it does not deal with reality is the best choice.

I think the case for Guile-2 to _also_ support "unibyte strings" would
be considerably stronger than for Emacs (byte arrays and binary string
ports don't allow using Guile's string processing functions).

As it stands, the design of Guile-2 in my book currently involves too
many mandatory conversions for just passing data around with Guile
itself and Guile-based applications.

> Support libraries for Unicode are naturally welcome.
>
> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects. But is also causing unnecessary
> grief in the computer-computer interface, where the classic textual
> naming and textual protocols are actually cutely chosen octet-aligned
> binary formats.

Sometimes yes, sometimes not. As long as Guile wants to be a
general-purpose programming and extension language, it should deal
reliably and robustly and reproducibly with whatever is thrown at it.
Its choice of libraries does not currently make it so, but that could be
fixed by either working on the (GNU) libraries or by giving Guile its
own implementation.

But that needs to be considered a priority. Nobody will do this just
for fun and kicks.

-- 
David Kastrup