From: David Kastrup
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 00:58:41 +0100
Message-ID: <87inoc5npq.fsf@fencepost.gnu.org>
To: guile-user@gnu.org

Mike Gran writes:

> But, for what it is worth, the Latin-1/UCS-32 design decision came
> from a couple of conflicting requirements. The switch happened in the
> 1.9.x series.
>
> There was several examples of legacy C code using Guile for an
> extension language that accessed the bytes of a string directly, using
> SCM_STRING_CHARS or scm_i_string_chars. To keep from breaking legacy
> code, we needed to retain the capability to use this (then already
> deprecated) capability to have C programs access 8-bit-locale string
> internals directly.

But if you don't know whether the strings are Latin-1 or UCS-32, that's
sort of academic.

> Also, in R6RS, there was the requirement that functions like
> "string-ref" act in "constant time".
> This suggested either a codepoint-array representation for strings, or
> a UTF-8 array representation with some indexing to allow for
> constant-time access.

The problem is not that Guile has an idiosyncratic internal string
representation. As you note, other programs have that. The problem is
that Guile does not have an API for passing/processing strings in that
representation. That means that passing strings in and out of Guile is
expensive. And when working with string ports, even keeping data purely
inside of Guile requires conversion processes, and string port positions
are calculated in UTF-8-encoded byte offsets while strings are indexed in
characters.

The problem is that Guile is _constantly_ required to recode strings it
is processing. And to add insult to injury, it cannot do this without
data loss when its string encoding assumptions are wrong. PostScript
files are usually encoded in Latin-1 with occasional UCS-16 passages.
Reading and writing and copying such files byte-correctly while trying
to actually parse their contents is not feasible with Guile.

> I still maintain that this design decision was a good one based on the
> simplicity of implementation.

As I said: the problem is not the chosen internal representation. The
problem is that there is no API to access it, and it does not even map
to string ports.

> The great difficulty with the UTF-8 Guile prototype was the need to
> interrogate every string access or index to decide if it was a
> codepoint index or a byte index. I abandoned that effort because it
> was doing my head in.

Emacs tried this in version 20.2, and got rid of it in version 20.4 or
so, obliterating byte-based indexing completely. Anything else would not
have worked in the long run. That was when, 16 years ago?

> Had we chosen that route, the result would likely have been a long,
> long process of squashing difficult bugs related to byte vs codepoint
> index confusion.
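
To make the two complaints above concrete (positions counted in UTF-8
bytes versus indices counted in characters, and recoding that loses data
when the assumed encoding is wrong), here is a minimal sketch. It is
plain Python, chosen only as a neutral illustration; it is not Guile's
API, and the sample strings and the helper name byte_offset are made up
for the example.

```python
# Illustrative sketch only: plain Python, not Guile's API.
# The sample data and the helper name byte_offset are hypothetical.

# Mixed text: ASCII, one Latin-1 character (U+00EF) and two CJK characters.
text = "na\u00efve \u6587\u4ef6"
encoded = text.encode("utf-8")


def byte_offset(s: str, char_index: int) -> int:
    """Return the UTF-8 byte offset of the character at char_index."""
    return len(s[:char_index].encode("utf-8"))


# The character index and the UTF-8 byte offset of the same character
# differ as soon as any non-ASCII character precedes it.
char_index = text.index("\u6587")
offset = byte_offset(text, char_index)

# Recoding under a wrong assumption: these bytes are Latin-1, but they
# are decoded as UTF-8 with replacement of invalid sequences.
raw = b"caf\xe9"
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# Latin-1 maps every byte to exactly one code point, so a Latin-1
# round trip reproduces arbitrary input bytes exactly.
exact = raw.decode("latin-1").encode("latin-1")

print(char_index, offset, len(text), len(encoded))
print(lossy == raw, exact == raw)
```

Any code that mixes the two kinds of position has to convert between
them on every access, and the lossy round trip shows why copying a file
byte-correctly is not possible once an invalid sequence has been
replaced during decoding.
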
>
> But, for what it is worth, we've had a few years of the internal
> representation of strings being private, so any modification of
> internal representation of strings would be easier in 2017 than they
> were in 2007, when the guts of strings were exposed to the C API.
>
> (N.B. dak at gnu is on my block list, so I won't see any such
> response.)

Not just on yours. LilyPond is probably the largest application using
Guile as its extension language, with pretty much the worst impacts of
Guile-2 design decisions. So obviously nobody wants to hear from its
most active developer. This is even more important now that LilyPond is
getting removed from Debian and other distributions because it is still
hopeless to get it to run under Guile-2 (the experimental support has
encoding and stability problems and runs about a factor of 5 slower than
Guile-1).

The less one hears of that, the better for morale.

-- 
David Kastrup