From: David Kastrup
To: ludo@gnu.org (Ludovic Courtès)
Cc: 18520@debbugs.gnu.org
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#18520: string ports should not have an encoding
Date: Tue, 23 Sep 2014 00:12:58 +0200
Message-ID: <87tx3zjod1.fsf@fencepost.gnu.org>
In-Reply-To: <87d2anl79a.fsf@gnu.org> ("Ludovic Courtès"'s message of "Mon, 22 Sep 2014 22:39:29 +0200")

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup skribis:
>>
>> For error messages, yes. For associating a position in a string with a
>> previously parsed closure, no.
>
> But wouldn’t a line/column pair be as suitable as a unique identifier as
> the position in the file?

As long as the reencoded UTF-8 is byte-identical to the original. At
the current point of time, we flag non-UTF-8 sequences with a warning
and continue. People complained previously about things like Latin-1
characters (most likely to occur in comments or lyrics where they cause
little or well-identifiable havoc) leading to unceremonious aborts
without identifiable cause.

At any rate, the current behavior does not make sense. Guile 2.0 might
refuse to turn a string into a port, and for Guile 2.2 the port encoding
may be used to have a UTF-8 rendition of the string characters be
interpreted in another encoding (like latin-1) but not the other way
round. Both versions make only some half-baked sense.

Most resulting problems can probably be worked around in some manner,
but string ports are actually the main stringbuf-like mechanism that
Scheme has (dynamically growing strings that are more compact than a
list of characters). Wedging a compulsory code conversion into it that
is mirrored in the port positions seems like a distraction.

> Also, if the result of ‘ftell’ is used as a unique identifier, does it
> really matter whether it’s an offset measured in bytes or in
> character?

In the LilyPond lexer, stuff is usually measured with byte offsets.
Yes, one can certainly parse the UTF-8 character distances and hope to
arrive at the same results as the UTF-8 reencoding. But the point of
GUILE's character set support was not really to make everything more
complicated, was it?

-- 
David Kastrup