From: David Kastrup
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#18520: string ports should not have an encoding
Date: Mon, 22 Sep 2014 01:34:39 +0200
Message-ID: <87iokgmttc.fsf@fencepost.gnu.org>
To: 18520@debbugs.gnu.org

In Guile 2.0, at the time a string port is opened, the value of the
fluid %default-port-encoding is used for deciding how to encode the
string into a byte stream, and set-port-encoding! may then be used for
deciding how to decode that byte stream back into characters.

This does not make sense as ports deliver characters, and strings
contain characters.  There is no point in going through bytes.

Guile-2.2 does not consult %default-port-encoding but uses UTF-8
consistently (though I guess that overriding it with
set-port-encoding! will change that again).

That still is not satisfactory.  For example, using ftell on the input
port will not report the string index of the string connected to the
string port but rather a byte index into a UTF-8 encoded version of
the string.  This is a number that has nothing to do with the original
string and cannot be used for correlating string and port.

Ports fundamentally deliver characters, and so reading and writing
from a string source/sink should not involve _any_ coding system.

Files fundamentally deliver bytes, so a conversion is required.  The
same would be the case when opening a port on a _bytevector_.
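
To make the two kinds of offset concrete, here is a minimal sketch that
does not touch ports at all.  For each character of a string it lists the
character index after that character next to the byte offset in a UTF-8
encoding of the same string.  The helper names utf8-width and offsets are
made up for this illustration, and the sketch makes no claim about what
ftell actually returns in any particular Guile version; it only shows how
far the two ways of counting drift apart.

```scheme
(use-modules (rnrs bytevectors))        ; string->utf8, bytevector-length

;; The same four code points as in the test case of this report.
(define s (list->string (map integer->char '(20 200 2000 20000))))

;; Hypothetical helper: number of bytes that the UTF-8 encoding of a
;; single character occupies.
(define (utf8-width ch)
  (bytevector-length (string->utf8 (string ch))))

;; Hypothetical helper: for every character of STR, return a list
;; (code-point chars-consumed utf8-bytes-consumed), where both counts
;; are taken after the character has been consumed.
(define (offsets str)
  (let loop ((chars (string->list str)) (index 0) (bytes 0) (acc '()))
    (if (null? chars)
        (reverse acc)
        (let ((index (+ index 1))
              (bytes (+ bytes (utf8-width (car chars)))))
          (loop (cdr chars) index bytes
                (cons (list (char->integer (car chars)) index bytes)
                      acc))))))

;; Print one line per character; ~a is enough, so the core format works.
(for-each
 (lambda (entry)
   (format #t "~a, char index=~a, UTF-8 byte offset=~a\n"
           (car entry) (cadr entry) (caddr entry)))
 (offsets s))
```

The character indices run 1, 2, 3, 4, while the UTF-8 byte offsets run
1, 3, 5, 8.  Only the first column can be used to index back into the
original string.
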
For a bytevector port, an encoding would equally make sense, and
ftell/fseek offsets would naturally be in bytes.

But a port on a string delivers and consumes characters.  Any
conversion, even a fixed UTF-8 conversion, will destroy the predictable
nature of with-output-to-string and with-input-from-string and the
respective uses of string ports.

In code like the following, the results should not depend on either the
fluid-set! or the set-port-encoding!, and the ftell should always
output successive integers independent of either fluid-set! or
set-port-encoding!.  set-port-encoding! should probably flag an error,
like an fseek on an unseekable device.

    (use-modules (ice-9 format))        ; full format, needed for ~d

    (fluid-set! %default-port-encoding "UTF-8")

    (define s (list->string (map integer->char '(20 200 2000 20000))))

    (with-input-from-string s
      (lambda ()
        (set-port-encoding! (current-input-port) "ISO-8859-1")
        ;; Read one character at a time and print its code point
        ;; together with the port position reported by ftell.
        (let loop ((ch (read-char (current-input-port))))
          (if (not (eof-object? ch))
              (begin
                (format #t "~d, pos=~d\n" (char->integer ch)
                        (ftell (current-input-port)))
                (loop (read-char (current-input-port))))))))

Again, things are quite different for bytevectors, which could be
accepted instead of a string for opening ports with the string-port
commands, or could have their own port open/close commands.  The
respective ports would then definitely want to obey set-port-encoding!
(defaulting to %default-port-encoding) for _decoding_ the bytevector.

I don't know what r7rs might think here.  But for me, associating an
encoding with the connection of a string to a port does not make
sense.  The relation is one of characters to characters.

-- 
David Kastrup