From: Zefram
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#22901: drain-input doesn't decode
Date: Fri, 4 Mar 2016 03:09:44 +0000
Message-ID: <20160304030944.GA1318@fysh.org>
To: 22901@debbugs.gnu.org

The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get
from calling read-char some number of times.  In fact it differs in a
significant respect: whereas read-char decodes input octets according
to the port's selected encoding, drain-input ignores the selected
encoding and always decodes according to ISO-8859-1 (thus preserving
the octet values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '
    (set-port-encoding! (current-input-port) "UCS-2BE")
    (write (port-encoding (current-input-port)))
    (newline)
    (write (map char->integer
                (let r ((l '\''()))
                  (let ((c (read-char (current-input-port))))
                    (if (eof-object? c)
                        (reverse l)
                        (r (cons c l)))))))
    (newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '
    (set-port-encoding! (current-input-port) "UCS-2BE")
    (write (port-encoding (current-input-port)))
    (newline)
    (peek-char (current-input-port))
    (write (map char->integer
                (string->list (drain-input (current-input-port)))))
    (newline)'
"UCS-2BE"
(1 97 2 98 3 99)

The practical upshot is that the input returned by drain-input can't be
used in the same way as regular input from read-char.  It can still be
used if the code doing the reading is totally aware of the encoding,
so that it can perform the decoding manually, but this seems a failure
of abstraction.
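
For illustration, here is a rough sketch of what that manual decoding
might look like.  It assumes a Guile that provides the (ice-9 iconv)
module, and it assumes the drained buffer happens to end on a character
boundary; the helper name redecode-drained is made up for this example
and is not part of Guile.

```scheme
(use-modules (ice-9 iconv))   ; string->bytevector, bytevector->string

;; Hypothetical helper, not part of Guile: undo the implicit ISO-8859-1
;; decoding done by drain-input, then decode the recovered octets using
;; the encoding the caller knows the port to have.
(define (redecode-drained str encoding)
  (bytevector->string (string->bytevector str "ISO-8859-1") encoding))

;; Stand-in for the string drain-input returned in the report above,
;; i.e. the six octets 1 97 2 98 3 99 in character form.
(define drained
  (list->string (map integer->char '(1 97 2 98 3 99))))

(write (map char->integer
            (string->list (redecode-drained drained "UCS-2BE"))))
(newline)
```

With the octets from the report this should print (353 610 867), the
same values read-char produced.  The caller has to supply the encoding
itself, and a buffer that ends partway through a character would not
decode cleanly, which is the abstraction problem described here.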
The value returned by drain-input ought to be coherent with the
abstraction level at which it is specified.

I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle of
a character.  If drain-input is to return decoded characters then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary.  If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.

If, on the other hand, it is decided to endorse the current non-decoding
behaviour, then the break of abstraction needs to be documented.

-zefram