From: Andy Wingo
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Wed, 01 Mar 2017 16:45:26 +0100
Message-ID: <87y3wpdmqx.fsf@pobox.com>
References: <87y3yj99hs.fsf@pobox.com>
In-Reply-To: (Linas Vepstas's message of "Mon, 9 Jan 2017 21:34:36 -0600")
To: Linas Vepstas
Cc: 25397@debbugs.gnu.org

On Tue 10 Jan 2017 04:34, Linas Vepstas writes:

> void *wrap_puts(void* p)
> {
>    char *wtf = p;
>
>    SCM port = scm_current_output_port ();
>
>    scm_puts("the port-encoding is=", port);
>    scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
>
>    scm_puts("\nThe string to display is =", port);
>    scm_puts (wtf, port);
>
>    scm_puts("\nWas expecting to see this=", port);
>    SCM str = scm_from_utf8_string(wtf);
>    scm_display(str, port);
>    scm_puts("\n\n", port);
>
>    return NULL;
> }

So, there are a few questions here.

scm_puts and scm_lfwrite are not documented, so we need to do basic science on them to see what they are supposed to do.

Firstly, is scm_puts() a textual interface or a binary interface? I.e. does it write a sequence of characters or a sequence of bytes?

If I look at uses of scm_puts in Guile sources, it seems clear that it's a textual interface. That is to say, at all points, the intention seems to be to write characters on a Guile port. All of the uses are of strings.
Please do a "git grep" on your source to see if your perceptions correspond.

Now the question is, what encoding is the argument in? If the port is UTF-16, that byte string should be decoded to characters, and that character sequence encoded to UTF-16. All of the scm_puts calls in Guile are of one-byte characters with codepoints less than 128, so when doing some port refactoring I chose to interpret the argument as latin1.

FTR, in Guile 2.0, this was effectively a binary interface. Guile 2.0's scm_lfwrite interpreted the incoming bytes as ISO-8859-1 codepoints for the purposes of updating line and column, but scm_puts and scm_lfwrite just wrote out the bytes to the port directly, regardless of the encoding. That was the wrong thing.

Are you arguing that the byte string given to scm_puts should be decoded from UTF-8? That would be OK.

Andy
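To make the latin1 question concrete, here is a minimal sketch in plain C of what "interpret the argument as latin1, then encode for a UTF-8 port" would do to a byte string that was already UTF-8. The helper latin1_to_utf8 is hypothetical and is not part of libguile; whether Guile 2.2 performs exactly these steps internally is an assumption based on the description above.

```c
#include <stddef.h>

/* Hypothetical helper, NOT a libguile function: treat each input byte as
   one ISO-8859-1 (latin1) codepoint and encode that codepoint as UTF-8.
   `out` must have room for 2 * n + 1 bytes.  Returns the number of bytes
   written, not counting the terminating NUL.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = in[i];
      if (c < 0x80)
        out[j++] = c;                    /* ASCII passes through unchanged */
      else
        {
          out[j++] = 0xC0 | (c >> 6);    /* lead byte: 0xC2 or 0xC3 */
          out[j++] = 0x80 | (c & 0x3F);  /* continuation byte */
        }
    }
  out[j] = '\0';
  return j;
}
```

Under these assumptions, the two UTF-8 bytes 0xC3 0xA9 (U+00E9) come out as the four bytes 0xC3 0x83 0xC2 0xA9, i.e. the two characters U+00C3 and U+00A9, while arguments whose bytes are all below 128 are unaffected.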