From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: Re: string port encodings
Date: Thu, 31 Jan 2013 12:04:56 +0100
Message-ID: <87mwvpvciv.fsf@pobox.com>
References: <87wqvems2s.fsf@pobox.com> <87hamhf7yl.fsf@gnu.org> <877gndgj5t.fsf@pobox.com> <87wqvdc9lg.fsf@gnu.org> <87txqhdmdj.fsf@pobox.com>
In-Reply-To: <87txqhdmdj.fsf@pobox.com> (Andy Wingo's message of "Wed, 16 Jan 2013 19:16:24 +0100")
To: ludo@gnu.org (Ludovic Courtès)
Cc: guile-devel@gnu.org

Hi,

On Wed 16 Jan 2013 19:16, Andy Wingo writes:

> On Wed 16 Jan 2013 18:37, ludo@gnu.org (Ludovic Courtès) writes:
>
>> I just think [string port encodings] may have to wait until 2.2.
>
> Oh yes, agreed here.  Anyway, let's let it simmer for a while.  Another
> two or three of these threads should be enough to either reaffirm or
> change the current state of things :)

OK, that was simmering long enough ;)  I just merged stable-2.0 to
master.  There is now a failing test:

    (pass-if-equal '(*TOP* (foo "\xA0"))
        (xml->sxml "<foo>&nbsp;</foo>"
                   #:entities '((nbsp . "\xA0"))))

It fails with:

    (encoding-error "scm_to_stringn"
                    "cannot convert narrow string to output locale"
                    84 #f #f)

It passes in stable-2.0 because "ASCII" is erroneously treated the same
as "ISO-8859-1".  In master, attempting to write a character above
#\x7F to an ASCII port causes an encoding error, which seems more
correct than the 2.0 behavior.  The same error would have happened in
stable-2.0 if I had chosen an entity with a character above #\xFF.

Looking further, the cause is in sxml/upstream/SSAX.scm:

    (define (ssax:handle-parsed-entity port name entities
                                       content-handler str-handler seed)
      ...
      (call-with-input-string ent-body
        (lambda (port) (content-handler port new-entities seed)))
      ...)

Here is where I think this code goes wrong: its correctness appears to
depend on the default port encoding.  That is totally bogus; the code
was written long before we had such a thing.

Again, I think the default encoding for a string port should be one that
can represent all characters, and we should change this in master.

Andy
-- 
http://wingolog.org/