From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)
Date: Wed, 03 Apr 2013 22:11:32 +0200
Message-ID: <87a9pfml1n.fsf@gnu.org>
References: <87ip43zyf0.fsf@tines.lan> <87r4irq0zp.fsf@gnu.org> <874nfnza5g.fsf@tines.lan>
In-Reply-To: <874nfnza5g.fsf@tines.lan> (Mark H. Weaver's message of "Wed, 03 Apr 2013 15:28:27 -0400")
To: Mark H Weaver
Cc: guile-devel@gnu.org

Mark H Weaver wrote:

> ludo@gnu.org (Ludovic Courtès) writes:
>> Wow, well thought out.  The semantics seem good.  (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me.  The test suite is nice.  It doesn’t seem to
>> cover all the corner cases listed above, but that can be added later on
>> perhaps?
>
> Yes, the tests are still a work in progress, but I've added quite a few
> more since you last looked.

Nice.

>> Perhaps the text above could be added to the manual,
>
> In the attached patch, I've added a new node to the "Input and Output"
> section.

Perfect.

>>> +{
>>> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
>>> +  int result;
>>> +  int i = 0;
>>> +
>>> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>>> +    {
>>> +      pt->read_pos++;
>>> +      i++;
>>> +    }
>>> +
>>> +  result = (i == len);
>>> +
>>> +  while (i > 0)
>>> +    scm_unget_byte (bytes[--i], port);
>>> +
>>> +  return result;
>>> +}
>>
>> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?
>
> Yes.  Bytes are only consumed if they are equal to bytes[i], so an EOF will
> never be consumed or passed to scm_unget_byte.
>
>> What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?
>
> All of those details are handled by 'scm_peek_byte_or_eof', which is
> guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
> (if not EOF).  Therefore, it's always safe to increment that pointer by
> one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
> returned non-EOF.
>
> Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
> Also note that you did the same thing in 'scm_utf8_codepoint' :)

Ah yes, indeed.

[...]

> Here's the new patch.  Any more suggestions?

Not from me!  OK to commit as far as I’m concerned.

Thank you!

Ludo’.
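
The peek-then-consume pattern discussed above can be sketched outside of Guile. The following is only an illustration under stated assumptions: `struct byte_stream`, `stream_peek`, `stream_unget`, and `stream_looking_at_bytes` are hypothetical names for a simple in-memory stream, not Guile's port API, and the buffering details that 'scm_peek_byte_or_eof' handles (refilling, saved_read_buf) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory byte stream: a stand-in for a port,
   not Guile's actual scm_t_port.  */
struct byte_stream
{
  const unsigned char *buf;     /* underlying bytes */
  size_t len;                   /* number of bytes in buf */
  size_t pos;                   /* index of the next unread byte */
};

#define STREAM_EOF (-1)

/* Return the next byte without consuming it, or STREAM_EOF.  */
int
stream_peek (const struct byte_stream *s)
{
  return s->pos < s->len ? s->buf[s->pos] : STREAM_EOF;
}

/* Push back the most recently consumed byte.  */
void
stream_unget (struct byte_stream *s)
{
  assert (s->pos > 0);
  s->pos--;
}

/* Return 1 if the next LEN bytes of S equal BYTES, else 0.
   The stream position is left unchanged in both cases.  */
int
stream_looking_at_bytes (struct byte_stream *s,
                         const unsigned char *bytes, size_t len)
{
  size_t i = 0;
  int result;

  /* A byte is consumed only after peeking confirms that it matches.
     STREAM_EOF is negative, so it never equals a byte value and is
     never consumed.  */
  while (i < len && stream_peek (s) == bytes[i])
    {
      s->pos++;
      i++;
    }

  result = (i == len);

  /* Undo every consumed byte so the caller sees an untouched stream.  */
  while (i > 0)
    {
      stream_unget (s);
      i--;
    }

  return result;
}
```

With a stream holding the UTF-8 BOM bytes 0xEF 0xBB 0xBF followed by text, `stream_looking_at_bytes` with those three bytes should report a match and leave `pos` at 0. If the stream ends after two bytes, the loop stops at the EOF peek, the two consumed bytes are pushed back, and the result is 0.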