unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw@netris.org>
Cc: guile-devel@gnu.org
Subject: Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)
Date: Wed, 03 Apr 2013 22:11:32 +0200	[thread overview]
Message-ID: <87a9pfml1n.fsf@gnu.org> (raw)
In-Reply-To: <874nfnza5g.fsf@tines.lan> (Mark H. Weaver's message of "Wed, 03 Apr 2013 15:28:27 -0400")

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>> Woow, well thought out.  The semantics seem good.  (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me.  The test suite is nice.  It doesn’t seem to
>> cover all the corner cases listed above, but that can be added later on
>> perhaps?
>
> Yes, the tests are still a work-in-progess, but I've added quite a few
> more since you last looked.

Nice.

>> Perhaps the text above could be added to the manual,
>
> In the attached patch, I've added a new node to the "Input and Output"
> section.

Perfect.

>>> +{
>>> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
>>> +  int result;
>>> +  int i = 0;
>>> +
>>> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>>> +    {
>>> +      pt->read_pos++;
>>> +      i++;
>>> +    }
>>> +
>>> +  result = (i == len);
>>> +
>>> +  while (i > 0)
>>> +    scm_unget_byte (bytes[--i], port);
>>> +
>>> +  return result;
>>> +}
>>
>> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?
>
> Yes.  Bytes are only consumed if are equal to bytes[i], so an EOF will
> never be consumed or passed to scm_unget_byte.
>
>> What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?
>
> All of those details are handled by 'scm_peek_byte_or_eof', which is
> guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
> (if not EOF).  Therefore, it's always safe to increment that pointer by
> one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
> returned non-EOF.
>
> Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
> Also note that you did the same thing in 'scm_utf8_codepoint' :)

Ah yes, indeed.

[...]

> Here's the new patch.  Any more suggestions?

Not from me!  OK to commit as far as I’m concerned.

Thank you!

Ludo’.



  reply	other threads:[~2013-04-03 20:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-03 10:44 [PATCH] Improve handling of Unicode byte-order marks (BOMs) Mark H Weaver
2013-04-03 11:47 ` Mark H Weaver
2013-04-03 11:58 ` Ludovic Courtès
2013-04-03 19:28   ` Mark H Weaver
2013-04-03 20:11     ` Ludovic Courtès [this message]
2013-04-03 20:33       ` Mark H Weaver
2013-04-03 20:48         ` Mike Gran
2013-04-03 22:24           ` Mark H Weaver
2013-04-04  5:59             ` Mark H Weaver
2013-04-04 20:50         ` Andy Wingo
2013-04-05  7:30           ` Mark H Weaver
2013-04-05  7:42             ` Mike Gran
2013-04-05 10:04               ` Ludovic Courtès
2013-04-05 18:15                 ` Mark H Weaver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a9pfml1n.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=guile-devel@gnu.org \
    --cc=mhw@netris.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).