unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
From: David Kastrup <dak@gnu.org>
To: ludo@gnu.org (Ludovic Courtès)
Cc: 18520@debbugs.gnu.org
Subject: bug#18520: string ports should not have an encoding
Date: Tue, 23 Sep 2014 00:12:58 +0200	[thread overview]
Message-ID: <87tx3zjod1.fsf@fencepost.gnu.org> (raw)
In-Reply-To: <87d2anl79a.fsf@gnu.org> ("Ludovic Courtès"'s message of "Mon, 22 Sep 2014 22:39:29 +0200")

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>>
>> For error messages, yes.  For associating a position in a string with a
>> previously parsed closure, no.
>
> But wouldn’t a line/column pair be as suitable as a unique identifier as
> the position in the file?

As long as the reencoded UTF-8 is byte-identical to the original.  At
the current point of time, we flag non-UTF-8 sequences with a warning
and continue.

People complained previously about things like Latin-1 characters (most
likely to occur in comments or lyrics where they cause little or
well-identifiable havoc) leading to unceremonious aborts without
identifiable cause.

At any rate, the current behavior does not make sense.  Guile 2.0 might
refuse to turn a string into a port, and for Guile 2.2 the port encoding
may be used to have a UTF-8 rendition of the string characters be
interpreted in another encoding (like latin-1) but not the other way
round.

Both versions make only some half-baked sense.  Most resulting problems
can probably be worked around in some manner, but string ports are
actually the main stringbuf-like mechanism that Scheme has (dynamically
growing strings that are more compact than a list of characters).
Wedging a compulsory code conversion into it that is mirrored in the
port positions seems like a distraction.

> Also, if the result of ‘ftell’ is used as a unique identifier, does it
> really matter whether it’s an offset measured in bytes or in
> character?

In the LilyPond lexer, stuff is usually measured with byte offsets.
Yes, one can certainly parse the UTF-8 character distances and hope to
arrive at the same results as the UTF-8 reencoding.

But the point of GUILE's character set support was not really to make
everything more complicated, was it?

-- 
David Kastrup





  reply	other threads:[~2014-09-22 22:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-21 23:34 bug#18520: string ports should not have an encoding David Kastrup
2014-09-22 11:54 ` Ludovic Courtès
2014-09-22 13:09   ` David Kastrup
2014-09-22 12:21 ` Ludovic Courtès
2014-09-22 13:34   ` David Kastrup
2014-09-22 17:08     ` Ludovic Courtès
2014-09-22 17:20       ` David Kastrup
2014-09-22 20:39         ` Ludovic Courtès
2014-09-22 22:12           ` David Kastrup [this message]
2014-09-23  8:25             ` Ludovic Courtès
2014-09-23  9:00               ` David Kastrup
2014-09-23  9:45                 ` Ludovic Courtès
2014-09-23 11:54                   ` David Kastrup
2014-09-23 12:13                     ` Ludovic Courtès
2014-09-23 13:02                       ` David Kastrup
2014-09-23 16:01                         ` Ludovic Courtès
2014-09-23 16:21                           ` David Kastrup
2014-09-23 19:33                             ` Ludovic Courtès
2014-09-24  5:30 ` Mark H Weaver
2014-09-24 12:00   ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tx3zjod1.fsf@fencepost.gnu.org \
    --to=dak@gnu.org \
    --cc=18520@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).