From: David Kastrup <dak@gnu.org>
To: ludo@gnu.org (Ludovic Courtès)
Cc: 18520@debbugs.gnu.org
Subject: bug#18520: string ports should not have an encoding
Date: Tue, 23 Sep 2014 15:02:54 +0200
Message-ID: <87d2amjxq9.fsf@fencepost.gnu.org>
In-Reply-To: <87tx3yjzzw.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 23 Sep 2014 14:13:55 +0200")

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> David Kastrup <dak@gnu.org> skribis:
>>>
>>>>> Line/column info remains identical regardless of the encoding, so I tend
>>>>> to think it’s more robust to use that.
>>>>
>>>> Column info remains identical regardless of the encoding?  Since when?
>>>
>>> The character on line L and column M is always there, regardless of
>>> whether the file is encoded in UTF-8, Latin-1, etc.
>>>
>>> Would that work for LilyPond?
>>
>> Last time I looked, in the following line x was in column 3 in latin-1
>> encoding and in column 2 in utf-8 encoding:
>>
>> üx
>
> I’m not sure what you mean.  This line contains two characters: ‘u’ with
> umlaut followed by ‘x’.  ‘ü’ is in the first column, and ‘x’ in the
> second column.

It contains three bytes: 0xc3, 0xbc, 0x78.  In utf-8, this is üx, in
Latin-1 it is Ã¼x.

This whole issue is about string ports _not_ being represented in terms
of characters but bytes.
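
To make the column difference concrete, here is a minimal C++ sketch.
It is not LilyPond or GUILE code: the helper names (utf8_length,
column_latin1, column_utf8) are made up for illustration, and the UTF-8
counter assumes well-formed input.  It only shows that the same three
bytes put x in a different column depending on the encoding used to
interpret them.

```cpp
#include <cstddef>
#include <string>

// Latin-1: every byte is exactly one character.
std::size_t latin1_length(const std::string& bytes) {
    return bytes.size();
}

// UTF-8 (assumed well-formed): count every byte that is not a
// continuation byte (continuation bytes look like 10xxxxxx).
std::size_t utf8_length(const std::string& bytes) {
    std::size_t n = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}

// 1-based column of the character ending at byte `offset`:
// the number of characters in the prefix up to and including that byte.
std::size_t column_latin1(const std::string& bytes, std::size_t offset) {
    return latin1_length(bytes.substr(0, offset + 1));
}

std::size_t column_utf8(const std::string& bytes, std::size_t offset) {
    return utf8_length(bytes.substr(0, offset + 1));
}
```

With the line above stored as std::string line = "\xc3\xbc\x78", the
byte for x sits at offset 2; column_latin1 (line, 2) yields 3 and
column_utf8 (line, 2) yields 2.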

> Is there a simple way to reproduce the issue with LilyPond?

This issue is at best marginally about LilyPond, in that the semantics
chosen for GUILE-2.0 (and switched again in GUILE-2.2) are both
surprising and a source of headaches.

They result in code like

  // we do our own utf8 encoding and verification in the parser, so we
  // use the no-conversion equivalent of latin1
  SCM str = scm_from_latin1_string (c_str ());
  scm_dynwind_begin ((scm_t_dynwind_flags)0);
  // Why doesn't scm_set_port_encoding_x work here?
  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
  str_port_ = scm_open_input_string (str);
  scm_dynwind_end ();
  scm_set_port_filename_x (str_port_, ly_string2scm (name_));
}

which will, incidentally, stop working in GUILE-2.2 at which time
another workaround will be found.

GUILE is an extension language.  The stance that any handling of
characters/strings must be under the control of GUILE and its
character model is simply inappropriate.  It is not the job of GUILE to
dictate how an application has to organize matters internally.  For that
reason, its behavior needs to be straightforward and unsurprising.  That
includes sane boundaries between strings as character vectors, byte
vectors, and encoding and decoding operations.  Going through a
byte-based encoding when copying a character-based string to a string,
even when going through a string port, does not make sense.
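
As a rough model of why the detour through bytes matters, consider the
following C++ sketch.  It does not reproduce GUILE's port
implementation; encode_latin1, decode_latin1, copy_via_bytes and
copy_direct are hypothetical helpers, and substituting '?' for
unrepresentable characters is an assumption made for the example.  It
contrasts copying a character string directly with copying it through
a byte encoding that cannot represent every character.

```cpp
#include <string>

// Hypothetical Latin-1 encoder: code points above U+00FF have no
// Latin-1 byte, so this sketch substitutes '?' for them.
std::string encode_latin1(const std::u32string& chars) {
    std::string bytes;
    for (char32_t c : chars) {
        bytes.push_back(c <= 0xFF
                            ? static_cast<char>(static_cast<unsigned char>(c))
                            : '?');
    }
    return bytes;
}

// Latin-1 decoder: each byte maps to the code point of the same value.
std::u32string decode_latin1(const std::string& bytes) {
    std::u32string chars;
    for (unsigned char b : bytes) {
        chars.push_back(b);
    }
    return chars;
}

// Copy a character string by way of a byte encoding.
std::u32string copy_via_bytes(const std::u32string& chars) {
    return decode_latin1(encode_latin1(chars));
}

// Copy a character string as characters, with no encoding step.
std::u32string copy_direct(const std::u32string& chars) {
    return chars;
}
```

In this model copy_direct always returns its input unchanged, while
copy_via_bytes only does so for strings that fit the byte encoding: a
string such as U"\u20ACx" comes back as U"?x".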

As a sign that this does not make sense, the effects of
%default-port-encoding and set-port-encoding! on input and output string
ports are asymmetric.  More so in GUILE-2.2 than in GUILE-2.0, but
already in GUILE-2.0.

That inconsistency (and its effects on overall performance) is what this
issue is about.  That I am tripping all over GUILE in the course of
working with LilyPond is at best incidental to this issue.  I could
equally well be tripping over it when working with TeXmacs.

I am not going to further reply to this issue since this is _not_,
I repeat _not_ some complaint that I am too stupid to understand what
GUILE is doing here.  I understand it perfectly well, and I am perfectly
able to hack around GUILE's deficiencies and inconsistencies.  One
consequence of design problems like this is that the chosen semantics
under such a fundamental design problem are arbitrary and thus more
likely to change to different semantics in future versions.  That means
a higher likelihood of future maintenance.  When I am going to have to
redo this for GUILE-2.2 anyway, I prefer doing it in a sane manner that
will stick around for good.

I don't see that here.  That does not mean that I am too stupid to work
with the GUILE 2.0 behavior or the GUILE 2.2 behavior or the GUILE 1.8
behavior (in fact, the first port to GUILE 2 will set LC_CTYPE to C and
just stick with GUILE 1.8 behavior, but that's not a long-term
perspective since working with characters rather than bytes as string
constituents _is_ nicer for the user).

-- 
David Kastrup




