* bug#18520: string ports should not have an encoding

From: David Kastrup @ 2014-09-21 23:34 UTC
To: 18520

In Guile 2.0, at the time a string port is opened, the value of the
fluid %default-port-encoding is used for deciding how to encode the
string into a byte stream, and set-port-encoding! may then be used for
deciding how to decode that byte stream back into characters.

This does not make sense, as ports deliver characters and strings
contain characters. There is no point in going through bytes.

Guile-2.2 does not consult %default-port-encoding but uses UTF-8
consistently (I guess overriding it with set-port-encoding! will again
change that).

That still is not satisfactory. For example, using ftell on the input
port will not report the string index of the string connected to the
string port but rather a byte index into a UTF-8 encoded version of the
string. This is a number that has nothing to do with the original
string and cannot be used for correlating string and port.

Ports fundamentally deliver characters, and so reading and writing from
a string source/sink should not involve _any_ coding system.

Files fundamentally deliver bytes, so a conversion is required. The same
would be the case when opening a port on a _bytevector_. Here an
encoding would equally make sense, and ftell/fseek offsets would
naturally be in bytes. But a port on a string delivers and consumes
characters. Any conversion, even a fixed UTF-8 conversion, will destroy
the predictable nature of with-output-to-string and
with-input-from-string and the respective uses of string ports.

In code like the following, the results should not depend on either the
fluid-set! or the set-port-encoding!, and the ftell should always output
successive integers independent of either fluid-set! or
set-port-encoding!. set-port-encoding! should probably flag an error,
like an fseek on an unseekable device.

    (fluid-set! %default-port-encoding "UTF-8")
    (define s (list->string (map integer->char '(20 200 2000 20000))))
    (with-input-from-string s
      (lambda ()
        (set-port-encoding! (current-input-port) "ISO-8859-1")
        (let loop ((ch (read-char (current-input-port))))
          (if (not (eof-object? ch))
              (begin
                (format #t "~d, pos=~d\n"
                        (char->integer ch) (ftell (current-input-port)))
                (loop (read-char (current-input-port))))))))

Again, things are quite different for bytevectors, which could be
accepted instead of a string for opening ports with the string-port
commands, or could have their own port open/close commands; the
respective ports would then definitely want to obey set-port-encoding!
(defaulting to %default-port-encoding) for _decoding_ the bytevector.

I don't know what R7RS might think here. But for me, associating an
encoding with connecting strings to ports does not make sense. The
relation is one of characters to characters.

--
David Kastrup
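To see where byte positions of this kind come from without building a particular Guile version, here is a small C++ sketch (C++ being LilyPond's core language; all function names are hypothetical and not part of Guile's API). For the four code points 20, 200, 2000 and 20000 used above, it computes the position reached after each character twice: once in bytes of a UTF-8 encoding, once in characters. The byte positions come out as 1, 3, 5, 8 and the character positions as 1, 2, 3, 4; the former are the numbers a byte-based ftell on a UTF-8 re-encoded string port would report.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of bytes UTF-8 needs for one Unicode scalar value.
std::size_t utf8_length(char32_t cp) {
    assert(cp <= 0x10FFFF);  // outside the Unicode range
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Byte offset reached after reading each character, i.e. what a byte-based
// ftell would report on a port that silently re-encodes the string as UTF-8.
std::vector<std::size_t> utf8_offsets_after_each_char(const std::u32string& s) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    for (char32_t cp : s) {
        pos += utf8_length(cp);
        offsets.push_back(pos);
    }
    return offsets;
}

// Character index reached after reading each character: simply 1, 2, 3, ...
std::vector<std::size_t> char_offsets_after_each_char(const std::u32string& s) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < s.size(); ++i) offsets.push_back(i + 1);
    return offsets;
}
```

Only the second sequence can be used directly as an index into the original string.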
* From: Ludovic Courtès @ 2014-09-22 11:54 UTC
  To: David Kastrup; Cc: 18520

This has been addressed in two ways:

  1. In 2.0, (srfi srfi-6) uses Unicode-capable string ports (commit
     ecb48dc.)

  2. In 2.2, string ports are always Unicode-capable, and
     ‘%default-port-encoding’ is ignored (commit 6dce942.)

So for 2.0, the workaround is to either use (srfi srfi-6), or force
‘%default-port-encoding’ to "UTF-8".

HTH,
Ludo’.
* From: David Kastrup @ 2014-09-22 13:09 UTC
  To: Ludovic Courtès; Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> This has been addressed in two ways:

No, it hasn't.

>   1. In 2.0, (srfi srfi-6) uses Unicode-capable string ports (commit
>      ecb48dc.)

This issue report is not about adding more optional functionality on
top. It is about _removing_ unwarranted redirection and complication
from existing core functionality. The artifacts of making
with-input-from-string and with-output-to-string go through an
additional character->bytevector->character encoding/decoding layer are
not invisible.

>   2. In 2.2, string ports are always Unicode-capable, and
>      ‘%default-port-encoding’ is ignored (commit 6dce942.)

String ports should not be "Unicode capable" but transparent.
Characters in, characters out. ftell/fseek should be based on character
positions in strings rather than offsets in a magically created byte
stream of some particular encoding.

> So for 2.0, the workaround is to either use (srfi srfi-6), or force
> ‘%default-port-encoding’ to "UTF-8".

And the former amounts to nothing more than the latter. It still
interprets set-port-encoding! with respect to a byte stream, and it
still calculates positions according to a byte stream unrelated to
string positions:

    (use-modules (srfi srfi-6))
    (define s (list->string (map integer->char '(20 200 2000 20000))))
    (let ((port (open-input-string s)))
      (let loop ((ch (read-char port)))
        (if (not (eof-object? ch))
            (begin
              (format #t "~d, pos=~d\n" (char->integer ch) (ftell port))
              (loop (read-char port))))))

    20, pos=1
    200, pos=3
    2000, pos=5
    20000, pos=8

Tying string ports to an artificial bytevector representation in a
manner that bleeds through like that means that it is not possible to
synchronize string positions and stream positions when parts of the
source string are _not_ processed from within the stream.

That is precisely the problem I am currently dealing with while porting
LilyPond: it has its own lexer working on a (UTF-8 encoded) byte stream,
which is at the same time available as a string port. Whenever embedded
Scheme is interpreted, the string port is moved to the proper position,
GUILE reads an expression and is told what to do with it, the string
port position is picked off, and the LilyPond lexer is moved to the
respective position to continue.

If you take a look at
<URL:http://git.savannah.gnu.org/cgit/lilypond.git/tree/scm/parser-ly-from-scheme.scm>,
ftell on a string port is used there for correlating the positions of
parsed subexpressions with the original data. Re-encoding strings in
UTF-8 is not going to make this work with string indexing, since ftell
does not bear a useful relation to string positions.

The behavior of ftell and port-encoding is perfectly fine for reading
from bytevectors or files, and reading from bytevectors or files also
does not incur an encode-when-open action governed by
%default-port-encoding in GUILE-2.0 and by hardwired UTF-8 in GUILE-2.2.
But strings are already decoded characters. Re-encoding makes no sense
and detaches things like ftell and fseek from the actual input into the
port.

--
David Kastrup
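The semantics David asks for, a string port whose position is a character index, can be modelled in a few lines of C++. This is a hypothetical sketch of the proposal, not Guile's implementation or API: the port keeps the already-decoded characters plus an index into them, so `tell()` is always a valid string index and there is no byte representation for an encoding setting to act on.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of a "transparent" string port: characters in,
// characters out, and every position is a character index into the source.
class CharStringInputPort {
public:
    explicit CharStringInputPort(std::u32string source) : src_(std::move(source)) {}

    // Analogue of read-char: next character, or std::nullopt at end of input.
    std::optional<char32_t> read_char() {
        if (pos_ >= src_.size()) return std::nullopt;
        return src_[pos_++];
    }

    // Analogue of ftell, but counted in characters, so the result can be
    // used directly to index the original string.
    std::size_t tell() const { return pos_; }

    // Analogue of an absolute fseek, also counted in characters.
    void seek(std::size_t pos) {
        assert(pos <= src_.size());
        pos_ = pos;
    }

private:
    std::u32string src_;
    std::size_t pos_ = 0;
};
```

Under this model the loop in the example above would report positions 1, 2, 3, 4 whatever %default-port-encoding says, because no encoded byte stream ever exists.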
* From: Ludovic Courtès @ 2014-09-22 12:21 UTC
  To: David Kastrup; Cc: 18520

I see my reply failed to address some of the points raised.

David Kastrup <dak@gnu.org> skribis:

> Guile-2.2 does not consult %default-port-encoding but uses UTF-8
> consistently (I guess overriding it with set-port-encoding! will again
> change that).
>
> That still is not satisfactory. For example, using ftell on the input
> port will not report the string index of the string connected to the
> string port but rather a byte index into a UTF-8 encoded version of the
> string. This is a number that has nothing to do with the original
> string and cannot be used for correlating string and port.

Right.

> Ports fundamentally deliver characters, and so reading and writing from
> a string source/sink should not involve _any_ coding system.
>
> Files fundamentally deliver bytes, so a conversion is required. The same
> would be the case when opening a port on a _bytevector_. Here an
> encoding would equally make sense, and ftell/fseek offsets would
> naturally be in bytes. But a port on a string delivers and consumes
> characters. Any conversion, even a fixed UTF-8 conversion, will destroy
> the predictable nature of with-output-to-string and
> with-input-from-string and the respective uses of string ports.

Guile ports can be mixed textual/binary (unlike R6RS ports, which are
either textual or binary). Thus, they fundamentally deliver bytes,
possibly with a textual conversion.

Although the manual isn’t clear about it, ‘ftell’, when available,
returns a position in bytes.

The situation for string ports here is comparable to that of other
ports used for textual I/O.

Do you have a situation where you were relying on 1.8’s behavior in
that regard? Could we see whether this can be solved differently?

Thanks,
Ludo’.
* From: David Kastrup @ 2014-09-22 13:34 UTC
  To: Ludovic Courtès; Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>
>> Guile-2.2 does not consult %default-port-encoding but uses UTF-8
>> consistently (I guess overriding it with set-port-encoding! will again
>> change that).
>>
>> That still is not satisfactory. For example, using ftell on the input
>> port will not report the string index of the string connected to the
>> string port but rather a byte index into a UTF-8 encoded version of the
>> string. This is a number that has nothing to do with the original
>> string and cannot be used for correlating string and port.
>
> Right.
>
>> Ports fundamentally deliver characters, and so reading and writing from
>> a string source/sink should not involve _any_ coding system.
>>
>> Files fundamentally deliver bytes, so a conversion is required. The same
>> would be the case when opening a port on a _bytevector_. Here an
>> encoding would equally make sense, and ftell/fseek offsets would
>> naturally be in bytes. But a port on a string delivers and consumes
>> characters. Any conversion, even a fixed UTF-8 conversion, will destroy
>> the predictable nature of with-output-to-string and
>> with-input-from-string and the respective uses of string ports.
>
> Guile ports can be mixed textual/binary (unlike R6RS ports, which are
> either textual or binary). Thus, they fundamentally deliver bytes,
> possibly with a textual conversion.

I think that is a mischaracterization. GUILE ports at the current point
in time can _only_ be binary, to the degree that strings/texts first
have to be encoded into a binary stream before they can be passed
through a port. Which is what this issue is about.

> Although the manual isn’t clear about it, ‘ftell’, when available,
> returns a position in bytes.

Which is not helpful if the input does not consist of bytes.

> The situation for string ports here is comparable to that of other
> ports used for textual I/O.

No. The situation for file ports is that ftell refers to identifiable
and reproducible byte offsets of the input, the input being a file
consisting of bytes and indexed using bytes.

The situation for string ports is that ftell refers to unidentifiable
and incidental byte offsets of a temporary, inaccessible ad-hoc encoding
of the input, the input being a string consisting of characters and
indexed using characters.

> Do you have a situation where you were relying on 1.8’s behavior in
> that regard? Could we see whether this can be solved differently?

I'm currently migrating LilyPond over to GUILE 2.0. LilyPond has its
own UTF-8 verification, error flagging, processing and indexing. I have
more than enough crashes and obscure errors to contend with as it
stands, so the first port will use LC_CTYPE=C (LC_CTYPE=ISO-8859-1 does
not work, since then GUILE/iconv considers itself entitled to complain
about improper Latin-1) and will keep GUILE 2.0 from thinking about
UTF-8 at all.

Moving string processing to UTF-8 will be a gradual process, and a
separate project involving programmer choices about what to represent
where and how: much of LilyPond is written in C++, and so UTF-8 encoded
strings (rather than GUILE's strings, which are stored as either Latin-1
or UCS-4) are ubiquitous, with most of LilyPond's core literals fitting
in the common ASCII subset.

Whenever GUILE chooses to take decisions away from the user and
programmer, problems are likely to result, and workarounds will abound.
For efficiency reasons, it is not realistic to demand that all string
data passed between GUILE and LilyPond be encoded and re-encoded at
every call gate: there are a great many of them.

--
David Kastrup
* From: Ludovic Courtès @ 2014-09-22 17:08 UTC
  To: David Kastrup; Cc: 18520

David Kastrup <dak@gnu.org> skribis:

> I'm currently migrating LilyPond over to GUILE 2.0. LilyPond has its
> own UTF-8 verification, error flagging, processing and indexing.

Do I understand correctly that LilyPond expects Guile strings to be byte
vectors, which it can feed with UTF-8 byte sequences that it built by
itself?

> If you take a look at
> <URL:http://git.savannah.gnu.org/cgit/lilypond.git/tree/scm/parser-ly-from-scheme.scm>,
> ftell on a string port is used there for correlating the positions of
> parsed subexpressions with the original data. Re-encoding strings in
> UTF-8 is not going to make this work with string indexing, since ftell
> does not bear a useful relation to string positions.

AIUI the result of ‘ftell’ is used in only one place, while
‘port-line’ and ‘port-column’ are used in other places. The latter
seem more appropriate to me when it comes to tracking source location.

How is the result of ‘ftell’ used by callers of ‘read-lily-expression’?

> I have more than enough crashes and obscure errors to contend with as
> it stands,

Could you open a separate bug with the backtrace of such crashes, if you
think it may be Guile’s fault?

Thanks,
Ludo’.
* From: David Kastrup @ 2014-09-22 17:20 UTC
  To: Ludovic Courtès; Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>
>> I'm currently migrating LilyPond over to GUILE 2.0. LilyPond has its
>> own UTF-8 verification, error flagging, processing and indexing.
>
> Do I understand correctly that LilyPond expects Guile strings to be byte
> vectors, which it can feed with UTF-8 byte sequences that it built by
> itself?

Not really. LilyPond reads and parses its own files, but it diverts
parts through GUILE occasionally in the process. Some material is
passed through GUILE with time delays, with parts wrapped into closures
and flagged with machine-identifiable source locations.

>> If you take a look at
>> <URL:http://git.savannah.gnu.org/cgit/lilypond.git/tree/scm/parser-ly-from-scheme.scm>,
>> ftell on a string port is used there for correlating the positions of
>> parsed subexpressions with the original data. Re-encoding strings in
>> UTF-8 is not going to make this work with string indexing, since ftell
>> does not bear a useful relation to string positions.
>
> AIUI the result of ‘ftell’ is used in only one place, while
> ‘port-line’ and ‘port-column’ are used in other places.

The ftell information is wrapped into an alist together with a closure
corresponding to the source location. At a later point in time, the
surrounding string may be interpreted, the source location is
correlated with the closure, and the closure is used instead of a call
to local-eval (which does not have the same power of evaluating material
in a preserved lexical environment as a closure has).

> The latter seem more appropriate to me when it comes to tracking
> source location.

For error messages, yes. For associating a position in a string with a
previously parsed closure, no.

> How is the result of ‘ftell’ used by callers of
> ‘read-lily-expression’?

See above.

>> I have more than enough crashes and obscure errors to contend with as
>> it stands,
>
> Could you open a separate bug with the backtrace of such crashes, if you
> think it may be Guile’s fault?

The backtraces are usually quite useless for diagnosing the crashes.
For example, there are crashes in scm_sloppy_assq. If you look at the
code, it is clear that they can only happen for pairs that have already
been reclaimed by garbage collection. So the bug occurred well before
the crash, and one has to figure out how the collection could possibly
have happened (naturally, it didn't with GUILE 1.8). You can try doing
that with the rather expensive process of "reverse execution" (which
basically traces and keeps a history you can then explore backwards
from the crash), but that requires the bugs to be reproducible, and with
collection in a separate thread, that is not really the case. Sometimes
a crash segfaults; more often a std::exception is triggered. All with
the same input and executable.

--
David Kastrup
* From: Ludovic Courtès @ 2014-09-22 20:39 UTC
  To: David Kastrup; Cc: 18520

David Kastrup <dak@gnu.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> David Kastrup <dak@gnu.org> skribis:
>>
>>> I'm currently migrating LilyPond over to GUILE 2.0. LilyPond has its
>>> own UTF-8 verification, error flagging, processing and indexing.
>>
>> Do I understand correctly that LilyPond expects Guile strings to be byte
>> vectors, which it can feed with UTF-8 byte sequences that it built by
>> itself?
>
> Not really. LilyPond reads and parses its own files, but it diverts
> parts through GUILE occasionally in the process. Some material is
> passed through GUILE with time delays, with parts wrapped into closures
> and flagged with machine-identifiable source locations.

OK.

>>> If you take a look at
>>> <URL:http://git.savannah.gnu.org/cgit/lilypond.git/tree/scm/parser-ly-from-scheme.scm>,
>>> ftell on a string port is used there for correlating the positions of
>>> parsed subexpressions with the original data. Re-encoding strings in
>>> UTF-8 is not going to make this work with string indexing, since ftell
>>> does not bear a useful relation to string positions.
>>
>> AIUI the result of ‘ftell’ is used in only one place, while
>> ‘port-line’ and ‘port-column’ are used in other places.
>
> The ftell information is wrapped into an alist together with a closure
> corresponding to the source location. At a later point in time, the
> surrounding string may be interpreted, the source location is
> correlated with the closure, and the closure is used instead of a call
> to local-eval (which does not have the same power of evaluating material
> in a preserved lexical environment as a closure has).
>
>> The latter seem more appropriate to me when it comes to tracking
>> source location.
>
> For error messages, yes. For associating a position in a string with a
> previously parsed closure, no.

But wouldn’t a line/column pair be as suitable a unique identifier as
the position in the file?

Also, if the result of ‘ftell’ is used as a unique identifier, does it
really matter whether it’s an offset measured in bytes or in
characters?

(Trying to make sure I understand the problem.)

Thanks,
Ludo’.
* From: David Kastrup @ 2014-09-22 22:12 UTC
  To: Ludovic Courtès; Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>>
>> For error messages, yes. For associating a position in a string with a
>> previously parsed closure, no.
>
> But wouldn’t a line/column pair be as suitable a unique identifier as
> the position in the file?

Only as long as the re-encoded UTF-8 is byte-identical to the original.
At the current point in time, we flag non-UTF-8 sequences with a warning
and continue. People previously complained about things like Latin-1
characters (most likely to occur in comments or lyrics, where they cause
little or well-identifiable havoc) leading to unceremonious aborts
without identifiable cause.

At any rate, the current behavior does not make sense. Guile 2.0 might
refuse to turn a string into a port, and in Guile 2.2 the port encoding
may be used to have a UTF-8 rendition of the string's characters
interpreted in another encoding (like Latin-1), but not the other way
round. Both versions make only half-baked sense.

Most resulting problems can probably be worked around in some manner,
but string ports are actually the main stringbuf-like mechanism that
Scheme has (dynamically growing strings that are more compact than a
list of characters). Wedging a compulsory code conversion into them
that is mirrored in the port positions seems like a distraction.

> Also, if the result of ‘ftell’ is used as a unique identifier, does it
> really matter whether it’s an offset measured in bytes or in
> characters?

In the LilyPond lexer, things are usually measured with byte offsets.
Yes, one can certainly parse the UTF-8 character distances and hope to
arrive at the same results as the UTF-8 re-encoding. But the point of
GUILE's character set support was not really to make everything more
complicated, was it?

--
David Kastrup
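The workaround mentioned above, recovering character distances by parsing UTF-8, amounts to counting lead bytes. The following C++ sketch uses hypothetical helper names (nothing here is LilyPond or Guile code) and assumes well-formed UTF-8, which is exactly the assumption LilyPond cannot make while it only warns about invalid input. It converts a byte offset into a character index and back, using the UTF-8 bytes of the string from the earlier examples (code points 20, 200, 2000, 20000, encoded as 14 C3 88 DF 90 E4 B8 A0).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A byte is a UTF-8 continuation byte iff its top two bits are 10.
bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Hypothetical helper: number of characters that start before byte offset
// `byte_off` in well-formed UTF-8 text.
std::size_t char_index_from_byte_offset(const std::string& utf8, std::size_t byte_off) {
    assert(byte_off <= utf8.size());
    std::size_t chars = 0;
    for (std::size_t i = 0; i < byte_off; ++i)
        if (!is_continuation(static_cast<unsigned char>(utf8[i]))) ++chars;
    return chars;
}

// Hypothetical helper for the inverse direction: byte offset at which
// character number `char_idx` (0-based) starts; the text's size if
// `char_idx` equals the number of characters.
std::size_t byte_offset_from_char_index(const std::string& utf8, std::size_t char_idx) {
    std::size_t chars = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (!is_continuation(static_cast<unsigned char>(utf8[i]))) {
            if (chars == char_idx) return i;
            ++chars;
        }
    }
    assert(chars == char_idx);  // otherwise char_idx is out of range
    return utf8.size();
}
```

Both directions are linear scans over the text, which is part of the extra cost such a workaround adds on top of the re-encoding itself.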
* From: Ludovic Courtès @ 2014-09-23 08:25 UTC
  To: David Kastrup; Cc: 18520

David Kastrup <dak@gnu.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> David Kastrup <dak@gnu.org> skribis:
>>>
>>> For error messages, yes. For associating a position in a string with a
>>> previously parsed closure, no.
>>
>> But wouldn’t a line/column pair be as suitable a unique identifier as
>> the position in the file?
>
> Only as long as the re-encoded UTF-8 is byte-identical to the original.

Sorry, what do you mean by “re-encoded UTF-8”? The internal string port
buffer?

Line/column info remains identical regardless of the encoding, so I tend
to think it’s more robust to use that.

Thanks,
Ludo’.
* From: David Kastrup @ 2014-09-23 09:00 UTC
  To: Ludovic Courtès; Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> David Kastrup <dak@gnu.org> skribis:
>>>>
>>>> For error messages, yes. For associating a position in a string with a
>>>> previously parsed closure, no.
>>>
>>> But wouldn’t a line/column pair be as suitable a unique identifier as
>>> the position in the file?
>>
>> Only as long as the re-encoded UTF-8 is byte-identical to the original.
>
> Sorry, what do you mean by “re-encoded UTF-8”? The internal string port
> buffer?

Sure. That's where ftell gets its information from.

> Line/column info remains identical regardless of the encoding, so I tend
> to think it’s more robust to use that.

Column info remains identical regardless of the encoding? Since when?

--
David Kastrup
* From: Ludovic Courtès @ 2014-09-23 09:45 UTC
  To: David Kastrup; Cc: 18520

David Kastrup <dak@gnu.org> skribis:

>> Line/column info remains identical regardless of the encoding, so I tend
>> to think it’s more robust to use that.
>
> Column info remains identical regardless of the encoding? Since when?

The character on line L and column M is always there, regardless of
whether the file is encoded in UTF-8, Latin-1, etc.

Would that work for LilyPond?

Ludo’.
* From: David Kastrup @ 2014-09-23 11:54 UTC
  To: Ludovic Courtès; Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>
>>> Line/column info remains identical regardless of the encoding, so I tend
>>> to think it’s more robust to use that.
>>
>> Column info remains identical regardless of the encoding? Since when?
>
> The character on line L and column M is always there, regardless of
> whether the file is encoded in UTF-8, Latin-1, etc.
>
> Would that work for LilyPond?

Last time I looked, in the following line x was in column 3 in Latin-1
encoding and in column 2 in UTF-8 encoding:

üx

At any rate, we are missing the point of the issue. The issue is not
whether a workaround can be designed for every way in which GUILE trips
up its users. The question is how GUILE can provide the least amount of
surprise to its users without sacrificing functionality.

GUILE's current implementation uses two character set conversions for
string ports. For input string ports, the first is a batch encoding
when the string port is opened (using %default-port-encoding in
GUILE-2.0 and "UTF-8" in GUILE-2.2); this encoding is set as the port's
encoding (I hope), and then, unless changed, every read operation
employs whatever encoding is current at that time.

Accompanying the opening of a string with an encoding operation (whether
using a forced encoding or %default-port-encoding) is expensive (not
least because everything needs to be decoded again), leads to arbitrary
semantics for port positioning, and is asymmetric, since the port
encoding is only used for reading on an input string and for writing on
an output string.

Oh, and for writing on an input string using unread-string, of course.
No kidding. There is also a conversion in there.

Would it be worth ditching this sort of unnecessary conversion? Well,
just look at:

    commit be7ecef05c1eea66f30360f658c610710c5cb22e
    Author: Andy Wingo <wingo@pobox.com>
    Date:   Sat Aug 31 10:44:07 2013 +0200

        unread-char: inline conversion from codepoint to bytes

        * libguile/ports.c (scm_ungetc_unlocked): Inline the conversion
          from codepoint to bytes for UTF-8 and latin-1 ports. Speeds up
          a numbers-reading test case by 100% (!).

That sounds like quite some gain just for _simplifying_ the
back-and-forth conversion, and we could be forgoing it entirely instead
(yes, peek-char as getc+ungetc presents a challenge in connection with
encoding switches: I think that declaring the first impression of
peek-char as sticky would be reasonable).

At any rate, the above commit looks like it would make a hash of

    (with-input-from-string "Huh\""
      (lambda ()
        (unread-string "\"ä" (current-input-port))
        (read)))

because of a broken character range check (I cannot currently check
with a compilation of master since that takes about a day on my
computer, but I would be surprised if the above worked fine).

So yes, the complexity required to deal with GUILE's current behavior
can introduce problems.

--
David Kastrup
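The commit quoted above concerns turning a code point back into bytes when a character is unread. As a hedged illustration of the range checks such a conversion has to get right, here is a C++ sketch with hypothetical function names; it is not libguile's code and makes no claim about where the suspected bug lies. Each UTF-8 length class covers its own range of code points, surrogates and values above U+10FFFF have no encoding, and Latin-1 can only represent U+0000 through U+00FF.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch: encode one code point for a UTF-8 port.
// Returns std::nullopt for surrogates and for values beyond U+10FFFF.
std::optional<std::string> encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        return std::nullopt;  // surrogates are not characters
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return std::nullopt;  // beyond the Unicode range
    }
    return out;
}

// Hypothetical sketch: encode one code point for a Latin-1 port.
// Only U+0000..U+00FF have a Latin-1 byte; anything larger must be
// rejected (or substituted) rather than truncated to its low byte.
std::optional<std::string> encode_latin1(char32_t cp) {
    if (cp > 0xFF) return std::nullopt;
    return std::string(1, static_cast<char>(cp));
}
```

On a transparent string port, as proposed in this report, neither function would be needed: unreading a character would just move an index back.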
* From: Ludovic Courtès @ 2014-09-23 12:13 UTC
  To: David Kastrup; Cc: 18520

David Kastrup <dak@gnu.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> David Kastrup <dak@gnu.org> skribis:
>>
>>>> Line/column info remains identical regardless of the encoding, so I tend
>>>> to think it’s more robust to use that.
>>>
>>> Column info remains identical regardless of the encoding? Since when?
>>
>> The character on line L and column M is always there, regardless of
>> whether the file is encoded in UTF-8, Latin-1, etc.
>>
>> Would that work for LilyPond?
>
> Last time I looked, in the following line x was in column 3 in Latin-1
> encoding and in column 2 in UTF-8 encoding:
>
> üx

I’m not sure what you mean. This line contains two characters: ‘u’ with
umlaut followed by ‘x’. ‘ü’ is in the first column, and ‘x’ in the
second column.

If we get a different column number, that means we’re looking at a
different line. It could be because the encoding of the input port from
which that line was read was incorrectly specified. That is the issue
that would need to be fixed.

Is there a simple way to reproduce the issue with LilyPond?

Thanks,
Ludo’.
* bug#18520: string ports should not have an encoding
From: David Kastrup @ 2014-09-23 13:02 UTC
To: Ludovic Courtès; +Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup <dak@gnu.org> skribis:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> David Kastrup <dak@gnu.org> skribis:
>>>
>>>>> Line/column info remains identical regardless of the encoding, so I tend
>>>>> to think it’s more robust to use that.
>>>>
>>>> Column info remains identical regardless of the encoding? Since when?
>>>
>>> The character on line L and column M is always there, regardless of
>>> whether the file is encoded in UTF-8, Latin-1, etc.
>>>
>>> Would that work for LilyPond?
>>
>> Last time I looked, in the following line x was in column 3 in latin-1
>> encoding and in column 2 in utf-8 encoding:
>>
>> üx
>
> I’m not sure what you mean. This line contains two characters: ‘u’ with
> umlaut followed by ‘x’. ‘ü’ is in the first column, and ‘x’ in the
> second column.

It contains three bytes: 0xc3, 0xbc, 0x78. In UTF-8, this is üx; in
Latin-1 it is Ã¼x. This whole issue is about string ports _not_ being
represented in terms of characters but bytes.

> Is there a simple way to reproduce the issue with LilyPond?

This issue is at best marginally about LilyPond, in that the semantics
chosen for GUILE-2.0 (and switched again in GUILE-2.2) are both
surprising and a source of headaches. They result in code like

  // we do our own utf8 encoding and verification in the parser, so we
  // use the no-conversion equivalent of latin1
  SCM str = scm_from_latin1_string (c_str ());
  scm_dynwind_begin ((scm_t_dynwind_flags)0);
  // Why doesn't scm_set_port_encoding_x work here?
  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
  str_port_ = scm_open_input_string (str);
  scm_dynwind_end ();
  scm_set_port_filename_x (str_port_, ly_string2scm (name_));
}

which will, incidentally, stop working in GUILE-2.2, at which time
another workaround will be found.

GUILE is an extension language. It cannot take the stance that any kind
of dealing with characters/strings that is not under the control of
GUILE and its character model is simply inappropriate. It is not the
job of GUILE to dictate how an application has to organize matters
internally. For that reason, its behavior needs to be straightforward
and unsurprising. That includes sane boundaries between strings as
character vectors, byte vectors, and encoding and decoding operations.

Going through a byte-based encoding when copying a character-based
string to a string, even when going through a string port, does not
make sense. As a sign that this does not make sense, the effects of
%default-port-encoding and set-port-encoding! on input and output
string ports are asymmetric. More so in GUILE-2.2 than in GUILE-2.0,
but already in GUILE-2.0. That inconsistency (and its effects on
overall performance) is what this issue is about.

That I am tripping all over GUILE in the course of working with
LilyPond is at best incidental to this issue. I could equally well be
tripping over it when working with TeXmacs.

I am not going to reply further to this issue since this is _not_, I
repeat _not_, some complaint that I am too stupid to understand what
GUILE is doing here. I understand it perfectly well, and I am perfectly
able to hack around GUILE's deficiencies and inconsistencies. One
consequence of design problems like this is that the semantics chosen
under such a fundamental design problem are arbitrary and thus more
likely to change to different semantics in future versions. That means
a higher likelihood of future maintenance.

If I am going to have to redo this for GUILE-2.2 anyway, I prefer doing
it in a sane manner that will stick around for good. I don't see that
here. That does not mean that I am too stupid to work with the GUILE
2.0 behavior or the GUILE 2.2 behavior or the GUILE 1.8 behavior (in
fact, the first port to GUILE 2 will set LC_CTYPE to C and just stick
with GUILE 1.8 behavior, but that's not a long-term perspective since
working with characters rather than bytes as string constituents _is_
nicer for the user).

-- 
David Kastrup
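The LilyPond snippet quoted above relies on Latin-1 acting as a "no-conversion" encoding. A minimal C sketch of why that works, independent of Guile (`latin1_decode` and `latin1_encode` are hypothetical helpers): Latin-1 maps each byte to the code point with the same numeric value, so any byte string decodes and re-encodes unchanged, and the UTF-8 bytes of "üx" arrive as the three characters U+00C3 U+00BC U+0078, left for the application's own UTF-8 decoder to interpret.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latin-1 (ISO-8859-1) maps byte value b to code point U+00b, so
   decoding is a plain widening copy and no input is ever rejected.
   That is what makes it usable as a "no-conversion" encoding. */
void latin1_decode(const unsigned char *in, size_t len, uint32_t *out)
{
    assert(len == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];
}

/* Inverse mapping.  Returns 0 on success, or -1 if some code point is
   above U+00FF and therefore has no Latin-1 representation; OUT may
   then be partially written. */
int latin1_encode(const uint32_t *in, size_t len, unsigned char *out)
{
    assert(len == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < len; i++) {
        if (in[i] > 0xFF)
            return -1;
        out[i] = (unsigned char) in[i];
    }
    return 0;
}
```

The trade-off is that a character count over the decoded data is a byte count of the original UTF-8, which is exactly the mismatch between strings and positions that the thread is arguing about.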
* bug#18520: string ports should not have an encoding
From: Ludovic Courtès @ 2014-09-23 16:01 UTC
To: David Kastrup; +Cc: 18520

David Kastrup <dak@gnu.org> skribis:

> They result in code like
>
>   // we do our own utf8 encoding and verification in the parser, so we
>   // use the no-conversion equivalent of latin1
>   SCM str = scm_from_latin1_string (c_str ());
>   scm_dynwind_begin ((scm_t_dynwind_flags)0);
>   // Why doesn't scm_set_port_encoding_x work here?
>   scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>   str_port_ = scm_open_input_string (str);
>   scm_dynwind_end ();
>   scm_set_port_filename_x (str_port_, ly_string2scm (name_));
> }

So here ‘c_str’ returns a char * that is a UTF-8-encoded string, right?

In that case, it should be enough to do:

  /* Get a Scheme string from its UTF-8 representation.  */
  str = scm_from_utf8_string (c_str ());

  /* Create an input string port.  ‘read-char’ & co. will return each
     character from STR, one at a time.  */
  str_port = scm_open_input_string (str);
  scm_set_port_filename_x (str_port, file);

As long as textual I/O procedures are used on ‘str_port’, there’s no
need to worry about its encoding.

Now, to be able to use ‘ftell’ and assume it returns the position as a
number of bytes in the UTF-8 sequence, something like this should work
(for 2.0; for 2.2 nothing special is needed):

  /* Get a Scheme string from its UTF-8 representation.  */
  str = scm_from_utf8_string (c_str ());

  scm_dynwind_begin (0);

  /* Make sure the following string port uses UTF-8 as the internal
     encoding of its buffer.  scm_c_public_ref takes C strings and
     returns the value of the variable, i.e., the fluid itself.  */
  scm_dynwind_fluid (scm_c_public_ref ("guile", "%default-port-encoding"),
                     scm_from_latin1_string ("UTF-8"));

  /* Create an input string port.  ‘read-char’ & co. will return each
     character from STR, one at a time.  */
  str_port = scm_open_input_string (str);
  scm_dynwind_end ();

  scm_set_port_filename_x (str_port, file);

Does this help for LilyPond?

Ludo’.
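The second snippet above makes ‘ftell’ count bytes of the UTF-8 sequence rather than characters. The plain-C sketch below (hypothetical function names; it assumes valid code points and valid UTF-8) shows the mapping an application then needs in order to relate such byte offsets to string indices. For the string from the original report, with code points 20, 200, 2000 and 20000, the byte offsets after each character are 1, 3, 5 and 8, while the string indices are 1, 2, 3 and 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs for code point CP. */
static size_t utf8_length(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Byte offset a UTF-8-backed port would report after the first
   CHAR_INDEX characters of CHARS have been read. */
size_t utf8_offset_of_index(const uint32_t *chars, size_t char_index)
{
    size_t offset = 0;
    for (size_t i = 0; i < char_index; i++)
        offset += utf8_length(chars[i]);
    return offset;
}

/* Inverse: character index for a byte offset into valid UTF-8 data,
   found by counting the bytes that start a character. */
size_t utf8_index_of_offset(const unsigned char *utf8, size_t offset)
{
    size_t index = 0;
    for (size_t i = 0; i < offset; i++)
        if ((utf8[i] & 0xC0) != 0x80)
            index++;
    return index;
}
```

Both directions need a linear scan over the data, which is the practical cost of reporting byte positions for character data.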
* bug#18520: string ports should not have an encoding
From: David Kastrup @ 2014-09-23 16:21 UTC
To: Ludovic Courtès; +Cc: 18520

ludo@gnu.org (Ludovic Courtès) writes:

> Does this help for LilyPond?

I stated quite definitely that I am perfectly capable of dealing with
the mess GUILE made of string ports. The issue is that I should not
have to, nor should anybody else.

This issue _is_ _not_ _about_ _LilyPond_. Working on LilyPond merely
shines a light on it.

So please stop painting this as a request for help. It isn't. It is a
request for change. The subject line is "string ports should not have
an encoding". It isn't "help, I don't understand string ports".

-- 
David Kastrup
* bug#18520: string ports should not have an encoding
From: Ludovic Courtès @ 2014-09-23 19:33 UTC
To: David Kastrup; +Cc: 18520

David Kastrup <dak@gnu.org> skribis:

> I stated quite definitely that I am perfectly capable of dealing with
> the mess GUILE made of string ports.

Good to know; this was not my understanding until now.

The intent of the change in 2.2 is to hide the very fact that string
ports “have an encoding.” So from that viewpoint, that bug is closed.

If the bug is about ‘ftell’, that’s a different story. I would tend to
suggest that ‘ftell’ and ‘seek’ for string ports operate on an abstract
notion of position within the string port data. This is in fact the
path that the R6RS takes:

  For a binary port, the port-position procedure returns the index of
  the position at which the next byte would be read from or written to
  the port as an exact non-negative integer object. For a textual port,
  port-position returns a value of some implementation-dependent type
  representing the port's position; this value may be useful only as
  the pos argument to set-port-position!, if the latter is supported on
  the port (see below).

Thus, I would suggest a clarification along these lines:

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 02d92a2..8331378 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -443,8 +443,12 @@ open.
 @deffn {Scheme Procedure} seek fd_port offset whence
 @deffnx {C Function} scm_seek (fd_port, offset, whence)
 Sets the current position of @var{fd_port} to the integer
-@var{offset}, which is interpreted according to the value of
-@var{whence}.
+@var{offset}. For a file port, @var{offset} is expressed
+as a number of bytes; for other types of ports, such as string
+ports, @var{offset} is an abstract representation of the
+position within the port's data, not necessarily expressed
+as a number of bytes. @var{offset} is interpreted according to
+the value of @var{whence}.

 One of the following variables should be supplied for
 @var{whence}:
@@ -460,7 +464,7 @@ Seek from the end of the file.
 If @var{fd_port} is a file descriptor, the underlying system
 call is @code{lseek}. @var{port} may be a string port.

-The value returned is the new position in the file. This means
+The value returned is the new position in @var{fd_port}. This means
 that the current position of a port can be obtained using:
 @lisp
 (seek port 0 SEEK_CUR)

Thoughts?

Thanks,
Ludo’.
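For comparison with the abstract-position wording in the patch, here is a minimal C sketch of a string port whose positions are simply character indices, the semantics Kastrup argues for. All names are hypothetical and this is not Guile's implementation. `char_port_seek` keeps the contract documented for ‘seek’: it interprets the offset according to whence and returns the new position, so seeking by 0 from SEEK_CUR still yields the current position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical read-only string port over an array of code points.
   Positions are plain character indices, so they coincide with the
   indices of the underlying string. */
struct char_port {
    const uint32_t *chars;
    size_t len;
    size_t pos;
};

/* Returns the next code point, or -1 at end of data. */
long char_port_read(struct char_port *port)
{
    assert(port != NULL);
    if (port->pos >= port->len)
        return -1;
    return (long) port->chars[port->pos++];
}

/* Interprets OFFSET relative to WHENCE and returns the new position,
   or -1 (leaving the position unchanged) if it would fall outside
   [0, len] or WHENCE is unknown. */
long char_port_seek(struct char_port *port, long offset, int whence)
{
    assert(port != NULL);
    long base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long) port->pos; break;
    case SEEK_END: base = (long) port->len; break;
    default: return -1;
    }
    long target = base + offset;
    if (target < 0 || (size_t) target > port->len)
        return -1;
    port->pos = (size_t) target;
    return target;
}
```

Under the R6RS wording quoted above, either choice is permitted for textual ports; the difference is only whether the returned integer means anything beyond being passed back to set-port-position!.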
* bug#18520: string ports should not have an encoding
From: Mark H Weaver @ 2014-09-24 5:30 UTC
To: David Kastrup; +Cc: 18520

David Kastrup <dak@gnu.org> writes:

> In Guile 2.0, at the time a string port is opened, the value of the
> fluid %default-port-encoding is used for deciding how to encode the
> string into a byte stream, [...]

I agree that this was a mistake. The issue is fixed on the master
branch.

> Ports fundamentally deliver characters, and so reading and writing from
> a string source/sink should not involve _any_ coding system.

David, you know as well as I that internally, there is always a coding
system. Strings have a coding system too, even if it's UCS-4. Emacs
uses something based on UTF-8, and I'd like Guile to do something
similar in the future.

I guess you don't like the fact that it is possible to expose the
internal representation via 'set-port-encoding!', 'ftell' or 'seek'.
I don't see this as a problem, and arguably it's a benefit.

First I'll address the non-standard 'set-port-encoding!'. As you say,
it doesn't even make sense on string ports, and arguably should be an
error. So why do you care if some internal details leak out when you
do this nonsensical thing? Admittedly, we're missing an opportunity to
report a possible bug to the user, but that's the only problem I see
here.

Regarding 'ftell' and 'seek', it's not entirely clear to me what's the
best representation of those positions. In some situations, I guess it
would be convenient for them to count Unicode code points or string
indices. In other situations, I could imagine it being more convenient
for them to count grapheme clusters or UTF-8 bytes.

R6RS, the only Scheme standard that supports getting or setting file
positions, gives us complete freedom to choose our representation of
positions on textual ports. The R6RS is explicit that they don't even
have to be integers, and if they are, they don't have to correspond to
bytes or characters.

For better or for worse, Guile's ports are fundamentally based on
bytes, and allow mixed binary and textual operations on all ports.
Sometimes this is very helpful, for example when implementing HTTP. I
can think of one other case where it's very helpful:

I don't know how deeply you've looked at UTF-8, but it has some unusual
properties that allow many (most?) string algorithms to be most
naturally (and efficiently) implemented by operating on bytes rather
than code points. Much of the time, you don't even have to be aware of
the code point boundaries, which is a great savings. Efficient lookup
tables based on bytes are also much cheaper than ones based on code
points, etc.

In fact, I intend to propose that in a future version of Guile, strings
will not only be based on UTF-8 internally, but that this fact should be
exposed in the API, allowing users to implement UTF-8 string operations
that operate on bytes not code points. I'd also like lightweight, fast
string ports that allow access to these bytes when desired.

This leads me to believe that it's a feature, not a bug, that string
ports use UTF-8 internally, and that it's possible (via non-standard
extensions) to get access to the underlying bytes.

      Mark
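Mark's point about byte-oriented UTF-8 algorithms rests on UTF-8 being self-synchronizing: continuation bytes are distinguishable from lead bytes, so the encoding of one character never starts in the middle of another's. The C sketch below (hypothetical helper names, assuming valid UTF-8 input) shows two consequences: counting characters reduces to counting non-continuation bytes, and substring search can use the standard byte-wise `strstr` unchanged. It also shows the step Kastrup objects to exposing: once a result must be reported as a character index, the byte offset has to be converted back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of characters in the valid UTF-8 bytes [begin, end): every
   byte that is not a continuation byte (10xxxxxx) starts a character. */
static size_t count_char_starts(const char *begin, const char *end)
{
    size_t n = 0;
    for (const char *p = begin; p < end; p++)
        if (((unsigned char) *p & 0xC0) != 0x80)
            n++;
    return n;
}

/* Character count of a NUL-terminated, valid UTF-8 string. */
size_t utf8_count_chars(const char *s)
{
    return count_char_starts(s, s + strlen(s));
}

/* Character index of the first occurrence of NEEDLE in HAYSTACK, or -1.
   A byte-level match of a valid UTF-8 needle inside valid UTF-8 text
   always starts and ends on character boundaries, so plain strstr needs
   no decoding.  Only the final conversion to a character index has to
   look at boundaries. */
long utf8_find(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    if (hit == NULL)
        return -1;
    return (long) count_char_starts(haystack, hit);
}
```

For example, searching for "x" in the UTF-8 bytes of "aüx" finds a match at byte offset 3, which corresponds to character index 2.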
* bug#18520: string ports should not have an encoding
From: David Kastrup @ 2014-09-24 12:00 UTC
To: Mark H Weaver; +Cc: 18520

Mark H Weaver <mhw@netris.org> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> In Guile 2.0, at the time a string port is opened, the value of the
>> fluid %default-port-encoding is used for deciding how to encode the
>> string into a byte stream, [...]
>
> I agree that this was a mistake. The issue is fixed on the master
> branch.

The mistake is having a string port use a different
sequence-of-character encoding than a string.

>> Ports fundamentally deliver characters, and so reading and writing
>> from a string source/sink should not involve _any_ coding system.
>
> David, you know as well as I that internally, there is always a coding
> system. Strings have a coding system too, even if it's UCS-4. Emacs
> uses something based on UTF-8, and I'd like Guile to do something
> similar in the future.
>
> I guess you don't like the fact that it is possible to expose the
> internal representation via 'set-port-encoding!', 'ftell' or 'seek'.
> I don't see this as a problem, and arguably it's a benefit.

Shrug. That arguable benefit went down in flames in Emacs 20. It
triggered the last great migration of Emacs users to XEmacs. It took
until Emacs 20.4 for the horrible mistake of exposing byte offsets to
the user in either strings or buffers to be corrected.

You write above "Emacs uses something based on UTF-8", and it's worth
pointing out that it does so starting with Emacs 23. Previously Emacs
used its own peculiar multibyte encoding that existed long before
UTF-8. The important thing to note is that it was _completely_ hidden
from Elisp users once the Emacs 20 tribulations were over.

Emacs was able to swap out this multibyte encoding for the Emacs 23
coding rather transparently, and the main reason to do that was to make
UTF-8 a favored encoding regarding performance of encoding/decoding and
processing of Elisp source files.

Emacs' internal encoding is not proper UTF-8. You can take a random
byte string, tell Emacs that it is encoded in UTF-8, and decode it into
Emacs' internal representation. All passages that happen to be proper,
uniquely represented UTF-8 will pass the transcoding unchanged, but
everything else will be transcoded into a UTF-8-like representation of
"unencodable byte". I think Emacs uses the UTF-8 forbidden code points
from 0xd800 to 0xd880 for encoding stray bytes, or something like that.
So if you reencode the unchanged "UTF-8" Emacs uses internally, the
result will again faithfully reproduce the random byte stream. Garbage
in, _same_ garbage out. This is a very important property that many of
Emacs' supported file encodings share. Notable exceptions are various
Japanese encodings based on escape characters.

At any rate, unless you are using explicit conversions like
string-as-unibyte or _encoding_ to Emacs' internal representation (it
is available as a named coding system), the representation is not
exposed. Strings are indexed per character, and buffers (which are at
their heart random-access string ports) are indexed per character.

Emacs has both unibyte and multibyte strings and unibyte and multibyte
buffers, and unibyte strings and buffers are the source for decoding
into, and the target for encoding from, multibyte strings and buffers.

XEmacs does not have unibyte strings/buffers, so a lot of string
internals do not need to make the distinction. GUILE could probably get
away without unibyte strings as well because it has bytevectors. This
would imply that if you wanted to do stuff akin to string operations on
unibyte strings, you'd have to first convert bytevectors to multibyte
strings, do your operations, and convert back.

XEmacs chose _not_ to have unibyte strings (and the corresponding
complications to support both in the primitives); Emacs chose to have
them. I think both approaches are defensible. Since GUILE presents
itself as an extension language and since strings will need to get
passed in and out of extension languages all the time, the
implementation cost of offering a low-cost unibyte string is probably
even more defensible than with Elisp, where Elisp is the main
processing language.

> First I'll address the non-standard 'set-port-encoding!'. As you say,
> it doesn't even make sense on string ports, and arguably should be an
> error. So why do you care if some internal details leak out when you
> do this nonsensical thing? Admittedly, we're missing an opportunity
> to report a possible bug to the user, but that's the only problem I
> see here.
>
> Regarding 'ftell' and 'seek', it's not entirely clear to me what's the
> best representation of those positions. In some situations, I guess
> it would be convenient for them to count Unicode code points or string
> indices. In other situations, I could imagine it being more
> convenient for them to count grapheme clusters or UTF-8 bytes.
>
> R6RS, the only Scheme standard that supports getting or setting file
> positions, gives us complete freedom to choose our representation of
> positions on textual ports. The R6RS is explicit that they don't even
> have to be integers, and if they are, they don't have to correspond to
> bytes or characters.

R6RS gives you the freedom to match your semantics to your
implementation. String ports are strings-in-progress (and Emacs buffers
are strings-in-progress on steroids), so it makes sense to match the
fseek/ftell semantics of string ports to those of strings and the
implementation to those of strings. You don't have anything to gain
from converting characters to bytes and back just because you can.

> For better or for worse, Guile's ports are fundamentally based on
> bytes,

Seriously? The whole point of this issue was that fundamentally basing
GUILE's string ports on bytes is for worse.

> and allow mixed binary and textual operations on all ports.

I'll go out on a limb here and state "they don't". They work with bytes
(either located in a file or in some internally generated or consumed
byte vector), they input/output characters on their Scheme side, and
you can change the en/decoding system with which characters are put
into the stream or consumed. Their external side is identical to their
internal side, and the Scheme/character/string side is fundamentally
different. By changing the port encoding, you can change the conversion
between Scheme on the one side and internal/external on the other. All
operations are binary on the internal side, and textual on the Scheme
side. That there are encodings which are less costly does not
fundamentally change this.

> Sometimes this is very helpful, for example when implementing HTTP. I
> can think of one other case where it's very helpful:
>
> I don't know how deeply you've looked at UTF-8,

It is a somewhat safe bet that a person who is the head maintainer of
an application conversing in UTF-8 while using GUILE-1.8 in its
internals has had some basic amount of exposure to UTF-8. In general,
the working assumption "David just has little clue about computing" is
rarely helpful for dismissing matters, since David tends to have picked
up tidbits occasionally since he started computing on systems where
lowercase letters already needed a multi-sextet representation in their
60-bit words. So it is a reasonably safe bet that when David has some
problems with matters, chances are that a non-negligible percentage of
other users will not fare significantly better, so it is a somewhat
relevant indicator of what to avoid.

> but it has some unusual properties that allow many (most?) string
> algorithms to be most naturally (and efficiently) implemented by
> operating on bytes rather than code points. Much of the time, you
> don't even have to be aware of the code point boundaries, which is a
> great savings. Efficient lookup tables based on bytes are also much
> cheaper than ones based on code points, etc.

That's all very nice but totally irrelevant for this issue. If you like
UTF-8, by all means base the internal string representation of GUILE on
it. It comes at a cost since strings in Scheme are writable (and there
are more operations for doing so than in Elisp) and indexed by
character. Emacs has paid this cost: I think the basic speed of Emacs
dropped by a factor of 2 when indexing was moved from bytes to
characters around Emacs 20.2 or similar.

But this issue is about not using different internal coding and
exposed interfaces for strings and string ports. Whatever internal
string representation you choose, it does not make sense to pick a
different representation and indexing for string ports.

> In fact, I intend to propose that in a future version of Guile,
> strings will not only be based on UTF-8 internally, but that this fact
> should be exposed in the API, allowing users to implement UTF-8 string
> operations that operate on bytes not code points.

This experiment was tried and crashed and burnt with the initial MULE
versions in Emacs 20. Current versions _do_ offer the conversion-less
reinterpretations string-as-unibyte and string-as-multibyte and offer
working with either string type. As explained, that comes at the cost
of having to make all primitives able to work with either. They are
actually rarely used by application-level programmers, so most
applications do not have this as a porting problem between Emacs and
XEmacs (XEmacs has only multibyte strings).

Personally, I'd consider that worth the cost in the case of GUILE.
While XEmacs gets along without this addition, it seems important for
efficient passing of data in and out of GUILE. It would also make sense
to distinguish between multibyte (internal form of UTF-8, anything may
happen if it is not properly formed) and external UTF-8 (reading/writing
it uses a conversion process turning all illegal UTF-8 bytes into some
reproducible representation).

> I'd also like lightweight, fast string ports that allow access to
> these bytes when desired.

Any string port that does not involve encoding/decoding will be
lightweight and fast, lighter and faster than any implementation having
to encode/decode gratuitously. That is one of the points of this issue,
even though I am more concerned with the conceptual cost than the
runtime cost. But both have an impact.

> This leads me to believe that it's a feature, not a bug, that string
> ports use UTF-8 internally, and that it's possible (via non-standard
> extensions) to get access to the underlying bytes.

Getting confused about bytes and characters and introducing unnecessary
conversions is not a feature. Even if at some point you use a
UTF-8-based string representation, working with external UTF-8 will
involve encoding/decoding processes. Forcing a string port to
encode/decode during operation will remain expensive. Exposing string
internals beyond quite special-purpose functions will be hard to deal
with.

All those lessons have already been learnt with Emacs. If you want to
relearn them from scratch, the available developer power will not make
basing Emacs on GUILE realistic in the next 10 years: Emacs
fundamentally operates with texts. Too many reliability or efficiency
problems in doing that (or having to implement them as foreign
datatypes altogether) will make Guilemacs unacceptable.

So even in cases where multiple strategies are feasible, it may make
sense to lean towards Emacs' choices. One choice that has served Emacs
well is to hide its internal encoding system well from the external
ones. That way its switch to an internal coding system based on UTF-8
affected almost no existing Elisp packages, and the programming model
was conceptually clean.

-- 
David Kastrup
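The "garbage in, same garbage out" property and the "external UTF-8" conversion described above can be sketched in C. This is a hypothetical illustration, not Emacs or Guile code. Each byte that is not part of a valid, shortest-form UTF-8 sequence is decoded to a reserved code point, and the encoder turns those code points back into the original bytes, so re-encoding reproduces arbitrary input byte for byte. The reserved range used here, U+DC80 to U+DCFF, follows the lone-surrogate scheme Python calls "surrogateescape". It is an assumption for this sketch, not necessarily the range Emacs uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme: stray byte B (0x80..0xFF) becomes ESCAPE_BASE + B,
   i.e. U+DC80..U+DCFF.  Decoding valid UTF-8 never yields a lone
   surrogate, so escaped bytes cannot be confused with real text. */
#define ESCAPE_BASE 0xDC00u

/* Decode LEN arbitrary bytes into OUT (room for LEN code points).
   Only shortest-form, non-surrogate sequences up to U+10FFFF are
   accepted; any other byte is escaped on its own.  Returns the number
   of code points written. */
size_t decode_utf8_lossless(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i];
        size_t need;
        uint32_t cp, min;
        if (b < 0x80) {                       /* ASCII: one byte */
            out[n++] = b;
            i++;
            continue;
        }
        if ((b & 0xE0) == 0xC0)      { need = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
        else {                                /* stray continuation or bad lead */
            out[n++] = ESCAPE_BASE + b;
            i++;
            continue;
        }
        size_t k = 1;
        while (k <= need && i + k < len && (in[i + k] & 0xC0) == 0x80) {
            cp = (cp << 6) | (in[i + k] & 0x3F);
            k++;
        }
        if (k <= need || cp < min || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out[n++] = ESCAPE_BASE + b;       /* escape the lead byte only */
            i++;                              /* and rescan from the next byte */
        } else {
            out[n++] = cp;
            i += need + 1;
        }
    }
    return n;
}

/* Encode back: escaped code points become their original byte, all
   others get their standard UTF-8 form.  OUT needs up to 4 bytes per
   code point.  Returns the number of bytes written. */
size_t encode_utf8_lossless(const uint32_t *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= ESCAPE_BASE + 0x80 && cp <= ESCAPE_BASE + 0xFF) {
            out[j++] = (unsigned char) (cp - ESCAPE_BASE);
        } else if (cp < 0x80) {
            out[j++] = (unsigned char) cp;
        } else if (cp < 0x800) {
            out[j++] = (unsigned char) (0xC0 | (cp >> 6));
            out[j++] = (unsigned char) (0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[j++] = (unsigned char) (0xE0 | (cp >> 12));
            out[j++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[j++] = (unsigned char) (0x80 | (cp & 0x3F));
        } else {
            assert(cp <= 0x10FFFF);
            out[j++] = (unsigned char) (0xF0 | (cp >> 18));
            out[j++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[j++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[j++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
    return j;
}
```

Rejecting overlong forms and encoded surrogates in the decoder is what makes the round trip exact. If the decoder accepted, say, 0xC0 0xAF as ‘/’, the encoder would write it back as the single byte 0x2F, and the original bytes would be lost.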
Thread overview: 20+ messages
2014-09-21 23:34 bug#18520: string ports should not have an encoding David Kastrup
2014-09-22 11:54 ` Ludovic Courtès
2014-09-22 13:09 ` David Kastrup
2014-09-22 12:21 ` Ludovic Courtès
2014-09-22 13:34 ` David Kastrup
2014-09-22 17:08 ` Ludovic Courtès
2014-09-22 17:20 ` David Kastrup
2014-09-22 20:39 ` Ludovic Courtès
2014-09-22 22:12 ` David Kastrup
2014-09-23 8:25 ` Ludovic Courtès
2014-09-23 9:00 ` David Kastrup
2014-09-23 9:45 ` Ludovic Courtès
2014-09-23 11:54 ` David Kastrup
2014-09-23 12:13 ` Ludovic Courtès
2014-09-23 13:02 ` David Kastrup
2014-09-23 16:01 ` Ludovic Courtès
2014-09-23 16:21 ` David Kastrup
2014-09-23 19:33 ` Ludovic Courtès
2014-09-24 5:30 ` Mark H Weaver
2014-09-24 12:00 ` David Kastrup