Re: uc_tolower (uc_toupper (x))

unofficial mirror of guile-devel@gnu.org 

* Re: uc_tolower (uc_toupper (x))
@ 2011-03-11  0:54 Mike Gran
  2011-03-11 22:33 ` Using libunistring for string comparisons et al Mark H Weaver
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Gran @ 2011-03-11  0:54 UTC
  To: Mark H Weaver, guile-devel@gnu.org

> From:Mark H Weaver <mhw@netris.org>
> To:guile-devel@gnu.org
> Cc:
> Sent:Thursday, March 10, 2011 3:39 PM
> Subject:uc_tolower (uc_toupper (x))
> 
> I've noticed that srfi-13.c very frequently does:
> 
>   uc_tolower (uc_toupper (x))
> 
> Is there a good reason to do this instead of:
> 
>   uc_tolower (x)

Unicode defines a case folding algorithm as well as
a data table for case insensitive sorting.  Setting
things to lowercase is a decent approximation of
case folding.  But doing the upper->lower operation picks
up a few more of the corner cases, like U+03C2 GREEK
SMALL LETTER FINAL SIGMA and U+03C3 GREEK SMALL LETTER SIGMA
which are the same letter with different representations,
or U+00B5 MICRO SIGN and U+03BC GREEK SMALL LETTER MU
which are supposed to have the same sort ordering.

Now that we've pulled in all of libunistring, it might
be a good idea to see if it has a complete implementation
of unicode case folding, because upper->lower is also not
completely correct.
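A small C sketch of the point above. `demo_toupper` and `demo_tolower` are hypothetical stand-ins for libunistring's `uc_toupper`/`uc_tolower`. They know only ASCII letters and the sigma/mu code points named in this message, so the sketch illustrates the trick rather than implementing real case folding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for uc_toupper/uc_tolower that know only
   ASCII letters plus the sigma and mu code points discussed above. */
static uint32_t demo_toupper (uint32_t c)
{
  if (c >= 'a' && c <= 'z')
    return c - 'a' + 'A';
  switch (c)
    {
    case 0x03C2:               /* GREEK SMALL LETTER FINAL SIGMA */
    case 0x03C3:               /* GREEK SMALL LETTER SIGMA */
      return 0x03A3;           /* GREEK CAPITAL LETTER SIGMA */
    case 0x00B5:               /* MICRO SIGN */
    case 0x03BC:               /* GREEK SMALL LETTER MU */
      return 0x039C;           /* GREEK CAPITAL LETTER MU */
    default:
      return c;
    }
}

static uint32_t demo_tolower (uint32_t c)
{
  if (c >= 'A' && c <= 'Z')
    return c - 'A' + 'a';
  switch (c)
    {
    case 0x03A3: return 0x03C3;   /* capital sigma -> small sigma */
    case 0x039C: return 0x03BC;   /* capital mu -> small mu */
    default:     return c;
    }
}

/* U+03C2 and U+00B5 are already lowercase, so lowercasing alone
   leaves them distinct from U+03C3 and U+03BC. */
static uint32_t fold_lower_only (uint32_t c)
{
  return demo_tolower (c);
}

/* Going through uppercase first maps each pair onto one code point. */
static uint32_t fold_upper_lower (uint32_t c)
{
  return demo_tolower (demo_toupper (c));
}

/* Self-check of the claims in the message. */
static void check_sigma_and_mu (void)
{
  assert (fold_lower_only (0x03C2) != fold_lower_only (0x03C3));
  assert (fold_upper_lower (0x03C2) == fold_upper_lower (0x03C3));
  assert (fold_upper_lower (0x00B5) == fold_upper_lower (0x03BC));
}
```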

-Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Using libunistring for string comparisons et al
  2011-03-11  0:54 uc_tolower (uc_toupper (x)) Mike Gran
@ 2011-03-11 22:33 ` Mark H Weaver
  2011-03-11 22:36   ` Mark H Weaver
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mark H Weaver @ 2011-03-11 22:33 UTC
  To: Mike Gran; +Cc: guile-devel@gnu.org

Mike Gran <spk121@yahoo.com> writes:
> [...] But doing the upper->lower operation picks
> up a few more of the corner cases, like U+03C2 GREEK
> SMALL LETTER FINAL SIGMA and U+03C3 GREEK SMALL LETTER SIGMA
> which are the same letter with different representations,
> or U+00B5 MICRO SIGN and U+03BC GREEK SMALL LETTER MU
> which are supposed to have the same sort ordering.

Ah, okay.  Makes sense.

> Now that we've pulled in all of libunistring, it might
> be a good idea to see if it has a complete implementation
> of unicode case folding, because upper->lower is also not
> completely correct.

I looked into this.  Indeed, the libunistring documentation mentions
that in some languages (e.g. German), the to_upper and to_lower
conversions cannot be done properly on a per-character basis, because
the number of characters can change.  These operations must be done on an
entire string.  For example:

<http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-2.html>

  (string-upcase "Straße") => "STRASSE"
  (string-foldcase "Straße") => "strasse"
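A per-character `toupper (c) -> c'` interface cannot produce "STRASSE" because the output is longer than the input. Below is a minimal C sketch over UTF-32 code points. It assumes only ASCII letters and U+00DF need handling; a real implementation would use the full Unicode SpecialCasing data, for example through libunistring's `u32_toupper`. The name `upcase_u32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Uppercase the UTF-32 string IN (N code points) into OUT, which has
   room for OUT_CAP code points, and return the number written.  Only
   ASCII letters and U+00DF LATIN SMALL LETTER SHARP S are handled. */
static size_t upcase_u32 (const uint32_t *in, size_t n,
                          uint32_t *out, size_t out_cap)
{
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t c = in[i];
      if (c == 0x00DF)              /* sharp s -> "SS": one in, two out */
        {
          assert (j + 2 <= out_cap);
          out[j++] = 'S';
          out[j++] = 'S';
        }
      else
        {
          assert (j + 1 <= out_cap);
          out[j++] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
        }
    }
  return j;                         /* may exceed n */
}
```

Because the result length is only known after the scan, the caller has to size the output buffer for the worst case, or the function has to allocate.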

libunistring contains all the necessary functions, including
case-insensitive string comparisons.  However, the only string
representations supported by these operations are: UTF-8, UTF-16,
UTF-32, or locale-encoded strings, and for comparisons both strings must
be in the same encoding.

I'm aware that this proposal will be very controversial, but starting in
Guile 2.2, I think we ought to consider storing strings internally in
UTF-8, as is done in Gauche.  This would of course make string-ref and
string-set! into O(n) operations.  However, I claim that any code that
depends on string-ref and string-set! could be better written 


* Re: Using libunistring for string comparisons et al
  2011-03-11 22:33 ` Using libunistring for string comparisons et al Mark H Weaver
@ 2011-03-11 22:36   ` Mark H Weaver
  2011-03-11 23:09   ` Mark H Weaver
  2011-03-12 13:36   ` Ludovic Courtès
  2 siblings, 0 replies; 17+ messages in thread
From: Mark H Weaver @ 2011-03-11 22:36 UTC
  To: Mike Gran; +Cc: guile-devel@gnu.org

Sorry, I accidentally sent out an only partly-written draft message.
Please disregard for now; I will finish writing it later.

     Mark




* Re: Using libunistring for string comparisons et al
  2011-03-11 22:33 ` Using libunistring for string comparisons et al Mark H Weaver
  2011-03-11 22:36   ` Mark H Weaver
@ 2011-03-11 23:09   ` Mark H Weaver
  2011-03-12 13:46     ` Ludovic Courtès
  2011-03-30  9:03     ` Using libunistring for string comparisons et al Andy Wingo
  2011-03-12 13:36   ` Ludovic Courtès
  2 siblings, 2 replies; 17+ messages in thread
From: Mark H Weaver @ 2011-03-11 23:09 UTC
  To: Mike Gran; +Cc: guile-devel

I wrote:
> I'm aware that this proposal will be very controversial, but starting in
> Guile 2.2, I think we ought to consider storing strings internally in
> UTF-8, as is done in Gauche.  This would of course make string-ref and
> string-set! into O(n) operations.  However, I claim that any code that
> depends on string-ref and string-set! could be better written 

I guess I better write at least a few more arguments now, before minds
become hardened against it :)

It's a mistake to think of strings as arrays of characters.  This is an
adequate model for simple scripts, but not for more complex ones.  Even
Gerald Sussman said so in his recent WG1 ballot.

Gerald Sussman wrote in http://trac.sacrideo.us/wg/wiki/WG1BallotSussman
> It is not very good to think of strings as 1-dimensional arrays of
> characters.  What about accents and other hairy stuff? Be afraid!
> Consider the complexity of Unicode!

I claim that any reasonable code which currently uses string-ref and
string-set! could be more cleanly written using string ports or
string-{fold,unfold}{,-right}.
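One property behind this claim is that sequential traversal of a UTF-8 string stays cheap even though indexing does not. Here is a rough C sketch of a fold-style loop. It assumes well-formed UTF-8 input (no validation), and `utf8_next` and `utf8_count` are illustrative names, not Guile or libunistring functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the code point starting at S[*POS] and advance *POS past it.
   Assumes well-formed UTF-8, for brevity. */
static uint32_t utf8_next (const unsigned char *s, size_t *pos)
{
  unsigned char b = s[*pos];
  uint32_t c;
  int extra;
  if (b < 0x80)      { c = b;        extra = 0; }
  else if (b < 0xE0) { c = b & 0x1F; extra = 1; }
  else if (b < 0xF0) { c = b & 0x0F; extra = 2; }
  else               { c = b & 0x07; extra = 3; }
  (*pos)++;
  while (extra-- > 0)
    c = (c << 6) | (s[(*pos)++] & 0x3F);
  return c;
}

/* A left fold over the code points of a UTF-8 string, specialised to
   counting occurrences of TARGET.  Each byte is visited once, so the
   whole traversal is O(n). */
static size_t utf8_count (const unsigned char *s, size_t nbytes,
                          uint32_t target)
{
  size_t pos = 0, count = 0;
  while (pos < nbytes)
    if (utf8_next (s, &pos) == target)
      count++;
  assert (pos == nbytes);           /* input ended on a character boundary */
  return count;
}
```

By contrast, calling a position-based `string-ref` inside the same loop would rescan from the start of the string on every call, giving O(n²) overall.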

Anyway, I don't have time right now to write as persuasive an argument
as I'd like to.  After 2.0.1 perhaps I will try again.

      Mark




* Re: Using libunistring for string comparisons et al
  2011-03-11 23:09   ` Mark H Weaver
@ 2011-03-12 13:46     ` Ludovic Courtès
  2011-03-12 17:28       ` Mark H Weaver
  2011-03-13  4:05       ` O(1) accessors for UTF-8 backed strings Mark H Weaver
  2011-03-30  9:03     ` Using libunistring for string comparisons et al Andy Wingo
  1 sibling, 2 replies; 17+ messages in thread
From: Ludovic Courtès @ 2011-03-12 13:46 UTC
  To: guile-devel

Hello!

Mark H Weaver <mhw@netris.org> writes:

> I claim that any reasonable code which currently uses string-ref and
> string-set! could be more cleanly written using string ports or
> string-{fold,unfold}{,-right}.

I agree, and we should encourage this.  However...

I find Cowan’s proposal for string iteration and the R6RS editors’
response interesting:

  http://www.r6rs.org/formal-comments/comment-235.txt

I also think strings should remain what they currently are, with O(1)
random access.

I think anything beyond that may require a new API, perhaps with a new
data type, which would deserve a SRFI with wider discussion.

Thanks,
Ludo’.


* Re: Using libunistring for string comparisons et al
  2011-03-12 13:46     ` Ludovic Courtès
@ 2011-03-12 17:28       ` Mark H Weaver
  2011-03-13 21:30         ` Ludovic Courtès
  2011-03-13  4:05       ` O(1) accessors for UTF-8 backed strings Mark H Weaver
  1 sibling, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2011-03-12 17:28 UTC
  To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:
> I find Cowan’s proposal for string iteration and the R6RS editors’
> response interesting:
>
>   http://www.r6rs.org/formal-comments/comment-235.txt

Cowan was proposing a complex new API.  I am not, nor did Gauche need one.
An efficient implementation of string ports is all that is needed.

> I also think strings should remain what they currently are, with O(1)
> random access.

I understand your position, and perhaps you are right.

Unfortunately, the alternatives are not pleasant.  We have a bunch of
bugs in our string handling functions.  Currently, our case-insensitive
string comparisons and case conversions are not correct for several
languages including German, according to the R6RS among other things.

We could easily fix these problems by using libunistring, which provides
the operations we need, but only if we use a single string
representation, and one that is supported by libunistring (UTF-8,
UTF-16, or UTF-32).

So, our options appear to be:

  * Use only wide strings internally.

  * Reimplement several complex functions from libunistring within guile
    (string comparisons and case conversions).

  * Convert strings to a libunistring-supported representation, and
    possibly back again, on each operation.  For example, this will be
    needed when comparing two narrow strings, when comparing a narrow
    string to a wide string, or when applying a case conversion to a
    narrow string.

Our use of two different internal string representations is another
problem.  Right now, our string comparisons are painfully inefficient.
Take a look at compare_strings in srfi-13.c.  It's also broken w.r.t.
case-insensitive comparisons.  In order to fix this and make it
efficient, we'll need to make several different variants:

  * case-sensitive
     * narrow-narrow
     * narrow-wide
     * wide-wide (use libunistring's u32_cmp2 for this)

  * case-insensitive
     * narrow-narrow
     * narrow-wide
     * wide-wide (use libunistring for this)
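As a sketch of one cell in this table, the case-sensitive narrow-wide variant can compare code points directly. This assumes, as in Guile 2.0, that narrow strings hold Latin-1, whose byte values equal their Unicode code points. `compare_narrow_wide` is a hypothetical name, and this is not the current `compare_strings` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Case-sensitive comparison of a narrow (Latin-1, one byte per
   character) string A against a wide (UTF-32) string B, by code
   point.  Returns <0, 0 or >0 in the style of memcmp. */
static int compare_narrow_wide (const uint8_t *a, size_t na,
                                const uint32_t *b, size_t nb)
{
  assert (a != NULL || na == 0);
  assert (b != NULL || nb == 0);
  size_t n = na < nb ? na : nb;
  for (size_t i = 0; i < n; i++)
    if ((uint32_t) a[i] != b[i])    /* Latin-1 byte == code point */
      return (uint32_t) a[i] < b[i] ? -1 : 1;
  /* Equal prefix: the shorter string sorts first. */
  return na < nb ? -1 : (na > nb ? 1 : 0);
}
```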

The case-insensitive narrow-narrow comparison must be able to handle
this, for example (from r6rs-lib):

  (string-ci=? "Straße" "Strasse") => #t
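Here is a minimal C sketch of what the narrow-narrow case needs: apply full case folding (where ß becomes "ss") before comparing, so a folded string may be longer than the original. The sketch assumes Latin-1 narrow strings, tests equality only (not ordering), and ignores normalization. The fold table follows Unicode's CaseFolding.txt for the Latin-1 range, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Full case folding of one Latin-1 character into OUT (room for two
   code points); returns the number of code points written. */
static size_t fold_latin1 (uint8_t c, uint32_t out[2])
{
  if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
    { out[0] = c + 0x20; return 1; }
  if (c == 0xDF)                    /* sharp s folds to "ss" */
    { out[0] = 's'; out[1] = 's'; return 2; }
  if (c == 0xB5)                    /* MICRO SIGN folds to GREEK SMALL MU */
    { out[0] = 0x03BC; return 1; }
  out[0] = c;
  return 1;
}

/* Fold a whole Latin-1 string into a freshly allocated UTF-32 buffer. */
static uint32_t *fold_string (const uint8_t *s, size_t n, size_t *len)
{
  uint32_t *buf = malloc ((2 * n + 1) * sizeof *buf);
  assert (buf != NULL);
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    j += fold_latin1 (s[i], buf + j);
  *len = j;
  return buf;
}

/* Case-insensitive equality of two Latin-1 strings. */
static int latin1_ci_equal (const uint8_t *a, size_t na,
                            const uint8_t *b, size_t nb)
{
  size_t la, lb;
  uint32_t *fa = fold_string (a, na, &la);
  uint32_t *fb = fold_string (b, nb, &lb);
  int eq = (la == lb);
  for (size_t i = 0; eq && i < la; i++)
    eq = (fa[i] == fb[i]);
  free (fa);
  free (fb);
  return eq;
}
```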

I'm not yet sure what's involved in implementing the case-insensitive
narrow-wide case properly.

    Best,
     Mark


* Re: Using libunistring for string comparisons et al
  2011-03-12 17:28       ` Mark H Weaver
@ 2011-03-13 21:30         ` Ludovic Courtès
  2011-03-30  9:05           ` Andy Wingo
  0 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2011-03-13 21:30 UTC
  To: guile-devel

Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> Unfortunately, the alternatives are not pleasant.  We have a bunch of
> bugs in our string handling functions.  Currently, our case-insensitive
> string comparisons and case conversions are not correct for several
> languages including German, according to the R6RS among other things.
>
> We could easily fix these problems by using libunistring, which provides
> the operations we need, but only if we use a single string
> representation, and one that is supported by libunistring (UTF-8,
> UTF-16, or UTF-32).

I don’t think so.  For instance, you could “upgrade” narrow strings to
UTF-32 and then use libunistring on that.  That would fix case-folding
for “Straße”, I guess.
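This suggestion is cheap for narrow strings. Assuming they are Latin-1, as in Guile 2.0, each byte value is already the character's Unicode code point, so widening to UTF-32 is a straight copy. Below is a sketch; `latin1_to_u32` is a hypothetical helper. Its result could then be passed to libunistring's `u32_*` functions, such as the `u32_cmp2` mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* "Upgrade" a narrow Latin-1 string of N bytes to UTF-32.  Every
   Latin-1 byte value equals its Unicode code point, so this is a
   plain widening copy.  The caller frees the result. */
static uint32_t *latin1_to_u32 (const uint8_t *s, size_t n)
{
  uint32_t *out = malloc ((n + 1) * sizeof *out);
  assert (out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = s[i];
  out[n] = 0;                       /* NUL-terminate for convenience */
  return out;
}
```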

> So, our options appear to be:
>
>   * Use only wide strings internally.
>
>   * Reimplement several complex functions from libunistring within guile
>     (string comparisons and case conversions).
>
>   * Convert strings to a libunistring-supported representation, and
>     possibly back again, on each operation.  For example, this will be
>     needed when comparing two narrow strings, when comparing a narrow
>     string to a wide string, or when applying a case conversion to a
>     narrow string.
>
> Our use of two different internal string representations is another
> problem.  Right now, our string comparisons are painfully inefficient.

Inefficient in the (unlikely) case that you’re comparing a narrow and a
wide string of different lengths.

So yes, the current implementation has bugs, but I think most if not all
can be fixed with minimal changes.  Would you like to look into it
for 2.0.x?

Using UTF-8 internally has problems of its own, as Mike explained, which
is why it was rejected in the first place.

Thanks,
Ludo’.





* Re: Using libunistring for string comparisons et al
  2011-03-13 21:30         ` Ludovic Courtès
@ 2011-03-30  9:05           ` Andy Wingo
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Wingo @ 2011-03-30  9:05 UTC
  To: Ludovic Courtès; +Cc: guile-devel

On Sun 13 Mar 2011 22:30, ludo@gnu.org (Ludovic Courtès) writes:

> So yes, the current implementation has bugs, but I think most if not all
> can be fixed with minimal changes.  Would you like to look into it
> for 2.0.x?

I very much agree with this sentiment for 2.0.x.  Let's not let the
perfect be the enemy of the good.  We can revisit things on master, for
2.2.

Andy
-- 
http://wingolog.org/




* O(1) accessors for UTF-8 backed strings
  2011-03-12 13:46     ` Ludovic Courtès
  2011-03-12 17:28       ` Mark H Weaver
@ 2011-03-13  4:05       ` Mark H Weaver
  2011-03-13  4:42         ` Alex Shinn
  2011-03-30  9:20         ` Andy Wingo
  1 sibling, 2 replies; 17+ messages in thread
From: Mark H Weaver @ 2011-03-13  4:05 UTC
  To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:
> I also think strings should remain what they currently are, with O(1)
> random access.

I just realized that it is possible to implement O(1) accessors for
UTF-8 backed strings.

The simplest solution is to break the stringbuf into chunks, where each
chunk holds exactly CHUNK_SIZE characters, except for the last chunk
which might have fewer.  To find a character at index i, we scan forward
(i % CHUNK_SIZE) characters from the beginning of chunk[i / CHUNK_SIZE].
Ideally, CHUNK_SIZE is a power of 2, which allows us to use bitwise
operations instead of division.  The number of characters to scan is
bounded by CHUNK_SIZE, and therefore takes O(1) time.
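The lookup described here can be sketched in C as follows. The `struct chunked_string` layout, the tiny CHUNK_SIZE of 4, and the function names are assumptions made for illustration; a real stringbuf would use a larger power of two. Well-formed UTF-8 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunked stringbuf: every chunk holds exactly CHUNK_SIZE
   characters (the last may hold fewer), stored as UTF-8.  A tiny
   CHUNK_SIZE keeps the example readable. */
#define CHUNK_BITS 2
#define CHUNK_SIZE (1u << CHUNK_BITS)

struct chunked_string
{
  size_t length;                    /* in characters */
  const unsigned char **chunk;      /* chunk[k] -> UTF-8 bytes of chunk k */
};

/* Byte length of the UTF-8 sequence that starts with LEAD. */
static size_t utf8_seq_len (unsigned char lead)
{
  return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
}

/* string-ref in O(CHUNK_SIZE) = O(1): select the chunk with a shift
   (i / CHUNK_SIZE), then skip i % CHUNK_SIZE characters inside it. */
static uint32_t chunked_ref (const struct chunked_string *str, size_t i)
{
  assert (i < str->length);
  const unsigned char *p = str->chunk[i >> CHUNK_BITS];
  for (size_t k = i & (CHUNK_SIZE - 1); k > 0; k--)
    p += utf8_seq_len (*p);
  /* Decode the character at P. */
  size_t n = utf8_seq_len (*p);
  uint32_t c = n == 1 ? p[0]
             : n == 2 ? (uint32_t) (p[0] & 0x1F)
             : n == 3 ? (uint32_t) (p[0] & 0x0F)
             : (uint32_t) (p[0] & 0x07);
  for (size_t k = 1; k < n; k++)
    c = (c << 6) | (p[k] & 0x3F);
  return c;
}
```

The skipping loop runs at most CHUNK_SIZE - 1 times, which is where the constant bound comes from.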

String-set! in general requires us to resize a chunk.  Although chunks
contain a constant number of characters, that of course maps to a
variable number of bytes.  There are various tricks we can do to reduce
the number of reallocations done, but even in the worst case, the time
spent is bounded by CHUNK_SIZE, and thus O(1).

Converting a string to UTF-8 now consists of concatenating its chunks.
For short strings (<= CHUNK_SIZE) we can simply return the address of
the first chunk.  In the general case, we'll have to recombine the
chunks into a new block, which of course takes O(n) time.

With a little added complexity, we can reduce this cost in common cases,
and even eliminate it completely when string-set! is not used.  The idea
is to initially store a stringbuf in a single piece.  Only after the
first string-set! would the stringbuf be broken up into chunks.

A flag in the stringbuf would indicate whether or not it is chunked.  In
a chunked string, each chunk[i] points to its own memory block.  In an
unchunked string, the entire string is in one block of UTF-8, pointed to
by chunk[0].  chunk[i] for i>0 points into the middle of the block, at
an offset of i*CHUNK_SIZE characters.  This allows string-ref to treat
chunked and unchunked stringbufs equivalently.
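Here is a sketch of how the chunk pointers for an unchunked stringbuf might be computed in one pass over the UTF-8 block, so that `chunk[k]` points at character `k * CHUNK_SIZE`. The names and the small CHUNK_SIZE are hypothetical, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_BITS 2                /* tiny, for illustration only */
#define CHUNK_SIZE (1u << CHUNK_BITS)

/* Fill CHUNK[k] with a pointer into the single UTF-8 block at
   character k * CHUNK_SIZE, and return the length in characters.
   A character starts at every byte that is not a continuation
   byte (10xxxxxx). */
static size_t index_unchunked (const unsigned char *utf8, size_t nbytes,
                               const unsigned char **chunk,
                               size_t max_chunks)
{
  size_t nchars = 0;
  for (size_t b = 0; b < nbytes; b++)
    if ((utf8[b] & 0xC0) != 0x80)   /* start of a character */
      {
        if ((nchars & (CHUNK_SIZE - 1)) == 0)
          {
            assert ((nchars >> CHUNK_BITS) < max_chunks);
            chunk[nchars >> CHUNK_BITS] = utf8 + b;
          }
        nchars++;
      }
  return nchars;
}
```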

It is probably worthwhile to actually _convert_ a stringbuf back into
unchunked form whenever we convert a string to utf8.  That way, in the
common case where string-set! is only used to initialize the string,
later conversions to utf8 can be done in constant time (except the first
one, which can be charged to the initialization routine).

Now, it is my hope that string-set! will be seldom used, and that we can
encourage people to instead use string ports and/or string-unfold and
string-unfold-right.  These other functions can be implemented without
string-set!, and will not require the use of the chunked representation.

Therefore, it is my hope that the chunked representation can be avoided
altogether in almost all cases, which enables fast constant-time
conversions to UTF-8, and thus the efficient use of libunistring and any
other library that understands UTF-8.

What do you think?

      Mark


* Re: O(1) accessors for UTF-8 backed strings
  2011-03-13  4:05       ` O(1) accessors for UTF-8 backed strings Mark H Weaver
@ 2011-03-13  4:42         ` Alex Shinn
  2011-03-15 15:46           ` Mark H Weaver
  2011-03-30  9:20         ` Andy Wingo
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Shinn @ 2011-03-13  4:42 UTC
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver <mhw@netris.org> wrote:
> ludo@gnu.org (Ludovic Courtès) writes:
>> I also think strings should remain what they currently are, with O(1)
>> random access.
>
> I just realized that it is possible to implement O(1) accessors for
> UTF-8 backed strings.

It's possible with several approaches, but not necessarily worth it:

http://trac.sacrideo.us/wg/wiki/StringRepresentations

-- 
Alex




* Re: O(1) accessors for UTF-8 backed strings
  2011-03-13  4:42         ` Alex Shinn
@ 2011-03-15 15:46           ` Mark H Weaver
  2011-03-16  0:07             ` Alex Shinn
  2011-03-19 13:02             ` Andy Wingo
  0 siblings, 2 replies; 17+ messages in thread
From: Mark H Weaver @ 2011-03-15 15:46 UTC
  To: Alex Shinn; +Cc: Ludovic Courtès, guile-devel

Alex Shinn <alexshinn@gmail.com> wrote:
> On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver <mhw@netris.org> wrote:
>> I just realized that it is possible to implement O(1) accessors for
>> UTF-8 backed strings.
>
> It's possible with several approaches, but not necessarily worth it:
>
> http://trac.sacrideo.us/wg/wiki/StringRepresentations

Alex, can you please clarify your position?  I fear that readers of your
message might assume that you are against my proposal to store strings
internally in UTF-8.  Having read the text that you referenced above, I
suspect that you are in favor of using UTF-8 with O(n) string accessors.

For those who may not be familiar with the special properties of UTF-8,
please read at least the section on "Common Algorithms and Usage
Patterns" near the end of the text Alex referenced.  In summary, many
operations on UTF-8 such as substring searches, regexp searches, and
parsing can be done one byte at a time, using the same inner loop that
would be used for ASCII or Latin-1.  Also, although it is not mentioned
there, even simple string comparisons (done lexicographically by code
point) can be done byte-wise on UTF-8.
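The claim about comparisons rests on UTF-8 preserving code point order: comparing encodings byte by byte, as `memcmp` does, orders code points exactly as comparing their numeric values would. Below is a self-contained C check of that property for single code points. Because UTF-8 is prefix-free, the same ordering carries over to whole strings. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode code point C (<= 0x10FFFF, not a surrogate) as UTF-8 into
   BUF; return the number of bytes written (1..4). */
static size_t utf8_encode (uint32_t c, unsigned char buf[4])
{
  assert (c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF));
  if (c < 0x80)
    { buf[0] = c; return 1; }
  if (c < 0x800)
    { buf[0] = 0xC0 | (c >> 6); buf[1] = 0x80 | (c & 0x3F); return 2; }
  if (c < 0x10000)
    {
      buf[0] = 0xE0 | (c >> 12);
      buf[1] = 0x80 | ((c >> 6) & 0x3F);
      buf[2] = 0x80 | (c & 0x3F);
      return 3;
    }
  buf[0] = 0xF0 | (c >> 18);
  buf[1] = 0x80 | ((c >> 12) & 0x3F);
  buf[2] = 0x80 | ((c >> 6) & 0x3F);
  buf[3] = 0x80 | (c & 0x3F);
  return 4;
}

/* Order two code points by the bytes of their UTF-8 encodings:
   memcmp over the common length, then the shorter one first. */
static int utf8_bytewise_cmp (uint32_t a, uint32_t b)
{
  unsigned char ba[4], bb[4];
  size_t na = utf8_encode (a, ba), nb = utf8_encode (b, bb);
  int r = memcmp (ba, bb, na < nb ? na : nb);
  if (r != 0)
    return r < 0 ? -1 : 1;
  return na < nb ? -1 : (na > nb ? 1 : 0);
}
```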

I'd also like to point out that the R6RS is the only relevant standard
that mandates O(1) string accessors.  The R5RS did not require this, and
WG1 for the R7RS has already voted against this requirement.

  http://trac.sacrideo.us/wg/ticket/27

I'll write more on this later.

    Mark


* Re: O(1) accessors for UTF-8 backed strings
  2011-03-15 15:46           ` Mark H Weaver
@ 2011-03-16  0:07             ` Alex Shinn
  2011-03-19 13:02             ` Andy Wingo
  1 sibling, 0 replies; 17+ messages in thread
From: Alex Shinn @ 2011-03-16  0:07 UTC
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Wed, Mar 16, 2011 at 12:46 AM, Mark H Weaver <mhw@netris.org> wrote:
> Alex Shinn <alexshinn@gmail.com> wrote:
>> On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver <mhw@netris.org> wrote:
>>> I just realized that it is possible to implement O(1) accessors for
>>> UTF-8 backed strings.
>>
>> It's possible with several approaches, but not necessarily worth it:
>>
>> http://trac.sacrideo.us/wg/wiki/StringRepresentations
>
> Alex, can you please clarify your position?  I fear that readers of your
> message might assume that you are against my proposal to store strings
> internally in UTF-8.  Having read the text that you referenced above, I
> suspect that you are in favor of using UTF-8 with O(n) string accessors.

I didn't intend to make a recommendation either way,
just point to a useful resource where people have collected
ideas and data on the topic so you could make an informed
decision.

You are correct that I personally prefer simple UTF-8 with O(n)
string accessors, which is why the Unicode support I added
for Chicken does this, as does my own chibi-scheme.  But
the best string representation depends on your use cases.

-- 
Alex




* Re: O(1) accessors for UTF-8 backed strings
  2011-03-15 15:46           ` Mark H Weaver
  2011-03-16  0:07             ` Alex Shinn
@ 2011-03-19 13:02             ` Andy Wingo
  1 sibling, 0 replies; 17+ messages in thread
From: Andy Wingo @ 2011-03-19 13:02 UTC
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Tue 15 Mar 2011 16:46, Mark H Weaver <mhw@netris.org> writes:

> I'd also like to point out that the R6RS is the only relevant standard
> that mandates O(1) string accessors.  The R5RS did not require this, and
> WG1 for the R7RS has already voted against this requirement.
>
>   http://trac.sacrideo.us/wg/ticket/27

Only tangentially related to the question under discussion, but I find
this a bit baffling.  I found
http://trac.sacrideo.us/wg/wiki/StringRepresentations, but it's a lot of
original work (record1 ??), and doesn't lead to any conclusions.

http://trac.sacrideo.us/wg/wiki/WG1Ballot1Results lists no rationales.

Grr.

Andy
-- 
http://wingolog.org/


* Re: O(1) accessors for UTF-8 backed strings
  2011-03-13  4:05       ` O(1) accessors for UTF-8 backed strings Mark H Weaver
  2011-03-13  4:42         ` Alex Shinn
@ 2011-03-30  9:20         ` Andy Wingo
  1 sibling, 0 replies; 17+ messages in thread
From: Andy Wingo @ 2011-03-30  9:20 UTC
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Sun 13 Mar 2011 05:05, Mark H Weaver <mhw@netris.org> writes:

> ludo@gnu.org (Ludovic Courtès) writes:
>> I also think strings should remain what they currently are, with O(1)
>> random access.
>
> I just realized that it is possible to implement O(1) accessors for
> UTF-8 backed strings.

If we switch to UTF-8, I like this proposal, generally.

It would be good if it continued to support cheap, shared substrings,
but without the mutexen we have now.  Perhaps it would also be good to
do away with stringbuf objects entirely; dunno.

O(log N) accessors would also be fine with me, e.g. via skip lists or
tries or something.

A tricky problem!

Regards,

Andy
-- 
http://wingolog.org/


* Re: Using libunistring for string comparisons et al
  2011-03-11 23:09   ` Mark H Weaver
  2011-03-12 13:46     ` Ludovic Courtès
@ 2011-03-30  9:03     ` Andy Wingo
  2011-03-31 14:19       ` Ludovic Courtès
  1 sibling, 1 reply; 17+ messages in thread
From: Andy Wingo @ 2011-03-30  9:03 UTC
  To: Mark H Weaver; +Cc: guile-devel

Hi Mark!

I think UTF-8 could be a good plan for 2.1/2.2, but I wanted to make
sure we understand what string-ref is good for...

On Sat 12 Mar 2011 00:09, Mark H Weaver <mhw@netris.org> writes:

> I claim that any reasonable code which currently uses string-ref and
> string-set! could be more cleanly written using string ports or
> string-{fold,unfold}{,-right}.

Folding and unfolding traverses the entire string.  Sometimes this is
indeed what you want.

But sometimes you want to search in a string, roll back and forward,
keep pointers to various parts in the string, etc.  Integer indices work
well as pointers to parts of strings, is what I'm saying here.  They are
immediate values that can be compared with < and >.

Any change to Guile's internal character encoding should not start from
the premise that string-ref is obsolete or unimportant, especially
considering that there is no other standard "string pointer" mechanism.

Regards,

Andy
-- 
http://wingolog.org/


* Re: Using libunistring for string comparisons et al
  2011-03-30  9:03     ` Using libunistring for string comparisons et al Andy Wingo
@ 2011-03-31 14:19       ` Ludovic Courtès
  0 siblings, 0 replies; 17+ messages in thread
From: Ludovic Courtès @ 2011-03-31 14:19 UTC
  To: guile-devel

Hello,

Andy Wingo <wingo@pobox.com> writes:

> Any change to Guile's internal character encoding should not start from
> the premise that string-ref is obsolete or unimportant, especially
> considering that there is no other standard "string pointer" mechanism.

+1

There are idioms like:

  (let ((start (string-index s #\,))
        (end   (string-rindex s #\,)))
    (substring s (+ 1 start) end))

which may not have a better equivalent (sometimes ‘string-tokenize’ can
be used, sometimes not.)
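If strings were stored as UTF-8, one way to keep this idiom cheap would be to use byte offsets as the "string pointers". This works for searches on ASCII delimiters because ASCII bytes never occur inside a multi-byte UTF-8 sequence. Below is a C sketch of the idiom under that assumption. `between_commas` is a hypothetical name, and it returns NULL in the cases where the Scheme version would signal an error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the text between the first and the
   last ',' of the NUL-terminated UTF-8 string S, or NULL if S has
   fewer than two commas.  strchr/strrchr yield byte positions, which
   serve as O(1) cursors here. */
static char *between_commas (const char *s)
{
  const char *start = strchr (s, ',');
  const char *end = strrchr (s, ',');
  if (start == NULL || start == end)
    return NULL;
  size_t n = (size_t) (end - (start + 1));
  char *out = malloc (n + 1);
  assert (out != NULL);
  memcpy (out, start + 1, n);
  out[n] = '\0';
  return out;
}
```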

Thanks,
Ludo’.





* Re: Using libunistring for string comparisons et al
  2011-03-11 22:33 ` Using libunistring for string comparisons et al Mark H Weaver
  2011-03-11 22:36   ` Mark H Weaver
  2011-03-11 23:09   ` Mark H Weaver
@ 2011-03-12 13:36   ` Ludovic Courtès
  2 siblings, 0 replies; 17+ messages in thread
From: Ludovic Courtès @ 2011-03-12 13:36 UTC
  To: guile-devel

Hello!

Mark H Weaver <mhw@netris.org> writes:

> I'm aware that this proposal will be very controversial, but starting in
> Guile 2.2, I think we ought to consider storing strings internally in
> UTF-8, as is done in Gauche.

I don’t think so.

Thanks,
Ludo’.






Thread overview: 17+ messages
2011-03-11  0:54 uc_tolower (uc_toupper (x)) Mike Gran
2011-03-11 22:33 ` Using libunistring for string comparisons et al Mark H Weaver
2011-03-11 22:36   ` Mark H Weaver
2011-03-11 23:09   ` Mark H Weaver
2011-03-12 13:46     ` Ludovic Courtès
2011-03-12 17:28       ` Mark H Weaver
2011-03-13 21:30         ` Ludovic Courtès
2011-03-30  9:05           ` Andy Wingo
2011-03-13  4:05       ` O(1) accessors for UTF-8 backed strings Mark H Weaver
2011-03-13  4:42         ` Alex Shinn
2011-03-15 15:46           ` Mark H Weaver
2011-03-16  0:07             ` Alex Shinn
2011-03-19 13:02             ` Andy Wingo
2011-03-30  9:20         ` Andy Wingo
2011-03-30  9:03     ` Using libunistring for string comparisons et al Andy Wingo
2011-03-31 14:19       ` Ludovic Courtès
2011-03-12 13:36   ` Ludovic Courtès
