unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: Alex Shinn <alexshinn@gmail.com>
Cc: "Ludovic Courtès" <ludo@gnu.org>, guile-devel@gnu.org
Subject: Re: O(1) accessors for UTF-8 backed strings
Date: Tue, 15 Mar 2011 11:46:07 -0400	[thread overview]
Message-ID: <87ipvkmlog.fsf@netris.org> (raw)
In-Reply-To: <AANLkTinRhXC_XQGJpqWS5vQBerFSY_+iv+ee_ibrH5uo@mail.gmail.com> (Alex Shinn's message of "Sun, 13 Mar 2011 13:42:34 +0900")

Alex Shinn <alexshinn@gmail.com> wrote:
> On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver <mhw@netris.org> wrote:
>> I just realized that it is possible to implement O(1) accessors for
>> UTF-8 backed strings.
>
> It's possible with several approaches, but not necessarily worth it:
>
> http://trac.sacrideo.us/wg/wiki/StringRepresentations

Alex, can you please clarify your position?  I fear that readers of your
message might assume that you are against my proposal to store strings
internally in UTF-8.  Having read the text that you referenced above, I
suspect that you are in favor of using UTF-8 with O(n) string accessors.

For those who may not be familiar with the special properties of UTF-8,
please read at least the section on "Common Algorithms and Usage
Patterns" near the end of the text Alex referenced.  In summary, many
operations on UTF-8 such as substring searches, regexp searches, and
parsing can be done one byte at a time, using the same inner loop that
would be used for ASCII or Latin-1.  Also, although it is not mentioned
there, even simple string comparisons (done lexigraphically by code
point) can be done byte-wise on UTF-8.

I'd also like to point out that the R6RS is the only relevant standard
that mandates O(1) string accessors.  The R5RS did not require this, and
WG1 for the R7RS has already voted against this requirement.

  http://trac.sacrideo.us/wg/ticket/27

I'll write more on this later.

    Mark



  reply	other threads:[~2011-03-15 15:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-11  0:54 uc_tolower (uc_toupper (x)) Mike Gran
2011-03-11 22:33 ` Using libunistring for string comparisons et al Mark H Weaver
2011-03-11 22:36   ` Mark H Weaver
2011-03-11 23:09   ` Mark H Weaver
2011-03-12 13:46     ` Ludovic Courtès
2011-03-12 17:28       ` Mark H Weaver
2011-03-13 21:30         ` Ludovic Courtès
2011-03-30  9:05           ` Andy Wingo
2011-03-13  4:05       ` O(1) accessors for UTF-8 backed strings Mark H Weaver
2011-03-13  4:42         ` Alex Shinn
2011-03-15 15:46           ` Mark H Weaver [this message]
2011-03-16  0:07             ` Alex Shinn
2011-03-19 13:02             ` Andy Wingo
2011-03-30  9:20         ` Andy Wingo
2011-03-30  9:03     ` Using libunistring for string comparisons et al Andy Wingo
2011-03-31 14:19       ` Ludovic Courtès
2011-03-12 13:36   ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ipvkmlog.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=alexshinn@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).