From: Camm Maguire <camm@maguirefamily.org>
To: Raymond Toy <toy.raymond@gmail.com>
Cc: gcl-devel@gnu.org, emacs-devel@gnu.org
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Thu, 30 Oct 2014 10:16:15 -0400
Message-ID: <877fzhwsls.fsf@maguirefamily.org>
In-Reply-To: <m1tx2mna2g.fsf@gmail.com> (Raymond Toy's message of "Wed, 29 Oct 2014 08:56:55 -0700")
Greetings! Don't worry -- I'm not committed to this idea yet, just
exploring!
Do these other Lisps allocate a fresh character object on each aref? Do
they maintain some ~2^21-entry table in core? (And isn't Emacs a "Lisp"
too? :-))
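This is a worked figure added for scale; it does not come from the thread. Unicode code points run from U+0000 to U+10FFFF, which is 0x110000 values. That is why a per-character table would need "~2^21" entries. The layout of one 8-byte pointer per entry is a hypothetical assumption. Under it, such a table would occupy:

```latex
2^{20} = 1\,048\,576 \;<\; \mathtt{0x110000} = 1\,114\,112 \;\le\; 2^{21} = 2\,097\,152
\qquad
1\,114\,112 \times 8\ \text{bytes} = 8\,912\,896\ \text{bytes} = 8.5\ \text{MiB}
```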
Take care,
Raymond Toy <toy.raymond@gmail.com> writes:
>>>>>> "Camm" == Camm Maguire <camm@maguirefamily.org> writes:
>
> Camm> Greetings! I've recently been considering supporting unicode in gcl by
> Camm> representing strings internally in utf8. It appears that emacs does the
> Camm> same or similar. Apart from the obvious memory footprint benefits, I'd
> Camm> like to ask what other advantages/disadvantages have been discovered.
> Camm> Much of the utf8 literature emphasizes that most algorithms can proceed
> Camm> conventionally in byte-wise fashion, including lexicographical ordering
> Camm> comparisons, given that almost all jobs are sequential, at least
> Camm> initially. A cached internal pointer storing the last referenced
> Camm> codepoint offset makes access essentially O(1). Yet setting string
> Camm> elements can trigger reallocations/memmove operations. While these can
> Camm> be aggregated over the setting of multiple elements, operations like
> Camm> nreverse look ridiculous if left in terms of calls to aref and aset.
>
> Camm> Thoughts, advice and experiences most appreciated.
>
> Have you looked at what other Lisp implementations do? AFAIK, none use
> utf-8. CCL and clisp use utf-32, cmucl and allegro use utf-16, sbcl
> and ecl(?) have two string types: 8-bit base-string and 32-bit
> strings.
>
> As a one-man operation (unfortunately), I'd go with the easiest one to
> get right and follow either ccl or cmucl. The rest of the support for
> unicode can be added with libraries like cl-unicode and/or babel, if
> need be.
>
> --
> Ray
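The representation discussed in the quoted messages can be sketched in a few lines. It has three parts:

- UTF-8 bytes stored directly.
- A cached cursor remembering the last referenced codepoint and its byte offset.
- Element setting that may shift the rest of the buffer.

This is a minimal, hypothetical Python model, not GCL's or Emacs's actual code. The class `Utf8String` and its methods `aref`, `aset` and `nreverse` are illustrative names chosen to mirror the Lisp operations, and the model assumes the buffer always holds valid UTF-8. The `nreverse` shown is one way to avoid going through per-element aref/aset. It reverses all bytes, then restores the byte order inside each multibyte sequence. Both passes are O(n) and need no reallocation.

```python
class Utf8String:
    """Hypothetical UTF-8-backed string with a cached (codepoint, byte) cursor."""

    def __init__(self, text: str) -> None:
        self.buf = bytearray(text.encode("utf-8"))  # raw UTF-8 bytes
        self.length = len(text)                     # length in codepoints
        self._cur_char = 0                          # last referenced codepoint index
        self._cur_byte = 0                          # ... and its byte offset

    def __len__(self) -> int:
        return self.length

    def __str__(self) -> str:
        return self.buf.decode("utf-8")

    @staticmethod
    def _is_cont(b: int) -> bool:
        # Continuation bytes have the bit pattern 10xxxxxx.
        return b & 0xC0 == 0x80

    def _seq_len(self, pos: int) -> int:
        # Length of the sequence whose lead byte is buf[pos].
        b = self.buf[pos]
        if b < 0x80:
            return 1
        if b < 0xE0:
            return 2
        if b < 0xF0:
            return 3
        return 4

    def _offset(self, i: int) -> int:
        """Byte offset of codepoint i, walking from the cached cursor."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        c, p = self._cur_char, self._cur_byte
        while c < i:                      # step forward one sequence at a time
            p += self._seq_len(p)
            c += 1
        while c > i:                      # step back to the previous lead byte
            p -= 1
            while self._is_cont(self.buf[p]):
                p -= 1
            c -= 1
        self._cur_char, self._cur_byte = c, p
        return p

    def aref(self, i: int) -> str:
        p = self._offset(i)
        return self.buf[p:p + self._seq_len(p)].decode("utf-8")

    def aset(self, i: int, ch: str) -> None:
        if len(ch) != 1:
            raise ValueError("expected a single character")
        p = self._offset(i)
        # If the old and new encodings differ in width, this slice assignment
        # shifts the tail of the buffer (the memmove cost discussed in the thread).
        self.buf[p:p + self._seq_len(p)] = ch.encode("utf-8")

    def nreverse(self) -> "Utf8String":
        # Reverse every byte, then undo the reversal inside each multibyte
        # sequence (its continuation bytes now precede its lead byte).
        self.buf.reverse()
        i, n = 0, len(self.buf)
        while i < n:
            j = i
            while self._is_cont(self.buf[j]):
                j += 1
            self.buf[i:j + 1] = self.buf[i:j + 1][::-1]
            i = j + 1
        self._cur_char = self._cur_byte = 0       # cached cursor is now stale
        return self


if __name__ == "__main__":
    s = Utf8String("aé€😀b")
    print(s.aref(3))      # 😀
    s.aset(1, "x")        # 2-byte é -> 1-byte x: the tail shifts left by one byte
    print(s)              # ax€😀b
    print(s.nreverse())   # b😀€xa
```

Each sequential `aref` step walks only one encoded sequence from the cached cursor. A random access far from the cursor costs time proportional to the distance. An `aset` that changes the encoded width touches every byte after the modified character.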
--
Camm Maguire camm@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah