unofficial mirror of emacs-devel@gnu.org 
From: Raymond Toy <toy.raymond@gmail.com>
To: gcl-devel@gnu.org
Cc: emacs-devel@gnu.org
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Wed, 29 Oct 2014 08:56:55 -0700
Message-ID: <m1tx2mna2g.fsf@gmail.com>
In-Reply-To: 87zjcfx985.fsf_-_@maguirefamily.org

>>>>> "Camm" == Camm Maguire <camm@maguirefamily.org> writes:

    Camm> Greetings!  I've recently been considering supporting unicode in gcl by
    Camm> representing strings internally in utf8.  It appears that emacs does the
    Camm> same or similar.  Apart from the obvious memory footprint benefits, I'd
    Camm> like to ask what other advantages/disadvantages have been discovered.
    Camm> Much of the utf8 literature emphasizes that most algorithms can proceed
    Camm> conventionally in byte-wise fashion, including lexicographical ordering
    Camm> comparisons, given that almost all jobs are sequential, at least
    Camm> initially.  A cached internal pointer storing the last referenced
    Camm> codepoint offset makes access essentially O(1).  Yet setting string
    Camm> elements can trigger reallocations/memmove operations.  While these can
    Camm> be aggregated over the setting of multiple elements, operations like
    Camm> nreverse look ridiculous if left in terms of calls to aref and aset.

    Camm> Thoughts, advice and experiences most appreciated.
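
For readers following along, here is a minimal sketch of the cached-offset
idea quoted above, written in Python purely for illustration. The class
name Utf8String, the method char_at, and the steps counter are all
hypothetical; nothing here is taken from gcl or emacs, and the buffer is
assumed to hold valid UTF-8.

```python
class Utf8String:
    """Hypothetical UTF-8 backed string with a cached codepoint cursor."""

    def __init__(self, text):
        self.buf = bytearray(text.encode("utf-8"))
        self.cached_index = 0    # codepoint index of the cached position
        self.cached_offset = 0   # byte offset of that codepoint in buf
        self.steps = 0           # codepoints stepped over (illustration only)

    @staticmethod
    def _width(lead):
        # Sequence length implied by a valid UTF-8 lead byte.
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def char_at(self, index):
        if index < 0:
            raise IndexError(index)
        i, off = self.cached_index, self.cached_offset
        # Walk forward from the cached position, one codepoint at a time.
        while i < index:
            if off >= len(self.buf):
                raise IndexError(index)
            off += self._width(self.buf[off])
            i += 1
            self.steps += 1
        # Walk backward by skipping continuation bytes (10xxxxxx).
        while i > index:
            off -= 1
            while self.buf[off] & 0xC0 == 0x80:
                off -= 1
            i -= 1
            self.steps += 1
        if off >= len(self.buf):
            raise IndexError(index)
        width = self._width(self.buf[off])
        ch = self.buf[off:off + width].decode("utf-8")
        # Remember where we ended up so the next nearby access is cheap.
        self.cached_index, self.cached_offset = i, off
        return ch
```

Sequential access costs one step per element, while a jump far from the
cached position still costs a walk proportional to the distance, so the
"essentially O(1)" claim holds only for access patterns that stay close
to the cursor.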

Have you looked at what other Lisp implementations do? AFAIK, none use
utf-8 internally. CCL and clisp use utf-32; cmucl and allegro use
utf-16; sbcl and ecl(?) have two string types: 8-bit base-strings and
32-bit strings.
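
To make the footprint trade-off between these representations concrete,
here is a small sketch in Python (chosen only for illustration; the
helper name storage_bytes is hypothetical). It counts the bytes each
encoding needs for the same text, using the standard codecs without a
byte-order mark.

```python
def storage_bytes(text):
    """Bytes needed to store text in each encoding, without a BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for sample in ("hello", "naïve €", "\U0001D11E"):
        print(repr(sample), storage_bytes(sample))
```

Note that utf-16 is fixed-width only within the Basic Multilingual Plane:
a character such as U+1D11E takes a surrogate pair, so of the three only
utf-32 gives constant-time indexing by codepoint in all cases.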

Since you're a one-man operation (unfortunately), I'd go with the
easiest one to get right and follow either ccl or cmucl.  The rest of
the unicode support can be added with libraries like cl-unicode and/or
babel, if need be.

--
Ray
