From: David Kastrup <dak@gnu.org>
To: emacs-devel@gnu.org
Cc: gcl-devel@gnu.org
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Sat, 01 Nov 2014 19:41:22 +0100
Message-ID: <87y4ru227h.fsf@fencepost.gnu.org>
In-Reply-To: 87mw8a22mo.fsf@uwakimon.sk.tsukuba.ac.jp

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Eli Zaretskii writes:
>
> > > Been discussing this elsewhere, and it's come to my attention that not
> > > only do all unicode code-points not fit into UTF-16, but all unicode
> > > characters don't fit into unicode code-points :-). Presumably this is
> > > why emacs expanded to 22 bits?
> >
> > Not sure what you mean here. All Unicode characters do fit into the
> > Unicode codepoint space. Emacs extends that codepoint space to 22
> > bits because it needs to support cultures which don't want unification
> > yet.
>
> I suppose he means grapheme complexes, such as various accented
> characters that can be constructed from composing characters but do
> not have precomposed forms in Unicode. As you say, that's not why
> Emacs extended the code space.
>
> > > Did you consider leaving aref, char-code and code-char alone and writing
> > > unicode functions on top of these, i.e. unicode-length!=length, as
> > > opposed to making aref itself do this translation under the hood,
> > > thereby violating the expectation of O(1) access, (which is certainly
> > > offered in other kinds of arrays, though it is questionable whether real
> > > users actually expect this for strings)?
>
> Actually, originally Emacs allowed you to treat text (buffers and
> strings) either as sequences of characters or arrays of bytes, and
> this was a real bug-breeder (and why XEmacs chose the pain of the
> incompatible separation of integer type from character type).
>
> I'm not sure if the feature is present in modern Emacs, but at the
> very least the usage is so rare today that I'm unaware of any.

string-as-unibyte and string-as-multibyte are most certainly available
for converting from one to the other. But the commands that work on
unibyte and multibyte strings are the same, and similarly for buffers.
I have no idea whether this is a vector for creating inconsistent
multibyte content. I could imagine it to be, but so could user-created
CCL programs for code conversion.

--
David Kastrup