unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Camm Maguire <camm@maguirefamily.org>,
	gcl-devel@gnu.org, emacs-devel@gnu.org
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Sun, 02 Nov 2014 03:32:15 +0900	[thread overview]
Message-ID: <87mw8a22mo.fsf@uwakimon.sk.tsukuba.ac.jp> (raw)
In-Reply-To: <831tpnz442.fsf@gnu.org>

Eli Zaretskii writes:

 > > Been discussing this elsewhere, and its come to my attention that not
 > > only do all unicode code-points not fit into UTF-16, but all unicode
 > > characters don't fit into unicode code-points :-).  Presumably this is
 > > why emacs expanded to 22bits?
 > 
 > Not sure what you mean here.  All Unicode characters do fit into the
 > Unicode codepoint space.  Emacs extends that codepoint space beyond 22
 > bits because it needs to support cultures which don't want unification
 > yet.

I suppose he means grapheme complexes, such as various accented
characters that can be constructed from composing characters but do
not have precomposed forms in Unicode.  As you say, that's not why
Emacs extended the code space.

 > > Did you consider leaving aref, char-code and code-char alone and writing
 > > unicode functions on top of these, i.e. unicode-length!=length, as
 > > opposed to making aref itself do this translation under the hood,
 > > thereby violating the expectation of O(1) access, (which is certainly
 > > offered in other kinds of arrays, though it is questionable whether real
 > > users actually expect this for strings)?

Actually, originally Emacs allowed you to treat text (buffers and
strings) either as sequences of characters or arrays of bytes, and
this was a real bug-breeder (and why XEmacs chose the pain of the
incompatible separation of integer type from character type).

I'm not sure if the feature is present in modern Emacs, but at the
very least the usage is so rare today that I'm unaware of any.

That's not what you asked, but it implies the answer "no, and you
shouldn't, either" to your question.  This is despite the fact that
yes, in many languages and applications users *do* expect O(1) access
to individual characters in text.



  reply	other threads:[~2014-11-01 18:32 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-28 22:33 Referring to revisions in the git future Alan Mackenzie
2014-10-28 22:54 ` Óscar Fuentes
2014-10-28 23:05   ` Alan Mackenzie
2014-10-28 23:24     ` Óscar Fuentes
2014-10-31 22:47     ` Paul Eggert
2014-10-29  0:49 ` Eric S. Raymond
2014-10-29  3:38   ` Stephen J. Turnbull
2014-10-29 12:26     ` Stefan Monnier
2014-10-29 12:41       ` Alexander Baier
2014-10-29 14:52   ` Barry Warsaw
2014-10-29 15:01     ` David Kastrup
2014-10-29 15:06       ` Eric S. Raymond
2014-10-29 18:12         ` Barry Warsaw
2014-10-29 22:09           ` Lars Magne Ingebrigtsen
2014-10-29 22:29             ` Eric S. Raymond
2014-10-29 23:31               ` Paul Eggert
2014-10-30  0:01                 ` Nic Ferrier
2014-10-30  1:53                 ` Stefan Monnier
2014-10-30  2:10                   ` Eric S. Raymond
2014-10-30  2:13                     ` Paul Eggert
2014-10-30  2:48                       ` Eric S. Raymond
2014-10-30  2:25                     ` Glenn Morris
2014-10-30 10:10                       ` David Kastrup
2014-10-30 13:03                         ` Stefan Monnier
2014-10-30 13:40                           ` David Kastrup
2014-10-30 14:00                             ` Stefan Monnier
2014-10-30 13:02                     ` Stefan Monnier
2014-10-30 15:12                       ` Eric S. Raymond
2014-10-30 16:49                         ` Stefan Monnier
2014-10-30  6:46                 ` Jan Djärv
2014-10-30  7:36                   ` Ivan Shmakov
2014-10-30  8:09                     ` Jan Djärv
2014-10-30  8:31                     ` Eric S. Raymond
2014-10-30  9:53                       ` Andreas Schwab
2014-10-30 10:13                         ` Eric S. Raymond
2014-10-30 10:32                           ` Andreas Schwab
2014-10-30 11:13                             ` Nicolas Richard
2014-10-30 10:12                       ` David Kastrup
2014-10-30 13:29                       ` Stefan Monnier
2014-10-30 15:33                         ` DVCS design philosophy Eric S. Raymond
2014-10-30 16:59                           ` Stefan Monnier
2014-10-30 17:41                             ` Eric S. Raymond
2014-10-31 20:18                         ` Referring to revisions in the git future Nicolas Richard
2014-10-31 21:11                           ` Stefan Monnier
2014-11-01  1:44                             ` Stephen J. Turnbull
2014-11-01  7:58                             ` David Kastrup
2014-10-30 14:20                       ` Barry Warsaw
2014-11-01  1:23                         ` Stephen J. Turnbull
2014-10-30 15:52                     ` Eli Zaretskii
2014-10-30  3:32           ` Stephen J. Turnbull
2014-10-30  4:35             ` Barry Warsaw
2014-10-30  5:24               ` Stephen J. Turnbull
2014-10-30 10:17               ` David Kastrup
2014-10-30 13:42               ` Alex Bennée
2014-10-30 13:19             ` Stefan Monnier
2014-10-31  6:36               ` Stephen J. Turnbull
2014-10-31 19:42               ` David Kastrup
2014-11-01  1:34                 ` Stephen J. Turnbull
2014-11-01  7:05                   ` Tassilo Horn
2014-11-01  7:09                     ` Dima Kogan
2014-11-01  7:28                     ` Paul Eggert
2014-11-01  7:49                     ` David Kastrup
2014-11-01  9:46                       ` Alan Mackenzie
2014-11-01 10:13                         ` Eli Zaretskii
2014-11-01 11:33                           ` Alan Mackenzie
2014-11-01 13:06                             ` Eli Zaretskii
2014-11-01 13:21                               ` Alan Mackenzie
2014-11-01 10:29                         ` David Kastrup
2014-11-01 11:29                           ` Alan Mackenzie
2014-11-01 11:57                             ` David Kastrup
2014-11-01 12:29                               ` Alan Mackenzie
2014-11-01 12:47                                 ` Ivan Shmakov
2014-11-01 13:46                                   ` Alan Mackenzie
2014-11-01 18:58                                     ` Stephen J. Turnbull
2014-11-01 12:49                                 ` David Kastrup
2014-10-29  1:11 ` Stefan Monnier
2014-10-29  6:06   ` Werner LEMBERG
2014-10-29  9:01     ` David Kastrup
2014-10-29  8:50 ` David Kastrup
2014-10-29  9:52   ` Eric S. Raymond
2014-10-29 11:00     ` David Kastrup
2014-10-29 14:32       ` Eli Zaretskii
2014-10-29 14:35         ` David Kastrup
2014-10-29 14:55           ` Eli Zaretskii
2014-10-30  4:44             ` Richard Stallman
2014-10-30  8:32               ` Eric S. Raymond
2014-10-30 10:25                 ` David Kastrup
2014-10-30 11:51                   ` Eric S. Raymond
2014-10-30 12:14                     ` David Kastrup
2014-10-30 15:01                       ` Eric S. Raymond
2014-10-30 15:53                 ` Eli Zaretskii
2014-10-30 15:56                   ` Eric S. Raymond
2014-10-30 16:44                     ` Eli Zaretskii
2014-10-31  7:47                 ` Richard Stallman
2014-10-31  8:17                   ` Eli Zaretskii
2014-10-31 10:21                   ` Eric S. Raymond
2014-10-29 12:35     ` Stefan Monnier
2014-10-29 13:00       ` Jose E. Marchesi
2014-10-29 13:59         ` Stefan Monnier
2014-10-29 14:39           ` Eric S. Raymond
2014-10-29 14:46             ` Rasmus
2014-10-29 14:52               ` Eric S. Raymond
2014-10-30  0:58               ` Rob Browning
2014-10-29 15:27             ` Stefan Monnier
2014-10-29 14:04         ` utf8 and emacs text/string multibyte representation Camm Maguire
2014-10-29 14:51           ` Eli Zaretskii
2014-10-29 15:55             ` Camm Maguire
2014-10-29 16:19               ` Eli Zaretskii
2014-10-30 14:13                 ` Camm Maguire
2014-10-30 16:06                   ` Eli Zaretskii
2014-10-30 16:27                     ` Camm Maguire
2014-10-30 16:35                       ` Eli Zaretskii
2014-10-31 18:05                         ` Camm Maguire
2014-11-01  9:01                           ` Eli Zaretskii
2014-11-01 18:32                             ` Stephen J. Turnbull [this message]
2014-11-01 18:41                               ` David Kastrup
2014-11-01 19:09                                 ` Stephen J. Turnbull
2014-11-02  0:56                                 ` Stefan Monnier
2014-11-01  1:16                         ` Stephen J. Turnbull
2014-10-29 16:45             ` Stefan Monnier
2014-10-29 15:56           ` Raymond Toy
2014-10-30 14:16             ` Camm Maguire
2014-10-31 18:47               ` Sam Steingold
2014-10-31 21:00                 ` Andreas Schwab
2014-10-31 19:52               ` [Gcl-devel] " Stefan Monnier
2014-10-30  3:08           ` Stephen J. Turnbull
2014-10-29 13:26       ` Referring to revisions in the git future Eric S. Raymond
2014-10-29 14:04         ` Stefan Monnier
2014-10-29 14:49           ` Eric S. Raymond
2014-10-30  2:43           ` Stephen J. Turnbull
2014-10-29 13:08     ` Jan Djärv
2014-10-29 13:27       ` Eric S. Raymond
2014-10-29 13:49         ` Eric S. Raymond
2014-10-29 18:03           ` Jan Djärv
2014-10-29 11:18   ` Alan Mackenzie
2014-10-29 11:37     ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mw8a22mo.fsf@uwakimon.sk.tsukuba.ac.jp \
    --to=stephen@xemacs.org \
    --cc=camm@maguirefamily.org \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=gcl-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).