From: "Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp>
To: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Emacs rewrite in a maintainable language
Date: Mon, 19 Oct 2015 02:46:26 +0900
Message-ID: <22051.56050.174581.446113@turnbull.sk.tsukuba.ac.jp>
In-Reply-To: <87oafwm7xi.fsf@fencepost.gnu.org>

David Kastrup writes:

 > Personally I have no problem with an implementation insisting on
 > certain properties for its internal encoding.  But that implies
 > that "internal encoding" and "external UTF-8" may diverge when
 > "external UTF-8" does not exclusively contain valid UTF-8.

Then the external data shouldn't be called "UTF-8" in discussions like
this one.  The problem of data that is not valid for the presumed
encoding is not limited to UTF-8, Unicode, or even to text.  It just
happens that we have good solutions (not limited to ritual suicide)
for the text stream case.

Also, we should remember that Unicode is a wire protocol.  It's very
useful to adapt the formats defined by Unicode for constructing and
parsing internal and external data -- that can be very efficient.  But
we also need to have a strict-conformance option for I/O that is
declared to be Unicode, and that should probably be the default.
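
To make "strict conformance" concrete, here is a minimal Python sketch
(the helper name decode_strict and the sample bytes are made up for
illustration; this is not how Emacs does it).  A strict decoder simply
refuses input that is not well-formed UTF-8:

```python
# Hypothetical sample: 0xE9 is Latin-1 for 'e-acute', but in UTF-8 it
# starts a three-byte sequence, and the space after it is not a valid
# continuation byte, so this is not well-formed UTF-8.
data = b"caf\xe9 au lait"


def decode_strict(raw: bytes) -> str:
    """Decode as UTF-8, refusing anything that is not well-formed."""
    return raw.decode("utf-8", errors="strict")


try:
    decode_strict(data)
    accepted = True
except UnicodeDecodeError as exc:
    accepted = False
    print("rejected:", exc.reason, "at byte offset", exc.start)
```

Here errors="strict" is already Python's default for bytes.decode; it
is spelled out only to make the policy visible.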

 > However, if "internal encoding" is not the same as "valid UTF-8"
 > throughout, it means that code called with it has to be able to
 > deal with the representations for invalid UTF-8.

Emacs certainly can deal, since it has a 'binary' encoding and can
represent that internally.  But that's awfully inconvenient.
Something like Emacs's current implementation, Markus Kuhn's UTF-8b,
or Python's PEP 383 is really required for Emacs implementations.
(Does anybody remember that awful mail format of Win2k beta's version
of Outlook Express, where the HTML tags were encoded in ASCII and the
element content in little-endian UTF-16?)
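
For readers who haven't seen PEP 383 in action, a small sketch using
Python 3's "surrogateescape" error handler (the sample bytes are made
up).  Each undecodable byte 0xNN is smuggled through as the lone
surrogate U+DCNN, so the original byte stream can be recovered exactly:

```python
# Valid UTF-8 (the euro sign) followed by two bytes that can never
# appear in well-formed UTF-8.
raw = b"valid \xe2\x82\xac then stray \xff\xfe bytes"

# Decoding never fails: each bad byte 0xNN becomes code point U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]
print(escaped)  # ['0xdcff', '0xdcfe']

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")

# A strict encoder refuses the smuggled bytes, so they cannot leak out
# of the program labelled as UTF-8 by accident.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```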

 > [Emacs's internal text representation is] not cast into stone but
 > pretty efficient (I think Python uses 3-byte surrogate sequences
 > for raw bytes, somewhat worse)

No.  Python uses a wide-char representation.  In Python 2, it's 2
bytes per character on most non-glibc platforms, and 4 bytes on glibc.
In Python 3 with PEP 393 support, the width is chosen per string: valid
ISO-8859-1 text (even if decoded from another external encoding) takes
one byte per character, valid BMP text (optionally with support for
invalid "rawbytes", internally encoded as lone trailing surrogates)
takes two, and text containing characters from the astral planes takes
four (again with optional support for invalid rawbytes).
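
The PEP 393 widths can be observed from Python itself.  This is a
sketch that assumes CPython 3.3 or later; sys.getsizeof reports an
implementation detail, and the helper name bytes_per_char is made up.
Comparing two lengths of the same repeated character cancels the
fixed object header:

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage for a string made only of `ch`.

    Subtracting the sizes of two freshly built strings of different
    length cancels the object header, leaving n * width bytes.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


widths = {
    "ASCII 'a'": bytes_per_char("a"),
    "Latin-1 U+00E9": bytes_per_char("\u00e9"),
    "BMP U+20AC": bytes_per_char("\u20ac"),
    "raw byte as U+DC80": bytes_per_char("\udc80"),
    "astral U+1F600": bytes_per_char("\U0001F600"),
}
for label, width in widths.items():
    print(f"{label}: {width} byte(s) per character")
```

Note that the lone surrogate standing in for a raw byte is itself a
BMP code point, so it forces at least the two-byte form.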

 > and straightforward as it keeps the basic UTF-8 coding scheme
 > invariants intact.
 > 
 > Of course, all of this can be done simpler using an UCS-32
 > representation, but the basic tradeoffs leading to Emacs using a
 > variable-size multibyte representation are still valid in my
 > opinion.

Seems reasonable to me.  So far Python with PEP 393 has been pretty
successful, but since emoticons live in the astral planes, I suspect
it may not be the best representation for the web and phones -- one
smiley in ASCII text will quadruple the needed string storage.  I
don't see a good reason to change Emacs's representation at this
point.
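
To put a number on the smiley case, a rough sketch (again assuming
CPython 3.3 or later; the 10,000-character length is arbitrary).  It
compares the in-memory size of an ASCII string before and after
appending one astral character, and shows the length of the same text
in UTF-8, which is a variable-width encoding in the same spirit as
Emacs's internal one:

```python
import sys

ascii_text = "x" * 10_000
with_smiley = ascii_text + "\U0001F600"  # one astral-plane character

# In-memory size under PEP 393: the widest character sets the width
# for the whole string.
base_size = sys.getsizeof(ascii_text)
smiley_size = sys.getsizeof(with_smiley)
ratio = smiley_size / base_size

# Variable-width UTF-8 only pays for the one wide character.
utf8_len = len(with_smiley.encode("utf-8"))

print(f"fixed-width growth: x{ratio:.2f}")
print(f"UTF-8 length: {utf8_len} bytes for {len(with_smiley)} characters")
```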
