From: Alan Mackenzie <acm@muc.de>
To: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 18:08:48 +0000
Message-ID: <20091119180848.GE1314@muc.de>
In-Reply-To: <87aayiihe9.fsf@lola.goethe.zz>

Hi, David!

On Thu, Nov 19, 2009 at 05:55:10PM +0100, David Kastrup wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > On Thu, Nov 19, 2009 at 10:30:18AM -0500, Stefan Monnier wrote:
> >> > The actual character in the string is ñ (#x3f).

> >> No: the string does not contain any characters, only bytes, because
> >> it's a unibyte string.

> > I'm thinking from the lisp viewpoint.  The string is a data
> > structure which contains characters.  I really don't want to have to
> > think about the difference between "chars" and "bytes" when I'm
> > hacking lisp.  If I do, then the abstraction "string" is broken.

> >> So it contains the byte 241, not the character ñ.

> > That is then a bug.  I wrote "(aset nl 0 ?ñ)", not "(aset nl 0 241)".

> Huh?  ?ñ is the Emacs code point of ñ.  Which is pretty much identical
> to the Unicode code point in Emacs 23.

No, you (all of you) are missing the point.  That point is that if an
Emacs Lisp hacker writes "?ñ", it should work, regardless of
what "codepoint" it has, what "bytes" represent it, whether those
"bytes" are coded with a different codepoint, or what have you.  All of
that stuff is uninteresting.  If it gets interesting, like now, it is
because it is buggy.

> >> The byte 241 can be inserted in multibyte strings and buffers
> >> because it is also a char of code 4194289 (which gets displayed as
> >> \361).

OK.  Surely displaying it as "\361" is a bug?  Should it not display as
"\17777761"?  If it did, it would have saved half of my ranting.
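
For anyone wanting to check the numbers, here is a small sketch to
evaluate in *scratch*.  It assumes Emacs 23 or later, where the raw bytes
#x80..#xFF occupy the character codes #x3FFF80..#x3FFFFF, i.e. byte value
plus #x3FFF00.  "\361" is the octal form of the byte value, and
"\17777761" is the octal form of the character code:

```elisp
;; Sketch only; assumes Emacs 23 or later.
(format "%o" 241)                ; => "361", octal of the byte value
(format "%o" 4194289)            ; => "17777761", octal of the character code
(= 4194289 (+ #x3FFF00 241))     ; => t, raw byte 241 plus the offset
(= 4194289 #o17777761)           ; => t
```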

> > Hang on a mo'!  How can the byte 241 "be" a char of code 4194289?
> > This is some strange usage of the word "be" that I wasn't previously
> > aware of.  ;-)

> Emacs encodes most of its things in utf-8.  A Unicode code point is an
> integer.  You can encode it in different encodings, resulting in
> different byte streams.  Inside of a byte stream encoded in utf-8, the
> isolated byte 241 does not correspond to a Unicode character.  It is not
> valid utf-8.  When Emacs reads a file supposedly in utf-8, it wants to
> represent _all_ possible byte streams in order to be able to save
> unchanged data unmolested.

That's a good explanation - it's sort of like &lt; in html.  Thanks.

> So it encodes the entity "illegal isolated byte 241 in an utf-8
> document" with the character code 4194289 which has a representation in
> Emacs' internal variant of utf-8, but is outside of the range of
> Unicode.

So, how did the character "ñ" get turned into the illegal byte #xf1?  Is
that the bug?
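
The mapping David describes can be poked at directly.  This is only a
sketch, assuming a recent Emacs (23 or later); it shows that a string
literal written with an octal escape is unibyte, and that converting it
to multibyte yields the raw-byte character 4194289 rather than the
character 241:

```elisp
;; Sketch only; assumes Emacs 23 or later.
(multibyte-string-p "\361")                      ; => nil, unibyte, holds byte 241
(aref "\361" 0)                                  ; => 241
(aref (string-to-multibyte "\361") 0)            ; => 4194289, the raw-byte character
(unibyte-char-to-multibyte 241)                  ; => 4194289
(aref (string-to-unibyte (string 4194289)) 0)    ; => 241, back to the byte
```

The round trip through string-to-multibyte and string-to-unibyte is what
lets Emacs save such a byte back out unchanged.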

> > At this point, would you please just agree with me that when I do

> >    (setq nl "\n")
> >    (aset nl 0 ?ñ)
> >    (insert nl)

> > , what should appear on the screen should be "ñ", NOT "\361"?  Thanks!

> You assume that ?ñ is a character.

I do indeed.  It is self evident.

Now, would you too please just agree that when I execute the three forms
above, "ñ" should appear?

The identical argument applies to "ä".  They are characters used in
writing weird European languages like Spanish and German.  Emacs should
not have difficulty with them.  It is a standard Emacs idiom that ?x (or
?\x) is the integer representing the character x.  Indeed (unlike in
XEmacs), characters ARE integers.  Why does this not work for, e.g.,
ISO-8859-1?

> But in Emacs, it is an integer, a Unicode code point in Emacs 23.

That sounds like the sort of argument one might read on
gnu-misc-discuss.  ;-)  Sorry.  Are you saying that Emacs is converting
"?ñ" and "?ä" into the wrong integers? 

> As long as there is something like a unibyte string, there is no way
> to distinguish the character 241 and the byte 241 except when Emacs is
> told explicitly.

What is the correct Emacs internal representation for "ñ" and "ä"?  They
surely cannot share internal representations with other
(non-)characters?
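
A sketch of how the two representations can be told apart, assuming
Emacs 23 or later and a source file read as multibyte.  The character ñ
is the integer 241 and takes two bytes in a multibyte string, while the
raw byte 241 becomes a different character once converted to multibyte:

```elisp
;; Sketch only; assumes Emacs 23 or later.
(= ?ñ 241)                                          ; => t
(multibyte-string-p (string ?ñ))                    ; => t
(string-bytes (string ?ñ))                          ; => 2, internal utf-8 form
(encode-coding-string (string ?ñ) 'iso-8859-1)      ; => "\361", a unibyte string
(decode-coding-string "\361" 'iso-8859-1)           ; => "ñ"
(string= (string ?ñ) (string-to-multibyte "\361"))  ; => nil, char 241 vs raw byte
```

Building the string with (string ?ñ) instead of modifying a unibyte
literal is one way to stay on the character side of that distinction.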

> Because Emacs has no separate "character" data type.

For which I am thankful.

> -- 
> David Kastrup

-- 
Alan Mackenzie (Nuremberg, Germany).



