From: "Davis Herring" <herring@lanl.gov>
To: "Alan Mackenzie" <acm@muc.de>
Cc: David Kastrup <dak@gnu.org>, emacs-devel@gnu.org
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on  screen.
Date: Thu, 19 Nov 2009 11:25:05 -0800 (PST)
Message-ID: <47325.130.55.118.19.1258658705.squirrel@webmail.lanl.gov>
In-Reply-To: <20091119180848.GE1314@muc.de>

[I end up having to say the same thing several times here; I thought it
preferable to omitting any of Alan's questions or any aspect of the
problem.  It's not meant to be a rant.]

> No, you (all of you) are missing the point.  That point is that if an
> Emacs Lisp hacker writes "?ñ", it should work, regardless of
> what "codepoint" it has, what "bytes" represent it, whether those
> "bytes" are coded with a different codepoint, or what have you.  All of
> that stuff is uninteresting.  If it gets interesting, like now, it is
> because it is buggy.

When you wrote ?ñ, it did work -- that character has the Unicode (and
Emacs 23) code point 241, so that two-character token is entirely
equivalent to the token "241" in Emacs source.  (This is independent of
the encoding of the source file: the same two characters might be
represented by many different octet sequences in the source file, but you
always get 241 as the value (which is a code point and is distinct from
octet sequences anyway).)
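
The distinction between code point and octets can be illustrated outside
Emacs.  This is only a sketch in Python, not Emacs code: Python's `ord`
and `str.encode` stand in for the Lisp reader and for the encoding of the
source file, and the variable names are made up for the illustration.

```python
# Illustration only: Python stands in for Emacs here.
ch = "\u00f1"  # the character ñ

code_point = ord(ch)                  # the character's Unicode code point
utf8_octets = ch.encode("utf-8")      # how a UTF-8 source file would store it
latin1_octets = ch.encode("latin-1")  # how a Latin-1 source file would store it

print(code_point)           # 241
print(list(utf8_octets))    # [195, 177]
print(list(latin1_octets))  # [241]
```

The code point is 241 in both cases; only the octets on disk differ.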

But you didn't insert that object!  You stored it with `aset' into a
string that was (perhaps surprisingly) unibyte, so the integer 241 was
interpreted as a raw byte value, because that's what unibyte strings
contain.  When you then inserted the string, Emacs transformed that byte
into a (somewhat artificial) character whose meaning is "this was really
the byte 241, which, since it corresponds to no UTF-8 character, must
merely be reproduced literally on disk" and whose Emacs code point is
4194289.  (That integer looks like it could be derived from 241 by
sign-extension for the convenience of Emacs hackers; the connection is
unimportant to the user.)
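
The numbers involved can be checked with a short Python sketch.  It
assumes that Emacs 23 places the raw bytes 128..255 at the top of its
character range by adding the offset #x3FFF00; the constant and the
helper name `raw_byte_to_emacs_char` are assumptions for illustration,
not Emacs API.

```python
# Assumption: Emacs 23 maps a raw byte b in 128..255 to the character
# code b + 0x3FFF00.  The helper name is hypothetical, not an Emacs function.
RAW_BYTE_OFFSET = 0x3FFF00


def raw_byte_to_emacs_char(byte: int) -> int:
    """Return the code of the "raw byte" character for a byte in 128..255."""
    if not 128 <= byte <= 255:
        raise ValueError("only bytes 128..255 map to raw-byte characters")
    return byte + RAW_BYTE_OFFSET


code = raw_byte_to_emacs_char(241)
print(code)       # 4194289
print(oct(241))   # 0o361
print(oct(code))  # 0o17777761
```

Under that assumption, octal 361 is the byte value and octal 17777761 is
the internal code of the character that stands for that byte.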

> OK.  Surely displaying it as "\361" is a bug?  Should it not display as
> "\17777761".  If it did, it would have saved half of my ranting.

No: characters are displayed according to their meaning, not their
internal code point.  As it happens, this character's whole meaning is
"the byte #o361", so that's what's displayed.

> So, how did the character "ñ" get turned into the illegal byte #xf1?  Is
> that the bug?

By its use in `aset' in a unibyte context (determined entirely by the
target string).

>> You assume that ?ñ is a character.
>
> I do indeed.  It is self evident.

Its characterness is determined by context, because (as you know) Emacs
has no distinct character type.  So, in the isolation of English prose, we
have no way of telling whether ?ñ "is" a character or an integer, any more
than we can guess about 241.  (We can guess about the writer's desires,
but not about the real effects.)

> Now, would you too please just agree that when I execute the three forms
> above, and "ñ" should appear?

That's Stefan's point: should common string literals generate multibyte
strings (so as to change the meaning, not of the string, but of `aset', to
what you want)?  Maybe.  One could also address the issue by disallowing
`aset' on unibyte strings (or on strings entirely) and introducing
`aset-unibyte' (and perhaps `aset-multibyte') so that the argument
interpretation (and the O(n) nature of the latter) would be made clear to
the programmer.  Or maybe the doc-string for `aset' should just bear a
really loud warning.

It bears more consideration than merely "yes" to your question, as
reasonable as it seems.

> What is the correct Emacs internal representation for "ñ" and "ä"?  They
> surely cannot share internal representations with other
> (non-)characters?

Each has a unique internal representation as a (mostly) Unicode code
point (an integer): 241 and 228, respectively.  Those integers happen to
be identical to the representations of the bytes with those values, and
that byte interpretation is the one that prevails in a unibyte context.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.



