From: David Kastrup <dak@gnu.org>
To: emacs-devel@gnu.org
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 17:55:10 +0100
Message-ID: <87aayiihe9.fsf@lola.goethe.zz>
In-Reply-To: 20091119155848.GB1314@muc.de

Alan Mackenzie <acm@muc.de> writes:
> On Thu, Nov 19, 2009 at 10:30:18AM -0500, Stefan Monnier wrote:
>> > The actual character in the string is ñ (#xf1).
>
>> No: the string does not contain any characters, only bytes, because
>> it's a unibyte string.
>
> I'm thinking from the lisp viewpoint. The string is a data structure
> which contains characters. I really don't want to have to think about
> the difference between "chars" and "bytes" when I'm hacking lisp. If
> I do, then the abstraction "string" is broken.
>
>> So it contains the byte 241, not the character ñ.
>
> That is then a bug. I wrote "(aset nl 0 ?ñ)", not "(aset nl 0 241)".

Huh? ?ñ is the Emacs code point of ñ, which in Emacs 23 is pretty much
identical to the Unicode code point.

>> The byte 241 can be inserted in multibyte strings and buffers because
>> it is also a char of code 4194289 (which gets displayed as \361).
>
> Hang on a mo'! How can the byte 241 "be" a char of code 4194289?
> This is some strange usage of the word "be" that I wasn't previously
> aware of. ;-)

Emacs stores most of its text internally in utf-8. A Unicode code point
is an integer. You can encode it in different encodings, resulting in
different byte streams. Inside a byte stream encoded in utf-8, the
isolated byte 241 does not correspond to a Unicode character: it is not
valid utf-8. When Emacs reads a file supposedly in utf-8, it wants to be
able to represent _all_ possible byte streams, so that data you did not
change can be saved back unmolested.

So it encodes the entity "illegal isolated byte 241 in a utf-8
document" with the character code 4194289, which has a representation in
Emacs' internal variant of utf-8 but is outside the range of
Unicode.
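
The arithmetic can be checked outside Emacs. What follows is only a
sketch in Python, not Emacs code: the helper names are made up for
illustration, and the offset 0x3FFF00 is an assumption derived from the
two numbers quoted above (4194289 - 241).

```python
# Assumed offset, derived from the figures in this thread: 4194289 - 241.
RAW_BYTE_OFFSET = 0x3FFF00
# Largest code point Unicode defines.
MAX_UNICODE = 0x10FFFF


def is_valid_utf8(data: bytes) -> bool:
    """Return True if DATA decodes cleanly as utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def raw_byte_code(byte: int) -> int:
    """Hypothetical helper: the character code standing in for a stray byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 128..255 need a stand-in code")
    return RAW_BYTE_OFFSET + byte


print(is_valid_utf8(bytes([241])))       # False: a lone 241 is not utf-8
print("ñ".encode("utf-8"))               # b'\xc3\xb1': ñ takes two bytes
print(raw_byte_code(241))                # 4194289
print(raw_byte_code(241) > MAX_UNICODE)  # True: outside Unicode
print(oct(241))                          # 0o361, hence the "\361" display
```

The point of the stand-in code is that it cannot collide with any real
character, so the stray byte survives a round trip through the buffer.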
> At this point, would you please just agree with me that when I do
>
> (setq nl "\n")
> (aset nl 0 ?ñ)
> (insert nl)
>
> , what should appear on the screen should be "ñ", NOT "\361"? Thanks!

You assume that ?ñ is a character. But in Emacs it is an integer: in
Emacs 23, the Unicode code point of ñ. Emacs has no separate
"character" data type, so as long as there is something like a unibyte
string, there is no way to distinguish the character 241 from the byte
241 unless Emacs is told explicitly which one is meant.
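
The same ambiguity can be shown in any language where a character code
is a plain integer. A small Python sketch, offered as an analogy rather
than a description of Emacs internals:

```python
# ord() plays the role of ?ñ here: it yields a bare integer.
code = ord("ñ")
print(code)  # 241

# The integer carries no tag saying what it means. The caller decides:
as_char = chr(code)      # read 241 as a code point
as_byte = bytes([code])  # read 241 as a single raw byte

print(as_char)  # ñ
print(as_byte)  # b'\xf1'
```

Both readings start from the same 241; only the operation applied to it
says which one was meant.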
--
David Kastrup