From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: ulm@gentoo.org, emacs-devel@gnu.org
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 20:24:24 +0900
Message-ID: <tl78wfzl3br.fsf@m17n.org>
In-Reply-To: <831vlrsh6q.fsf@gnu.org> (message from Eli Zaretskii on Mon, 28 Sep 2009 08:43:09 +0200)
In article <831vlrsh6q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> >
> > > > >> $ emacs -Q
> > > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > > >>
> > > > >> The characters are displayed as "_-" (approximately).
> > > > >>
> > > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > > >> raw bytes with no specific meaning?
> > > >
> > > > > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > > > > interpreted as a character, and shown as such. This is the main
> > > > > feature of unibyte buffers; otherwise, who'd want them?
> >
> > I think the main feature of unibyte buffers is to handle
> > raw-bytes as is.
> How do we even know that they are raw bytes, and how do we
> distinguish, in a unibyte buffer, ü from \374, say? Just because they
> were inserted by C-q NNN or by some other mechanism?
They are not distinguished.
> > For those who want to see a raw-byte as a character of their locale
> > (language environment), we have
> > unibyte-display-via-language-environment.
> I thought bytes in unibyte buffers are always interpreted as
> characters of the locale, as Emacs 19 did.
Not really, because we no longer perform automatic
unibyte<->multibyte decoding/encoding. So, if we
cut #xC0 in a unibyte buffer and yank it into a multibyte
buffer, an eight-bit character is inserted instead of U+00C0.
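
The cut-and-yank case can be tried from Lisp. This is only a rough sketch, not part of the original message: the function name demo-yank-raw-byte is hypothetical, and buffer-string stands in for the kill ring.

```elisp
;; Hypothetical helper (name invented for illustration): copy the byte
;; #xC0 out of a unibyte buffer and insert it into a multibyte one.
(defun demo-yank-raw-byte ()
  (let ((raw (with-temp-buffer
               (set-buffer-multibyte nil)   ; make this buffer unibyte
               (insert "\300")              ; the single byte #xC0
               (buffer-string))))           ; stands in for the "cut"
    (with-temp-buffer
      (set-buffer-multibyte t)              ; make sure this one is multibyte
      (insert raw)                          ; stands in for the "yank"
      (let ((ch (char-after (point-min))))
        ;; Character code, its charset, and whether it equals U+00C0.
        (list ch (char-charset ch) (= ch #xC0))))))

(demo-yank-raw-byte)
```

If the conversion works as described above, the first element is an eight-bit character code (#x3FFFC0) rather than #xC0, the charset is eight-bit, and the last element is nil.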
> Are you saying that they
> are by default always interpreted as raw bytes, unless
> unibyte-display-via-language-environment is set?
unibyte-display-via-language-environment just controls how
to display them, and it doesn't affect how they are
interpreted.
Actually, the interpretation of characters in a unibyte
buffer is still inconsistent. For instance,
skip-syntax-forward treats #x80..#xFF as characters
U+0080..U+00FF. Thus #xC0 is a word-constituent and #xD7 is
a symbol. We must fix it somehow. But how? We currently
don't have a suitable syntax code for eight-bit chars.
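
A small probe for the skip-syntax-forward behavior, again only a sketch that is not part of the original message. The function name demo-skip-syntax-probe is hypothetical, and the result depends on the Emacs version and on the syntax table in effect, so no particular output is claimed.

```elisp
;; Hypothetical probe (name invented for illustration): put the bytes
;; #xC0 and #xD7 into a unibyte buffer and see how far
;; skip-syntax-forward moves over each syntax class.
(defun demo-skip-syntax-probe ()
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; unibyte buffer
    (insert "\300\327")                 ; bytes #xC0 and #xD7
    (goto-char (point-min))
    ;; Distance moved over word constituents, then over symbol
    ;; constituents, starting where the first skip stopped.
    (list (skip-syntax-forward "w")
          (skip-syntax-forward "_"))))

(demo-skip-syntax-probe)
```

A non-zero first element means #xC0 was taken as a word constituent, and a non-zero second element means #xD7 was taken as a symbol constituent, which is the Latin-1 style interpretation described above.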
---
Kenichi Handa
handa@m17n.org