From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: ulm@gentoo.org, emacs-devel@gnu.org
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 20:24:24 +0900
Message-ID: <tl78wfzl3br.fsf@m17n.org>
In-Reply-To: <831vlrsh6q.fsf@gnu.org> (message from Eli Zaretskii on Mon, 28 Sep 2009 08:43:09 +0200)

In article <831vlrsh6q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > > >> $ emacs -Q
> > > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > > >> 
> > > > >> The characters are displayed as "_-" (approximately).
> > > > >> 
> > > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > > >> raw bytes with no specific meaning?
> > > > 
> > > > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > > > interpreted as a character, and shown as such.  This is the main
> > > > > feature of unibyte buffers; otherwise, who'd want them?
> > 
> > I think the main feature of unibyte buffers is to handle
> > raw-bytes as is.

> How do we even know that they are raw bytes, and how do we
> distinguish, in a unibyte buffer, ü from \374, say?  Just because they
> were inserted by C-q NNN or by some other mechanism?

They are not distinguished.

> > For those who want to see a raw-byte as a character of their locale
> > (language environment), we have
> > unibyte-display-via-language-environment.

> I thought bytes in unibyte buffers are always interpreted as
> characters of the locale, as Emacs 19 did.

Not really, because we no longer perform automatic
unibyte<->multibyte decoding/encoding.  So, if we cut #xC0
from a unibyte buffer and yank it into a multibyte buffer,
an eight-bit character is inserted instead of U+00C0.
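
To make the difference concrete, here is a minimal sketch in Python (not
Emacs code).  It assumes the Emacs 23 internal convention that a raw byte
in the range #x80..#xFF is carried in multibyte text as the eight-bit
character numbered #x3FFF00 + byte.  The function names are hypothetical
and exist only for this illustration.

```python
# Assumption: Emacs 23 and later represent a raw byte 0x80..0xFF in
# multibyte text as the "eight-bit" character 0x3FFF00 + byte.
EIGHT_BIT_BASE = 0x3FFF00


def as_eight_bit_char(byte: int) -> int:
    """Character code a byte gets when it is carried over without decoding."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    # ASCII bytes keep their own code; only 0x80..0xFF are remapped.
    return byte if byte < 0x80 else EIGHT_BIT_BASE + byte


def as_latin1_char(byte: int) -> int:
    """Character code the same byte gets when decoded as ISO-8859-1."""
    return ord(bytes([byte]).decode("latin-1"))


print(hex(as_eight_bit_char(0xC0)))  # 0x3fffc0, an eight-bit character
print(hex(as_latin1_char(0xC0)))     # 0xc0, i.e. U+00C0
```

The two results differ, so yanking without decoding cannot produce
U+00C0.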

> Are you saying that they
> are by default always interpreted as raw bytes, unless
> unibyte-display-via-language-environment is set?

unibyte-display-via-language-environment only controls how
they are displayed; it doesn't affect how they are
interpreted.

Actually, the interpretation of characters in a unibyte
buffer is still inconsistent.  For instance,
skip-syntax-forward treats #x80..#xFF as the characters
U+0080..U+00FF.  Thus #xC0 is a word constituent and #xD7 is
a symbol.  We must fix this somehow, but how?  We currently
don't have a suitable syntax code for eight-bit chars.
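
The word/symbol split follows from the Unicode properties of the
characters that the bytes are mistaken for.  A small Python sketch (an
illustration using Python's unicodedata module, not the Emacs syntax
table itself; the helper name is hypothetical) shows which characters
#xC0 and #xD7 become under that reading:

```python
import unicodedata


def latin1_reading(byte: int) -> tuple[str, str]:
    """Return (Unicode name, general category) of U+00xx for a byte."""
    ch = chr(byte)
    return unicodedata.name(ch), unicodedata.category(ch)


for byte in (0xC0, 0xD7):
    name, category = latin1_reading(byte)
    print(f"#x{byte:02X} -> U+{byte:04X} {name} ({category})")
```

U+00C0 is a letter (category Lu) and U+00D7 is a math symbol (category
Sm), which matches the word-constituent and symbol syntax mentioned
above.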

---
Kenichi Handa
handa@m17n.org



