From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 20:24:24 +0900
To: Eli Zaretskii
Cc: ulm@gentoo.org, emacs-devel@gnu.org
In-Reply-To: <831vlrsh6q.fsf@gnu.org> (message from Eli Zaretskii on Mon, 28 Sep 2009 08:43:09 +0200)
References: <19131.35568.835627.216245@a1i15.kph.uni-mainz.de> <833a6bv30o.fsf@gnu.org> <19132.34451.565451.857731@a1ihome1.kph.uni-mainz.de> <83ws3ntmgv.fsf@gnu.org> <831vlrsh6q.fsf@gnu.org>

In article <831vlrsh6q.fsf@gnu.org>, Eli Zaretskii writes:

> > In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii writes:
> >
> > > > >> $ emacs -Q
> > > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > > >>
> > > > >> The characters are displayed as "_-" (approximately).
> > > > >>
> > > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > > >> raw bytes with no specific meaning?
> > > >
> > > > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > > > interpreted as a character, and shown as such.  This is the main
> > > > > feature of unibyte buffers; otherwise, who'd want them?
> >
> > I think the main feature of unibyte buffers is to handle
> > raw-bytes as is.

> How do we even know that they are raw bytes, and how do we
> distinguish, in a unibyte buffer, ü from \374, say?  Just because they
> were inserted by C-q NNN or by some other mechanism?

They are not distinguished.

> > For those who want to see a raw-byte as a character of their locale
> > (language environment), we have
> > unibyte-display-via-language-environment.

> I thought bytes in unibyte buffers are always interpreted as
> characters of the locale, as Emacs 19 did.

Not really, because we don't perform automatic
unibyte<->multibyte decoding/encoding anymore.  So, if we
cut #xC0 in a unibyte buffer and yank it in a multibyte
buffer, an eight-bit character is inserted instead of U+00C0.

> Are you saying that they
> are by default always interpreted as raw bytes, unless
> unibyte-display-via-language-environment is set?

unibyte-display-via-language-environment just controls how
to display them, and it doesn't affect how they are
interpreted.

Actually, the interpretation of characters in a unibyte
buffer is still inconsistent.  For instance,
skip-syntax-forward treats #x80..#xFF as characters
U+0080..U+00FF.  Thus #xC0 is a word-constituent and #xD7 is
a symbol.  We must fix it somehow.  But, how?  We currently
don't have a suitable syntax code for eight-bit chars.

---
Kenichi Handa
handa@m17n.org
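
The two readings of a byte such as #xC0 can be made concrete with a
small sketch.  It is written in Python purely for illustration and
is not Emacs code.  It assumes that "byte as character" means the
ISO-8859-1 code point U+0080..U+00FF, and that a multibyte buffer
represents a raw byte B in #x80..#xFF as the eight-bit character
with code B + #x3FFF00.  The helper names and the constant name are
hypothetical, and Unicode general categories are only a rough
stand-in for Emacs syntax classes.

```python
import unicodedata

# Assumption: offset of the eight-bit (raw byte) character range in Emacs 23+.
RAW_BYTE_OFFSET = 0x3FFF00


def as_latin1_char(byte: int) -> str:
    """Read a byte as the character U+0000..U+00FF (ISO-8859-1)."""
    return bytes([byte]).decode("latin-1")


def as_eight_bit_code(byte: int) -> int:
    """Hypothetical helper: code of the eight-bit character for a raw byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF map to eight-bit characters")
    return byte + RAW_BYTE_OFFSET


# C-q reads octal by default, so "C-q 240" and "C-q 255" in the recipe
# are the bytes #xA0 and #xAD named in the subject line.
nbsp, shy = int("240", 8), int("255", 8)
print(hex(nbsp), hex(shy))

# Compare the two readings for the bytes discussed in the message.
for byte in (0xC0, 0xD7):
    ch = as_latin1_char(byte)
    print(
        f"#x{byte:02X}: as character U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"(category {unicodedata.category(ch)}), "
        f"as raw byte #x{as_eight_bit_code(byte):X}"
    )
```

Read as characters, #xC0 falls in a letter category and #xD7 in a
math-symbol category, which matches the word-constituent and symbol
behaviour described for skip-syntax-forward.  Read as raw bytes,
both land in a separate code range that has no such classification.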