From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 10:10:32 +0900
To: Eli Zaretskii
Cc: ulm@gentoo.org, emacs-devel@gnu.org
In-Reply-To: <83ws3ntmgv.fsf@gnu.org> (message from Eli Zaretskii on Fri, 25 Sep 2009 12:14:40 +0300)
References: <19131.35568.835627.216245@a1i15.kph.uni-mainz.de> <833a6bv30o.fsf@gnu.org> <19132.34451.565451.857731@a1ihome1.kph.uni-mainz.de> <83ws3ntmgv.fsf@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:115704

In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii writes:

> > >> $ emacs -Q
> > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > >>
> > >> The characters are displayed as "_-" (approximately).
> > >>
> > >> Shouldn't they be displayed as "\240\255", considering that these are
> > >> raw bytes with no specific meaning?
> >
> > > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > > interpreted as a character, and shown as such. This is the main
> > > feature of unibyte buffers; otherwise, who'd want them?

I think the main feature of unibyte buffers is to handle raw-bytes as
is. For those who want to see a raw-byte as a character of their
locale (language environment), we have
unibyte-display-via-language-environment.

> > Different question then: Why are all other characters in the range from
> > #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> > being the only exceptions?

> I don't know, but it sounds like a bug. Or maybe what I wrote above
> is just my pipe dream, not the reality.

> Handa-san, can you please comment on this?

The code for handling nobreak-char-display in
get_next_display_element should pay attention to
unibyte-display-via-language-environment. I've just installed the
attached change.

Stefan Monnier writes:

> The patch below should help.
[...]
> --- xdisp.c.~1.1301.~ 2009-09-20 13:01:24.000000000 -0400
> +++ xdisp.c 2009-09-25 10:02:08.000000000 -0400
> @@ -5794,7 +5794,8 @@
>            /* Handle non-break space in the mode where it only gets
>               highlighting. */
>
> -          if (EQ (Vnobreak_char_display, Qt)
> +          if ((it->multibyte_p || unibyte_display_via_language_environment)
> +              && EQ (Vnobreak_char_display, Qt)
>                && it->c == 0xA0)

If unibyte_display_via_language_environment is nonzero, we must
compare DECODE_CHAR (unibyte, it->c) against 0xA0. Otherwise, for
instance in a KOI8 locale, we would wrongly treat the byte 0xA0, which
is a box-drawing character in the KOI8 charset, as a no-break space.

---
Kenichi Handa
handa@m17n.org

Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1304
diff -u -r1.1304 xdisp.c
--- xdisp.c 27 Sep 2009 19:11:13 -0000 1.1304
+++ xdisp.c 28 Sep 2009 01:03:40 -0000
@@ -5684,6 +5684,10 @@
         {
           Lisp_Object dv;
           struct charset *unibyte = CHARSET_FROM_ID (charset_unibyte);
+          int nbsp_or_shy = 0; /* 1:NO-BREAK SPACE, 2:SOFT HYPHEN, 0:ELSE */
+#define IS_NBSP (nbsp_or_shy == 1)
+#define IS_SHY (nbsp_or_shy == 2)
+          int decoded = it->c;
 
           if (it->dp
               && (dv = DISP_CHAR_VECTOR (it->dp, it->c),
@@ -5712,6 +5716,18 @@
               goto get_next;
             }
 
+          if (unibyte_display_via_language_environment
+              && it->c >= 0x80)
+            decoded = DECODE_CHAR (unibyte, it->c);
+
+          if (it->c >= 0x80 && ! NILP (Vnobreak_char_display))
+            {
+              if (it->multibyte_p)
+                nbsp_or_shy = it->c == 0xA0 ? 1 : it->c == 0xAD ? 2 : 0;
+              else if (unibyte_display_via_language_environment)
+                nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
+            }
+
           /* Translate control characters into `\003' or `^C' form.
              Control characters coming from a display table entry are
              currently not translated because we use IT->dpvec to hold
@@ -5724,21 +5740,19 @@
              If it->multibyte_p is zero, eight-bit characters that
              don't have corresponding multibyte char code are also
              translated to octal form. */
-          else if ((it->c < ' '
-                    ? (it->area != TEXT_AREA
-                       /* In mode line, treat \n, \t like other crl chars. */
-                       || (it->c != '\t'
-                           && it->glyph_row
-                           && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
-                       || (it->c != '\n' && it->c != '\t'))
-                    : (it->multibyte_p
-                       ? (!CHAR_PRINTABLE_P (it->c)
-                          || (!NILP (Vnobreak_char_display)
-                              && (it->c == 0xA0 /* NO-BREAK SPACE */
-                                  || it->c == 0xAD /* SOFT HYPHEN */)))
-                       : (it->c >= 127
-                          && (! unibyte_display_via_language_environment
-                              || (DECODE_CHAR (unibyte, it->c) <= 0xA0))))))
+          if ((it->c < ' '
+               ? (it->area != TEXT_AREA
+                  /* In mode line, treat \n, \t like other crl chars. */
+                  || (it->c != '\t'
+                      && it->glyph_row
+                      && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
+                  || (it->c != '\n' && it->c != '\t'))
+               : (nbsp_or_shy
+                  || (it->multibyte_p
+                      ? ! CHAR_PRINTABLE_P (it->c)
+                      : (! unibyte_display_via_language_environment
+                         ? it->c >= 0x80
+                         : (decoded >= 0x80 && decoded < 0xA0))))))
             {
               /* IT->c is a control character which must be displayed
                  either as '\003' or as `^C' where the '\\' and '^'
@@ -5794,7 +5808,7 @@
                  highlighting. */
 
               if (EQ (Vnobreak_char_display, Qt)
-                  && it->c == 0xA0)
+                  && IS_NBSP)
                 {
                   /* Merge the no-break-space face into the current face. */
                   face_id = merge_faces (it->f, Qnobreak_space, 0,
@@ -5844,7 +5858,7 @@
                  highlighting. */
 
               if (EQ (Vnobreak_char_display, Qt)
-                  && it->c == 0xAD)
+                  && IS_SHY)
                 {
                   it->c = '-';
                   XSETINT (it->ctl_chars[0], '-');
@@ -5855,10 +5869,10 @@
               /* Handle non-break space and soft hyphen
                  with the escape glyph. */
 
-              if (it->c == 0xA0 || it->c == 0xAD)
+              if (nbsp_or_shy)
                 {
                   XSETINT (it->ctl_chars[0], escape_glyph);
-                  it->c = (it->c == 0xA0 ? ' ' : '-');
+                  it->c = (IS_NBSP ? ' ' : '-');
                   XSETINT (it->ctl_chars[1], it->c);
                   ctl_len = 2;
                   goto display_control;
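
For readers who want to try the classification rule outside of Emacs, here is a minimal standalone sketch of the nbsp_or_shy decision from the patch. It is not the xdisp.c code: classify_nobreak, decode_koi8r_stub and the enum names are hypothetical, and the stub only stands in for DECODE_CHAR (unibyte, c) under an assumed KOI8-R language environment, mapping just the two bytes needed for the illustration.

```c
#include <assert.h>

/* Mirrors the patch's nbsp_or_shy values: 0:ELSE, 1:NO-BREAK SPACE,
   2:SOFT HYPHEN.  The enum itself is illustrative, not from xdisp.c.  */
enum nobreak_kind { NB_NONE = 0, NB_NBSP = 1, NB_SHY = 2 };

/* Hypothetical stand-in for DECODE_CHAR (unibyte, c), assuming a
   KOI8-R language environment.  Only two bytes are mapped; every
   other byte is returned unchanged as a placeholder.  */
static int
decode_koi8r_stub (int byte)
{
  switch (byte)
    {
    case 0x9A: return 0x00A0;	/* NO-BREAK SPACE */
    case 0xA0: return 0x2550;	/* BOX DRAWINGS DOUBLE HORIZONTAL */
    default:   return byte;
    }
}

/* Sketch of the decision made in the patch.  NOBREAK_ENABLED plays
   the role of !NILP (Vnobreak_char_display), VIA_LANGENV the role of
   unibyte_display_via_language_environment.  */
static enum nobreak_kind
classify_nobreak (int c, int multibyte_p, int via_langenv,
		  int nobreak_enabled, int (*decode) (int))
{
  int decoded = c;

  if (via_langenv && c >= 0x80)
    decoded = decode (c);

  if (c < 0x80 || !nobreak_enabled)
    return NB_NONE;
  if (multibyte_p)
    return c == 0xA0 ? NB_NBSP : c == 0xAD ? NB_SHY : NB_NONE;
  if (via_langenv)
    return decoded == 0xA0 ? NB_NBSP : decoded == 0xAD ? NB_SHY : NB_NONE;
  return NB_NONE;		/* plain unibyte: a raw byte, never special */
}

/* Small self-check covering the cases discussed in this thread.  */
static void
check_nobreak_cases (void)
{
  /* Multibyte buffer: 0xA0 and 0xAD are the characters themselves.  */
  assert (classify_nobreak (0xA0, 1, 0, 1, decode_koi8r_stub) == NB_NBSP);
  assert (classify_nobreak (0xAD, 1, 0, 1, decode_koi8r_stub) == NB_SHY);
  /* Plain unibyte buffer: raw byte, no special treatment.  */
  assert (classify_nobreak (0xA0, 0, 0, 1, decode_koi8r_stub) == NB_NONE);
  /* Unibyte shown via KOI8-R: 0xA0 is box drawing, 0x9A is NBSP.  */
  assert (classify_nobreak (0xA0, 0, 1, 1, decode_koi8r_stub) == NB_NONE);
  assert (classify_nobreak (0x9A, 0, 1, 1, decode_koi8r_stub) == NB_NBSP);
}
```

Calling check_nobreak_cases () from a test program exercises the three situations: a multibyte buffer, a plain unibyte buffer, and a unibyte buffer displayed via the language environment. The point of passing the decoder in as a function pointer is only to keep the sketch independent of any real charset tables.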