From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: ulm@gentoo.org, emacs-devel@gnu.org
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 10:10:32 +0900
Message-ID: <tl7fxa7lvqv.fsf@m17n.org>
In-Reply-To: <83ws3ntmgv.fsf@gnu.org> (message from Eli Zaretskii on Fri, 25 Sep 2009 12:14:40 +0300)

In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > >> $ emacs -Q
> > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > >> 
> > >> The characters are displayed as "_-" (approximately).
> > >> 
> > >> Shouldn't they be displayed as "\240\255", considering that these are
> > >> raw bytes with no specific meaning?
> > 
> > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > interpreted as a character, and shown as such.  This is the main
> > > feature of unibyte buffers; otherwise, who'd want them?

I think the main feature of unibyte buffers is to handle
raw-bytes as is.  For those who want to see a raw-byte as a
character of their locale (language environment), we have
unibyte-display-via-language-environment.

> > Different question then: Why are all other characters in the range from
> > #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> > being the only exceptions?

> I don't know, but it sounds like a bug.  Or maybe what I wrote above
> is just my pipe dream, not the reality.

> Handa-san, can you please comment on this?

The code for handling nobreak-char-display in
get_next_display_element should pay attention to
unibyte-display-via-language-environment.  I've just
installed the attached change.
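
As a rough illustration of the intended rule, the decision can be modeled outside Emacs as below.  This is only a sketch: `classify`, `decode_identity`, and the plain `int` flags are hypothetical stand-ins for it->c, it->multibyte_p, unibyte_display_via_language_environment, Vnobreak_char_display, and DECODE_CHAR (unibyte, c); they are not the real xdisp.c interfaces.

```c
#include <assert.h>

/* Mirrors the 0/1/2 encoding used by nbsp_or_shy in the patch.  */
enum nobreak_kind { NOT_NOBREAK = 0, NBSP = 1, SHY = 2 };

/* Hypothetical stand-in for DECODE_CHAR (unibyte, c) in an environment
   where bytes map to the same code points (a Latin-1-like case).  */
int
decode_identity (int c)
{
  return c;
}

/* Decide whether C gets no-break-space / soft-hyphen treatment.
   A plain unibyte buffer (neither flag set) never qualifies, so the
   byte stays a raw byte and is shown in octal form.  */
enum nobreak_kind
classify (int c, int multibyte_p, int display_via_langenv,
          int nobreak_char_display, int (*decode) (int))
{
  int ch;

  if (c < 0x80 || !nobreak_char_display)
    return NOT_NOBREAK;
  if (multibyte_p)
    ch = c;                     /* already a character code */
  else if (display_via_langenv)
    ch = decode (c);            /* compare the decoded character */
  else
    return NOT_NOBREAK;         /* raw byte in a unibyte buffer */

  return ch == 0xA0 ? NBSP : ch == 0xAD ? SHY : NOT_NOBREAK;
}

/* Small self-check covering the case from the original report.  */
void
classify_demo (void)
{
  /* emacs -Q, unibyte buffer: 0xA0 and 0xAD are raw bytes.  */
  assert (classify (0xA0, 0, 0, 1, decode_identity) == NOT_NOBREAK);
  assert (classify (0xAD, 0, 0, 1, decode_identity) == NOT_NOBREAK);
  /* Multibyte buffer: the characters are recognized as before.  */
  assert (classify (0xA0, 1, 0, 1, decode_identity) == NBSP);
  assert (classify (0xAD, 1, 0, 1, decode_identity) == SHY);
}
```

With both flags off, the bytes from the original report fall through to the ordinary eight-bit path, which is what yields "\240\255".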

In article <jwv4oqri1f7.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> The patch below should help.
[...]
> --- xdisp.c.~1.1301.~	2009-09-20 13:01:24.000000000 -0400
> +++ xdisp.c	2009-09-25 10:02:08.000000000 -0400
> @@ -5794,7 +5794,8 @@
>  	      /* Handle non-break space in the mode where it only gets
>  		 highlighting.  */
>
> -	      if (EQ (Vnobreak_char_display, Qt)
> +	      if ((it->multibyte_p || unibyte_display_via_language_environment)
> +		  && EQ (Vnobreak_char_display, Qt)
>  		  && it->c == 0xA0)

If unibyte_display_via_language_environment is nonzero, we
must compare DECODE_CHAR (unibyte, it->c), not it->c itself,
against 0xA0.  Otherwise, in a KOI8 locale for instance, we
would wrongly treat a box-drawing character of the KOI8
charset as a no-break space.
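
To make the KOI8 case concrete, here is a small standalone sketch.  `koi8r_decode` is a hypothetical stand-in for DECODE_CHAR (unibyte, c) under a KOI8-R language environment, and it maps only the two bytes needed for the example (in KOI8-R, byte 0xA0 is U+2550 and NO-BREAK SPACE sits at byte 0x9A); the real charset tables in Emacs are not reproduced here.

```c
#include <assert.h>

/* Illustrative partial decoder; returns -1 for eight-bit bytes that
   this sketch does not map.  ASCII bytes decode to themselves.  */
int
koi8r_decode (int byte)
{
  switch (byte)
    {
    case 0x9A: return 0x00A0;   /* NO-BREAK SPACE */
    case 0xA0: return 0x2550;   /* BOX DRAWINGS DOUBLE HORIZONTAL */
    default:   return byte < 0x80 ? byte : -1;
    }
}

/* Test on the raw byte, as in the quoted patch.  */
int
is_nbsp_raw (int byte)
{
  return byte == 0xA0;
}

/* Test on the decoded character, as proposed above.  */
int
is_nbsp_decoded (int byte)
{
  return koi8r_decode (byte) == 0x00A0;
}

void
koi8r_demo (void)
{
  /* Byte 0xA0 is a box-drawing character, yet the raw test fires.  */
  assert (is_nbsp_raw (0xA0));
  assert (!is_nbsp_decoded (0xA0));
  /* The real no-break space is byte 0x9A, which the raw test misses.  */
  assert (!is_nbsp_raw (0x9A));
  assert (is_nbsp_decoded (0x9A));
}
```

The raw comparison is wrong in both directions here: it flags the box-drawing byte and misses the byte that really is a no-break space.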

---
Kenichi Handa
handa@m17n.org

Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1304
diff -u -r1.1304 xdisp.c
--- xdisp.c	27 Sep 2009 19:11:13 -0000	1.1304
+++ xdisp.c	28 Sep 2009 01:03:40 -0000
@@ -5684,6 +5684,10 @@
 	{
 	  Lisp_Object dv;
 	  struct charset *unibyte = CHARSET_FROM_ID (charset_unibyte);
+	  int nbsp_or_shy = 0; /* 1:NO-BREAK SPACE, 2:SOFT HYPHEN, 0:ELSE */
+#define IS_NBSP (nbsp_or_shy == 1)
+#define IS_SHY (nbsp_or_shy == 2)
+	  int decoded = it->c;
 
 	  if (it->dp
 	      && (dv = DISP_CHAR_VECTOR (it->dp, it->c),
@@ -5712,6 +5716,18 @@
 	      goto get_next;
 	    }
 
+	  if (unibyte_display_via_language_environment
+	      && it->c >= 0x80)
+	    decoded = DECODE_CHAR (unibyte, it->c);
+
+	  if (it->c >= 0x80 && ! NILP (Vnobreak_char_display))
+	    {
+	      if (it->multibyte_p)
+		nbsp_or_shy = it->c == 0xA0 ? 1 : it->c == 0xAD ? 2 : 0;
+	      else if (unibyte_display_via_language_environment)
+		nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
+	    }
+
 	  /* Translate control characters into `\003' or `^C' form.
 	     Control characters coming from a display table entry are
 	     currently not translated because we use IT->dpvec to hold
@@ -5724,21 +5740,19 @@
 	     If it->multibyte_p is zero, eight-bit characters that
 	     don't have corresponding multibyte char code are also
 	     translated to octal form.  */
-	  else if ((it->c < ' '
-		    ? (it->area != TEXT_AREA
-		       /* In mode line, treat \n, \t like other crl chars.  */
-		       || (it->c != '\t'
-			   && it->glyph_row
-			   && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
-		       || (it->c != '\n' && it->c != '\t'))
-		    : (it->multibyte_p
-		       ? (!CHAR_PRINTABLE_P (it->c)
-			  || (!NILP (Vnobreak_char_display)
-			      && (it->c == 0xA0 /* NO-BREAK SPACE */
-				  || it->c == 0xAD /* SOFT HYPHEN */)))
-		       : (it->c >= 127
-			  && (! unibyte_display_via_language_environment
-			      || (DECODE_CHAR (unibyte, it->c) <= 0xA0))))))
+	  if ((it->c < ' '
+	       ? (it->area != TEXT_AREA
+		  /* In mode line, treat \n, \t like other crl chars.  */
+		  || (it->c != '\t'
+		      && it->glyph_row
+		      && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
+		  || (it->c != '\n' && it->c != '\t'))
+	       : (nbsp_or_shy
+		  || (it->multibyte_p
+		      ? ! CHAR_PRINTABLE_P (it->c)
+		      : (! unibyte_display_via_language_environment
+			 ? it->c >= 0x80
+			 : (decoded >= 0x80 && decoded < 0xA0))))))
 	    {
 	      /* IT->c is a control character which must be displayed
 		 either as '\003' or as `^C' where the '\\' and '^'
@@ -5794,7 +5808,7 @@
 		 highlighting.  */
 
 	      if (EQ (Vnobreak_char_display, Qt)
-		  && it->c == 0xA0)
+		  && IS_NBSP)
 		{
 		  /* Merge the no-break-space face into the current face.  */
 		  face_id = merge_faces (it->f, Qnobreak_space, 0,
@@ -5844,7 +5858,7 @@
 		 highlighting.  */
 
 	      if (EQ (Vnobreak_char_display, Qt)
-		  && it->c == 0xAD)
+		  && IS_SHY)
 		{
 		  it->c = '-';
 		  XSETINT (it->ctl_chars[0], '-');
@@ -5855,10 +5869,10 @@
 	      /* Handle non-break space and soft hyphen
 		 with the escape glyph.  */
 
-	      if (it->c == 0xA0 || it->c == 0xAD)
+	      if (nbsp_or_shy)
 		{
 		  XSETINT (it->ctl_chars[0], escape_glyph);
-		  it->c = (it->c == 0xA0 ? ' ' : '-');
+		  it->c = (IS_NBSP ? ' ' : '-');
 		  XSETINT (it->ctl_chars[1], it->c);
 		  ctl_len = 2;
 		  goto display_control;



