Display of characters #xa0 and #xad in unibyte buffers

unofficial mirror of emacs-devel@gnu.org 

* Display of characters #xa0 and #xad in unibyte buffers
@ 2009-09-24 15:06 Ulrich Mueller
  2009-09-25  8:31 ` Eli Zaretskii
  2009-09-25  9:44 ` Andreas Schwab
  0 siblings, 2 replies; 19+ messages in thread
From: Ulrich Mueller @ 2009-09-24 15:06 UTC (permalink / raw)
  To: emacs-devel

Is the following a feature or a bug?

    $ emacs -Q
    M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET

The characters are displayed as "_-" (approximately).

Shouldn't they be displayed as "\240\255", considering that these are
raw bytes with no specific meaning?
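For reference, C-q reads octal digits by default, so C-q 240 inserts the byte #xA0 and C-q 255 inserts #xAD. The C sketch below is an illustration only: the helper name format_octal_escape is hypothetical and not an Emacs function. It produces the backslash-octal notation Ulrich expected for such bytes:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: render one byte as a backslash followed by
   three octal digits, the notation Emacs uses for bytes it does not
   display as characters.  BUF must have room for 5 bytes.  */
static void
format_octal_escape (unsigned char byte, char buf[5])
{
  int n = snprintf (buf, 5, "\\%03o", (unsigned int) byte);
  assert (n == 4);  /* backslash + exactly three octal digits */
}
```

With this helper, format_octal_escape (0xA0, buf) yields "\240" and format_octal_escape (0xAD, buf) yields "\255".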

Ulrich




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-24 15:06 Display of characters #xa0 and #xad in unibyte buffers Ulrich Mueller
@ 2009-09-25  8:31 ` Eli Zaretskii
  2009-09-25  9:00   ` Ulrich Mueller
                     ` (2 more replies)
  2009-09-25  9:44 ` Andreas Schwab
  1 sibling, 3 replies; 19+ messages in thread
From: Eli Zaretskii @ 2009-09-25  8:31 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> Date: Thu, 24 Sep 2009 17:06:24 +0200
> From: Ulrich Mueller <ulm@gentoo.org>
> 
> Is the following a feature or a bug?
> 
>     $ emacs -Q
>     M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> 
> The characters are displayed as "_-" (approximately).
> 
> Shouldn't they be displayed as "\240\255", considering that these are
> raw bytes with no specific meaning?

There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
interpreted as a character, and shown as such.  This is the main
feature of unibyte buffers; otherwise, who'd want them?





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-25  8:31 ` Eli Zaretskii
@ 2009-09-25  9:00   ` Ulrich Mueller
  2009-09-25  9:14     ` Eli Zaretskii
  2009-09-25  9:38   ` Stephen J. Turnbull
  2009-09-25 14:09   ` Stefan Monnier
  2 siblings, 1 reply; 19+ messages in thread
From: Ulrich Mueller @ 2009-09-25  9:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>>>>> On Fri, 25 Sep 2009, Eli Zaretskii wrote:

>> $ emacs -Q
>> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>> 
>> The characters are displayed as "_-" (approximately).
>> 
>> Shouldn't they be displayed as "\240\255", considering that these are
>> raw bytes with no specific meaning?

> There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> interpreted as a character, and shown as such.  This is the main
> feature of unibyte buffers; otherwise, who'd want them?

Different question then: Why are all other characters in the range from
#x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
being the only exceptions?

Ulrich





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-25  9:00   ` Ulrich Mueller
@ 2009-09-25  9:14     ` Eli Zaretskii
  2009-09-28  1:10       ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-09-25  9:14 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> Date: Fri, 25 Sep 2009 11:00:03 +0200
> Cc: emacs-devel@gnu.org
> From: Ulrich Mueller <ulm@gentoo.org>
> 
> >>>>> On Fri, 25 Sep 2009, Eli Zaretskii wrote:
> 
> >> $ emacs -Q
> >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> >> 
> >> The characters are displayed as "_-" (approximately).
> >> 
> >> Shouldn't they be displayed as "\240\255", considering that these are
> >> raw bytes with no specific meaning?
> 
> > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > interpreted as a character, and shown as such.  This is the main
> > feature of unibyte buffers; otherwise, who'd want them?
> 
> Different question then: Why are all other characters in the range from
> #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> being the only exceptions?

I don't know, but it sounds like a bug.  Or maybe what I wrote above
is just my pipe dream, not the reality.

Handa-san, can you please comment on this?





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-25  8:31 ` Eli Zaretskii
  2009-09-25  9:00   ` Ulrich Mueller
@ 2009-09-25  9:38   ` Stephen J. Turnbull
  2009-09-25 14:09   ` Stefan Monnier
  2 siblings, 0 replies; 19+ messages in thread
From: Stephen J. Turnbull @ 2009-09-25  9:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel

Eli Zaretskii writes:

 > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
 > interpreted as a character, and shown as such.  This is the main
 > feature of unibyte buffers; otherwise, who'd want them?

Somebody working with a wire protocol who wants to avoid the overhead
of a multibyte representation and the fragility of writing code to use
Mule coding systems to implement the identity?






* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-24 15:06 Display of characters #xa0 and #xad in unibyte buffers Ulrich Mueller
  2009-09-25  8:31 ` Eli Zaretskii
@ 2009-09-25  9:44 ` Andreas Schwab
  1 sibling, 0 replies; 19+ messages in thread
From: Andreas Schwab @ 2009-09-25  9:44 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Is the following a feature or a bug?
>
>     $ emacs -Q
>     M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>
> The characters are displayed as "_-" (approximately).
>
> Shouldn't they be displayed as "\240\255", considering that these are
> raw bytes with no specific meaning?

See nobreak-char-display.
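According to its docstring, nobreak-char-display has three kinds of value. t only highlights NO-BREAK SPACE and SOFT HYPHEN (using the nobreak-space and escape-glyph faces). Any other non-nil value shows the escape glyph followed by a space or hyphen. nil disables the special handling. The C sketch below models those cases; the enum and function names are hypothetical, and faces are reduced to their names. With the default value t, #xA0 comes out as a space in the (presumably underlined) nobreak-space face and #xAD as a hyphen, which would account for the "_-" in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the three kinds of value of nobreak-char-display.  */
enum nobreak_mode { NOBREAK_NIL, NOBREAK_T, NOBREAK_OTHER };

/* Return the text drawn for C, which must be NO-BREAK SPACE (#xA0) or
   SOFT HYPHEN (#xAD), or NULL when C is drawn as itself.  *FACE is set
   to the name of the highlighting face, or NULL if none is modelled.  */
static const char *
nobreak_glyphs (int c, enum nobreak_mode mode, const char **face)
{
  assert (c == 0xA0 || c == 0xAD);
  *face = NULL;
  switch (mode)
    {
    case NOBREAK_T:
      /* Highlight only: a space or a hyphen in a special face.  */
      *face = (c == 0xA0) ? "nobreak-space" : "escape-glyph";
      return (c == 0xA0) ? " " : "-";
    case NOBREAK_OTHER:
      /* Escape glyph (a backslash by default) plus space or hyphen;
         the face of the escape glyph is not modelled here.  */
      return (c == 0xA0) ? "\\ " : "\\-";
    case NOBREAK_NIL:
    default:
      /* No special handling.  */
      return NULL;
    }
}
```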

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-25  8:31 ` Eli Zaretskii
  2009-09-25  9:00   ` Ulrich Mueller
  2009-09-25  9:38   ` Stephen J. Turnbull
@ 2009-09-25 14:09   ` Stefan Monnier
  2009-09-26  8:26     ` Ulrich Mueller
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2009-09-25 14:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel

>> Is the following a feature or a bug?
>> 
>> $ emacs -Q
>> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>> 
>> The characters are displayed as "_-" (approximately).
>> 
>> Shouldn't they be displayed as "\240\255", considering that these are
>> raw bytes with no specific meaning?
> There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> interpreted as a character, and shown as such.

No, this used to be the case, but not any more.  This is nowadays only
done for the deprecated unibyte sessions (well, it's really controlled
by unibyte-display-via-language-environment IIUC).

It looks like a bug in the implementation of nobreak-char-display
(thanks Andreas).

The patch below should help.


        Stefan


--- xdisp.c.~1.1301.~	2009-09-20 13:01:24.000000000 -0400
+++ xdisp.c	2009-09-25 10:02:08.000000000 -0400
@@ -5794,7 +5794,8 @@
 	      /* Handle non-break space in the mode where it only gets
 		 highlighting.  */
 
-	      if (EQ (Vnobreak_char_display, Qt)
+	      if ((it->multibyte_p || unibyte_display_via_language_environment)
+		  && EQ (Vnobreak_char_display, Qt)
 		  && it->c == 0xA0)
 		{
 		  /* Merge the no-break-space face into the current face.  */
@@ -5844,6 +5845,8 @@
 	      /* Handle soft hyphens in the mode where they only get
 		 highlighting.  */
 
+	      if (it->multibyte_p || unibyte_display_via_language_environment)
+		{
 	      if (EQ (Vnobreak_char_display, Qt)
 		  && it->c == 0xAD)
 		{
@@ -5864,6 +5867,7 @@
 		  ctl_len = 2;
 		  goto display_control;
 		}
+		}
 
 	      {
 		unsigned char str[MAX_MULTIBYTE_LENGTH];
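The core of the patch above is a guard: the NO-BREAK SPACE / SOFT HYPHEN special cases apply only when the buffer is multibyte or unibyte-display-via-language-environment is set. Here is a minimal C model of the NO-BREAK SPACE test. The struct and function are hypothetical stand-ins for the display iterator and the two Lisp variables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for it->multibyte_p and the variables
   unibyte-display-via-language-environment and nobreak-char-display.  */
struct display_ctx
{
  bool multibyte_p;
  bool unibyte_display_via_language_environment;
  bool nobreak_char_display_is_t;
};

/* The NO-BREAK SPACE test with the guard added by the patch: a byte
   #xA0 in a plain unibyte buffer no longer qualifies, so it falls
   through to the octal "\240" display.  */
static bool
highlight_as_nbsp (const struct display_ctx *ctx, int c)
{
  assert (c >= 0);  /* character codes are non-negative */
  return (ctx->multibyte_p || ctx->unibyte_display_via_language_environment)
         && ctx->nobreak_char_display_is_t
         && c == 0xA0;
}
```

As Handa points out in his follow-up, when unibyte-display-via-language-environment is set, the comparison should use the decoded character rather than the raw byte.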





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-25 14:09   ` Stefan Monnier
@ 2009-09-26  8:26     ` Ulrich Mueller
  0 siblings, 0 replies; 19+ messages in thread
From: Ulrich Mueller @ 2009-09-26  8:26 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

>>>>> On Fri, 25 Sep 2009, Stefan Monnier wrote:

> It looks like a bug in the implementation of nobreak-char-display
> (thanks Andreas).

> The patch below should help.

It fixes the problem for me, thanks.

Still strange that the two character codes are hardcoded at a low
level, while others like #x202f ("narrow no-break space") are not.

Ulrich





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-25  9:14     ` Eli Zaretskii
@ 2009-09-28  1:10       ` Kenichi Handa
  2009-09-28  6:43         ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2009-09-28  1:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, emacs-devel

In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > >> $ emacs -Q
> > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > >> 
> > >> The characters are displayed as "_-" (approximately).
> > >> 
> > >> Shouldn't they be displayed as "\240\255", considering that these are
> > >> raw bytes with no specific meaning?
> > 
> > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > interpreted as a character, and shown as such.  This is the main
> > > feature of unibyte buffers; otherwise, who'd want them?

I think the main feature of unibyte buffers is to handle
raw-bytes as is.  For those who want to see a raw-byte as a
character of their locale (language environment), we have
unibyte-display-via-language-environment.

> > Different question then: Why are all other characters in the range from
> > #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> > being the only exceptions?

> I don't know, but it sounds like a bug.  Or maybe what I wrote above
> is just my pipe dream, not the reality.

> Handa-san, can you please comment on this?

The code for handling nobreak-char-display in
get_next_display_element should pay attention to
unibyte-display-via-language-environment.  I've just
installed the attached change.

In article <jwv4oqri1f7.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> The patch below should help.
[...]
> --- xdisp.c.~1.1301.~	2009-09-20 13:01:24.000000000 -0400
> +++ xdisp.c	2009-09-25 10:02:08.000000000 -0400
> @@ -5794,7 +5794,8 @@
>  	      /* Handle non-break space in the mode where it only gets
>  		 highlighting.  */
 
> -	      if (EQ (Vnobreak_char_display, Qt)
> +	      if ((it->multibyte_p || unibyte_display_via_language_environment)
> +		  && EQ (Vnobreak_char_display, Qt)
>  		  && it->c == 0xA0)

If unibyte_display_via_language_environment is nonzero, we
must compare DECODE_CHAR (unibyte, it->c) against 0xA0.
Otherwise, in a KOI8 locale for instance, we wrongly display
a box-drawing character of the KOI8 charset (byte #xA0) as a
no-break space.
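A concrete case of Handa's point: in KOI8-R, byte #x9A decodes to U+00A0 NO-BREAK SPACE, while byte #xA0 decodes to U+2550 BOX DRAWINGS DOUBLE HORIZONTAL. The C sketch below uses a hypothetical two-entry function in place of Emacs's DECODE_CHAR macro. It classifies the decoded character using the same 1/2/0 encoding as nbsp_or_shy in the patch that follows.

```c
#include <assert.h>

/* Hypothetical two-entry stand-in for DECODE_CHAR (unibyte, byte) in a
   KOI8-R language environment.  ASCII is returned as is; high bytes
   other than the two modelled ones return -1.  */
static int
decode_koi8r_sketch (int byte)
{
  assert (byte >= 0 && byte <= 0xFF);
  if (byte < 0x80)
    return byte;
  switch (byte)
    {
    case 0x9A: return 0x00A0;  /* NO-BREAK SPACE */
    case 0xA0: return 0x2550;  /* BOX DRAWINGS DOUBLE HORIZONTAL */
    default:   return -1;      /* not modelled */
    }
}

/* 1 = NO-BREAK SPACE, 2 = SOFT HYPHEN, 0 = anything else, judged on
   the decoded character rather than on the raw byte.  */
static int
nbsp_or_shy_decoded (int byte)
{
  int decoded = decode_koi8r_sketch (byte);
  return decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
}
```

A raw comparison byte == 0xA0 would instead pick out the box-drawing byte and miss the real NO-BREAK SPACE at #x9A.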

---
Kenichi Handa
handa@m17n.org

Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1304
diff -u -r1.1304 xdisp.c
--- xdisp.c	27 Sep 2009 19:11:13 -0000	1.1304
+++ xdisp.c	28 Sep 2009 01:03:40 -0000
@@ -5684,6 +5684,10 @@
 	{
 	  Lisp_Object dv;
 	  struct charset *unibyte = CHARSET_FROM_ID (charset_unibyte);
+	  int nbsp_or_shy = 0; /* 1:NO-BREAK SPACE, 2:SOFT HYPHEN, 0:ELSE */
+#define IS_NBSP (nbsp_or_shy == 1)
+#define IS_SHY (nbsp_or_shy == 2)
+	  int decoded = it->c;
 
 	  if (it->dp
 	      && (dv = DISP_CHAR_VECTOR (it->dp, it->c),
@@ -5712,6 +5716,18 @@
 	      goto get_next;
 	    }
 
+	  if (unibyte_display_via_language_environment
+	      && it->c >= 0x80)
+	    decoded = DECODE_CHAR (unibyte, it->c);
+
+	  if (it->c >= 0x80 && ! NILP (Vnobreak_char_display))
+	    {
+	      if (it->multibyte_p)
+		nbsp_or_shy = it->c == 0xA0 ? 1 : it->c == 0xAD ? 2 : 0;
+	      else if (unibyte_display_via_language_environment)
+		nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
+	    }
+
 	  /* Translate control characters into `\003' or `^C' form.
 	     Control characters coming from a display table entry are
 	     currently not translated because we use IT->dpvec to hold
@@ -5724,21 +5740,19 @@
 	     If it->multibyte_p is zero, eight-bit characters that
 	     don't have corresponding multibyte char code are also
 	     translated to octal form.  */
-	  else if ((it->c < ' '
-		    ? (it->area != TEXT_AREA
-		       /* In mode line, treat \n, \t like other crl chars.  */
-		       || (it->c != '\t'
-			   && it->glyph_row
-			   && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
-		       || (it->c != '\n' && it->c != '\t'))
-		    : (it->multibyte_p
-		       ? (!CHAR_PRINTABLE_P (it->c)
-			  || (!NILP (Vnobreak_char_display)
-			      && (it->c == 0xA0 /* NO-BREAK SPACE */
-				  || it->c == 0xAD /* SOFT HYPHEN */)))
-		       : (it->c >= 127
-			  && (! unibyte_display_via_language_environment
-			      || (DECODE_CHAR (unibyte, it->c) <= 0xA0))))))
+	  if ((it->c < ' '
+	       ? (it->area != TEXT_AREA
+		  /* In mode line, treat \n, \t like other crl chars.  */
+		  || (it->c != '\t'
+		      && it->glyph_row
+		      && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
+		  || (it->c != '\n' && it->c != '\t'))
+	       : (nbsp_or_shy
+		  || (it->multibyte_p
+		      ? ! CHAR_PRINTABLE_P (it->c)
+		      : (! unibyte_display_via_language_environment
+			 ? it->c >= 0x80
+			 : (decoded >= 0x80 && decoded < 0xA0))))))
 	    {
 	      /* IT->c is a control character which must be displayed
 		 either as '\003' or as `^C' where the '\\' and '^'
@@ -5794,7 +5808,7 @@
 		 highlighting.  */
 
 	      if (EQ (Vnobreak_char_display, Qt)
-		  && it->c == 0xA0)
+		  && IS_NBSP)
 		{
 		  /* Merge the no-break-space face into the current face.  */
 		  face_id = merge_faces (it->f, Qnobreak_space, 0,
@@ -5844,7 +5858,7 @@
 		 highlighting.  */
 
 	      if (EQ (Vnobreak_char_display, Qt)
-		  && it->c == 0xAD)
+		  && IS_SHY)
 		{
 		  it->c = '-';
 		  XSETINT (it->ctl_chars[0], '-');
@@ -5855,10 +5869,10 @@
 	      /* Handle non-break space and soft hyphen
 		 with the escape glyph.  */
 
-	      if (it->c == 0xA0 || it->c == 0xAD)
+	      if (nbsp_or_shy)
 		{
 		  XSETINT (it->ctl_chars[0], escape_glyph);
-		  it->c = (it->c == 0xA0 ? ' ' : '-');
+		  it->c = (IS_NBSP ? ' ' : '-');
 		  XSETINT (it->ctl_chars[1], it->c);
 		  ctl_len = 2;
 		  goto display_control;
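For a byte in a unibyte buffer, the rewritten condition reduces to roughly the following. The function and parameter names are hypothetical. decoded and nbsp_or_shy correspond to the variables the patch introduces, and the multibyte branch (CHAR_PRINTABLE_P) is left out. Note the changed upper bound: the old code escaped decoded characters up to and including #xA0. The new code escapes only #x80..#x9F and leaves NO-BREAK SPACE and SOFT HYPHEN to the nbsp_or_shy test.

```c
#include <assert.h>
#include <stdbool.h>

/* Does byte C (>= ' ') of a unibyte buffer take the "control
   character" path (octal escape, or nobreak-char-display handling)?
   DECODED is the language-environment decoding of C and NBSP_OR_SHY
   the 0/1/2 classification; UDVLE stands for
   unibyte-display-via-language-environment.  */
static bool
unibyte_takes_control_path (int c, int decoded, int nbsp_or_shy, bool udvle)
{
  assert (c >= ' ' && c <= 0xFF);
  if (nbsp_or_shy)
    return true;
  if (!udvle)
    return c >= 0x80;                       /* every high byte */
  return decoded >= 0x80 && decoded < 0xA0; /* decoded C1 controls only */
}
```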





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-28  1:10       ` Kenichi Handa
@ 2009-09-28  6:43         ` Eli Zaretskii
  2009-09-28 11:24           ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-09-28  6:43 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: ulm, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: ulm@gentoo.org, emacs-devel@gnu.org
> Date: Mon, 28 Sep 2009 10:10:32 +0900
> 
> In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > >> $ emacs -Q
> > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > >> 
> > > >> The characters are displayed as "_-" (approximately).
> > > >> 
> > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > >> raw bytes with no specific meaning?
> > > 
> > > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > > interpreted as a character, and shown as such.  This is the main
> > > > feature of unibyte buffers; otherwise, who'd want them?
> 
> I think the main feature of unibyte buffers is to handle
> raw-bytes as is.

How do we even know that they are raw bytes, and how do we
distinguish, in a unibyte buffer, ü from \374, say?  Just because they
were inserted by C-q NNN or by some other mechanism?

> For those who want to see a raw-byte as a character of their locale
> (language environment), we have
> unibyte-display-via-language-environment.

I thought bytes in unibyte buffers are always interpreted as
characters of the locale, as Emacs 19 did.  Are you saying that they
are by default always interpreted as raw bytes, unless
unibyte-display-via-language-environment is set?






* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-28  6:43         ` Eli Zaretskii
@ 2009-09-28 11:24           ` Kenichi Handa
  2009-09-28 14:10             ` Eli Zaretskii
  2009-09-28 22:38             ` Stefan Monnier
  0 siblings, 2 replies; 19+ messages in thread
From: Kenichi Handa @ 2009-09-28 11:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, emacs-devel

In article <831vlrsh6q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > > >> $ emacs -Q
> > > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > > >> 
> > > > >> The characters are displayed as "_-" (approximately).
> > > > >> 
> > > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > > >> raw bytes with no specific meaning?
> > > > 
> > > > > There are no ``raw bytes'' in a unibyte buffer.  Every byte there is
> > > > > interpreted as a character, and shown as such.  This is the main
> > > > > feature of unibyte buffers; otherwise, who'd want them?
> > 
> > I think the main feature of unibyte buffers is to handle
> > raw-bytes as is.

> How do we even know that they are raw bytes, and how do we
> distinguish, in a unibyte buffer, ü from \374, say?  Just because they
> were inserted by C-q NNN or by some other mechanism?

They are not distinguished.

> > For those who want to see a raw-byte as a character of their locale
> > (language environment), we have
> > unibyte-display-via-language-environment.

> I thought bytes in unibyte buffers are always interpreted as
> characters of the locale, as Emacs 19 did.

Not really, because we no longer perform automatic
unibyte<->multibyte decoding/encoding.  So, if we
cut #xC0 in a unibyte buffer and yank it into a multibyte
buffer, an eight-bit character is inserted instead of U+00C0.
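This sketch assumes Emacs 23's internal representation, where the raw bytes #x80..#xFF are stored as the characters #x3FFF80..#x3FFFFF (cf. BYTE8_TO_CHAR in src/character.h). Under that assumption, the difference Handa describes looks like this in C. Both function names are hypothetical.

```c
#include <assert.h>

/* Assumes Emacs 23's layout: raw bytes #x80..#xFF are the characters
   #x3FFF80..#x3FFFFF (cf. BYTE8_TO_CHAR in src/character.h).  */
static int
eight_bit_char (int byte)
{
  assert (byte >= 0x80 && byte <= 0xFF);
  return byte + 0x3FFF00;
}

/* What an automatic Latin-1 decoding of the same byte would yield.  */
static int
latin1_char (int byte)
{
  assert (byte >= 0 && byte <= 0xFF);
  return byte;  /* in ISO-8859-1, byte N decodes to code point N */
}
```

So yanking #xC0 gives the character #x3FFFC0 rather than U+00C0.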

> Are you saying that they
> are by default always interpreted as raw bytes, unless
> unibyte-display-via-language-environment is set?

unibyte-display-via-language-environment just controls how
to display them, and it doesn't affect how they are
interpreted.

Actually, the interpretation of characters in a unibyte
buffer is still inconsistent.  For instance,
skip-syntax-forward treats #x80..#xFF as characters
U+0080..U+00FF.  Thus #xC0 is a word-constituent and #xD7 is
a symbol.  We must fix it somehow.  But, how?  We currently
don't have a suitable syntax code for eight-bit chars.
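The two syntax policies discussed here can be sketched in C. The enum borrows Emacs's syntax-class designator characters for readability, but both functions are hypothetical. The first models only the bytes mentioned in the thread and lumps everything else into punctuation. The "all punctuation" variant is the alternative Stefan raises in his reply.

```c
#include <assert.h>

/* Emacs syntax-class designators, used here as plain enum values.  */
enum syntax_class { SYN_WORD = 'w', SYN_SYMBOL = '_', SYN_PUNCT = '.' };

/* Current behaviour as described: a unibyte byte #x80..#xFF is looked
   up as the Latin-1 character U+0080..U+00FF.  Only the letter ranges
   and #xD7 are modelled; the rest is simplified to punctuation.  */
static enum syntax_class
syntax_as_latin1 (int byte)
{
  assert (byte >= 0x80 && byte <= 0xFF);
  if (byte == 0xD7)                     /* MULTIPLICATION SIGN */
    return SYN_SYMBOL;
  if (byte >= 0xC0 && byte != 0xF7)     /* Latin-1 letters */
    return SYN_WORD;
  return SYN_PUNCT;                     /* simplification */
}

/* Alternative policy: every byte >= 128 is punctuation, whatever it
   would be in Latin-1.  */
static enum syntax_class
syntax_all_punct (int byte)
{
  assert (byte >= 0x80 && byte <= 0xFF);
  return SYN_PUNCT;
}
```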

---
Kenichi Handa
handa@m17n.org





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-28 11:24           ` Kenichi Handa
@ 2009-09-28 14:10             ` Eli Zaretskii
  2009-09-28 22:38             ` Stefan Monnier
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2009-09-28 14:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: ulm, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: ulm@gentoo.org, emacs-devel@gnu.org
> Date: Mon, 28 Sep 2009 20:24:24 +0900
> 
> > How do we even know that they are raw bytes, and how do we
> > distinguish, in a unibyte buffer, ü from \374, say?  Just because they
> > were inserted by C-q NNN or by some other mechanism?
> 
> They are not distinguished.

Thanks for getting me straight with the facts.

> We currently don't have a suitable syntax code for eight-bit chars.

Maybe we should add such a code.






* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-28 11:24           ` Kenichi Handa
  2009-09-28 14:10             ` Eli Zaretskii
@ 2009-09-28 22:38             ` Stefan Monnier
  2009-09-29  1:05               ` Kenichi Handa
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2009-09-28 22:38 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, ulm, emacs-devel

> Actually, the interpretation of characters in a unibyte
> buffer is still inconsistent.  For instance,
> skip-syntax-forward treats #x80..#xFF as characters
> U+0080..U+00FF.  Thus #xC0 is a word-constituent and #xD7 is
> a symbol.  We must fix it somehow.  But, how?  We currently
> don't have a suitable syntax code for eight-bit chars.

While I agree that treating bytes as if they were latin-1 chars isn't
ideal, I don't think the alternative (treat every byte >=128 as
punctuation, I guess) would really be much better.  So there's not much
point "fixing" this "problem".


        Stefan





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-28 22:38             ` Stefan Monnier
@ 2009-09-29  1:05               ` Kenichi Handa
  2009-09-29  1:35                 ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2009-09-29  1:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eliz, ulm, emacs-devel

In article <jwvzl8e8zop.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > Actually, the interpretation of characters in a unibyte
> > buffer is still inconsistent.  For instance,
> > skip-syntax-forward treats #x80..#xFF as characters
> > U+0080..U+00FF.  Thus #xC0 is a word-constituent and #xD7 is
> > a symbol.  We must fix it somehow.  But, how?  We currently
> > don't have a suitable syntax code for eight-bit chars.

> While I agree that treating bytes as if they were latin-1 chars isn't
> ideal, I don't think the alternative (treat every byte >=128 as
> punctuation, I guess) would really be much better.  So there's not much
> point "fixing" this "problem".

Perhaps we don't have to make much effort on this matter,
but if one is, for instance, in Greek lang. env. and working
in a unibyte buffer while setting
unibyte-display-via-language-environment to t, he sees that
(skip-syntax-forward "w") stops at #xD7 which is displayed
as "GREEK CAPITAL LETTER CHI (U+03A7)" and will think it's a
bug.
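Handa's Greek example can be reproduced with a small C model. ISO-8859-7 maps #xC1..#xD9 (except the unassigned #xD2) to GREEK CAPITAL LETTER ALPHA..OMEGA, so #xD7 displays as CHI. A Latin-1 lookup, however, sees U+00D7 MULTIPLICATION SIGN, which is not a word constituent. All names here are hypothetical, and the word test is simplified to ASCII and Latin-1 letters.

```c
#include <assert.h>
#include <stddef.h>

/* Display side: ISO-8859-7 capital letters only; other bytes give -1.  */
static int
iso8859_7_capital (int byte)
{
  assert (byte >= 0 && byte <= 0xFF);
  if (byte >= 0xC1 && byte <= 0xD9 && byte != 0xD2)
    return 0x0391 + (byte - 0xC1);  /* ALPHA .. OMEGA */
  return -1;
}

/* Syntax side: the byte is judged as the Latin-1 character with the
   same code (simplified to ASCII and Latin-1 letters).  */
static int
latin1_is_word (int byte)
{
  return (byte >= 'A' && byte <= 'Z') || (byte >= 'a' && byte <= 'z')
         || (byte >= 0xC0 && byte <= 0xFF && byte != 0xD7 && byte != 0xF7);
}

/* Rough analogue of (skip-syntax-forward "w"): number of leading
   bytes that count as word constituents.  */
static size_t
skip_word_bytes (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len && latin1_is_word (s[i]))
    i++;
  return i;
}
```

For the bytes #xC1 #xD7 #xC1, which display as "ΑΧΑ", skip_word_bytes stops after the first byte.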

---
Kenichi Handa
handa@m17n.org





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-29  1:05               ` Kenichi Handa
@ 2009-09-29  1:35                 ` Stefan Monnier
  2009-09-29  2:37                   ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2009-09-29  1:35 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, ulm, emacs-devel

> Perhaps we don't have to make much effort on this matter,
> but if one is, for instance, in Greek lang. env. and working
> in a unibyte buffer while setting
> unibyte-display-via-language-environment to t, he sees that
> (skip-syntax-forward "w") stops at #xD7 which is displayed
> as "GREEK CAPITAL LETTER CHI (U+03A7)" and will think it's a
> bug.

I think in such a case, the bug is to use
unibyte-display-via-language-environment and expect it to affect more
than the display.


        Stefan





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-29  1:35                 ` Stefan Monnier
@ 2009-09-29  2:37                   ` Kenichi Handa
  2009-09-29  3:15                     ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2009-09-29  2:37 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eliz, ulm, emacs-devel

In article <jwvr5tq7cte.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I think in such a case, the bug is to use
> unibyte-display-via-language-environment and expect it to affect more
> than the display.

Then, how about adding this statement to the docstring (or
to Info)?

"It doesn't affect how each character is interpreted."

Though the term "interpreted" is so vague that people won't
understand what it really means.

---
Kenichi Handa
handa@m17n.org






* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-29  2:37                   ` Kenichi Handa
@ 2009-09-29  3:15                     ` Stefan Monnier
  2009-09-29  7:52                       ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2009-09-29  3:15 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, ulm, emacs-devel

>> I think in such a case, the bug is to use
>> unibyte-display-via-language-environment and expect it to affect more
>> than the display.

> Then, how about adding this statement in the docstring (or to info)?

> "It doesn't affect how each character is interpreted."

Maybe we could try to make it more obvious that it only affects display
and nothing else.  Then again, we should also make it more clear that
you should stay away from this feature.


        Stefan





* Re: Display of characters #xa0 and #xad in unibyte buffers
  2009-09-29  3:15                     ` Stefan Monnier
@ 2009-09-29  7:52                       ` Kenichi Handa
       [not found]                         ` <831vljpm0v.fsf@gnu.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2009-09-29  7:52 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eliz, ulm, emacs-devel

In article <jwvske65tow.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> I think in such a case, the bug is to use
>>> unibyte-display-via-language-environment and expect it to affect more
>>> than the display.

> > Then, how about adding this statement in the docstring (or to info)?

> > "It doesn't affect how each character is interpreted."

> Maybe we could try to make it more obvious that it only affects display
> and nothing else.  Then again, we should also make it more clear that
> you should stay away from this feature.

I'm not good at writing that kind of document.  Could
someone please install a proper docstring/info-text?

---
Kenichi Handa
handa@m17n.org






* Re: Display of characters #xa0 and #xad in unibyte buffers
       [not found]                         ` <831vljpm0v.fsf@gnu.org>
@ 2009-10-05  0:49                           ` Kenichi Handa
  0 siblings, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2009-10-05  0:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, monnier, emacs-devel

In article <831vljpm0v.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > Maybe we could try to make it more obvious that it only affects display
> > > and nothing else.  Then again, we should also make it more clear that
> > > you should stay away from this feature.
> > 
> > I'm not good at writing that kind of document.  Could
> > someone please install a proper docstring/info-text?

> Done.

Thank you!!

---
Kenichi Handa
handa@m17n.org








Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
