* Display of characters #xa0 and #xad in unibyte buffers
From: Ulrich Mueller @ 2009-09-24 15:06 UTC
To: emacs-devel

Is the following a feature or a bug?

$ emacs -Q
M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET

The characters are displayed as "_-" (approximately).

Shouldn't they be displayed as "\240\255", considering that these are
raw bytes with no specific meaning?

Ulrich

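A note on the recipe: after C-q, Emacs reads the digits in octal by default (the radix comes from read-quoted-char-radix), so C-q 240 and C-q 255 insert exactly the two bytes named in the subject: 2*64 + 4*8 + 0 = 160 = #xa0 and 2*64 + 5*8 + 5 = 173 = #xad. A minimal C sketch of the same conversion; the function name is hypothetical and this is not Emacs code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert the digits typed after C-q (octal, the
   default quoted-char radix) into the resulting character code.  */
static long
quoted_char_code (const char *digits)
{
  char *end;
  long code = strtol (digits, &end, 8); /* parse the digits as base 8 */
  assert (*end == '\0');                /* every digit must be octal */
  return code;
}
```

For example, quoted_char_code ("240") returns 160 (#xa0) and quoted_char_code ("255") returns 173 (#xad).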
* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Eli Zaretskii @ 2009-09-25 8:31 UTC
To: Ulrich Mueller; +Cc: emacs-devel

> Date: Thu, 24 Sep 2009 17:06:24 +0200
> From: Ulrich Mueller <ulm@gentoo.org>
>
> Is the following a feature or a bug?
>
> $ emacs -Q
> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>
> The characters are displayed as "_-" (approximately).
>
> Shouldn't they be displayed as "\240\255", considering that these are
> raw bytes with no specific meaning?

There are no ``raw bytes'' in a unibyte buffer. Every byte there is
interpreted as a character, and shown as such. This is the main
feature of unibyte buffers; otherwise, who'd want them?

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Ulrich Mueller @ 2009-09-25 9:00 UTC
To: Eli Zaretskii; +Cc: emacs-devel

>>>>> On Fri, 25 Sep 2009, Eli Zaretskii wrote:

>> $ emacs -Q
>> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>>
>> The characters are displayed as "_-" (approximately).
>>
>> Shouldn't they be displayed as "\240\255", considering that these are
>> raw bytes with no specific meaning?

> There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> interpreted as a character, and shown as such. This is the main
> feature of unibyte buffers; otherwise, who'd want them?

Different question then: Why are all other characters in the range from
#x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
being the only exceptions?

Ulrich

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Eli Zaretskii @ 2009-09-25 9:14 UTC
To: Ulrich Mueller; +Cc: emacs-devel

> Date: Fri, 25 Sep 2009 11:00:03 +0200
> Cc: emacs-devel@gnu.org
> From: Ulrich Mueller <ulm@gentoo.org>
>
> >>>>> On Fri, 25 Sep 2009, Eli Zaretskii wrote:
>
> >> $ emacs -Q
> >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> >>
> >> The characters are displayed as "_-" (approximately).
> >>
> >> Shouldn't they be displayed as "\240\255", considering that these are
> >> raw bytes with no specific meaning?
>
> > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > interpreted as a character, and shown as such. This is the main
> > feature of unibyte buffers; otherwise, who'd want them?
>
> Different question then: Why are all other characters in the range from
> #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> being the only exceptions?

I don't know, but it sounds like a bug. Or maybe what I wrote above
is just my pipe dream, not the reality.

Handa-san, can you please comment on this?

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa @ 2009-09-28 1:10 UTC
To: Eli Zaretskii; +Cc: ulm, emacs-devel

In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > >> $ emacs -Q
> > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > >>
> > >> The characters are displayed as "_-" (approximately).
> > >>
> > >> Shouldn't they be displayed as "\240\255", considering that these are
> > >> raw bytes with no specific meaning?
> >
> > > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > > interpreted as a character, and shown as such. This is the main
> > > feature of unibyte buffers; otherwise, who'd want them?

I think the main feature of unibyte buffers is to handle
raw-bytes as is. For those who want to see a raw-byte as a
character of their locale (language environment), we have
unibyte-display-via-language-environment.

> > Different question then: Why are all other characters in the range from
> > #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> > being the only exceptions?

> I don't know, but it sounds like a bug. Or maybe what I wrote above
> is just my pipe dream, not the reality.

> Handa-san, can you please comment on this?

The code for handling nobreak-char-display in
get_next_display_element should pay attention to
unibyte-display-via-language-environment. I've just installed the
attached change.

In article <jwv4oqri1f7.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> The patch below should help.
[...]
> --- xdisp.c.~1.1301.~	2009-09-20 13:01:24.000000000 -0400
> +++ xdisp.c	2009-09-25 10:02:08.000000000 -0400
> @@ -5794,7 +5794,8 @@
>                /* Handle non-break space in the mode where it only gets
>                   highlighting.  */
> -              if (EQ (Vnobreak_char_display, Qt)
> +              if ((it->multibyte_p || unibyte_display_via_language_environment)
> +                  && EQ (Vnobreak_char_display, Qt)
>                    && it->c == 0xA0)

If unibyte_display_via_language_environment is nonzero, we must
compare DECODE_CHAR (unibyte, it->c) against 0xA0. Otherwise, for
instance in KOI8 locale, we wrongly display some box-drawing character
in KOI8 charset.

---
Kenichi Handa
handa@m17n.org

Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1304
diff -u -r1.1304 xdisp.c
--- xdisp.c	27 Sep 2009 19:11:13 -0000	1.1304
+++ xdisp.c	28 Sep 2009 01:03:40 -0000
@@ -5684,6 +5684,10 @@
         {
           Lisp_Object dv;
           struct charset *unibyte = CHARSET_FROM_ID (charset_unibyte);
+          int nbsp_or_shy = 0;  /* 1:NO-BREAK SPACE, 2:SOFT HYPHEN, 0:ELSE */
+#define IS_NBSP (nbsp_or_shy == 1)
+#define IS_SHY (nbsp_or_shy == 2)
+          int decoded = it->c;
 
           if (it->dp
               && (dv = DISP_CHAR_VECTOR (it->dp, it->c),
@@ -5712,6 +5716,18 @@
               goto get_next;
             }
 
+          if (unibyte_display_via_language_environment
+              && it->c >= 0x80)
+            decoded = DECODE_CHAR (unibyte, it->c);
+
+          if (it->c >= 0x80 && ! NILP (Vnobreak_char_display))
+            {
+              if (it->multibyte_p)
+                nbsp_or_shy = it->c == 0xA0 ? 1 : it->c == 0xAD ? 2 : 0;
+              else if (unibyte_display_via_language_environment)
+                nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
+            }
+
           /* Translate control characters into `\003' or `^C' form.
              Control characters coming from a display table entry are
              currently not translated because we use IT->dpvec to hold
@@ -5724,21 +5740,19 @@
              If it->multibyte_p is zero, eight-bit characters that
              don't have corresponding multibyte char code are also
              translated to octal form.  */
-          else if ((it->c < ' '
-                    ? (it->area != TEXT_AREA
-                       /* In mode line, treat \n, \t like other crl chars.  */
-                       || (it->c != '\t'
-                           && it->glyph_row
-                           && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
-                       || (it->c != '\n' && it->c != '\t'))
-                    : (it->multibyte_p
-                       ? (!CHAR_PRINTABLE_P (it->c)
-                          || (!NILP (Vnobreak_char_display)
-                              && (it->c == 0xA0 /* NO-BREAK SPACE */
-                                  || it->c == 0xAD /* SOFT HYPHEN */)))
-                       : (it->c >= 127
-                          && (! unibyte_display_via_language_environment
-                              || (DECODE_CHAR (unibyte, it->c) <= 0xA0))))))
+          if ((it->c < ' '
+               ? (it->area != TEXT_AREA
+                  /* In mode line, treat \n, \t like other crl chars.  */
+                  || (it->c != '\t'
+                      && it->glyph_row
+                      && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
+                  || (it->c != '\n' && it->c != '\t'))
+               : (nbsp_or_shy
+                  || (it->multibyte_p
+                      ? ! CHAR_PRINTABLE_P (it->c)
+                      : (! unibyte_display_via_language_environment
+                         ? it->c >= 0x80
+                         : (decoded >= 0x80 && decoded < 0xA0))))))
             {
               /* IT->c is a control character which must be displayed
                  either as '\003' or as `^C' where the '\\' and '^'
@@ -5794,7 +5808,7 @@
                  highlighting.  */
 
               if (EQ (Vnobreak_char_display, Qt)
-                  && it->c == 0xA0)
+                  && IS_NBSP)
                 {
                   /* Merge the no-break-space face into the current face.  */
                   face_id = merge_faces (it->f, Qnobreak_space, 0,
@@ -5844,7 +5858,7 @@
                  highlighting.  */
 
               if (EQ (Vnobreak_char_display, Qt)
-                  && it->c == 0xAD)
+                  && IS_SHY)
                 {
                   it->c = '-';
                   XSETINT (it->ctl_chars[0], '-');
@@ -5855,10 +5869,10 @@
 
               /* Handle non-break space and soft hyphen
                  with the escape glyph.  */
-              if (it->c == 0xA0 || it->c == 0xAD)
+              if (nbsp_or_shy)
                 {
                   XSETINT (it->ctl_chars[0], escape_glyph);
-                  it->c = (it->c == 0xA0 ? ' ' : '-');
+                  it->c = (IS_NBSP ? ' ' : '-');
                   XSETINT (it->ctl_chars[1], it->c);
                   ctl_len = 2;
                   goto display_control;

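To make the new condition easier to follow, here is a hypothetical, stripped-down C model of what the patch above decides for an eight-bit character (#x80..#xff) in a unibyte buffer. It is our reading of the diff, not Emacs code: control characters, display tables and ctl-arrow handling are left out; VIA_LANG_ENV stands for unibyte-display-via-language-environment, NOBREAK for the three kinds of nobreak-char-display values (nil, t, anything else), and DECODE for DECODE_CHAR (unibyte, c). All other names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the value of nobreak-char-display.  */
enum nobreak_setting { NOBREAK_NIL, NOBREAK_T, NOBREAK_OTHER };

enum shown_as
{
  SHOWN_AS_GLYPH,        /* drawn as the (decoded) character itself */
  SHOWN_AS_OCTAL,        /* drawn as \NNN */
  SHOWN_AS_NBSP_FACE,    /* a space in the nobreak-space face */
  SHOWN_AS_SHY_FACE,     /* a '-' drawn in place of the soft hyphen */
  SHOWN_AS_ESCAPE_SPACE, /* escape glyph followed by ' ' */
  SHOWN_AS_ESCAPE_DASH   /* escape glyph followed by '-' */
};

/* Model of the post-patch decision for byte C of a unibyte buffer.  */
static enum shown_as
classify_unibyte_byte (int c, bool via_lang_env,
                       enum nobreak_setting nobreak, int (*decode) (int))
{
  assert (c >= 0x80 && c <= 0xFF);

  int decoded = via_lang_env ? decode (c) : c;
  int nbsp_or_shy = 0;          /* 1: NO-BREAK SPACE, 2: SOFT HYPHEN */

  /* In a unibyte buffer only the decoded character can be NBSP or SHY,
     and only when the language environment is used for display.  */
  if (nobreak != NOBREAK_NIL && via_lang_env)
    nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;

  bool escaped = nbsp_or_shy
                 || !via_lang_env
                 || (decoded >= 0x80 && decoded < 0xA0);
  if (!escaped)
    return SHOWN_AS_GLYPH;
  if (nobreak == NOBREAK_T && nbsp_or_shy == 1)
    return SHOWN_AS_NBSP_FACE;
  if (nobreak == NOBREAK_T && nbsp_or_shy == 2)
    return SHOWN_AS_SHY_FACE;
  if (nbsp_or_shy)
    return nbsp_or_shy == 1 ? SHOWN_AS_ESCAPE_SPACE : SHOWN_AS_ESCAPE_DASH;
  return SHOWN_AS_OCTAL;
}

/* Identity decoding, as in a Latin-1 language environment.  */
static int
decode_latin1 (int c)
{
  return c;
}
```

With VIA_LANG_ENV false, as in Ulrich's recipe, both #xa0 and #xad now come out as SHOWN_AS_OCTAL (\240\255). With the Latin-1 decoder and nobreak-char-display set to t, the same bytes get the highlight-only display ("_-") that the original report saw.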
* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Eli Zaretskii @ 2009-09-28 6:43 UTC
To: Kenichi Handa; +Cc: ulm, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: ulm@gentoo.org, emacs-devel@gnu.org
> Date: Mon, 28 Sep 2009 10:10:32 +0900
>
> In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
> > > >> $ emacs -Q
> > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > >>
> > > >> The characters are displayed as "_-" (approximately).
> > > >>
> > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > >> raw bytes with no specific meaning?
> > >
> > > > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > > > interpreted as a character, and shown as such. This is the main
> > > > feature of unibyte buffers; otherwise, who'd want them?
>
> I think the main feature of unibyte buffers is to handle
> raw-bytes as is.

How do we even know that they are raw bytes, and how do we
distinguish, in a unibyte buffer, ü from \374, say? Just because they
were inserted by C-q NNN or by some other mechanism?

> For those who want to see a raw-byte as a character of their locale
> (language environment), we have
> unibyte-display-via-language-environment.

I thought bytes in unibyte buffers are always interpreted as
characters of the locale, as Emacs 19 did. Are you saying that they
are by default always interpreted as raw bytes, unless
unibyte-display-via-language-environment is set?

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa @ 2009-09-28 11:24 UTC
To: Eli Zaretskii; +Cc: ulm, emacs-devel

In article <831vlrsh6q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In article <83ws3ntmgv.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> >
> > > > >> $ emacs -Q
> > > > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > > > >>
> > > > >> The characters are displayed as "_-" (approximately).
> > > > >>
> > > > >> Shouldn't they be displayed as "\240\255", considering that these are
> > > > >> raw bytes with no specific meaning?
> > > >
> > > > > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > > > > interpreted as a character, and shown as such. This is the main
> > > > > feature of unibyte buffers; otherwise, who'd want them?
> >
> > I think the main feature of unibyte buffers is to handle
> > raw-bytes as is.

> How do we even know that they are raw bytes, and how do we
> distinguish, in a unibyte buffer, ü from \374, say? Just because they
> were inserted by C-q NNN or by some other mechanism?

They are not distinguished.

> > For those who want to see a raw-byte as a character of their locale
> > (language environment), we have
> > unibyte-display-via-language-environment.

> I thought bytes in unibyte buffers are always interpreted as
> characters of the locale, as Emacs 19 did.

Not really, because we don't perform automatic unibyte<->multibyte
decoding/encoding anymore. So, if we cut #xC0 in a unibyte buffer and
yank it in a multibyte buffer, an eight-bit character is inserted
instead of U+00C0.

> Are you saying that they
> are by default always interpreted as raw bytes, unless
> unibyte-display-via-language-environment is set?

unibyte-display-via-language-environment just controls how to display
them, and it doesn't affect how they are interpreted.

Actually, the interpretation of characters in a unibyte
buffer is still inconsistent. For instance,
skip-syntax-forward treats #x80..#xFF as characters
U+0080..U+00FF. Thus #xC0 is a word-constituent and #xD7 is
a symbol. We must fix it somehow. But, how? We currently
don't have a suitable syntax code for eight-bit chars.

---
Kenichi Handa
handa@m17n.org

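Handa's point about cutting #xC0 from a unibyte buffer and yanking it into a multibyte one can be made concrete. The sketch below assumes, as we understand Emacs 23's internal representation, that a raw byte B in #x80..#xff becomes the eight-bit character B + #x3FFF00 (our reading of the BYTE8_TO_CHAR macro in src/character.h), a code point outside Unicode, whereas reading the same byte as Latin-1 would give U+00C0. Both function names are hypothetical.

```c
#include <assert.h>

/* Assumed encoding of a raw byte as an "eight-bit" character:
   #x80..#xff map to #x3FFF80..#x3FFFFF, beyond the Unicode range.  */
static int
raw_byte_to_char (int byte)
{
  assert (byte >= 0x80 && byte <= 0xFF);
  return byte + 0x3FFF00;
}

/* The alternative reading: treat the byte as a Latin-1 code point.  */
static int
latin1_byte_to_char (int byte)
{
  assert (byte >= 0x80 && byte <= 0xFF);
  return byte;
}
```

Here raw_byte_to_char (0xC0) yields 0x3FFFC0 while latin1_byte_to_char (0xC0) yields 0xC0 (U+00C0); the two ranges never overlap, which is why the yanked text is an eight-bit character rather than a letter.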
* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Eli Zaretskii @ 2009-09-28 14:10 UTC
To: Kenichi Handa; +Cc: ulm, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: ulm@gentoo.org, emacs-devel@gnu.org
> Date: Mon, 28 Sep 2009 20:24:24 +0900
>
> > How do we even know that they are raw bytes, and how do we
> > distinguish, in a unibyte buffer, ü from \374, say? Just because they
> > were inserted by C-q NNN or by some other mechanism?
>
> They are not distinguished.

Thanks for getting me straight with the facts.

> We currently don't have a suitable syntax code for eight-bit chars.

Maybe we should add such a code.

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Stefan Monnier @ 2009-09-28 22:38 UTC
To: Kenichi Handa; +Cc: Eli Zaretskii, ulm, emacs-devel

> Actually, the interpretation of characters in a unibyte
> buffer is still inconsistent. For instance,
> skip-syntax-forward treats #x80..#xFF as characters
> U+0080..U+00FF. Thus #xC0 is a word-constituent and #xD7 is
> a symbol. We must fix it somehow. But, how? We currently
> don't have a suitable syntax code for eight-bit chars.

While I agree that treating bytes as if they were latin-1 chars isn't
ideal, I don't think the alternative (treat every byte >=128 as
punctuation, I guess) would really be much better. So there's not much
point "fixing" this "problem".

Stefan

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa @ 2009-09-29 1:05 UTC
To: Stefan Monnier; +Cc: eliz, ulm, emacs-devel

In article <jwvzl8e8zop.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > Actually, the interpretation of characters in a unibyte
> > buffer is still inconsistent. For instance,
> > skip-syntax-forward treats #x80..#xFF as characters
> > U+0080..U+00FF. Thus #xC0 is a word-constituent and #xD7 is
> > a symbol. We must fix it somehow. But, how? We currently
> > don't have a suitable syntax code for eight-bit chars.

> While I agree that treating bytes as if they were latin-1 chars isn't
> ideal, I don't think the alternative (treat every byte >=128 as
> punctuation, I guess) would really be much better. So there's not much
> point "fixing" this "problem".

Perhaps we don't have to make much effort on this matter,
but if one is, for instance, in Greek lang. env. and working
in a unibyte buffer while setting
unibyte-display-via-language-environment to t, he sees that
(skip-syntax-forward "w") stops at #xD7 which is displayed
as "GREEK CAPITAL LETTER CHI (U+03A7)" and will think it's a
bug.

---
Kenichi Handa
handa@m17n.org

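The mismatch Handa describes can be reproduced in a toy model. Below, syntax is decided from a Latin-1 reading of each byte (#xC0..#xFF are letters except #xD7 and #xF7, following his description of #xD7 as a symbol), while display uses ISO-8859-7, where byte #xD7 is GREEK CAPITAL LETTER CHI (U+03A7). The syntax rule is a deliberate simplification of Emacs's real syntax tables, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified Latin-1 view used for syntax: ASCII letters and
   #xC0..#xFF are word constituents, except #xD7 (multiplication
   sign) and #xF7 (division sign).  */
static int
latin1_is_word (unsigned char b)
{
  if (b == 0xD7 || b == 0xF7)
    return 0;
  if (b >= 0xC0)
    return 1;
  return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
}

/* ISO-8859-7 view used for display, covering only #xC0..#xFE:
   these bytes map to U+0390 + (b - #xC0), except the unassigned
   #xD2.  Anything outside this sketch returns -1.  */
static int
iso8859_7_decode (unsigned char b)
{
  if (b >= 0xC0 && b <= 0xFE && b != 0xD2)
    return 0x0390 + (b - 0xC0);
  return -1;
}

/* Toy version of (skip-syntax-forward "w"): count leading word bytes
   according to the Latin-1 view.  */
static size_t
skip_word_bytes (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len && latin1_is_word (buf[i]))
    i++;
  return i;
}
```

For the bytes #xC1 #xD7 #xC1, displayed as "ΑΧΑ" in a Greek environment, skip_word_bytes stops after one byte, which is exactly the surprise Handa points out.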
* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Stefan Monnier @ 2009-09-29 1:35 UTC
To: Kenichi Handa; +Cc: eliz, ulm, emacs-devel

> Perhaps we don't have to make much effort on this matter,
> but if one is, for instance, in Greek lang. env. and working
> in a unibyte buffer while setting
> unibyte-display-via-language-environment to t, he sees that
> (skip-syntax-forward "w") stops at #xD7 which is displayed
> as "GREEK CAPITAL LETTER CHI (U+03A7)" and will think it's a
> bug.

I think in such a case, the bug is to use
unibyte-display-via-language-environment and expect it to affect more
than the display.

Stefan

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa @ 2009-09-29 2:37 UTC
To: Stefan Monnier; +Cc: eliz, ulm, emacs-devel

In article <jwvr5tq7cte.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I think in such a case, the bug is to use
> unibyte-display-via-language-environment and expect it to affect more
> than the display.

Then, how about adding this statement in the docstring (or to info)?

"It doesn't affect how each character is interpreted."

Though the term "interpreted" is so vague that people won't
understand what it really means.

---
Kenichi Handa
handa@m17n.org

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Stefan Monnier @ 2009-09-29 3:15 UTC
To: Kenichi Handa; +Cc: eliz, ulm, emacs-devel

>> I think in such a case, the bug is to use
>> unibyte-display-via-language-environment and expect it to affect more
>> than the display.

> Then, how about adding this statement in the docstring (or to info)?

> "It doesn't affect how each character is interpreted."

Maybe we could try to make it more obvious that it only affects display
and nothing else. Then again, we should also make it more clear that
you should stay away from this feature.

Stefan

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa @ 2009-09-29 7:52 UTC
To: Stefan Monnier; +Cc: eliz, ulm, emacs-devel

In article <jwvske65tow.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> I think in such a case, the bug is to use
>>> unibyte-display-via-language-environment and expect it to affect more
>>> than the display.

> > Then, how about adding this statement in the docstring (or to info)?

> > "It doesn't affect how each character is interpreted."

> Maybe we could try to make it more obvious that it only affects display
> and nothing else. Then again, we should also make it more clear that
> you should stay away from this feature.

I'm not good at writing that kind of document. Could
someone please install a proper docstring/info-text?

---
Kenichi Handa
handa@m17n.org

[parent not found: <831vljpm0v.fsf@gnu.org>]
* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa @ 2009-10-05 0:49 UTC
To: Eli Zaretskii; +Cc: ulm, monnier, emacs-devel

In article <831vljpm0v.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > Maybe we could try to make it more obvious that it only affects display
> > > and nothing else. Then again, we should also make it more clear that
> > > you should stay away from this feature.
> >
> > I'm not good at writing that kind of document. Could
> > someone please install a proper docstring/info-text?

> Done.

Thank you!!

---
Kenichi Handa
handa@m17n.org

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Stephen J. Turnbull @ 2009-09-25 9:38 UTC
To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel

Eli Zaretskii writes:

> There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> interpreted as a character, and shown as such. This is the main
> feature of unibyte buffers; otherwise, who'd want them?

Somebody working with a wire protocol who wants to avoid the overhead
of a multibyte representation and the fragility of writing code to use
Mule coding systems to implement the identity?

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Stefan Monnier @ 2009-09-25 14:09 UTC
To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel

>> Is the following a feature or a bug?
>>
>> $ emacs -Q
>> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>>
>> The characters are displayed as "_-" (approximately).
>>
>> Shouldn't they be displayed as "\240\255", considering that these are
>> raw bytes with no specific meaning?

> There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> interpreted as a character, and shown as such.

No, this used to be the case, but not any more. This is nowadays only
done for the deprecated unibyte sessions (well, it's really controlled
by unibyte-display-via-language-environment IIUC).

It looks like a bug in the implementation of nobreak-char-display
(thanks Andreas).
The patch below should help.

Stefan

--- xdisp.c.~1.1301.~	2009-09-20 13:01:24.000000000 -0400
+++ xdisp.c	2009-09-25 10:02:08.000000000 -0400
@@ -5794,7 +5794,8 @@
 
               /* Handle non-break space in the mode where it only gets
                  highlighting.  */
-              if (EQ (Vnobreak_char_display, Qt)
+              if ((it->multibyte_p || unibyte_display_via_language_environment)
+                  && EQ (Vnobreak_char_display, Qt)
                   && it->c == 0xA0)
                 {
                   /* Merge the no-break-space face into the current face.  */
@@ -5844,6 +5845,8 @@
 
               /* Handle soft hyphens in the mode where they only get
                  highlighting.  */
+              if (it->multibyte_p || unibyte_display_via_language_environment)
+                {
               if (EQ (Vnobreak_char_display, Qt)
                   && it->c == 0xAD)
                 {
@@ -5864,6 +5867,7 @@
                   ctl_len = 2;
                   goto display_control;
                 }
+                }
 
               {
                 unsigned char str[MAX_MULTIBYTE_LENGTH];

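Handa's objection to this patch, in his reply elsewhere in this thread, is that when unibyte-display-via-language-environment is set, the byte has to be decoded before it is compared with #xA0. A hypothetical C sketch of the two predicates side by side, assuming nobreak-char-display is non-nil. The decoder is a made-up KOI8-like table in which byte #xA0 is a box-drawing character (assumed U+2550) and byte #x9A is NO-BREAK SPACE; we believe this matches KOI8-R, but treat those two values as assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up KOI8-like decoder (assumed values, see above); every other
   byte is passed through unchanged.  */
static int
koi8_like_decode (int c)
{
  assert (c >= 0 && c <= 0xFF);
  if (c == 0xA0)
    return 0x2550;              /* assumed: a box-drawing character */
  if (c == 0x9A)
    return 0x00A0;              /* assumed: NO-BREAK SPACE */
  return c;
}

/* Stefan's guard: the raw code is compared with 0xA0.  */
static bool
is_nbsp_stefan (int c, bool multibyte_p, bool via_lang_env)
{
  return (multibyte_p || via_lang_env) && c == 0xA0;
}

/* Handa's version: in a unibyte buffer the decoded character is
   compared instead.  */
static bool
is_nbsp_handa (int c, bool multibyte_p, bool via_lang_env,
               int (*decode) (int))
{
  if (c < 0x80)
    return false;
  if (multibyte_p)
    return c == 0xA0;
  return via_lang_env && decode (c) == 0xA0;
}
```

Both predicates reject #xA0 in Ulrich's setup (no language-environment display). They differ once the language environment is used: only the decoded comparison avoids treating the box-drawing byte as a no-break space, and only it recognizes the byte that really decodes to NO-BREAK SPACE.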
* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Ulrich Mueller @ 2009-09-26 8:26 UTC
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

>>>>> On Fri, 25 Sep 2009, Stefan Monnier wrote:

> It looks like a bug in the implementation of nobreak-char-display
> (thanks Andreas).
> The patch below should help.

It fixes the problem for me, thanks.

Still strange that the two character codes are hardcoded at a low
level, while others like #x202f ("narrow no-break space") are not.

Ulrich

* Re: Display of characters #xa0 and #xad in unibyte buffers
From: Andreas Schwab @ 2009-09-25 9:44 UTC
To: Ulrich Mueller; +Cc: emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Is the following a feature or a bug?
>
> $ emacs -Q
> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
>
> The characters are displayed as "_-" (approximately).
>
> Shouldn't they be displayed as "\240\255", considering that these are
> raw bytes with no specific meaning?

See nobreak-char-display.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Thread overview: 19+ messages
2009-09-24 15:06 Display of characters #xa0 and #xad in unibyte buffers Ulrich Mueller
2009-09-25  8:31 ` Eli Zaretskii
2009-09-25  9:00   ` Ulrich Mueller
2009-09-25  9:14     ` Eli Zaretskii
2009-09-28  1:10       ` Kenichi Handa
2009-09-28  6:43         ` Eli Zaretskii
2009-09-28 11:24           ` Kenichi Handa
2009-09-28 14:10             ` Eli Zaretskii
2009-09-28 22:38             ` Stefan Monnier
2009-09-29  1:05               ` Kenichi Handa
2009-09-29  1:35                 ` Stefan Monnier
2009-09-29  2:37                   ` Kenichi Handa
2009-09-29  3:15                     ` Stefan Monnier
2009-09-29  7:52                       ` Kenichi Handa
     [not found]                         ` <831vljpm0v.fsf@gnu.org>
2009-10-05  0:49                           ` Kenichi Handa
2009-09-25  9:38   ` Stephen J. Turnbull
2009-09-25 14:09   ` Stefan Monnier
2009-09-26  8:26     ` Ulrich Mueller
2009-09-25  9:44 ` Andreas Schwab