all messages for Emacs-related lists mirrored at yhetil.org
* non-breaking hyphens
@ 2011-10-17 13:56 Chong Yidong
  2011-10-17 14:45 ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Chong Yidong @ 2011-10-17 13:56 UTC
  To: emacs-devel

From the Text Display node in the Emacs manual:

     Some character sets define "no-break" versions of the space and
  hyphen characters, which are used where a line should not be broken.
  Emacs normally displays these characters with special faces
  (respectively, `nobreak-space' and `escape-glyph') to distinguish them
  from ordinary spaces and hyphens.

Hmm---inserting #x2011 (NON-BREAKING HYPHEN) into the buffer does not
show the character in the `escape-glyph' face.  Does anyone know if
#x2011 is indeed the character that the manual is referring to?  Is the
manual description obsolete or is the Emacs behavior buggy?




* Re: non-breaking hyphens
  2011-10-17 13:56 non-breaking hyphens Chong Yidong
@ 2011-10-17 14:45 ` Eli Zaretskii
  2011-10-18  3:39   ` Chong Yidong
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2011-10-17 14:45 UTC
  To: Chong Yidong; +Cc: emacs-devel

> From: Chong Yidong <cyd@gnu.org>
> Date: Mon, 17 Oct 2011 09:56:46 -0400
> 
> From the Text Display node in the Emacs manual:
> 
>      Some character sets define "no-break" versions of the space and
>   hyphen characters, which are used where a line should not be broken.
>   Emacs normally displays these characters with special faces
>   (respectively, `nobreak-space' and `escape-glyph') to distinguish them
>   from ordinary spaces and hyphens.
> 
> Hmm---inserting #x2011 (NON-BREAKING HYPHEN) into the buffer does not
> show the character in the `escape-glyph' face.  Does anyone know if
> #x2011 is indeed the character that the manual is referring to?  Is the
> manual description obsolete or is the Emacs behavior buggy?

The manual is referring to #xAD, see this fragment from
get_next_display_element (and the code thereafter which references
nbsp_or_shy):

	  if (! ASCII_CHAR_P (c) && ! NILP (Vnobreak_char_display))
	    nbsp_or_shy = (c == 0xA0   ? char_is_nbsp
			   : c == 0xAD ? char_is_soft_hyphen
			   :             char_is_other);

Based on this, I'd say that the implementation is incomplete: it only
supports a subset of no-break characters defined by the Unicode
standard.

Note that the no-break characters should be displayed with the
`nobreak-space' face, not `escape-glyph' face.  I think that the
manual should also mention the nobreak-char-display variable.




* Re: non-breaking hyphens
  2011-10-17 14:45 ` Eli Zaretskii
@ 2011-10-18  3:39   ` Chong Yidong
  2011-10-18  4:00     ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Chong Yidong @ 2011-10-18  3:39 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>>      Some character sets define "no-break" versions of the space and
>>   hyphen characters, which are used where a line should not be broken.
>>   Emacs normally displays these characters with special faces
>>   (respectively, `nobreak-space' and `escape-glyph') to distinguish them
>>   from ordinary spaces and hyphens.
>
> The manual is referring to #xAD

Ah, I see, so the manual is confused, since U+00AD is a "soft hyphen"
not a no-break version of a hyphen.

> see this fragment from get_next_display_element (and the code
> thereafter which references nbsp_or_shy):
>
> 	  if (! ASCII_CHAR_P (c) && ! NILP (Vnobreak_char_display))
> 	    nbsp_or_shy = (c == 0xA0   ? char_is_nbsp
> 			   : c == 0xAD ? char_is_soft_hyphen
> 			   :             char_is_other);
>
> Based on this, I'd say that the implementation is incomplete: it only
> supports a subset of no-break characters defined by the Unicode
> standard.
>
> Note that the no-break characters should be displayed with the
> `nobreak-space' face, not `escape-glyph' face.  I think that the
> manual should also mention the nobreak-char-display variable.

I'm not sure I understand the goal of `nobreak-char-display'.  Is it for
warning the user when there is an ASCII look-alike character that isn't
really ASCII?  I guess that's mainly to avoid issues with source code?

If so, handling only U+00A0 and U+00AD would be incomplete, as you say.
Also, the name and documentation of `nobreak-char-display' are misleading
or incomplete---it shouldn't be limited to non-breaking or shy
characters, since there are many other non-ASCII lookalikes like U+2010
(the "true" hyphen), U+2002 (the "en space"), and U+2007 (the "figure
space").

I'm guessing the reason U+00A0 and U+00AD are treated specially is that
those characters happened to be in Latin-1 (i.e. hysterical raisins).




* Re: non-breaking hyphens
  2011-10-18  3:39   ` Chong Yidong
@ 2011-10-18  4:00     ` Eli Zaretskii
  2011-10-18 12:08       ` Chong Yidong
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2011-10-18  4:00 UTC
  To: Chong Yidong; +Cc: emacs-devel

> From: Chong Yidong <cyd@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 17 Oct 2011 23:39:35 -0400
> 
> I'm not sure I understand the goal of `nobreak-char-display'.  Is it for
> warning the user when there is an ASCII look-alike character that isn't
> really ASCII?  I guess that's mainly to avoid issues with source code?

Yes, probably.  But I'm only guessing here.  Can you find any traces
of discussing this in the archives?

> If so, handling only U+00A0 and U+00AD would be incomplete, as you say.
> Also, the name and documentation of `nobreak-char-display' are misleading
> or incomplete---it shouldn't be limited to non-breaking or shy
> characters, since there are many other non-ASCII lookalikes like U+2010
> (the "true" hyphen), U+2002 (the "en space"), and U+2007 (the "figure
> space").

I agree.

> I'm guessing the reason U+00A0 and U+00AD are treated specially is that
> those characters happened to be in Latin-1 (i.e. hysterical raisins).

Most probably.




* Re: non-breaking hyphens
  2011-10-18  4:00     ` Eli Zaretskii
@ 2011-10-18 12:08       ` Chong Yidong
  2011-10-18 13:11         ` Stefan Monnier
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Chong Yidong @ 2011-10-18 12:08 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I'm not sure I understand the goal of `nobreak-char-display'.  Is it for
>> warning the user when there is an ASCII look-alike character that isn't
>> really ASCII?  I guess that's mainly to avoid issues with source code?
>
> Yes, probably.  But I'm only guessing here.  Can you find any traces
> of discussing this in the archives?

This is what I found, though it does not explain the rationale for the
feature:

http://lists.gnu.org/archive/html/emacs-devel/2004-12/msg00954.html
http://lists.gnu.org/archive/html/emacs-devel/2005-01/msg00035.html

And here was a request to consider adding U+FEFF to the list of
characters handled by nobreak-char-display, which apparently petered
out:

http://lists.gnu.org/archive/html/emacs-devel/2008-04/msg00413.html

The right way to implement this feature, as brought up in the 2004
thread, would be to specify the affected characters with a char-table
rather than hardcoding them.  But we should probably leave such a change
till after 24.1.

In the meantime, I think I'll add non-breaking hyphen and hyphen to the
hardcoded list, while deferring on the various other space characters;
many of those spaces are defined to have specific widths, so it's not
clear that changing their appearance to a highlighted space is the right
thing for Emacs to do.
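
As a rough illustration of that interim change (a sketch only, not the
actual xdisp.c patch), the standalone C below mirrors the classification
quoted earlier in the thread and extends the hyphen case to U+2010 and
U+2011.  The function name `classify_nobreak_char', the enum, and the
`nobreak_char_display' parameter are hypothetical stand-ins for the
Emacs internals.

```c
/* Hypothetical stand-in for the nbsp_or_shy classification;
   this is not Emacs source code.  */
enum nobreak_class
{
  CHAR_IS_OTHER,
  CHAR_IS_NBSP,
  CHAR_IS_SOFT_HYPHEN   /* reused here for the hyphen look-alikes */
};

/* Classify code point C.  NOBREAK_CHAR_DISPLAY plays the role of the
   user option: when it is zero, no character is singled out.  */
enum nobreak_class
classify_nobreak_char (int c, int nobreak_char_display)
{
  /* ASCII characters are never highlighted.  */
  if (c < 0x80 || !nobreak_char_display)
    return CHAR_IS_OTHER;

  switch (c)
    {
    case 0xA0:                  /* NO-BREAK SPACE */
      return CHAR_IS_NBSP;
    case 0xAD:                  /* SOFT HYPHEN */
    case 0x2010:                /* HYPHEN */
    case 0x2011:                /* NON-BREAKING HYPHEN */
      return CHAR_IS_SOFT_HYPHEN;
    default:                    /* e.g. U+2002 EN SPACE: deferred */
      return CHAR_IS_OTHER;
    }
}
```

The fixed-width spaces fall through to the default case on purpose,
matching the decision above to defer them.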




* Re: non-breaking hyphens
  2011-10-18 12:08       ` Chong Yidong
@ 2011-10-18 13:11         ` Stefan Monnier
  2011-10-18 17:43           ` Drew Adams
  2011-10-18 13:41         ` Eli Zaretskii
  2011-10-19  8:27         ` Juri Linkov
  2 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2011-10-18 13:11 UTC
  To: Chong Yidong; +Cc: Eli Zaretskii, emacs-devel

>>> I'm not sure I understand the goal of `nobreak-char-display'.  Is it for
>>> warning the user when there is an ASCII look-alike character that isn't
>>> really ASCII?  I guess that's mainly to avoid issues with source code?
>> Yes, probably.  But I'm only guessing here.  Can you find any traces
>> of discussing this in the archives?

I do remember it being the result (through various discussions, as you
can imagine) of bug reports where an NBSP was accidentally inserted in
source code instead of a SPC, which can be difficult to track down.

> The right way to implement this feature, as brought up in the 2004
> thread, would be to specify the affected characters with a char-table
> rather than hardcoding them.  But we should probably leave such a change
> till after 24.1.

I believe the general solution is along the lines of what the GNU ELPA
package "markchars" does.  The current solution was a simple solution for
the sub-cases that can happen commonly in programming languages.

The reason why it's important to handle programming languages is that
visual similarity is not understood by compilers ;-)
In contrast, for text buffers (or even LaTeX and HTML), it's much
less problematic.

Also the significant cases are the ones where the similarity is between
a "plain ASCII" char and some other one.


        Stefan




* Re: non-breaking hyphens
  2011-10-18 12:08       ` Chong Yidong
  2011-10-18 13:11         ` Stefan Monnier
@ 2011-10-18 13:41         ` Eli Zaretskii
  2011-10-19  8:27         ` Juri Linkov
  2 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2011-10-18 13:41 UTC
  To: Chong Yidong; +Cc: emacs-devel

> From: Chong Yidong <cyd@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 18 Oct 2011 08:08:00 -0400
> 
> This is what I found, though it does not explain the rationale for the
> feature:
> 
> http://lists.gnu.org/archive/html/emacs-devel/2004-12/msg00954.html
> http://lists.gnu.org/archive/html/emacs-devel/2005-01/msg00035.html

Thanks.

> The right way to implement this feature, as brought up in the 2004
> thread, would be to specify the affected characters with a char-table
> rather than hardcoding them.

Yes, we should design the display of these special characters as a
single coherent feature.  For example, I discovered a few days ago
that we don't display the U+2028 LINE SEPARATOR character as a line
separator.  And there are many other similar Unicode characters that
need our consideration.

(I started to elaborate about this, but found out that I already said
that in http://lists.gnu.org/archive/html/emacs-devel/2008-04/msg01504.html).

> But we should probably leave such a change till after 24.1.

Agreed.  How about a bug report on this, to avoid forgetting the
issue?

> In the meantime, I think I'll add non-breaking hyphen and hyphen to the
> hardcoded list, while deferring on the various other space characters;
> many of those spaces are defined to have specific widths, so it's not
> clear that changing their appearance to a highlighted space is the right
> thing for Emacs to do.

I agree.




* RE: non-breaking hyphens
  2011-10-18 13:11         ` Stefan Monnier
@ 2011-10-18 17:43           ` Drew Adams
  2011-10-19  8:28             ` Juri Linkov
  0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2011-10-18 17:43 UTC
  To: 'Stefan Monnier', 'Chong Yidong'
  Cc: 'Eli Zaretskii', emacs-devel

> I believe the general solution is along the lines of what the GNU ELPA
> package "markchars" does.  The current solution was a simple solution
> for the sub-cases that can happen commonly in programming languages.
> 
> The reason why it's important to handle programming languages is that
> visual similarity is not understood by compilers ;-)
> In contrast, for text buffers (or even LaTeX and HTML), it's much
> less problematic.
> 
> Also the significant cases are the ones where the similarity 
> is between a "plain ASCII" char and some other one.

FWIW, I suggest adding simple search and search-and-replace commands to check
for such groups of "false friends".  This would be in addition to the display
changes that you are discussing.

IOW, make it easy not only to spot such chars when you come across them, but to
search for and optionally replace them.

And this should be without needing to know what their code points or Unicode
names are.  The search commands would be specific to such easily
indistinguishable/confusable "false friends", and would be driven off of a
customizable "false-friends" list that calls out the preferred replacement for
each such group of "false friends".





* Re: non-breaking hyphens
  2011-10-18 12:08       ` Chong Yidong
  2011-10-18 13:11         ` Stefan Monnier
  2011-10-18 13:41         ` Eli Zaretskii
@ 2011-10-19  8:27         ` Juri Linkov
  2011-10-19  8:39           ` Eli Zaretskii
  2 siblings, 1 reply; 15+ messages in thread
From: Juri Linkov @ 2011-10-19  8:27 UTC
  To: Chong Yidong; +Cc: Eli Zaretskii, emacs-devel

> The right way to implement this feature, as brought up in the 2004
> thread, would be to specify the affected characters with a char-table
> rather than hardcoding them.  But we should probably leave such a change
> till after 24.1.

Since glyphless characters (like "ZERO WIDTH NO-BREAK SPACE") are
displayed now using a char-table, it makes sense to display confusable
characters with a similar char-table (e.g. `confusable-char-display')
where display methods could specify how to display them (face, etc.)

BTW, there is already a mapping in lisp/international/latin1-disp.el
in `latin1-display-ucs-per-lynx' that can be used to match confusable
characters.
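
The char-table idea above (a per-character display method instead of
hardcoded tests) could be sketched roughly as follows.  This is not an
Emacs char-table: a sorted array searched with bsearch stands in for
one, and `confusable_display', the method enum, and the table entries
are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical display methods; real ones would name faces etc.  */
enum display_method
{
  DISPLAY_NORMAL,
  DISPLAY_NOBREAK_SPACE,
  DISPLAY_ESCAPE_GLYPH
};

struct confusable_entry
{
  int code_point;
  enum display_method method;
};

/* Illustrative entries only.  Must stay sorted by code_point,
   because bsearch requires a sorted array.  */
static const struct confusable_entry confusable_table[] = {
  { 0x00A0, DISPLAY_NOBREAK_SPACE },  /* NO-BREAK SPACE */
  { 0x00AD, DISPLAY_ESCAPE_GLYPH },   /* SOFT HYPHEN */
  { 0x2010, DISPLAY_ESCAPE_GLYPH },   /* HYPHEN */
  { 0x2011, DISPLAY_ESCAPE_GLYPH },   /* NON-BREAKING HYPHEN */
};

/* bsearch comparator: KEY points to an int code point, ELT to an entry.  */
static int
compare_entry (const void *key, const void *elt)
{
  int c = *(const int *) key;
  int e = ((const struct confusable_entry *) elt)->code_point;
  return (c > e) - (c < e);
}

/* Return the display method for C, or DISPLAY_NORMAL if C is not
   listed in the table.  */
enum display_method
confusable_display (int c)
{
  const struct confusable_entry *hit
    = bsearch (&c, confusable_table,
               sizeof confusable_table / sizeof confusable_table[0],
               sizeof confusable_table[0], compare_entry);
  return hit ? hit->method : DISPLAY_NORMAL;
}
```

Adding a character then means adding a table row rather than touching
every place that tests for specific code points.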

> In the meantime, I think I'll add non-breaking hyphen and hyphen to the
> hardcoded list, while deferring on the various other space characters;

While they are still hardcoded, this requires another change:

=== modified file 'lisp/descr-text.el'
--- lisp/descr-text.el	2011-09-29 00:12:44 +0000
+++ lisp/descr-text.el	2011-10-19 08:22:27 +0000
@@ -606,7 +606,8 @@ (defun describe-char (pos &optional buff
                              'trailing-whitespace)
                             ((and nobreak-char-display char (eq char '#xa0))
                              'nobreak-space)
-                            ((and nobreak-char-display char (eq char '#xad))
+                            ((and nobreak-char-display char
+				  (memq char '(#xad #x2010 #x2011)))
                              'escape-glyph)
                             ((and (< char 32) (not (memq char '(9 10))))
                              'escape-glyph)))))





* Re: non-breaking hyphens
  2011-10-18 17:43           ` Drew Adams
@ 2011-10-19  8:28             ` Juri Linkov
  2011-10-19 13:59               ` Drew Adams
  0 siblings, 1 reply; 15+ messages in thread
From: Juri Linkov @ 2011-10-19  8:28 UTC
  To: Drew Adams; +Cc: emacs-devel

> FWIW, I suggest adding simple search and search-and-replace commands to check
> for such groups of "false friends".  This would be in addition to the display
> changes that you are discussing.

What you suggest is reminiscent of the UI shown when trying to save a file
with characters that can't be encoded with the default coding systems, i.e.
the one that displays:

  Click on a character (or switch to this window by `C-x o'
  and select the characters by RET) to jump to the place it appears,
  where `C-u C-x =' will give information about it.




* Re: non-breaking hyphens
  2011-10-19  8:27         ` Juri Linkov
@ 2011-10-19  8:39           ` Eli Zaretskii
  2011-10-19 13:59             ` Drew Adams
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2011-10-19  8:39 UTC
  To: Juri Linkov; +Cc: cyd, emacs-devel

> From: Juri Linkov <juri@jurta.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Wed, 19 Oct 2011 11:27:15 +0300
> 
> > The right way to implement this feature, as brought up in the 2004
> > thread, would be to specify the affected characters with a char-table
> > rather than hardcoding them.  But we should probably leave such a change
> > till after 24.1.
> 
> Since glyphless characters (like "ZERO WIDTH NO-BREAK SPACE") are
> displayed now using a char-table, it makes sense to display confusable
> characters with a similar char-table (e.g. `confusable-char-display')
> where display methods could specify how to display them (face, etc.)
> 
> BTW, there is already a mapping in lisp/international/latin1-disp.el
> in `latin1-display-ucs-per-lynx' that can be used to match confusable
> characters.

Yes, we have several overlapping features that handle these and other
issues.  One other related "overlap" is glyphless characters display
vis-a-vis display tables; currently they contradict.  The current
situation is quite a mess, and we need to resolve it by designing a
coherent set of features to handle all that.




* RE: non-breaking hyphens
  2011-10-19  8:28             ` Juri Linkov
@ 2011-10-19 13:59               ` Drew Adams
  2011-10-19 14:37                 ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2011-10-19 13:59 UTC
  To: 'Juri Linkov'; +Cc: emacs-devel

> > FWIW, I suggest adding simple search and search-and-replace 
> > commands to check for such groups of "false friends".  This
> > would be in addition to the display changes that you are discussing.
> 
> What you suggest is reminiscent of the UI shown when trying to save a
> file with characters that can't be encoded with the default coding
> systems, i.e. the one that displays:
> 
>   Click on a character (or switch to this window by `C-x o'
>   and select the characters by RET) to jump to the place it appears,
>   where `C-u C-x =' will give information about it.

Maybe.  But my suggestion was to drive everything off of a user-customizable
data structure: groups of confusables together with an identified preferred
value for each group (could be just the first or last confusable in the list, or
could be a separate entry for the group).

If, for example, the UI part was to provide a query-replace-faux-amis command
for a given group of confusables (or for all groups at once), every confusable
in a given group would be sought, and the default replacement would be the
identified preferred value for the group.

If you think that the preferred value for a group might often depend on the
context (e.g. mode), then the data structure could have a 3rd field for a
predicate or mode variable (or t or nil for always).  In case of overlapping
groups and predicates, the data-structure order would govern.  And so on.
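
To make the shape of such a data structure concrete, here is a minimal
sketch in C.  Everything in it is an assumption for illustration: the
struct layout, the group contents, the mode strings, and the names
`confusable_groups' and `preferred_replacement'.  Each group lists its
confusables, a preferred value, and an optional predicate; groups are
tried in order, so the first applicable match wins.

```c
#include <stddef.h>
#include <string.h>

/* One group of "false friends" (hypothetical layout).  */
struct confusable_group
{
  int preferred;                      /* preferred replacement */
  const int *members;                 /* confusable code points */
  size_t n_members;
  int (*applies) (const char *mode);  /* NULL means "always" */
};

/* Example predicate: restrict a group to a made-up "prog" mode.  */
static int
prog_mode_only (const char *mode)
{
  return strcmp (mode, "prog") == 0;
}

/* Illustrative group contents, not a vetted list.  */
static const int hyphen_like[] = { 0x2010, 0x2011 };
static const int space_like[] = { 0x00A0, 0x2002, 0x2007 };

static const struct confusable_group confusable_groups[] = {
  { '-', hyphen_like, sizeof hyphen_like / sizeof hyphen_like[0], NULL },
  { ' ', space_like, sizeof space_like / sizeof space_like[0],
    prog_mode_only },
};

/* Return the preferred replacement for C in MODE, or C itself when
   no applicable group contains it.  Table order governs.  */
int
preferred_replacement (int c, const char *mode)
{
  size_t n_groups = sizeof confusable_groups / sizeof confusable_groups[0];

  for (size_t g = 0; g < n_groups; g++)
    {
      const struct confusable_group *grp = &confusable_groups[g];

      if (grp->applies && !grp->applies (mode))
        continue;
      for (size_t i = 0; i < grp->n_members; i++)
        if (grp->members[i] == c)
          return grp->preferred;
    }
  return c;
}
```

A query-replace style command would then only need to walk the buffer
and offer `preferred_replacement' as the default for each hit.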





* RE: non-breaking hyphens
  2011-10-19  8:39           ` Eli Zaretskii
@ 2011-10-19 13:59             ` Drew Adams
  2011-10-19 15:12               ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2011-10-19 13:59 UTC
  To: 'Eli Zaretskii', 'Juri Linkov'; +Cc: cyd, emacs-devel

> > BTW, there is already a mapping in lisp/international/latin1-disp.el
> > in `latin1-display-ucs-per-lynx' that can be used to match 
> > confusable characters.
> 
> Yes, we have several overlapping features that handle these and other
> issues.  One other related "overlap" is glyphless characters display
> vis-a-vis display tables; currently they contradict.  The current
> situation is quite a mess, and we need to resolve it by designing a
> coherent set of features to handle all that.

Introduce a user-customizable data structure as I suggested _now_, and drive
whatever current or future changes are needed for the UI etc. off of that,
instead of hard-coding here and there.

If you can at least decide now on the layout of the data structure, the exact
content can be modified later as needed.  Similarly, which parts of Emacs need
to use it (e.g. what you now are discovering as additional forgotten places to
hard-code) can be dealt with progressively.

Continuing to hard-code-hack here and there now is not a great idea, other
things being equal.  It's not because we are in premature "pretest" that we
shouldn't start to DTRT.





* Re: non-breaking hyphens
  2011-10-19 13:59               ` Drew Adams
@ 2011-10-19 14:37                 ` Stefan Monnier
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2011-10-19 14:37 UTC
  To: Drew Adams; +Cc: 'Juri Linkov', emacs-devel

Have people checked the `markchars' package in GNU ELPA?
It was specifically designed to solve those issues with "confusable"
chars, so it's probably a good starting point.


        Stefan




* Re: non-breaking hyphens
  2011-10-19 13:59             ` Drew Adams
@ 2011-10-19 15:12               ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2011-10-19 15:12 UTC
  To: Drew Adams; +Cc: juri, cyd, emacs-devel

> From: "Drew Adams" <drew.adams@oracle.com>
> Cc: <cyd@gnu.org>, <emacs-devel@gnu.org>
> Date: Wed, 19 Oct 2011 06:59:14 -0700
> 
> > > BTW, there is already a mapping in lisp/international/latin1-disp.el
> > > in `latin1-display-ucs-per-lynx' that can be used to match 
> > > confusable characters.
> > 
> > Yes, we have several overlapping features that handle these and other
> > issues.  One other related "overlap" is glyphless characters display
> > vis-a-vis display tables; currently they contradict.  The current
> > situation is quite a mess, and we need to resolve it by designing a
> > coherent set of features to handle all that.
> 
> Introduce a user-customizable data structure as I suggested _now_, and drive
> whatever current or future changes are needed for the UI etc. off of that,
> instead of hard-coding here and there.

Introducing a UI when there's no sign of design anywhere in sight is
backwards.  We don't even know yet whether it will be one feature or
several different ones.

Somebody should present at least an idea of a design before any code
could be crafted that will not be thrown away soon enough.

> Continuing to hard-code-hack here and there now is not a great idea, other
> things being equal.

Right, and that's why no one's doing that.




