unofficial mirror of bug-gnu-emacs@gnu.org 
* whitespace includes U+3000
@ 2006-06-25  2:11 Dan Jacobson
  2006-06-26  1:00 ` Kenichi Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Jacobson @ 2006-06-25  2:11 UTC (permalink / raw)
  Cc: handa

Are emacs whitespace detectors aware of Unicode characters like
U+3000? show-trailing-whitespace and other (apropos (quote
("whitespace"))) stuff aren't.

* Re: whitespace includes U+3000
  2006-06-25  2:11 whitespace includes U+3000 Dan Jacobson
@ 2006-06-26  1:00 ` Kenichi Handa
  2006-06-27 10:34   ` Richard Stallman
  0 siblings, 1 reply; 6+ messages in thread
From: Kenichi Handa @ 2006-06-26  1:00 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <871wtegf13.fsf@jidanni.org>, Dan Jacobson <jidanni@jidanni.org> writes:

> Are emacs whitespace detectors aware of Unicode characters like
> U+3000?

No.  The current Emacs treats only TAB and SPACE as
"whitespace" characters.

This should be fixed in emacs-unicode-2, which contains Unicode
character category data.
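
To illustrate what "Unicode character category data" offers here, the following minimal Python sketch looks up the general category of the characters under discussion. Python's standard unicodedata module stands in for the data that emacs-unicode-2 carries; the helper name general_category is hypothetical, and none of this is Emacs code.

```python
import unicodedata

def general_category(code_point):
    """Return the Unicode general category (e.g. 'Zs') of a code point."""
    return unicodedata.category(chr(code_point))

# TAB is a control character (Cc); SPACE, NBSP and U+3000 are all
# space separators (Zs), so category data alone does not tell them apart.
for cp in (0x0009, 0x0020, 0x00A0, 0x3000):
    name = unicodedata.name(chr(cp), "<unnamed control character>")
    print(f"U+{cp:04X}  {general_category(cp)}  {name}")
```

The category groups U+3000 with SPACE, but it also groups NBSP with them, so a definition of "whitespace" based only on the Zs category would still need a decision about no-break characters.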

---
Kenichi Handa
handa@m17n.org

* Re: whitespace includes U+3000
  2006-06-26  1:00 ` Kenichi Handa
@ 2006-06-27 10:34   ` Richard Stallman
  2006-06-27 11:48     ` Kenichi Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Stallman @ 2006-06-27 10:34 UTC (permalink / raw)
  Cc: bug-gnu-emacs, jidanni

    > Are emacs whitespace detectors aware of Unicode characters like
    > U+3000?

    No.  The current Emacs treats only TAB and SPACE as
    "whitespace" characters.

It would be very easy to fix this by setting the syntax table entries
for those characters--if there are not too many of them.  So why not
fix it?

* Re: whitespace includes U+3000
  2006-06-27 10:34   ` Richard Stallman
@ 2006-06-27 11:48     ` Kenichi Handa
  2006-06-28 17:25       ` Richard Stallman
  0 siblings, 1 reply; 6+ messages in thread
From: Kenichi Handa @ 2006-06-27 11:48 UTC (permalink / raw)
  Cc: bug-gnu-emacs, handa, jidanni

In article <E1FvAu1-0000gN-I2@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>> Are emacs whitespace detectors aware of Unicode characters like
>> U+3000?

>     No.  The current Emacs treats only TAB and SPACE as
>     "whitespace" characters.

> It would be very easy to fix this by setting the syntax table entries
> for those characters--if there are not too many of them.  So why not
> fix it?

Are you sure that "whitespace" of syntax has the same
meaning as the "whitespace" of show-trailing-whitespace?

For instance, currently ^L (formfeed) has syntax
"whitespace".  But, it is displayed with glyph "^L".  Should
it be the target of show-trailing-whitespace?

For instance, NBSP (U+00A0) currently has syntax "."
(punctuation), and it is displayed with a special face to
indicate the presence of that character.  Should it be
changed to "whitespace" syntax, or should it be left as is?

Have you considered these things?

Please try M-x apropos RET whitespace RET.  The word
"whitespace" is used with slightly different meanings.  I
think we can't blindly use "whitespace" syntax in some
cases.

---
Kenichi Handa
handa@m17n.org

* Re: whitespace includes U+3000
  2006-06-27 11:48     ` Kenichi Handa
@ 2006-06-28 17:25       ` Richard Stallman
  2006-06-29  2:01         ` Kenichi Handa
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Stallman @ 2006-06-28 17:25 UTC (permalink / raw)
  Cc: bug-gnu-emacs, jidanni, handa

    >     No.  The current Emacs treats only TAB and SPACE as
    >     "whitespace" characters.

    > It would be very easy to fix this by setting the syntax table entries
    > for those characters--if there are not too many of them.  So why not
    > fix it?

    Are you sure that "whitespace" of syntax has the same
    meaning as the "whitespace" of show-trailing-whitespace?

I am not sure which one we're talking about here.
Is it show-trailing-whitespace?

If so, that would also be easy to change, if it ought to be changed.

    For instance, currently ^L (formfeed) has syntax
    "whitespace".  But, it is displayed with glyph "^L".  Should
    it be the target of show-trailing-whitespace?

No.

    For instance, NBSP (U+00A0) currently has syntax "."
    (punctuation), and it is displayed with a special face to
    indicate the presence of that character.  Should it be
    changed to "whitespace" syntax, or should it be left as is?

The special face for that character should not be overridden, but the
other whitespace after it _and before it_ should probably be displayed
specially by show-trailing-whitespace.

You can probably get this result by putting NBSP into the pattern
for show-trailing-whitespace to recognize.  Redisplay will override
the face, for the NBSP.

* Re: whitespace includes U+3000
  2006-06-28 17:25       ` Richard Stallman
@ 2006-06-29  2:01         ` Kenichi Handa
  0 siblings, 0 replies; 6+ messages in thread
From: Kenichi Handa @ 2006-06-29  2:01 UTC (permalink / raw)
  Cc: bug-gnu-emacs, handa, jidanni

In article <E1Fvdme-00060m-A2@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>> No.  The current Emacs treats only TAB and SPACE as
>> "whitespace" characters.

>> It would be very easy to fix this by setting the syntax table entries
>> for those characters--if there are not too many of them.  So why not
>> fix it?

>     Are you sure that "whitespace" of syntax has the same
>     meaning as the "whitespace" of show-trailing-whitespace?

> I am not sure which one we're talking about here.
> Is it show-trailing-whitespace?

show-trailing-whitespace is just an example.  I think his
question is about every Emacs feature that handles
"whitespace" in some sense (M-x apropos RET whitespace RET
lists examples).

> If so, that would also be easy to change, if it ought to be changed.

Of course it's easy to change.  The difficult thing is to
determine if it ought to be changed.

>     For instance, currently ^L (formfeed) has syntax
>     "whitespace".  But, it is displayed with glyph "^L".  Should
>     it be the target of show-trailing-whitespace?

> No.

Then "whitespace" has different meanings: the set of
characters that have "whitespace" syntax differs from
the set of characters that are displayed with a "whitespace"
glyph.  So we can't use "whitespace" syntax, at least not for
show-trailing-whitespace.

>     For instance, NBSP (U+00A0) currently has syntax "."
>     (punctuation), and it is displayed with a special face to
>     indicate the presence of that character.  Should it be
>     changed to "whitespace" syntax, or should it be left as is?

> The special face for that character should not be overridden, but the
> other whitespace after it _and before it_ should probably be displayed
> specially by show-trailing-whitespace.

> You can probably get this result by putting NBSP into the pattern
> for show-trailing-whitespace to recognize.  Redisplay will override
> the face, for the NBSP.

What do you mean by "pattern" here?  A regular expression?
Currently the function highlight_trailing_whitespace doesn't
use a regular expression; it checks for TAB and SPACE directly
(i.e. they are hardcoded).
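
The difference between a hardcoded check and a configurable one can be sketched in a few lines of Python. This is only a hypothetical model of the behavior described above, not the C code of highlight_trailing_whitespace; the names TAB_AND_SPACE and trailing_whitespace_start are made up for illustration.

```python
# Hypothetical model of the hardcoded TAB/SPACE check; not Emacs code.
TAB_AND_SPACE = frozenset("\t ")

def trailing_whitespace_start(line, space_chars=TAB_AND_SPACE):
    """Return the index where the trailing run of space_chars begins.

    Returns len(line) when the line has no trailing whitespace.
    """
    end = len(line)
    # Walk backwards while the previous character counts as whitespace.
    while end > 0 and line[end - 1] in space_chars:
        end -= 1
    return end

line = "foo \u3000"  # "foo", SPACE, IDEOGRAPHIC SPACE

# With only TAB and SPACE, the scan stops at U+3000: nothing is trailing.
print(trailing_whitespace_start(line))                              # 5
# With U+3000 added to the set, both final characters are trailing.
print(trailing_whitespace_start(line, TAB_AND_SPACE | {"\u3000"}))  # 3
```

Making the character set a parameter is one way to read "putting NBSP into the pattern"; whether a set, a regular expression, or something else is the right mechanism is exactly what is left open here.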

By the way, I've just found that the special face
for NBSP is currently overridden by show-trailing-whitespace.  That is
because highlight_trailing_whitespace is called near the
end of display_line.

Anyway, Unicode has many more space-like characters
(e.g. U+2000..U+200B).  Should they be treated the same
way as NBSP (i.e. displayed with nobreak-face)?  Or as
SPACE?
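
One possible criterion for that choice, sketched in Python under stated assumptions: treat a character like NBSP when Unicode gives it a <noBreak> compatibility decomposition, and like SPACE when it is any other space separator (category Zs). Nobody in this thread proposes this rule; it is only an illustration, and classify_space is a hypothetical name.

```python
import unicodedata

def classify_space(code_point):
    """Classify a code point as 'nobreak', 'space', or 'other'.

    Assumed rule (not from the thread): a <noBreak> decomposition means
    NBSP-like; otherwise general category Zs means SPACE-like.
    """
    ch = chr(code_point)
    if unicodedata.decomposition(ch).startswith("<noBreak>"):
        return "nobreak"
    if unicodedata.category(ch) == "Zs":
        return "space"
    return "other"

# NBSP, the U+2000..U+200B range mentioned above, and U+3000.
for cp in [0x00A0, *range(0x2000, 0x200C), 0x3000]:
    print(f"U+{cp:04X}  {classify_space(cp):8}  {unicodedata.name(chr(cp))}")
```

Under this rule U+2007 (FIGURE SPACE) groups with NBSP, while U+200B (ZERO WIDTH SPACE) falls outside both groups, because current Unicode data assigns it the format category Cf rather than Zs.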

What about fixup-whitespace?  It seems that this
function should delete only TAB and SPACE.  So here we have
a third meaning of "whitespace": just TAB and SPACE.

How about the case of delete-trailing-whitespace?

How about the case of ...

Do you think we should clearly define the semantics of
"whitespace" and "space character" in all cases before the
release, and modify the code where necessary?

---
Kenichi Handa
handa@m17n.org
