all messages for Emacs-related lists mirrored at yhetil.org
* whitespace includes U+3000
@ 2006-06-25  2:11 Dan Jacobson
  2006-06-26  1:00 ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Jacobson @ 2006-06-25  2:11 UTC (permalink / raw)
  Cc: handa

Are emacs whitespace detectors aware of Unicode characters like
U+3000? show-trailing-whitespace and other (apropos (quote
("whitespace"))) stuff aren't.


* Re: whitespace includes U+3000
  2006-06-25  2:11 whitespace includes U+3000 Dan Jacobson
@ 2006-06-26  1:00 ` Kenichi Handa
  2006-06-27 10:34   ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2006-06-26  1:00 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <871wtegf13.fsf@jidanni.org>, Dan Jacobson <jidanni@jidanni.org> writes:

> Are emacs whitespace detectors aware of Unicode characters like
> U+3000?

No.  The current Emacs treats only TAB and SPACE as
"whitespace" characters.

Should be fixed in emacs-unicode-2 which contains Unicode
character category data.

---
Kenichi Handa
handa@m17n.org
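
As a rough illustration of what Unicode character category data offers, here is a minimal Python sketch. It uses Python's standard `unicodedata` module as a stand-in for the category data mentioned above; it is not how Emacs stores or queries that data, and the `CHARS` table and `general_categories` helper are names made up for this example. It looks up the general category of a few characters raised in this thread.

```python
import unicodedata

# Characters discussed in the thread, keyed by a readable label.
# (Illustrative table; not an Emacs data structure.)
CHARS = {
    "TAB": "\t",
    "SPACE": " ",
    "FORMFEED": "\f",
    "NBSP": "\u00a0",
    "IDEOGRAPHIC SPACE": "\u3000",
}

def general_categories(chars=CHARS):
    """Map each label to the Unicode general category of its character."""
    return {label: unicodedata.category(ch) for label, ch in chars.items()}

if __name__ == "__main__":
    for label, cat in general_categories().items():
        print(f"{label:18} {cat}")
```

SPACE, NBSP and U+3000 are space separators (`Zs`), while TAB and formfeed are control characters (`Cc`), so the general category alone probably cannot decide which characters a given feature should treat as whitespace.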


* Re: whitespace includes U+3000
  2006-06-26  1:00 ` Kenichi Handa
@ 2006-06-27 10:34   ` Richard Stallman
  2006-06-27 11:48     ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-06-27 10:34 UTC (permalink / raw)
  Cc: bug-gnu-emacs, jidanni

    > Are emacs whitespace detectors aware of Unicode characters like
    > U+3000?

    No.  The current Emacs treats only TAB and SPACE as
    "whitespace" characters.

It would be very easy to fix this by setting the syntax table entries
for those characters--if there are not too many of them.  So why not
fix it?


* Re: whitespace includes U+3000
  2006-06-27 10:34   ` Richard Stallman
@ 2006-06-27 11:48     ` Kenichi Handa
  2006-06-28 17:25       ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2006-06-27 11:48 UTC (permalink / raw)
  Cc: bug-gnu-emacs, handa, jidanni

In article <E1FvAu1-0000gN-I2@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>> Are emacs whitespace detectors aware of Unicode characters like
>> U+3000?

>     No.  The current Emacs treats only TAB and SPACE as
>     "whitespace" characters.

> It would be very easy to fix this by setting the syntax table entries
> for those characters--if there are not too many of them.  So why not
> fix it?

Are you sure that "whitespace" of syntax has the same
meaning as the "whitespace" of show-trailing-whitespace?

For instance, currently ^L (formfeed) has syntax
"whitespace".  But, it is displayed with glyph "^L".  Should
it be the target of show-trailing-whitespace?

For instance, currently NBSP (U+00A0) has syntax "."
(punctuation), and it is displayed with a special face to
indicate the existence of that character.  Should it be
changed to "whitespace" syntax, or shouldn't it be changed?

Have you considered these things?

Please try M-x apropos RET whitespace RET.  The word
"whitespace" is used with slightly different meanings.  I
think we can't blindly use "whitespace" syntax in some
cases.

---
Kenichi Handa
handa@m17n.org


* Re: whitespace includes U+3000
  2006-06-27 11:48     ` Kenichi Handa
@ 2006-06-28 17:25       ` Richard Stallman
  2006-06-29  2:01         ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-06-28 17:25 UTC (permalink / raw)
  Cc: bug-gnu-emacs, jidanni, handa

    >     No.  The current Emacs treats only TAB and SPACE as
    >     "whitespace" characters.

    > It would be very easy to fix this by setting the syntax table entries
    > for those characters--if there are not too many of them.  So why not
    > fix it?

    Are you sure that "whitespace" of syntax has the same
    meaning as the "whitespace" of show-trailing-whitespace?

I am not sure which one we're talking about here.
Is it show-trailing-whitespace?

If so, that would also be easy to change, if it ought to be changed.

    For instance, currently ^L (formfeed) has syntax
    "whitespace".  But, it is displayed with glyph "^L".  Should
    it be the target of show-trailing-whitespace?

No.

    For instance, currently NBSP (U+00A0) has syntax "."
    (punctuation), and it is displayed with a special face to
    indicate the existence of that character.  Should it be
    changed to "whitespace" syntax, or shouldn't it be changed?

The special face for that character should not be overridden, but the
other whitespace after it _and before it_ should probably be displayed
specially by show-trailing-whitespace.

You can probably get this result by putting NBSP into the pattern
for show-trailing-whitespace to recognize.  Redisplay will override
the face, for the NBSP.


* Re: whitespace includes U+3000
  2006-06-28 17:25       ` Richard Stallman
@ 2006-06-29  2:01         ` Kenichi Handa
  2006-06-29 17:57           ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2006-06-29  2:01 UTC (permalink / raw)
  Cc: bug-gnu-emacs, handa, jidanni

In article <E1Fvdme-00060m-A2@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>> No.  The current Emacs treats only TAB and SPACE as
>> "whitespace" characters.

>> It would be very easy to fix this by setting the syntax table entries
>> for those characters--if there are not too many of them.  So why not
>> fix it?

>     Are you sure that "whitespace" of syntax has the same
>     meaning as the "whitespace" of show-trailing-whitespace?

> I am not sure which one we're talking about here.
> Is it show-trailing-whitespace?

show-trailing-whitespace is just an example.  I think his
question is about all Emacs features that handle
"whitespace" in some sense (examples are listed by M-x
apropos RET whitespace RET).

> If so, that would also be easy to change, if it ought to be changed.

Of course it's easy to change.  The difficult thing is to
determine if it ought to be changed.

>     For instance, currently ^L (formfeed) has syntax
>     "whitespace".  But, it is displayed with glyph "^L".  Should
>     it be the target of show-trailing-whitespace?

> No.

Then we have different meanings in "whitespace"; the set of
characters that have "whitespace" syntax is different from
the set of characters that are displayed by "whitespace"
glyph.  And, we can't use "whitespace" syntax at least for
show-trailing-whitespace.

>     For instance, currently NBSP (U+00A0) has syntax "."
>     (punctuation), and it is displayed with a special face to
>     indicate the existence of that character.  Should it be
>     changed to "whitespace" syntax, or shouldn't it be changed?

> The special face for that character should not be overridden, but the
> other whitespace after it _and before it_ should probably be displayed
> specially by show-trailing-whitespace.

> You can probably get this result by putting NBSP into the pattern
> for show-trailing-whitespace to recognize.  Redisplay will override
> the face, for the NBSP.

What do you mean by "pattern" here?  Regular expression?
Currently the function highlight_trailing_whitespace doesn't
use regular expression but checks TAB and SPACE directly
(i.e. hardcoded).

By the way, I've just found that currently the special face
for NBSP is overridden by show-trailing-whitespace.  That is
because highlight_trailing_whitespace is called near the
end of display_line.

Anyway, Unicode has lots more space-like characters
(e.g. U+2000..U+200B).  Should they be treated the same
way as NBSP (i.e. displayed with nobreak-face)?  Or as
SPACE?

How about the case of fixup-whitespace?  It seems that this
function should delete only TAB and SPACE.  So, here we have
the third meaning of "whitespace"; just TAB and SPACE.

How about the case of delete-trailing-whitespace?

How about the case of ...

Do you think we should clearly define the semantics of
"whitespace" and "space character" in all cases before the
release, and modify the code if necessary?

---
Kenichi Handa
handa@m17n.org
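
To make the difference between "just TAB and SPACE" and a wider set of blank characters concrete, here is a minimal Python sketch. It is not the C code of highlight_trailing_whitespace; the names `TAB_AND_SPACE`, `WIDER_BLANKS`, `trailing_blank_start` and `delete_trailing` are hypothetical, and the choice to add NBSP and U+3000 to the wider set is only an assumption for illustration.

```python
# Hypothetical character sets; the names are illustrative, not Emacs variables.
TAB_AND_SPACE = frozenset("\t ")
# Assumption: widen the set with NBSP and U+3000 IDEOGRAPHIC SPACE,
# two of the characters raised in this thread.
WIDER_BLANKS = TAB_AND_SPACE | frozenset("\u00a0\u3000")

def trailing_blank_start(line, blanks=TAB_AND_SPACE):
    """Return the index where the trailing run of `blanks` begins.

    Returns len(line) when the line has no trailing blanks.
    """
    i = len(line)
    while i > 0 and line[i - 1] in blanks:
        i -= 1
    return i

def delete_trailing(line, blanks=TAB_AND_SPACE):
    """Drop the trailing run of `blanks` from a single line."""
    return line[:trailing_blank_start(line, blanks)]

if __name__ == "__main__":
    line = "foo \u3000"
    # With the hardcoded set, the final U+3000 hides the preceding SPACE.
    print(repr(delete_trailing(line)))
    # With the wider set, both trailing characters are removed.
    print(repr(delete_trailing(line, WIDER_BLANKS)))
```

With the hardcoded set, a line ending in U+3000 reports no trailing whitespace at all, because the scan stops at the first character outside the set. Formfeed is in neither set, which matches the view that it should not be a target of show-trailing-whitespace.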


* Re: whitespace includes U+3000
  2006-06-29  2:01         ` Kenichi Handa
@ 2006-06-29 17:57           ` Richard Stallman
  2006-07-05 17:33             ` Kevin Rodgers
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-06-29 17:57 UTC (permalink / raw)
  Cc: jidanni, handa, emacs-devel

    Then we have different meanings in "whitespace"; the set of
    characters that have "whitespace" syntax is different from
    the set of characters that are displayed by "whitespace"
    glyph.

That's right.  There is "characters that would print as whitespace"
and there is "characters that would display as whitespace in Emacs."
These are different for good reason; it is not a mistake that they
are different.

Maybe we need to clarify the documentation so that people will
understand that there are two different concepts of whitespace.

In theory we might want to use two different words for these concepts.
But that seems strained and difficult.  They really are two applications
of the standard concept of "whitespace".  We might want to speak of
"screen whitespace" and "text whitespace".

	    And, we can't use "whitespace" syntax at least for
    show-trailing-whitespace.

Yes, that is true.

    > You can probably get this result by putting NBSP into the pattern
    > for show-trailing-whitespace to recognize.  Redisplay will override
    > the face, for the NBSP.

    What do you mean by "pattern" here?  Regular expression?

Yes, I assumed it used one.

However, on second thought, I've concluded that
show-trailing-whitespace doesn't need to know about NBSP at all.
Since NBSP is now indicated on the screen by a color, it is no longer
likely to go unnoticed.  So there is no problem with NBSP and
show-trailing-whitespace.

show-trailing-whitespace ought to know about all characters that will
be indistinguishable on the screen from "end of the line".

    By the way, I've just found that currently the special face
    for NBSP is overridden by show-trailing-whitespace.

Do you mean, show-trailing-whitespace would override the special face
for NBSP _if_ you modify it to recognize NBSP along with SPC and TAB?
That means my expectation was mistaken; I stand corrected.

But since show-trailing-whitespace does not need to recognize NBSP,
this isn't a _problem_.

    Anyway, Unicode has lots more space-like characters
    (e.g. U+2000..U+200B).  Should they be treated the same
    way as NBSP (i.e. displayed with nobreak-face)?  Or as
    SPACE?

It depends how they are used.  How does Emacs display them?

    How about the case of fixup-whitespace?  It seems that this
    function should delete only TAB and SPACE.  So, here we have
    the third meaning of "whitespace"; just TAB and SPACE.

It is an interesting question what fixup-whitespace should do with
NBSP.  I am not sure; it depends on how NBSP is used.

When the existing space is just one NBSP, fixup-whitespace should not
change it.

Do people use multiple NBSP to force more space between two words?
If so, maybe fixup-whitespace should leave that untouched.  Or maybe
fixup-whitespace should convert a run of NBSP to a single NBSP.

When there is a series of whitespace including NBSP and SPC (or TAB),
the runs of ordinary whitespace should be compacted to a single SPC,
and the runs of NBSP should be treated as above.


Similar reasoning needs to be applied to other kinds of whitespace, to
figure out what behavior users will really find useful and helpful in
fixup-whitespace.

    How about the case of delete-trailing-whitespace?

That is meant to get rid of junk.  It should probably delete
NBSP just like SPC and TAB, since that is useless at the end of a line.
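
The compaction rule described above for mixed runs can be sketched in Python as follows. This is only one of the options floated (a run of NBSP shrinks to a single NBSP), applied to a whole string rather than around point as fixup-whitespace works; the function name `compact_whitespace` is hypothetical and nothing here is existing Emacs code.

```python
import re

NBSP = "\u00a0"

def compact_whitespace(text):
    """Collapse runs of SPC/TAB to one SPC and runs of NBSP to one NBSP."""
    # Ordinary whitespace: any run of SPC and TAB becomes a single SPC.
    text = re.sub(r"[ \t]+", " ", text)
    # Assumed option: shrink a run of NBSP to a single NBSP.
    # A lone NBSP is left unchanged by this substitution.
    return re.sub(NBSP + "+", NBSP, text)

if __name__ == "__main__":
    print(repr(compact_whitespace("a \t b")))
    print(repr(compact_whitespace("a" + NBSP * 2 + "b")))
```

Leaving runs of NBSP untouched instead, the other option mentioned, would amount to dropping the second substitution.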


* Re: whitespace includes U+3000
  2006-06-29 17:57           ` Richard Stallman
@ 2006-07-05 17:33             ` Kevin Rodgers
  2006-07-07  4:13               ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Kevin Rodgers @ 2006-07-05 17:33 UTC (permalink / raw)


Richard Stallman wrote:
 >     > You can probably get this result by putting NBSP into the pattern
 >     > for show-trailing-whitespace to recognize.  Redisplay will override
 >     > the face, for the NBSP.
 >
 >     What do you mean by "pattern" here?  Regular expression?
 >
 > Yes, I assumed it used one.
 >
 > However, on second thought, I've concluded that
 > show-trailing-whitespace doesn't need to know about NBSP at all.
 > Since NBSP is now indicated on the screen by a color, it is no longer
 > likely to go unnoticed.  So there is no problem with NBSP and
 > show-trailing-whitespace.

That is true by default, but not if the user has set
nobreak-char-display to nil.  I think show-trailing-whitespace should
DTRT even if the user has made such a customization and ensure that the
trailing whitespace is indicated.

 > show-trailing-whitespace ought to know about all characters that will
 > be indistinguishable on the screen from "end of the line".

Agreed! (for non-default values of display options like
nobreak-char-display as well)

 >     By the way, I've just found that currently the special face
 >     for NBSP is overridden by show-trailing-whitespace.
 >
 > Do you mean, show-trailing-whitespace would override the special face
 > for NBSP _if_ you modify it to recognize NBSP along with SPC and TAB?
 > That means my expectation was mistaken; I stand corrected.

I think that is good: it means that show-trailing-whitespace will
indicate NBSP regardless of nobreak-char-display.

 > But since show-trailing-whitespace does not need to recognize NBSP,
 > this isn't a _problem_.

I don't think so.  (To reiterate: show-trailing-whitespace does need to
recognize NBSP in case nobreak-char-display is nil).
...

 >     How about the case of delete-trailing-whitespace?
 >
 > That is meant to get rid of junk.  It should probably delete
 > NBSP just like SPC and TAB, since that is useless at the end of a line.

It would be surprising if delete-trailing-whitespace deleted anything
(e.g. NBSP) that was not displayed specially by show-trailing-whitespace.

-- 
Kevin


* Re: whitespace includes U+3000
  2006-07-05 17:33             ` Kevin Rodgers
@ 2006-07-07  4:13               ` Richard Stallman
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Stallman @ 2006-07-07  4:13 UTC (permalink / raw)
  Cc: emacs-devel

     > However, on second thought, I've concluded that
     > show-trailing-whitespace doesn't need to know about NBSP at all.
     > Since NBSP is now indicated on the screen by a color, it is no longer
     > likely to go unnoticed.  So there is no problem with NBSP and
     > show-trailing-whitespace.

    That is true by default, but not if the user has set
    nobreak-char-display to nil.  I think show-trailing-whitespace should
    DTRT even if the user has made such a customization and ensure that the
    trailing whitespace is indicated.

I won't object, if someone wants to do it.
