Line wrap reconsidered

unofficial mirror of emacs-devel@gnu.org 

* Line wrap reconsidered
@ 2020-05-25 18:13 Yuan Fu
  2020-05-25 19:23 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-25 18:13 UTC (permalink / raw)
  To: emacs-devel

I’ve implemented and used a Lisp-based line-wrapping feature for a while, and it’s still sub-optimal for me. I now want to explore whether I can add it directly to redisplay.

Here is what I came up with: in the redisplay code, instead of only checking for whitespace, check for a 'no-wrap text property. If the character has this property, don’t wrap before[1] this character (or maybe the opposite: only wrap when the character has a 'can-wrap property). This text property is calculated and applied once.

Is this plausible? Is checking a text property fast enough for redisplay?

[1] There are some complications: some characters can’t have a line break before them, and some can’t have one after. Maybe use 'before, 'after and nil instead of a binary value.
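
To make the footnote concrete, here is a rough sketch of the wrapping logic. It is Python rather than redisplay C code, purely for illustration. The function name, the list that stands in for the text property, and counting width in characters are all assumptions of the sketch, not anything that exists in Emacs.

```python
def wrap(text, no_wrap, width):
    """Greedy wrap that may break between any two characters unless forbidden.

    no_wrap[i] stands in for the proposed text property on text[i]:
    "before" (no break before it), "after" (no break after it), or None.
    Width is counted in characters, which ignores double-width glyphs.
    """
    def can_break(i):
        # A break at i falls between text[i - 1] and text[i].
        return no_wrap[i] != "before" and no_wrap[i - 1] != "after"

    lines, start = [], 0
    while len(text) - start > width:
        # Prefer the latest permitted break that still fits in the width.
        candidates = (i for i in range(start + width, start, -1) if can_break(i))
        # If every position is forbidden, break at the edge anyway.
        brk = next(candidates, start + width)
        lines.append(text[start:brk])
        start = brk
    lines.append(text[start:])
    return lines


# "," must not start a line, "(" must not end one.
print(wrap("abcd,ef", [None, None, None, None, "before", None, None], 4))
print(wrap("ab(cd", [None, None, "after", None, None], 3))
```

The three-valued property keeps the check local: deciding whether a break is allowed only needs the two characters on either side of it.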

Yuan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 18:13 Line wrap reconsidered Yuan Fu
@ 2020-05-25 19:23 ` Eli Zaretskii
  2020-05-25 19:31   ` Yuan Fu
  2020-05-26  1:55   ` Ihor Radchenko
  2020-05-25 19:31 ` Stefan Monnier
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-25 19:23 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 25 May 2020 14:13:04 -0400
> 
> Here is what I came up with: in the redisplay code, instead of only checking for whitespace, check for a 'no-wrap text property. If the character has this property, don’t wrap before[1] this character (or maybe the opposite: only wrap when the character has a 'can-wrap property). This text property is calculated and applied once.

What are the use cases for such a feature?

> Is checking a text property fast enough for redisplay?

We do this all the time in the display code, so speed shouldn't be a
problem.




* Re: Line wrap reconsidered
  2020-05-25 19:23 ` Eli Zaretskii
@ 2020-05-25 19:31   ` Yuan Fu
  2020-05-26  1:55   ` Ihor Radchenko
  1 sibling, 0 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-25 19:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel



> On May 25, 2020, at 3:23 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 25 May 2020 14:13:04 -0400
>> 
>> Here is what I came up with: in the redisplay code, instead of only checking for whitespace, check for a 'no-wrap text property. If the character has this property, don’t wrap before[1] this character (or maybe the opposite: only wrap when the character has a 'can-wrap property). This text property is calculated and applied once.
> 
> What are the use cases for such a feature?

I’m still trying to solve this problem: https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00095.html

Basically, allow Emacs to wrap at more places (between CJK characters) with more complex rules (not before a period or comma, etc.).

Yuan



* Re: Line wrap reconsidered
  2020-05-25 19:23 ` Eli Zaretskii
  2020-05-25 19:31   ` Yuan Fu
@ 2020-05-26  1:55   ` Ihor Radchenko
  2020-05-26 12:55     ` Joost Kremers
  2020-05-26 14:47     ` Eli Zaretskii
  1 sibling, 2 replies; 88+ messages in thread
From: Ihor Radchenko @ 2020-05-26  1:55 UTC (permalink / raw)
  To: Eli Zaretskii, Yuan Fu; +Cc: emacs-devel

>> Here is what I came up with: in the redisplay code, instead of only
>> checking for whitespace, check for a 'no-wrap text property. If the
>> character has this property, don’t wrap before[1] this character (or
>> maybe the opposite: only wrap when the character has a 'can-wrap
>> property). This text property is calculated and applied once.
>
> What are the use cases for such a feature?

There is another possible use case for this. Consider an Org document
containing normal text and a very wide table.

-----------------------------------------------------------------------
-----------------------------------------------------------------------
Nunc porta vulputate tellus.  Nunc eleifend leo vitae magna.  Nunc rutrum turpis sed pede.  In id erat non orci commodo lobortis.  Aliquam posuere.  Aliquam posuere.  Donec vitae dolor.  Vestibulum convallis, lorem a tempus semper, dui dui euismod elit.

| Nam euismod tellus id erat. | Donec neque quam, dignissim in. | Phasellus neque orci,                              | Nullam rutrum.                        | Nulla posuere.               | Nulla posuere.                | Nunc aliquet, augue nec. | Sed diam.                      |
| Nulla facilisis, risus a    | Nunc aliquet, augue nec         | Nulla facilisis, risus a rhoncus fermentum, tellus | In id erat non orci commodo lobortis. | Nunc rutrum turpis sed pede. | Cras placerat accumsan nulla. | Nullam rutrum.           | Donec hendrerit tempor tellus. |
-----------------------------------------------------------------------
-----------------------------------------------------------------------

With line wrapping:

-----------------------------------------------------------------------
-----------------------------------------------------------------------
Nunc porta vulputate tellus.  Nunc eleifend leo vitae magna.  Nunc
rutrum turpis sed pede.  In id erat non orci commodo lobortis.  Aliquam
posuere.  Aliquam posuere.  Donec vitae dolor.  Vestibulum convallis,
lorem a tempus semper, dui dui euismod elit. 

| Nam euismod tellus id erat. | Donec neque quam, dignissim in. |
| Phasellus neque orci,                              | Nullam rutrum.
| | Nulla posuere.               | Nulla posuere.                | Nunc
| aliquet, augue nec. | Sed diam.                      | 
| Nulla facilisis, risus a    | Nunc aliquet, augue nec         | Nulla
| facilisis, risus a rhoncus fermentum, tellus | In id erat non orci
| commodo lobortis. | Nunc rutrum turpis sed pede. | Cras placerat
| accumsan nulla. | Nullam rutrum.           | Donec hendrerit tempor
| tellus. | 
-----------------------------------------------------------------------
-----------------------------------------------------------------------

The table becomes completely unreadable with line wrapping. It would
make sense to have an option not to wrap the table even when lines are
not truncated.

If the display backend supports such things, font-lock could take care
of applying the needed 'no-wrap or 'wrap text properties.

Best,
Ihor

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg




* Re: Line wrap reconsidered
  2020-05-26  1:55   ` Ihor Radchenko
@ 2020-05-26 12:55     ` Joost Kremers
  2020-05-26 13:35       ` Yuan Fu
  2020-05-26 14:47     ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Joost Kremers @ 2020-05-26 12:55 UTC (permalink / raw)
  To: emacs-devel

On Tue, May 26 2020, Ihor Radchenko wrote:
>> What are the use cases for such a feature?
>
> There is another possible use case for this. Consider an Org
> document containing normal text and a very wide table.

Do you mean that this would make it possible for 
`visual-line-mode` to wrap lines at `fill-column` rather than at 
the window edge (excluding Org tables from being wrapped, as you 
mention)? That would be awesome. (It would obsolete my package 
`visual-fill-column-mode`, which I think would be a good thing.)

-- 
Joost Kremers
Life has its moments


* Re: Line wrap reconsidered
  2020-05-26 12:55     ` Joost Kremers
@ 2020-05-26 13:35       ` Yuan Fu
  0 siblings, 0 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 13:35 UTC (permalink / raw)
  To: Joost Kremers; +Cc: emacs-devel

> 
> Do you mean that this would make it possible for `visual-line-mode` to wrap lines at `fill-column` rather than at the window edge (excluding Org tables from being wrapped, as you mention)? That would be awesome. (It would obsolete my package `visual-fill-column-mode`, which I think would be a good thing.)

I don’t think that’s likely, unfortunately.

Yuan



* Re: Line wrap reconsidered
  2020-05-26  1:55   ` Ihor Radchenko
  2020-05-26 12:55     ` Joost Kremers
@ 2020-05-26 14:47     ` Eli Zaretskii
  2020-05-26 15:01       ` Ihor Radchenko
  2020-05-26 15:59       ` Stefan Monnier
  1 sibling, 2 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 14:47 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, emacs-devel

> From: Ihor Radchenko <yantar92@gmail.com>
> Cc: emacs-devel@gnu.org
> Date: Tue, 26 May 2020 09:55:46 +0800
> 
> The table becomes completely unreadable with line wrapping. It would
> make sense to have an option not to wrap the table even when lines are
> not truncated.

I don't understand what you mean by "not wrap the table".  If
truncate-lines is nil, then long lines _will_ be wrapped.  The Emacs
display simply doesn't have a third possibility: either long lines are
truncated or they are wrapped.  Preventing wrapping can cause it to
wrap the line at some other place, but wrap it will.

So I don't think I understand the request, and therefore don't see how
tables could be another relevant use case.  Can you elaborate?


* Re: Line wrap reconsidered
  2020-05-26 14:47     ` Eli Zaretskii
@ 2020-05-26 15:01       ` Ihor Radchenko
  2020-05-26 15:29         ` Eli Zaretskii
  2020-05-26 15:59       ` Stefan Monnier
  1 sibling, 1 reply; 88+ messages in thread
From: Ihor Radchenko @ 2020-05-26 15:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel

> I don't understand what you mean by "not wrap the table".  If
> truncate-lines is nil, then long lines _will_ be wrapped.  The Emacs
> display simply doesn't have a third possibility: either long lines are
> truncated or they are wrapped.  Preventing wrapping can cause it to
> wrap the line at some other place, but wrap it will.
>
> So I don't think I understand the request, and therefore don't see how
> tables could be another relevant use case.  Can you elaborate?

I imagine a 'cannot_wrap text property. Emacs would never wrap text at a
character whose 'cannot_wrap property is non-nil.

All the text in the table can have this 'cannot_wrap property set to t.
Then, all the lines in the table will never be wrapped, even when
truncate-lines is set to nil.

Best,
Ihor




-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg




* Re: Line wrap reconsidered
  2020-05-26 15:01       ` Ihor Radchenko
@ 2020-05-26 15:29         ` Eli Zaretskii
  2020-05-26 15:46           ` Ihor Radchenko
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 15:29 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, emacs-devel

> From: Ihor Radchenko <yantar92@gmail.com>
> Cc: casouri@gmail.com, emacs-devel@gnu.org
> Date: Tue, 26 May 2020 23:01:33 +0800
> 
> I imagine a 'cannot_wrap text property. Emacs would never wrap text at a
> character whose 'cannot_wrap property is non-nil.
> 
> All the text in the table can have this 'cannot_wrap property set to t.
> Then, all the lines in the table will never be wrapped, even when
> truncate-lines is set to nil.

I tried to explain why this is impossible: either those lines will be
truncated or they will be wrapped (at some character).  There's no
other possibility.




* Re: Line wrap reconsidered
  2020-05-26 15:29         ` Eli Zaretskii
@ 2020-05-26 15:46           ` Ihor Radchenko
  2020-05-26 16:29             ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Ihor Radchenko @ 2020-05-26 15:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel

> I tried to explain why this is impossible: either those lines will be
> truncated or they will be wrapped (at some character).  There's no
> other possibility.

I understand that it is impossible on master. However, without knowing
much about the Emacs internals, I wanted to mention a possibly useful
feature to be considered in the proposed patch. Since we are discussing
changing display code here, I thought that it would be great if the
change also introduced selective wrapping/truncation. Now, reading your
comment, I assume that selective wrapping/truncation is very hard to
implement. 

Best,
Ihor

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg




* Re: Line wrap reconsidered
  2020-05-26 15:46           ` Ihor Radchenko
@ 2020-05-26 16:29             ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 16:29 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, emacs-devel

> From: Ihor Radchenko <yantar92@gmail.com>
> Cc: casouri@gmail.com, emacs-devel@gnu.org
> Date: Tue, 26 May 2020 23:46:12 +0800
> 
> > I tried to explain why this is impossible: either those lines will be
> > truncated or they will be wrapped (at some character).  There's no
> > other possibility.
> 
> I understand that it is impossible on master. However, without knowing
> much about the Emacs internals, I wanted to mention a possibly useful
> feature to be considered in the proposed patch. Since we are discussing
> changing display code here, I thought that it would be great if the
> change also introduced selective wrapping/truncation. Now, reading your
> comment, I assume that selective wrapping/truncation is very hard to
> implement. 

We are miscommunicating.  The "impossible" part in what I wrote
doesn't mean the current code cannot do that, it means I think it's
impossible in principle.

Emacs can truncate a long line, which means the characters beyond the
window-edge will not be displayed.  Or it can display those characters
on the next screen line (a.k.a. "wrap" the line).  These are the only
two possibilities that I can envision _in_principle_.  But you seem to
be talking about some third possibility, and I simply don't understand
what that third possibility could possibly be.  What would you like
Emacs to do with characters that are beyond the window's edge in this
third possibility?

I'm confused.


* Re: Line wrap reconsidered
  2020-05-26 14:47     ` Eli Zaretskii
  2020-05-26 15:01       ` Ihor Radchenko
@ 2020-05-26 15:59       ` Stefan Monnier
  2020-05-26 16:31         ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Stefan Monnier @ 2020-05-26 15:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, Ihor Radchenko, emacs-devel

> I don't understand what you mean by "not wrap the table".

He means to truncate the lines in the table but wrap them in the rest.
It would be a nice functionality to add (presumably controlled via some
overlay/text property).  Of course, that then means supporting
horizontal scrolling in windows that wrap some lines, but we could
probably support that without too much extra work for the special case of
horizontally-scrolling only the current line (which we already support in
truncated buffers).

        Stefan


* Re: Line wrap reconsidered
  2020-05-26 15:59       ` Stefan Monnier
@ 2020-05-26 16:31         ` Eli Zaretskii
  2020-05-26 16:43           ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 16:31 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, yantar92, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Ihor Radchenko <yantar92@gmail.com>,  casouri@gmail.com,
>   emacs-devel@gnu.org
> Date: Tue, 26 May 2020 11:59:21 -0400
> 
> > I don't understand what you mean by "not wrap the table".
> 
> He means to truncate the lines in the table but wrap them in the rest.

But then why do we need to mark text with some property?




* Re: Line wrap reconsidered
  2020-05-26 16:31         ` Eli Zaretskii
@ 2020-05-26 16:43           ` Yuan Fu
  2020-05-26 16:43             ` Ihor Radchenko
  2020-05-26 18:57             ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 16:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Stefan Monnier, yantar92



> On May 26, 2020, at 12:31 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: Ihor Radchenko <yantar92@gmail.com>,  casouri@gmail.com,
>>  emacs-devel@gnu.org
>> Date: Tue, 26 May 2020 11:59:21 -0400
>> 
>>> I don't understand what you mean by "not wrap the table".
>> 
>> He means to truncate the lines in the table but wrap them in the rest.
> 
> But then why do we need to mark text with some property?

I think he means that, in such a file:

BODY TEXT…
BODY TEXT…

| LONG TABLE |
| LONG TABLE |

BODY TEXT…
BODY TEXT…

We only truncate lines in LONG TABLE, and wrap BODY TEXT. There is no way to know if some text is table or body without major mode adding text properties to distinguish them.
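
As a toy model of that behaviour (Python, not Emacs code; the `layout` name and the boolean flag that stands in for a text property set by the major mode are made up for illustration), the display would decide per logical line:

```python
def layout(lines, width):
    """Lay out logical lines into screen rows of at most `width` characters.

    `lines` is a list of (text, truncate) pairs. The boolean plays the role
    of a per-line property that a major mode would put on table rows.
    """
    rows = []
    for text, truncate in lines:
        if truncate:
            # Table row: show only what fits, drop the rest.
            rows.append(text[:width])
        elif not text:
            # Keep empty lines as empty rows.
            rows.append("")
        else:
            # Body text: continue on the next screen row.
            rows.extend(text[i:i + width] for i in range(0, len(text), width))
    return rows


for row in layout([("body text here", False), ("| long table row |", True)], 8):
    print(row)
```

The point is only that the truncate-or-wrap choice is made per line rather than per window. How redisplay would actually find the property is a separate question.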

Yuan



* Re: Line wrap reconsidered
  2020-05-26 16:43           ` Yuan Fu
@ 2020-05-26 16:43             ` Ihor Radchenko
  2020-05-26 18:57             ` Eli Zaretskii
  1 sibling, 0 replies; 88+ messages in thread
From: Ihor Radchenko @ 2020-05-26 16:43 UTC (permalink / raw)
  To: Yuan Fu, Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

> I think he means that, in such a file:
>
> BODY TEXT…
> BODY TEXT…
>
> | LONG TABLE |
> | LONG TABLE |
>
> BODY TEXT…
> BODY TEXT…
>
> We only truncate lines in LONG TABLE, and wrap BODY TEXT. There is no
> way to know if some text is table or body without major mode adding
> text properties to distinguish them. 

Yes, this is precisely what I mean.
Thanks!

Regards,
Ihor



-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg




* Re: Line wrap reconsidered
  2020-05-26 16:43           ` Yuan Fu
  2020-05-26 16:43             ` Ihor Radchenko
@ 2020-05-26 18:57             ` Eli Zaretskii
  2020-05-26 19:10               ` Yuan Fu
  2020-05-26 19:12               ` Ihor Radchenko
  1 sibling, 2 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 18:57 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, monnier, yantar92

> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 26 May 2020 12:43:57 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  yantar92@gmail.com,
>  emacs-devel <emacs-devel@gnu.org>
> 
> There is no way to know if some text is table or body without major mode adding text properties to distinguish them.

Of course, there is: the major mode is the one that creates the table,
isn't it?  So the mode knows where the table is.




* Re: Line wrap reconsidered
  2020-05-26 18:57             ` Eli Zaretskii
@ 2020-05-26 19:10               ` Yuan Fu
  2020-05-26 19:59                 ` Eli Zaretskii
  2020-05-26 19:12               ` Ihor Radchenko
  1 sibling, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 19:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Stefan Monnier, yantar92



> On May 26, 2020, at 2:57 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 26 May 2020 12:43:57 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> yantar92@gmail.com,
>> emacs-devel <emacs-devel@gnu.org>
>> 
>> There is no way to know if some text is table or body without major mode adding text properties to distinguish them.
> 
> Of course, there is: the major mode is the one that creates the table,
> isn't it?  So the mode knows where the table is.

Yes, the major mode knows, but the redisplay code doesn’t. So presumably the major mode tells the redisplay code what to do (truncate or wrap) by adding text properties to the text.

Yuan



* Re: Line wrap reconsidered
  2020-05-26 19:10               ` Yuan Fu
@ 2020-05-26 19:59                 ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 19:59 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, monnier, yantar92

> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 26 May 2020 15:10:17 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  yantar92@gmail.com,
>  emacs-devel@gnu.org
> 
> > Of course, there is: the major mode is the one that creates the table,
> > isn't it?  So the mode knows where the table is.
> 
> Yes, the major mode knows, but the redisplay code doesn’t. So presumably the major mode tells the redisplay code what to do (truncate or wrap) by adding text properties to the text.

The major mode could do this using techniques similar to the old
longlines.el package, for example.




* Re: Line wrap reconsidered
  2020-05-26 18:57             ` Eli Zaretskii
  2020-05-26 19:10               ` Yuan Fu
@ 2020-05-26 19:12               ` Ihor Radchenko
  2020-05-26 20:04                 ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Ihor Radchenko @ 2020-05-26 19:12 UTC (permalink / raw)
  To: Eli Zaretskii, Yuan Fu; +Cc: monnier, emacs-devel

>> There is no way to know if some text is table or body without major mode adding text properties to distinguish them.
> Of course, there is: the major mode is the one that creates the table,
> isn't it?  So the mode knows where the table is.

I think we misunderstand each other here.
I suggest telling the Emacs display backend not to wrap some text if the
text has a certain text property.
Then, putting that text property is a way for a major mode to mark
unwrappable text.
The marked text may not necessarily be a table (table is just an
example). Other cases include, for example, extremely long lines, which
would slow down Emacs if wrapped. 


-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg




* Re: Line wrap reconsidered
  2020-05-26 19:12               ` Ihor Radchenko
@ 2020-05-26 20:04                 ` Eli Zaretskii
  2020-05-26 21:01                   ` Stefan Monnier
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 20:04 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, monnier, emacs-devel

> From: Ihor Radchenko <yantar92@gmail.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Wed, 27 May 2020 03:12:03 +0800
> 
> >> There is no way to know if some text is table or body without major mode adding text properties to distinguish them.
> > Of course, there is: the major mode is the one that creates the table,
> > isn't it?  So the mode knows where the table is.
> 
> I think we misunderstand each other here.
> I suggest telling the Emacs display backend not to wrap some text if the
> text has a certain text property.

And I propose to do the opposite: force the display engine to wrap the
lines that are outside the table, and otherwise turn on
truncate-lines.  I think this will be much easier to implement, and
could even be done entirely in Lisp.

By contrast, "selective line-wrap" of the kind that you envision is
much harder to implement, especially with text properties.  For
starters, where would you put such a text property so that the display
engine sees it with 100% guarantee when it needs to display the
relevant lines?  Keep in mind that redisplay could decide to examine
only some portions of some lines in the window, it doesn't necessarily
examine all the lines shown in the window.

> Other cases include, for example, extremely long lines, which
> would slow down Emacs if wrapped. 

Extremely long lines slow down Emacs even if truncated.


* Re: Line wrap reconsidered
  2020-05-26 20:04                 ` Eli Zaretskii
@ 2020-05-26 21:01                   ` Stefan Monnier
  0 siblings, 0 replies; 88+ messages in thread
From: Stefan Monnier @ 2020-05-26 21:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, Ihor Radchenko, emacs-devel

> And I propose to do the opposite: force the display engine to wrap the
> lines that are outside the table, and otherwise turn on
> truncate-lines.  I think this will be much easier to implement, and
> could even be done entirely in Lisp.

Yes, that works fine as well: the only real need is to be able to have
parts that are wrapped and parts that are truncated.  On the Elisp side,
in the case of Org tables, it would be more convenient to mark the tables as
"truncate rather than wrap" than to mark everything else as "wrap
instead of truncate", but if there's a significant difference on the
redisplay side, that should/will probably be the determining factor.

        Stefan


* Re: Line wrap reconsidered
  2020-05-25 18:13 Line wrap reconsidered Yuan Fu
  2020-05-25 19:23 ` Eli Zaretskii
@ 2020-05-25 19:31 ` Stefan Monnier
  2020-05-25 19:51   ` Yuan Fu
  2020-05-25 20:43 ` Lars Ingebrigtsen
  2020-05-26  8:02 ` martin rudalics
  3 siblings, 1 reply; 88+ messages in thread
From: Stefan Monnier @ 2020-05-25 19:31 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

> I’ve implemented and used a Lisp-based line-wrapping feature for a while, and
> it’s still sub-optimal for me. I now want to explore whether I can add it
> directly to redisplay.
>
> Here is what I came up with: in the redisplay code, instead of only checking for
> whitespace, check for a 'no-wrap text property. If the character has this
> property, don’t wrap before[1] this character (or maybe the opposite: only
> wrap when the character has a 'can-wrap property). This text property is
> calculated and applied once.

I don't think I can discuss the quality of this proposal without first
understanding the intended use cases (including who/how/when the
property would be added).

> Is this plausible? Is checking a text property fast enough for redisplay?

I'll let Eli answer this part (I vaguely remember that it could be
somewhat costly, but it likely depends on the details).

> [1] There are some complications: some characters can’t have a line break
> before them, and some can’t have one after. Maybe use 'before, 'after and nil
> instead of a binary value.

Hence the need to clarify the intended use case.


        Stefan





* Re: Line wrap reconsidered
  2020-05-25 19:31 ` Stefan Monnier
@ 2020-05-25 19:51   ` Yuan Fu
  0 siblings, 0 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-25 19:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> On May 25, 2020, at 3:31 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>> I’ve implemented and used a Lisp-based line-wrapping feature for a while, and
>> it’s still sub-optimal for me. I now want to explore whether I can add it
>> directly to redisplay.
>> 
>> Here is what I came up with: in the redisplay code, instead of only checking for
>> whitespace, check for a 'no-wrap text property. If the character has this
>> property, don’t wrap before[1] this character (or maybe the opposite: only
>> wrap when the character has a 'can-wrap property). This text property is
>> calculated and applied once.
> 
> I don't think I can discuss the quality of this proposal without first
> understanding the intended use cases (including who/how/when the
> property would be added).
> 

I’ve added the intended use case in the previous message. I’m not sure yet when to add the property. Right now I only have a general idea of using the text property as a cache, so that the more complex wrapping rules do not slow down redisplay. There could be a table mapping Unicode ranges to “wrappability” (can/can't wrap before/after), and something would iterate over each character and apply the text property. The good thing is that this “wrappability” property of a character never changes, so we only need to calculate it once.

The redisplay code doesn’t need to change greatly since checking for “wrappability” is similar to checking “whitespace_p”.
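
To make the table idea concrete, here is a minimal standalone sketch of a range table mapping code points to a wrap class, looked up by binary search. It is not taken from any patch in this thread: the names (wrap_class, wrap_ranges, wrap_class_for) are hypothetical, and the few ranges listed are illustrative only, not a complete or authoritative rule set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrap classes mirroring the before/after idea.  */
enum wrap_class
{
  WRAP_NONE = 0,        /* no line break before or after */
  WRAP_BEFORE_ONLY = 1, /* a break is allowed only before the character */
  WRAP_AFTER_ONLY = 2,  /* a break is allowed only after the character */
  WRAP_BOTH = 3         /* a break is allowed on either side */
};

struct wrap_range
{
  uint32_t first;       /* first code point of the range, inclusive */
  uint32_t last;        /* last code point of the range, inclusive */
  enum wrap_class cls;
};

/* Illustrative entries only (an assumption, not a real rule set).
   The table must stay sorted by code point and non-overlapping.  */
static const struct wrap_range wrap_ranges[] = {
  { 0x3001, 0x3002, WRAP_AFTER_ONLY },  /* ideographic comma, full stop */
  { 0x300A, 0x300A, WRAP_BEFORE_ONLY }, /* left double angle bracket */
  { 0x300B, 0x300B, WRAP_AFTER_ONLY },  /* right double angle bracket */
  { 0x4E00, 0x9FFF, WRAP_BOTH },        /* CJK unified ideographs */
};

/* Return the wrap class of code point C, or WRAP_NONE if C is not
   covered by any range.  Binary search over the sorted table.  */
enum wrap_class
wrap_class_for (uint32_t c)
{
  size_t lo = 0;
  size_t hi = sizeof wrap_ranges / sizeof wrap_ranges[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < wrap_ranges[mid].first)
        hi = mid;
      else if (c > wrap_ranges[mid].last)
        lo = mid + 1;
      else
        return wrap_ranges[mid].cls;
    }
  return WRAP_NONE;
}
```

Whatever applies the text property would call something like wrap_class_for once per character and store the result, so redisplay itself never has to search the table.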

Yuan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 18:13 Line wrap reconsidered Yuan Fu
  2020-05-25 19:23 ` Eli Zaretskii
  2020-05-25 19:31 ` Stefan Monnier
@ 2020-05-25 20:43 ` Lars Ingebrigtsen
  2020-05-25 23:26   ` Yuan Fu
  2020-05-26  8:02 ` martin rudalics
  3 siblings, 1 reply; 88+ messages in thread
From: Lars Ingebrigtsen @ 2020-05-25 20:43 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

Yuan Fu <casouri@gmail.com> writes:

> Here is what I come up with: in redisplay code, instead of only
> checking for whitespace, check for a ‘no-wrap text-property, if the
> character has this property, don’t wrap before[1] this character (or
> maybe it can be the opposite, only wrap when the character has a
> ‘can-wrap property). And this text property is calculated and applied
> once.

This is like the kinsoku stuff in shr, I guess?  But in redisplay
instead?  See shr-char-kinsoku-eol-p (and related) for how it's done
there -- it basically checks (aref (char-category-set ,char) ?<) (and
related).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 20:43 ` Lars Ingebrigtsen
@ 2020-05-25 23:26   ` Yuan Fu
  2020-05-25 23:32     ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-25 23:26 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel



> On May 25, 2020, at 4:43 PM, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> Here is what I come up with: in redisplay code, instead of only
>> checking for whitespace, check for a ‘no-wrap text-property, if the
>> character has this property, don’t wrap before[1] this character (or
>> maybe it can be the opposite, only wrap when the character has a
>> ‘can-wrap property). And this text property is calculated and applied
>> once.
> 
> This is like the kinsoku stuff in shr, I guess?  But in redisplay
> instead?  See shr-char-kinsoku-eol-p (and related) for how it's done
> there -- it basically checks (aref (char-category-set ,char) ?<) (and
> related).

Thanks, category table is just the thing I need.

Yuan


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 23:26   ` Yuan Fu
@ 2020-05-25 23:32     ` Yuan Fu
  2020-05-26  2:15       ` Yuan Fu
  2020-05-26 14:54       ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-25 23:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 496 bytes --]

I ran into a problem hacking the redisplay code. For some reason, Emacs hangs after M-x toggle-word-wrap. I ran Emacs under lldb (gdb doesn’t work on Mac), and was dropped into a different place each time. I attached the backtrace files. What could I have done wrong?

Explanation for the code: basically I added a predicate function (char_can_wrap) that checks a text property to see if a character is “wrappable”, and combined that predicate with the existing IT_DISPLAYING_WHITESPACE.

Yuan


[-- Attachment #2: wrap.patch --]
[-- Type: application/octet-stream, Size: 4557 bytes --]

diff --git a/src/xdisp.c b/src/xdisp.c
index 01f272033e..5addec900f 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -427,6 +427,37 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* Returns 0 if we can't wrap line at the character, 1 if can only
+   wrap before, 2 if can only wrap after, 3 if can wrap before and
+   after.  */
+static int char_can_wrap(struct it *it)
+{
+  Lisp_Object charpos = make_fixnum (IT_STRING_CHARPOS (*it));
+  Lisp_Object tail = Fget_text_property (charpos, Qcan_wrap, Qnil);
+  for (; CONSP (tail); tail = XCDR (tail))
+    {
+      register Lisp_Object tem = XCAR (tail);
+      if (EQ (Qcan_wrap, tem))
+        {
+          Lisp_Object val = XCDR (tail);
+          if (NILP (val))
+            { return 0; }
+          else if (EQ (Qbefore_only, val))
+            { return 1; }
+          else if (EQ (Qafter_only, val))
+            { return 2; }
+          else if (EQ (Qt, val))
+            { return 3; }
+          else
+            { return 0; }
+        }
+    }
+  return 0;
+}
+
+#define IT_CAN_WRAP_BEFORE(it) (char_can_wrap(it) >= 1)
+#define IT_CAN_WRAP_AFTER(it) (char_can_wrap(it) >= 2)
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9098,7 +9129,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
+              /* If this character displays a whitespace or the text
+                 property says you can wrap after it, the next one can
+                 wrap.  */
+	      if (IT_DISPLAYING_WHITESPACE (it) || IT_CAN_WRAP_AFTER (it))
 		may_wrap = true;
 	      else if (may_wrap)
 		{
@@ -9249,9 +9283,13 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			      bool can_wrap = true;
 
 			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+				 (or a character that allows wrapping
+				 after it) that barely fits on this
+				 screen line, but the next character
+				 is also whitespace (or a character
+				 that forbids wrapping before it), we
+				 cannot wrap here.
+                              */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9263,7 +9301,8 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && (IT_DISPLAYING_WHITESPACE (it)
+                                          || !IT_CAN_WRAP_BEFORE (it)))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9349,12 +9388,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		     line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
+			 may_wrap flag is also whitespace (or is a
+			 character that forbids wrapper before it), we
+			 can't wrap here, since the screen line cannot
+			 be wrapped in the middle of whitespace.
 			 Therefore, wrap_it _is_ relevant in that
 			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+		      && !(moved_forward && (IT_DISPLAYING_WHITESPACE (it)
+                                             || !IT_CAN_WRAP_BEFORE (it))))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23180,7 +23221,7 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
+	      if (IT_DISPLAYING_WHITESPACE (it) || IT_CAN_WRAP_AFTER (it))
 		may_wrap = true;
 	      else if (may_wrap)
 		{
@@ -34231,6 +34272,9 @@ syms_of_xdisp (void)
   DEFSYM (QCfile, ":file");
   DEFSYM (Qfontified, "fontified");
   DEFSYM (Qfontification_functions, "fontification-functions");
+  DEFSYM (Qcan_wrap, "can-wrap");
+  DEFSYM (Qbefore_only, "before-only");
+  DEFSYM (Qafter_only, "after-only");
 
   /* Name of the symbol which disables Lisp evaluation in 'display'
      properties.  This is used by enriched.el.  */

[-- Attachment #3: Type: text/plain, Size: 1 bytes --]



[-- Attachment #4: bt.3 --]
[-- Type: application/octet-stream, Size: 4614 bytes --]

Target 0: (emacs) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001001f699b emacs`XSYMBOL(a=(i = 0x0000000000007c80)) at lisp.h:1015:3
    frame #1: 0x00000001001f68ff emacs`make_lisp_symbol(sym=0x0000000100a13280) at lisp.h:1035:3
    frame #2: 0x00000001001e3d58 emacs`builtin_lisp_symbol(index=664) at lisp.h:1042:10
    frame #3: 0x00000001001e51c0 emacs`get_keymap(object=(i = 0x0000000107578993), error_if_not_keymap=false, autoload=true) at keymap.c:231:25
    frame #4: 0x00000001001e61c0 emacs`access_keymap_1(map=(i = 0x00000001070bae2b), idx=(i = 0x0000000000000062), t_ok=false, noinherit=false, autoload=true) at keymap.c:426:23
    frame #5: 0x00000001001e64a0 emacs`access_keymap_1(map=(i = 0x0000000106a95993), idx=(i = 0x0000000000000062), t_ok=false, noinherit=false, autoload=true) at keymap.c:456:12
    frame #6: 0x00000001001e5b96 emacs`access_keymap(map=(i = 0x0000000106a95993), idx=(i = 0x0000000000000062), t_ok=false, noinherit=false, autoload=true) at keymap.c:533:21
    frame #7: 0x00000001001eca62 emacs`Flookup_key(keymap=(i = 0x0000000106a95993), key=(i = 0x000000010711e815), accept_default=(i = 0x0000000000000000)) at keymap.c:1271:13
    frame #8: 0x00000001001f145b emacs`shadow_lookup(keymap=(i = 0x0000000106a95993), key=(i = 0x000000010711e815), accept_default=(i = 0x0000000000000000), remap=false) at keymap.c:2371:23
    frame #9: 0x00000001001f0bab emacs`Fwhere_is_internal(definition=(i = 0x0000000006711240), keymap=(i = 0x0000000000000000), firstonly=(i = 0x0000000000000030), noindirect=(i = 0x0000000000000000), no_remap=(i = 0x0000000000000000)) at keymap.c:2584:11
    frame #10: 0x00000001001e10f1 emacs`parse_tool_bar_item(key=(i = 0x0000000006711240), item=(i = 0x0000000000000000)) at keyboard.c:8721:22
    frame #11: 0x00000001001ce963 emacs`process_tool_bar_item(key=(i = 0x0000000006711240), def=(i = 0x0000000105814ae3), data=(i = 0x0000000000000000), args=0x0000000000000000) at keyboard.c:8455:12
    frame #12: 0x00000001001f726c emacs`map_keymap_item(fun=(emacs`process_tool_bar_item at keyboard.c:8433), args=(i = 0x0000000000000000), key=(i = 0x0000000006711240), val=(i = 0x0000000105814ae3), data=0x0000000000000000) at keymap.c:542:3
    frame #13: 0x00000001001e710a emacs`map_keymap_internal(map=(i = 0x0000000106018443), fun=(emacs`process_tool_bar_item at keyboard.c:8433), args=(i = 0x0000000000000000), data=0x0000000000000000) at keymap.c:589:2
    frame #14: 0x00000001001e6d91 emacs`map_keymap(map=(i = 0x0000000106018443), fun=(emacs`process_tool_bar_item at keyboard.c:8433), args=(i = 0x0000000000000000), data=0x0000000000000000, autoload=true) at keymap.c:634:8
    frame #15: 0x00000001001ce76d emacs`tool_bar_items(reuse=(i = 0x00000001069eae05), nitems=0x00007ffeefbfad2c) at keyboard.c:8419:4
    frame #16: 0x00000001000ba3fe emacs`update_tool_bar(f=0x0000000106035030, save_match_data=false) at xdisp.c:13691:8
    frame #17: 0x00000001000b5128 emacs`prepare_menu_bars at xdisp.c:12526:4
    frame #18: 0x0000000100064c6d emacs`redisplay_internal at xdisp.c:15336:5
    frame #19: 0x000000010006b979 emacs`redisplay at xdisp.c:14921:3
    frame #20: 0x00000001001bd874 emacs`read_char(commandflag=1, map=(i = 0x0000000106a96383), prev_event=(i = 0x0000000000000000), used_mouse_menu=0x00007ffeefbfe84f, end_time=0x0000000000000000) at keyboard.c:2493:6
    frame #21: 0x00000001001b7a5f emacs`read_key_sequence(keybuf=0x00007ffeefbfee70, prompt=(i = 0x0000000000000000), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9547:12
    frame #22: 0x00000001001b5e4a emacs`command_loop_1 at keyboard.c:1350:15
    frame #23: 0x00000001002eeb1f emacs`internal_condition_case(bfun=(emacs`command_loop_1 at keyboard.c:1236), handlers=(i = 0x0000000000000090), hfun=(emacs`cmd_error at keyboard.c:919)) at eval.c:1355:25
    frame #24: 0x00000001001d8191 emacs`command_loop_2(ignore=(i = 0x0000000000000000)) at keyboard.c:1091:11
    frame #25: 0x00000001002edf0a emacs`internal_catch(tag=(i = 0x000000000000c5a0), func=(emacs`command_loop_2 at keyboard.c:1087), arg=(i = 0x0000000000000000)) at eval.c:1116:25
    frame #26: 0x00000001001b48b7 emacs`command_loop at keyboard.c:1070:2
    frame #27: 0x00000001001b4697 emacs`recursive_edit_1 at keyboard.c:714:9
    frame #28: 0x00000001001b4b2b emacs`Frecursive_edit at keyboard.c:786:3
    frame #29: 0x00000001001b1707 emacs`main(argc=2, argv=0x00007ffeefbff520) at emacs.c:2035:3
    frame #30: 0x00007fff6f6becc9 libdyld.dylib`start + 1

[-- Attachment #5: Type: text/plain, Size: 1 bytes --]



[-- Attachment #6: bt.2 --]
[-- Type: application/octet-stream, Size: 3239 bytes --]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001003d0bdd emacs`validate_interval_range(object=(i = 0x000000010803d3f5), begin=0x00007ffeefbf6860, end=0x00007ffeefbf6860, force=false) at textprop.c:153:7
    frame #1: 0x00000001000a1418 emacs`compute_stop_pos(it=0x00007ffeefbf6bb0) at xdisp.c:3869:8
    frame #2: 0x00000001000a016a emacs`handle_stop(it=0x00007ffeefbf6bb0) at xdisp.c:3779:5
    frame #3: 0x000000010005b979 emacs`reseat(it=0x00007ffeefbf6bb0, pos=(charpos = 1, bytepos = 1), force_p=true) at xdisp.c:6955:4
    frame #4: 0x000000010005adb9 emacs`init_iterator(it=0x00007ffeefbf6bb0, w=0x000000010806ee30, charpos=1, bytepos=1, row=0x000000010587aa00, base_face_id=DEFAULT_FACE_ID) at xdisp.c:3311:7
    frame #5: 0x00000001000425ff emacs`start_display(it=0x00007ffeefbf6bb0, w=0x000000010806ee30, pos=(charpos = 1, bytepos = 1)) at xdisp.c:3327:3
    frame #6: 0x000000010006d4b0 emacs`try_window(window=(i = 0x000000010806ee35), pos=(charpos = 1, bytepos = 1), flags=1) at xdisp.c:19107:3
    frame #7: 0x00000001000bd7f8 emacs`redisplay_window(window=(i = 0x000000010806ee35), just_this_one_p=false) at xdisp.c:18531:8
    frame #8: 0x00000001000bb51d emacs`redisplay_window_0(window=(i = 0x000000010806ee35)) at xdisp.c:16245:5
    frame #9: 0x00000001002eec1a emacs`internal_condition_case_1(bfun=(emacs`redisplay_window_0 at xdisp.c:16243), arg=(i = 0x000000010806ee35), handlers=(i = 0x0000000108e0bf13), hfun=(emacs`redisplay_window_error at xdisp.c:16236)) at eval.c:1379:25
    frame #10: 0x00000001000b9263 emacs`redisplay_windows(window=(i = 0x000000010806ee35)) at xdisp.c:16225:4
    frame #11: 0x0000000100066412 emacs`redisplay_internal at xdisp.c:15693:5
    frame #12: 0x000000010006b979 emacs`redisplay at xdisp.c:14921:3
    frame #13: 0x00000001001bd874 emacs`read_char(commandflag=1, map=(i = 0x00000001080d0c73), prev_event=(i = 0x0000000000000000), used_mouse_menu=0x00007ffeefbfe84f, end_time=0x0000000000000000) at keyboard.c:2493:6
    frame #14: 0x00000001001b7a5f emacs`read_key_sequence(keybuf=0x00007ffeefbfee70, prompt=(i = 0x0000000000000000), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9547:12
    frame #15: 0x00000001001b5e4a emacs`command_loop_1 at keyboard.c:1350:15
    frame #16: 0x00000001002eeb1f emacs`internal_condition_case(bfun=(emacs`command_loop_1 at keyboard.c:1236), handlers=(i = 0x0000000000000090), hfun=(emacs`cmd_error at keyboard.c:919)) at eval.c:1355:25
    frame #17: 0x00000001001d8191 emacs`command_loop_2(ignore=(i = 0x0000000000000000)) at keyboard.c:1091:11
    frame #18: 0x00000001002edf0a emacs`internal_catch(tag=(i = 0x000000000000c5a0), func=(emacs`command_loop_2 at keyboard.c:1087), arg=(i = 0x0000000000000000)) at eval.c:1116:25
    frame #19: 0x00000001001b48b7 emacs`command_loop at keyboard.c:1070:2
    frame #20: 0x00000001001b4697 emacs`recursive_edit_1 at keyboard.c:714:9
    frame #21: 0x00000001001b4b2b emacs`Frecursive_edit at keyboard.c:786:3
    frame #22: 0x00000001001b1707 emacs`main(argc=2, argv=0x00007ffeefbff520) at emacs.c:2035:3
    frame #23: 0x00007fff6f6becc9 libdyld.dylib`start + 1

[-- Attachment #7: Type: text/plain, Size: 1 bytes --]



[-- Attachment #8: bt.1 --]
[-- Type: application/octet-stream, Size: 2593 bytes --]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001000d43be emacs`find_row_edges(it=0x00007ffeefbf9870, row=0x0000000107860c00, min_pos=1, min_bpos=1, max_pos=25, max_bpos=25) at xdisp.c:22572:1
    frame #1: 0x0000000100071e07 emacs`display_line(it=0x00007ffeefbf9870, cursor_vpos=0) at xdisp.c:23788:7
    frame #2: 0x000000010006d4f3 emacs`try_window(window=(i = 0x0000000106068625), pos=(charpos = 1, bytepos = 1), flags=0) at xdisp.c:19113:11
    frame #3: 0x00000001000ae419 emacs`display_echo_area_1(a1=4396058144, a2=(i = 0x0000000000000000)) at xdisp.c:11504:3
    frame #4: 0x00000001000641a2 emacs`with_echo_area_buffer(w=0x0000000106068620, which=1, fn=(emacs`display_echo_area_1 at xdisp.c:11482), a1=4396058144, a2=(i = 0x0000000000000000)) at xdisp.c:11266:8
    frame #5: 0x00000001000adfd2 emacs`display_echo_area(w=0x0000000106068620) at xdisp.c:11462:7
    frame #6: 0x000000010006283b emacs`echo_area_display(update_frame_p=false) at xdisp.c:11975:33
    frame #7: 0x0000000100064e96 emacs`redisplay_internal at xdisp.c:15378:7
    frame #8: 0x000000010006b979 emacs`redisplay at xdisp.c:14921:3
    frame #9: 0x00000001001bd874 emacs`read_char(commandflag=1, map=(i = 0x00000001090c9a03), prev_event=(i = 0x0000000000000000), used_mouse_menu=0x00007ffeefbfe84f, end_time=0x0000000000000000) at keyboard.c:2493:6
    frame #10: 0x00000001001b7a5f emacs`read_key_sequence(keybuf=0x00007ffeefbfee70, prompt=(i = 0x0000000000000000), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9547:12
    frame #11: 0x00000001001b5e4a emacs`command_loop_1 at keyboard.c:1350:15
    frame #12: 0x00000001002eeb1f emacs`internal_condition_case(bfun=(emacs`command_loop_1 at keyboard.c:1236), handlers=(i = 0x0000000000000090), hfun=(emacs`cmd_error at keyboard.c:919)) at eval.c:1355:25
    frame #13: 0x00000001001d8191 emacs`command_loop_2(ignore=(i = 0x0000000000000000)) at keyboard.c:1091:11
    frame #14: 0x00000001002edf0a emacs`internal_catch(tag=(i = 0x000000000000c5a0), func=(emacs`command_loop_2 at keyboard.c:1087), arg=(i = 0x0000000000000000)) at eval.c:1116:25
    frame #15: 0x00000001001b48b7 emacs`command_loop at keyboard.c:1070:2
    frame #16: 0x00000001001b4697 emacs`recursive_edit_1 at keyboard.c:714:9
    frame #17: 0x00000001001b4b2b emacs`Frecursive_edit at keyboard.c:786:3
    frame #18: 0x00000001001b1707 emacs`main(argc=2, argv=0x00007ffeefbff520) at emacs.c:2035:3
    frame #19: 0x00007fff6f6becc9 libdyld.dylib`start + 1

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 23:32     ` Yuan Fu
@ 2020-05-26  2:15       ` Yuan Fu
  2020-05-26  3:30         ` Yuan Fu
  2020-05-26 14:54       ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-26  2:15 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: emacs-devel



> On May 25, 2020, at 7:32 PM, Yuan Fu <casouri@gmail.com> wrote:
> 
> I had some problem hacking the redisplay code. For some reason, Emacs hangs after M-x toggle-word-wrap. I ran Emacs under lldb (gdb doesn’t work on Mac), and was dropped into different places each time. I attached the backtrace files. What could I did wrong?
> 
> Explanation for the code: basically I added a predicate function (char_can_wrap) that checks text property to see if a character is “wrappable”. And combined that predicate with the existing IT_DISPLAYING_WHITESPACE.
> 
> Yuan
> 
> <wrap.patch>
> <bt.3>
> <bt.2>
> <bt.1>

I found the problem, now it runs ok. Sorry for the noise.

Yuan


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26  2:15       ` Yuan Fu
@ 2020-05-26  3:30         ` Yuan Fu
  2020-05-26  4:46           ` Yuan Fu
  2020-05-26 15:00           ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-26  3:30 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 271 bytes --]

The redisplay code now works! Now I need to explore how to apply the text properties. I had a look at category tables. IIUC, a category table for all Unicode characters would be impractically large and unnecessary. A normal list of ranges would probably work. 

Yuan


[-- Attachment #2: wrap.patch --]
[-- Type: application/octet-stream, Size: 6019 bytes --]

diff --git a/src/xdisp.c b/src/xdisp.c
index 01f272033e..7f04c1bc67 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -427,6 +427,28 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  Lisp_Object charpos = make_fixnum (IT_CHARPOS (*it));
+  Lisp_Object prop = Fget_text_property (charpos, Qcan_wrap, Qnil);
+  if (EQ (Qt, prop) || EQ (Qbefore_only, prop))
+    return true;
+  else
+    return false;
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  Lisp_Object charpos = make_fixnum (IT_CHARPOS (*it));
+  Lisp_Object prop = Fget_text_property (charpos, Qcan_wrap, Qnil);
+  if (EQ (Qt, prop) || EQ (Qafter_only, prop))
+    return true;
+  else
+    return false;
+}
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9098,9 +9120,12 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
+              /* If this character displays a whitespace or the text
+                 property says you can wrap after it, the next one can
+                 wrap.  */
+	      if (IT_DISPLAYING_WHITESPACE (it) || char_can_wrap_after (it))
 		may_wrap = true;
-	      else if (may_wrap)
+	      if (! IT_DISPLAYING_WHITESPACE (it) && may_wrap)
 		{
 		  /* We have reached a glyph that follows one or more
 		     whitespace characters.  If the position is
@@ -9249,9 +9274,13 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			      bool can_wrap = true;
 
 			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+				 (or a character that allows wrapping
+				 after it) that barely fits on this
+				 screen line, but the next character
+				 is also whitespace (or a character
+				 that forbids wrapping before it), we
+				 cannot wrap here.
+                              */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9263,7 +9292,8 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && (IT_DISPLAYING_WHITESPACE (it)
+                                          || !char_can_wrap_before (it)))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9342,19 +9372,23 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
+			 may_wrap flag is also whitespace (or is a
+			 character that forbids wrapping before it),
+			 we can't wrap here, since the screen line
+			 cannot be wrapped in the middle of whitespace
+			 (or before a character that forbids wrapping
+			 before it).  Therefore, wrap_it (previously
+			 found wrap-point) _is_ relevant in that
 			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+		      && !(moved_forward && (IT_DISPLAYING_WHITESPACE (it)
+                                             || !char_can_wrap_before (it))))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23180,9 +23214,9 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
+	      if (IT_DISPLAYING_WHITESPACE (it) || char_can_wrap_after (it))
 		may_wrap = true;
-	      else if (may_wrap)
+	      if (! IT_DISPLAYING_WHITESPACE (it) && may_wrap)
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23328,7 +23362,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 				 was a space or tab AND (ii) the
 				 current character is not.  */
 			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				  || IT_DISPLAYING_WHITESPACE (it)
+                                  || char_can_wrap_after (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23362,7 +23397,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 					  was a space or tab AND (ii) the
 					  current character is not.  */
 				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+					   || IT_DISPLAYING_WHITESPACE (it)
+                                           || char_can_wrap_after (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34231,6 +34267,9 @@ syms_of_xdisp (void)
   DEFSYM (QCfile, ":file");
   DEFSYM (Qfontified, "fontified");
   DEFSYM (Qfontification_functions, "fontification-functions");
+  DEFSYM (Qcan_wrap, "can-wrap");
+  DEFSYM (Qbefore_only, "before-only");
+  DEFSYM (Qafter_only, "after-only");
 
   /* Name of the symbol which disables Lisp evaluation in 'display'
      properties.  This is used by enriched.el.  */

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26  3:30         ` Yuan Fu
@ 2020-05-26  4:46           ` Yuan Fu
  2020-05-26 15:14             ` Eli Zaretskii
  2020-05-26 15:00           ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-26  4:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

OK, it turns out category tables are the way to go. There are already categories defined for my use case: “Not at eol” (<), “Not at bol” (>) and “line breakable” (|). They are used for filling but are just as appropriate for wrapping. Now there seem to be two routes: one is to use the category table directly to determine whether to wrap; the other is to still use a text property to determine whether to wrap, and use the category table to apply that text property. Which approach is more efficient? 

Besides efficiency concerns, using text properties allows the use cases described by Ihor, where the user can customize what to wrap and what not to. That seems nice.

Yuan

[-- Attachment #2: Type: text/html, Size: 1065 bytes --]

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26  4:46           ` Yuan Fu
@ 2020-05-26 15:14             ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 15:14 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 26 May 2020 00:46:01 -0400
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> Besides efficiency concerns, using text properties allows use cases described by Ihor, where user can
> customize on what to wrap and what not to. That seems nice.

I wrote there that I didn't understand that use case.

If we don't actually need the text property, avoiding it is
preferable.  However, make sure the category table can be made
buffer-local.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26  3:30         ` Yuan Fu
  2020-05-26  4:46           ` Yuan Fu
@ 2020-05-26 15:00           ` Eli Zaretskii
  1 sibling, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 15:00 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 25 May 2020 23:30:36 -0400
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> I had a look at category table. IIUC, a category table for all unicode characters would be impractically large and unnecessary.

That conclusion is mistaken.  Emacs char-tables are designed to handle
memory very efficiently.  In particular, if just a small number of
characters have a non-nil property value, the memory consumption of
such a char-table will be very low, and will grow and shrink as needed
if additional values are modified from or to nil.
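
A standalone sketch of why a sparse table stays small: storage is split into chunks, and a chunk is allocated only when some character in it receives a non-default value. This is a simplified model for illustration, not the actual Emacs char-table layout; the names (sparse_table, sparse_get, sparse_set) and the chunk size are assumptions, and releasing chunks that become all-default again is left out.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_CHAR   0x10FFFFu                  /* highest Unicode code point */
#define CHUNK_BITS 12
#define CHUNK_SIZE (1u << CHUNK_BITS)         /* 4096 entries per chunk */
#define NUM_CHUNKS ((MAX_CHAR >> CHUNK_BITS) + 1)

/* Hypothetical two-level table.  A NULL chunk means that every
   character in that chunk has the default value 0.  */
struct sparse_table
{
  uint8_t *chunks[NUM_CHUNKS];
  size_t allocated;                           /* number of non-NULL chunks */
};

/* Return the value stored for C, or 0 if nothing was ever stored.  */
uint8_t
sparse_get (const struct sparse_table *t, uint32_t c)
{
  if (c > MAX_CHAR)
    return 0;
  const uint8_t *chunk = t->chunks[c >> CHUNK_BITS];
  return chunk ? chunk[c & (CHUNK_SIZE - 1)] : 0;
}

/* Store VALUE for C, allocating the chunk on first use.  Return 0 on
   success, -1 if C is out of range or memory is exhausted.  */
int
sparse_set (struct sparse_table *t, uint32_t c, uint8_t value)
{
  if (c > MAX_CHAR)
    return -1;
  uint8_t **slot = &t->chunks[c >> CHUNK_BITS];
  if (*slot == NULL)
    {
      if (value == 0)
        return 0;                             /* default value: nothing to store */
      *slot = calloc (CHUNK_SIZE, 1);
      if (*slot == NULL)
        return -1;
      t->allocated++;
    }
  (*slot)[c & (CHUNK_SIZE - 1)] = value;
  return 0;
}
```

In this model, marking a handful of CJK punctuation characters touches a single chunk; the rest of the code space costs only the top-level pointer array.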

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 23:32     ` Yuan Fu
  2020-05-26  2:15       ` Yuan Fu
@ 2020-05-26 14:54       ` Eli Zaretskii
  2020-05-26 17:34         ` Yuan Fu
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 14:54 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 25 May 2020 19:32:33 -0400
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> +static int char_can_wrap(struct it *it)
> +{
> +  Lisp_Object charpos = make_fixnum (IT_STRING_CHARPOS (*it));
> +  Lisp_Object tail = Fget_text_property (charpos, Qcan_wrap, Qnil);

Regardless of the bug you say you fixed in the meantime, the above is
wrong: it only considers the text property on buffer text, but not on
display strings and overlay strings.  Those strings can also need to
be wrapped on display.

> +  for (; CONSP (tail); tail = XCDR (tail))
> +    {
> +      register Lisp_Object tem = XCAR (tail);
> +      if (EQ (Qcan_wrap, tem))
> +        {
> +          Lisp_Object val = XCDR (tail);
> +          if (NILP (val))
> +            { return 0; }
> +          else if (EQ (Qbefore_only, val))
> +            { return 1; }
> +          else if (EQ (Qafter_only, val))
> +            { return 2; }
> +          else if (EQ (Qt, val))
> +            { return 3; }
> +          else
> +            { return 0; }
> +        }

Why is the value of the text property a cons cell?  Why not a simple
symbol?

> -	      if (IT_DISPLAYING_WHITESPACE (it))
> +              /* If this character displays a whitespace or the text
> +                 property says you can wrap after it, the next one can
> +                 wrap.  */
> +	      if (IT_DISPLAYING_WHITESPACE (it) || IT_CAN_WRAP_AFTER (it))

It is cleaner to define a new macro that handles both conditions, and
use it instead of IT_DISPLAYING_WHITESPACE.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26 14:54       ` Eli Zaretskii
@ 2020-05-26 17:34         ` Yuan Fu
  2020-05-26 19:50           ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 17:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 656 bytes --]

Hi Eli,

I fixed the problems and it now works. Apply the patch below, load kinsoku.el, open test.txt, and run M-x toggle-word-wrap; you should see the text properly wrapped: lines wrap between CJK characters and at whitespace, but not between ASCII characters. Also, following the kinsoku rules, a CJK comma will not be placed at the beginning of a line, CJK “《” will not be placed at the end of a line, etc.

It determines whether we can wrap before/after a character by looking at the “<”, “>” and “|” categories, roughly corresponding to “don’t wrap after”, “don’t wrap before” and “wrap before and after”. 

Yuan


[-- Attachment #2: wrap.patch --]
[-- Type: application/octet-stream, Size: 9926 bytes --]

diff --git a/src/xdisp.c b/src/xdisp.c
index 01f272033e..3dd1450847 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -366,6 +366,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -427,6 +428,84 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+// TODO make Lisp-visible?
+/* Calculate the wrapping rule for character CH and add it as a text
+   property to current buffer at CHARPOS.  Return the text property
+   value.  */
+static Lisp_Object apply_wrap_property (Lisp_Object charpos, int ch)
+{
+  /* These are category sets we use.  */
+  int not_at_eol = 60; /* < */
+  int not_at_bol = 62; /* > */
+  int line_breakable = 124; /* | */
+  if (CHAR_HAS_CATEGORY (ch, line_breakable))
+    {
+      if (CHAR_HAS_CATEGORY (ch, not_at_bol))
+        {
+          Fput_text_property (charpos, Fadd1 (charpos),
+                              Qword_wrap, Qonly_after, Qnil);
+          return Qonly_after;
+        }
+        
+      else if (CHAR_HAS_CATEGORY (ch, not_at_eol))
+        {
+          Fput_text_property (charpos, Fadd1 (charpos),
+                              Qword_wrap, Qonly_before, Qnil);
+          return Qonly_before;
+        }
+      else
+        {
+          Fput_text_property (charpos, Fadd1 (charpos),
+                              Qword_wrap, Qt, Qnil);
+          return Qt;
+        }
+    }
+  else
+    {
+      /* For normal characters, since they _can_ appear at the
+         beginning of a line, we make their rule only_before.  */
+      Fput_text_property (charpos, Fadd1 (charpos),
+                              Qword_wrap, Qonly_before, Qnil);
+      return Qonly_before;
+    }
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  Lisp_Object charpos = make_fixnum (IT_CHARPOS (*it));
+  Lisp_Object prop = Fget_text_property (charpos, Qword_wrap, Qnil);
+  // TODO handle other types of it->what?
+  if (EQ (prop, Qnil) && it->what == IT_CHARACTER)
+      prop = apply_wrap_property(charpos, it->c);
+  if (EQ (Qt, prop) || EQ (Qonly_before, prop))
+    return true;
+  else
+    return false;
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  Lisp_Object charpos = make_fixnum (IT_CHARPOS (*it));
+  Lisp_Object prop = Fget_text_property (charpos, Qword_wrap, Qnil);
+  if (EQ (prop, Qnil) && it->what == IT_CHARACTER)
+      prop = apply_wrap_property(charpos, it->c);
+  if (EQ (Qt, prop) || EQ (Qonly_after, prop))
+    return true;
+  else
+    return false;
+}
+
+/* True if we can wrap before the current character.  */
+#define IT_CAN_WRAP_BEFORE(it) \
+  (!IT_DISPLAYING_WHITESPACE (it) && char_can_wrap_before (it))
+
+/* True if we can wrap after the current character.  */
+#define IT_CAN_WRAP_AFTER(it) \
+  (IT_DISPLAYING_WHITESPACE (it) || char_can_wrap_after (it))
+
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9098,13 +9177,13 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && IT_CAN_WRAP_BEFORE(it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters (or a character that allows
+		     wrapping after it).  If the position is already
+		     found, we are done.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9119,8 +9198,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
 		}
+              /* This has to run after the previous block.  */
+              if (IT_CAN_WRAP_AFTER (it))
+                /* may_wrap basically means "previous char allows
+                   wrapping after it".  */
+                may_wrap = true;
+              else
+                may_wrap = false;
 	    }
 	}
 
@@ -9248,10 +9333,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+                                 wrap after it, but the current
+                                 character says we can't wrap before
+                                 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9263,7 +9348,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !IT_CAN_WRAP_BEFORE(it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9342,19 +9427,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && IT_CAN_WRAP_BEFORE(it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23180,9 +23264,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && IT_CAN_WRAP_BEFORE(it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23196,9 +23279,13 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
-	    }
+              /* This has to run after the previous block.  */
+	      if (IT_CAN_WRAP_AFTER (it))
+		may_wrap = true;
+              else
+                may_wrap = false;
+            }
 	}
 
       PRODUCE_GLYPHS (it);
@@ -23321,14 +23408,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !IT_CAN_WRAP_BEFORE(it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23356,13 +23447,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !IT_CAN_WRAP_BEFORE(it)))
 				goto back_to_wrap;
 
 			    }
@@ -34231,6 +34325,10 @@ syms_of_xdisp (void)
   DEFSYM (QCfile, ":file");
   DEFSYM (Qfontified, "fontified");
   DEFSYM (Qfontification_functions, "fontification-functions");
+  DEFSYM (Qword_wrap, "word-wrap");
+  DEFSYM (Qonly_before, "only-before");
+  DEFSYM (Qonly_after, "only-after");
+  DEFSYM (Qno_wrap, "no-wrap");
 
   /* Name of the symbol which disables Lisp evaluation in 'display'
      properties.  This is used by enriched.el.  */

[-- Attachment #3: Type: text/plain, Size: 1 bytes --]



[-- Attachment #4: test.txt --]
[-- Type: text/plain, Size: 801 bytes --]

ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ English English ä¸è‹±æ–‡æ··æŽ’ä¸ï¼Œè‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡ã€Šæ··æŽ’ä¸è‹±æ–‡æ··

è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æ··englishæŽ’ä¸è‹±æ–‡æ··ä¸è‹±æ–‡æ··æŽ’ä¸ï¼Œä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æ··æŽ’ä¸è‹±æ–‡æ··ä¸è‹±æ–‡æ··æŽ’ä¸

è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡æ··englishæŽ’ä¸è‹±æ–‡æ··æŽ’ä¸è‹±æ–‡ã€Šæ··æŽ’ä¸è‹±æ–‡æ··




中英文混英文混排中英文混排中英文混排中英文混排、、

中英文混英文混排中英文混排中英文混排中英文混排、英



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26 17:34         ` Yuan Fu
@ 2020-05-26 19:50           ` Eli Zaretskii
  2020-05-26 20:31             ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-26 19:50 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 26 May 2020 13:34:01 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> I fixed the problems and it now works. Apply the patch below, load kinsoku.el, open test.txt, and run M-x toggle-word-wrap; you should see the text properly wrapped: lines wrap between CJK characters and at whitespace, but not between ASCII characters. Also, following the kinsoku rules, a CJK comma will not be placed at the beginning of a line, CJK “《” will not be placed at the end of a line, etc.
> 
> It determines whether we can wrap before/after a character by looking at the “<”, “>” and “|” categories, roughly corresponding to “don’t wrap after”, “don’t wrap before” and “wrap before and after”. 

Thanks.

This still doesn't support strings, only buffer text.

Also, why are you putting a text property, instead of just examining
the category as part of IT_CAN_WRAP?  What do you need the property
for?

And finally, this feature must be optional, so some customization
knobs are missing.  But we could defer this until the basic code is in
good shape.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26 19:50           ` Eli Zaretskii
@ 2020-05-26 20:31             ` Yuan Fu
  2020-05-26 22:29               ` Yuan Fu
  2020-05-27 15:20               ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 20:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On May 26, 2020, at 3:50 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 26 May 2020 13:34:01 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>> I fixed the problems and it now works. Apply the patch below, load kinsoku.el, open test.txt, and run M-x toggle-word-wrap; you should see the text properly wrapped: lines wrap between CJK characters and at whitespace, but not between ASCII characters. Also, following the kinsoku rules, a CJK comma will not be placed at the beginning of a line, CJK “《” will not be placed at the end of a line, etc.
>> 
>> It determines whether we can wrap before/after a character by looking at the “<”, “>” and “|” categories, roughly corresponding to “don’t wrap after”, “don’t wrap before” and “wrap before and after”. 
> 
> Thanks.
> 
> This still doesn't support strings, only buffer text.
> 
> Also, why are you putting a text property, instead of just examining
> the category as part of IT_CAN_WRAP?  What do you need the property
> for?
> 

I don’t really know which way is better/more efficient and just took one to implement. Plus text property might allow some user customizations. I can change it to only use category table. 

> And finally, this feature must be optional, so some customization
> knobs are missing.  But we could defer this until the basic code is in
> good shape.

Cool.

I saw someone mention Line_Break.txt from Unicode and looked it up: the Unicode Consortium has already marked out all wrappable code points. IIUC we can add Line_Break.txt to admin/unidata, parse it, and put an Elisp file under /lisp/international, right? We can categorize all the marked code points into the three categories I mentioned earlier. 
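
As a rough illustration of that idea, here is a minimal C sketch that parses one data line of the form "XXXX;CLASS" or "XXXX..YYYY;CLASS" and maps the class to one of the three wrapping categories. The function names, the lb_entry struct, and above all the class-to-category mapping are assumptions made for this sketch; the real line breaking rules in UAX #14 are defined on pairs of classes and are considerably richer than this.

```c
#include <stdio.h>
#include <string.h>

enum wrap_rule { WRAP_NONE, WRAP_NOT_BEFORE, WRAP_NOT_AFTER, WRAP_BOTH };

struct lb_entry
{
  unsigned first, last;   /* Inclusive code point range.  */
  char cls[8];            /* Line break class, e.g. "OP".  */
};

/* Parse one line.  Return 1 for a data line, 0 for comments,
   blank lines, or anything else that does not match.  */
static int
parse_line_break_entry (const char *line, struct lb_entry *e)
{
  if (sscanf (line, " %x..%x ; %7[^ \t\r\n#]",
              &e->first, &e->last, e->cls) == 3)
    return 1;
  if (sscanf (line, " %x ; %7[^ \t\r\n#]", &e->first, e->cls) == 2)
    {
      e->last = e->first;
      return 1;
    }
  return 0;
}

/* Simplified, assumed mapping from a line break class to a rule.  */
static enum wrap_rule
rule_for_class (const char *cls)
{
  if (strcmp (cls, "OP") == 0)          /* Opening punctuation.  */
    return WRAP_NOT_AFTER;
  if (strcmp (cls, "CL") == 0           /* Closing punctuation.  */
      || strcmp (cls, "EX") == 0        /* Exclamation/interrogation.  */
      || strcmp (cls, "NS") == 0)       /* Nonstarters.  */
    return WRAP_NOT_BEFORE;
  if (strcmp (cls, "ID") == 0)          /* Ideographs.  */
    return WRAP_BOTH;
  return WRAP_NONE;
}
```

A generator script could call parse_line_break_entry on each line and emit one category assignment per range whose rule is not WRAP_NONE.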

Yuan




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26 20:31             ` Yuan Fu
@ 2020-05-26 22:29               ` Yuan Fu
  2020-05-27 17:29                 ` Eli Zaretskii
  2020-05-27 15:20               ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 22:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1576 bytes --]



> On May 26, 2020, at 4:31 PM, Yuan Fu <casouri@gmail.com> wrote:
> 
> 
> 
>> On May 26, 2020, at 3:50 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>>> From: Yuan Fu <casouri@gmail.com>
>>> Date: Tue, 26 May 2020 13:34:01 -0400
>>> Cc: larsi@gnus.org,
>>> emacs-devel@gnu.org
>>> 
>>> I fixed the problems and it now works. Apply the patch below, load kinsoku.el, open test.txt, and run M-x toggle-word-wrap; you should see the text properly wrapped: lines wrap between CJK characters and at whitespace, but not between ASCII characters. Also, following the kinsoku rules, a CJK comma will not be placed at the beginning of a line, CJK “《” will not be placed at the end of a line, etc.
>>> 
>>> It determines whether we can wrap before/after a character by looking at the “<”, “>” and “|” categories, roughly corresponding to “don’t wrap after”, “don’t wrap before” and “wrap before and after”. 
>> 
>> Thanks.
>> 
>> This still doesn't support strings, only buffer text.
>> 
>> Also, why are you putting a text property, instead of just examining
>> the category as part of IT_CAN_WRAP?  What do you need the property
>> for?
>> 
> 
> I don’t really know which way is better/more efficient and just took one to implement. Plus text property might allow some user customizations. I can change it to only use category table. 


Here is the version that doesn’t use text properties. I assume by string you mean display properties? I checked with display property and it wraps fine in this version.

Yuan





[-- Attachment #2.1: Type: text/html, Size: 4840 bytes --]

[-- Attachment #2.2: new-wrap.patch --]
[-- Type: application/octet-stream, Size: 8388 bytes --]

diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..105b6b175a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,37 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  */
+#define NOT_AT_EOL 60 /* < */
+#define NOT_AT_BOL 62 /* > */
+#define LINE_BREAKABLE 124 /* | */
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  /* We used to only check for whitespace for wrapping, hence this
+     macro.  You cannot wrap before a whitespace.  */
+  return ((it->what == IT_CHARACTER
+           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_BOL))
+          /* There used to be   */
+          && !IT_DISPLAYING_WHITESPACE (it));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  return ((it->what == IT_CHARACTER
+           && CHAR_HAS_CATEGORY (it->c, LINE_BREAKABLE)
+           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_EOL))
+          /* We used to only check for whitespace for wrapping, hence
+             this macro.  Obviously you can wrap after a space.  */
+          || IT_DISPLAYING_WHITESPACE (it));
+}
+
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9217,13 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before(it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters (or a character that allows
+		     wrapping after it).  If the position is already
+		     found, we are done.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9238,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
 		}
+              /* This has to run after the previous block.  */
+              if (char_can_wrap_after (it))
+                /* may_wrap basically means "previous char allows
+                   wrapping after it".  */
+                may_wrap = true;
+              else
+                may_wrap = false;
 	    }
 	}
 
@@ -9335,10 +9373,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+                                 wrap after it, but the current
+                                 character says we can't wrap before
+                                 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9388,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before(it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9467,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && char_can_wrap_before(it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23329,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before(it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,9 +23344,13 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
-	    }
+              /* This has to run after the previous block.  */
+	      if (char_can_wrap_after (it))
+		may_wrap = true;
+              else
+                may_wrap = false;
+            }
 	}
 
       PRODUCE_GLYPHS (it);
@@ -23433,14 +23473,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before(it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23512,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before(it)))
 				goto back_to_wrap;
 
 			    }
@@ -34349,6 +34396,10 @@ syms_of_xdisp (void)
   DEFSYM (QCfile, ":file");
   DEFSYM (Qfontified, "fontified");
   DEFSYM (Qfontification_functions, "fontification-functions");
+  DEFSYM (Qword_wrap, "word-wrap");
+  DEFSYM (Qonly_before, "only-before");
+  DEFSYM (Qonly_after, "only-after");
+  DEFSYM (Qno_wrap, "no-wrap");
 
   /* Name of the symbol which disables Lisp evaluation in 'display'
      properties.  This is used by enriched.el.  */

[-- Attachment #2.3: Type: text/html, Size: 259 bytes --]

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26 22:29               ` Yuan Fu
@ 2020-05-27 17:29                 ` Eli Zaretskii
  2020-05-28 17:31                   ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-27 17:29 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 26 May 2020 18:29:04 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> Here is the version that doesn’t use text properties.

Thanks, few comments below.

> I assume by string you mean display properties? I checked with display property and it wraps fine in this version.

Display properties whose values are strings, and also before-string
and after-string overlay properties.

> +static bool char_can_wrap_before (struct it *it)
> +{
> +  /* We used to only check for whitespace for wrapping, hence this
> +     macro.  You cannot wrap before a whitespace.  */
> +  return ((it->what == IT_CHARACTER
> +           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_BOL))
> +          /* There used to be   */
> +          && !IT_DISPLAYING_WHITESPACE (it));
> +}

The order here is wrong: the IT_DISPLAYING_WHITESPACE should be tested
first, as that is the more frequent situation, so it should be
processed faster.

> +/* Return true if the current character allows wrapping after it.   */
> +static bool char_can_wrap_after (struct it *it)
> +{
> +  return ((it->what == IT_CHARACTER
> +           && CHAR_HAS_CATEGORY (it->c, LINE_BREAKABLE)
> +           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_EOL))
> +          /* We used to only check for whitespace for wrapping, hence
> +             this macro.  Obviously you can wrap after a space.  */
> +          || IT_DISPLAYING_WHITESPACE (it));
> +}

Do we really need two separate functions?  And note that each one
calls IT_DISPLAYING_WHITESPACE, so in some situations you will be
testing that twice for the same character -- because you have 2
separate functions.
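
One possible shape for a single combined check, as a sketch only: compute both answers in one pass and return them as bit flags, so that the whitespace test runs once per character.  The char_info struct stands in for the real category lookups and for IT_DISPLAYING_WHITESPACE; it and the name char_wrap_flags are hypothetical, not part of the patch.

```c
#include <stdbool.h>

enum { WRAP_BEFORE_OK = 1, WRAP_AFTER_OK = 2 };

/* Hypothetical stand-in for the per-character lookups.  */
struct char_info
{
  bool is_whitespace;    /* Space or tab.  */
  bool not_at_bol;       /* Category '>'.  */
  bool not_at_eol;       /* Category '<'.  */
  bool line_breakable;   /* Category '|'.  */
};

/* Return a combination of WRAP_BEFORE_OK and WRAP_AFTER_OK.
   Whitespace is tested first and exactly once.  */
static int
char_wrap_flags (const struct char_info *ci)
{
  int flags = 0;

  /* Whitespace allows wrapping after it, never before it.  */
  if (ci->is_whitespace)
    return WRAP_AFTER_OK;

  if (!ci->not_at_bol)
    flags |= WRAP_BEFORE_OK;
  if (ci->line_breakable && !ci->not_at_eol)
    flags |= WRAP_AFTER_OK;
  return flags;
}
```

A caller would fetch the flags once per display element and test the two bits where the patch currently calls the two functions.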

> -	      if (IT_DISPLAYING_WHITESPACE (it))
> -		may_wrap = true;
> -	      else if (may_wrap)
> +              /* Can we wrap here? */
> +	      if (may_wrap && char_can_wrap_before(it))
>  		{
>  		  /* We have reached a glyph that follows one or more
> -		     whitespace characters.  If the position is
> -		     already found, we are done.  */
> +		     whitespace characters (or a character that allows
> +		     wrapping after it).  If the position is already
> +		     found, we are done.  */

The code calls char_can_wrap_before, but the comment says we can wrap
after it.  Which one is right?

> +              /* This has to run after the previous block.  */

This kind of comment begs the question: "why?"  Please rewrite the
comment to answer that question up front.

> -			      /* If we are at a whitespace character
> -				 that barely fits on this screen line,
> -				 but the next character is also
> -				 whitespace, we cannot wrap here.  */
> +			      /* If the previous character says we can
> +                                 wrap after it, but the current
> +                                 character says we can't wrap before
> +                                 it, then we can't wrap here.  */

It sounds like your Emacs is set up to use only spaces for indentation
in C source files, whereas our convention is to use tabs and spaces.

> +  DEFSYM (Qword_wrap, "word-wrap");
> +  DEFSYM (Qonly_before, "only-before");
> +  DEFSYM (Qonly_after, "only-after");
> +  DEFSYM (Qno_wrap, "no-wrap");

These symbols are not used in the code.

And finally, one more general comment/question: doesn't your code
implicitly assume that buffer text is always processed in the logical
order, i.e. in the increasing order of buffer positions?  I mean, the
fact that you have the "before" and the "after" function seems to
imply that you do assume that, and the logic of processing the
categories relies on that, expecting that when you see a wrap_after
character, you can wrap on the next character.  Is this so?  Because
if it is, then this will break when processing RTL text, since we may
process it in the reverse order of buffer positions.  Please look into
these situations and see that the code does TRT in them.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-27 17:29                 ` Eli Zaretskii
@ 2020-05-28 17:31                   ` Yuan Fu
  2020-05-28 18:05                     ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-28 17:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

> On May 27, 2020, at 1:29 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 26 May 2020 18:29:04 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> Here is the version that doesn’t use text properties.
> 
> Thanks, few comments below.

Thank you for reviewing.

> 
>> I assume by string you mean display properties? I checked with display property and it wraps fine in this version.
> 
> Display properties whose values are strings, and also before-string
> and after-string overlay properties.
> 
>> +static bool char_can_wrap_before (struct it *it)
>> +{
>> +  /* We used to only check for whitespace for wrapping, hence this
>> +     macro.  You cannot wrap before a whitespace.  */
>> +  return ((it->what == IT_CHARACTER
>> +           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_BOL))
>> +          /* There used to be   */
>> +          && !IT_DISPLAYING_WHITESPACE (it));
>> +}
> 
> The order here is wrong: the IT_DISPLAYING_WHITESPACE should be tested
> first, as that is the more frequent situation, so it should be
> processed faster.
> 
Fixed.

Before answering your questions, this is my understanding of the word wrapping in redisplay:
On every iteration we check whether the current character allows wrapping after it; if so, we set may_wrap to true. That basically means the _previous char_ allows wrapping after it. While we are at the same iteration, we may want to save a wrap point (wrap_it) if 1) the previous char allows wrapping after it (from may_wrap’s value set by the _previous iteration_) and 2) the current character allows wrapping before it. When we find ourselves at the end of a line, we have two choices: continue to the next line (which I assume is what “continue” means in the comment), or instead go to a previously saved wrap point and break the line there. If there is no previous wrap point, we continue. If there is a previous wrap point but we actually can wrap at this point (the previous char can wrap after and this char can wrap before), we just continue, since this position is a valid wrap position. Otherwise we go back to the previous wrap point and wrap there (goto back_to_wrap;).

Although the original code only has one checker (IT_DISPLAYING_WHITESPACE), it serves a dual purpose: it is used to determine if we can wrap after (whitespace and tab allow wrapping after), and it is also used to determine if we can wrap before (they don’t allow wrapping before, otherwise you would see whitespace and tabs at the beginning of the next line).

Needless to say, I’m a newbie in Emacs C internals and redisplay, so my understanding from reading the original code and comments could be wrong. But the code seems to work right so I think the truth isn’t far away.

>> +/* Return true if the current character allows wrapping after it.   */
>> +static bool char_can_wrap_after (struct it *it)
>> +{
>> +  return ((it->what == IT_CHARACTER
>> +           && CHAR_HAS_CATEGORY (it->c, LINE_BREAKABLE)
>> +           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_EOL))
>> +          /* We used to only check for whitespace for wrapping, hence
>> +             this macro.  Obviously you can wrap after a space.  */
>> +          || IT_DISPLAYING_WHITESPACE (it));
>> +}
> 
> Do we really need two separate functions?  And note that each one
> calls IT_DISPLAYING_WHITESPACE, so in some situations you will be
> testing that twice for the same character -- because you have 2
> separate functions.

IT_DISPLAYING_WHITESPACE would run twice, too: once to check for wrap after and once for wrap before.

> 
>> -	      if (IT_DISPLAYING_WHITESPACE (it))
>> -		may_wrap = true;
>> -	      else if (may_wrap)
>> +              /* Can we wrap here? */
>> +	      if (may_wrap && char_can_wrap_before(it))
>> 		{
>> 		  /* We have reached a glyph that follows one or more
>> -		     whitespace characters.  If the position is
>> -		     already found, we are done.  */
>> +		     whitespace characters (or a character that allows
>> +		     wrapping after it).  If the position is already
>> +		     found, we are done.  */
> 
> The code calls char_can_wrap_before, but the comment says we can wrap
> after it.  Which one is right?

The comment says this char _follows_ a char that allows wrapping after; we still need to check if _this_ char allows wrapping before.

> 
>> +              /* This has to run after the previous block.  */
> 
> This kind of comment begs the question: "why?"  Please rewrite the
> comment to answer that question up front.

Fixed. Hopefully it’s clear now.

> 
>> -			      /* If we are at a whitespace character
>> -				 that barely fits on this screen line,
>> -				 but the next character is also
>> -				 whitespace, we cannot wrap here.  */
>> +			      /* If the previous character says we can
>> +                                 wrap after it, but the current
>> +                                 character says we can't wrap before
>> +                                 it, then we can't wrap here.  */
> 
> It sounds like your Emacs is set up to use only spaces for indentation
> in C source files, whereas our convention is to use tabs and spaces.
> 

I’m not sure what you mean. Do you mean the indent style for the new code, or are you talking about the word wrapping?

>> +  DEFSYM (Qword_wrap, "word-wrap");
>> +  DEFSYM (Qonly_before, "only-before");
>> +  DEFSYM (Qonly_after, "only-after");
>> +  DEFSYM (Qno_wrap, "no-wrap");
> 
> These symbols are not used in the code.

Removed.

> 
> And finally, one more general comment/question: isn't your code assume
> implicitly that buffer text is always processed in the logical order,
> i.e. in the increasing order of buffer positions?  I mean, the fact
> that you have the "before" and the "after" function seems to imply
> that you do assume that, and the logic of processing the categories is
> relying on that, expecting that when you see a wrap_after character,
> you can wrap on the next character.  Is this so?  because if it is,
> then this will break when processing RTL text, since we may process it
> in the reverse order of buffer positions.  Please look into these
> situations and see that the code does TRT in them.

I tested with some Arabic text made by Google Translate, and the wrapping does not seem to take effect. IIUC bidi.c reorders the line to RTL and the redisplay iterator will still go through them LTR, is that true? I wonder how the original wrapping works in that case. Does the old code handle bidi text?

Yuan


* Re: Line wrap reconsidered
  2020-05-28 17:31                   ` Yuan Fu
@ 2020-05-28 18:05                     ` Eli Zaretskii
  2020-05-28 19:34                       ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-28 18:05 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 28 May 2020 13:31:37 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> Before answering your questions, this is my understanding of the word wrapping in redisplay:
> On every iteration we check if current character allow wrapping after it, if so, we set may_wrap to true.

Yes.

> That basically means the _previous char_ allows wrapping after it.

No, we never wrap at the character where we set may_wrap = true, we
wrap _after_ it.  In the current code, may_wrap is set when we see a
SPC character, and when we wrap, that SPC is left on the line before
the wrap.

> While we are at the same iteration, we may want to wrapping point (wrap_it) if 1) the previous char allows wrapping after it (from may_wrap’s value set by _previous iteration_) and 2) the current character allows wrapping before it.

Yes.  But I don't understand what you mean by "at the same iteration"
here: if may_wrap was set, then we will not try to save the wrap point
until we process the next character.  So I cannot call this "the same
iteration", it's rather "the next iteration".

> When we found ourselves at the end of a line

You mean, when we reach the edge of the window, right?

> we have two choices: continue to the next line (which I assume is what “continue” means in the comment), or instead go to a previously saved wrap point and break the line there.

Which "continue" are you alluding to here?  Do you mean "lines are
continued"?  Because if lines are not being wrapped, we have only one
choice: go back to the last wrap point we found, end the screen line
(a.k.a. "break the line") there, and continue with the text on the
next screen line.

> If there is no previous wrap point, we continue, if there is a previous wrap point but we actually can wrap at this point (previous char can wrap after & this char can wrap before), we just continue, since this position is a valid wrap position. Otherwise we go back to previous wrap point and wrap there (goto back_to_wrap;).

You lost me here.  The logic is actually: if there is a wrap point, go
back to it and break the line there; if there's no wrap point, break
the line where we are now (i.e. at the last character that still fits
inside the window). 

> Although the original code only has one checker (IS_WHITESPACE), it serves a dual purpose: it is used to determine if we can wrap after—whitespace and tab allow wrapping after; it is also used to determine if we can wrap before—they don’t allow wrapping before (otherwise you see whitespace and tabs on the beginning of the next line).

That is true.  I suggested to have a single function so that you could
in that single function perform the same test, just using two
different categories.  Then you could basically keep the rest of the
logic intact.  If that is somehow not possible, can you explain why?

> Needless to say, I’m a newbie in Emacs C internals and redisplay, so my understanding from reading the original code and comments could be wrong. But the code seems to work right so I think the truth isn’t far away.

The truth isn't far away, but we need to go all the way so that the
result doesn't break some use cases (of which there are a gazillion in
the display code).

> > Do we really need two separate functions?  And note that each one
> > calls IT_DISPLAYING_WHITESPACE, so in some situations you will be
> > testing that twice for the same character -- because you have 2
> > separate functions.
> 
> IT_DISPLAYING_WHITESPACE would run twice, too: one time to check for warp after and one time for wrap before.

My point was to bring the two tests together so that it could be just
one test.  Is that possible?  If not, why not?

> >> +	      if (may_wrap && char_can_wrap_before(it))
> >> 		{
> >> 		  /* We have reached a glyph that follows one or more
> >> -		     whitespace characters.  If the position is
> >> -		     already found, we are done.  */
> >> +		     whitespace characters (or a character that allows
> >> +		     wrapping after it).  If the position is already
> >> +		     found, we are done.  */
> > 
> > The code calls char_can_wrap_before, but the comment says we can wrap
> > after it.  Which one is right?
> 
> The comment says this char _follows_ a char that allows wrapping after, we still need to check if _this_ char allows wrapping before.

But the comment is after the char_can_wrap_before test, so we have
already tested that, no?

> >> -			      /* If we are at a whitespace character
> >> -				 that barely fits on this screen line,
> >> -				 but the next character is also
> >> -				 whitespace, we cannot wrap here.  */
> >> +			      /* If the previous character says we can
> >> +                                 wrap after it, but the current
> >> +                                 character says we can't wrap before
> >> +                                 it, then we can't wrap here.  */
> > 
> > It sounds like your Emacs is set up to use only spaces for indentation
> > in C source files, whereas our convention is to use tabs and spaces.
> 
> I’m not sure what you mean, do you mean the indent style for the new code, or are you talking about the word wrapping?

I mean the indentation: the original code used TABs and spaces, but
your code uses only spaces.

> > And finally, one more general comment/question: isn't your code assume
> > implicitly that buffer text is always processed in the logical order,
> > i.e. in the increasing order of buffer positions?  I mean, the fact
> > that you have the "before" and the "after" function seems to imply
> > that you do assume that, and the logic of processing the categories is
> > relying on that, expecting that when you see a wrap_after character,
> > you can wrap on the next character.  Is this so?  because if it is,
> > then this will break when processing RTL text, since we may process it
> > in the reverse order of buffer positions.  Please look into these
> > situations and see that the code does TRT in them.
> 
> I tested with some Alaric text made by google translate, and the wrapping seems not take effect. IIUC bidi.c reorders the line to RTL

Yes.

> and the redisplay iterator will still go through them LTR

Not sure what you mean by "go through them LTR".  The iterator can
move forward or backward, or even jump to a far-away place.  You
cannot assume that the next character examined will be the next
character in the buffer.

> I wonder how does the original wrapping works in that case. Does the old code handle bidi text?

Of course it does, you can easily see that if you run the unmodified
Emacs.  It would be a terrible misfeature if we didn't handle wrapping
correctly in bidirectional scripts.


* Re: Line wrap reconsidered
  2020-05-28 18:05                     ` Eli Zaretskii
@ 2020-05-28 19:34                       ` Yuan Fu
  2020-05-28 20:42                         ` Yuan Fu
  2020-05-29  6:56                         ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-28 19:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel



> On May 28, 2020, at 2:05 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 28 May 2020 13:31:37 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> Before answering your questions, this is my understanding of the word wrapping in redisplay:
>> On every iteration we check if current character allow wrapping after it, if so, we set may_wrap to true.
> 
> Yes.
> 
>> That basically means the _previous char_ allows wrapping after it.
> 
> No, we never wrap at the character where we set may_wrap = true, we
> wrap _after_ it.  In the current code, may_wrap is set when we see a
> SPC character, and when we wrap, that SPC is left on the line before
> the wrap.

I see, I must have misinterpreted some part of the logic.

> 
>> While we are at the same iteration, we may want to wrapping point (wrap_it) if 1) the previous char allows wrapping after it (from may_wrap’s value set by _previous iteration_) and 2) the current character allows wrapping before it.
> 
> Yes.  But I don't understand what you mean by "at the same iteration"
> here: if may_wrap was set, then we will not try to save the wrap point
> until we process the next character.  So I cannot call this "the same
> iteration", it's rather "the next iteration".
> 

In the current code IT_DISPLAYING_WHITESPACE can check for can_wrap_before and can_wrap_after at the same time, but in my new code we have to perform two checks in the same iteration, because some chars can wrap before but not after, some can wrap after but not before, etc.


>> When we found ourselves at the end of a line
> 
> You mean, when we reach the edge of the window, right?

Yes.

> 
>> we have two choices: continue to the next line (which I assume is what “continue” means in the comment), or instead go to a previously saved wrap point and break the line there.
> 
> Which "continue" are you alluding to here?  Do you mean "lines are
> continued"?  because if lines are not being wrapped, we have only one
> choice: go back to the last wrap point we found, end the screen line
> (a.k.a. "break the line") there, and continue with the text on the
> next screen line.

I think there is another option where we wrap at current point. On line 23356:

			      /* If line-wrap is on, check if a
				 previous wrap point was found.  */
			      else if (wrap_row_used > 0
				       /* Even if there is a previous wrap
					  point, continue the line here as
					  usual, if (i) the previous character
					  was a space or tab AND (ii) the
					  current character is not.  */
				       && (!may_wrap
					   || IT_DISPLAYING_WHITESPACE (it)))
				goto back_to_wrap;

I was alluding to this “continue”: “continue the line here as usual”. What does it mean? Does it mean we insert a line break here? I think your response below confirms my guess.

> 
>> If there is no previous wrap point, we continue, if there is a previous wrap point but we actually can wrap at this point (previous char can wrap after & this char can wrap before), we just continue, since this position is a valid wrap position. Otherwise we go back to previous wrap point and wrap there (goto back_to_wrap;).
> 
> You lost me here.  The logic is actually: if there is a wrap point, go
> back to it and break the line there; if there's no wrap point, break
> the line where we are now (i.e. at the last character that still fits
> inside the window). 

I mean, I think the logic is that even if there is a previous wrap point, we still break here if this position is also a valid wrap point. I got this impression from reading the comment mentioned above. If this position is not a valid wrap point, we go back to the previous wrap point and wrap there. If there is no previous wrap point, we break here.

> 
>> Although the original code only has one checker (IS_WHITESPACE), it serves a dual purpose: it is used to determine if we can wrap after—whitespace and tab allow wrapping after; it is also used to determine if we can wrap before—they don’t allow wrapping before (otherwise you see whitespace and tabs on the beginning of the next line).
> 
> That is true.  I suggested to have a single function so that you could
> in that single function perform the same test, just using two
> different categories.  then you could basically keep the rest of the
> logic intact.  If that is somehow not possible, can you explain why?

IT_DISPLAYING_WHITESPACE works because it only had to distinguish two types of characters. Now we have four. A Boolean function can’t return more than two possibilities; two Boolean functions combined can express four. The first table below is the current situation, the second is what the new code has to handle.

| Type      | wrap_before? | wrap_after? |
|-----------+--------------+-------------|
| space/tab | no           | yes         |
| other     | yes          | no          |

| Type      | wrap_before? | wrap_after? |
|-----------+--------------+-------------|
| space/tab | no           | yes         |
| CJK       | yes          | yes         |
| other     | yes          | no          |
| ??        | no           | no          |


> 
>> Needless to say, I’m a newbie in Emacs C internals and redisplay, so my understanding from reading the original code and comments could be wrong. But the code seems to work right so I think the truth isn’t far away.
> 
> The truth isn't far away, but we need to go all the way so that the
> result doesn't break some use cases (of which there are a gazillion in
> the display code).
> 

Indeed, I’ve heard a lot of urban stories about the redisplay code of Emacs ;-)


>>> Do we really need two separate functions?  And note that each one
>>> calls IT_DISPLAYING_WHITESPACE, so in some situations you will be
>>> testing that twice for the same character -- because you have 2
>>> separate functions.
>> 
>> IT_DISPLAYING_WHITESPACE would run twice, too: one time to check for warp after and one time for wrap before.
> 
> My point was to bring the two tests together so that it could be just
> one test.  Is that possible?  If not, why not?

(See above)

> 
>>>> +	      if (may_wrap && char_can_wrap_before(it))
>>>> 		{
>>>> 		  /* We have reached a glyph that follows one or more
>>>> -		     whitespace characters.  If the position is
>>>> -		     already found, we are done.  */
>>>> +		     whitespace characters (or a character that allows
>>>> +		     wrapping after it).  If the position is already
>>>> +		     found, we are done.  */
>>> 
>>> The code calls char_can_wrap_before, but the comment says we can wrap
>>> after it.  Which one is right?
>> 
>> The comment says this char _follows_ a char that allows wrapping after, we still need to check if _this_ char allows wrapping before.
> 
> But the comment is after the char_can_wrap_before test, so we have
> already tested that, no?

We have tested the previous char for can_wrap_after, which is represented as may_wrap == true. We still need to check this char for can_wrap_before. For a position to be a valid wrap point, the char before it must allow wrapping after and the char after it must allow wrapping before.

Say we have A|B: we can break the line at the bar only if A allows wrapping after and B allows wrapping before.
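
A minimal sketch of that A|B rule in plain C, outside of xdisp.c. The struct and function names here (char_wrap_props, can_break_between) are made up for illustration and are not part of the patch:

```c
#include <stdbool.h>

/* Hypothetical per-character wrap properties; not an Emacs type.  */
struct char_wrap_props
{
  bool can_wrap_before;   /* a line may start with this character */
  bool can_wrap_after;    /* a line may end with this character */
};

/* Return true if a line break is allowed between A and B in "A|B":
   A must allow wrapping after it and B must allow wrapping before it.  */
static bool
can_break_between (struct char_wrap_props a, struct char_wrap_props b)
{
  return a.can_wrap_after && b.can_wrap_before;
}
```

With a space as { false, true } and an ordinary letter as { true, false }, space|letter is breakable, while letter|space and space|space are not.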

> 
>>>> -			      /* If we are at a whitespace character
>>>> -				 that barely fits on this screen line,
>>>> -				 but the next character is also
>>>> -				 whitespace, we cannot wrap here.  */
>>>> +			      /* If the previous character says we can
>>>> +                                 wrap after it, but the current
>>>> +                                 character says we can't wrap before
>>>> +                                 it, then we can't wrap here.  */
>>> 
>>> It sounds like your Emacs is set up to use only spaces for indentation
>>> in C source files, whereas our convention is to use tabs and spaces.
>> 
>> I’m not sure what you mean, do you mean the indent style for the new code, or are you talking about the word wrapping?
> 
> I mean the indentation: the original code used TABs and spaces, but
> your code uses only spaces.

How do I fix it? Any guideline file?

> 
>>> And finally, one more general comment/question: isn't your code assume
>>> implicitly that buffer text is always processed in the logical order,
>>> i.e. in the increasing order of buffer positions?  I mean, the fact
>>> that you have the "before" and the "after" function seems to imply
>>> that you do assume that, and the logic of processing the categories is
>>> relying on that, expecting that when you see a wrap_after character,
>>> you can wrap on the next character.  Is this so?  because if it is,
>>> then this will break when processing RTL text, since we may process it
>>> in the reverse order of buffer positions.  Please look into these
>>> situations and see that the code does TRT in them.
>> 
>> I tested with some Alaric text made by google translate, and the wrapping seems not take effect. IIUC bidi.c reorders the line to RTL
> 
> Yes.
> 
>> and the redisplay iterator will still go through them LTR
> 
> Not sure what you mean by "go through them LTR".  The iterator can
> move forward or backward, or even jump to a far-away place.  You
> cannot assume that the next character examined will be the next
> character in the buffer.

Essentially I want to ask “if may_wrap == true, what does that mean? (In bidi context)” Did the character to the left of me (on glass) set it or the character to the right of me set it?

> 
>> I wonder how does the original wrapping works in that case. Does the old code handle bidi text?
> 
> Of course it does, you can easily see that if you run the unmodified
> Emacs.  It would be a terrible misfeature if we didn't handle wrappng
> correctly in bidirectional scripts.

Actually, I just tried again and the code works for bidi. Maybe last time I didn’t turn on word-wrap while thinking I did.

Yuan





* Re: Line wrap reconsidered
  2020-05-28 19:34                       ` Yuan Fu
@ 2020-05-28 20:42                         ` Yuan Fu
  2020-05-29  7:17                           ` Eli Zaretskii
  2020-05-29  6:56                         ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-28 20:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel


>> 
>>> Although the original code only has one checker (IS_WHITESPACE), it serves a dual purpose: it is used to determine if we can wrap after—whitespace and tab allow wrapping after; it is also used to determine if we can wrap before—they don’t allow wrapping before (otherwise you see whitespace and tabs on the beginning of the next line).
>> 
>> That is true.  I suggested to have a single function so that you could
>> in that single function perform the same test, just using two
>> different categories.  then you could basically keep the rest of the
>> logic intact.  If that is somehow not possible, can you explain why?
> 
> IT_DISPLAYING_WHITESPACE works because it was only check for two types of characters. Now we have four. A Boolean function can’t return more than two possibilities. Two Boolean functions combined can express four possibilities.
> 
> | Type      | wrap_before? | wrap_after? |
> |-----------+--------------+-------------|
> | space/tab | no           | yes         |
> | other     | yes          | no          |
> 
> | Type      | wrap_before? | wrap_after? |
> |-----------+--------------+-------------|
> | space/tab | no           | yes         |
> | CJK       | yes          | yes         |
> | other     | yes          | no          |
> | ??        | no           | no          |

I finally wrapped my head around this. Yes, I can make it one function which returns an enum, if that’s what you mean. In the meantime I’m still reading the Unicode document; the Unicode algorithm might require much more machinery. For one, the wrap-ability doesn’t depend only on the immediate characters anymore, e.g., the character before the previous character could affect this character’s wrap-ability.
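
Roughly what I have in mind for the single function, as a standalone sketch. The enum and helper names below are hypothetical, and the CJK test is a crude placeholder range check, not the category lookup the patch uses:

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only.  */
enum wrap_kind
{
  WRAP_NONE,          /* no wrapping before or after ("??" in the table) */
  WRAP_BEFORE_ONLY,   /* "other" in the table */
  WRAP_AFTER_ONLY,    /* space/tab */
  WRAP_BOTH           /* CJK */
};

/* Toy classifier: space and tab wrap after only; characters in the
   CJK Unified Ideographs block (U+4E00..U+9FFF) wrap on both sides;
   everything else wraps before only.  A real version would consult
   the character categories instead of a range check.  */
static enum wrap_kind
char_wrap_kind (int c)
{
  if (c == ' ' || c == '\t')
    return WRAP_AFTER_ONLY;
  if (c >= 0x4E00 && c <= 0x9FFF)
    return WRAP_BOTH;
  return WRAP_BEFORE_ONLY;
}

/* The two yes/no columns of the table, derived from one value.  */
static bool
wrap_kind_allows_before (enum wrap_kind k)
{
  return k == WRAP_BEFORE_ONLY || k == WRAP_BOTH;
}

static bool
wrap_kind_allows_after (enum wrap_kind k)
{
  return k == WRAP_AFTER_ONLY || k == WRAP_BOTH;
}
```

That way each character is classified once per iteration, and both the "after" and the "before" answers come from the same value.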

Yuan




* Re: Line wrap reconsidered
  2020-05-28 20:42                         ` Yuan Fu
@ 2020-05-29  7:17                           ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-29  7:17 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 28 May 2020 16:42:33 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> In the meantime I’m still reading the unicode document, the unicode algorithm might require much
> more machinery. For one, the wrap-ability doesn’t only depend of the immediate characters anymore, e.g., a
> character before the character before could affect this character’s wrap-ability.

Which is why I don't recommend conflating these two jobs.
Implementing UAX#14 in Emacs is a large job.  It should start by
determining which part(s) of that algorithm will make sense in Emacs.




* Re: Line wrap reconsidered
  2020-05-28 19:34                       ` Yuan Fu
  2020-05-28 20:42                         ` Yuan Fu
@ 2020-05-29  6:56                         ` Eli Zaretskii
  2020-05-29 21:20                           ` Yuan Fu
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-29  6:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 28 May 2020 15:34:28 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> In the current code IT_DISPLAYING_WHITESPACE can check for can_wrap_before and can_wrap_after in the same time, but in my new code, we have to perform two checks in the same iteration, because some char can wrap before but now after, and some can wrap after but not before, etc. 

Then you'll need to augment the test

  else if (may_wrap)

with a test that the current character can be at BOL.  Maybe you
should also make may_wrap a tristate instead of just a boolean YES/NO.

> > Which "continue" are you alluding to here?  Do you mean "lines are
> > continued"?  because if lines are not being wrapped, we have only one
> > choice: go back to the last wrap point we found, end the screen line
> > (a.k.a. "break the line") there, and continue with the text on the
> > next screen line.
> 
> I think there is another option where we wrap at current point. On line 23356:
> 
> 			      /* If line-wrap is on, check if a
> 				 previous wrap point was found.  */
> 			      else if (wrap_row_used > 0
> 				       /* Even if there is a previous wrap
> 					  point, continue the line here as
> 					  usual, if (i) the previous character
> 					  was a space or tab AND (ii) the
> 					  current character is not.  */
> 				       && (!may_wrap
> 					   || IT_DISPLAYING_WHITESPACE (it)))
> 				goto back_to_wrap;
> 
> I was alluding to this “continue”: “continue the line here as usual”. What does it mean? Does it mean we insert a line break here? I think your response below confirms my guess.

Yes, "continue the line" here means we break the line here instead of
going back to the wrap point.  This code handles the case when there's
a previous wrap point, but the current character fits exactly on the
line, and that current character is a TAB or SPC.
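
To spell out the condition in the code quoted above as a standalone sketch: the function and enum names below are invented for illustration, and the three inputs stand in for wrap_row_used > 0, may_wrap and IT_DISPLAYING_WHITESPACE (it) respectively.

```c
#include <stdbool.h>

/* Hypothetical names; only the boolean logic mirrors the quoted code.  */
enum edge_action
{
  BREAK_HERE,     /* "continue the line here as usual" */
  BACK_TO_WRAP    /* goto back_to_wrap */
};

/* Decide what to do when the current glyph reaches the window edge.  */
static enum edge_action
action_at_window_edge (bool have_wrap_point, bool may_wrap,
                       bool at_whitespace)
{
  if (have_wrap_point && (!may_wrap || at_whitespace))
    return BACK_TO_WRAP;
  return BREAK_HERE;
}
```

So the line is broken at the current position only when there is no saved wrap point, or when the previous character allowed a wrap and the current one is not whitespace.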

> > I mean the indentation: the original code used TABs and spaces, but
> > your code uses only spaces.
> 
> How do I fix it? Any guideline file?

Just make sure indent-tabs-mode is non-nil when you edit C files.

> >> and the redisplay iterator will still go through them LTR
> > 
> > Not sure what you mean by "go through them LTR".  The iterator can
> > move forward or backward, or even jump to a far-away place.  You
> > cannot assume that the next character examined will be the next
> > character in the buffer.
> 
> Essentially I want to ask “if may_wrap == true, what does that mean? (In bidi context)” Did the character to the left of me (on glass) set it or the character to the right of me set it?

The one to the left.  But it is not necessarily the "previous"
character in buffer position order.

> Actually, I just tried again and the code works for bidi. Maybe last time I didn’t turned on word-wrap while thinking I did.

You need to try both with bidi-paragraph-direction set to
left-to-right and right-to-left.  If both work, then the code is OK.




* Re: Line wrap reconsidered
  2020-05-29  6:56                         ` Eli Zaretskii
@ 2020-05-29 21:20                           ` Yuan Fu
  2020-05-30  6:14                             ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-29 21:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On May 29, 2020, at 2:56 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 28 May 2020 15:34:28 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>> In the current code IT_DISPLAYING_WHITESPACE can check for can_wrap_before and can_wrap_after in the same time, but in my new code, we have to perform two checks in the same iteration, because some char can wrap before but now after, and some can wrap after but not before, etc. 
> 
> Then you'll need to augment the test
> 
>  else if (may_wrap)
> 
> with a test that the current character can be at BOL.  Maybe you
> should also make may_wrap a tristate instead of just a boolean YES/NO.
> 

I think I did just that, i.e., if (may_wrap && char_can_wrap_before(it)).

>>> Which "continue" are you alluding to here?  Do you mean "lines are
>>> continued"?  because if lines are not being wrapped, we have only one
>>> choice: go back to the last wrap point we found, end the screen line
>>> (a.k.a. "break the line") there, and continue with the text on the
>>> next screen line.
>> 
>> I think there is another option where we wrap at current point. On line 23356:
>> 
>> 			      /* If line-wrap is on, check if a
>> 				 previous wrap point was found.  */
>> 			      else if (wrap_row_used > 0
>> 				       /* Even if there is a previous wrap
>> 					  point, continue the line here as
>> 					  usual, if (i) the previous character
>> 					  was a space or tab AND (ii) the
>> 					  current character is not.  */
>> 				       && (!may_wrap
>> 					   || IT_DISPLAYING_WHITESPACE (it)))
>> 				goto back_to_wrap;
>> 
>> I was alluding to this “continue”: “continue the line here as usual”. What does it mean? Does it mean we insert a line break here? I think your response below confirms my guess.
> 
> Yes, "continue the line" here means we break the line here instead of
> going back to the wrap point.  This code handles the case when there's
> a previous wrap point, but the current character fits exactly on the
> line, and that current character is a TAB or SPC.
> 
>>> I mean the indentation: the original code used TABs and spaces, but
>>> your code uses only spaces.
>> 
>> How do I fix it? Any guideline file?
> 
> Just make sure indent-tabs-mode is non-nil when you edit C files.

Ok.

> 
>>>> and the redisplay iterator will still go through them LTR
>>> 
>>> Not sure what you mean by "go through them LTR".  The iterator can
>>> move forward or backward, or even jump to a far-away place.  You
>>> cannot assume that the next character examined will be the next
>>> character in the buffer.
>> 
>> Essentially I want to ask “if may_wrap == true, what does that mean? (In bidi context)” Did the character to the left of me (on glass) set it or the character to the right of me set it?
> 
> The one to the left.  But it is not necessarily the "previous"
> character in buffer position order.

I see, but then I don’t understand how the current code works with bidi display. In bidi context, a space char can’t appear at the right end of the line, which is the beginning of a logical line, right? That requires the logic to be reversed. Is there something I’m missing?

What I’m saying is:

Normal text: space: can wrap: after (in logical order), right (in visual order)
Cannot wrap: before (in logical order), left (in visual order)

Bidi text: space: can wrap: after (in logical order), _left_ (in visual order)
Cannot wrap: before (in logical order), _right_ (in visual order)

Since may_wrap’s meaning is in terms of left and right, it needs to be reversed in bidi text, no?

> 
>> Actually, I just tried again and the code works for bidi. Maybe last time I hadn’t turned on word-wrap even though I thought I had.
> 
> You need to try both with bidi-paragraph-direction set to
> left-to-right and right-to-left.  If both work, then the code is OK.


One more question. If I type 123456789... and set bidi-paragraph-direction to ‘right-to-left, it is still 123456789…, just aligned to the right. I expected to see …987654321; that’s what right-to-left means in Chinese text. Why is the order of the characters not reversed? UAX#9 didn’t say anything helpful.

Yuan





* Re: Line wrap reconsidered
  2020-05-29 21:20                           ` Yuan Fu
@ 2020-05-30  6:14                             ` Eli Zaretskii
  2020-05-31 17:39                               ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-30  6:14 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 29 May 2020 17:20:21 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> > Then you'll need to augment the test
> > 
> >  else if (may_wrap)
> > 
> > with a test that the current character can be at BOL.  Maybe you
> > should also make may_wrap a tristate instead of just a boolean YES/NO.
> > 
> 
> I think I did just that, i.e., if (may_wrap && char_can_wrap_before(it)).

Fundamentally, yes.  But having a complex condition

   if (FOO && BAR)

makes the code harder to read and understand, and thus makes logical
errors easier, than a simple condition like

   if (FOOBAR == some_value)

> >> Essentially I want to ask “if may_wrap == true, what does that mean? (In bidi context)” Did the character to the left of me (on glass) set it or the character to the right of me set it?
> > 
> > The one to the left.  But it is not necessarily the "previous"
> > character in buffer position order.
> 
> I see, but then I don’t understand how the current code works with bidi display. In bidi context, a space char can’t appear at the right end of the line, which is the beginning of a logical line, right? That requires the logic to be reversed. Is there something I’m missing?

I think the cause of the confusion is the "in bidi context" part.
There are two such "contexts", and they behave differently:

 . RTL characters when bidi-paragraph-direction is left-to-right
 . RTL characters when bidi-paragraph-direction is right-to-left

Which one were you talking about?  I was talking about the first one.

> Since may_wrap’s meaning is in terms of left and right, it needs to be reversed in bidi text, no?

If bidi-paragraph-direction is right-to-left, then yes, they are
reversed.  But not if the paragraph direction is left-to-right.

> One more question. If I type 123456789... and set bidi-paragraph-direction to ‘right-to-left, it is still 123456789…, just aligned to the right. I expected to see …987654321; that’s what right-to-left means in Chinese text. Why is the order of the characters not reversed?

Digits and LTR characters (like English text) are rendered
left-to-right even in RTL paragraphs, so what you see is correct.
That's why this is called "bidirectional": the direction is not just
universally right-to-left.

> UAX#9 didn’t say anything helpful.

It does, albeit in a convoluted and hard-to-grasp way.  See paragraph
3.3.6 there, and then rule L2 in paragraph 3.4, which describes the
reordering procedure.
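
To make rule L2 concrete, here is a minimal sketch in C. It is not taken from bidi.c. It assumes the embedding levels have already been resolved and are handed in by the caller, and it uses uppercase ASCII letters as stand-ins for RTL letters. The names reverse_span and reorder_visual are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverse the characters of S in the half-open range [LO, HI).  */
static void reverse_span(char *s, size_t lo, size_t hi)
{
  while (lo + 1 < hi) {
    hi--;
    char tmp = s[lo];
    s[lo] = s[hi];
    s[hi] = tmp;
    lo++;
  }
}

/* Toy version of UAX#9 rule L2: from the highest level down to the
   lowest odd level, reverse every contiguous run of characters whose
   level is at that level or higher.  LEVELS[i] is the (already
   resolved) embedding level of TEXT[i]; computing those levels is
   outside the scope of this sketch.  */
void reorder_visual(char *text, const int *levels)
{
  assert(text != NULL && levels != NULL);
  size_t n = strlen(text);
  int max_level = 0;
  int min_odd = -1;

  for (size_t i = 0; i < n; i++) {
    if (levels[i] > max_level)
      max_level = levels[i];
    if (levels[i] % 2 == 1 && (min_odd < 0 || levels[i] < min_odd))
      min_odd = levels[i];
  }
  if (min_odd < 0)
    return;	/* Simplification: no odd level, leave the order alone.  */

  for (int k = max_level; k >= min_odd; k--) {
    size_t i = 0;
    while (i < n) {
      if (levels[i] >= k) {
	size_t j = i;
	while (j < n && levels[j] >= k)
	  j++;
	/* Run boundaries depend only on positions, so LEVELS does not
	   need to be permuted along with TEXT.  */
	reverse_span(text, i, j);
	i = j;
      } else
	i++;
    }
  }
}
```

With the text "ABC 123" and levels 1,1,1,1,2,2,2 (an RTL paragraph in which the digits sit one level above the letters), the level-2 pass turns the digits into "321" and the level-1 pass reverses the whole line, giving "123 CBA": the letters are mirrored, but the digits come out in their original left-to-right order.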


* Re: Line wrap reconsidered
  2020-05-30  6:14                             ` Eli Zaretskii
@ 2020-05-31 17:39                               ` Yuan Fu
  2020-05-31 17:55                                 ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-31 17:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On May 30, 2020, at 2:14 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 29 May 2020 17:20:21 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>>> Then you'll need to augment the test
>>> 
>>> else if (may_wrap)
>>> 
>>> with a test that the current character can be at BOL.  Maybe you
>>> should also make may_wrap a tristate instead of just a boolean YES/NO.
>>> 
>> 
>> I think I did just that, i.e., if (may_wrap && char_can_wrap_before(it)).
> 
> Fundamentally, yes.  But having a complex condition
> 
>   if (FOO && BAR)
> 
> makes the code harder to read and understand, and thus makes logical
> errors easier, than a simple condition like
> 
>   if (FOOBAR == some_value)

In principle, yes, but I doubt the current logic can be simplified. Do you have some concrete example?

> 
>>>> Essentially I want to ask “if may_wrap == true, what does that mean? (In bidi context)” Did the character to the left of me (on glass) set it or the character to the right of me set it?
>>> 
>>> The one to the left.  But it is not necessarily the "previous"
>>> character in buffer position order.
>> 
>> I see, but then I don’t understand how the current code works with bidi display. In bidi context, a space char can’t appear at the right end of the line, which is the beginning of a logical line, right? That requires the logic to be reversed. Is there something I’m missing?
> 
> I think the cause of the confusion is the "in bidi context" part.
> There are two such "contexts", and they behave differently:
> 
> . RTL characters when bidi-paragraph-direction is left-to-right
> . RTL characters when bidi-paragraph-direction is right-to-left
> 
> Which one were you talking about?  I was talking about the first one.
> 
>> Since may_wrap’s meaning is in terms of left and right, it needs to be reversed in bidi text, no?
> 
> If bidi-paragraph-direction is right-to-left, then yes, they are
> reversed.  But not if the paragraph direction is left-to-right.

Then does bidi.c handle word wrapping when bidi-paragraph-direction is right-to-left? Paragraph 3.4 mentioned that “The accumulated widths of those glyphs (in logical order) are used to determine line breaks.”

> 
>> One more question. If I type 123456789... and set bidi-paragraph-direction to ‘right-to-left, it is still 123456789…, just aligned to the right. I expected to see …987654321; that’s what right-to-left means in Chinese text. Why is the order of the characters not reversed?
> 
> Digits and LTR characters (like English text) are rendered
> left-to-right even in RTL paragraphs, so what you see is correct.
> That's why this is called "bidirectional": the direction is not just
> universally right-to-left.

I see, so each character has its own inherent direction, which never changes; only the paragraph direction changes. 

> 
>> UAX#9 didn’t say anything helpful.
> 
> It does, albeit in a convoluted and hard-to-grasp way.  See paragraph
> 3.3.6 there, and then rule L2 in paragraph 3.4, which describes the
> reordering procedure.

Thanks.

Yuan





* Re: Line wrap reconsidered
  2020-05-31 17:39                               ` Yuan Fu
@ 2020-05-31 17:55                                 ` Eli Zaretskii
  2020-05-31 18:23                                   ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-31 17:55 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 31 May 2020 13:39:56 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel <emacs-devel@gnu.org>
> 
> >> I think I did just that, i.e., if (may_wrap && char_can_wrap_before(it)).
> > 
> > Fundamentally, yes.  But having a complex condition
> > 
> >   if (FOO && BAR)
> > 
> > makes the code harder to read and understand, and thus makes logical
> > errors easier, than a simple condition like
> > 
> >   if (FOOBAR == some_value)
> 
> In principle, yes, but I doubt the current logic can be simplified. Do you have some concrete example?

It was a suggestion.  Maybe it can't be done.  Let's see the code, and
then we could try simplifying the logic.  The issue here is that
IT_DISPLAYING_WHITESPACE is used in many places, and it is not easy to
understand how to map that to 2 different conditions.  Making just one
condition eliminates that problem and lowers the probability of
introducing bugs.

> > If bidi-paragraph-direction is right-to-left, then yes, they are
> > reversed.  But not if the paragraph direction is left-to-right.
> 
> Then does bidi.c handle word wrapping when bidi-paragraph-direction is right-to-left? Paragraph 3.4 mentioned that “The accumulated widths of those glyphs (in logical order) are used to determine line breaks.”

No, it's in xdisp.c, the code that you are changing.  bidi.c just
makes it so that the "next" character is the next one in the visual
order, i.e. it replaces a simple increment of buffer position.




* Re: Line wrap reconsidered
  2020-05-31 17:55                                 ` Eli Zaretskii
@ 2020-05-31 18:23                                   ` Yuan Fu
  2020-05-31 18:47                                     ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-05-31 18:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2228 bytes --]



> On May 31, 2020, at 1:55 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 31 May 2020 13:39:56 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel <emacs-devel@gnu.org>
>> 
>>>> I think I did just that, i.e., if (may_wrap && char_can_wrap_before(it)).
>>> 
>>> Fundamentally, yes.  But having a complex condition
>>> 
>>>  if (FOO && BAR)
>>> 
>>> makes the code harder to read and understand, and thus makes logical
>>> errors easier, than a simple condition like
>>> 
>>>  if (FOOBAR == some_value)
>> 
>> In principle, yes, but I doubt the current logic can be simplified. Do you have some concrete example?
> 
> It was a suggestion.  Maybe it can't be done.  Let's see the code, and
> then we could try simplifying the logic.  The issue here is that
> IT_DISPLAYING_WHITESPACE is used in many places, and it is not easy to
> understand how to map that to 2 different conditions.  Making just one
> condition eliminates that problem and lowers the probability of
> introducing bugs.

Aye.

> 
>>> If bidi-paragraph-direction is right-to-left, then yes, they are
>>> reversed.  But not if the paragraph direction is left-to-right.
>> 
>> Then does bidi.c handle word wrapping when bidi-paragraph-direction is right-to-left? Paragraph 3.4 mentioned that “The accumulated widths of those glyphs (in logical order) are used to determine line breaks.”
> 
> No, it's in xdisp.c, the code that you are changing.  bidi.c just
> makes it so that the "next" character is the next one in the visual
> order, i.e. it replaces a simple increment of buffer position.


Thanks for your patience. Then how does xdisp.c (or bidi.c) know how much space to leave at the left edge when the paragraph is right-to-left? For example, in the figure below, how does Emacs determine how much space to leave at the place I marked with yellow highlighter? I assume the iterator starts from the left edge of the first line and bidi gives it a line with a white space stretch? The comment says “On graphics terminals, there's a single stretch glyph of a suitably computed width.” Or does the iterator go from right to left?


Yuan


[-- Attachment #2.1: Type: text/html, Size: 3775 bytes --]

[-- Attachment #2.2: PastedGraphic-1.tiff --]
[-- Type: image/tiff, Size: 81478 bytes --]


* Re: Line wrap reconsidered
  2020-05-31 18:23                                   ` Yuan Fu
@ 2020-05-31 18:47                                     ` Eli Zaretskii
  2020-06-18 21:46                                       ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-31 18:47 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 31 May 2020 14:23:38 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> > No, it's in xdisp.c, the code that you are changing.  bidi.c just
> > makes it so that the "next" character is the next one in the visual
> > order, i.e. it replaces a simple increment of buffer position.
> 
> Thanks for your patience. Then how does xdisp.c (or bidi.c) know how much space to leave at the left edge when the paragraph is right-to-left? For example, in the figure below, how does Emacs determine how much space to leave at the place I marked with yellow highlighter? I assume the iterator starts from the left edge of the first line and bidi gives it a line with a white space stretch? The comment says “On graphics terminals, there's a single stretch glyph of a suitably computed width.” Or does the iterator go from right to left?

The iterator thinks the characters are displayed from left to right,
so it does its calculations as usual.  The order reversal happens
because each new glyph is prepended to the previous glyphs, not
appended as in the LTR case (but the iterator doesn't know that).  So
the white space you see at the left side is actually computed and
added at the end, when all the other characters are already in the
glyph row and their width is known.
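
The prepend-then-pad behavior described above can be pictured with a toy model in C. This is only a sketch under simplifying assumptions, not the real glyph-row code in xdisp.c: every glyph is one column wide, the row is a plain char buffer, the filler is drawn as '.', and the name build_row is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Fill OUT with a toy "glyph row".  GLYPHS holds the characters in
   the order the iterator delivers them.  OUT must have room for
   ROW_WIDTH + 1 bytes.  In an RTL row each new glyph is prepended;
   in an LTR row it is appended.  */
void build_row(const char *glyphs, size_t row_width, bool rtl, char *out)
{
  size_t used = 0;

  assert(strlen(glyphs) <= row_width);

  for (size_t i = 0; glyphs[i] != '\0'; i++) {
    if (rtl) {
      /* Prepend: shift what is already in the row one slot right.  */
      memmove(out + 1, out, used);
      out[0] = glyphs[i];
    } else
      out[used] = glyphs[i];
    used++;
  }

  if (rtl) {
    /* Only now is the total width of the glyphs known, so the filler
       (the "stretch") can be sized and placed at the left edge.  */
    size_t stretch = row_width - used;
    memmove(out + stretch, out, used);
    memset(out, '.', stretch);
    used = row_width;
  }
  out[used] = '\0';
}
```

Feeding "abc" into a 6-column RTL row yields "...cba": the loop never needs to know the row is right-to-left beyond the prepend, and the filler at the left is computed last.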




* Re: Line wrap reconsidered
  2020-05-31 18:47                                     ` Eli Zaretskii
@ 2020-06-18 21:46                                       ` Yuan Fu
  2020-06-19  6:17                                         ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-06-18 21:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1265 bytes --]

Hi Eli,

Sorry for the delay.

> 
> The iterator thinks the characters are displayed from left to right,
> so it does its calculations as usual.  The order reversal happens
> because each new glyph is prepended to the previous glyphs, not
> appended as in the LTR case (but the iterator doesn't know that).  So
> the white space you see at the left side is actually computed and
> added at the end, when all the other characters are already in the
> glyph row and their width is known.

After reading your reply I went back and scratched my head on why my code doesn’t work, because based on your information my code logic is correct. Turns out it’s merely because I didn’t handle enough cases for it->what (I only handled when it->what == IT_CHARACTER). I updated my code to handle other cases and it now works in bidi!

I had to make another change for kinsoku.el to work right in bidi. Kinsoku.el defines NOT_AT_BOL and NOT_AT_EOL categories. These categories are flipped in bidi paragraphs: what was EOL becomes BOL and vice versa. So I flipped them in my predicate function depending on it->bidi_p.

Now word wrap and kinsoku work in both normal and bidi paragraphs.

P.S., I think I fixed all the indentation with tabs.

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 9370 bytes --]

From ab82e5a8101b9fb8302e6291be48f3153f4a0020 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] checkpoint

---
 src/xdisp.c | 149 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 110 insertions(+), 39 deletions(-)

diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..8d9bb64258 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,57 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  */
+#define NOT_AT_EOL 60 /* < */
+#define NOT_AT_BOL 62 /* > */
+#define LINE_BREAKABLE 124 /* | */
+
+#define IT_CHAR_HAS_CATEGORY(it, cat)					\
+  ((it->what == IT_CHARACTER && CHAR_HAS_CATEGORY (it->c, cat))	\
+  || (STRINGP (it->string)						\
+      && CHAR_HAS_CATEGORY(SREF (it->string, IT_STRING_BYTEPOS (*it)), cat)) \
+  || (it->s								\
+      && CHAR_HAS_CATEGORY(it->s[IT_BYTEPOS (*it)], cat))		\
+  || (IT_BYTEPOS (*it) < ZV_BYTE					\
+      && CHAR_HAS_CATEGORY(*BYTE_POS_ADDR (IT_BYTEPOS (*it)), cat)))    \
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  /* You cannot wrap before a space or tab because
+     that way you'll have space and tab at the beginning of next
+     line.  */
+  /* In bidi context, EOL and BOL are flipped.  */
+  if (it->bidi_p)
+    return (!IT_DISPLAYING_WHITESPACE (it)
+	    && (!IT_CHAR_HAS_CATEGORY (it, NOT_AT_EOL)));
+    else
+      return (!IT_DISPLAYING_WHITESPACE (it)
+	      && (!IT_CHAR_HAS_CATEGORY (it, NOT_AT_BOL)));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  /* We used to only check for whitespace characters for wrapping,
+     hence this macro.  Obviously you can wrap after a space or
+     tab.  */
+  if (it->bidi_p)
+    return (IT_DISPLAYING_WHITESPACE (it)
+	    || (IT_CHAR_HAS_CATEGORY (it, LINE_BREAKABLE)
+		&& !IT_CHAR_HAS_CATEGORY (it, NOT_AT_BOL)));
+    else
+      return (IT_DISPLAYING_WHITESPACE (it)
+	      || (IT_CHAR_HAS_CATEGORY (it, LINE_BREAKABLE)
+		  && !IT_CHAR_HAS_CATEGORY (it, NOT_AT_EOL)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef IT_CHAR_HAS_CATEGORY
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9237,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before(it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9259,17 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
 		}
+	      /* This has to run after the previous block because the
+		 previous block consumes `may_wrap' and this block
+		 sets it, but the value set by this block is intended
+		 for the _next_ character/iteration.  */
+	      if (char_can_wrap_after (it))
+		/* may_wrap basically means "previous char allows
+		   wrapping after it".  */
+		may_wrap = true;
+	      else
+		may_wrap = false;
 	    }
 	}
 
@@ -9335,10 +9397,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+                                 wrap after it, but the current
+                                 character says we can't wrap before
+                                 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9412,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before(it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9491,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && !char_can_wrap_before(it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23353,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before(it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,9 +23368,13 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
-	    }
+              /* This has to run after the previous block.  */
+	      if (char_can_wrap_after (it))
+		may_wrap = true;
+              else
+                may_wrap = false;
+            }
 	}
 
       PRODUCE_GLYPHS (it);
@@ -23433,14 +23497,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before(it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23536,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before(it)))
 				goto back_to_wrap;
 
 			    }
-- 
2.27.0



* Re: Line wrap reconsidered
  2020-06-18 21:46                                       ` Yuan Fu
@ 2020-06-19  6:17                                         ` Eli Zaretskii
  2020-06-19 12:04                                           ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-06-19  6:17 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 18 Jun 2020 17:46:53 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel <emacs-devel@gnu.org>
> 
> I have to do another change for kinsoku.el to work right in bidi. Kinsoku.el defined NOT_AT_BOL and NOT_AT_EOL categories. These categories are flipped in bidi paragraphs: what was EOL becomes BOL and vice versa. So I flipped them in my predicate function depending on it->bidi_p.

I don't think I understand what you mean here.  BOL and EOL are
logical-order terminology, and bidi reordering doesn't change their
meaning.

Maybe I don't understand the exact meaning of NOT_AT_EOL/NOT_AT_BOL
that Kinsoku assigns to that.  Can you provide a formal definition of
that, or point me to some document where that is explained?  The
important aspect of this is that in bidi-reordered text the character
that appears at the left edge of a line is not necessarily the first
character of the line after the preceding newline.  So the issue is
what does Kinsoku say about such situations?  IOW, definitions that
assume strict LTR text will not help us here.


* Re: Line wrap reconsidered
  2020-06-19  6:17                                         ` Eli Zaretskii
@ 2020-06-19 12:04                                           ` Yuan Fu
  2020-06-19 12:38                                             ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-06-19 12:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

> On Jun 19, 2020, at 2:17 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 18 Jun 2020 17:46:53 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel <emacs-devel@gnu.org>
>> 
>> I have to do another change for kinsoku.el to work right in bidi. Kinsoku.el defined NOT_AT_BOL and NOT_AT_EOL categories. These categories are flipped in bidi paragraphs: what was EOL becomes BOL and vice versa. So I flipped them in my predicate function depending on it->bidi_p.
> 
> I don't think I understand what you mean here.  BOL and EOL are
> logical-order terminology, and bidi reordering doesn't change their
> meaning.
> 
> Maybe I don't understand the exact meaning of NOT_AT_EOL/NOT_AT_BOL
> that Kinsoku assigns to that.  Can you provide a formal definition of
> that, or point me to some document where that is explained?  

Since kinsoku.el is for Asian characters, which are all LTR[1], the exact meaning of NOT_AT_EOL/NOT_AT_BOL in bidi context probably doesn’t really matter, but to make kinsoku retain the same behavior (and thus look right) in both RTL and LTR lines, I chose to define BOL as the left edge and EOL as the right edge. So NOT_AT_EOL means the character can’t be the right-most character in a line.

From your message I thought in RTL lines the iterator draws from right to left (you said each glyph is prepended to the previous one). So in RTL context when we are at the end of a logical line, we are at the left edge; on the other hand, in normal LTR context when we are at the end of a logical line, we are at the right edge. Hence the flip.

> The
> important aspect of this is that in bidi-reordered text the character
> that appears at the left edge of a line is not necessarily the first
> character of the line after the preceding newline.  So the issue is
> what does Kinsoku say about such situations?  IOW, definitions that
> assume strict LTR text will not help us here.

As I mentioned above, I don’t think kinsoku cares about, or is defined for, this situation. So I took the definition to assume strict LTR, mapping BOL to left and EOL to right. The ultimate effect is that, no matter what the bidi context is, a NOT_AT_EOL character, like 《, never appears at the right edge. So we don’t get

我今天看来了本书，感觉挺有意思，名字是《
钢铁是怎样炼成的》。

Instead, we have

我今天看来了本书，感觉挺有意思，名字是
《钢铁是怎样炼成的》。
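
The same effect can be reproduced outside xdisp.c with a toy version of the two-step check (first consume may_wrap for the current character, then set it for the next one). This is a sketch under loud assumptions, not the patch itself: it is LTR-only, every character is one column wide, '(' stands in for a NOT_AT_EOL character like 《, ')' for a NOT_AT_BOL character like 》, uppercase letters stand in for characters in the line-breakable category, and the name find_wrap is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static bool is_space(char c)   { return c == ' ' || c == '\t'; }
static bool not_at_eol(char c) { return c == '('; }	/* stand-in for 《 */
static bool not_at_bol(char c) { return c == ')'; }	/* stand-in for 》 */
static bool breakable(char c)				/* stand-in for category | */
{
  return isupper((unsigned char) c) || c == '(' || c == ')';
}

static bool can_wrap_before(char c)
{
  return !is_space(c) && !not_at_bol(c);
}

static bool can_wrap_after(char c)
{
  return is_space(c) || (breakable(c) && !not_at_eol(c));
}

/* Return the index at which the second screen line should start so
   that the first line holds at most WIDTH characters, or 0 if no
   legal break point was found.  S is assumed to be longer than
   WIDTH.  */
size_t find_wrap(const char *s, size_t width)
{
  bool may_wrap = false;	/* "previous char allows wrapping after it" */
  size_t wrap = 0;

  assert(width > 0);
  for (size_t i = 0; s[i] != '\0' && i <= width; i++) {
    /* Step 1: consume the flag set by the previous character.  */
    if (may_wrap && can_wrap_before(s[i]))
      wrap = i;
    /* Step 2: set the flag for the next character.  */
    may_wrap = can_wrap_after(s[i]);
  }
  return wrap;
}
```

For "ABCDEFG(HIJ)K" and a width of 8, the last legal break is before '(' (index 7), so '(' starts the second line instead of ending the first. For "ABC)DEF" and a width of 3, the break moves back to index 2, so ')' does not start a line.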

Now, is that mapping TRT for other characters? I don’t know. But I think it makes sense for kinsoku (again, Asian text, all LTR). IMHO, maybe for a generic definition we can define BOL as the left edge for LTR characters and the right edge for RTL characters. I think that will look good for most text.

Yuan

[1] There is also a top-down layout, but I don’t think we need to worry about that.


* Re: Line wrap reconsidered
  2020-06-19 12:04                                           ` Yuan Fu
@ 2020-06-19 12:38                                             ` Eli Zaretskii
  2020-06-19 17:22                                               ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-06-19 12:38 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 19 Jun 2020 08:04:47 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> > Maybe I don't understand the exact meaning of NOT_AT_EOL/NOT_AT_BOL
> > that Kinsoku assigns to that.  Can you provide a formal definition of
> > that, or point me to some document where that is explained?  
> 
> Since kinsoku.el is for asian characters which are all LTR[1], the exact meaning of NOT_AT_EOL/NOT_AT_BOL in bidi context probably doesn’t really matter, but to make kinsoku retain the same behavior (thus looks right) in both RTL and LTR lines, I choose to define BOL as left edge and EOL as right edge. So NOT_AT_EOL means can’t be the right-most character in a line.
> 
> From your message I thought in RTL lines the iterator draws from right to left (you said each glyph is prepended to the previous one). So in RTL context when we are at the end of a logical line, we are at the left edge; on the other hand, in normal LTR context when we are at the end of a logical line, we are at the right edge. Hence the flip.

What do you mean by "in the RTL context"?

Remember: bidi reordering can take place in two different situations:
when the paragraph direction is left-to-right, and when it's
right-to-left.  In the former situation, the lines begin on the left,
in the latter they begin on the right.  But LTR text, such as CJK
characters, will always be rendered left-to-right, no matter what is
the paragraph direction.

So which "RTL context" did you mean here?

> As I mentioned above, I don’t think kinsoku cares/is defined for this situation. And I took the definition to assume strict LTR, mapping BOL to left and EOL to right. The ultimate effect is that, no matter what the bidi context is, NOT_AT_EOL character, like 《, never appears at the right edge. So we don’t get
> 
> 
> 我今天看来了本书，感觉挺有意思，名字是《
> 钢铁是怎样炼成的》。
> 
> Instead, we have
> 
> 我今天看来了本书，感觉挺有意思，名字是
> 《钢铁是怎样炼成的》。

What do you see in the text below?

אבגד הוזחטיכך למנן 我今天看来了本书，感觉挺有意思，名字是
《钢铁是怎样炼成的》。

(I assume you are reading your email in Emacs; if not, copy/paste this
text into an Emacs buffer whose bidi-paragraph-direction is nil, and
look at the resulting display.)

Does the above look correct, from the Kinsoku POV?  This is how LTR
CJK text will be displayed in a paragraph with right-to-left base
direction.  Do you still think something needs to be flipped here?

> Now, is that mapping TRT for other characters? I don’t know. But I think it makes sense for kinsoku (again, Asian text, all LTR). IMHO, for a generic definition we could define BOL as the left edge for LTR characters and the right edge for RTL characters. I think that will look good for most text.

We must use BOL and EOL in their logical-order meanings, otherwise the
result will be utter confusion.  In the above example, the EOL
character in the first line is 是, and it is not at the left edge of
the line.  It is at the logical-order end of the line, i.e. the
character after it in the buffer position order is the newline.  But
if we had RTL characters instead of the CJK text above, the character
at EOL would indeed have been displayed at the left edge of the line.
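
The logical-vs-visual distinction above can be illustrated with a minimal sketch. It is not Emacs code: it assumes every character already has a resolved embedding level (even = LTR, odd = RTL) and applies only the reordering rule L2 of the Unicode Bidirectional Algorithm. The names toy_reorder_line, toy_eol_column and TOY_MAX_LINE are hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_LINE 64

/* Apply rule L2 of the Unicode Bidirectional Algorithm to one line.
   levels[i] is the already-resolved embedding level of the character
   at logical index i (even = LTR, odd = RTL, never negative).  On
   return, visual[k] is the logical index of the character shown in
   column k, where column 0 is the leftmost one.  */
static void
toy_reorder_line (const int *levels, size_t n, size_t *visual)
{
  int highest = 0;
  int lowest_odd = -1;

  for (size_t i = 0; i < n; i++)
    {
      visual[i] = i;
      if (levels[i] > highest)
        highest = levels[i];
      if (levels[i] % 2 == 1 && (lowest_odd < 0 || levels[i] < lowest_odd))
        lowest_odd = levels[i];
    }
  if (lowest_odd < 0)
    return;                     /* Pure LTR line: nothing to reorder.  */

  /* From the highest level down to the lowest odd level, reverse every
     maximal run of characters whose level is at least LVL.  */
  for (int lvl = highest; lvl >= lowest_odd; lvl--)
    {
      size_t i = 0;
      while (i < n)
        {
          if (levels[visual[i]] < lvl)
            {
              i++;
              continue;
            }
          size_t j = i;
          while (j + 1 < n && levels[visual[j + 1]] >= lvl)
            j++;
          for (size_t a = i, b = j; a < b; a++, b--)
            {
              size_t tmp = visual[a];
              visual[a] = visual[b];
              visual[b] = tmp;
            }
          i = j + 1;
        }
    }
}

/* Return the visual column of the last character in logical order
   (the character "at EOL"), or -1 if the line is empty or longer than
   TOY_MAX_LINE.  */
static long
toy_eol_column (const int *levels, size_t n)
{
  size_t visual[TOY_MAX_LINE];

  if (n == 0 || n > TOY_MAX_LINE)
    return -1;
  toy_reorder_line (levels, n, visual);
  for (size_t k = 0; k < n; k++)
    if (visual[k] == n - 1)
      return (long) k;
  return -1;
}
```

With levels {1, 1, 2, 2, 2} (two RTL letters followed by three LTR characters in a right-to-left paragraph), toy_eol_column returns 2: the logical EOL character sits in the middle of the line. With an all-RTL line {1, 1, 1} it returns 0, the left edge.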

> [1] There is also a top-down layout, but I don’t think we need to worry about that.

Right.




* Re: Line wrap reconsidered
  2020-06-19 12:38                                             ` Eli Zaretskii
@ 2020-06-19 17:22                                               ` Yuan Fu
  2020-06-19 17:47                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-06-19 17:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On Jun 19, 2020, at 8:38 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 19 Jun 2020 08:04:47 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>>> Maybe I don't understand the exact meaning of NOT_AT_EOL/NOT_AT_BOL
>>> that Kinsoku assigns to that.  Can you provide a formal definition of
>>> that, or point me to some document where that is explained?  
>> 
>> Since kinsoku.el is for asian characters which are all LTR[1], the exact meaning of NOT_AT_EOL/NOT_AT_BOL in bidi context probably doesn’t really matter, but to make kinsoku retain the same behavior (thus looks right) in both RTL and LTR lines, I choose to define BOL as left edge and EOL as right edge. So NOT_AT_EOL means can’t be the right-most character in a line.
>> 
>> From your message I thought in RTL lines the iterator draws from right to left (you said each glyph is prepended to the previous one). So in RTL context when we are at the end of a logical line, we are at the left edge; on the other hand, in normal LTR context when we are at the end of a logical line, we are at the right edge. Hence the flip.
> 
> What do you mean by "in the RTL context"?
> 
> Remember: bidi reordering can take place in two different situations:
> when the paragraph direction is left-to-right, and when it's
> right-to-left.  In the former situation, the lines begin on the left,
> in the latter they begin on the right.  But LTR text, such as CJK
> characters, will always be rendered left-to-right, no matter what is
> the paragraph direction.
> 
> So which "RTL context" did you mean here?

Oooh, so there are four cases: LTR text in an LTR paragraph, LTR text in an RTL paragraph, RTL text in an RTL paragraph, and RTL text in an LTR paragraph. And the order in which the iterator draws glyphs depends on the paragraph direction (although it doesn’t know it). Am I right?

> 
>> As I mentioned above, I don’t think kinsoku cares/is defined for this situation. And I took the definition to assume strict LTR, mapping BOL to left and EOL to right. The ultimate effect is that, no matter what the bidi context is, NOT_AT_EOL character, like 《, never appears at the right edge. So we don’t get
>> 
>> 
>> 我今天看来了本书，感觉挺有意思，名字是《
>> 钢铁是怎样炼成的》。
>> 
>> Instead, we have
>> 
>> 我今天看来了本书，感觉挺有意思，名字是
>> 《钢铁是怎样炼成的》。
> 
> What do you see in the text below?
> 
> אבגד הוזחטיכך למנן 我今天看来了本书，感觉挺有意思，名字是
> 《钢铁是怎样炼成的》。
> 
> (I assume you are reading your email in Emacs; if not, copy/paste this
> text into an Emacs buffer whose bidi-paragraph-direction is nil, and
> look at the resulting display.)
> 
> Does the above look correct, from the Kinsoku POV?  This is how LTR
> CJK text will be displayed in a paragraph with right-to-left base
> direction.  Do you still think something needs to be flipped here?

Kinsoku looks right, yes. However, the period (“。”) seems to be treated as RTL text; I’m not sure why.

> 
>> Now, is that mapping TRT for other characters? I don’t know. But I think it makes sense for kinsoku (again, Asian text, all LTR). IMHO, for a generic definition we could define BOL as the left edge for LTR characters and the right edge for RTL characters. I think that will look good for most text.
> 
> We must use BOL and EOL in their logical-order meanings, otherwise the
> result will be utter confusion.  In the above example, the EOL
> character in the first line is 是, and it is not at the left edge of
> the line.  It is at the logical-order end of the line, i.e. the
> character after it in the buffer position order is the newline.  But
> if we had RTL characters instead of the CJK text above, the character
> at EOL would indeed have been displayed at the left edge of the line.
> 

I see. However, I suggest defining EOL and BOL (in kinsoku) in terms of visual edges rather than logical order, because we use this information (NOT_AT_BOL, etc.) for visual layout. When we are at a window edge and ask whether a character can appear at that edge, we care about the visual aspect rather than the logical order, if you get what I mean.

BTW, what does it->bidi_p mean exactly? Does it mean bidi-display-reordering is t, or that the current paragraph is ‘right-to-left, or that the char at point is RTL, or something else? (I think I used it wrong in the patch.) Also, can I tell whether I’m at the left edge or the right edge?

Thanks for patiently educating me on this, I’m making slow progress :-)

Yuan



* Re: Line wrap reconsidered
  2020-06-19 17:22                                               ` Yuan Fu
@ 2020-06-19 17:47                                                 ` Eli Zaretskii
  2020-06-19 18:03                                                   ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-06-19 17:47 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 19 Jun 2020 13:22:18 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> > What do you mean by "in the RTL context"?
> > 
> > Remember: bidi reordering can take place in two different situations:
> > when the paragraph direction is left-to-right, and when it's
> > right-to-left.  In the former situation, the lines begin on the left,
> > in the latter they begin on the right.  But LTR text, such as CJK
> > characters, will always be rendered left-to-right, no matter what is
> > the paragraph direction.
> > 
> > So which "RTL context" did you mean here?
> 
> Oooh, so there are four cases: LTR text in an LTR paragraph, LTR text in an RTL paragraph, RTL text in an RTL paragraph, and RTL text in an LTR paragraph. And the order in which the iterator draws glyphs depends on the paragraph direction (although it doesn’t know it). Am I right?

You can say that there are 4 cases, yes.  But from the iterator POV,
there are only 2: either the text of the same direction as the
paragraph, or of the opposite direction.

> > אבגד הוזחטיכך למנן 我今天看来了本书，感觉挺有意思，名字是
> > 《钢铁是怎样炼成的》。

> > 
> > (I assume you are reading your email in Emacs; if not, copy/paste this
> > text into an Emacs buffer whose bidi-paragraph-direction is nil, and
> > look at the resulting display.)
> > 
> > Does the above look correct, from the Kinsoku POV?  This is how LTR
> > CJK text will be displayed in a paragraph with right-to-left base
> > direction.  Do you still think something needs to be flipped here?
> 
> Kinsoku looks right, yes. However the period (“。”) seems to be interpreted as RTL text, not sure why.

That's expected, since the period has a "weak directionality", so at
the end of the paragraph it takes the paragraph direction.

> > We must use BOL and EOL in their logical-order meanings, otherwise the
> > result will be utter confusion.  In the above example, the EOL
> > character in the first line is 是, and it is not at the left edge of
> > the line.  It is at the logical-order end of the line, i.e. the
> > character after it in the buffer position order is the newline.  But
> > if we had RTL characters instead of the CJK text above, the character
> > at EOL would indeed have been displayed at the left edge of the line.
> > 
> 
> I see. However, I suggest to define EOL and BOL (in kinsoku) in terms of visual edges, instead of the logical order. Because we are using this information (NOT_AT_BOL, etc) for visual layout. When we are at a window edge and ask if this character can appear at this edge, we are interested in the visual aspect rather than the logical order, if you get what I mean. 

If that works, then fine.

> BTW, what does it->bidi_p mean exactly? Does it mean bidi-display-reordering is t, or current paragraph is ‘right-to-left, or the char at point is RTL, or something else?

It means bidi reordering is in effect.  For displaying buffer text, it
is determined by bidi-display-reordering.

> Can I know whether I’m at the left edge or the right edge?

You can, but why do you need to?




* Re: Line wrap reconsidered
  2020-06-19 17:47                                                 ` Eli Zaretskii
@ 2020-06-19 18:03                                                   ` Yuan Fu
  2020-06-19 18:34                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-06-19 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On Jun 19, 2020, at 1:47 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 19 Jun 2020 13:22:18 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>>> What do you mean by "in the RTL context"?
>>> 
>>> Remember: bidi reordering can take place in two different situations:
>>> when the paragraph direction is left-to-right, and when it's
>>> right-to-left.  In the former situation, the lines begin on the left,
>>> in the latter they begin on the right.  But LTR text, such as CJK
>>> characters, will always be rendered left-to-right, no matter what is
>>> the paragraph direction.
>>> 
>>> So which "RTL context" did you mean here?
>> 
>> Oooh, so there are four cases: LTR text in an LTR paragraph, LTR text in an RTL paragraph, RTL text in an RTL paragraph, and RTL text in an LTR paragraph. And the order in which the iterator draws glyphs depends on the paragraph direction (although it doesn’t know it). Am I right?
> 
> You can say that there are 4 cases, yes.  But from the iterator POV,
> there are only 2: either the text of the same direction as the
> paragraph, or of the opposite direction.
> 
>>> אבגד הוזחטיכך למנן 我今天看来了本书，感觉挺有意思，名字是
>>> 《钢铁是怎样炼成的》。
> 
>>> 
>>> (I assume you are reading your email in Emacs; if not, copy/paste this
>>> text into an Emacs buffer whose bidi-paragraph-direction is nil, and
>>> look at the resulting display.)
>>> 
>>> Does the above look correct, from the Kinsoku POV?  This is how LTR
>>> CJK text will be displayed in a paragraph with right-to-left base
>>> direction.  Do you still think something needs to be flipped here?
>> 
>> Kinsoku looks right, yes. However the period (“。”) seems to be interpreted as RTL text, not sure why.
> 
> That's expected, since the period has a "weak directionality", so at
> the end of the paragraph it takes the paragraph direction.
> 
>>> We must use BOL and EOL in their logical-order meanings, otherwise the
>>> result will be utter confusion.  In the above example, the EOL
>>> character in the first line is 是, and it is not at the left edge of
>>> the line.  It is at the logical-order end of the line, i.e. the
>>> character after it in the buffer position order is the newline.  But
>>> if we had RTL characters instead of the CJK text above, the character
>>> at EOL would indeed have been displayed at the left edge of the line.
>>> 
>> 
>> I see. However, I suggest to define EOL and BOL (in kinsoku) in terms of visual edges, instead of the logical order. Because we are using this information (NOT_AT_BOL, etc) for visual layout. When we are at a window edge and ask if this character can appear at this edge, we are interested in the visual aspect rather than the logical order, if you get what I mean. 
> 
> If that works, then fine.
> 
>> BTW, what does it->bidi_p mean exactly? Does it mean bidi-display-reordering is t, or current paragraph is ‘right-to-left, or the char at point is RTL, or something else?
> 
> It means bidi reordering is in effect.  For displaying buffer text, it
> is determined by bidi-display-reordering.
> 
>> Can I know whether I’m at the left edge or the right edge?
> 
> You can, but why do you need to?

I don’t need to know whether I’m at the left or right edge; I just want to know whether the iterator is drawing from right to left or left to right. What is the right way to find that out?

Yuan





* Re: Line wrap reconsidered
  2020-06-19 18:03                                                   ` Yuan Fu
@ 2020-06-19 18:34                                                     ` Eli Zaretskii
  2020-07-12 17:25                                                       ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-06-19 18:34 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 19 Jun 2020 14:03:24 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> I don’t have to know if I’m at left or right edge, I just want to know if the iterator is drawing from right to left or left to right. What is the right way to know that?

Look at the reversed_p flag of the glyph row.
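
As a rough illustration of what such a flag implies, here is a toy model, not the real glyph row structure from dispextern.h. It assumes, as described earlier in this thread, that glyphs are prepended in a reversed row and appended otherwise. struct mock_row, mock_row_add and MOCK_ROW_MAX are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_ROW_MAX 32

/* Toy stand-in for a glyph row; each "glyph" is one char.  */
struct mock_row
{
  bool reversed_p;                 /* True if the row is laid out right to left.  */
  char glyphs[MOCK_ROW_MAX + 1];   /* Glyphs in visual order, NUL-terminated.  */
  size_t used;                     /* Number of glyphs stored so far.  */
};

/* Add glyph G to ROW in the order the iterator delivers it.  In a
   reversed row each new glyph goes in front of the previous ones, so
   the last glyph delivered ends up at the left edge.  Return false if
   the row is full.  */
static bool
mock_row_add (struct mock_row *row, char g)
{
  if (row->used >= MOCK_ROW_MAX)
    return false;
  if (row->reversed_p)
    {
      memmove (row->glyphs + 1, row->glyphs, row->used);
      row->glyphs[0] = g;
    }
  else
    row->glyphs[row->used] = g;
  row->used++;
  row->glyphs[row->used] = '\0';
  return true;
}
```

Feeding 'a', 'b', 'c' to a row with reversed_p set leaves "cba" in visual order; without the flag the row holds "abc".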




* Re: Line wrap reconsidered
  2020-06-19 18:34                                                     ` Eli Zaretskii
@ 2020-07-12 17:25                                                       ` Yuan Fu
  2020-07-12 18:27                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-07-12 17:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel


> On Jun 19, 2020, at 2:34 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 19 Jun 2020 14:03:24 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> I don’t have to know if I’m at left or right edge, I just want to know if the iterator is drawing from right to left or left to right. What is the right way to know that?
> 
> Look at the reversed_p flag of the glyph row.


I’m having a strange problem: when bidi-paragraph-direction is ‘left-to-right, Arabic characters wrap fine but Hebrew characters don’t. Is there anything special about Hebrew (or Arabic)?

Yuan



* Re: Line wrap reconsidered
  2020-07-12 17:25                                                       ` Yuan Fu
@ 2020-07-12 18:27                                                         ` Eli Zaretskii
  2020-07-12 19:28                                                           ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-07-12 18:27 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 12 Jul 2020 13:25:03 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> I’m having a strange problem: when bidi-paragraph-direction is ‘left-to-right, Arabic characters wrap fine but Hebrew characters don’t. Is there anything special about Hebrew (or Arabic)?

Not enough information.  Is this with your changes?




* Re: Line wrap reconsidered
  2020-07-12 18:27                                                         ` Eli Zaretskii
@ 2020-07-12 19:28                                                           ` Yuan Fu
  2020-07-13 19:46                                                             ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-07-12 19:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2167 bytes --]

> On Jul 12, 2020, at 2:27 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 12 Jul 2020 13:25:03 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> I’m having a strange problem: when bidi-paragraph-direction is ‘left-to-right, Arabic characters wrap fine but Hebrew characters don’t. Is there anything special about Hebrew (or Arabic)?
> 
> Not enough information.  Is this with your changes?

Yes, this only happens with my patch, and that’s what puzzles me. Here I enabled whitespace-mode; as you can see, the second line starts with whitespace. That means Emacs cannot find a wrap point in the Hebrew text, so it is not wrapping but simply continuing the Hebrew text onto the next line.

As you might remember, in my patch the function char_can_wrap_after determines whether we can set a wrap point after a character. Since xdisp uses this function to decide whether to set a wrap point, it must be the culprit. It looks like this:

/* Return true if the current character allows wrapping after it.   */
static bool char_can_wrap_after (struct it *it)
{
  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.  */
  bool kinsoku_can_wrap;
  if (it->glyph_row && it->glyph_row->reversed_p)
    kinsoku_can_wrap = IT_CHAR_HAS_CATEGORY (it, LINE_BREAKABLE)
      && !IT_CHAR_HAS_CATEGORY (it, NOT_AT_BOL);
  else
    kinsoku_can_wrap = IT_CHAR_HAS_CATEGORY (it, LINE_BREAKABLE)
      && !IT_CHAR_HAS_CATEGORY (it, NOT_AT_EOL);

  /* We used to only check for whitespace characters for wrapping,
     hence this macro.  Obviously you can wrap after a space or
     tab.  */
  return (IT_DISPLAYING_WHITESPACE (it) /* || kinsoku_can_wrap */);
}

Basically, we can wrap if this character is whitespace, or if kinsoku says this character is wrap-able. Now, if I comment out the kinsoku part (the part after || on the last line), Hebrew wraps normally. But the value of kinsoku_can_wrap shouldn’t matter, since IT_DISPLAYING_WHITESPACE (it) should short-circuit the || when we are at a whitespace character.

Any guess as to where the problem could be?

Yuan

[-- Attachment #2.1: Type: text/html, Size: 3400 bytes --]

[-- Attachment #2.2: Screen Shot 2020-07-12 at 3.12.41 PM.png --]
[-- Type: image/png, Size: 73560 bytes --]


* Re: Line wrap reconsidered
  2020-07-12 19:28                                                           ` Yuan Fu
@ 2020-07-13 19:46                                                             ` Yuan Fu
  2020-07-18  8:15                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-07-13 19:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 613 bytes --]

I finally figured it out. The problem seems to be the use of the macro IT_CHAR_HAS_CATEGORY, which is modeled after IT_DISPLAYING_WHITESPACE. IT_CHAR_HAS_CATEGORY uses a series of ||. I changed it to a function that uses if … else if … else (which IMO should be the correct way to implement it anyway), and the problem went away.
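
To show how a chain of || can behave differently from if … else if … else, here is a minimal sketch. Whether this is exactly what went wrong in xdisp.c is an assumption; the sketch only demonstrates the general difference. struct toy_it, toy_has_category, TOY_HAS_CATEGORY_CHAIN and toy_has_category_fn are hypothetical stand-ins, and the "category" is simply "non-ASCII byte".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy iterator: it may hold an already-decoded character, a pointer
   into some text, or both.  */
struct toy_it
{
  bool have_char;           /* True if C is the current character.  */
  int c;                    /* Decoded current character.  */
  const unsigned char *s;   /* Underlying text, or NULL.  */
  size_t pos;               /* Byte position in S.  */
};

/* Hypothetical category test: true for anything outside ASCII.  */
static bool
toy_has_category (int ch)
{
  return ch >= 0x80;
}

/* || chain: when the first source applies but its test fails, the
   chain falls through and also consults the second source.  */
#define TOY_HAS_CATEGORY_CHAIN(it)                              \
  (((it)->have_char && toy_has_category ((it)->c))              \
   || ((it)->s && toy_has_category ((it)->s[(it)->pos])))

/* if / else if: the first applicable source decides the answer and
   the later ones are never consulted.  */
static bool
toy_has_category_fn (const struct toy_it *it)
{
  if (it->have_char)
    return toy_has_category (it->c);
  else if (it->s)
    return toy_has_category (it->s[it->pos]);
  else
    return false;
}
```

For an iterator whose current character is a space while s happens to point at a non-ASCII byte, the chain answers true and the function answers false. With a whitespace test the fall-through is harmless, because every branch looks for the same literal characters, which may be why the pattern works for IT_DISPLAYING_WHITESPACE.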

Please have a look at the patch and see if it’s ok. If you think it’s good I can then update NEWS and the manual and submit a bug report. wrap.txt is the file I used to test word wrapping. To enable the full feature, set cjk-word-wrap to t and load kinsoku.el.
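
For readers who want the gist before opening the patch, here is a simplified sketch of the wrap-point bookkeeping. It is only a toy over ASCII strings under stated assumptions: uppercase letters and parentheses stand in for LINE_BREAKABLE characters, '(' for a NOT_AT_EOL character and ')' for a NOT_AT_BOL character. All toy_* names are hypothetical, and the real code in xdisp.c handles many more cases.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static bool
toy_whitespace_p (char c)
{
  return c == ' ' || c == '\t';
}

/* Stand-in for the LINE_BREAKABLE category.  */
static bool
toy_breakable_p (char c)
{
  return isupper ((unsigned char) c) || c == '(' || c == ')';
}

/* A break may precede C unless C is whitespace or must not start a
   line (here: ')').  */
static bool
toy_can_wrap_before (char c)
{
  return !toy_whitespace_p (c) && c != ')';
}

/* A break may follow C if C is whitespace, or if C is breakable and
   is allowed to end a line (here: anything but '(').  */
static bool
toy_can_wrap_after (char c)
{
  return toy_whitespace_p (c) || (toy_breakable_p (c) && c != '(');
}

/* Return the largest index I <= WIDTH such that a new screen line may
   start at TEXT[I], or -1 if there is none.  MAY_WRAP means "the
   previous character allows wrapping after it"; it is consumed before
   being updated for the next iteration.  */
static long
toy_find_wrap_point (const char *text, size_t width)
{
  long wrap = -1;
  bool may_wrap = false;

  for (size_t i = 0; text[i] != '\0' && i <= width; i++)
    {
      if (may_wrap && toy_can_wrap_before (text[i]))
        wrap = (long) i;
      may_wrap = toy_can_wrap_after (text[i]);
    }
  return wrap;
}
```

With width 4, "ABC(DEF)" yields 3, so the '(' starts the next line instead of ending the current one. With width 3, "ABC)DEF" yields 2, so the ')' is not left alone at the start of the next line.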

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 10661 bytes --]

From 5df09de567796f62472c601afe92df5179f54911 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
IT_DISPLAYING_WHITESPACE with char_can_wrap_before and
char_can_wrap_after.
(cjk-word-wrap): New variable.
---
 src/xdisp.c | 169 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 130 insertions(+), 39 deletions(-)

diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..3372a8aaa5 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,69 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  */
+#define NOT_AT_EOL 60 /* < */
+#define NOT_AT_BOL 62 /* > */
+#define LINE_BREAKABLE 124 /* | */
+
+static bool it_char_has_category (struct it *it, int cat)
+{
+  if (it->what == IT_CHARACTER)
+    return CHAR_HAS_CATEGORY (it->c, cat);
+  else if (STRINGP (it->string))
+    return CHAR_HAS_CATEGORY (SREF (it->string,
+                                    IT_STRING_BYTEPOS (*it)), cat);
+  else if (it->s)
+    return CHAR_HAS_CATEGORY (it->s[IT_BYTEPOS (*it)], cat);
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    return CHAR_HAS_CATEGORY (*BYTE_POS_ADDR (IT_BYTEPOS (*it)), cat);
+  else
+    return false;
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  if (!Vcjk_word_wrap)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+          // Can be at BOL.
+          && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  if (!Vcjk_word_wrap)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+          // Can break after && can be at EOL.
+            || (it_char_has_category (it, LINE_BREAKABLE)
+                && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9249,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9271,17 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
 		}
+	      /* This has to run after the previous block because the
+		 previous block consumes `may_wrap' and this block
+		 sets it, but the value set by this block is intended
+		 for the _next_ character/iteration.  */
+	      if (char_can_wrap_after (it))
+		/* may_wrap basically means "previous char allows
+		   wrapping after it".  */
+		may_wrap = true;
+	      else
+		may_wrap = false;
 	    }
 	}
 
@@ -9335,10 +9409,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+                                 wrap after it, but the current
+                                 character says we can't wrap before
+                                 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9424,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9503,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && (!moved_forward || char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23365,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,9 +23380,13 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
-	    }
+              /* This has to run after the previous block.  */
+	      if (char_can_wrap_after (it))
+		may_wrap = true;
+              else
+                may_wrap = false;
+            }
 	}
 
       PRODUCE_GLYPHS (it);
@@ -23433,14 +23509,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23548,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34677,14 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
+    doc: /*  Non-nil means wrap after CJK characters.
+Normally when word-wrapping is on, Emacs only breaks lines after
+whitespace characters.  When this option is turned on, Emacs also
+breaks lines after CJK characters.  If kinsoku.el is loaded, Emacs also
+respects kinsoku when breaking lines.  */);
+  Vcjk_word_wrap = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0


[-- Attachment #4: wrap.txt --]
[-- Type: text/plain, Size: 892 bytes --]

RTL in RTL

الثعلب البني السريع يقفز فوق الكلب الكسول.الثعلب البني السريع يقفز فوق الكلب الكسول.الثعلب البني السريع يقز فوق ال 《الكسو  ل.الثعل 》 ب الكسول.الثعلب البني السريع يقفز

LTR in LTR

中文中文中文中文中文中文中文中文中文中文中文中文中文中文，中文《中文中文》中文中文中文中文中文中中文中文中文中文中中文中文中文中文中文

LTR in RTL

אבגד הוזחטיכך למנן 我今天看了本书，感觉挺有意思，名字是《钢铁是怎样炼成的》

RTL in LTR

我今天看了本书，感觉挺有意思，名字是 הוזחטיכך הוזחטיכך הוזחטיכך《אבג דהוזחטיכך הוזחטיכך למנן》 הוזחטיכך הוזחטיכך הוזחטיכך


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-13 19:46                                                             ` Yuan Fu
@ 2020-07-18  8:15                                                               ` Eli Zaretskii
  2020-07-18 17:14                                                                 ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-07-18  8:15 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 13 Jul 2020 15:46:16 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> Please have a look at the patch and see if it’s ok. If you think it’s good I can then update NEWS and the manual and submit a bug report. wrap.txt is the file I used to test word wrapping. To enable the full feature, set cjk-word-wrap to t and load kinsoku.el.

Yes, we need to update NEWS and the manual.

Also, we may need to rename cjk-word-wrap to something more accurate,
as a result of your answers to my questions below.

A few minor comments below.

> * src/xdisp.c (it_char_has_category, char_can_wrap_before,
> char_can_wrap_after): New function.
                        ^^^^^^^^^^^^
"New functions", in plural.

> (move_it_in_display_line_to, display_line): Replace
> IT_DISPLAYING_WHITESPACE with char_can_wrap_before and
> char_can_wrap_after.

Please quote all references in commit log messages to functions and
variables 'like this'.

> +/* These are the category sets we use.  */
> +#define NOT_AT_EOL 60 /* < */
> +#define NOT_AT_BOL 62 /* > */
> +#define LINE_BREAKABLE 124 /* | */

Why not just use the characters themselves, as in '<' and '|' ?

Also, if these characters are from kinsoku.el, please say so in
comments, because if kinsoku.el changes, we may need to update those.

> +static bool it_char_has_category(struct it *it, int cat)
> +{
> +  if (it->what == IT_CHARACTER)
> +    return CHAR_HAS_CATEGORY (it->c, cat);
> +  else if (STRINGP (it->string))
> +    return CHAR_HAS_CATEGORY (SREF (it->string,
> +                                    IT_STRING_BYTEPOS (*it)), cat);
> +  else if (it->s)
> +    return CHAR_HAS_CATEGORY (it->s[IT_BYTEPOS (*it)], cat);
> +  else if (IT_BYTEPOS (*it) < ZV_BYTE)
> +    return CHAR_HAS_CATEGORY (*BYTE_POS_ADDR (IT_BYTEPOS (*it)), cat);
> +  else
> +    return false;
> +}

A minor stylistic nit: I'd prefer the if - elseif clauses to yield the
relevant character, and then apply CHAR_HAS_CATEGORY only once to that
character at the end.  (It is generally better to have only one return
point from a function, especially when the function is short.  If
nothing else, it makes debugging easier.)

> +  return (!IT_DISPLAYING_WHITESPACE (it)
> +          // Can be at BOL.

Please don't use //-style C++ comments, we use the C /* style */
comments instead.

> +  return (IT_DISPLAYING_WHITESPACE (it)
> +          // Can break after && can be at EOL.
> +            || (it_char_has_category (it, LINE_BREAKABLE)
> +                && !it_char_has_category (it, not_at_eol)));

Same here.

>  	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>  	    {
> -	      if (IT_DISPLAYING_WHITESPACE (it))
> -		may_wrap = true;
> -	      else if (may_wrap)
> +              /* Can we wrap here? */
> +	      if (may_wrap && char_can_wrap_before (it))

I'm worried about a potential change in logic here, when cjk-word-wrap
is off.  Previously, we just tested IT_DISPLAYING_WHITESPACE, but now
we also test may_wrap.  Is it guaranteed that may_wrap is always true
in that case?

> @@ -23292,9 +23365,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
>  
>  	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>  	    {
> -	      if (IT_DISPLAYING_WHITESPACE (it))
> -		may_wrap = true;
> -	      else if (may_wrap)
> +              /* Can we wrap here? */
> +	      if (may_wrap && char_can_wrap_before (it))

Likewise here.

>  		{
>  		  SAVE_IT (wrap_it, *it, wrap_data);
>  		  wrap_x = x;
> @@ -23308,9 +23380,13 @@ #define RECORD_MAX_MIN_POS(IT)					\
>  		  wrap_row_min_bpos = min_bpos;
>  		  wrap_row_max_pos = max_pos;
>  		  wrap_row_max_bpos = max_bpos;
> -		  may_wrap = false;
>  		}
> -	    }
> +              /* This has to run after the previous block.  */
> +	      if (char_can_wrap_after (it))
> +		may_wrap = true;
> +              else
> +                may_wrap = false;

Please use TABs and spaces to indent code in C source files.  The last
2 lines use only spaces.

> +  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
> +    doc: /*  Non-nil means wrap after CJK chracters.

This is unclear.  Does it mean after _any_ CJK character, or just
after some?  And if the latter, which ones?

Thanks.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-18  8:15                                                               ` Eli Zaretskii
@ 2020-07-18 17:14                                                                 ` Yuan Fu
  2020-07-18 19:49                                                                   ` Yuan Fu
                                                                                     ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Yuan Fu @ 2020-07-18 17:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 5321 bytes --]



> On Jul 18, 2020, at 4:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 13 Jul 2020 15:46:16 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> Please have a look at the patch and see if it’s ok. If you think it’s good I can then update NEWS and the manual and submit a bug report. wrap.txt is the file I used to test word wrapping. To enable the full feature, set cjk-word-wrap to t and load kinsoku.el.
> 
> Yes, we need to update NEWS and the manual.
> 
> Also, we may need to rename cjk-word-wrap to something more accurate,
> as result of your answers to my questions below.

Cool, I’ll start on NEWS and manual once we are settled on the name of the new variable. I agree cjk-word-wrap isn’t a good name. I just used it as a placeholder.

> 
> A few minor comments below.
> 
>> * src/xdisp.c (it_char_has_category, char_can_wrap_before,
>> char_can_wrap_after): New function.
>                        ^^^^^^^^^^^^
> "New functions", in plural.
> 
>> (move_it_in_display_line_to, display_line): Replace
>> IT_DISPLAYING_WHITESPACE with char_can_wrap_before and
>> char_can_wrap_after.
> 
> Please quote all references in commit log messages to functions and
> variables 'like this'.
> 
>> +/* These are the category sets we use.  */
>> +#define NOT_AT_EOL 60 /* < */
>> +#define NOT_AT_BOL 62 /* > */
>> +#define LINE_BREAKABLE 124 /* | */
> 
> Why not just use the characters themselves, as in '<' and '|' ?
> 
> Also, if these characters are from kinsoku.el, please says so in
> comments, because if kinsoku.el changes, we may need to update those.
> 

Fixed.

>> +static bool it_char_has_category(struct it *it, int cat)
>> +{
>> +  if (it->what == IT_CHARACTER)
>> +    return CHAR_HAS_CATEGORY (it->c, cat);
>> +  else if (STRINGP (it->string))
>> +    return CHAR_HAS_CATEGORY (SREF (it->string,
>> +                                    IT_STRING_BYTEPOS (*it)), cat);
>> +  else if (it->s)
>> +    return CHAR_HAS_CATEGORY (it->s[IT_BYTEPOS (*it)], cat);
>> +  else if (IT_BYTEPOS (*it) < ZV_BYTE)
>> +    return CHAR_HAS_CATEGORY (*BYTE_POS_ADDR (IT_BYTEPOS (*it)), cat);
>> +  else
>> +    return false;
>> +}
> 
> A minor stylistic nit: I'd prefer the if - elseif clauses to yield the
> relevant character, and then apply CHAR_HAS_CATEGORY only once to that
> character at the end.  (It is generally better to have only one return
> point from a function, especially when the function is short.  If
> nothing else, it makes debugging easier.)

I changed the it, do you code below this is ok?

  if (ch == 0)
    return false;
  else
    return CHAR_HAS_CATEGORY(ch, cat);


> 
>> +  return (!IT_DISPLAYING_WHITESPACE (it)
>> +          // Can be at BOL.
> 
> Please don't use //-style C++ comments, we use the C /* style */
> comments instead.
> 
>> +  return (IT_DISPLAYING_WHITESPACE (it)
>> +          // Can break after && can be at EOL.
>> +            || (it_char_has_category (it, LINE_BREAKABLE)
>> +                && !it_char_has_category (it, not_at_eol)));
> 
> Same here.

Fixed.

> 
>> 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>> 	    {
>> -	      if (IT_DISPLAYING_WHITESPACE (it))
>> -		may_wrap = true;
>> -	      else if (may_wrap)
>> +              /* Can we wrap here? */
>> +	      if (may_wrap && char_can_wrap_before (it))
> 
> I'm worried about a potential change in logic here, when cjk-word-wrap
> is off.  Previously, we just tested IT_DISPLAYING_WHITESPACE, but now
> we also test may_wrap.  Is it guaranteed that may_wrap is always true
> in that case?
> 
>> @@ -23292,9 +23365,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
>> 
>> 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>> 	    {
>> -	      if (IT_DISPLAYING_WHITESPACE (it))
>> -		may_wrap = true;
>> -	      else if (may_wrap)
>> +              /* Can we wrap here? */
>> +	      if (may_wrap && char_can_wrap_before (it))
> 
> Likewise here.


In both char_can_wrap_before and char_can_wrap_after, I have a short circuit for the case when cjk-word-wrap is nil (char_can_wrap_before returns the negated test):

  if (!Vcjk_word_wrap)
    return IT_DISPLAYING_WHITESPACE (it);

That should guarantee the old behavior when cjk_word_wrap is nil, if that’s what you are asking about.

> 
>> 		{
>> 		  SAVE_IT (wrap_it, *it, wrap_data);
>> 		  wrap_x = x;
>> @@ -23308,9 +23380,13 @@ #define RECORD_MAX_MIN_POS(IT)					\
>> 		  wrap_row_min_bpos = min_bpos;
>> 		  wrap_row_max_pos = max_pos;
>> 		  wrap_row_max_bpos = max_bpos;
>> -		  may_wrap = false;
>> 		}
>> -	    }
>> +              /* This has to run after the previous block.  */
>> +	      if (char_can_wrap_after (it))
>> +		may_wrap = true;
>> +              else
>> +                may_wrap = false;
> 
> Please use TABs and spaces to indent code in C source files.  The last
> 2 lines use only spaces.

Sorry, fixed.

> 
>> +  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
>> +    doc: /*  Non-nil means wrap after CJK chracters.
> 
> This is unclear.  Does it mean after _any_ CJK character, or just
> after some?  And if the latter, which ones?

I added more detail; hopefully it is clearer now.
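
To make the documented rule concrete, here is a minimal sketch of how the three categories combine into a break decision. It is not Emacs code: the category assignments in `toy_categories` are assumptions for illustration (in Emacs they come from the category table, with "|" set up in characters.el and "<" and ">" by kinsoku.el), all function names are hypothetical, and the BOL/EOL flip for RTL paragraphs is left out.

```c
#include <stdbool.h>

/* Toy category flags for this sketch only.  */
enum
{
  CAT_BREAKABLE = 1,   /* '|': line breakable */
  CAT_NOT_AT_EOL = 2,  /* '<': must not end a line */
  CAT_NOT_AT_BOL = 4   /* '>': must not start a line */
};

/* Assumed category assignments, not read from any real table.  */
static int
toy_categories (int ch)
{
  switch (ch)
    {
    case 0x300A:  /* LEFT DOUBLE ANGLE BRACKET */
      return CAT_BREAKABLE | CAT_NOT_AT_EOL;
    case 0x300B:  /* RIGHT DOUBLE ANGLE BRACKET */
    case 0xFF0C:  /* FULLWIDTH COMMA */
      return CAT_BREAKABLE | CAT_NOT_AT_BOL;
    default:
      /* Treat CJK Unified Ideographs as breakable, all else as not.  */
      return (ch >= 0x4E00 && ch <= 0x9FFF) ? CAT_BREAKABLE : 0;
    }
}

static bool
toy_is_whitespace (int ch)
{
  return ch == ' ' || ch == '\t';
}

/* Whitespace always allows a break after it; otherwise the character
   must be breakable and allowed at end of line.  */
static bool
toy_can_wrap_after (int ch)
{
  int cats = toy_categories (ch);
  return toy_is_whitespace (ch)
         || ((cats & CAT_BREAKABLE) && !(cats & CAT_NOT_AT_EOL));
}

/* No break before whitespace or before a character that must not
   start a line.  */
static bool
toy_can_wrap_before (int ch)
{
  return !toy_is_whitespace (ch)
         && !(toy_categories (ch) & CAT_NOT_AT_BOL);
}

/* A break between PREV and NEXT needs both sides to agree.  */
static bool
toy_can_break_between (int prev, int next)
{
  return toy_can_wrap_after (prev) && toy_can_wrap_before (next);
}
```

Under these assumptions a break is allowed between two ideographs and after a fullwidth comma, but not right after a left double angle bracket nor right before a right one.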

> 
> Thanks.

Thanks!

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 10927 bytes --]

From 2baf9b6fd7dc8aa63f61d9dc14dbbb60cbb8c1fa Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(cjk-word-wrap): New variable.
---
 src/xdisp.c | 180 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 142 insertions(+), 38 deletions(-)

diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..35ff381829 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,77 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  They are defined by
+   kinsoku.el.  */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool it_char_has_category(struct it *it, int cat)
+{
+  int ch = 0;
+  if (it->what == IT_CHARACTER)
+    ch = it->c;
+  else if (STRINGP (it->string))
+    ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+  else if (it->s)
+    ch = it->s[IT_BYTEPOS (*it)];
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+  if (ch == 0)
+    return false;
+  else
+    return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  if (!Vcjk_word_wrap)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+	  /* Can be at BOL.  */
+	  && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  if (!Vcjk_word_wrap)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+	  /* Can break after && can be at EOL.  */
+	  || (it_char_has_category (it, LINE_BREAKABLE)
+	      && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9257,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9279,17 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
 		}
+	      /* This has to run after the previous block because the
+		 previous block consumes `may_wrap' and this block
+		 sets it, but the value set by this block is intended
+		 for the _next_ character/iteration.  */
+	      if (char_can_wrap_after (it))
+		/* may_wrap basically means "previous char allows
+		   wrapping after it".  */
+		may_wrap = true;
+	      else
+		may_wrap = false;
 	    }
 	}
 
@@ -9335,10 +9417,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+				 wrap after it, but the current
+				 character says we can't wrap before
+				 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9432,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9511,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23373,8 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,8 +23388,12 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
+	      /* This has to run after the previous block.  */
+	      if (char_can_wrap_after (it))
+		may_wrap = true;
+	      else
+		may_wrap = false;
 	    }
 	}
 
@@ -23433,14 +23517,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23556,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34685,19 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
+    doc: /*  Non-nil means also wrap after all CJK characters.
+Normally when word-wrapping is on, Emacs only breaks line after
+whitespace characters.  When this option is turned on, Emacs also
+breaks line after CJK characters (more accurately, characters that
+have "|" category defined in characters.el).
+
+If kinsoku.el is loaded, Emacs also respects kinsoku rules when
+breaking lines.  That means some characters don't appear at the
+beginning of a line (e.g., FULLWIDTH COMMA), and some don't appear at
+the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET).  */);
+  Vcjk_word_wrap = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-18 17:14                                                                 ` Yuan Fu
@ 2020-07-18 19:49                                                                   ` Yuan Fu
  2020-07-18 20:25                                                                   ` Stefan Monnier
  2020-07-19 14:52                                                                   ` Eli Zaretskii
  2 siblings, 0 replies; 88+ messages in thread
From: Yuan Fu @ 2020-07-18 19:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 125 bytes --]

Sorry, this should be:

> I changed the it, do you code below this is ok?

* I changed it, do you think the code below is ok?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-18 17:14                                                                 ` Yuan Fu
  2020-07-18 19:49                                                                   ` Yuan Fu
@ 2020-07-18 20:25                                                                   ` Stefan Monnier
  2020-07-19 14:52                                                                   ` Eli Zaretskii
  2 siblings, 0 replies; 88+ messages in thread
From: Stefan Monnier @ 2020-07-18 20:25 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Lars Ingebrigtsen, emacs-devel

> I changed the it, do you code below this is ok?
>
>   if (ch == 0)
>     return false;
>   else
>     return CHAR_HAS_CATEGORY(ch, cat);

Aka

    return (ch == 0) ? false : CHAR_HAS_CATEGORY (ch, cat);

Aka

    return ch && CHAR_HAS_CATEGORY (ch, cat);


-- Stefan


PS: Notice also the space before the open-paren
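
The three spellings can be compared side by side with a stand-in for the macro. This is only a sketch: `FAKE_HAS_CATEGORY` is a hypothetical placeholder (the real `CHAR_HAS_CATEGORY` consults the category table), and the `form_*` names are made up for the comparison.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CHAR_HAS_CATEGORY: pretend that every
   character at or above U+3000 has category '|' and nothing else.
   This is not the real Emacs macro.  */
#define FAKE_HAS_CATEGORY(ch, cat) ((ch) >= 0x3000 && (cat) == '|')

/* The if/else form from the patch.  */
static bool
form_if_else (int ch, int cat)
{
  if (ch == 0)
    return false;
  else
    return FAKE_HAS_CATEGORY (ch, cat);
}

/* The conditional-expression form.  */
static bool
form_ternary (int ch, int cat)
{
  return (ch == 0) ? false : FAKE_HAS_CATEGORY (ch, cat);
}

/* The short-circuit form: the macro is not evaluated when CH is 0.  */
static bool
form_and (int ch, int cat)
{
  return ch && FAKE_HAS_CATEGORY (ch, cat);
}
```

All three return false for character 0 and otherwise defer to the category test, so the choice between them is purely one of style.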




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-18 17:14                                                                 ` Yuan Fu
  2020-07-18 19:49                                                                   ` Yuan Fu
  2020-07-18 20:25                                                                   ` Stefan Monnier
@ 2020-07-19 14:52                                                                   ` Eli Zaretskii
  2020-07-19 16:16                                                                     ` Yuan Fu
  2 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-07-19 14:52 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 18 Jul 2020 13:14:15 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> > A minor stylistic nit: I'd prefer the if - elseif clauses to yield the
> > relevant character, and then apply CHAR_HAS_CATEGORY only once to that
> > character at the end.  (It is generally better to have only one return
> > point from a function, especially when the function is short.  If
> > nothing else, it makes debugging easier.)
> 
> I changed the it, do you code below this is ok?
> 
>   if (ch == 0)
>     return false;
>   else
>     return CHAR_HAS_CATEGORY(ch, cat);

Yes.  Or any of the variants shown by Stefan.

> >> 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
> >> 	    {
> >> -	      if (IT_DISPLAYING_WHITESPACE (it))
> >> -		may_wrap = true;
> >> -	      else if (may_wrap)
> >> +              /* Can we wrap here? */
> >> +	      if (may_wrap && char_can_wrap_before (it))
> > 
> > Likewise here.
> 
> 
> In both can_wrap_before and can_wrap_after, I have a short circuit for the case when cjk_word_wrap is nil:
> 
>   if (!Vcjk_word_wrap)
>     return IT_DISPLAYING_WHITESPACE (it);
> 
> That should guarantee the old behavior when cjk_word_wrap is nil, if that’s what you are asking about.

I've seen that, but what bothers me is not this.  It's the fact that
the old code didn't test may_wrap, whereas the new code does.

> >> +  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
> >> +    doc: /*  Non-nil means wrap after CJK chracters.
> > 
> > This is unclear.  Does it mean after _any_ CJK character, or just
> > after some?  And if the latter, which ones?
> 
> I added more detail and hopefully they are clearer now.

Looks much better, thanks.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-19 14:52                                                                   ` Eli Zaretskii
@ 2020-07-19 16:16                                                                     ` Yuan Fu
  2020-07-19 16:17                                                                       ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-07-19 16:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On Jul 19, 2020, at 10:52 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 18 Jul 2020 13:14:15 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>>> A minor stylistic nit: I'd prefer the if - elseif clauses to yield the
>>> relevant character, and then apply CHAR_HAS_CATEGORY only once to that
>>> character at the end.  (It is generally better to have only one return
>>> point from a function, especially when the function is short.  If
>>> nothing else, it makes debugging easier.)
>> 
>> I changed the it, do you code below this is ok?
>> 
>>  if (ch == 0)
>>    return false;
>>  else
>>    return CHAR_HAS_CATEGORY(ch, cat);
> 
> Yes.  Or any of the variants shown by Stefan.

Cool.

> 
>>>> 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>>>> 	    {
>>>> -	      if (IT_DISPLAYING_WHITESPACE (it))
>>>> -		may_wrap = true;
>>>> -	      else if (may_wrap)
>>>> +              /* Can we wrap here? */
>>>> +	      if (may_wrap && char_can_wrap_before (it))
>>> 
>>> Likewise here.
>> 
>> 
>> In both can_wrap_before and can_wrap_after, I have a short circuit for the case when cjk_word_wrap is nil:
>> 
>>  if (!Vcjk_word_wrap)
>>    return IT_DISPLAYING_WHITESPACE (it);
>> 
>> That should guarantee the old behavior when cjk_word_wrap is nil, if that’s what you are asking about.
> 
> I've seen that, but what bothers me is not this.  It's the fact that
> the old code didn't test may_wrap, whereas the new code does.
> 

I see. I changed the code a bit and added some explanation in the commit message. Hopefully that will convince you that the new logic is equivalent to the old one when cjk-word-wrap is nil.
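
The equivalence can also be checked on a toy model. The following is a minimal sketch, not Emacs code: it assumes the feature is disabled, so "can wrap after" means whitespace and "can wrap before" means non-whitespace, and it ignores the extra control flow of the real loops in xdisp.c. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
sketch_is_ws (char c)
{
  return c == ' ' || c == '\t';
}

/* Old logic: store the index of every wrap point of S in OUT and
   return how many were found.  OUT needs room for strlen (S) entries.  */
static size_t
old_wrap_points (const char *s, size_t *out)
{
  size_t n = 0;
  bool may_wrap = false;
  for (size_t i = 0; s[i] != '\0'; i++)
    {
      if (sketch_is_ws (s[i]))
        may_wrap = true;
      else if (may_wrap)
        {
          out[n++] = i;
          may_wrap = false;
        }
    }
  return n;
}

/* New logic with the feature disabled.  */
static size_t
new_wrap_points (const char *s, size_t *out)
{
  size_t n = 0;
  bool may_wrap = false;
  for (size_t i = 0; s[i] != '\0'; i++)
    {
      bool next_may_wrap = sketch_is_ws (s[i]);  /* can_wrap_after */
      if (may_wrap && !sketch_is_ws (s[i]))      /* can_wrap_before */
        out[n++] = i;
      may_wrap = next_may_wrap;
    }
  return n;
}

/* Return true if both variants find the same wrap points in S.
   Strings longer than 64 characters are rejected to keep the
   buffers fixed-size.  */
static bool
same_wrap_points (const char *s)
{
  enum { MAX_LEN = 64 };
  size_t a[MAX_LEN], b[MAX_LEN];
  if (strlen (s) > MAX_LEN)
    return false;
  size_t na = old_wrap_points (s, a);
  size_t nb = new_wrap_points (s, b);
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (a[i] != b[i])
      return false;
  return true;
}
```

In this model both variants record a wrap point at exactly the first non-whitespace character after each run of whitespace.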

>>>> +  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
>>>> +    doc: /*  Non-nil means wrap after CJK chracters.
>>> 
>>> This is unclear.  Does it mean after _any_ CJK character, or just
>>> after some?  And if the latter, which ones?
>> 
>> I added more detail and hopefully they are clearer now.
> 
> Looks much better, thanks.


BTW, any ideas for alternatives for cjk-word-wrap? Maybe extended-word-wrap?

Yuan




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-07-19 16:16                                                                     ` Yuan Fu
@ 2020-07-19 16:17                                                                       ` Yuan Fu
  2020-08-13 19:35                                                                         ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-07-19 16:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 42 bytes --]

(And here is the new patch, sorry)

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 12151 bytes --]

From 45978ed9703ea7e0d4a94c06ba77786795e671a8 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

Note about the changes around lines 9257 and 23372:

Previously, the whitespace test checked for can_wrap_before and
can_wrap_after simultaneously.  Now we separate these two checks, and
the logic needs to change a little.  However, when the new wrapping
feature is not enabled, 'can_wrap_after' is equivalent to
'IT_DISPLAYING_WHITESPACE' and 'can_wrap_before' is equivalent to
'!IT_DISPLAYING_WHITESPACE', so the new logic is equivalent to the
old one in that case.

Old logic:

    if (whitespace)
      /* Which means can wrap after && can't wrap before.  */
      may_wrap = true;
    else if (may_wrap)
      {
        /* aka (!whitespace && may_wrap), i.e. (can't wrap after
           && can wrap before && may_wrap).  */
        (set wrap point)
        may_wrap = false;
      }

New logic:

    if (can_wrap_after)
      next_may_wrap = true;
    else
      next_may_wrap = false;

    if (may_wrap && can_wrap_before)
      (set wrap point)

    /* Update may_wrap.  */
    may_wrap = next_may_wrap;

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(cjk-word-wrap): New variable.
---
 src/xdisp.c | 182 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 144 insertions(+), 38 deletions(-)

diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..be0f6e6a75 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,77 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  They are defined by
+   kinsoku.el.  */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool it_char_has_category(struct it *it, int cat)
+{
+  int ch = 0;
+  if (it->what == IT_CHARACTER)
+    ch = it->c;
+  else if (STRINGP (it->string))
+    ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+  else if (it->s)
+    ch = it->s[IT_BYTEPOS (*it)];
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+  if (ch == 0)
+    return false;
+  else
+    return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  if (!Vcjk_word_wrap)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+	  /* Can be at BOL.  */
+	  && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  if (!Vcjk_word_wrap)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+	  /* Can break after && can be at EOL.  */
+	  || (it_char_has_category (it, LINE_BREAKABLE)
+	      && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9257,20 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9285,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
+                  next_may_wrap = false;
 		}
+              /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -9335,10 +9416,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+				 wrap after it, but the current
+				 character says we can't wrap before
+				 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9431,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9510,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23372,14 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,8 +23393,9 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
+	      /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -23433,14 +23519,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23558,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34687,19 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
+    doc: /*  Non-nil means also wrap after all CJK characters.
+Normally when word-wrapping is on, Emacs only breaks line after
+whitespace characters.  When this option is turned on, Emacs also
+breaks line after CJK characters (more accurately, characters that
+have "|" category defined in characters.el).
+
+If kinsoku.el is loaded, Emacs also respects kinsoku rules when
+breaking lines.  That means some characters don't appear at the
+beginning of a line (e.g., FULLWIDTH COMMA), and some don't appear at
+the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET).  */);
+  Vcjk_word_wrap = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0
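
The equivalence claimed in the commit message above can be checked with a small standalone model. This is only a sketch, not code from xdisp.c: it assumes a plain C string stands in for the display iterator, `is_ws` is a hypothetical stand-in for `IT_DISPLAYING_WHITESPACE`, and the function names are made up for illustration. It models only the case where the new option is off, so "can wrap after" means "is whitespace" and "can wrap before" means "is not whitespace".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for IT_DISPLAYING_WHITESPACE.  */
bool is_ws (char c)
{
  return c == ' ' || c == '\t';
}

/* Old logic: mark[i] is true when position i is saved as a wrap point.  */
void old_wrap_points (const char *text, bool *mark)
{
  bool may_wrap = false;
  for (size_t i = 0; text[i] != '\0'; i++)
    {
      mark[i] = false;
      if (is_ws (text[i]))
        may_wrap = true;
      else if (may_wrap)
        {
          mark[i] = true;       /* (set wrap point) */
          may_wrap = false;
        }
    }
}

/* New logic with the option disabled, following the commit message.  */
void new_wrap_points (const char *text, bool *mark)
{
  bool may_wrap = false;
  for (size_t i = 0; text[i] != '\0'; i++)
    {
      bool can_wrap_after = is_ws (text[i]);
      bool can_wrap_before = !is_ws (text[i]);
      bool next_may_wrap = can_wrap_after;

      mark[i] = may_wrap && can_wrap_before;  /* (set wrap point) */
      may_wrap = next_may_wrap;               /* Update may_wrap.  */
    }
}

/* Return true if both versions choose the same wrap points in TEXT.  */
bool same_wrap_points (const char *text)
{
  bool old_marks[256], new_marks[256];
  assert (text != NULL);
  size_t n = strlen (text);
  assert (n < sizeof old_marks);
  old_wrap_points (text, old_marks);
  new_wrap_points (text, new_marks);
  return memcmp (old_marks, new_marks, n * sizeof (bool)) == 0;
}
```

In this model, a string such as "ab cd" gets a single wrap point at the `c` under both versions, which is the behavior the old whitespace-only test produced.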



* Re: Line wrap reconsidered
  2020-07-19 16:17                                                                       ` Yuan Fu
@ 2020-08-13 19:35                                                                         ` Yuan Fu
  2020-08-14  5:55                                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-13 19:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel



> On Jul 19, 2020, at 12:17 PM, Yuan Fu <casouri@gmail.com> wrote:
> 
> (And here is the new patch, sorry)
> 
> Yuan
> 
> <word-wrap.patch>

It has been a while; any updates on this?

Yuan




* Re: Line wrap reconsidered
  2020-08-13 19:35                                                                         ` Yuan Fu
@ 2020-08-14  5:55                                                                           ` Eli Zaretskii
  2020-08-14 15:08                                                                             ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-14  5:55 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 13 Aug 2020 15:35:45 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> > (And here is the new patch, sorry)
> > 
> > Yuan
> > 
> > <word-wrap.patch>
> 
> It has been awhile, any updates on this?

AFAIR, I'm waiting for the final version, including NEWS, Custom
definitions for the new option, etc.




* Re: Line wrap reconsidered
  2020-08-14  5:55                                                                           ` Eli Zaretskii
@ 2020-08-14 15:08                                                                             ` Yuan Fu
  2020-08-15  9:10                                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-14 15:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel



> On Aug 14, 2020, at 1:55 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 13 Aug 2020 15:35:45 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>>> (And here is the new patch, sorry)
>>> 
>>> Yuan
>>> 
>>> <word-wrap.patch>
>> 
>> It has been awhile, any updates on this?
> 
> AFAIR, I'm waiting for the final version, including NEWS, Custom
> definitions for the new option, etc.

Ah, I should have asked earlier. I was waiting for your input on the name of the custom option, so I can start writing those things. I suggested extended-word-wrap, WDYT?

Also, judging from the latest patch, are you convinced that the patch doesn’t change the old behavior?

Yuan



* Re: Line wrap reconsidered
  2020-08-14 15:08                                                                             ` Yuan Fu
@ 2020-08-15  9:10                                                                               ` Eli Zaretskii
  2020-08-15 13:10                                                                                 ` Fu Yuan
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-15  9:10 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 14 Aug 2020 11:08:24 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> I was waiting for your input in the name of the custom option, so I can start writing those things. I suggested extended-word-wrap, WDYT?

I think word-wrap-by-character-category is better.

> Also, (judging from the latest patch), are you convinced that the patch doesn’t change the old behavior?

Yes, thanks.




* Re: Line wrap reconsidered
  2020-08-15  9:10                                                                               ` Eli Zaretskii
@ 2020-08-15 13:10                                                                                 ` Fu Yuan
  2020-08-15 14:56                                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Fu Yuan @ 2020-08-15 13:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel



> On Aug 15, 2020, at 5:10 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> 
>> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 14 Aug 2020 11:08:24 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>> I was waiting for your input in the name of the custom option, so I can start writing those things. I suggested extended-word-wrap, WDYT?
> 
> I think word-wrap-by-character-category is better.
> 

That’s kinda long, how about word-wrap-by-category?

>> Also, (judging from the latest patch), are you convinced that the patch doesn’t change the old behavior?
> 
> Yes, thanks.




* Re: Line wrap reconsidered
  2020-08-15 13:10                                                                                 ` Fu Yuan
@ 2020-08-15 14:56                                                                                   ` Eli Zaretskii
  2020-08-15 17:34                                                                                     ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-15 14:56 UTC (permalink / raw)
  To: Fu Yuan; +Cc: larsi, emacs-devel

> From: Fu Yuan <casouri@gmail.com>
> Date: Sat, 15 Aug 2020 09:10:06 -0400
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
> > I think word-wrap-by-character-category is better.
> > 
> 
> That’s kinda long, how about word-wrap-by-category?

Let's try that.




* Re: Line wrap reconsidered
  2020-08-15 14:56                                                                                   ` Eli Zaretskii
@ 2020-08-15 17:34                                                                                     ` Yuan Fu
  2020-08-15 17:46                                                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-15 17:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]



> On Aug 15, 2020, at 10:56 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Fu Yuan <casouri@gmail.com>
>> Date: Sat, 15 Aug 2020 09:10:06 -0400
>> Cc: larsi@gnus.org, emacs-devel@gnu.org
>> 
>>> I think word-wrap-by-character-category is better.
>>> 
>> 
>> That’s kinda long, how about word-wrap-by-category?
> 
> Let's try that.

Here is the patch. I don’t think we are supposed to say (load "kinsoku.el") in the Emacs User Manual, so I created a new command, load-kinsoku, in simple.el. I hope that’s OK?

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 15665 bytes --]

From 3f82a9320729ed2248622e9875d5272b835adcb6 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

Note about the changes around lines 9257 and 23372:

Before, the test for whitespace checked can_wrap_before and
can_wrap_after simultaneously.  Now we separate these two checks, and
the logic needs to change a little bit.  However, when the new
wrapping feature is not enabled, 'can_wrap_after' is equivalent to
'IT_DISPLAYING_WHITESPACE' and 'can_wrap_before' is equivalent to
'!IT_DISPLAYING_WHITESPACE', so the new logic is equivalent to the
old one in that case.

Old logic:

    if (whitespace) /* Which means can wrap after && can't wrap
                       before.  */
      may_wrap = true;

    else if (may_wrap) /* aka (!whitespace && may_wrap)
      (set wrap point)  * aka (can't wrap after && can wrap before
      may_wrap = false  *      && may_wrap)
                        */

New logic:

    if (can_wrap_after)
      next_may_wrap = true
    else
      next_may_wrap = false;

    if (may_wrap && can_wrap_before)
      (set wrap point)

    /* Update may_wrap.  */
    may_wrap = next_may_wrap;

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(word-wrap-by-category): New variable.
* doc/emacs/display.texi (Visual Line Mode): Add a paragraph about the
new feature.
* etc/NEWS: Add a news entry.
* lisp/cus-start.el (minibuffer-prompt-properties--setter): Add
'word-wrap-by-category' as a custom variable.
* lisp/simple.el (load-kinsoku): New function.
---
 doc/emacs/display.texi |  13 +++
 etc/NEWS               |  10 +++
 lisp/cus-start.el      |   1 +
 lisp/simple.el         |   6 ++
 src/xdisp.c            | 185 ++++++++++++++++++++++++++++++++---------
 5 files changed, 177 insertions(+), 38 deletions(-)

diff --git a/doc/emacs/display.texi b/doc/emacs/display.texi
index 536f4cb5da..8ddc717c97 100644
--- a/doc/emacs/display.texi
+++ b/doc/emacs/display.texi
@@ -1801,6 +1801,19 @@ Visual Line Mode
 would be visually distracting.  You can change this by customizing the
 variable @code{visual-line-fringe-indicators}.
 
+@vindex word-wrap-by-category
+@findex load-kinsoku
+  By default, Emacs only breaks lines after whitespace
+characters.  That strategy produces bad results when CJK and Latin text
+are mixed together (because CJK characters don't use whitespace to
+separate words).  You can customize @code{word-wrap-by-category} to
+allow Emacs to break lines after more characters.  That way
+word-wrapping for CJK-Latin text works right.  You can type @kbd{M-x
+load-kinsoku} to enable more advanced line-breaking strategies: some
+characters don't appear at the beginning of a line (e.g., FULLWIDTH
+COMMA), and some don't appear at the end of a line (e.g., LEFT DOUBLE
+ANGLE BRACKET).
+
 @node Display Custom
 @section Customization of Display
 
diff --git a/etc/NEWS b/etc/NEWS
index e97755a454..d84936c1ed 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -70,6 +70,16 @@ specify 'cursor-type' to be '(box . SIZE)', the cursor becomes a hollow
 box if the point is on an image larger than 'SIZE' pixels in any
 dimension.
 
+** New custom option 'word-wrap-by-category'.
+
+When word-wrap is enabled, this option allows Emacs to break lines
+after more characters (instead of just whitespace characters), which
+improves word-wrapping for CJK text mixed with Latin text.
+Also, there is a new command 'load-kinsoku' that, when called,
+enables advanced wrapping strategies that prohibit certain characters
+from appearing at the beginning of a line, and some others at the end
+of a line.
+
 \f
 * Editing Changes in Emacs 28.1
 
diff --git a/lisp/cus-start.el b/lisp/cus-start.el
index 6632687da4..f260573219 100644
--- a/lisp/cus-start.el
+++ b/lisp/cus-start.el
@@ -98,6 +98,7 @@ minibuffer-prompt-properties--setter
 	     (ctl-arrow display boolean)
 	     (truncate-lines display boolean)
 	     (word-wrap display boolean)
+             (word-wrap-by-category display boolean)
 	     (selective-display-ellipses display boolean)
 	     (indicate-empty-lines fringe boolean)
 	     (indicate-buffer-boundaries
diff --git a/lisp/simple.el b/lisp/simple.el
index 111afa69d1..923d964850 100644
--- a/lisp/simple.el
+++ b/lisp/simple.el
@@ -7148,6 +7148,12 @@ visual-line-mode
 (defun turn-on-visual-line-mode ()
   (visual-line-mode 1))
 
+(defun load-kinsoku ()
+  "Load kinsoku features.
+See `word-wrap-by-category' for more information."
+  (interactive)
+  (load "kinsoku.el"))
+
 (define-globalized-minor-mode global-visual-line-mode
   visual-line-mode turn-on-visual-line-mode)
 
diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..6551087375 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,77 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  They are defined by
+   kinsoku.el and characters.el.  */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool it_char_has_category(struct it *it, int cat)
+{
+  int ch = 0;
+  if (it->what == IT_CHARACTER)
+    ch = it->c;
+  else if (STRINGP (it->string))
+    ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+  else if (it->s)
+    ch = it->s[IT_BYTEPOS (*it)];
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+  if (ch == 0)
+    return false;
+  else
+    return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+	  /* Can be at BOL.  */
+	  && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+	  /* Can break after && can be at EOL.  */
+	  || (it_char_has_category (it, LINE_BREAKABLE)
+	      && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9257,20 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9285,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
+                  next_may_wrap = false;
 		}
+              /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -9335,10 +9416,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+				 wrap after it, but the current
+				 character says we can't wrap before
+				 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9431,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9510,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23372,14 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,8 +23393,9 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
+	      /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -23433,14 +23519,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23558,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34687,22 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL("word-wrap-by-category", Vword_wrap_by_category,
+    doc: /*  Non-nil means also wrap after characters of a certain category.
+Normally when `word-wrap' is on, Emacs only breaks lines after
+whitespace characters.  When this option is turned on, Emacs also
+breaks lines after characters that have the "|" category (defined in
+characters.el).  This is useful for allowing breaking after CJK
+characters and improves the word-wrapping for CJK text mixed with
+Latin text.
+
+If kinsoku.el is loaded, Emacs also respects kinsoku rules when
+breaking lines.  That means some characters don't appear at the
+beginning of a line (e.g., FULLWIDTH COMMA), and some don't appear at
+the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET).  You can load
+kinsoku.el with `load-kinsoku'.  */);
+  Vword_wrap_by_category = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0
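
As a rough illustration of the kinsoku rules described in the docstring above, the following standalone C sketch models the two wrapping predicates with a toy category table.  It is not the code from the patch: every name prefixed with toy_ and the CAT_ flags are hypothetical, the table contents are an assumption made for the example, and the real lookup goes through CHAR_HAS_CATEGORY with the '|', '>' and '<' categories.  The BOL/EOL flip for RTL paragraphs is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the '|', '>' and '<' category
   mnemonics.  */
enum { CAT_BREAKABLE = 1, CAT_NOT_AT_BOL = 2, CAT_NOT_AT_EOL = 4 };

/* Toy category table for a handful of code points.  This is an
   assumption for illustration, not the data set up by characters.el
   or kinsoku.el.  */
int
toy_categories (int ch)
{
  switch (ch)
    {
    case 0x4E2D:		/* A CJK ideograph.  */
    case 0x6587:		/* Another CJK ideograph.  */
      return CAT_BREAKABLE;
    case 0xFF0C:		/* FULLWIDTH COMMA.  */
      return CAT_BREAKABLE | CAT_NOT_AT_BOL;
    case 0x300A:		/* LEFT DOUBLE ANGLE BRACKET.  */
      return CAT_BREAKABLE | CAT_NOT_AT_EOL;
    default:
      return 0;
    }
}

bool
toy_is_whitespace (int ch)
{
  return ch == ' ' || ch == '\t';
}

/* A line may start with CH unless CH is whitespace or is marked as
   "not at beginning of line".  */
bool
toy_can_wrap_before (int ch)
{
  return !toy_is_whitespace (ch)
	 && !(toy_categories (ch) & CAT_NOT_AT_BOL);
}

/* A line may end with CH if CH is whitespace, or if CH is breakable
   and not marked as "not at end of line".  */
bool
toy_can_wrap_after (int ch)
{
  int cats = toy_categories (ch);
  return toy_is_whitespace (ch)
	 || ((cats & CAT_BREAKABLE) != 0 && !(cats & CAT_NOT_AT_EOL));
}

/* A break between PREV and NEXT needs the consent of both sides.  */
bool
toy_can_break_between (int prev, int next)
{
  return toy_can_wrap_after (prev) && toy_can_wrap_before (next);
}

/* A few cases that follow directly from the toy table above.  */
void
toy_self_check (void)
{
  assert (toy_can_break_between (0x4E2D, 0x6587));
  assert (!toy_can_break_between (0x4E2D, 0xFF0C));
  assert (!toy_can_break_between (0x300A, 0x4E2D));
}
```

Calling toy_self_check from a main function exercises the three interesting cases: two ideographs can be split, a line cannot start with the fullwidth comma, and a line cannot end with the opening bracket.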



* Re: Line wrap reconsidered
  2020-08-15 17:34                                                                                     ` Yuan Fu
@ 2020-08-15 17:46                                                                                       ` Eli Zaretskii
  2020-08-15 18:00                                                                                         ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-15 17:46 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 15 Aug 2020 13:34:58 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> I don’t think we are supposed to say (load "kinsoku.el") in the Emacs User Manual, so I created a new command, load-kinsoku, in simple.el.  I hope that’s OK?

I think it should be loaded by the customization form in cus-start.el
(which should offer that as an optional feature).

Thanks for the patch, I will review it soon.




* Re: Line wrap reconsidered
  2020-08-15 17:46                                                                                       ` Eli Zaretskii
@ 2020-08-15 18:00                                                                                         ` Yuan Fu
  2020-08-15 18:47                                                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-15 18:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 238 bytes --]



> On Aug 15, 2020, at 1:46 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> (which should offer that as an optional feature).

I don’t know what that looks like.  Is there a custom option that I can look at as an example?

Yuan

[-- Attachment #2: Type: text/html, Size: 1113 bytes --]


* Re: Line wrap reconsidered
  2020-08-15 18:00                                                                                         ` Yuan Fu
@ 2020-08-15 18:47                                                                                           ` Eli Zaretskii
  2020-08-16  3:22                                                                                             ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-15 18:47 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 15 Aug 2020 14:00:16 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
>  On Aug 15, 2020, at 1:46 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>  (which should offer that as an optional feature).
> 
> I don’t know what that looks like.  Is there a custom option that I can look at as an example?

What I had in mind is to have a special value of word-wrap-by-category
that would load kinsoku by calling a setup function.  You can use the
:set keyword of the defcustom to arrange for the setup function.  An
example of using :set in cus-start.el is cursor-in-non-selected-windows
(and many other options).




* Re: Line wrap reconsidered
  2020-08-15 18:47                                                                                           ` Eli Zaretskii
@ 2020-08-16  3:22                                                                                             ` Yuan Fu
  2020-08-16 14:15                                                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-16  3:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel



> On Aug 15, 2020, at 2:47 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 15 Aug 2020 14:00:16 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>> On Aug 15, 2020, at 1:46 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> (which should offer that as an optional feature).
>> 
>> I don’t know what that looks like.  Is there a custom option that I can look at as an example?
> 
> What I had in mind is to have a special value of word-wrap-by-category
> that would load kinsoku by calling a setup function.  

That would be confusing, though.  Suppose word-wrap-by-category can be t, nil, and 'kinsoku.  If a user sets it first to 'kinsoku and then to t, they would expect the kinsoku rules to stop taking effect; but in reality, since kinsoku.el is already loaded, the rules stay in effect until the Emacs session ends.

Since people who enable word-wrap-by-category (people who edit CJK chars) would probably also want kinsoku, could we just load kinsoku in the setter of word-wrap-by-category if it’s set to t? I think that’s ok.

Yuan



* Re: Line wrap reconsidered
  2020-08-16  3:22                                                                                             ` Yuan Fu
@ 2020-08-16 14:15                                                                                               ` Eli Zaretskii
  2020-08-16 17:31                                                                                                 ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-16 14:15 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 15 Aug 2020 23:22:42 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> > What I had in mind is to have a special value of word-wrap-by-category
> > that would load kinsoku by calling a setup function.  
> 
> That would be confusing, though.  Suppose word-wrap-by-category can be t, nil, and 'kinsoku.  If a user sets it first to 'kinsoku and then to t, they would expect the kinsoku rules to stop taking effect; but in reality, since kinsoku.el is already loaded, the rules stay in effect until the Emacs session ends.
> 
> Since people who enable word-wrap-by-category (people who edit CJK chars) would probably also want kinsoku, could we just load kinsoku in the setter of word-wrap-by-category if it’s set to t? I think that’s ok.

OK, let's go with this method.




* Re: Line wrap reconsidered
  2020-08-16 14:15                                                                                               ` Eli Zaretskii
@ 2020-08-16 17:31                                                                                                 ` Yuan Fu
  2020-08-22  7:42                                                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-16 17:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]



> On Aug 16, 2020, at 10:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 15 Aug 2020 23:22:42 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>>> What I had in mind is to have a special value of word-wrap-by-category
>>> that would load kinsoku by calling a setup function.  
>> 
>> That would be confusing, though.  Suppose word-wrap-by-category can be t, nil, and 'kinsoku.  If a user sets it first to 'kinsoku and then to t, they would expect the kinsoku rules to stop taking effect; but in reality, since kinsoku.el is already loaded, the rules stay in effect until the Emacs session ends.
>> 
>> Since people who enable word-wrap-by-category (people who edit CJK chars) would probably also want kinsoku, could we just load kinsoku in the setter of word-wrap-by-category if it’s set to t? I think that’s ok.
> 
> OK, let's go with this method.

Here is the new patch :-)

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 14799 bytes --]

From eea495e4e2558650c448d514b6ecaa6186bd0994 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

Note about the change around line 9257 and 23372:

Previously, the whitespace test checked can_wrap_before and
can_wrap_after simultaneously.  Now we separate these two checks, and
the logic needs to change a little.  However, when the new wrapping
feature is not enabled, 'can_wrap_after' is equivalent to
'IT_DISPLAYING_WHITESPACE' and 'can_wrap_before' is equivalent to
'!IT_DISPLAYING_WHITESPACE', so the new logic is equivalent to the
old one in that case.

Old logic:

    if (whitespace) /* Which means can wrap after && can't wrap
                       before.  */
      may_wrap = true;

    else if (may_wrap) /* aka (!whitespace && may_wrap),
                          aka (can't wrap after && can wrap before
                               && may_wrap).  */
      {
        (set wrap point)
        may_wrap = false;
      }

New logic:

    if (can_wrap_after)
      next_may_wrap = true;
    else
      next_may_wrap = false;

    if (may_wrap && can_wrap_before)
      (set wrap point)

    /* Update may_wrap.  */
    may_wrap = next_may_wrap;

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(word-wrap-by-category): New variable.
* doc/emacs/display.texi (Visual Line Mode): Add a paragraph about the
new feature.
* etc/NEWS: Add a news entry.
* lisp/cus-start.el (minibuffer-prompt-properties--setter): Add
'word-wrap-by-category' as a custom variable.
---
 doc/emacs/display.texi |   8 ++
 etc/NEWS               |   6 ++
 lisp/cus-start.el      |   5 ++
 src/xdisp.c            | 185 ++++++++++++++++++++++++++++++++---------
 4 files changed, 166 insertions(+), 38 deletions(-)

diff --git a/doc/emacs/display.texi b/doc/emacs/display.texi
index 536f4cb5da..f0bc9716c1 100644
--- a/doc/emacs/display.texi
+++ b/doc/emacs/display.texi
@@ -1801,6 +1801,14 @@ Visual Line Mode
 would be visually distracting.  You can change this by customizing the
 variable @code{visual-line-fringe-indicators}.
 
+@vindex{word-wrap-by-category}
+  By default, Emacs only breaks lines after whitespace
+characters. That strategy produces bad results when CJK and Latin text
+are mixed together (because CJK characters don't use whitespace to
+separate words).  You can customize @code{word-wrap-by-category} to
+allow Emacs to break lines after more characters. That way
+word-wrapping for CJK-Latin text works right.
+
 @node Display Custom
 @section Customization of Display
 
diff --git a/etc/NEWS b/etc/NEWS
index e97755a454..9d95186866 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -70,6 +70,12 @@ specify 'cursor-type' to be '(box . SIZE)', the cursor becomes a hollow
 box if the point is on an image larger than 'SIZE' pixels in any
 dimension.
 
+** New custom option 'word-wrap-by-category'.
+
+When word-wrap is enabled, this option allows Emacs to break lines
+after more characters (instead of just whitespace characters), which
+improves word-wrapping for CJK text mixed with Latin text.
+
 \f
 * Editing Changes in Emacs 28.1
 
diff --git a/lisp/cus-start.el b/lisp/cus-start.el
index 6632687da4..7ecb7b51be 100644
--- a/lisp/cus-start.el
+++ b/lisp/cus-start.el
@@ -98,6 +98,11 @@ minibuffer-prompt-properties--setter
 	     (ctl-arrow display boolean)
 	     (truncate-lines display boolean)
 	     (word-wrap display boolean)
+             (word-wrap-by-category
+              display boolean "28.1"
+              :set (lambda (symbol value)
+                     (set-default symbol value)
+                     (when value (load "kinsoku.el"))))
 	     (selective-display-ellipses display boolean)
 	     (indicate-empty-lines fringe boolean)
 	     (indicate-buffer-boundaries
diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..b1a6badd3a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,77 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  They are defined by
+   kinsoku.el and characters.el.  */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool it_char_has_category(struct it *it, int cat)
+{
+  int ch = 0;
+  if (it->what == IT_CHARACTER)
+    ch = it->c;
+  else if (STRINGP (it->string))
+    ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+  else if (it->s)
+    ch = it->s[IT_BYTEPOS (*it)];
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+  if (ch == 0)
+    return false;
+  else
+    return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+	  /* Can be at BOL.  */
+	  && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+	  /* Can break after && can be at EOL.  */
+	  || (it_char_has_category (it, LINE_BREAKABLE)
+	      && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9257,20 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9285,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
+                  next_may_wrap = false;
 		}
+              /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -9335,10 +9416,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+				 wrap after it, but the current
+				 character says we can't wrap before
+				 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9431,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9510,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23372,14 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,8 +23393,9 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
+	      /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -23433,14 +23519,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can simply continue
+				 onto the next line here; there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23558,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34687,22 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL("word-wrap-by-category", Vword_wrap_by_category,
+    doc: /*  Non-nil means also wrap after characters of a certain category.
+Normally when `word-wrap' is on, Emacs only breaks lines after
+whitespace characters.  When this option is turned on, Emacs also
+breaks lines after characters that have the "|" category (defined in
+characters.el).  This is useful for allowing breaking after CJK
+characters and improves the word-wrapping for CJK text mixed with
+Latin text.
+
+If this variable is set using Customize, Emacs automatically loads
+kinsoku.el.  When kinsoku.el is loaded, Emacs respects kinsoku rules
+when breaking lines.  That means some characters don't appear at the
+beginning of a line (e.g., FULLWIDTH COMMA), and some don't appear at
+the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET).  */);
+  Vword_wrap_by_category = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0
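
The equivalence claimed in the commit message (old and new logic agree when the new feature is off) can be checked with a small standalone C sketch.  It follows the "Old logic" and "New logic" pseudocode from the commit message rather than the actual xdisp.c loops, and it only records where a wrap point would be saved.  The function names old_wrap_points, new_wrap_points and same_wrap_points, and the MAX_LEN limit, are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_LEN = 64 };

static bool
is_whitespace (char c)
{
  return c == ' ' || c == '\t';
}

/* Old logic: whitespace arms may_wrap; the first non-whitespace
   character after it becomes a wrap point and disarms may_wrap.
   WRAP[i] is set to true if a wrap point is saved at character i.  */
void
old_wrap_points (const char *text, size_t len, bool *wrap)
{
  bool may_wrap = false;
  for (size_t i = 0; i < len; i++)
    {
      wrap[i] = false;
      if (is_whitespace (text[i]))
	may_wrap = true;
      else if (may_wrap)
	{
	  wrap[i] = true;
	  may_wrap = false;
	}
    }
}

/* New logic with the feature disabled: can_wrap_after is "is
   whitespace" and can_wrap_before is "is not whitespace".  */
void
new_wrap_points (const char *text, size_t len, bool *wrap)
{
  bool may_wrap = false;
  for (size_t i = 0; i < len; i++)
    {
      bool can_wrap_after = is_whitespace (text[i]);
      bool can_wrap_before = !is_whitespace (text[i]);
      bool next_may_wrap = can_wrap_after;

      wrap[i] = may_wrap && can_wrap_before;

      /* Update may_wrap for the next character.  */
      may_wrap = next_may_wrap;
    }
}

/* Return true if both variants save the same wrap points for TEXT.  */
bool
same_wrap_points (const char *text)
{
  bool old_wrap[MAX_LEN], new_wrap[MAX_LEN];
  size_t len = strlen (text);

  assert (len <= MAX_LEN);
  old_wrap_points (text, len, old_wrap);
  new_wrap_points (text, len, new_wrap);
  return memcmp (old_wrap, new_wrap, len * sizeof (bool)) == 0;
}
```

For instance, same_wrap_points ("foo bar  baz") compares the two variants on a string with a run of two spaces.  Once can_wrap_after and can_wrap_before stop being exact opposites (as with CJK characters), the two variants are expected to diverge, which is the point of the change.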



* Re: Line wrap reconsidered
  2020-08-16 17:31                                                                                                 ` Yuan Fu
@ 2020-08-22  7:42                                                                                                   ` Eli Zaretskii
  2020-08-22 20:58                                                                                                     ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-22  7:42 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 16 Aug 2020 13:31:56 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> +@vindex{word-wrap-by-category}
> +  By default, Emacs only breaks lines after whitespace
> +characters. That strategy produces bad results when CJK and Latin text
             ^^
Two spaces between sentences, please (here and elsewhere, in
documentation and in comments).

> +are mixed together (because CJK characters don't use whitespace to
> +separate words).  You can customize @code{word-wrap-by-category} to
> +allow Emacs to break lines after more characters. That way
> +word-wrapping for CJK-Latin text works right.

This should mention char-category-set and modify-category-entry (with
a hyperlink to the ELisp manual); otherwise users will not know how to
customize this feature to their needs.

> +static bool it_char_has_category(struct it *it, int cat)

Our style is to start the function's name at BOL, like this:

  static bool
  it_char_has_category(struct it *it, int cat)

Otherwise, LGTM, thanks.




* Re: Line wrap reconsidered
  2020-08-22  7:42                                                                                                   ` Eli Zaretskii
@ 2020-08-22 20:58                                                                                                     ` Yuan Fu
  2020-08-23  7:12                                                                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yuan Fu @ 2020-08-22 20:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]



> On Aug 22, 2020, at 3:42 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 16 Aug 2020 13:31:56 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>> +@vindex{word-wrap-by-category}
>> +  By default, Emacs only breaks lines after whitespace
>> +characters. That strategy produces bad results when CJK and Latin text
>             ^^
> Two spaces between sentences, please (here and elsewhere, in
> documentation and in comments).
> 
>> +are mixed together (because CJK characters don't use whitespace to
>> +separate words).  You can customize @code{word-wrap-by-category} to
>> +allow Emacs to break lines after more characters. That way
>> +word-wrapping for CJK-Latin text works right.
> 
> This should mention char-category-set and modify-category-entry (with
> a hyperlink to the ELisp manual); otherwise users will not know how to
> customize this feature to their needs.
> 
>> +static bool it_char_has_category(struct it *it, int cat)
> 
> Our style is to start the function's name at BOL, like this:
> 
>  static bool
>  it_char_has_category(struct it *it, int cat)
> 
> Otherwise, LGTM, thanks.

I fixed everything and added some words to the manual, please have a look. If you have improvements over my phrasing for the manual, please feel free to just modify and push.

Thanks,
Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 15688 bytes --]

From c54b8ebdcbabed7d4c8977b0e5da49cc680e390b Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

Note about the change around line 9257 and 23372:

Previously, the whitespace test checked can_wrap_before and
can_wrap_after simultaneously.  Now we separate these two checks, and
the logic needs to change a little.  However, when the new wrapping
feature is not enabled, 'can_wrap_after' is equivalent to
'IT_DISPLAYING_WHITESPACE' and 'can_wrap_before' is equivalent to
'!IT_DISPLAYING_WHITESPACE', so the new logic is equivalent to the
old one in that case.

Old logic:

    if (whitespace) /* Which means can wrap after && can't wrap
                       before.  */
      may_wrap = true;

    else if (may_wrap) /* aka (!whitespace && may_wrap),
                          aka (can't wrap after && can wrap before
                               && may_wrap).  */
      {
        (set wrap point)
        may_wrap = false;
      }

New logic:

    if (can_wrap_after)
      next_may_wrap = true;
    else
      next_may_wrap = false;

    if (may_wrap && can_wrap_before)
      (set wrap point)

    /* Update may_wrap.  */
    may_wrap = next_may_wrap;

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(word-wrap-by-category): New variable.
* doc/emacs/display.texi (Visual Line Mode): Add a paragraph about the
new feature.
* etc/NEWS: Add a news entry.
* lisp/cus-start.el (minibuffer-prompt-properties--setter): Add
'word-wrap-by-category' as a custom variable.
---
 doc/emacs/display.texi |  21 +++++
 etc/NEWS               |   6 ++
 lisp/cus-start.el      |   5 ++
 src/xdisp.c            | 189 ++++++++++++++++++++++++++++++++---------
 4 files changed, 183 insertions(+), 38 deletions(-)

diff --git a/doc/emacs/display.texi b/doc/emacs/display.texi
index 536f4cb5da..4f982b58fc 100644
--- a/doc/emacs/display.texi
+++ b/doc/emacs/display.texi
@@ -1801,6 +1801,27 @@ Visual Line Mode
 would be visually distracting.  You can change this by customizing the
 variable @code{visual-line-fringe-indicators}.
 
+@vindex word-wrap-by-category
+@findex modify-category-entry
+@findex char-category-set
+@findex category-set-mnemonics
+  By default, Emacs only breaks lines after whitespace characters.
+That strategy produces bad results when CJK and Latin text are mixed
+together (because CJK characters don't use whitespace to separate
+words).  You can customize @code{word-wrap-by-category} to allow Emacs
+to break lines after any character with ``|'' category
+(@pxref{Categories,,, elisp, the Emacs Lisp Reference Manual}), which
+includes CJK characters.  Also, if this variable is set using
+Customize, Emacs automatically loads kinsoku.el.  When kinsoku.el is
+loaded, Emacs respects kinsoku rules when breaking lines.  That means
+characters with the ``>'' category don't appear at the beginning of a
+line (e.g., FULLWIDTH COMMA), and characters with the ``<'' category
+don't appear at the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET).
+You can view the categories of a character by @code{char-category-set}
+and @code{category-set-mnemonics}, or type @kbd{C-u C-x =} with point
+on the character and look at the ``category'' section in the report.
+You can add categories to a character by @code{modify-category-entry}.
+
 @node Display Custom
 @section Customization of Display
 
diff --git a/etc/NEWS b/etc/NEWS
index e97755a454..9d95186866 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -70,6 +70,12 @@ specify 'cursor-type' to be '(box . SIZE)', the cursor becomes a hollow
 box if the point is on an image larger than 'SIZE' pixels in any
 dimension.
 
+** New custom option 'word-wrap-by-category'.
+
+When word-wrap is enabled, this option allows Emacs to break lines
+after more characters (instead of just whitespace characters), which
+improves word-wrapping for CJK text mixed with Latin text.
+
 \f
 * Editing Changes in Emacs 28.1
 
diff --git a/lisp/cus-start.el b/lisp/cus-start.el
index 6632687da4..7ecb7b51be 100644
--- a/lisp/cus-start.el
+++ b/lisp/cus-start.el
@@ -98,6 +98,11 @@ minibuffer-prompt-properties--setter
 	     (ctl-arrow display boolean)
 	     (truncate-lines display boolean)
 	     (word-wrap display boolean)
+             (word-wrap-by-category
+              display boolean "28.1"
+              :set (lambda (symbol value)
+                     (set-default symbol value)
+                     (when value (load "kinsoku.el"))))
 	     (selective-display-ellipses display boolean)
 	     (indicate-empty-lines fringe boolean)
 	     (indicate-buffer-boundaries
diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..efd41005e3 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,80 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  They are defined by
+   kinsoku.el and characters.el.  */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool
+it_char_has_category (struct it *it, int cat)
+{
+  int ch = 0;
+  if (it->what == IT_CHARACTER)
+    ch = it->c;
+  else if (STRINGP (it->string))
+    ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+  else if (it->s)
+    ch = it->s[IT_BYTEPOS (*it)];
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+  if (ch == 0)
+    return false;
+  else
+    return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool
+char_can_wrap_before (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+	  /* Can be at BOL.  */
+	  && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool
+char_can_wrap_after (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+	  /* Can break after && can be at EOL.  */
+	  || (it_char_has_category (it, LINE_BREAKABLE)
+	      && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9260,20 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9288,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
+                  next_may_wrap = false;
 		}
+              /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -9335,10 +9419,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+				 wrap after it, but the current
+				 character says we can't wrap before
+				 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9434,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9513,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && !char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23375,14 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,8 +23396,9 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
+	      /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -23433,14 +23522,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can simply continue
+				 on the next line from here; there is
+				 no need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23561,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character
+					  allows wrapping after it,
+					  AND (ii) the current
+					  character allows wrapping
+					  before it, since this is
+					  itself a valid break point.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34690,23 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL ("word-wrap-by-category", Vword_wrap_by_category, doc: /*
+    Non-nil means also wrap after characters of a certain category.
+Normally when `word-wrap' is on, Emacs only breaks lines after
+whitespace characters.  When this option is turned on, Emacs also
+breaks lines after characters that have the "|" category (defined in
+characters.el).  This is useful for allowing breaking after CJK
+characters and improves the word-wrapping for CJK text mixed with
+Latin text.
+
+If this variable is set using Customize, Emacs automatically loads
+kinsoku.el.  When kinsoku.el is loaded, Emacs respects kinsoku rules
+when breaking lines.  That means characters with the ">" category
+don't appear at the beginning of a line (e.g., FULLWIDTH COMMA), and
+characters with the "<" category don't appear at the end of a line
+(e.g., LEFT DOUBLE ANGLE BRACKET).  */);
+  Vword_wrap_by_category = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0
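
For readers following the xdisp.c hunks, here is a minimal standalone sketch of the wrap-point rule, outside of redisplay.  It is not part of the patch.  The category table is a toy assumption: uppercase ASCII letters stand in for characters with the "|" category, ',' for a character with ">" (not at beginning of line), and '(' for a character with "<" (not at end of line).  The names toy_categories and find_wrap_points are hypothetical.  The loop follows the display_line variant, where may_wrap is simply carried over from the previous character.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the '|', '>' and '<' categories.  */
enum { CAT_BREAKABLE = 1, CAT_NOT_AT_BOL = 2, CAT_NOT_AT_EOL = 4 };

/* Hypothetical category table; real Emacs consults the category
   table set up by characters.el and kinsoku.el.  */
static int
toy_categories (char c)
{
  if (c >= 'A' && c <= 'Z')
    return CAT_BREAKABLE;
  if (c == ',')
    return CAT_BREAKABLE | CAT_NOT_AT_BOL;
  if (c == '(')
    return CAT_BREAKABLE | CAT_NOT_AT_EOL;
  return 0;
}

static bool
is_whitespace (char c)
{
  return c == ' ' || c == '\t';
}

/* A line may end after C if C is whitespace, or if C is breakable
   and is allowed to sit at the end of a line.  */
static bool
can_wrap_after (char c)
{
  int cat = toy_categories (c);
  return is_whitespace (c)
	 || ((cat & CAT_BREAKABLE) && !(cat & CAT_NOT_AT_EOL));
}

/* A line may start with C if C is not whitespace and is allowed to
   sit at the beginning of a line.  */
static bool
can_wrap_before (char c)
{
  return !is_whitespace (c) && !(toy_categories (c) & CAT_NOT_AT_BOL);
}

/* Set WRAP_BEFORE[i] to true when a line break is allowed between
   TEXT[i-1] and TEXT[i].  WRAP_BEFORE must have room for every
   character of TEXT.  */
static void
find_wrap_points (const char *text, bool *wrap_before)
{
  bool may_wrap = false;
  for (size_t i = 0; text[i] != '\0'; i++)
    {
      /* Can we wrap after this character?  */
      bool next_may_wrap = can_wrap_after (text[i]);
      /* Can we wrap here?  Both neighbours have to agree.  */
      wrap_before[i] = may_wrap && can_wrap_before (text[i]);
      /* Update may_wrap for the next iteration.  */
      may_wrap = next_may_wrap;
    }
}
```

With the toy table, the string "ab AB,(C d" gets wrap points before 'A', 'B', '(' and 'd', but none before ',' (it may not start a line), before 'C' (the preceding '(' may not end a line), or before the second space.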


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-08-22 20:58                                                                                                     ` Yuan Fu
@ 2020-08-23  7:12                                                                                                       ` Eli Zaretskii
  2020-08-24 14:00                                                                                                         ` Yuan Fu
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2020-08-23  7:12 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 22 Aug 2020 16:58:00 -0400
> Cc: larsi@gnus.org,
>  emacs-devel@gnu.org
> 
> I fixed everything and added some words to the manual, please have a look. If you have improvements over my phrasing for the manual, please feel free to just modify and push.

Thanks, done.

Please in the future make sure the patch you send still applies to the
latest branch HEAD; this one didn't.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-08-23  7:12                                                                                                       ` Eli Zaretskii
@ 2020-08-24 14:00                                                                                                         ` Yuan Fu
  0 siblings, 0 replies; 88+ messages in thread
From: Yuan Fu @ 2020-08-24 14:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel



> On Aug 23, 2020, at 3:12 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 22 Aug 2020 16:58:00 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>> I fixed everything and added some words to the manual, please have a look. If you have improvements over my phrasing for the manual, please feel free to just modify and push.
> 
> Thanks, done.
> 
> Please in the future make sure the patch you send still applies to the
> latest branch HEAD; this one didn’t.

Thanks! I’ll keep that in mind.

Yuan





^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26 20:31             ` Yuan Fu
  2020-05-26 22:29               ` Yuan Fu
@ 2020-05-27 15:20               ` Eli Zaretskii
  1 sibling, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2020-05-27 15:20 UTC (permalink / raw)
  To: Yuan Fu; +Cc: larsi, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 26 May 2020 16:31:36 -0400
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  emacs-devel@gnu.org
> 
> I saw someone mentioning Line_Break.txt from Unicode and looked it up; Unicode has already marked out all wrap-able code points. IIUC we can add Line_Break.txt to admin/unidata, parse it, and put an elisp file under lisp/international, right? We can categorize all the marked code points into three categories as I mentioned earlier.

Line_Break.txt is a data file, but its use is for implementing the
Unicode Line Breaking Algorithm, which is described in UAX#14, the
Unicode Standard Annex #14.  You can find its URL at the beginning of
Line_Break.txt; I suggest reading it.  Implementing that algorithm is
something we should do, and when we do, we indeed need to import the
data in that file into Emacs and use it.  I don't see a reason to
import the data in Line_Break.txt without implementing the algorithm,
or at least most of it.
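
To make the data-versus-algorithm distinction concrete, here is a minimal sketch of reading one entry of Line_Break.txt.  It assumes the usual Unicode data-file layout, a code point or a FIRST..LAST range, a semicolon, the line-break class, and an optional "#" comment, as in "0030..0039;NU".  The names lb_entry and parse_line_break_entry are hypothetical and nothing like this exists in the patch.  This covers only the data-reading step; the pair rules of UAX#14 would still have to be implemented on top of it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed entry: an inclusive code point range and its class.  */
struct lb_entry
{
  unsigned int first;
  unsigned int last;
  char klass[8];
};

/* Parse LINE into *OUT.  Return false for blank lines, comment
   lines, and anything else that does not look like an entry.  */
static bool
parse_line_break_entry (const char *line, struct lb_entry *out)
{
  unsigned int lo, hi;
  /* Matches "XXXX..YYYY" (2 conversions) or just "XXXX" (1).  */
  int n = sscanf (line, "%x..%x", &lo, &hi);
  if (n < 1)
    return false;
  if (n == 1)
    hi = lo;

  /* The class name follows the first semicolon.  */
  const char *semi = strchr (line, ';');
  if (!semi)
    return false;
  if (sscanf (semi + 1, " %7[A-Za-z0-9]", out->klass) != 1)
    return false;

  out->first = lo;
  out->last = hi;
  return true;
}
```

A caller would feed each line of the file to parse_line_break_entry and store the class for every code point in the range, for example in a char-table.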

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-25 18:13 Line wrap reconsidered Yuan Fu
                   ` (2 preceding siblings ...)
  2020-05-25 20:43 ` Lars Ingebrigtsen
@ 2020-05-26  8:02 ` martin rudalics
  2020-05-26 12:38   ` Yuan Fu
  3 siblings, 1 reply; 88+ messages in thread
From: martin rudalics @ 2020-05-26  8:02 UTC (permalink / raw)
  To: Yuan Fu, emacs-devel

 > Here is what I come up with: in redisplay code, instead of only checking for whitespace, check for a ‘no-wrap text-property, if the character has this property, don’t wrap before[1] this character (or maybe it can be the opposite, only wrap when the character has a ‘can-wrap property). And this text property is calculated and applied once.
 >
 > Could this be plausible? Is checking text property is fast enough for redisplay?
 >
 > [1] There are some complications to this, some characters can’t have line break before them, some can’t have after; maybe  use ‘before, ‘after and nil instead of binary value.

While you're there could you please have a short look at Adam's patch for
Bug#13399?  For whatever reason he stopped working on this back then.

Thanks, martin




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Line wrap reconsidered
  2020-05-26  8:02 ` martin rudalics
@ 2020-05-26 12:38   ` Yuan Fu
  0 siblings, 0 replies; 88+ messages in thread
From: Yuan Fu @ 2020-05-26 12:38 UTC (permalink / raw)
  To: martin rudalics; +Cc: emacs-devel



> On May 26, 2020, at 4:02 AM, martin rudalics <rudalics@gmx.at> wrote:
> 
> > Here is what I come up with: in redisplay code, instead of only checking for whitespace, check for a ‘no-wrap text-property, if the character has this property, don’t wrap before[1] this character (or maybe it can be the opposite, only wrap when the character has a ‘can-wrap property). And this text property is calculated and applied once.
> >
> > Could this be plausible? Is checking text property is fast enough for redisplay?
> >
> > [1] There are some complications to this, some characters can’t have line break before them, some can’t have after; maybe  use ‘before, ‘after and nil instead of binary value.
> 
> While you're there could you please have a short look at Adam's patch for
> Bug#13399?  For whatever reason he stopped working on this back then.

I had a look. With my proposal, wrapping at the various whitespace characters would be possible, yes.

Yuan


^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2020-08-24 14:00 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-05-25 18:13 Line wrap reconsidered Yuan Fu
2020-05-25 19:23 ` Eli Zaretskii
2020-05-25 19:31   ` Yuan Fu
2020-05-26  1:55   ` Ihor Radchenko
2020-05-26 12:55     ` Joost Kremers
2020-05-26 13:35       ` Yuan Fu
2020-05-26 14:47     ` Eli Zaretskii
2020-05-26 15:01       ` Ihor Radchenko
2020-05-26 15:29         ` Eli Zaretskii
2020-05-26 15:46           ` Ihor Radchenko
2020-05-26 16:29             ` Eli Zaretskii
2020-05-26 15:59       ` Stefan Monnier
2020-05-26 16:31         ` Eli Zaretskii
2020-05-26 16:43           ` Yuan Fu
2020-05-26 16:43             ` Ihor Radchenko
2020-05-26 18:57             ` Eli Zaretskii
2020-05-26 19:10               ` Yuan Fu
2020-05-26 19:59                 ` Eli Zaretskii
2020-05-26 19:12               ` Ihor Radchenko
2020-05-26 20:04                 ` Eli Zaretskii
2020-05-26 21:01                   ` Stefan Monnier
2020-05-25 19:31 ` Stefan Monnier
2020-05-25 19:51   ` Yuan Fu
2020-05-25 20:43 ` Lars Ingebrigtsen
2020-05-25 23:26   ` Yuan Fu
2020-05-25 23:32     ` Yuan Fu
2020-05-26  2:15       ` Yuan Fu
2020-05-26  3:30         ` Yuan Fu
2020-05-26  4:46           ` Yuan Fu
2020-05-26 15:14             ` Eli Zaretskii
2020-05-26 15:00           ` Eli Zaretskii
2020-05-26 14:54       ` Eli Zaretskii
2020-05-26 17:34         ` Yuan Fu
2020-05-26 19:50           ` Eli Zaretskii
2020-05-26 20:31             ` Yuan Fu
2020-05-26 22:29               ` Yuan Fu
2020-05-27 17:29                 ` Eli Zaretskii
2020-05-28 17:31                   ` Yuan Fu
2020-05-28 18:05                     ` Eli Zaretskii
2020-05-28 19:34                       ` Yuan Fu
2020-05-28 20:42                         ` Yuan Fu
2020-05-29  7:17                           ` Eli Zaretskii
2020-05-29  6:56                         ` Eli Zaretskii
2020-05-29 21:20                           ` Yuan Fu
2020-05-30  6:14                             ` Eli Zaretskii
2020-05-31 17:39                               ` Yuan Fu
2020-05-31 17:55                                 ` Eli Zaretskii
2020-05-31 18:23                                   ` Yuan Fu
2020-05-31 18:47                                     ` Eli Zaretskii
2020-06-18 21:46                                       ` Yuan Fu
2020-06-19  6:17                                         ` Eli Zaretskii
2020-06-19 12:04                                           ` Yuan Fu
2020-06-19 12:38                                             ` Eli Zaretskii
2020-06-19 17:22                                               ` Yuan Fu
2020-06-19 17:47                                                 ` Eli Zaretskii
2020-06-19 18:03                                                   ` Yuan Fu
2020-06-19 18:34                                                     ` Eli Zaretskii
2020-07-12 17:25                                                       ` Yuan Fu
2020-07-12 18:27                                                         ` Eli Zaretskii
2020-07-12 19:28                                                           ` Yuan Fu
2020-07-13 19:46                                                             ` Yuan Fu
2020-07-18  8:15                                                               ` Eli Zaretskii
2020-07-18 17:14                                                                 ` Yuan Fu
2020-07-18 19:49                                                                   ` Yuan Fu
2020-07-18 20:25                                                                   ` Stefan Monnier
2020-07-19 14:52                                                                   ` Eli Zaretskii
2020-07-19 16:16                                                                     ` Yuan Fu
2020-07-19 16:17                                                                       ` Yuan Fu
2020-08-13 19:35                                                                         ` Yuan Fu
2020-08-14  5:55                                                                           ` Eli Zaretskii
2020-08-14 15:08                                                                             ` Yuan Fu
2020-08-15  9:10                                                                               ` Eli Zaretskii
2020-08-15 13:10                                                                                 ` Fu Yuan
2020-08-15 14:56                                                                                   ` Eli Zaretskii
2020-08-15 17:34                                                                                     ` Yuan Fu
2020-08-15 17:46                                                                                       ` Eli Zaretskii
2020-08-15 18:00                                                                                         ` Yuan Fu
2020-08-15 18:47                                                                                           ` Eli Zaretskii
2020-08-16  3:22                                                                                             ` Yuan Fu
2020-08-16 14:15                                                                                               ` Eli Zaretskii
2020-08-16 17:31                                                                                                 ` Yuan Fu
2020-08-22  7:42                                                                                                   ` Eli Zaretskii
2020-08-22 20:58                                                                                                     ` Yuan Fu
2020-08-23  7:12                                                                                                       ` Eli Zaretskii
2020-08-24 14:00                                                                                                         ` Yuan Fu
2020-05-27 15:20               ` Eli Zaretskii
2020-05-26  8:02 ` martin rudalics
2020-05-26 12:38   ` Yuan Fu
