From: Yuan Fu <casouri@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel@gnu.org
Subject: Re: Line wrap reconsidered
Date: Sat, 18 Jul 2020 13:14:15 -0400
Message-ID: <3A15EC3E-78EC-40BA-8841-47B3F9749092@gmail.com>
In-Reply-To: <838sfhi6hm.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 5321 bytes --]
> On Jul 18, 2020, at 4:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 13 Jul 2020 15:46:16 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>>
>> Please have a look at the patch and see if it’s ok. If you think it’s good I can then update NEWS and the manual and submit a bug report. wrap.txt is the file I used to test word wrapping. To enable the full feature, set cjk-word-wrap to t and load kinsoku.el.
>
> Yes, we need to update NEWS and the manual.
>
> Also, we may need to rename cjk-word-wrap to something more accurate,
> as a result of your answers to my questions below.
Cool, I’ll start on NEWS and the manual once we settle on the name of the new variable. I agree cjk-word-wrap isn’t a good name; I just used it as a placeholder.
>
> A few minor comments below.
>
>> * src/xdisp.c (it_char_has_category, char_can_wrap_before,
>> char_can_wrap_after): New function.
> ^^^^^^^^^^^^
> "New functions", in plural.
>
>> (move_it_in_display_line_to, display_line): Replace
>> IT_DISPLAYING_WHITESPACE with char_can_wrap_before and
>> char_can_wrap_after.
>
> Please quote all references in commit log messages to functions and
> variables 'like this'.
>
>> +/* These are the category sets we use. */
>> +#define NOT_AT_EOL 60 /* < */
>> +#define NOT_AT_BOL 62 /* > */
>> +#define LINE_BREAKABLE 124 /* | */
>
> Why not just use the characters themselves, as in '<' and '|' ?
>
> Also, if these characters are from kinsoku.el, please say so in
> comments, because if kinsoku.el changes, we may need to update those.
>
Fixed.
>> +static bool it_char_has_category(struct it *it, int cat)
>> +{
>> + if (it->what == IT_CHARACTER)
>> + return CHAR_HAS_CATEGORY (it->c, cat);
>> + else if (STRINGP (it->string))
>> + return CHAR_HAS_CATEGORY (SREF (it->string,
>> + IT_STRING_BYTEPOS (*it)), cat);
>> + else if (it->s)
>> + return CHAR_HAS_CATEGORY (it->s[IT_BYTEPOS (*it)], cat);
>> + else if (IT_BYTEPOS (*it) < ZV_BYTE)
>> + return CHAR_HAS_CATEGORY (*BYTE_POS_ADDR (IT_BYTEPOS (*it)), cat);
>> + else
>> + return false;
>> +}
>
> A minor stylistic nit: I'd prefer the if - elseif clauses to yield the
> relevant character, and then apply CHAR_HAS_CATEGORY only once to that
> character at the end. (It is generally better to have only one return
> point from a function, especially when the function is short. If
> nothing else, it makes debugging easier.)
I changed it. Do you think the code below is OK?
if (ch == 0)
return false;
else
return CHAR_HAS_CATEGORY(ch, cat);
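To show the shape of that change outside of xdisp.c, here is a minimal standalone sketch. Everything prefixed with toy_ is a hypothetical stand-in (a plain C string instead of struct it, and a one-entry category table instead of CHAR_HAS_CATEGORY); it is not the real iterator API. The branches only pick the character, and the category test is applied once at the end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CHAR_HAS_CATEGORY: in this toy table,
   only the character '|' carries the category '|'.  */
static bool
toy_char_has_category (int ch, int cat)
{
  return cat == '|' && ch == '|';
}

/* Hypothetical stand-in for the display iterator: a buffer, a
   position, and a length.  */
struct toy_it
{
  const char *s;
  size_t pos;
  size_t len;
};

/* First select the character (0 means "no character available"),
   then apply the category test exactly once.  */
static bool
toy_it_char_has_category (const struct toy_it *it, int cat)
{
  int ch = 0;
  if (it->s && it->pos < it->len)
    ch = (unsigned char) it->s[it->pos];

  return ch != 0 && toy_char_has_category (ch, cat);
}
```

With a single exit, a breakpoint on the final return sees the selected character no matter which branch produced it.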
>
>> + return (!IT_DISPLAYING_WHITESPACE (it)
>> + // Can be at BOL.
>
> Please don't use //-style C++ comments, we use the C /* style */
> comments instead.
>
>> + return (IT_DISPLAYING_WHITESPACE (it)
>> + // Can break after && can be at EOL.
>> + || (it_char_has_category (it, LINE_BREAKABLE)
>> + && !it_char_has_category (it, not_at_eol)));
>
> Same here.
Fixed.
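For reference, the two predicates can be modeled in isolation roughly as follows. This is only a sketch: struct toy_char and its boolean flags are hypothetical stand-ins for the '|', '<' and '>' category sets, and the RTL flipping done in the patch is left out.

```c
#include <stdbool.h>

/* Hypothetical per-character flags standing in for the category
   sets used by the patch.  */
struct toy_char
{
  bool whitespace;  /* space or tab */
  bool breakable;   /* category '|' */
  bool not_at_eol;  /* category '<' */
  bool not_at_bol;  /* category '>' */
};

/* A character may start a line unless it is whitespace or is
   marked as not allowed at the beginning of a line.  */
static bool
toy_can_wrap_before (struct toy_char c)
{
  return !c.whitespace && !c.not_at_bol;
}

/* A character may end a line if it is whitespace, or if it is
   breakable and not marked as forbidden at the end of a line.  */
static bool
toy_can_wrap_after (struct toy_char c)
{
  return c.whitespace || (c.breakable && !c.not_at_eol);
}

/* A break between PREV and NEXT needs both sides to agree.  */
static bool
toy_can_break_between (struct toy_char prev, struct toy_char next)
{
  return toy_can_wrap_after (prev) && toy_can_wrap_before (next);
}
```

In this model a break is allowed between two ordinary breakable characters, but not before a character flagged not_at_bol and not after one flagged not_at_eol.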
>
>> if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>> {
>> - if (IT_DISPLAYING_WHITESPACE (it))
>> - may_wrap = true;
>> - else if (may_wrap)
>> + /* Can we wrap here? */
>> + if (may_wrap && char_can_wrap_before (it))
>
> I'm worried about a potential change in logic here, when cjk-word-wrap
> is off. Previously, we just tested IT_DISPLAYING_WHITESPACE, but now
> we also test may_wrap. Is it guaranteed that may_wrap is always true
> in that case?
>
>> @@ -23292,9 +23365,8 @@ #define RECORD_MAX_MIN_POS(IT) \
>>
>> if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
>> {
>> - if (IT_DISPLAYING_WHITESPACE (it))
>> - may_wrap = true;
>> - else if (may_wrap)
>> + /* Can we wrap here? */
>> + if (may_wrap && char_can_wrap_before (it))
>
> Likewise here.
In both char_can_wrap_before and char_can_wrap_after, I have a short circuit for the case when cjk_word_wrap is nil. In char_can_wrap_after it is:
if (!Vcjk_word_wrap)
return IT_DISPLAYING_WHITESPACE (it);
(char_can_wrap_before returns the negation, !IT_DISPLAYING_WHITESPACE (it).) That should guarantee the old behavior when cjk_word_wrap is nil, if that’s what you are asking about.
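As a sanity check of that claim, here is a small standalone model of the two loops. It assumes that with the option off the "before" predicate reduces to "not whitespace" and the "after" predicate reduces to "whitespace", as in the attached patch. The toy_ names are hypothetical, and the model ignores everything else the real display loop does.

```c
#include <stdbool.h>
#include <stddef.h>

static bool
toy_is_whitespace (char c)
{
  return c == ' ' || c == '\t';
}

/* Old logic: whitespace sets the flag; the next non-whitespace
   character consumes it and records a wrap point.  Returns the
   number of wrap points stored in OUT (at most MAX).  */
static size_t
toy_old_wrap_points (const char *text, size_t *out, size_t max)
{
  bool may_wrap = false;
  size_t n = 0;
  for (size_t i = 0; text[i] != '\0'; i++)
    {
      if (toy_is_whitespace (text[i]))
        may_wrap = true;
      else if (may_wrap)
        {
          if (n < max)
            out[n++] = i;
          may_wrap = false;
        }
    }
  return n;
}

/* New logic with the option off: first consume the flag set by the
   previous character, then set it for the next iteration.  */
static size_t
toy_new_wrap_points (const char *text, size_t *out, size_t max)
{
  bool may_wrap = false;
  size_t n = 0;
  for (size_t i = 0; text[i] != '\0'; i++)
    {
      bool ws = toy_is_whitespace (text[i]);
      if (may_wrap && !ws && n < max)   /* can wrap before this char */
        out[n++] = i;
      may_wrap = ws;                    /* can wrap after this char */
    }
  return n;
}
```

Both functions record a wrap point at each non-whitespace character that directly follows whitespace, so in this model the extra may_wrap test does not change the result when the option is off.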
>
>> {
>> SAVE_IT (wrap_it, *it, wrap_data);
>> wrap_x = x;
>> @@ -23308,9 +23380,13 @@ #define RECORD_MAX_MIN_POS(IT) \
>> wrap_row_min_bpos = min_bpos;
>> wrap_row_max_pos = max_pos;
>> wrap_row_max_bpos = max_bpos;
>> - may_wrap = false;
>> }
>> - }
>> + /* This has to run after the previous block. */
>> + if (char_can_wrap_after (it))
>> + may_wrap = true;
>> + else
>> + may_wrap = false;
>
> Please use TABs and spaces to indent code in C source files. The last
> 2 lines use only spaces.
Sorry, fixed.
>
>> + DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
>> + doc: /* Non-nil means wrap after CJK characters.
>
> This is unclear. Does it mean after _any_ CJK character, or just
> after some? And if the latter, which ones?
I added more detail; hopefully it is clearer now.
>
> Thanks.
Thanks!
Yuan
[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 10927 bytes --]
From 2baf9b6fd7dc8aa63f61d9dc14dbbb60cbb8c1fa Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters
* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(cjk-word-wrap): New variable.
---
src/xdisp.c | 180 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 142 insertions(+), 38 deletions(-)
diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..35ff381829 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
#include "termchar.h"
#include "dispextern.h"
#include "character.h"
+#include "category.h"
#include "buffer.h"
#include "charset.h"
#include "indent.h"
@@ -508,6 +509,77 @@ #define IT_DISPLAYING_WHITESPACE(it) \
&& (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' ' \
|| *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
+/* These are the category sets we use. They are defined by
+ kinsoku.el. */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool it_char_has_category(struct it *it, int cat)
+{
+ int ch = 0;
+ if (it->what == IT_CHARACTER)
+ ch = it->c;
+ else if (STRINGP (it->string))
+ ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+ else if (it->s)
+ ch = it->s[IT_BYTEPOS (*it)];
+ else if (IT_BYTEPOS (*it) < ZV_BYTE)
+ ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+ if (ch == 0)
+ return false;
+ else
+ return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it. */
+static bool char_can_wrap_before (struct it *it)
+{
+ if (!Vcjk_word_wrap)
+ return !IT_DISPLAYING_WHITESPACE (it);
+
+ /* For CJK (LTR) text in an RTL paragraph, EOL and BOL are flipped,
+ because in an RTL paragraph each glyph is prepended to the last
+ one, effectively drawing right to left. */
+ int not_at_bol;
+ if (it->glyph_row && it->glyph_row->reversed_p)
+ not_at_bol = NOT_AT_EOL;
+ else
+ not_at_bol = NOT_AT_BOL;
+ /* You cannot wrap before a space or tab because that way you'll
+ have space and tab at the beginning of next line. */
+ return (!IT_DISPLAYING_WHITESPACE (it)
+ /* Can be at BOL. */
+ && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it. */
+static bool char_can_wrap_after (struct it *it)
+{
+ if (!Vcjk_word_wrap)
+ return IT_DISPLAYING_WHITESPACE (it);
+
+ /* For CJK (LTR) text in an RTL paragraph, EOL and BOL are flipped,
+ because in an RTL paragraph each glyph is prepended to the last
+ one, effectively drawing right to left. */
+ int not_at_eol;
+ if (it->glyph_row && it->glyph_row->reversed_p)
+ not_at_eol = NOT_AT_BOL;
+ else
+ not_at_eol = NOT_AT_EOL;
+
+ return (IT_DISPLAYING_WHITESPACE (it)
+ /* Can break after && can be at EOL. */
+ || (it_char_has_category (it, LINE_BREAKABLE)
+ && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
/* If all the conditions needed to print the fill column indicator are
met, return the (nonnegative) column number, else return a negative
value. */
@@ -9185,13 +9257,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
{
if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
{
- if (IT_DISPLAYING_WHITESPACE (it))
- may_wrap = true;
- else if (may_wrap)
+ /* Can we wrap here? */
+ if (may_wrap && char_can_wrap_before (it))
{
/* We have reached a glyph that follows one or more
- whitespace characters. If the position is
- already found, we are done. */
+ whitespace characters or a character that allows
+ wrapping after it. If this character allows
+ wrapping before it, save this position as a
+ wrapping point. */
if (atpos_it.sp >= 0)
{
RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9279,17 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
}
/* Otherwise, we can wrap here. */
SAVE_IT (wrap_it, *it, wrap_data);
- may_wrap = false;
}
+ /* This has to run after the previous block because the
+ previous block consumes `may_wrap' and this block
+ sets it, but the value set by this block is intended
+ for the _next_ character/iteration. */
+ if (char_can_wrap_after (it))
+ /* may_wrap basically means "previous char allows
+ wrapping after it". */
+ may_wrap = true;
+ else
+ may_wrap = false;
}
}
@@ -9335,10 +9417,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
{
bool can_wrap = true;
- /* If we are at a whitespace character
- that barely fits on this screen line,
- but the next character is also
- whitespace, we cannot wrap here. */
+ /* If the previous character says we can
+ wrap after it, but the current
+ character says we can't wrap before
+ it, then we can't wrap here. */
if (it->line_wrap == WORD_WRAP
&& wrap_it.sp >= 0
&& may_wrap
@@ -9350,7 +9432,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
SAVE_IT (tem_it, *it, tem_data);
set_iterator_to_next (it, true);
if (get_next_display_element (it)
- && IT_DISPLAYING_WHITESPACE (it))
+ && !char_can_wrap_before (it))
can_wrap = false;
RESTORE_IT (it, &tem_it, tem_data);
}
@@ -9429,19 +9511,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
else
IT_RESET_X_ASCENT_DESCENT (it);
- /* If the screen line ends with whitespace, and we
- are under word-wrap, don't use wrap_it: it is no
- longer relevant, but we won't have an opportunity
- to update it, since we are done with this screen
- line. */
+ /* If the screen line ends with whitespace (or
+ wrap-able character), and we are under word-wrap,
+ don't use wrap_it: it is no longer relevant, but
+ we won't have an opportunity to update it, since
+ we are done with this screen line. */
if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
/* If the character after the one which set the
- may_wrap flag is also whitespace, we can't
- wrap here, since the screen line cannot be
- wrapped in the middle of whitespace.
- Therefore, wrap_it _is_ relevant in that
- case. */
- && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+ may_wrap flag says we can't wrap before it,
+ we can't wrap here. Therefore, wrap_it
+ (previously found wrap-point) _is_ relevant
+ in that case. */
+ && !(moved_forward && !char_can_wrap_before (it)))
{
/* If we've found TO_X, go back there, as we now
know the last word fits on this screen line. */
@@ -23292,9 +23373,8 @@ #define RECORD_MAX_MIN_POS(IT) \
if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
{
- if (IT_DISPLAYING_WHITESPACE (it))
- may_wrap = true;
- else if (may_wrap)
+ /* Can we wrap here? */
+ if (may_wrap && char_can_wrap_before (it))
{
SAVE_IT (wrap_it, *it, wrap_data);
wrap_x = x;
@@ -23308,8 +23388,12 @@ #define RECORD_MAX_MIN_POS(IT) \
wrap_row_min_bpos = min_bpos;
wrap_row_max_pos = max_pos;
wrap_row_max_bpos = max_bpos;
- may_wrap = false;
}
+ /* This has to run after the previous block. */
+ if (char_can_wrap_after (it))
+ may_wrap = true;
+ else
+ may_wrap = false;
}
}
@@ -23433,14 +23517,18 @@ #define RECORD_MAX_MIN_POS(IT) \
/* If line-wrap is on, check if a previous
wrap point was found. */
if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
- && wrap_row_used > 0
+ && wrap_row_used > 0 /* Found. */
/* Even if there is a previous wrap
point, continue the line here as
usual, if (i) the previous character
- was a space or tab AND (ii) the
- current character is not. */
- && (!may_wrap
- || IT_DISPLAYING_WHITESPACE (it)))
+ allows wrapping after it, AND (ii)
+ the current character allows wrapping
+ before it. Because this is a valid
+ break point, we can simply continue
+ on the next line from here; there is
+ no need to wrap early at the previous
+ wrap point. */
+ && (!may_wrap || !char_can_wrap_before (it)))
goto back_to_wrap;
/* Record the maximum and minimum buffer
@@ -23468,13 +23556,16 @@ #define RECORD_MAX_MIN_POS(IT) \
/* If line-wrap is on, check if a
previous wrap point was found. */
else if (wrap_row_used > 0
- /* Even if there is a previous wrap
- point, continue the line here as
- usual, if (i) the previous character
- was a space or tab AND (ii) the
- current character is not. */
- && (!may_wrap
- || IT_DISPLAYING_WHITESPACE (it)))
+ /* Even if there is a previous
+ wrap point, continue the
+ line here as usual, if (i)
+ the previous character was a
+ space or tab AND (ii) the
+ current character is not,
+ AND (iii) the current
+ character allows wrapping
+ before it. */
+ && (!may_wrap || !char_can_wrap_before (it)))
goto back_to_wrap;
}
@@ -34594,6 +34685,19 @@ syms_of_xdisp (void)
If `word-wrap' is enabled, you might want to reduce this. */);
Vtruncate_partial_width_windows = make_fixnum (50);
+ DEFVAR_BOOL("cjk-word-wrap", Vcjk_word_wrap,
+ doc: /* Non-nil means also wrap after all CJK characters.
+Normally when word-wrapping is on, Emacs only breaks lines after
+whitespace characters. When this option is turned on, Emacs also
+breaks lines after CJK characters (more accurately, characters that
+have the "|" category defined in characters.el).
+
+If kinsoku.el is loaded, Emacs also respects kinsoku rules when
+breaking lines. That means some characters don't appear at the
+beginning of a line (e.g., FULLWIDTH COMMA), and some don't appear at
+the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET). */);
+ Vcjk_word_wrap = false;
+
DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
doc: /* Maximum buffer size for which line number should be displayed.
If the buffer is bigger than this, the line number does not appear
--
2.27.0