unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Yuan Fu <casouri@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: larsi@gnus.org, emacs-devel@gnu.org
Subject: Re: Line wrap reconsidered
Date: Sun, 16 Aug 2020 13:31:56 -0400	[thread overview]
Message-ID: <2F824AA1-7C77-4554-B24F-1FA483AAF0C6@gmail.com> (raw)
In-Reply-To: <83o8naisn9.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]



> On Aug 16, 2020, at 10:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 15 Aug 2020 23:22:42 -0400
>> Cc: larsi@gnus.org,
>> emacs-devel@gnu.org
>> 
>>> What I had in mind is to have a special value of word-wrap-by-category
>>> that would load kinsoku by calling a setup function.  
>> 
>> That would be confusing tho. Suppose word-wrap-by-category can be t, nil and ‘kinsoku. If a user sets it first to 'kinsoku then to t, he would expect kinsoku stuff to not take effect; but in reality since kinsoku.el is loaded, it will be in effect until the Emacs session ends.
>> 
>> Since people who enable word-wrap-by-category (people who edit CJK chars) would probably also want kinsoku, could we just load kinsoku in the setter of word-wrap-by-category if it’s set to t? I think that’s ok.
> 
> OK, let's go with this method.

Here is the new patch :-)

Yuan


[-- Attachment #2: word-wrap.patch --]
[-- Type: application/octet-stream, Size: 14799 bytes --]

From eea495e4e2558650c448d514b6ecaa6186bd0994 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 26 May 2020 22:47:27 -0400
Subject: [PATCH] Improve word wrapping for CJK characters

Note about the change around line 9257 and 23372:

Before, the test for whitespace checks for can_wrap_before and
can_wrap_after simutaniously.  Now we separate these two checks, and
the logic needs to change a little bit.  However, when we don't enable
the new wrapping feature, 'can_wrap_after' is equivalent to
'IT_DISPLAYING_WHITESPACE' and 'can_wrap_before' is equivalent to
'!IT_DISPLAYING_WHITESPACE'.  And the new logic is equivalent with the
old one in that case.

Old logic:

    if (whitespace) /* Which means can wrap after && can't wrap
                       before.  */
      may_wrap = true;

    else if (may_wrap) /* aka (!whitespace && may_wrap)
      (set wrap point)  * aka (can't wrap after && can wrap before
      may_wrap = false  *      && may_wrap)
                        */

New logic:

    if (can_wrap_after)
      next_may_wrap = true
    else
      next_may_wrap = false;

    if (may_wrap && can_wrap_before)
      (set wrap point)

    /* Update may_wrap.  */
    may_wrap = next_may_wrap;

* src/xdisp.c (it_char_has_category, char_can_wrap_before,
char_can_wrap_after): New functions.
(move_it_in_display_line_to, display_line): Replace
'IT_DISPLAYING_WHITESPACE' with either 'char_can_wrap_before' or
'char_can_wrap_after'.
(word-wrap-by-category): New variable.
* doc/emacs/display.texi (Visual Line Mode): Add a paragraph about the
new feature.
* etc/NEWS: Add a news entry.
* lisp/cus-start.el (minibuffer-prompt-properties--setter): Add
'word-wrap-by-category' as a custom variable.
---
 doc/emacs/display.texi |   8 ++
 etc/NEWS               |   6 ++
 lisp/cus-start.el      |   5 ++
 src/xdisp.c            | 185 ++++++++++++++++++++++++++++++++---------
 4 files changed, 166 insertions(+), 38 deletions(-)

diff --git a/doc/emacs/display.texi b/doc/emacs/display.texi
index 536f4cb5da..f0bc9716c1 100644
--- a/doc/emacs/display.texi
+++ b/doc/emacs/display.texi
@@ -1801,6 +1801,14 @@ Visual Line Mode
 would be visually distracting.  You can change this by customizing the
 variable @code{visual-line-fringe-indicators}.
 
+@vindex{word-wrap-by-category}
+  By default, Emacs only breaks lines after whitespace
+characters. That strategy produces bad results when CJK and Latin text
+are mixed together (because CJK characters don't use whitespace to
+separate words).  You can customize @code{word-wrap-by-category} to
+allow Emacs to break lines after more characters. That way
+word-wrapping for CJK-Latin text works right.
+
 @node Display Custom
 @section Customization of Display
 
diff --git a/etc/NEWS b/etc/NEWS
index e97755a454..9d95186866 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -70,6 +70,12 @@ specify 'cursor-type' to be '(box . SIZE)', the cursor becomes a hollow
 box if the point is on an image larger than 'SIZE' pixels in any
 dimension.
 
+** New custom option 'word-wrap-by-category'.
+
+When word-wrap is enabled, this option allows Emacs to break lines
+after more characters (instead of just whitespace characters), that
+means word-wrapping for CJK text mixed with Latin text is improved.
+
 \f
 * Editing Changes in Emacs 28.1
 
diff --git a/lisp/cus-start.el b/lisp/cus-start.el
index 6632687da4..7ecb7b51be 100644
--- a/lisp/cus-start.el
+++ b/lisp/cus-start.el
@@ -98,6 +98,11 @@ minibuffer-prompt-properties--setter
 	     (ctl-arrow display boolean)
 	     (truncate-lines display boolean)
 	     (word-wrap display boolean)
+             (word-wrap-by-category
+              display boolean "28.1"
+              :set (lambda (symbol value)
+                     (set-default symbol value)
+                     (when value (load "kinsoku.el"))))
 	     (selective-display-ellipses display boolean)
 	     (indicate-empty-lines fringe boolean)
 	     (indicate-buffer-boundaries
diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..b1a6badd3a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,77 @@ #define IT_DISPLAYING_WHITESPACE(it)					\
 	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
 	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use.  They are defined by
+   kinsoku.el and chracters.el.  */
+#define NOT_AT_EOL '<'
+#define NOT_AT_BOL '>'
+#define LINE_BREAKABLE '|'
+
+static bool it_char_has_category(struct it *it, int cat)
+{
+  int ch = 0;
+  if (it->what == IT_CHARACTER)
+    ch = it->c;
+  else if (STRINGP (it->string))
+    ch = SREF (it->string, IT_STRING_BYTEPOS (*it));
+  else if (it->s)
+    ch = it->s[IT_BYTEPOS (*it)];
+  else if (IT_BYTEPOS (*it) < ZV_BYTE)
+    ch = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
+
+  if (ch == 0)
+    return false;
+  else
+    return CHAR_HAS_CATEGORY (ch, cat);
+}
+
+/* Return true if the current character allows wrapping before it.   */
+static bool char_can_wrap_before (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return !IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_bol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_bol = NOT_AT_EOL;
+  else
+    not_at_bol = NOT_AT_BOL;
+  /* You cannot wrap before a space or tab because that way you'll
+     have space and tab at the beginning of next line.  */
+  return (!IT_DISPLAYING_WHITESPACE (it)
+	  /* Can be at BOL.  */
+	  && !it_char_has_category (it, not_at_bol));
+}
+
+/* Return true if the current character allows wrapping after it.   */
+static bool char_can_wrap_after (struct it *it)
+{
+  if (!Vword_wrap_by_category)
+    return IT_DISPLAYING_WHITESPACE (it);
+
+  /* For CJK (LTR) text in RTL paragraph, EOL and BOL are flipped.
+     Because in RTL paragraph, each glyph is prepended to the last
+     one, effectively drawing right to left.  */
+  int not_at_eol;
+  if (it->glyph_row && it->glyph_row->reversed_p)
+    not_at_eol = NOT_AT_BOL;
+  else
+    not_at_eol = NOT_AT_EOL;
+
+  return (IT_DISPLAYING_WHITESPACE (it)
+	  /* Can break after && can be at EOL.  */
+	  || (it_char_has_category (it, LINE_BREAKABLE)
+	      && !it_char_has_category (it, not_at_eol)));
+}
+
+#undef IT_DISPLAYING_WHITESPACE
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value.  */
@@ -9185,13 +9257,20 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 	{
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  /* We have reached a glyph that follows one or more
-		     whitespace characters.  If the position is
-		     already found, we are done.  */
+		     whitespace characters or a character that allows
+		     wrapping after it.  If this character allows
+		     wrapping before it, save this position as a
+		     wrapping point.  */
 		  if (atpos_it.sp >= 0)
 		    {
 		      RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9285,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		    }
 		  /* Otherwise, we can wrap here.  */
 		  SAVE_IT (wrap_it, *it, wrap_data);
-		  may_wrap = false;
+                  next_may_wrap = false;
 		}
+              /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -9335,10 +9416,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 			    {
 			      bool can_wrap = true;
 
-			      /* If we are at a whitespace character
-				 that barely fits on this screen line,
-				 but the next character is also
-				 whitespace, we cannot wrap here.  */
+			      /* If the previous character says we can
+				 wrap after it, but the current
+				 character says we can't wrap before
+				 it, then we can't wrap here.  */
 			      if (it->line_wrap == WORD_WRAP
 				  && wrap_it.sp >= 0
 				  && may_wrap
@@ -9350,7 +9431,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 				  SAVE_IT (tem_it, *it, tem_data);
 				  set_iterator_to_next (it, true);
 				  if (get_next_display_element (it)
-				      && IT_DISPLAYING_WHITESPACE (it))
+				      && !char_can_wrap_before (it))
 				    can_wrap = false;
 				  RESTORE_IT (it, &tem_it, tem_data);
 				}
@@ -9429,19 +9510,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT)			\
 		  else
 		    IT_RESET_X_ASCENT_DESCENT (it);
 
-		  /* If the screen line ends with whitespace, and we
-		     are under word-wrap, don't use wrap_it: it is no
-		     longer relevant, but we won't have an opportunity
-		     to update it, since we are done with this screen
-		     line.  */
+		  /* If the screen line ends with whitespace (or
+		     wrap-able character), and we are under word-wrap,
+		     don't use wrap_it: it is no longer relevant, but
+		     we won't have an opportunity to update it, since
+		     we are done with this screen line.  */
 		  if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
 		      /* If the character after the one which set the
-			 may_wrap flag is also whitespace, we can't
-			 wrap here, since the screen line cannot be
-			 wrapped in the middle of whitespace.
-			 Therefore, wrap_it _is_ relevant in that
-			 case.  */
-		      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+			 may_wrap flag says we can't wrap before it,
+			 we can't wrap here.  Therefore, wrap_it
+			 (previously found wrap-point) _is_ relevant
+			 in that case.  */
+		      && !(moved_forward && char_can_wrap_before (it)))
 		    {
 		      /* If we've found TO_X, go back there, as we now
 			 know the last word fits on this screen line.  */
@@ -23292,9 +23372,14 @@ #define RECORD_MAX_MIN_POS(IT)					\
 
 	  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
 	    {
-	      if (IT_DISPLAYING_WHITESPACE (it))
-		may_wrap = true;
-	      else if (may_wrap)
+              bool next_may_wrap = may_wrap;
+              /* Can we wrap after this character?  */
+              if (char_can_wrap_after (it))
+		next_may_wrap = true;
+              else
+                next_may_wrap = false;
+	      /* Can we wrap here? */
+	      if (may_wrap && char_can_wrap_before (it))
 		{
 		  SAVE_IT (wrap_it, *it, wrap_data);
 		  wrap_x = x;
@@ -23308,8 +23393,9 @@ #define RECORD_MAX_MIN_POS(IT)					\
 		  wrap_row_min_bpos = min_bpos;
 		  wrap_row_max_pos = max_pos;
 		  wrap_row_max_bpos = max_bpos;
-		  may_wrap = false;
 		}
+	      /* Update may_wrap for the next iteration.  */
+              may_wrap = next_may_wrap;
 	    }
 	}
 
@@ -23433,14 +23519,18 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			  /* If line-wrap is on, check if a previous
 			     wrap point was found.  */
 			  if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-			      && wrap_row_used > 0
+			      && wrap_row_used > 0 /* Found.  */
 			      /* Even if there is a previous wrap
 				 point, continue the line here as
 				 usual, if (i) the previous character
-				 was a space or tab AND (ii) the
-				 current character is not.  */
-			      && (!may_wrap
-				  || IT_DISPLAYING_WHITESPACE (it)))
+				 allows wrapping after it, AND (ii)
+				 the current character allows wrapping
+				 before it.  Because this is a valid
+				 break point, we can just continue to
+				 the next line at here, there is no
+				 need to wrap early at the previous
+				 wrap point.  */
+			      && (!may_wrap || !char_can_wrap_before (it)))
 			    goto back_to_wrap;
 
 			  /* Record the maximum and minimum buffer
@@ -23468,13 +23558,16 @@ #define RECORD_MAX_MIN_POS(IT)					\
 			      /* If line-wrap is on, check if a
 				 previous wrap point was found.  */
 			      else if (wrap_row_used > 0
-				       /* Even if there is a previous wrap
-					  point, continue the line here as
-					  usual, if (i) the previous character
-					  was a space or tab AND (ii) the
-					  current character is not.  */
-				       && (!may_wrap
-					   || IT_DISPLAYING_WHITESPACE (it)))
+				       /* Even if there is a previous
+					  wrap point, continue the
+					  line here as usual, if (i)
+					  the previous character was a
+					  space or tab AND (ii) the
+					  current character is not,
+					  AND (iii) the current
+					  character allows wrapping
+					  before it.  */
+				       && (!may_wrap || !char_can_wrap_before (it)))
 				goto back_to_wrap;
 
 			    }
@@ -34594,6 +34687,22 @@ syms_of_xdisp (void)
 If `word-wrap' is enabled, you might want to reduce this.  */);
   Vtruncate_partial_width_windows = make_fixnum (50);
 
+  DEFVAR_BOOL("word-wrap-by-category", Vword_wrap_by_category,
+    doc: /*  Non-nil means also wrap after characters of a certain category.
+Normally when `word-wrap' is on, Emacs only breaks lines after
+whitespace characters.  When this option is turned on, Emacs also
+breaks lines after characters that have the "|" category (defined in
+characters.el).  This is useful for allowing breaking after CJK
+characters and improves the word-wrapping for CJK text mixed with
+Latin text.
+
+If this variable is set using Customize, Emacs automatically loads
+kinsoku.el.  When kinsoku.el is loaded, Emacs respects kinsoku rules
+when breaking lines.  That means some characters don't appear at the
+beginning of a line (e.g., FULLWIDTH COMMA), and some don't appear at
+the end of a line (e.g., LEFT DOUBLE ANGLE BRACKET).  */);
+  Vword_wrap_by_category = false;
+
   DEFVAR_LISP ("line-number-display-limit", Vline_number_display_limit,
     doc: /* Maximum buffer size for which line number should be displayed.
 If the buffer is bigger than this, the line number does not appear
-- 
2.27.0


  reply	other threads:[~2020-08-16 17:31 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-25 18:13 Line wrap reconsidered Yuan Fu
2020-05-25 19:23 ` Eli Zaretskii
2020-05-25 19:31   ` Yuan Fu
2020-05-26  1:55   ` Ihor Radchenko
2020-05-26 12:55     ` Joost Kremers
2020-05-26 13:35       ` Yuan Fu
2020-05-26 14:47     ` Eli Zaretskii
2020-05-26 15:01       ` Ihor Radchenko
2020-05-26 15:29         ` Eli Zaretskii
2020-05-26 15:46           ` Ihor Radchenko
2020-05-26 16:29             ` Eli Zaretskii
2020-05-26 15:59       ` Stefan Monnier
2020-05-26 16:31         ` Eli Zaretskii
2020-05-26 16:43           ` Yuan Fu
2020-05-26 16:43             ` Ihor Radchenko
2020-05-26 18:57             ` Eli Zaretskii
2020-05-26 19:10               ` Yuan Fu
2020-05-26 19:59                 ` Eli Zaretskii
2020-05-26 19:12               ` Ihor Radchenko
2020-05-26 20:04                 ` Eli Zaretskii
2020-05-26 21:01                   ` Stefan Monnier
2020-05-25 19:31 ` Stefan Monnier
2020-05-25 19:51   ` Yuan Fu
2020-05-25 20:43 ` Lars Ingebrigtsen
2020-05-25 23:26   ` Yuan Fu
2020-05-25 23:32     ` Yuan Fu
2020-05-26  2:15       ` Yuan Fu
2020-05-26  3:30         ` Yuan Fu
2020-05-26  4:46           ` Yuan Fu
2020-05-26 15:14             ` Eli Zaretskii
2020-05-26 15:00           ` Eli Zaretskii
2020-05-26 14:54       ` Eli Zaretskii
2020-05-26 17:34         ` Yuan Fu
2020-05-26 19:50           ` Eli Zaretskii
2020-05-26 20:31             ` Yuan Fu
2020-05-26 22:29               ` Yuan Fu
2020-05-27 17:29                 ` Eli Zaretskii
2020-05-28 17:31                   ` Yuan Fu
2020-05-28 18:05                     ` Eli Zaretskii
2020-05-28 19:34                       ` Yuan Fu
2020-05-28 20:42                         ` Yuan Fu
2020-05-29  7:17                           ` Eli Zaretskii
2020-05-29  6:56                         ` Eli Zaretskii
2020-05-29 21:20                           ` Yuan Fu
2020-05-30  6:14                             ` Eli Zaretskii
2020-05-31 17:39                               ` Yuan Fu
2020-05-31 17:55                                 ` Eli Zaretskii
2020-05-31 18:23                                   ` Yuan Fu
2020-05-31 18:47                                     ` Eli Zaretskii
2020-06-18 21:46                                       ` Yuan Fu
2020-06-19  6:17                                         ` Eli Zaretskii
2020-06-19 12:04                                           ` Yuan Fu
2020-06-19 12:38                                             ` Eli Zaretskii
2020-06-19 17:22                                               ` Yuan Fu
2020-06-19 17:47                                                 ` Eli Zaretskii
2020-06-19 18:03                                                   ` Yuan Fu
2020-06-19 18:34                                                     ` Eli Zaretskii
2020-07-12 17:25                                                       ` Yuan Fu
2020-07-12 18:27                                                         ` Eli Zaretskii
2020-07-12 19:28                                                           ` Yuan Fu
2020-07-13 19:46                                                             ` Yuan Fu
2020-07-18  8:15                                                               ` Eli Zaretskii
2020-07-18 17:14                                                                 ` Yuan Fu
2020-07-18 19:49                                                                   ` Yuan Fu
2020-07-18 20:25                                                                   ` Stefan Monnier
2020-07-19 14:52                                                                   ` Eli Zaretskii
2020-07-19 16:16                                                                     ` Yuan Fu
2020-07-19 16:17                                                                       ` Yuan Fu
2020-08-13 19:35                                                                         ` Yuan Fu
2020-08-14  5:55                                                                           ` Eli Zaretskii
2020-08-14 15:08                                                                             ` Yuan Fu
2020-08-15  9:10                                                                               ` Eli Zaretskii
2020-08-15 13:10                                                                                 ` Fu Yuan
2020-08-15 14:56                                                                                   ` Eli Zaretskii
2020-08-15 17:34                                                                                     ` Yuan Fu
2020-08-15 17:46                                                                                       ` Eli Zaretskii
2020-08-15 18:00                                                                                         ` Yuan Fu
2020-08-15 18:47                                                                                           ` Eli Zaretskii
2020-08-16  3:22                                                                                             ` Yuan Fu
2020-08-16 14:15                                                                                               ` Eli Zaretskii
2020-08-16 17:31                                                                                                 ` Yuan Fu [this message]
2020-08-22  7:42                                                                                                   ` Eli Zaretskii
2020-08-22 20:58                                                                                                     ` Yuan Fu
2020-08-23  7:12                                                                                                       ` Eli Zaretskii
2020-08-24 14:00                                                                                                         ` Yuan Fu
2020-05-27 15:20               ` Eli Zaretskii
2020-05-26  8:02 ` martin rudalics
2020-05-26 12:38   ` Yuan Fu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2F824AA1-7C77-4554-B24F-1FA483AAF0C6@gmail.com \
    --to=casouri@gmail.com \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=larsi@gnus.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).