From: Adam Tack <adam.tack.513@gmail.com>
To: 13399@debbugs.gnu.org
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Fri, 8 Dec 2017 01:02:08 +0000
Message-ID: <CAA+VxxHdj3795qbgTJV-EE_G+nC9-yLGvjs5KmQJMN4RE-RMAA@mail.gmail.com>
In-Reply-To: <50EE7BE5.2060806@gmx.at>

[-- Attachment #1: Type: text/plain, Size: 2760 bytes --]

I have a patch for the original issue of word-wrap not wrapping at a
zero-width space.  The implementation uses a char-table and is
closely based on the one written by Martin Rudalics
(https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13399#113), with Eli
Zaretskii's suggestions regarding Unicode.

The patch applies cleanly to the latest master, compiles on GNU+Linux
(Ubuntu Xenial), and appears to work: both of the following tests
wrap as expected at the zero-width space character.  The first is
taken from this bug thread; the second, adapted from the first,
checks that there is no regression of Bug#11341:

(with-current-buffer (get-buffer-create "*foo*")
  (dotimes (i 1000)
    (insert "1234\u200b")) ; U+200B ZERO WIDTH SPACE, written as an escape
  (setq word-wrap t)
  (display-buffer "*foo*"))

(with-current-buffer (get-buffer-create "*bar*")
  (dotimes (i 1000)
    (insert "1234\u200b")) ; U+200B ZERO WIDTH SPACE, written as an escape
  (setq word-wrap t)
  (setq whitespace-display-mappings
        '((space-mark 32 [183] [46])     ; U+0020 SPACE
          (space-mark 160 [164] [95])    ; U+00A0 NO-BREAK SPACE
          (space-mark 8203 [164] [95])   ; U+200B ZERO WIDTH SPACE
          (newline-mark 10 [36 10])      ; U+000A LINE FEED
          (tab-mark 9 [187 9] [92 9])))  ; U+0009 TAB
  (whitespace-mode)
  (display-buffer "*bar*"))

Setting other word-wrap characters from Lisp with
set-char-table-range also works as expected in the simple situations
that I tested.

However, this is my first foray into modifying a serious C codebase,
so I am not sure that I have done the right thing.  In particular, I
have serious doubts about the second and third cases of
IT_DISPLAYING_WHITESPACE (the it->string and it->s branches),
especially since I don't really know when they would apply:

   || ((STRINGP (it->string)                        \
    && !NILP (CHAR_TABLE_REF                    \
          (Vword_wrap_chars, STRING_CHAR            \
           (SDATA (it->string) + IT_STRING_BYTEPOS (*it)))))    \
       || (it->s && !NILP (CHAR_TABLE_REF                \
               (Vword_wrap_chars,                \
                STRING_CHAR(it->s + IT_BYTEPOS (*it)))))    \
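
To illustrate why these branches decode a whole character rather than
compare a single byte: U+200B is encoded as three bytes in UTF-8
(0xE2 0x80 0x8B), so a byte-wise test against ' ' or '\t' can never
match it.  The following is a minimal standalone sketch, not Emacs
code.  decode_utf8 is a hypothetical stand-in for STRING_CHAR, and
byte_is_space_or_tab and char_is are names made up for illustration;
it assumes well-formed UTF-8 input.

```c
#include <stdint.h>

/* Hypothetical stand-in for STRING_CHAR: decode the UTF-8 sequence
   starting at P and return its code point.  Assumes a well-formed
   sequence of one to four bytes.  */
static uint32_t
decode_utf8 (const unsigned char *p)
{
  if (p[0] < 0x80)
    return p[0];
  if ((p[0] & 0xE0) == 0xC0)
    return ((uint32_t) (p[0] & 0x1F) << 6) | (p[1] & 0x3F);
  if ((p[0] & 0xF0) == 0xE0)
    return ((uint32_t) (p[0] & 0x0F) << 12)
	   | ((uint32_t) (p[1] & 0x3F) << 6)
	   | (p[2] & 0x3F);
  return ((uint32_t) (p[0] & 0x07) << 18)
	 | ((uint32_t) (p[1] & 0x3F) << 12)
	 | ((uint32_t) (p[2] & 0x3F) << 6)
	 | (p[3] & 0x3F);
}

/* Byte-wise test: inspects exactly one byte, so it can only ever
   recognize single-byte characters.  */
static int
byte_is_space_or_tab (const unsigned char *p)
{
  return *p == ' ' || *p == '\t';
}

/* Character-wise test: decode first, then compare code points.  */
static int
char_is (const unsigned char *p, uint32_t target)
{
  return decode_utf8 (p) == target;
}
```

With the bytes { 0xE2, 0x80, 0x8B }, byte_is_space_or_tab is false at
every offset, while char_is (p, 0x200B) is true.  In the patch the
comparison against a fixed code point is replaced by a lookup in the
char-table.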

Additionally, I'm not certain whether syms_of_character in character.c
is the right place to define the char-table, or whether the default
table should contain the range U+2000 to U+200B or just space and
tab.
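
For reference, the decimal values passed to Fset_char_table_range in
the patch correspond to these code points.  The sketch below is
standalone C, not Emacs code: struct char_range and
in_default_wrap_ranges are hypothetical names that only model the
default membership of the table, assuming nothing has been changed
from Lisp.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of code points.  */
struct char_range
{
  unsigned int first;
  unsigned int last;
};

/* The three entries the patch installs by default.  */
static const struct char_range default_wrap_ranges[] = {
  { 9, 9 },		/* U+0009 TAB */
  { 32, 32 },		/* U+0020 SPACE */
  { 8192, 8203 },	/* U+2000 .. U+200B ZERO WIDTH SPACE */
};

/* Return true if C falls in one of the default ranges.  */
static bool
in_default_wrap_ranges (unsigned int c)
{
  size_t n = sizeof default_wrap_ranges / sizeof default_wrap_ranges[0];
  for (size_t i = 0; i < n; i++)
    if (c >= default_wrap_ranges[i].first
	&& c <= default_wrap_ranges[i].last)
      return true;
  return false;
}
```

Under this model U+200B and every code point from U+2000 upwards to it
are wrap points, while U+00A0 (no-break space) and U+200C are not.
Restricting the default to space and tab would amount to dropping the
third entry.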


I am aware that if this were to be accepted, I would also need to
update etc/NEWS, probably the docstring of `word-wrap', and the
relevant section of the Texinfo manual.

I have not yet filled out a copyright assignment form, though I will
do so if this patch (modulo changes) is considered acceptable.

Thanks!

[-- Attachment #2: word_wrap_char_table.diff --]
[-- Type: text/plain, Size: 2525 bytes --]

diff --git a/src/character.c b/src/character.c
index c8ffa2b..6e7f55a 100644
--- a/src/character.c
+++ b/src/character.c
@@ -1145,4 +1145,15 @@ All Unicode characters have one of the following values (symbol):
 See The Unicode Standard for the meaning of those values.  */);
   /* The correct char-table is setup in characters.el.  */
   Vunicode_category_table = Qnil;
+
+  DEFVAR_LISP ("word-wrap-chars", Vword_wrap_chars,
+	       doc: /* A char-table for characters at which word-wrap occurs.
+Such characters have value t in this table.
+By default these are the whitespace characters. */);
+  Vword_wrap_chars = Fmake_char_table (Qnil, Qnil);
+  Fset_char_table_range (Vword_wrap_chars, make_number (9), Qt);
+  Fset_char_table_range (Vword_wrap_chars, make_number (32), Qt);
+  Fset_char_table_range (Vword_wrap_chars,
+			 Fcons (make_number (8192),
+				make_number (8203)), Qt);
 }
diff --git a/src/xdisp.c b/src/xdisp.c
index 7e47c06..7152220 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -348,20 +348,23 @@ static Lisp_Object list_of_error;
 #endif /* HAVE_WINDOW_SYSTEM */
 
 /* Test if the display element loaded in IT, or the underlying buffer
-   or string character, is a space or a TAB character.  This is used
-   to determine where word wrapping can occur.  */
+   or string character, belongs to the word-wrap-chars char-table.
+   This is used to determine where word wrapping can occur.  */
 
 #define IT_DISPLAYING_WHITESPACE(it)					\
-  ((it->what == IT_CHARACTER && (it->c == ' ' || it->c == '\t'))	\
+  ((it->what == IT_CHARACTER						\
+    && !NILP (CHAR_TABLE_REF (Vword_wrap_chars, it->c)))		\
    || ((STRINGP (it->string)						\
-	&& (SREF (it->string, IT_STRING_BYTEPOS (*it)) == ' '		\
-	    || SREF (it->string, IT_STRING_BYTEPOS (*it)) == '\t'))	\
-       || (it->s							\
-	   && (it->s[IT_BYTEPOS (*it)] == ' '				\
-	       || it->s[IT_BYTEPOS (*it)] == '\t'))			\
+	&& !NILP (CHAR_TABLE_REF					\
+		  (Vword_wrap_chars, STRING_CHAR			\
+		   (SDATA (it->string) + IT_STRING_BYTEPOS (*it)))))	\
+       || (it->s && !NILP (CHAR_TABLE_REF				\
+			   (Vword_wrap_chars,				\
+			    STRING_CHAR(it->s + IT_BYTEPOS (*it)))))	\
        || (IT_BYTEPOS (*it) < ZV_BYTE					\
-	   && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '			\
-	       || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))		\
+	   && !NILP (CHAR_TABLE_REF					\
+		     (Vword_wrap_chars,					\
+		      (FETCH_CHAR(IT_BYTEPOS (*it))))))))		\
 
 /* True means print newline to stdout before next mini-buffer message.  */
 
