From: Adam Tack <adam.tack.513@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 13399@debbugs.gnu.org
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Sat, 9 Dec 2017 03:50:05 +0000 [thread overview]
Message-ID: <CAA+VxxG4Zpj6W+QzLbSnSx04spdeMV2fiaqmPZWktnqX7cScKA@mail.gmail.com> (raw)
In-Reply-To: <83r2s5ugnd.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 4013 bytes --]
Thanks for the very fast replies and the suggestions!
> I think this is okay, but maybe the macro could be converted into an
> inline function, and then fetching the character from the various
> objects separated from looking up the char-table for that character?
I've made the conversion — it's now slightly less messy. Regarding
the separation, I think that the most that can be done is to have the
look-up in a separate function. Regrettably, trying to first obtain
the character, for example via a set of if-else clauses, and then
looking it up, which would be cleaner, can't really work since the
cases (in particular the first and fourth) are not disjunct.
> Well, since it's a char-table, users will probably want to control
> which characters cause word-wrap. One idea would be to have a minor
> mode or some such, providing users an ability to include or exclude
> different groups of related whitespace characters as a whole? This
> could be in follow-up patches, though.
Customisability was the idea. :) I'm not sure how best to expose it in
a reasonably user-friendly way, though. For the time being, allowing
control directly via the char-table might suffice.
> We could also look at LineBreak.txt in the Unicode database for
> inspiration and ideas.
The three main customisation options that I'm considering are:
i) Unicode whitespace (U+2000 - U+200B),
ii) vim's breakat characters (default " ^I!@*-+;:,./?"), since
presumably they had given it some thought,
iii) The characters in LineBreak.txt (parsing the file shouldn't be
hard, if there aren't copyright issues).
> But I do think that the default should be only TAB and SPC, as Emacs
> always did, and the rest should be optional, and probably in Lisp, not
> C.
> And also a couple of tests (the ones you used would be a good start).
These would presumably have to be in tests/manual since the position of
the word-wrap depends on too many variables (width of window, font
type, font size)?
> I will send the forms off-list, thanks.
Thanks!
> One other thought: since TAB and SPC are single-byte characters,
> whereas the other "whitespace" characters are not, supporting the
> non-ASCII whitespace will be associated with some performance hit in
> the display engine, because it requires a char-table look up and
> fetching multibyte characters. So perhaps we should allow the
> word-wrap-chars char-table to be nil (and make that the default), and
> in that case support only TAB and SPC as word-wrap characters. This
> would let the default configuration work as fast is it does now,
> imposing the performance penalty only on those who want to support
> more whitespace characters.
> WDYT?
That seems sensible. The old behaviour will now be the default and
look-up using the char-table only enabled with the global minor mode
`word-wrap-char-table-mode' (suggestions for a catchier name very
welcome). For the time being, its definition is in a new file
`lisp/word-wrap.el'. Also temporarily, for ease of testing, it allows
wrapping on the unicode whitespace characters.
The current iteration is attached. Until they've found a proper home,
the slightly updated tests are below.
(require 'word-wrap)
(with-current-buffer (get-buffer-create "*bar*")
(dotimes (i 1000)
(insert "1234")) ; U-200B
(setq word-wrap t)
(setq whitespace-display-mappings
'((space-mark 32
[183]
[46])
(space-mark 160
[164]
[95])
(space-mark 8203
[164]
[95])
(newline-mark 10
[36 10])
(tab-mark 9
[187 9]
[92 9])))
(whitespace-mode)
(word-wrap-char-table-mode)
(display-buffer "*bar*"))
(with-current-buffer (get-buffer-create "*foo*")
(dotimes (i 1000)
(insert "1234")) ; U-200B
(setq word-wrap t)
(word-wrap-char-table-mode)
(display-buffer "*foo*"))
[-- Attachment #2: word_wrap_char_table.diff --]
[-- Type: text/plain, Size: 5836 bytes --]
diff --git a/lisp/word-wrap.el b/lisp/word-wrap.el
new file mode 100644
index 0000000..6d59a83
--- /dev/null
+++ b/lisp/word-wrap.el
@@ -0,0 +1,21 @@
+(define-minor-mode word-wrap-char-table-mode
+ "Toggle wrapping using a look-up to word-wrap-chars, globally.
+
+Currently, this allows word wrapping on the characters U+2000 to
+U+200B in addition to the default of space and tap, when
+`word-wrap' is set to t.
+
+(Provisional and unstable.)
+"
+ :global t
+ :lighter "uws "
+ (if word-wrap-char-table-mode
+ (progn (setq word-wrap-chars (make-char-table nil nil))
+ (set-char-table-range word-wrap-chars 9 t)
+ (set-char-table-range word-wrap-chars 32 t)
+ (set-char-table-range word-wrap-chars
+ '(8192 . 8203) t))
+ (setq word-wrap-chars nil)))
+
+(provide 'word-wrap)
+
diff --git a/src/character.c b/src/character.c
index c8ffa2b..af89a8b 100644
--- a/src/character.c
+++ b/src/character.c
@@ -1145,4 +1145,10 @@ All Unicode characters have one of the following values (symbol):
See The Unicode Standard for the meaning of those values. */);
/* The correct char-table is setup in characters.el. */
Vunicode_category_table = Qnil;
+
+ DEFVAR_LISP ("word-wrap-chars", Vword_wrap_chars,
+ doc: /* A char-table for characters at which word-wrap occurs.
+Such characters have value t in this table.
+This is set up in ... */);
+ Vword_wrap_chars = Qnil;
}
diff --git a/src/xdisp.c b/src/xdisp.c
index 7e47c06..4e8b045 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -348,20 +348,41 @@ static Lisp_Object list_of_error;
#endif /* HAVE_WINDOW_SYSTEM */
/* Test if the display element loaded in IT, or the underlying buffer
- or string character, is a space or a TAB character. This is used
- to determine where word wrapping can occur. */
-
-#define IT_DISPLAYING_WHITESPACE(it) \
- ((it->what == IT_CHARACTER && (it->c == ' ' || it->c == '\t')) \
- || ((STRINGP (it->string) \
- && (SREF (it->string, IT_STRING_BYTEPOS (*it)) == ' ' \
- || SREF (it->string, IT_STRING_BYTEPOS (*it)) == '\t')) \
- || (it->s \
- && (it->s[IT_BYTEPOS (*it)] == ' ' \
- || it->s[IT_BYTEPOS (*it)] == '\t')) \
- || (IT_BYTEPOS (*it) < ZV_BYTE \
- && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' ' \
- || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t')))) \
+ or string character, is a space or tab (by default, to avoid the
+ unnecessary performance hit of char-table lookup). If
+ word-wrap-chars is a char-table, then instead check if the relevant
+ element or character belongs to the char-table. This is used to
+ determine where word wrapping can occur. */
+
+static inline bool
+char_is_whitespace_p (int c) {
+ return !NILP (CHAR_TABLE_REF (Vword_wrap_chars, c));
+}
+
+static inline bool
+it_displaying_whitespace (struct it *it) {
+ if (!CHAR_TABLE_P (Vword_wrap_chars)) {
+ return ((it->what == IT_CHARACTER && (it->c == ' ' || it->c == '\t'))
+ || ((STRINGP (it->string)
+ && (SREF (it->string, IT_STRING_BYTEPOS (*it)) == ' '
+ || SREF (it->string, IT_STRING_BYTEPOS (*it)) == '\t'))
+ || (it->s
+ && (it->s[IT_BYTEPOS (*it)] == ' '
+ || it->s[IT_BYTEPOS (*it)] == '\t'))
+ || (IT_BYTEPOS (*it) < ZV_BYTE
+ && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '
+ || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))));
+ } else {
+ return ((it->what == IT_CHARACTER && char_is_whitespace_p (it->c))
+ || (STRINGP (it->string) && char_is_whitespace_p
+ (STRING_CHAR
+ (SDATA (it->string) + IT_STRING_BYTEPOS (*it))))
+ || (it->s && char_is_whitespace_p
+ (STRING_CHAR(it->s + IT_BYTEPOS (*it))))
+ || (IT_BYTEPOS (*it) < ZV_BYTE && char_is_whitespace_p
+ (FETCH_CHAR (IT_BYTEPOS (*it)))));
+ }
+}
/* True means print newline to stdout before next mini-buffer message. */
@@ -8785,7 +8806,7 @@ move_it_in_display_line_to (struct it *it,
{
if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
{
- if (IT_DISPLAYING_WHITESPACE (it))
+ if (it_displaying_whitespace (it))
may_wrap = true;
else if (may_wrap)
{
@@ -8950,7 +8971,7 @@ move_it_in_display_line_to (struct it *it,
SAVE_IT (tem_it, *it, tem_data);
set_iterator_to_next (it, true);
if (get_next_display_element (it)
- && IT_DISPLAYING_WHITESPACE (it))
+ && it_displaying_whitespace (it))
can_wrap = false;
RESTORE_IT (it, &tem_it, tem_data);
}
@@ -9041,7 +9062,7 @@ move_it_in_display_line_to (struct it *it,
wrapped in the middle of whitespace.
Therefore, wrap_it _is_ relevant in that
case. */
- && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+ && !(moved_forward && it_displaying_whitespace (it)))
{
/* If we've found TO_X, go back there, as we now
know the last word fits on this screen line. */
@@ -21427,7 +21448,7 @@ display_line (struct it *it, int cursor_vpos)
if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
{
- if (IT_DISPLAYING_WHITESPACE (it))
+ if (it_displaying_whitespace (it))
may_wrap = true;
else if (may_wrap)
{
@@ -21571,7 +21592,7 @@ display_line (struct it *it, int cursor_vpos)
was a space or tab AND (ii) the
current character is not. */
&& (!may_wrap
- || IT_DISPLAYING_WHITESPACE (it)))
+ || it_displaying_whitespace (it)))
goto back_to_wrap;
/* Record the maximum and minimum buffer
@@ -21605,7 +21626,7 @@ display_line (struct it *it, int cursor_vpos)
was a space or tab AND (ii) the
current character is not. */
&& (!may_wrap
- || IT_DISPLAYING_WHITESPACE (it)))
+ || it_displaying_whitespace (it)))
goto back_to_wrap;
}
next prev parent reply other threads:[~2017-12-09 3:50 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-10 8:29 bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B martin rudalics
2013-01-10 19:15 ` Eli Zaretskii
2013-01-11 8:16 ` martin rudalics
2013-01-11 8:58 ` Eli Zaretskii
2013-01-11 10:29 ` martin rudalics
2013-01-11 10:57 ` Eli Zaretskii
2013-01-11 14:30 ` martin rudalics
2013-01-11 14:49 ` Eli Zaretskii
2013-01-11 15:17 ` martin rudalics
2013-01-11 15:22 ` Christopher Schmidt
2013-01-11 18:04 ` martin rudalics
2013-01-11 15:53 ` Eli Zaretskii
2013-01-11 18:04 ` martin rudalics
2013-01-11 16:08 ` Stefan Monnier
2013-01-11 18:06 ` martin rudalics
2013-01-11 18:50 ` Stefan Monnier
2013-01-11 19:29 ` Eli Zaretskii
2013-01-11 22:47 ` Stefan Monnier
2013-01-12 8:28 ` Eli Zaretskii
2013-01-12 13:20 ` Stefan Monnier
2013-01-12 14:12 ` Eli Zaretskii
2013-01-12 16:06 ` Stefan Monnier
2013-02-02 16:48 ` martin rudalics
2013-02-02 17:52 ` Eli Zaretskii
2013-02-02 18:20 ` martin rudalics
2013-02-02 18:36 ` Eli Zaretskii
2013-02-03 9:44 ` martin rudalics
2013-02-03 16:01 ` Stefan Monnier
2013-02-03 19:32 ` Eli Zaretskii
2013-02-04 17:04 ` martin rudalics
2013-02-04 17:57 ` Eli Zaretskii
2013-01-11 19:08 ` Eli Zaretskii
2013-01-12 14:29 ` martin rudalics
2013-01-12 14:56 ` Eli Zaretskii
2013-01-12 16:37 ` martin rudalics
2013-01-12 16:51 ` Eli Zaretskii
2013-01-12 18:01 ` martin rudalics
2013-01-12 18:38 ` Eli Zaretskii
2013-01-14 18:04 ` martin rudalics
2013-02-03 18:57 ` martin rudalics
2013-02-03 19:45 ` Eli Zaretskii
2017-12-08 1:02 ` Adam Tack
2017-12-08 10:12 ` martin rudalics
2017-12-08 15:38 ` Eli Zaretskii
2017-12-08 20:08 ` Eli Zaretskii
2017-12-09 3:50 ` Adam Tack [this message]
2017-12-12 17:13 ` Eli Zaretskii
2017-12-13 4:00 ` Adam Tack
2017-12-13 16:09 ` Eli Zaretskii
2017-12-17 2:22 ` Adam Tack
2020-09-18 14:55 ` Lars Ingebrigtsen
2020-09-18 15:39 ` Eli Zaretskii
2020-09-19 13:15 ` Lars Ingebrigtsen
2020-09-19 14:36 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAA+VxxG4Zpj6W+QzLbSnSx04spdeMV2fiaqmPZWktnqX7cScKA@mail.gmail.com \
--to=adam.tack.513@gmail.com \
--cc=13399@debbugs.gnu.org \
--cc=eliz@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.