From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Lars Ingebrigtsen Newsgroups: gmane.emacs.bugs Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B Date: Fri, 18 Sep 2020 16:55:40 +0200 Message-ID: <877dsrf82b.fsf@gnus.org> References: <50EE7BE5.2060806@gmx.at> <83609hw7pm.fsf@gnu.org> <83r2s5ugnd.fsf@gnu.org> <838te7swci.fsf@gnu.org> <83lgi6vccy.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34711"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) Cc: 13399@debbugs.gnu.org To: Adam Tack Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri Sep 18 16:58:18 2020 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kJHq5-0008tG-N1 for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 18 Sep 2020 16:58:17 +0200 Original-Received: from localhost ([::1]:40064 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kJHq4-0006zX-O9 for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 18 Sep 2020 10:58:16 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:43086) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kJHos-0005M6-9o for bug-gnu-emacs@gnu.org; Fri, 18 Sep 2020 10:57:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:60780) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1kJHor-0006Ym-UZ for bug-gnu-emacs@gnu.org; Fri, 18 Sep 2020 10:57:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1kJHor-0008J5-Tm for bug-gnu-emacs@gnu.org; Fri, 18 Sep 2020 10:57:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Lars Ingebrigtsen Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 18 Sep 2020 14:57:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 13399 X-GNU-PR-Package: emacs Original-Received: via spool by 13399-submit@debbugs.gnu.org id=B13399.160044096731815 (code B ref 13399); Fri, 18 Sep 2020 14:57:01 +0000 Original-Received: (at 13399) by debbugs.gnu.org; 18 Sep 2020 14:56:07 +0000 Original-Received: from localhost ([127.0.0.1]:44062 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kJHny-0008Gz-Cn for submit@debbugs.gnu.org; Fri, 18 Sep 2020 10:56:07 -0400 Original-Received: from quimby.gnus.org ([95.216.78.240]:55466) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kJHnm-0008Fx-A2 for 13399@debbugs.gnu.org; Fri, 18 Sep 2020 10:55:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Transfer-Encoding:Content-Type:MIME-Version:Message-ID :In-Reply-To:Date:References:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=8A+m737R5WSzpNjSPtKjCyjnLfwIkibcABZyOmNUqFk=; b=WbLNzpmfrrJfA4r8t6cCO+KCCO 3yvY+lFYK8Iv64XB+RvXD704FjsyQ/i9aB4EWMNKXRQTf39rf11jowUrnt7a31KUdp5q2xzf/q8CG 6pphhyQsr61+PDmOboYzsQAg40islLWdd6hD1nDfFqA7THmiDMvOY0pkXPuEhBirb9h8=; Original-Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=xo) by quimby with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kJHna-00071A-BM; Fri, 18 Sep 2020 16:55:47 +0200 X-Now-Playing: Saito Koji's _433-1_: "433_080" In-Reply-To: (Adam Tack's message of "Sun, 17 Dec 2017 02:22:12 +0000") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:188322 Archived-At: Adam Tack writes: > I've split out the non-nil char-table case out into a function, as I > think that using a named function slightly improves readability, and > having a macro over 20 lines long, somehow feels "wrong". If the > compiler does actually follow the inline directive, there should be no > additional performance hit. This was the last post in the thread, and the patch no longer applied, so I've respun it for Emacs 28. However, I can't find any copyright assignment on file -- Adam, did you go through with the assignment process? diff --git a/doc/emacs/display.texi b/doc/emacs/display.texi index e7b8745a04..9fcca8c6e6 100644 --- a/doc/emacs/display.texi +++ b/doc/emacs/display.texi @@ -1831,6 +1831,14 @@ Visual Line Mode report. You can add categories to a character using the command @code{modify-category-entry}. =20 +@vindex word-wrap-chars +@findex word-wrap-chars-mode + Word boundaries and hence points at which word wrap can occur are, +by default, considered to occur on the space and tab characters. If +you prefer word-wrap to be permissible at other characters, you can +change the value of the char-table @code{word-wrap-chars}, or use +@code{word-wrap-chars-mode}, which does this for you. + @node Display Custom @section Customization of Display =20 diff --git a/etc/NEWS b/etc/NEWS index 54bad068f8..f3216ed445 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -73,6 +73,19 @@ its implementation has been removed from the Linux kerne= l. OpenBSD 5.3 and older releases are no longer supported, as they lack proper pty support that Emacs needs. =20 ++++ +** The characters at which word-wrapping occurs can now be controlled +using the new `word-wrap-chars' char-table. If `word-wrap-chars' is +nil (the default), then word-wrapping will occur only on the space or +tab characters, as has been the case until now. + +The most convenient way to change the characters at which wrap occurs +is customizing the new variable `word-wrap-type' and using the new +`word-wrap-chars-mode' minor mode, which sets `word-wrap-chars' based +on `word-wrap-type', for you. The options for `word-wrap-type' are +ascii-whitespace, unicode-whitespace and a customizable list of +character codes and character code ranges. + * Startup Changes in Emacs 28.1 =20 diff --git a/lisp/simple.el b/lisp/simple.el index 7dc695848b..b881cbc23e 100644 --- a/lisp/simple.el +++ b/lisp/simple.el @@ -7269,6 +7269,117 @@ turn-on-visual-line-mode (define-globalized-minor-mode global-visual-line-mode visual-line-mode turn-on-visual-line-mode) =20 + +(defvar word-wrap-type) + +(defvar word-wrap-chars--saved nil) + +(define-minor-mode word-wrap-chars-mode + "Toggle wrapping using a look-up to `word-wrap-chars'. +The exact choice of characters on which wrapping occurs, depends +on the value of `word-wrap-type'. By default, `word-wrap-type' +is set to unicode-white-space, which allows word wrapping on all +breakable unicode whitespace, not only space and tap. + +For details of other customization options, see +`word-wrap-type'. + +This minor mode has no effect unless `visual-line-mode' is +enabled or `word-wrap' is set to t. + +To toggle wrapping using a look-up, globally, use +`global-word-wrap-chars-mode'." + :group 'visual-line + :lighter " wwc" + (if word-wrap-chars-mode + (progn + (if (local-variable-p 'word-wrap-chars) + (setq-local word-wrap-chars--saved + word-wrap-chars)) + (set-word-wrap-chars)) + (setq-local word-wrap-chars word-wrap-chars--saved))) + +(defun turn-on-word-wrap-chars-mode () + (visual-line-mode 1)) + +(define-globalized-minor-mode global-word-wrap-chars-mode + word-wrap-chars-mode turn-on-word-wrap-chars-mode) + +(defun update-word-wrap-chars () + "Update `word-wrap-chars' upon Customize of `word-wrap-type'. + +Only buffers which use the `word-wrap-chars-mode' are affected." + (mapcar #'(lambda (buf) + (with-current-buffer buf + (if word-wrap-chars-mode + (set-word-wrap-chars)))) + (buffer-list))) + +(defun set-word-wrap-chars () + "Set `word-wrap-chars' locally, based on `word-wrap-type'." + (cond + ((eq word-wrap-type 'ascii-whitespace) + (setq-local word-wrap-chars nil)) + ((eq word-wrap-type 'unicode-whitespace) + (set-word-wrap-chars-from-list + '(9 32 5760 (8192 . 8198) (8200 . 8203) 8287 12288))) + ((listp word-wrap-type) + (set-word-wrap-chars-from-list word-wrap-type)))) + +(defun set-word-wrap-chars-from-list (list) + "Set `word-wrap-chars' locally from a list. +Each element of the list can be a character code (code point) or +a cons of character codes, representing the two (inclusive) +endpoints of the range of characters." + (setq-local + word-wrap-chars + (let ((char-table (make-char-table nil nil))) + (dolist (range list char-table) + (set-char-table-range char-table range t))))) + +(defcustom word-wrap-type + 'unicode-whitespace + "Characters on which word-wrap occurs. +This variable controls the value of `word-wrap-chars' that is set +by `word-wrap-chars-mode`. `word-wrap-chars' determines on what +characters word-wrapping can occur, when `word-wrap' is t or +`visual-line-mode' is enabled. + +Possible values are ascii-whitespace, unicode-whitespace or a +custom list of characters and character ranges. + +If the value is `ascii-whitespace', word-wrap is only on space +and tab. If the value is `unicode-whitespace', word-wrap is on +all the Unicode whitespace characters that permit wrapping, +including but not limited to space and tab. + +If a custom list of characters and ranges is used, word wrap is +on these characters and character ranges. The ranges are +inclusive of both endpoints. + +When you change this without using customize, you need to call +`update-word-wrap-chars' to update the word wrap in current +buffers. For instance: + +(setq word-wrap-type \\=3D'(9 32 ?_)) +(update-word-wrap-chars) + +will set the wrappable characters to space, tab and underscore, +in all buffers in `word-wrap-chars-mode' and using the default +value of `word-wrap-type'. +" + :type '(choice (const :tag "Space and tab" ascii-whitespace) + (const :tag "All unicode spaces" unicode-whitespace) + (repeat :tag "Custom characters or ranges" + :value (9 32) + (choice (character) + (cons :tag "Range" character character)))) + :set (lambda (symbol value) + (set-default symbol value) + (update-word-wrap-chars)) + :group 'visual-line + :version 27.1) + (defun transpose-chars (arg) "Interchange characters around point, moving forward one character. diff --git a/src/buffer.c b/src/buffer.c index 241f2d43a9..5c26323d69 100644 --- a/src/buffer.c +++ b/src/buffer.c @@ -5786,7 +5786,12 @@ syms_of_buffer (void) Visual Line mode. Visual Line mode, when enabled, sets `word-wrap' to t, and additionally redefines simple editing commands to act on visual lines rather than logical lines. See the documentation of -`visual-line-mode'. */); +`visual-line-mode'. + +If `word-wrap-chars' is non-nil and a char-table, continuation lines +are wrapped on the characters in `word-wrap-chars' whose value is t, +rather than the space and tab characters. `word-wrap-chars-mode +provides a convenient interface for using this. */); =20 DEFVAR_PER_BUFFER ("default-directory", &BVAR (current_buffer, directory= ), Qstringp, diff --git a/src/character.c b/src/character.c index 5860f6a0c8..032f4fc12b 100644 --- a/src/character.c +++ b/src/character.c @@ -1084,4 +1084,14 @@ syms_of_character (void) See The Unicode Standard for the meaning of those values. */); /* The correct char-table is setup in characters.el. */ Vunicode_category_table =3D Qnil; + + DEFVAR_LISP ("word-wrap-chars", Vword_wrap_chars, + doc: /* A char-table for characters at which word-wrap occurs. +Such characters have value t in this table. If the char-table is nil, +word-wrap occurs only on space and tab. + +For a more user-friendly way of changing the characters at which +word-wrap can occur, consider using `word-wrap-chars-mode' and +customizing `word-wrap-type'. */); + Vword_wrap_chars =3D Qnil; } diff --git a/src/xdisp.c b/src/xdisp.c index 615f0ca7cf..744b9a52c7 100644 --- a/src/xdisp.c +++ b/src/xdisp.c @@ -494,20 +494,42 @@ #define IT_OVERFLOW_NEWLINE_INTO_FRINGE(it) false #endif /* HAVE_WINDOW_SYSTEM */ =20 /* Test if the display element loaded in IT, or the underlying buffer - or string character, is a space or a TAB character. This is used - to determine where word wrapping can occur. */ + or string character, is a space or tab (by default, to avoid the + unnecessary performance hit of char-table lookup). If + word-wrap-chars is a char-table, then instead check if the relevant + element or character belongs to the char-table. This is used to + determine where word wrapping can occur. */ =20 #define IT_DISPLAYING_WHITESPACE(it) \ - ((it->what =3D=3D IT_CHARACTER && (it->c =3D=3D ' ' || it->c =3D=3D '\t'= )) \ - || ((STRINGP (it->string) \ - && (SREF (it->string, IT_STRING_BYTEPOS (*it)) =3D=3D ' ' \ - || SREF (it->string, IT_STRING_BYTEPOS (*it)) =3D=3D '\t')) \ - || (it->s \ - && (it->s[IT_BYTEPOS (*it)] =3D=3D ' ' \ - || it->s[IT_BYTEPOS (*it)] =3D=3D '\t')) \ - || (IT_BYTEPOS (*it) < ZV_BYTE \ - && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) =3D=3D ' ' \ - || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) =3D=3D '\t')))) + (!CHAR_TABLE_P (Vword_wrap_chars) \ + ? ((it->what =3D=3D IT_CHARACTER && (it->c =3D=3D ' ' || it->c =3D=3D '= \t')) \ + || ((STRINGP (it->string) \ + && (SREF (it->string, IT_STRING_BYTEPOS (*it)) =3D=3D ' ' \ + || SREF (it->string, IT_STRING_BYTEPOS (*it)) =3D=3D '\t')) \ + || (it->s \ + && (it->s[IT_BYTEPOS (*it)] =3D=3D ' ' \ + || it->s[IT_BYTEPOS (*it)] =3D=3D '\t')) \ + || (IT_BYTEPOS (*it) < ZV_BYTE \ + && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) =3D=3D ' ' \ + || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) =3D=3D '\t')))) \ + : it_displaying_word_wrap_char(it)) \ + +static inline bool +char_is_word_wrap_char_p (int c) { + return !NILP (CHAR_TABLE_REF (Vword_wrap_chars, c)); +} + +static inline bool +it_displaying_word_wrap_char (struct it *it) { + return ((it->what =3D=3D IT_CHARACTER && char_is_word_wrap_char_p (it->c= )) + || (STRINGP (it->string) && char_is_word_wrap_char_p + (STRING_CHAR + (SDATA (it->string) + IT_STRING_BYTEPOS (*it)))) + || (it->s && char_is_word_wrap_char_p + (STRING_CHAR(it->s + IT_BYTEPOS (*it)))) + || (IT_BYTEPOS (*it) < ZV_BYTE && char_is_word_wrap_char_p + (FETCH_CHAR (IT_BYTEPOS (*it))))); +} =20 /* These are the category sets we use. They are defined by kinsoku.el and chracters.el. */ diff --git a/test/manual/word-wrap-test.el b/test/manual/word-wrap-test.el new file mode 100644 index 0000000000..593c2decc7 --- /dev/null +++ b/test/manual/word-wrap-test.el @@ -0,0 +1,127 @@ +;;; word-wrap-test.el -- tests for word-wrap -*- lexical-binding: t -*- + +;; Copyright (C) 2017 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; This program is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; This program is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with this program. If not, see . + +;;; Commentary: + +;; Run the tests M-x word-wrap-test-[1-4] which correspond to the four +;; combinations: +;; +;; i) whitespace-mode being enabled and disabled, +;; +;; ii) word-wrap-chars being nil and equal to a char-table that +;; specifies U-200B as the only word-wrap character. +;; +;; The tests with whitespace-mode are needed to help avoid a +;; regression on Bug#11341. + +;;; Code: + +(setq whitespace-display-mappings-for-zero-width-space + '((space-mark 32 + [183] + [46]) + (space-mark 160 + [164] + [95]) + (space-mark 8203 + [164] + [95]) + (newline-mark 10 + [36 10]) + (tab-mark 9 + [187 9] + [92 9]))) + +(defun word-wrap-test-1 () + "Check word-wrap for nil `word-wrap-chars'." + (interactive) + (let ((buf (get-buffer-create "*Word-wrap Test 1*"))) + (with-current-buffer buf + (erase-buffer) + (insert "Word wrap should occur for space.\n\n") + (dotimes (i 100) + (insert "1234567 ")) ; Space + (insert "\n\nWord wrap should NOT occur for U-200B.\n\n") + (dotimes (i 100) + (insert "1234567=E2=80=8B")) ; U-200B + (setq word-wrap t) + (setq-local word-wrap-chars nil) + (whitespace-mode -1) + (display-buffer buf)))) + +(defun word-wrap-test-2 () + "Check word-wrap for nil `word-wrap-chars' with whitespace-mode." + (interactive) + (let ((buf (get-buffer-create "*Word-wrap Test 2*"))) + (with-current-buffer buf + (erase-buffer) + (insert "Word wrap should occur for space (displayed as `=C2=B7').\n= \n") + (dotimes (i 100) + (insert "1234567 ")) ; Space + (insert "\n\nWord wrap should NOT occur for U-200B (displayed as `= =C2=A4').\n\n") + (dotimes (i 100) + (insert "1234567=E2=80=8B")) ; U-200B + (setq word-wrap t) + (setq-local word-wrap-chars nil) + (setq-local whitespace-display-mappings + whitespace-display-mappings-for-zero-width-space) + (whitespace-mode) + (display-buffer buf)))) + +(defun word-wrap-test-3 () + "Check word-wrap if `word-wrap-chars' is a char-table." + (interactive) + (let ((buf (get-buffer-create "*Word-wrap Test 3*"))) + (with-current-buffer buf + (erase-buffer) + (insert "Word wrap should NOT occur for space.\n\n") + (dotimes (i 100) + (insert "1234567 ")) ; Space + (insert "\n\nWord wrap should occur for U-200B.\n\n") + (dotimes (i 100) + (insert "1234567=E2=80=8B")) ; U-200B + (setq word-wrap t) + (setq-local word-wrap-chars + (let ((ct (make-char-table nil nil))) + (set-char-table-range ct 8203 t) + ct)) + (whitespace-mode -1) + (display-buffer buf)))) + +(defun word-wrap-test-4 () + "Check word-wrap if `word-wrap-chars' is a char-table, for whitespace-mo= de." + (interactive) + (let ((buf (get-buffer-create "*Word-wrap Test 4*"))) + (with-current-buffer buf + (erase-buffer) + (insert "Word wrap should NOT occur for space (displayed as `=C2=B7'= ).\n\n") + (dotimes (i 100) + (insert "1234567 ")) ; Space + (insert "\n\nWord wrap should occur for U-200B (displayed as `=C2=A4= ').\n\n") + (dotimes (i 100) + (insert "1234567=E2=80=8B")) ; U-200B + (setq word-wrap t) + (setq-local word-wrap-chars + (let ((ct (make-char-table nil nil))) + (set-char-table-range ct 8203 t) + ct)) + (setq-local whitespace-display-mappings + whitespace-display-mappings-for-zero-width-space) + (whitespace-mode) + (display-buffer buf)))) --=20 (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no