From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Tue, 12 Dec 2017 19:13:33 +0200
Message-ID: <838te7swci.fsf@gnu.org>
References: <50EE7BE5.2060806@gmx.at> <83609hw7pm.fsf@gnu.org> <83r2s5ugnd.fsf@gnu.org>
To: Adam Tack
Cc: 13399@debbugs.gnu.org
In-reply-to: (message from Adam Tack on Sat, 9 Dec 2017 03:50:05 +0000)

> From: Adam Tack
> Date: Sat, 9 Dec 2017 03:50:05 +0000
> Cc: 13399@debbugs.gnu.org
>
> > I think this is okay, but maybe the macro could be converted into an
> > inline function, and then fetching the character from the various
> > objects separated from looking up the char-table for that character?
>
> I've made the conversion; it's now slightly less messy.  Regarding
> the separation, I think that the most that can be done is to have the
> look-up in a separate function.  Regrettably, trying to first obtain
> the character, for example via a set of if-else clauses, and then
> looking it up, which would be cleaner, can't really work since the
> cases (in particular the first and fourth) are not disjunct.

Hmm... not sure why you arrived at this conclusion.  E.g., what's
wrong with the implementation at the bottom of this message?

> > We could also look at LineBreak.txt in the Unicode database for
> > inspiration and ideas.
>
> The three main customisation options that I'm considering are:
>
> i) Unicode whitespace (U+2000 - U+200B),

Yes.

> ii) vim's breakat characters (default " ^I!@*-+;:,./?"), since
>     presumably they had given it some thought,

Maybe.  I'm not sure in what modes this would be TRT.

> iii) The characters in LineBreak.txt (parsing the file shouldn't be
>      hard, if there aren't copyright issues).

We already import several UCD files, see admin/unidata, where you will
also find copyright.html from the Unicode Consortium.

> > And also a couple of tests (the ones you used would be a good start).
>
> These would presumably have to be in tests/manual since the position of
> the word-wrap depends on too many variables (width of window, font
> type, font size)?

test/manual is okay.

> diff --git a/lisp/word-wrap.el b/lisp/word-wrap.el
> new file mode 100644
> index 0000000..6d59a83
> --- /dev/null
> +++ b/lisp/word-wrap.el
> @@ -0,0 +1,21 @@
> +(define-minor-mode word-wrap-char-table-mode
> +  "Toggle wrapping using a look-up to word-wrap-chars, globally.
> +
> +Currently, this allows word wrapping on the characters U+2000 to
> +U+200B in addition to the default of space and tab, when
> +`word-wrap' is set to t.
> +
> +(Provisional and unstable.)
> +"
> +  :global t
> +  :lighter "uws "
> +  (if word-wrap-char-table-mode
> +      (progn (setq word-wrap-chars (make-char-table nil nil))
> +             (set-char-table-range word-wrap-chars 9 t)
> +             (set-char-table-range word-wrap-chars 32 t)
> +             (set-char-table-range word-wrap-chars
> +                                   '(8192 . 8203) t))
> +    (setq word-wrap-chars nil)))

This should probably go into simple.el.

Thanks.

Here's the implementation of IT_DISPLAYING_WHITESPACE I had in mind:

static inline bool
IT_DISPLAYING_WHITESPACE (struct it *it)
{
  bool char_table_p = CHAR_TABLE_P (Vword_wrap_chars);
  /* Initialized so that C is defined even when none of the branches
     below assigns it (e.g. at or beyond ZV_BYTE).  */
  int c = 0;

  /* First fetch the character from whatever object IT is iterating.  */
  if (it->what == IT_CHARACTER)
    c = it->c;
  else if (!char_table_p)
    {
      /* Only SPC and TAB matter here, so looking at one byte is enough.  */
      if (STRINGP (it->string))
        c = SREF (it->string, IT_STRING_BYTEPOS (*it));
      else if (it->s)
        c = it->s[IT_BYTEPOS (*it)];
      else if (IT_BYTEPOS (*it) < ZV_BYTE)
        c = *BYTE_POS_ADDR (IT_BYTEPOS (*it));
    }
  else
    {
      /* The char-table is indexed by character, so decode a full one.  */
      if (STRINGP (it->string))
        c = STRING_CHAR (SDATA (it->string) + IT_STRING_BYTEPOS (*it));
      else if (it->s)
        c = STRING_CHAR (it->s + IT_BYTEPOS (*it));
      else if (IT_BYTEPOS (*it) < ZV_BYTE)
        c = FETCH_CHAR_AS_MULTIBYTE (IT_BYTEPOS (*it));
    }

  /* Then look the character up, separately from fetching it.  */
  return char_table_p
         ? !NILP (CHAR_TABLE_REF (Vword_wrap_chars, c))
         : (c == ' ' || c == '\t');
}
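
For experimenting with the same two-step idea outside of Emacs (first
fetch the character, then look it up), a minimal standalone sketch
might look like the following.  Everything in it is an assumption made
for illustration: `wrap_range', `fetch_char' and `can_wrap_after' are
hypothetical names, a small array of code point ranges stands in for
the char-table, and plain UTF-8 decoding of well-formed input stands
in for STRING_CHAR.  It is not the code proposed for xdisp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a char-table: inclusive code point ranges.  */
struct wrap_range { int lo, hi; };

/* Mirrors the ranges set up by the minor mode quoted above:
   9 (TAB), 32 (SPC) and 8192..8203 (U+2000..U+200B).  */
static const struct wrap_range unicode_wrap_ranges[] = {
  { 0x09, 0x09 },
  { 0x20, 0x20 },
  { 0x2000, 0x200B },
};

#define WRAP_RANGE_COUNT \
  (sizeof unicode_wrap_ranges / sizeof unicode_wrap_ranges[0])

/* Step 1: fetch one character.  Assumes P points at a complete,
   well-formed UTF-8 sequence; no validation is attempted.  */
static int
fetch_char (const unsigned char *p)
{
  if (p[0] < 0x80)
    return p[0];
  if ((p[0] & 0xE0) == 0xC0)
    return ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
  if ((p[0] & 0xF0) == 0xE0)
    return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
  return ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

/* Step 2: look the character up.  TABLE == NULL models the case where
   no table is installed, falling back to SPC and TAB only.  */
static bool
can_wrap_after (int c, const struct wrap_range *table, size_t n)
{
  if (!table)
    return c == ' ' || c == '\t';
  for (size_t i = 0; i < n; i++)
    {
      assert (table[i].lo <= table[i].hi);
      if (table[i].lo <= c && c <= table[i].hi)
        return true;
    }
  return false;
}
```

With this split, a caller would decode once with fetch_char and then
pass the result to can_wrap_after, with either NULL or
unicode_wrap_ranges as the table; the lookup no longer needs to know
where the character came from.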