From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Fri, 11 Jan 2013 10:58:08 +0200
Message-ID: <83ip74ume7.fsf@gnu.org>
References: <50EE7BE5.2060806@gmx.at> <83hamohmtj.fsf@gnu.org> <50EFCA6D.7090702@gmx.at>
To: martin rudalics
Cc: 13399@debbugs.gnu.org

> Date: Fri, 11 Jan 2013 09:16:45 +0100
> From: martin rudalics
> CC: 13399@debbugs.gnu.org
>
> >> As can be seen in the window showing *foo*,
> >> lines are not regularly wrapped at that character.
> >
> > You mean, not wrapped at all. Witness the continuation bitmaps in the
> > fringes, which shouldn't appear when a line is wrapped.
>
> I thought these bitmaps appear when a line is wrapped.

Not by default. Not unless you customize visual-line-fringe-indicators.

> > If anything, this is a missing feature, since word-wrap is explicitly
> > coded to break lines only on SPC and TAB characters.
>
> The doc-string of `word-wrap' says
>
>   When word-wrapping is on, continuation lines are wrapped at the space
>   or tab character nearest to the right window edge
>
> Since U-200B is a space character the line should wrap at it.

No, it means literally "the space character", U+0020.

> Also
>
>   this character is intended for invisible word separation and for line
>   break control; it has no width, but its presence between two
>   characters does not prevent increased letter spacing in justification
>
> and Emacs apparently does handle it specially since it reserves a few
> pixels when drawing it.

See glyphless-char-display and glyphless-char-display-control for why.

> But documentation on `word-wrap' is scarce ...

Actually, it doesn't exist, apart from the doc string.

> > See the
> > IT_DISPLAYING_WHITESPACE macro in xdisp.c.
>
> I tried to understand the code but failed.

  #define IT_DISPLAYING_WHITESPACE(it) \
    /* If the character to be displayed is SPC or TAB */ \
    ((it->what == IT_CHARACTER && (it->c == ' ' || it->c == '\t')) \
     /* Or we are iterating over a display or overlay string, ... */ \
     || ((STRINGP (it->string) \
          /* ... and the character at current string position is SPC or TAB */ \
          && (SREF (it->string, IT_STRING_BYTEPOS (*it)) == ' ' \
              || SREF (it->string, IT_STRING_BYTEPOS (*it)) == '\t')) \
         /* Or we are iterating over a C string, ... */ \
         || (it->s \
             /* ... and the character at current string position is SPC or TAB */ \
             && (it->s[IT_BYTEPOS (*it)] == ' ' \
                 || it->s[IT_BYTEPOS (*it)] == '\t')) \
         /* Or the iterator is before end of buffer's reachable portion, ... */ \
         || (IT_BYTEPOS (*it) < ZV_BYTE \
             /* ... and the character at current buffer position is SPC or TAB */ \
             && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' ' \
                 || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))

In any case, you can clearly see that it only tests for literal SPC and
TAB characters.

> > If we want to add more characters to the set, we should probably
> > arrange a special char-table for this, and have it exposed to Lisp, so
> > it could be customized. Patches are welcome.
>
> IIUC all breakable spaces are between U-2000 and U-200B so maybe a
> character table is not needed.

Who said we only want to break at breakable space characters? Who said
Unicode will never add more such characters in another block? And what
about low-ASCII characters, which are already in a different block?

In any case, even if you are right, a char-table is a way to store
character properties efficiently. In particular, it wastes very little
storage to mark a contiguous range of characters with the same
property. The advantage of using a char-table is that it will
dynamically expand as needed if more characters are added to the set.

> Anyway, exposing displayed text to Lisp would be great. We'd just need
> two functions - one that gets the pixel width of an arbitrary buffer
> string wrt a specific window, and one that gets the pixel height of an
> arbitrary buffer string (newlines ignored) wrt a specific window. This
> way we could get rid of lots of problems currently hidden in the display
> engine ...

You lost me here. By "exposing to Lisp" I meant exposing the char-table
of word-wrap characters to Lisp. What did _you_ want exposed to Lisp?