From: Adam Tack
Newsgroups: gmane.emacs.bugs
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Wed, 13 Dec 2017 04:00:56 +0000
To: Eli Zaretskii
Cc: 13399@debbugs.gnu.org
In-Reply-To: <838te7swci.fsf@gnu.org>

Sorry for not working further on this, but I didn't have time. I will
get back to finishing this soon.

> Hmm... not sure why you arrived at this conclusion. E.g., what's
> wrong with the implementation at the bottom of this message?

This was very similar to my first try. Unfortunately, it doesn't work
correctly in whitespace-mode, even with just normal spaces, regressing
on Bug#11341:

  (with-current-buffer (get-buffer-create "*bar*")
    (dotimes (i 1000)
      (insert "1234 "))                 ; Space
    (setq word-wrap t)
    (whitespace-mode)
    (display-buffer "*bar*"))

The spaces are displayed as `·', so it->c is 183 (U+00B7), none of the
further tests are checked, and IT_DISPLAYING_WHITESPACE returns false.
(In the currently used implementation, if it->c is not one of ' ' or
'\t' then the later tests are all checked.)

I thought about changing the order of the tests to something like the
following (ignoring the special case of ' ' and '\t', here, for
brevity):

  static inline bool
  IT_DISPLAYING_WHITESPACE (struct it *it)
  {
    int c;

    /* Look at the buffer character first, before the displayed one.  */
    if (IT_BYTEPOS (*it) < ZV_BYTE)
      c = FETCH_CHAR (IT_BYTEPOS (*it));
    else if (it->what == IT_CHARACTER)
      c = it->c;
    else if (STRINGP (it->string))
      c = STRING_CHAR (SDATA (it->string) + IT_STRING_BYTEPOS (*it));
    else if (it->s)
      c = STRING_CHAR (it->s + IT_BYTEPOS (*it));
    else
      return false;

    return !NILP (CHAR_TABLE_REF (Vword_wrap_chars, c));
  }

which in the case of whitespace-mode does TRT, but I worried that
there might be situations where wrapping on the display character is
correct.

The crux (as I had previously, but very unclearly, written) is that
under "normal" circumstances, both `(it->what == IT_CHARACTER)' and
`(IT_BYTEPOS (*it) < ZV_BYTE)' are true.

Additionally, I wasn't sure whether there should be a fall-through,
since on the one hand, it prevents Emacs crashing if (weirdly) all the
previous tests return false, but on the other, it might preclude some
magic compiler optimisation. Chaining ORs side-stepped both issues,
so I settled on keeping it, though it might have been the wrong
decision.

> > ii) vim's breakat characters (default " ^I!@*-+;:,./?"), since
> > presumably they had given it some thought,
>
> Maybe. I'm not sure in what modes this would be TRT.

It should almost certainly not be the default in any mode, but it
might, perhaps, be a useful, pre-defined option for some users.
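
To make concrete what breaking at such a character set amounts to,
here is a minimal sketch in plain C. It is not taken from Emacs or
vim and has nothing to do with the real display code: `wrap_point' and
`VIM_BREAKAT' are hypothetical names, and the sketch assumes ASCII
text in which one byte occupies one column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: vim's default 'breakat' value, with ^I
   written as a tab.  */
static const char VIM_BREAKAT[] = " \t!@*-+;:,./?";

/* Hypothetical helper: return how many bytes of TEXT go on the next
   display line of WIDTH columns.  Prefer to break just after the last
   character from BREAKAT that still fits; fall back to a hard break
   at WIDTH when no such character exists.  Assumes ASCII, i.e. one
   byte per column.  */
static size_t
wrap_point (const char *text, size_t width, const char *breakat)
{
  assert (width > 0);

  size_t len = strlen (text);
  if (len <= width)
    return len;                 /* The rest fits; no break needed.  */

  /* Scan backwards from the last column that fits.  */
  for (size_t i = width; i > 0; i--)
    if (strchr (breakat, text[i - 1]) != NULL)
      return i;                 /* Break after text[i - 1].  */

  return width;                 /* No break character: hard break.  */
}
```

With only " " as the break set, a commented URL can be broken solely
after the comment sign; with VIM_BREAKAT the same call can also break
after a `/' or `-' further along the line.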

(For instance, when wrapping long URLs or paths in comments:

|;;                                                     |
|https://very.long.url/that-will-not-fit-on-a-single-lin|
|e-anyway-but-could-at-least-start-on-the-same-line-as-t|
|he-comment-sign-and-break-at-slightly-more-logical-plac|
|es                                                     |

looks (IMO at least!) less aesthetically pleasing than:

|;; https://very.long.url/that-will-not-fit-on-a-single-|
|line-anyway-but-could-at-least-start-on-the-same-line- |
|as-the-comment-sign-and-break-at-slightly-more-logical-|
|places                                                 |

where `|' is the margin. The same sometimes holds for excessively
long variable names. I definitely wouldn't impose this preference on
others, but I assume that some might share it.)

Using vim's choice helps avoid bike-shedding.

> We already import several UCD files, see admin/unidata, where you will
> also find copyright.html from the Unicode Consortium.

Great! That's convenient.

> test/manual is okay.

Thanks!

> This should probably go into simple.el.

I'll move it there.

Thanks for the help!