From: Adam Tack <adam.tack.513@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 13399@debbugs.gnu.org
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Wed, 13 Dec 2017 04:00:56 +0000
Message-ID: <CAA+VxxGE8ROEFUSoCXZwNAq2CKM5EHzs33FonjGbHSq=DgD9hw@mail.gmail.com>
In-Reply-To: <838te7swci.fsf@gnu.org>

Sorry for not working further on this; I haven't had the time.  I will
get back to finishing it soon.

> Hmm... not sure why you arrived at this conclusion.  E.g., what's
> wrong with the implementation at the bottom of this message?

This was very similar to my first try.  Unfortunately, it doesn't work
correctly in whitespace-mode, even with just normal spaces, regressing
on Bug#11341.

(with-current-buffer (get-buffer-create "*bar*")
  (dotimes (i 1000)
    (insert "1234 ")) ; Space
  (setq word-wrap t)
  (whitespace-mode)
  (display-buffer "*bar*"))

The spaces are displayed as `·', so it->c is 183 (U+00B7); none of the
further tests are checked, and IT_DISPLAYING_WHITESPACE returns false.
(In the currently used implementation, if it->c is not one of ' ' or '\t',
then the later tests are all still checked.)

I thought about changing the order of the tests to something like the
following (ignoring the special case of ' ' and '\t', here, for
brevity):

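static inline bool
IT_DISPLAYING_WHITESPACE (struct it *it)
{
  int c;

  /* Prefer the underlying buffer character, so that a display
     substitution (such as whitespace-mode's `·') cannot hide it.  */
  if (IT_BYTEPOS (*it) < ZV_BYTE)
    c = FETCH_CHAR (IT_BYTEPOS (*it));
  /* Otherwise, use the character about to be displayed.  */
  else if (it->what == IT_CHARACTER)
    c = it->c;
  /* Otherwise, look in the Lisp string being iterated over.  */
  else if (STRINGP (it->string))
    c = STRING_CHAR (SDATA (it->string) + IT_STRING_BYTEPOS (*it));
  /* Otherwise, look in the C string being iterated over.  */
  else if (it->s)
    c = STRING_CHAR (it->s + IT_BYTEPOS (*it));
  /* Fall-through: no character is available.  */
  else
    return false;

  return !NILP (CHAR_TABLE_REF (Vword_wrap_chars, c));
}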

which in the case of whitespace-mode does TRT, but I worried that
there might be situations where wrapping on the display character
is correct.  The crux (as I had previously, but very unclearly,
written) is that under "normal" circumstances, both
`(it->what == IT_CHARACTER)' and `(IT_BYTEPOS (*it) < ZV_BYTE)'
are true.
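
As a minimal sketch of why the order of the tests matters, the
following self-contained C models the iterator with a hypothetical
`struct toy_it' (not Emacs's real `struct it') and replaces the
Vword_wrap_chars lookup with a hard-coded `toy_wrap_char_p'.  All of
these names are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Toy stand-in for the display iterator; NOT Emacs's real struct it.  */
struct toy_it
{
  bool have_display_char; /* models it->what == IT_CHARACTER */
  int display_char;       /* models it->c, after display substitution */
  bool in_buffer;         /* models IT_BYTEPOS (*it) < ZV_BYTE */
  int buffer_char;        /* models FETCH_CHAR (IT_BYTEPOS (*it)) */
};

/* Toy stand-in for the Vword_wrap_chars lookup (hard-coded set).  */
static bool
toy_wrap_char_p (int c)
{
  return c == ' ' || c == '\t' || c == 0x200B;
}

/* Test the displayed character first; the buffer is never consulted
   when a displayed character is available.  */
static bool
display_first (const struct toy_it *it)
{
  if (it->have_display_char)
    return toy_wrap_char_p (it->display_char);
  if (it->in_buffer)
    return toy_wrap_char_p (it->buffer_char);
  return false;
}

/* Test the underlying buffer character first.  */
static bool
buffer_first (const struct toy_it *it)
{
  if (it->in_buffer)
    return toy_wrap_char_p (it->buffer_char);
  if (it->have_display_char)
    return toy_wrap_char_p (it->display_char);
  return false;
}
```

For a buffer space that whitespace-mode shows as `·', i.e. the
initialiser `{ true, 183, true, ' ' }', display_first returns false
and buffer_first returns true.  Since both conditions hold at once,
whichever test comes first decides the result.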

Additionally, I wasn't sure whether there should be a fall-through:
on the one hand, it prevents Emacs from crashing if (weirdly) all the
previous tests return false, but on the other, it might preclude some
magic compiler optimisation.

Chaining ORs side-stepped both issues, so I settled on keeping that
form, though it might have been the wrong decision.
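
A rough sketch of the chained-OR shape, again on a hypothetical toy
model rather than the real iterator (`struct toy_it', its fields and
`toy_wrap_char_p' are all assumed names): every available source is
tested in turn, so a non-matching displayed character does not stop
the later tests, and no explicit fall-through branch is needed when
no source is available.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the display iterator; NOT Emacs's real struct it.  */
struct toy_it
{
  bool have_display_char; /* models it->what == IT_CHARACTER */
  int display_char;       /* models it->c */
  bool in_buffer;         /* models IT_BYTEPOS (*it) < ZV_BYTE */
  int buffer_char;        /* models the character in the buffer */
  const char *s;          /* models it->s: a C string, or NULL */
  size_t s_pos;           /* current position within S */
};

/* Toy stand-in for the Vword_wrap_chars lookup (hard-coded set).  */
static bool
toy_wrap_char_p (int c)
{
  return c == ' ' || c == '\t' || c == 0x200B;
}

/* Each clause guards its own source; the result is true if any
   available source holds a wrap character, and false otherwise.  */
static bool
chained_or (const struct toy_it *it)
{
  return ((it->have_display_char && toy_wrap_char_p (it->display_char))
          || (it->s && toy_wrap_char_p ((unsigned char) it->s[it->s_pos]))
          || (it->in_buffer && toy_wrap_char_p (it->buffer_char)));
}
```

With `{ true, 183, true, ' ', NULL, 0 }' the first clause fails but
the last one succeeds, so the result is true; with every field zero,
all clauses fail and the result is false.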

> > ii) vim's breakat characters (default " ^I!@*-+;:,./?"), since
> > presumably they had given it some thought,

> Maybe.  I'm not sure in what modes this would be TRT.

It should almost certainly not be the default in any mode, but it
might, perhaps, be a useful, pre-defined option for some users.  (For
instance, when wrapping long URLs or paths in comments:

|;;                                                     |
|https://very.long.url/that-will-not-fit-on-a-single-lin|
|e-anyway-but-could-at-least-start-on-the-same-line-as-t|
|he-comment-sign-and-break-at-slightly-more-logical-plac|
|es                                                     |

looks (IMO at least!) less aesthetically pleasing than:

|;; https://very.long.url/that-will-not-fit-on-a-single-|
|line-anyway-but-could-at-least-start-on-the-same-line- |
|as-the-comment-sign-and-break-at-slightly-more-logical-|
|places                                                 |

where `|' is the margin.

The same sometimes holds for excessively long variable names.  I
definitely wouldn't impose this preference on others, but I assume
that some might share it.)  Using vim's choice helps avoid
bike-shedding.
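
To make the effect of a larger break set concrete, here is a small,
self-contained sketch in C.  It is not how Emacs or vim implement
wrapping; `toy_line_length' and `toy_count_lines' are hypothetical
helpers that assume one column per character and break just after
the last break character that fits on the line, falling back to a
hard break at the margin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the first display line of TEXT in a window
   WIDTH columns wide: break just after the last character within the
   first WIDTH characters that occurs in BREAKAT or, if there is none,
   exactly at WIDTH.  */
static size_t
toy_line_length (const char *text, size_t width, const char *breakat)
{
  assert (width > 0);
  size_t len = strlen (text);
  if (len <= width)
    return len;
  for (size_t i = width; i > 0; i--)
    if (strchr (breakat, text[i - 1]) != NULL)
      return i;
  return width;
}

/* Count the display lines needed to show all of TEXT.  */
static size_t
toy_count_lines (const char *text, size_t width, const char *breakat)
{
  size_t lines = 0;
  while (*text != '\0')
    {
      text += toy_line_length (text, width, breakat);
      lines++;
    }
  return lines;
}
```

With a 20-column window and the text ";; https://very.long.url/...",
a break set of " \t" leaves only ";; " on the first line, whereas
vim's default set, written in C as " \t!@*-+;:,./?", lets the first
line run up to the `.' after "very".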

> We already import several UCD files, see admin/unidata, where you will
> also find copyright.html from the Unicode Consortium.

Great! That's convenient.

> test/manual is okay.

Thanks!

> This should probably go into simple.el.

I'll move it there.


Thanks for the help!





