From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu <casouri@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Line wrap reconsidered
Date: Tue, 26 May 2020 18:29:04 -0400
Message-ID: <9766BA3D-B8F9-456B-9F59-60D21B86E390@gmail.com>
References: <92FF4412-04FB-4521-B6CE-52B08526E4E5@gmail.com> <878shfsq35.fsf@gnus.org> <83imgivjak.fsf@gnu.org> <83lfletr03.fsf@gnu.org> <4895C6EE-5E1F-44BF-93C1-CC5F7C096F73@gmail.com>
In-Reply-To: <4895C6EE-5E1F-44BF-93C1-CC5F7C096F73@gmail.com>
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\))
Content-Type: multipart/alternative; boundary="Apple-Mail=_1A4E3F0F-9772-4C7F-9237-27AFA8191D11"
To: Eli Zaretskii <eliz@gnu.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel@gnu.org
Xref: news.gmane.io gmane.emacs.devel:251479

--Apple-Mail=_1A4E3F0F-9772-4C7F-9237-27AFA8191D11
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

> On May 26, 2020, at 4:31 PM, Yuan Fu <casouri@gmail.com> wrote:
>
>> On May 26, 2020, at 3:50 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>>> From: Yuan Fu <casouri@gmail.com>
>>> Date: Tue, 26 May 2020 13:34:01 -0400
>>> Cc: larsi@gnus.org,
>>> emacs-devel@gnu.org
>>>
>>> I fixed the problems and it now works. If you apply the patch below, load kinsoku.el, open test.txt, and run M-x toggle-word-wrap, you should see the text properly wrapped: wrapping happens between CJK characters and at whitespace, but not between ASCII characters. Also, according to the kinsoku rules, a CJK comma will not be placed at the beginning of a line, CJK “《” will not be placed at the end of a line, etc.
>>>
>>> It determines whether we can wrap before/after a character by looking at the “<”, “>” and “|” categories, roughly corresponding to “don’t wrap before”, “don’t wrap after” and “wrap before and after”.
>>
>> Thanks.
>>
>> This still doesn't support strings, only buffer text.
>>
>> Also, why are you putting a text property, instead of just examining
>> the category as part of IT_CAN_WRAP? What do you need the property
>> for?
>>
>
> I don’t really know which way is better or more efficient, so I just picked one to implement. Plus, a text property might allow some user customization. I can change it to use only the category table.

Here is the version that doesn’t use text properties. I assume that by strings you mean display properties? I checked with a display property and it wraps fine in this version.

Yuan

--Apple-Mail=_1A4E3F0F-9772-4C7F-9237-27AFA8191D11
Content-Type: multipart/mixed; boundary="Apple-Mail=_5EB46C26-93CE-4719-8910-698ACB1AEF26"

--Apple-Mail=_5EB46C26-93CE-4719-8910-698ACB1AEF26
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=utf-8

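As a rough illustration of the rule described in this message (not the patch itself), here is a standalone C sketch that models the three categories as bit flags and hard-codes a tiny category table. The names toy_categories, can_wrap_before, can_wrap_after, can_break_between and wrap_rule_examples are made up for the example, and the choice of which characters carry which flags is an assumption; Emacs would read the real categories from its category table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three categories named in the message.  */
enum
{
  CAT_NOT_AT_BOL = 1 << 0,     /* must not start a line */
  CAT_NOT_AT_EOL = 1 << 1,     /* must not end a line */
  CAT_LINE_BREAKABLE = 1 << 2  /* '|': a line may be broken at this char */
};

/* Toy category lookup (an assumption for this sketch only).  */
static unsigned
toy_categories (uint32_t c)
{
  if (c == 0x3001 || c == 0x3002 || c == 0xFF0C)  /* CJK comma, full stop */
    return CAT_LINE_BREAKABLE | CAT_NOT_AT_BOL;
  if (c == 0x300A || c == 0x300C)                 /* opening brackets */
    return CAT_LINE_BREAKABLE | CAT_NOT_AT_EOL;
  if (c >= 0x4E00 && c <= 0x9FFF)                 /* CJK ideographs */
    return CAT_LINE_BREAKABLE;
  return 0;                                       /* ASCII letters etc. */
}

static bool
is_space (uint32_t c)
{
  return c == ' ' || c == '\t';
}

/* A line may start with C unless C is whitespace or is barred from
   the beginning of a line.  */
static bool
can_wrap_before (uint32_t c)
{
  return !(toy_categories (c) & CAT_NOT_AT_BOL) && !is_space (c);
}

/* A line may end with C if C is whitespace, or if C is breakable and
   not barred from the end of a line.  */
static bool
can_wrap_after (uint32_t c)
{
  unsigned cat = toy_categories (c);
  return ((cat & CAT_LINE_BREAKABLE) && !(cat & CAT_NOT_AT_EOL))
         || is_space (c);
}

/* A break between PREV and CUR needs both sides to agree.  */
static bool
can_break_between (uint32_t prev, uint32_t cur)
{
  return can_wrap_after (prev) && can_wrap_before (cur);
}

/* A few spot checks of the rule.  */
static void
wrap_rule_examples (void)
{
  assert (!can_break_between ('a', 'b'));       /* inside an ASCII word */
  assert (can_break_between (' ', 'a'));        /* after whitespace */
  assert (can_break_between (0x4E2D, 0x6587));  /* between two ideographs */
  assert (!can_break_between (0x4E2D, 0xFF0C)); /* comma can't start a line */
  assert (!can_break_between (0x300A, 0x4E2D)); /* bracket can't end a line */
}
```

The point of keeping two separate predicates is that a break is a property of a pair of characters: the character before the break has to allow wrapping after it, and the character after the break has to allow wrapping before it.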


--Apple-Mail=_5EB46C26-93CE-4719-8910-698ACB1AEF26
Content-Disposition: attachment; filename=new-wrap.patch
Content-Type: application/octet-stream; x-unix-mode=0644; name="new-wrap.patch"
Content-Transfer-Encoding: 7bit

diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..105b6b175a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -447,6 +447,7 @@ Copyright (C) 1985-1988, 1993-1995, 1997-2020 Free Software Foundation,
 #include "termchar.h"
 #include "dispextern.h"
 #include "character.h"
+#include "category.h"
 #include "buffer.h"
 #include "charset.h"
 #include "indent.h"
@@ -508,6 +509,37 @@ #define IT_DISPLAYING_WHITESPACE(it) \
            && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' ' \
                || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t'))))
 
+/* These are the category sets we use. */
+#define NOT_AT_EOL 60 /* < */
+#define NOT_AT_BOL 62 /* > */
+#define LINE_BREAKABLE 124 /* | */
+
+/* Return true if the current character allows wrapping before it. */
+static bool char_can_wrap_before (struct it *it)
+{
+  /* We used to only check for whitespace for wrapping, hence this
+     macro. You cannot wrap before a whitespace. */
+  return ((it->what == IT_CHARACTER
+           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_BOL))
+          /* There used to be */
+          && !IT_DISPLAYING_WHITESPACE (it));
+}
+
+/* Return true if the current character allows wrapping after it. */
+static bool char_can_wrap_after (struct it *it)
+{
+  return ((it->what == IT_CHARACTER
+           && CHAR_HAS_CATEGORY (it->c, LINE_BREAKABLE)
+           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_EOL))
+          /* We used to only check for whitespace for wrapping, hence
+             this macro. Obviously you can wrap after a space. */
+          || IT_DISPLAYING_WHITESPACE (it));
+}
+
+#undef NOT_AT_EOL
+#undef NOT_AT_BOL
+#undef LINE_BREAKABLE
+
 /* If all the conditions needed to print the fill column indicator are
    met, return the (nonnegative) column number, else return a negative
    value. */
@@ -9185,13 +9217,13 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
         {
           if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
             {
-              if (IT_DISPLAYING_WHITESPACE (it))
-                may_wrap = true;
-              else if (may_wrap)
+              /* Can we wrap here? */
+              if (may_wrap && char_can_wrap_before(it))
                 {
                   /* We have reached a glyph that follows one or more
-                     whitespace characters. If the position is
-                     already found, we are done. */
+                     whitespace characters (or a character that allows
+                     wrapping after it). If the position is already
+                     found, we are done. */
                   if (atpos_it.sp >= 0)
                     {
                       RESTORE_IT (it, &atpos_it, atpos_data);
@@ -9206,8 +9238,14 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
                     }
                   /* Otherwise, we can wrap here. */
                   SAVE_IT (wrap_it, *it, wrap_data);
-                  may_wrap = false;
                 }
+              /* This has to run after the previous block. */
+              if (char_can_wrap_after (it))
+                /* may_wrap basically means "previous char allows
+                   wrapping after it". */
+                may_wrap = true;
+              else
+                may_wrap = false;
             }
         }
 
@@ -9335,10 +9373,10 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
                             {
                               bool can_wrap = true;
 
-                              /* If we are at a whitespace character
-                                 that barely fits on this screen line,
-                                 but the next character is also
-                                 whitespace, we cannot wrap here. */
+                              /* If the previous character says we can
+                                 wrap after it, but the current
+                                 character says we can't wrap before
+                                 it, then we can't wrap here. */
                               if (it->line_wrap == WORD_WRAP
                                   && wrap_it.sp >= 0
                                   && may_wrap
@@ -9350,7 +9388,7 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
                                   SAVE_IT (tem_it, *it, tem_data);
                                   set_iterator_to_next (it, true);
                                   if (get_next_display_element (it)
-                                      && IT_DISPLAYING_WHITESPACE (it))
+                                      && !char_can_wrap_before(it))
                                     can_wrap = false;
                                   RESTORE_IT (it, &tem_it, tem_data);
                                 }
@@ -9429,19 +9467,18 @@ #define IT_RESET_X_ASCENT_DESCENT(IT) \
                   else
                     IT_RESET_X_ASCENT_DESCENT (it);
 
-                  /* If the screen line ends with whitespace, and we
-                     are under word-wrap, don't use wrap_it: it is no
-                     longer relevant, but we won't have an opportunity
-                     to update it, since we are done with this screen
-                     line. */
+                  /* If the screen line ends with whitespace (or
+                     wrap-able character), and we are under word-wrap,
+                     don't use wrap_it: it is no longer relevant, but
+                     we won't have an opportunity to update it, since
+                     we are done with this screen line. */
                   if (may_wrap && IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
                       /* If the character after the one which set the
-                         may_wrap flag is also whitespace, we can't
-                         wrap here, since the screen line cannot be
-                         wrapped in the middle of whitespace.
-                         Therefore, wrap_it _is_ relevant in that
-                         case. */
-                      && !(moved_forward && IT_DISPLAYING_WHITESPACE (it)))
+                         may_wrap flag says we can't wrap before it,
+                         we can't wrap here. Therefore, wrap_it
+                         (previously found wrap-point) _is_ relevant
+                         in that case. */
+                      && !(moved_forward && !char_can_wrap_before(it)))
                     {
                       /* If we've found TO_X, go back there, as we now
                          know the last word fits on this screen line. */
@@ -23292,9 +23329,8 @@ #define RECORD_MAX_MIN_POS(IT) \
 
           if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
             {
-              if (IT_DISPLAYING_WHITESPACE (it))
-                may_wrap = true;
-              else if (may_wrap)
+              /* Can we wrap here? */
+              if (may_wrap && char_can_wrap_before(it))
                 {
                   SAVE_IT (wrap_it, *it, wrap_data);
                   wrap_x = x;
@@ -23308,9 +23344,13 @@ #define RECORD_MAX_MIN_POS(IT) \
                   wrap_row_min_bpos = min_bpos;
                   wrap_row_max_pos = max_pos;
                   wrap_row_max_bpos = max_bpos;
-                  may_wrap = false;
                 }
-            }
+              /* This has to run after the previous block. */
+              if (char_can_wrap_after (it))
+                may_wrap = true;
+              else
+                may_wrap = false;
+            }
         }
 
           PRODUCE_GLYPHS (it);
@@ -23433,14 +23473,18 @@ #define RECORD_MAX_MIN_POS(IT) \
                       /* If line-wrap is on, check if a previous
                          wrap point was found. */
                       if (!IT_OVERFLOW_NEWLINE_INTO_FRINGE (it)
-                          && wrap_row_used > 0
+                          && wrap_row_used > 0 /* Found. */
                           /* Even if there is a previous wrap
                              point, continue the line here as
                              usual, if (i) the previous character
-                             was a space or tab AND (ii) the
-                             current character is not. */
-                          && (!may_wrap
-                              || IT_DISPLAYING_WHITESPACE (it)))
+                             allows wrapping after it, AND (ii)
+                             the current character allows wrapping
+                             before it. Because this is a valid
+                             break point, we can just continue to
+                             the next line at here, there is no
+                             need to wrap early at the previous
+                             wrap point. */
+                          && (!may_wrap || !char_can_wrap_before(it)))
                         goto back_to_wrap;
 
                       /* Record the maximum and minimum buffer
@@ -23468,13 +23512,16 @@ #define RECORD_MAX_MIN_POS(IT) \
                       /* If line-wrap is on, check if a previous
                          wrap point was found. */
                       else if (wrap_row_used > 0
-                               /* Even if there is a previous wrap
-                                  point, continue the line here as
-                                  usual, if (i) the previous character
-                                  was a space or tab AND (ii) the
-                                  current character is not. */
-                               && (!may_wrap
-                                   || IT_DISPLAYING_WHITESPACE (it)))
+                               /* Even if there is a previous
+                                  wrap point, continue the
+                                  line here as usual, if (i)
+                                  the previous character was a
+                                  space or tab AND (ii) the
+                                  current character is not,
+                                  AND (iii) the current
+                                  character allows wrapping
+                                  before it. */
+                               && (!may_wrap || !char_can_wrap_before(it)))
                         goto back_to_wrap;
 
                     }
@@ -34349,6 +34396,10 @@ syms_of_xdisp (void)
   DEFSYM (QCfile, ":file");
   DEFSYM (Qfontified, "fontified");
   DEFSYM (Qfontification_functions, "fontification-functions");
+  DEFSYM (Qword_wrap, "word-wrap");
+  DEFSYM (Qonly_before, "only-before");
+  DEFSYM (Qonly_after, "only-after");
+  DEFSYM (Qno_wrap, "no-wrap");
 
   /* Name of the symbol which disables Lisp evaluation in 'display'
      properties. This is used by enriched.el. */

--Apple-Mail=_5EB46C26-93CE-4719-8910-698ACB1AEF26
Content-Transfer-Encoding: 7bit
Content-Type: text/html; charset=us-ascii
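The patch tracks a may_wrap flag meaning "the previous character allows wrapping after it" and records a wrap point whenever that flag is set and the current character allows wrapping before it. The C sketch below is one way to picture that bookkeeping outside of xdisp.c. It is a simplification under stated assumptions: every character is one column wide, the predicates are reduced to whitespace and CJK ideographs, and all names (sk_wrap_before_ok, sk_wrap_after_ok, find_wrap_index) are hypothetical rather than taken from Emacs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool
sk_is_space (uint32_t c)
{
  return c == ' ' || c == '\t';
}

static bool
sk_is_cjk (uint32_t c)
{
  return c >= 0x4E00 && c <= 0x9FFF;
}

/* Simplified predicates: no kinsoku exceptions in this sketch.  */
static bool
sk_wrap_before_ok (uint32_t c)
{
  return !sk_is_space (c);
}

static bool
sk_wrap_after_ok (uint32_t c)
{
  return sk_is_space (c) || sk_is_cjk (c);
}

/* Return the index at which the next screen line should start when
   TEXT (LEN characters) is laid out on a line of WIDTH columns, one
   column per character.  Return LEN if everything fits.  */
static size_t
find_wrap_index (const uint32_t *text, size_t len, size_t width)
{
  bool may_wrap = false;   /* previous char allows wrapping after it */
  bool have_wrap = false;  /* a wrap point has been recorded */
  size_t wrap_pos = 0;

  assert (width > 0);

  for (size_t i = 0; i < len; i++)
    {
      /* Step 1: is there a break opportunity just before text[i]?  */
      if (may_wrap && sk_wrap_before_ok (text[i]))
        {
          wrap_pos = i;
          have_wrap = true;
        }

      /* Step 2: text[i] does not fit; use the last wrap point, or
         break right here if none was found.  */
      if (i >= width)
        return have_wrap ? wrap_pos : i;

      /* Step 3: must run after step 1, because step 1 needs the
         flag as set by the previous character.  */
      may_wrap = sk_wrap_after_ok (text[i]);
    }
  return len;
}
```

With "hello world" and a width of 8, for instance, the recorded wrap point is the "w", so the second line starts there instead of splitting "world". A run of ideographs can break at the exact column where the line overflows, because every position inside the run is a valid wrap point.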

--Apple-Mail=_5EB46C26-93CE-4719-8910-698ACB1AEF26--

--Apple-Mail=_1A4E3F0F-9772-4C7F-9237-27AFA8191D11--