From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail From: Yuan Fu Newsgroups: gmane.emacs.devel Subject: Re: Line wrap reconsidered Date: Thu, 28 May 2020 13:31:37 -0400 Message-ID: References: <92FF4412-04FB-4521-B6CE-52B08526E4E5@gmail.com> <878shfsq35.fsf@gnus.org> <83imgivjak.fsf@gnu.org> <83lfletr03.fsf@gnu.org> <4895C6EE-5E1F-44BF-93C1-CC5F7C096F73@gmail.com> <9766BA3D-B8F9-456B-9F59-60D21B86E390@gmail.com> <83sgfls2ul.fsf@gnu.org> Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202"; logging-data="35411"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Lars Ingebrigtsen , emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu May 28 19:32:44 2020 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jeMOZ-00094f-NB for ged-emacs-devel@m.gmane-mx.org; Thu, 28 May 2020 19:32:43 +0200 Original-Received: from localhost ([::1]:56772 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jeMOY-0002ny-Nb for ged-emacs-devel@m.gmane-mx.org; Thu, 28 May 2020 13:32:42 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:52066) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jeMNe-0001gz-Ep for emacs-devel@gnu.org; Thu, 28 May 2020 13:31:46 -0400 Original-Received: from mail-qt1-x832.google.com ([2607:f8b0:4864:20::832]:43096) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jeMNc-0006We-MW; Thu, 28 May 2020 13:31:46 -0400 Original-Received: by mail-qt1-x832.google.com with SMTP id j32so695067qte.10; Thu, 28 May 2020 10:31:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=lrwSRWG5UOZFNxDLidEvjfnzgHgKYnXRyt+VuGbBeV4=; b=oc3xj88VFMAzdhOjJBUz5t51x6qiHkEF3awn6NxyCzavohnND/QzCu9b0qKZwkDFwE 8oZCkdzSkKzHsmkkR2g8GfxO2xXttHLlTk/LdcxsxR/MHWd7VVtYnAg2XAQnMOuuAxob xH/vDyfObOfxhss5wv9o1tYgUvjri1wWZ52L7N4O5+iPYiqaxA11voP0MxpcYK1If2HE GFPGg4mdQXWjKScBn1dpXT92D+Pp9VVIz2mrCggA/fdXgoF1vIHeKnX/9OZLTX2umjdm DwD8ZCOYoirJtmm/hzPR9d3fRLjzjct2FRW5ExiBIqB9oGy28otz0pGMFBz/iMfHBfGO vDrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=lrwSRWG5UOZFNxDLidEvjfnzgHgKYnXRyt+VuGbBeV4=; b=YEuiFhwcJ9yFhVmHm+WyGkSWQztHKjg01TyGoxF0lnM6oydHoP6rlUmacxbmvkiehs MfdQxvjEri2kFZQIcRk8QpWnkRZIxa78iG6aJpToV6AO8xEvwG6ZkyLxCQRMKZPc9N+l bmwaZy/jBKao/utGYdoiJ/CfohzX8tTpz8ACwnbAqEUacCdwopW6QC6/m0+a9tP5wU9z dLy8Ns/aAr2rDIDcXTrXza/FmcnBJQ0NOc0qNhtCv6Pjf8SPiLxuswDkHQ3rhG4vm1zO fIpYZSQT+v55k74wgAMCV2CMqtzE7853HzfGGiuAC5+9mBiVfd6px/ICQsG501MSr64w hpRw== X-Gm-Message-State: AOAM533bfCmuRaawutcaklasGj9ftQummFgp6OY3iXcMYw3JKHNqCugJ i8yzD9/EaePUpBu9/pKSLzKiOfZ8iMePJg== X-Google-Smtp-Source: ABdhPJwlVffBHNgDTd4VmFEBLANsrrt63qspiep6xwYjSTRg2cXdxjPV7f6N5U7nUPPP8pL4aHxtww== X-Received: by 2002:ac8:17f8:: with SMTP id r53mr4171778qtk.351.1590687098674; Thu, 28 May 2020 10:31:38 -0700 (PDT) Original-Received: from [192.168.1.10] (c-174-60-229-153.hsd1.pa.comcast.net. [174.60.229.153]) by smtp.gmail.com with ESMTPSA id u129sm5538014qkb.51.2020.05.28.10.31.37 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 May 2020 10:31:38 -0700 (PDT) In-Reply-To: <83sgfls2ul.fsf@gnu.org> X-Mailer: Apple Mail (2.3608.80.23.2.2) Received-SPF: pass client-ip=2607:f8b0:4864:20::832; envelope-from=casouri@gmail.com; helo=mail-qt1-x832.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:251563 Archived-At: > On May 27, 2020, at 1:29 PM, Eli Zaretskii wrote: >=20 >> From: Yuan Fu >> Date: Tue, 26 May 2020 18:29:04 -0400 >> Cc: Lars Ingebrigtsen , >> emacs-devel@gnu.org >>=20 >> Here is the version that doesn=E2=80=99t use text properties. >=20 > Thanks, few comments below. Thank you for reviewing. >=20 >> I assume by string you mean display properties? I checked with = display property and it wraps fine in this version. >=20 > Display properties whose values are strings, and also before-string > and after-string overlay properties. >=20 >> +static bool char_can_wrap_before (struct it *it) >> +{ >> + /* We used to only check for whitespace for wrapping, hence this >> + macro. You cannot wrap before a whitespace. */ >> + return ((it->what =3D=3D IT_CHARACTER >> + && !CHAR_HAS_CATEGORY(it->c, NOT_AT_BOL)) >> + /* There used to be */ >> + && !IT_DISPLAYING_WHITESPACE (it)); >> +} >=20 > The order here is wrong: the IT_DISPLAYING_WHITESPACE should be tested > first, as that is the more frequent situation, so it should be > processed faster. >=20 Fixed. Before answering your questions, this is my understanding of the word = wrapping in redisplay: On every iteration we check if current character allow wrapping after = it, if so, we set may_wrap to true. That basically means the _previous = char_ allows wrapping after it. While we are at the same iteration, we = may want to wrapping point (wrap_it) if 1) the previous char allows = wrapping after it (from may_wrap=E2=80=99s value set by _previous = iteration_) and 2) the current character allows wrapping before it. When = we found ourselves at the end of a line, we have two choices: continue = to the next line (which I assume is what =E2=80=9Ccontinue=E2=80=9D = means in the comment), or instead go to a previously saved wrap point = and break the line there. If there is no previous wrap point, we = continue, if there is a previous wrap point but we actually can wrap at = this point (previous char can wrap after & this char can wrap before), = we just continue, since this position is a valid wrap position. = Otherwise we go back to previous wrap point and wrap there (goto = back_to_wrap;). Although the original code only has one checker (IS_WHITESPACE), it = serves a dual purpose: it is used to determine if we can wrap = after=E2=80=94whitespace and tab allow wrapping after; it is also used = to determine if we can wrap before=E2=80=94they don=E2=80=99t allow = wrapping before (otherwise you see whitespace and tabs on the beginning = of the next line). Needless to say, I=E2=80=99m a newbie in Emacs C internals and = redisplay, so my understanding from reading the original code and = comments could be wrong. But the code seems to work right so I think the = truth isn=E2=80=99t far away. >> +/* Return true if the current character allows wrapping after it. = */ >> +static bool char_can_wrap_after (struct it *it) >> +{ >> + return ((it->what =3D=3D IT_CHARACTER >> + && CHAR_HAS_CATEGORY (it->c, LINE_BREAKABLE) >> + && !CHAR_HAS_CATEGORY(it->c, NOT_AT_EOL)) >> + /* We used to only check for whitespace for wrapping, = hence >> + this macro. Obviously you can wrap after a space. */ >> + || IT_DISPLAYING_WHITESPACE (it)); >> +} >=20 > Do we really need two separate functions? And note that each one > calls IT_DISPLAYING_WHITESPACE, so in some situations you will be > testing that twice for the same character -- because you have 2 > separate functions. IT_DISPLAYING_WHITESPACE would run twice, too: one time to check for = warp after and one time for wrap before. >=20 >> - if (IT_DISPLAYING_WHITESPACE (it)) >> - may_wrap =3D true; >> - else if (may_wrap) >> + /* Can we wrap here? */ >> + if (may_wrap && char_can_wrap_before(it)) >> { >> /* We have reached a glyph that follows one or more >> - whitespace characters. If the position is >> - already found, we are done. */ >> + whitespace characters (or a character that allows >> + wrapping after it). If the position is already >> + found, we are done. */ >=20 > The code calls char_can_wrap_before, but the comment says we can wrap > after it. Which one is right? The comment says this char _follows_ a char that allows wrapping after, = we still need to check if _this_ char allows wrapping before. >=20 >> + /* This has to run after the previous block. */ >=20 > This kind of comment begs the question: "why?" Please rewrite the > comment to answer that question up front. Fixed. Hopefully it=E2=80=99s clear now. >=20 >> - /* If we are at a whitespace character >> - that barely fits on this screen line, >> - but the next character is also >> - whitespace, we cannot wrap here. */ >> + /* If the previous character says we can >> + wrap after it, but the current >> + character says we can't wrap before >> + it, then we can't wrap here. */ >=20 > It sounds like your Emacs is set up to use only spaces for indentation > in C source files, whereas our convention is to use tabs and spaces. >=20 I=E2=80=99m not sure what you mean, do you mean the indent style for the = new code, or are you talking about the word wrapping? >> + DEFSYM (Qword_wrap, "word-wrap"); >> + DEFSYM (Qonly_before, "only-before"); >> + DEFSYM (Qonly_after, "only-after"); >> + DEFSYM (Qno_wrap, "no-wrap"); >=20 > These symbols are not used in the code. Removed. >=20 > And finally, one more general comment/question: isn't your code assume > implicitly that buffer text is always processed in the logical order, > i.e. in the increasing order of buffer positions? I mean, the fact > that you have the "before" and the "after" function seems to imply > that you do assume that, and the logic of processing the categories is > relying on that, expecting that when you see a wrap_after character, > you can wrap on the next character. Is this so? because if it is, > then this will break when processing RTL text, since we may process it > in the reverse order of buffer positions. Please look into these > situations and see that the code does TRT in them. I tested with some Alaric text made by google translate, and the = wrapping seems not take effect. IIUC bidi.c reorders the line to RTL and = the redisplay iterator will still go through them LTR, is that true? I = wonder how does the original wrapping works in that case. Does the old = code handle bidi text? Yuan