From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pip Cet
Newsgroups: gmane.emacs.bugs
Subject: bug#41506: 28.0.50; RTL problem
Date: Sat, 06 Jun 2020 13:05:43 +0000
Message-ID: <871rmsz6mw.fsf@gmail.com>
References: <838shhxuff.fsf@gnu.org> <83tuztctpk.fsf@gnu.org>
 <83ftbdcmm3.fsf@gnu.org> <87k10kzkv3.fsf@gmail.com>
 <833678a8xx.fsf@gnu.org>
In-Reply-To: <833678a8xx.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 06
 Jun 2020 11:35:06 +0300")
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: 41506@debbugs.gnu.org
To: Eli Zaretskii

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Eli Zaretskii writes:

>> From: Pip Cet
>> Cc: 41506@debbugs.gnu.org
>> Date: Sat, 06 Jun 2020 07:58:24 +0000
>>
>> when we're called with bidi_it->first_elt = true, it's possible we
>> shouldn't touch bidi_it->new_paragraph at all...
>
> Can you elaborate on why you think that?

Sorry, I shouldn't have said "touch" there. I meant "set", though I no
longer think so.

> first_elt can be set when we are at the beginning of a paragraph or
> when we are in the middle of it, so its meaning is different from that
> of new_paragraph.

Indeed.

>> +     paragraph might start. But don't do that for the first
>> +     element since this function will be called twice in that
>> +     case. */
>
> Which code causes the two calls, and why is that significant in this
> case?

Maybe this code would be clearer:

      if (!bidi_it->first_elt)
        {
          bytepos++;
          pos++;
        }

We always look at the paragraph containing the next character to be
loaded by bidi_level_of_next_char. If first_elt is set, that is the
current character; otherwise, it's the one after that.

In the "\n\nש" case, this happens:

1. bidi_paragraph_init is called with first_elt = 1 at buffer position 1
2. new_paragraph is cleared to false
3. bidi_at_paragraph_end is called for buffer position 2. That looks
   like a line ending a paragraph, though it's actually a line starting
   the next paragraph. Still, it returns true.
4. new_paragraph is set again
5. bidi_paragraph_init is called with first_elt = 0 at buffer position 1

So everything happens to work in this case, even though several of the
assumptions in the bidi code are violated. The code is written to assume
paragraphs contain at least two characters: that assumption means it's
valid for bidi_paragraph_init to clear new_paragraph. In this case, it's
not, but the next line we're looking at, while not actually ending a
paragraph, looks like it is...

What I'm not sure about is "\n \nש". It could be either a single
two-line paragraph followed by ש, or a single-character paragraph
followed by another paragraph whose first line happens to contain only a
space character; in the first case, paragraph orientation would default
to L2R, in the second case, it would be R2L. Do you happen to know what
Unicode says for this case?
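
To make the position choice above concrete, here is a minimal standalone
sketch. It is a toy model, not code from src/bidi.c: the names
toy_probe_pos and toy_line_looks_blank are hypothetical, positions are
0-based indices into a plain byte string (not Emacs buffer positions), and
the "only spaces and tabs" separator rule is an assumption made purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of this is Emacs code.  Positions are 0-based
   indices into a NUL-terminated byte string.  */

/* Index of the character whose paragraph matters: the current one if
   FIRST_ELT is set, otherwise the one after it.  */
size_t
toy_probe_pos (size_t pos, bool first_elt)
{
  return first_elt ? pos : pos + 1;
}

/* Assumed separator rule (hypothetical): a line made up only of spaces
   and tabs, possibly empty, looks like a paragraph separator.
   LINE_START must be the index of the first byte of a line in TEXT.  */
bool
toy_line_looks_blank (const char *text, size_t line_start)
{
  assert (text != NULL);
  size_t i = line_start;
  /* Skip horizontal whitespace at the start of the line.  */
  while (text[i] == ' ' || text[i] == '\t')
    i++;
  /* Blank if nothing else precedes the end of the line or string.  */
  return text[i] == '\n' || text[i] == '\0';
}
```

Under these assumptions, for the string "\n\n\xd7\xa9" the lines starting
at indices 0 and 1 both look blank and the one at index 2 does not, which
is roughly the situation in step 3. For "\n \n\xd7\xa9" the line starting
at index 1 also looks blank under the toy rule, which is the ambiguity I
am asking about.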

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline;
 filename=0001-Handle-buffers-containing-two-newlines-followed-by-a.patch

>From c5232df875d62ead326d5e90f122ab9ac9798e59 Mon Sep 17 00:00:00 2001
From: Pip Cet
Date: Sat, 6 Jun 2020 13:02:55 +0000
Subject: [PATCH] Handle buffers containing two newlines followed by an RTL
 char

* src/bidi.c (bidi_paragraph_init): Correct handling of initial
newlines. (Bug#41506)
---
 src/bidi.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/bidi.c b/src/bidi.c
index 1017bd2d52..8aa325fe6d 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -1714,8 +1714,12 @@ bidi_paragraph_init (bidi_dir_t dir, struct bidi_it *bidi_it, bool no_default_p)
       s = (STRINGP (bidi_it->string.lstring)
            ? SDATA (bidi_it->string.lstring)
            : bidi_it->string.s);
-      if (bytepos > begbyte
-          && bidi_char_at_pos (bytepos, s, bidi_it->string.unibyte) == '\n')
+      /* We always look at the paragraph containing the next character
+         to be loaded by bidi_level_of_next_char.
+
+         This code happens to work for a buffer containing two
+         newlines followed by an RTL character (Bug#41506). */
+      if (!bidi_it->first_elt)
         {
           bytepos++;
           pos++;
-- 
2.27.0.rc0

--=-=-=--