From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Mauro Aranda
Newsgroups: gmane.emacs.bugs
Subject: bug#35202: 27.0.50; Info-quoted false positives and false negatives
Date: Wed, 10 Apr 2019 21:19:54 -0300
References: <83sgur7lbv.fsf@gnu.org> <837ec36nra.fsf@gnu.org>
In-Reply-To: <837ec36nra.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="000000000000fc5eb205863625e9"
Cc: 35202@debbugs.gnu.org
To: Eli Zaretskii
X-GNU-PR-Message: followup 35202
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:157474

--000000000000fc5eb205863625e9
Content-Type: multipart/alternative; boundary="000000000000fc5eb005863625e7"

--000000000000fc5eb005863625e7
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Hello.

I'll explain briefly how I implemented the test:
I created a list of the Info files that get built when building Emacs, and
for each one, I called (info info-filename) to visit it [1].  Then I searched
the whole file for the old regexp, storing the positions in a list, each
element being a list of the form ((match-beginning 1) (1- (match-end 1))).
I then did the same for the new regexp.  For the change of regexp to take
effect, I added a function to Info-mode-hook that basically does this:

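;; Replace the regexp of the first keyword with the one being tested.
(setcar (car Info-mode-font-lock-keywords) current-re)
;; Make font-lock in this Info buffer pick up the modified keywords.
(setq-local font-lock-defaults '(Info-mode-font-lock-keywords t t))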

Finally, I compared both lists (namely old-matches and new-matches) with
cl-set-exclusive-or, sorted the result (for easier comparison) and wrote a
file, .mismatches-filename, for each Info file.
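
A minimal sketch of the collection and comparison steps described above.
It is not the exact code used for the test: the my- names are
hypothetical, the two regexps are the ones from the attached patch, and
visiting the Info files and writing the mismatch files are left out.

```elisp
(require 'cl-lib)

(defconst my-old-re "‘\\([^’]*\\)’"
  "Regexp currently used for the `Info-quoted' face.")

(defconst my-new-re "‘\\([‘’]\\|[^‘’]*\\)’"
  "Regexp proposed in the attached patch.")

(defun my-collect-matches (regexp)
  "Return a list of (BEG END) pairs for group 1 of REGEXP in the current buffer.
END is the position of the last character of the group."
  (let (matches)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (push (list (match-beginning 1) (1- (match-end 1))) matches)))
    (nreverse matches)))

(defun my-mismatches (old-re new-re)
  "Return the sorted (BEG END) pairs found by only one of OLD-RE and NEW-RE."
  (sort (cl-set-exclusive-or (my-collect-matches old-re)
                             (my-collect-matches new-re)
                             :test #'equal)
        (lambda (a b) (< (car a) (car b)))))

;; Usage on a made-up buffer; the second quote contains only a ’.
(with-temp-buffer
  (insert "Type ‘C-x’ or ‘’’ here")
  (my-mismatches my-old-re my-new-re))
;; ⇒ ((16 15) (16 16))
```

The :test #'equal argument matters here, because the pairs are freshly
consed lists and would never be eql to each other.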


Now for the results:
The files that presented mismatches are the following:
emacs (as expected, hence the bug report), calc, idlwave, mh-e, org,
sc.

To navigate to these positions for examination, I recommend widening the
Info buffer and then using goto-char.

* Emacs:
1)
Old match: (92506 92541)
New match: (92536 92541)

2)
Old match: (92823 92860)
New match: (92856 92860)

The new matches are correct, and they are achieved by the second
alternative of the regexp I proposed.
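
To illustrate the kind of mismatch seen in 1) and 2), here is a small
sketch on a made-up string (the actual text of the manual is not
reproduced here, and my-group-1 is a hypothetical helper).  The old
regexp starts at a stray ‘ and runs up to the next ’, while the second
alternative of the new regexp, [^‘’]*, cannot cross another ‘, so the
match starts at the opening quote that actually pairs with the ’:

```elisp
(defun my-group-1 (regexp string)
  "Return the text of group 1 of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 1 string)))

(let ((text "a lone ‘ and then ‘foo’"))
  (list (my-group-1 "‘\\([^’]*\\)’" text)             ; old regexp
        (my-group-1 "‘\\([‘’]\\|[^‘’]*\\)’" text)))   ; new regexp
;; ⇒ (" and then ‘foo" "foo")
```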

3)
Old match: (183951 183977)
New match: (183952 183977)

This is a little odd, since it is a quote inside a quote.  The new
regexp matches the inner quote, while the old regexp also quotes the ‘
starting the inner quote.  There's no big difference, IMO.

4)
Old match: (313527 313526)
New match: (313527 313527)

5)
Old match: (313905 313904)
New match: (313905 313905)

4) and 5) are the same case.  This was part of the original bug report, so as
expected, the new regexp handles this case correctly.

6)
Old match: (652524 652542)
New match: (652536 652542)

Similar to 1) and 2).
7)
Old match: (767119 767124)
New match: (767123 767124)

This one is tricky.  It is a quote that contains ‘ and ’, but it is not
a nested quote.  Tweaking the regexp to match nested quotes would do the
right thing here, but only by sheer luck.

8)
Old match: (768216 768225)
New match: (768219 768225)

See 1) and 2).
* Calc:
9)
Old match: (493087 493098)
New match: (493088 493098)

This is odd, and it might be the calc.texi file that is wrong (I'm not
sure, but the "`" in calc.texi looks suspicious).  Still, the new
behavior doesn't break the display of this one, IMO.

10) Something extra I noticed in the Appendix E Calc Summary.
Both regexps fail at (1386635 1404639).  I found this a hard one, and I
can't think of a way to solve it.

* Idlwave:
11)
Old match: (93451 93514)
New match: (93496 93514)

Both regexps are wrong in this table.  This is similar to 10).  Not an
easy one to solve, but the new regexp at least behaves a little better
in the line with (‘idlwave-find-module’), IMO.

* MH-E:
12) This one is a group of similar mismatches:
Old ones:
(168432 168456)
(168585 168611)
(168755 168774)

New ones:
(168456 168456)
(168611 168611)
(168774 168774)

The old regexp quotes inconsistently, while the proposed one quotes only
the ‘+’, ‘-’ and the ‘r’.  I think this could be solved by tweaking the
proposed regexp to match the outer quote of a nested quote.
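
As a rough sketch of such a tweak (it is not part of the attached patch
and has not been run against the Info manuals), a variant could allow
one level of nested quotes inside the group.  The regexp my-nested-re,
the helper my-all-group-1 and the sample text are all hypothetical:

```elisp
(defconst my-nested-re "‘\\(\\(?:[^‘’]\\|‘[^‘’]*’\\)*\\)’"
  "Hypothetical regexp: a quote whose body may contain ‘...’ one level deep.")

(defun my-all-group-1 (regexp string)
  "Return the text of group 1 for every match of REGEXP in STRING."
  (let ((start 0) result)
    (while (string-match regexp string start)
      (push (match-string 1 string) result)
      (setq start (match-end 0)))
    (nreverse result)))

(let ((text "‘use ‘+’ or ‘-’ here’"))
  (list (my-all-group-1 "‘\\([^’]*\\)’" text)            ; old regexp
        (my-all-group-1 "‘\\([‘’]\\|[^‘’]*\\)’" text)    ; proposed regexp
        (my-all-group-1 my-nested-re text)))             ; hypothetical tweak
;; ⇒ (("use ‘+" "-") ("+" "-") ("use ‘+’ or ‘-’ here"))
```

This variant handles a single level of nesting only, and unlike the
proposed regexp it does not treat a quoted ‘ or ’ specially, so it
would have to be combined with the first alternative to cover 4) and 5).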

* Org:
13) Go to the table at 685320.  The problem with the mismatches is
similar to 11), and both regexps get it wrong.

* SC:
14)
Old matches:
(9549 9550)
(9768 9769)

New matches:
(9550 9550)
(9769 9769)

I'm not sure if the double quoting of > (as in ‘‘>’’) is intended.  I
don't think so, but I can't be sure.  Still, the new regexp behaves
better, by quoting only the >, while the current one is inconsistent and
looks odd.
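
The difference on the doubled quote can be checked in isolation with a
short sketch; the string below is just the ‘‘>’’ sequence on its own,
not the surrounding text of the sc manual:

```elisp
(let ((text "‘‘>’’")
      (old-re "‘\\([^’]*\\)’")
      (new-re "‘\\([‘’]\\|[^‘’]*\\)’"))
  ;; Collect the text of group 1 for the first match of each regexp.
  (mapcar (lambda (re)
            (and (string-match re text)
                 (match-string 1 text)))
          (list old-re new-re)))
;; ⇒ ("‘>" ">")
```

The old regexp includes the second ‘ in the highlighted text, which
corresponds to the one-character difference in the start positions
reported above.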


To sum it up:
* I'm not sure whether the tables in Idlwave and Org could be changed.  If so,
the problems with these files will go away.

* If 9) and 14) can be solved by modifying the .texi files, then either
regexp will do.

* The regexp could be tweaked to match outer quotes when quotes are
nested.  This is necessary to do the right thing in the mh-e file, for
example.

* Overall, I think it is an improvement.  It doesn't break display, it is more
accurate, and wherever it fails, the current regexp fails too.  But of course,
I'm biased, since I'm the one proposing it.

[1] Files I checked:
ada-mode, auth, autotype, bovine, calc, ccmode, cl, dbus, dired-x,
ebrowse, ede, ediff, edt, efaq, efaq-w32, eieio, elisp, eintr, emacs,
emacs-gnutls, emacs-mime, epa, erc, ert, eshell, eudc, eww, flymake,
forms, gnus, htmlfontify, idlwave, ido, info, mairix-el, message, mh-e,
newsticker, nxml-mode, octave-mode, org, pcl-cvs, pgg, rcirc, reftex,
remember, sasl, sc, semantic, ses, sieve, smtpmail, speedbar, srecode,
todo-mode, tramp, url, vhdl-mode, vip, viper, widget, wisent, woman.

For extra points, I checked some external files I happen to have
installed:
libc, bison, wget.

--000000000000fc5eb005863625e7--

--000000000000fc5eb205863625e9
Content-Type: text/x-patch; charset="UTF-8"; name="0001-Avoid-false-positives-and-false-negatives-of-Info-qu.patch"
Content-Disposition: attachment; filename="0001-Avoid-false-positives-and-false-negatives-of-Info-qu.patch"
Content-Transfer-Encoding: 8bit
X-Attachment-Id: f_jubw1cf20

From fe255c115680339fcfd59000cfc867abcf6995bd Mon Sep 17 00:00:00 2001
From: Mauro Aranda
Date: Mon, 8 Apr 2019 20:24:32 -0300
Subject: [PATCH] Avoid false positives and false negatives of Info-quoted face

* lisp/info.el (Info-mode-font-lock-keywords): Modify the regexp, for
matching single quotes of opening single quote and closing single
quote, and avoid matching text followed by a curly quote when it is
not quoting. (Bug#35202)
---
 lisp/info.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/info.el b/lisp/info.el
index f3b413a..9a8c82e 100644
--- a/lisp/info.el
+++ b/lisp/info.el
@@ -4268,8 +4268,9 @@ Info-quoted
 ;; We deliberately fontify only ‘..’ quoting, and not `..', because
 ;; the former can be done much more reliably, i.e. without risking
 ;; false positives.
+;; FIXME: It doesn't handle nested quotes.
 (defvar Info-mode-font-lock-keywords
-  '(("‘\\([^’]*\\)’" (1 'Info-quoted))))
+  '(("‘\\([‘’]\\|[^‘’]*\\)’" (1 'Info-quoted))))
 
 ;; Autoload cookie needed by desktop.el
 ;;;###autoload
-- 
2.7.4

--000000000000fc5eb205863625e9--