From mboxrd@z Thu Jan 1 00:00:00 1970
From: Grégory Mounié
Newsgroups: gmane.emacs.bugs
Subject: bug#27978: Detection of section name in man.el
Date: Sun, 6 Aug 2017 01:44:19 +0200
Message-ID: <490651f5-e3f7-fd6d-e008-5c52d78fa675@imag.fr>
To: 27978@debbugs.gnu.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------D3D8D93421EA207A219E3ED7"
Content-Language: en-US
X-GNU-PR-Message: report 27978
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:135457

This is a multi-part message in MIME format.
--------------D3D8D93421EA207A219E3ED7
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

When man.el parses manual pages written in languages with non-ASCII
letters, section names that contain such letters are not added to the
table of contents.

I noticed the bug while reading the French bash manual: the quite useful
"COMMANDES INTERNES DE l'INTERPRÉTEUR" section (SHELL BUILTIN COMMANDS in
the English manual) does not appear, because of the letter É.

I propose using character classes instead of ASCII ranges in the
relevant regexp defvars. This should change nothing for English manuals
and should work for many other languages. It works well for the bash
manual in French.

Grégory Mounié

--------------D3D8D93421EA207A219E3ED7
Content-Type: text/x-patch;
 name="0001-Unicode-support-for-man-section-name-detection.patch"
Content-Disposition: attachment;
 filename*0="0001-Unicode-support-for-man-section-name-detection.patch"
Content-Transfer-Encoding: 8bit

>From f9f8b027bcec6fe7aec2c0009eecdcd7e8880292 Mon Sep 17 00:00:00 2001
From: Grégory Mounié
Date: Sun, 6 Aug 2017 01:22:58 +0200
Subject: [PATCH] Unicode support for man section name detection

* lisp/man.el (Man-name-regexp, Man-section-regexp)
(Man-page-header-regexp, Man-heading-regexp): Replace ASCII ranges
with character classes so that section names containing non-ASCII
letters are detected and listed in the table of contents (e.g., in
the French version of the bash manual).

Copyright-paperwork-exempt: yes
---
 lisp/man.el | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/man.el b/lisp/man.el
index 0e1c92956b..97a4758e7e 100644
--- a/lisp/man.el
+++ b/lisp/man.el
@@ -278,21 +278,21 @@ Man-cooked-hook
   :type 'hook
   :group 'man)
 
-(defvar Man-name-regexp "[-a-zA-Z0-9_­+][-a-zA-Z0-9_.:­+]*"
+(defvar Man-name-regexp "[-[:alnum:]_­+][-[:alnum:]_.:­+]*"
   "Regular expression describing the name of a manpage (without section).")
 
-(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
+(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
   "Regular expression describing a manpage section within parentheses.")
 
 (defvar Man-page-header-regexp
   (if (string-match "-solaris2\\." system-configuration)
-      (concat "^[-A-Za-z0-9_].*[ \t]\\(" Man-name-regexp
+      (concat "^[-[:alnum:]_].*[ \t]\\(" Man-name-regexp
               "(\\(" Man-section-regexp "\\))\\)$")
     (concat "^[ \t]*\\(" Man-name-regexp
             "(\\(" Man-section-regexp "\\))\\).*\\1"))
   "Regular expression describing the heading of a page.")
 
-(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
+(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
   "Regular expression describing a manpage heading entry.")
 
 (defvar Man-see-also-regexp "SEE ALSO"
-- 
2.13.3

--------------D3D8D93421EA207A219E3ED7--