From: Stephen Berman
Newsgroups: gmane.emacs.bugs
Subject: bug#23647: 25.1.50; In man pages, links on hyphenated words don't work
Date: Mon, 30 May 2016 15:55:47 +0200
Message-ID: <87a8j7tzto.fsf@gmx.net>
References: <87d1o52ntu.fsf@gmx.net> <83eg8lx6wq.fsf@gnu.org> <878tys31i6.fsf@gmx.net> <83vb1wwg0t.fsf@gnu.org>
To: Eli Zaretskii
Cc: 23647@debbugs.gnu.org

On Mon, 30 May 2016 03:22:58 +0300 Eli Zaretskii wrote:

>> From: Stephen Berman
>> Cc: 23647@debbugs.gnu.org
>> Date: Mon, 30 May 2016 01:09:21 +0200
>>
>> > Is it only the ASCII hyphen/minus, or could there be other characters
>> > (e.g., if Groff/troff are invoked with some exotic -Tfoo switch)?
>>
>> That possibility didn't occur to me but according to Wikipedia, groff
>> also outputs soft hyphens (octal 255) and indeed I see that the function
>> Man-build-references-alist, which also removes hyphenation (in a more
>> complicated way that doesn't seem to be needed in the present case),
>> also takes the soft hyphen into account. That can be done here too by
>> changing the above string-match regexp to "[-­]". If someone knows of
>> other possibilities allowed by [gt]roff, maybe the regexp could be
>> further extended, or the condition reformulated as required. What do
>> you think?
>
> I'm not enough of a roff expert to tell, but how about asking on the
> Groff list?

I did that and got this feedback from Steffen Nurpmeso:

> I have been convinced that soft hyphen is a control character and
> not something visual, it should be used as a «break-indicator»
> rather than as a hyphenation character, interpretation of which is
> left as an excercise for the processing software. I have no idea
> still but would guess groff uses "hyphen minus" U+002D or hyphen
> U+2010 if Unicode is possible.

In a followup to another response he added:

> For display purposes however i think U+00AD can't be used
> directly, but will be replaced by the renderer to either nothing,
> if no wrap is to be applied at the character position, or
> something appropriate, like ASCII hyphen-minus or some extended
> Unicode "Pd" letter, of which there are some (e.g., U+058A
> ARMENIAN HYPHEN, U+1400 CANADIAN SYLLABICS HYPHEN, and more).

And he also made this suggestion:

> Eli Zaretskii is so active on the
> Unicode list, why don't you use the Pd character class for
> detecting «hyphen»? I guess this should cover all such things
> already as of today, thanks to Werner Lemberg?!

So how should we proceed from here? We could add U+2010 to the regexp
in my patch, which would then be this: "[-‐­]" (hyphen-minus (= ASCII
45), hyphen (U+2010), soft hyphen (U+00AD) -- it seems harmless to
retain the latter, given that man.el already uses it elsewhere), but if
these are all included in the Unicode Pd character class along with
other possible hyphen characters, maybe a different approach is
required. I know nothing about the Pd character class and how to detect
it with Elisp; I also don't know if doing that would lead to further
changes in man.el, making this a larger undertaking. What do you
suggest?

Steve Berman
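
For reference, here is a minimal sketch of which of the characters mentioned above fall into the Unicode Pd (dash punctuation) general category. It is written in Python rather than Elisp, only because the standard unicodedata module makes the lookup easy to check; it is not code from man.el or from the patch, and the helper names is_dash_punctuation and is_hyphen_like are hypothetical.

```python
import unicodedata

# Characters named in this thread (labels are informal, for printing only).
CANDIDATES = {
    "hyphen-minus": "\u002d",
    "soft hyphen": "\u00ad",
    "hyphen": "\u2010",
    "Armenian hyphen": "\u058a",
    "Canadian syllabics hyphen": "\u1400",
}


def is_dash_punctuation(ch: str) -> bool:
    """Return True if CH has Unicode general category Pd."""
    return unicodedata.category(ch) == "Pd"


def is_hyphen_like(ch: str) -> bool:
    """Hypothetical helper: any Pd character, plus the soft hyphen.

    The soft hyphen is listed separately because its general
    category is Cf (format), so a Pd test alone does not match it.
    """
    return is_dash_punctuation(ch) or ch == "\u00ad"


if __name__ == "__main__":
    for name, ch in CANDIDATES.items():
        category = unicodedata.category(ch)
        print(f"U+{ord(ch):04X} {name}: {category}")
```

One point this illustrates: the soft hyphen (U+00AD) has general category Cf, not Pd, so a test based on Pd alone would still need U+00AD added explicitly if man.el is to keep handling it.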