unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stephen Berman <stephen.berman@gmx.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 23647@debbugs.gnu.org
Subject: bug#23647: 25.1.50; In man pages, links on hyphenated words don't work
Date: Mon, 30 May 2016 15:55:47 +0200
Message-ID: <87a8j7tzto.fsf@gmx.net>
In-Reply-To: <83vb1wwg0t.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 30 May 2016 03:22:58 +0300")

On Mon, 30 May 2016 03:22:58 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: 23647@debbugs.gnu.org
>> Date: Mon, 30 May 2016 01:09:21 +0200
>> 
>> > Is it only the ASCII hyphen/minus, or could there be other characters
>> > (e.g., if Groff/troff are invoked with some exotic -Tfoo switch)?
>> 
>> That possibility didn't occur to me but according to Wikipedia, groff
>> also outputs soft hyphens (octal 255) and indeed I see that the function
>> Man-build-references-alist, which also removes hyphenation (in a more
>> complicated way that doesn't seem to be needed in the present case),
>> also takes the soft hyphen into account.  That can be done here too by
>> changing the above string-match regexp to "[-­]".  If someone knows of
>> other possibilities allowed by [gt]roff, maybe the regexp could be
>> further extended, or the condition reformulated as required.  What do
>> you think?
>
> I'm not enough of a roff expert to tell, but how about asking on the
> Groff list?

I did that and got this feedback from Steffen Nurpmeso:

> I have been convinced that soft hyphen is a control character and
> not something visual, it should be used as a «break-indicator»
> rather than as a hyphenation character, interpretation of which is
> left as an exercise for the processing software.  I have no idea
> still but would guess groff uses "hyphen minus" U+002D or hyphen
> U+2010 if Unicode is possible.

In a followup to another response he added:

> For display purposes however i think U+00AD can't be used
> directly, but will be replaced by the renderer to either nothing,
> if no wrap is to be applied at the character position, or
> something appropriate, like ASCII hyphen-minus or some extended
> Unicode "Pd" letter, of which there are some (e.g., U+058A
> ARMENIAN HYPHEN, U+1400 CANADIAN SYLLABICS HYPHEN, and more).

And he also made this suggestion:

> Eli Zaretskii is so active on the
> Unicode list, why don't you use the Pd character class for
> detecting «hyphen»?  I guess this should cover all such things
> already as of today, thanks to Werner Lemberg?!
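
For reference, here is a minimal sketch of what a Pd-based test might look
like in Elisp.  It assumes Emacs's built-in `get-char-code-property' with
the `general-category' property; the helper name
`my-char-dash-punctuation-p' is hypothetical and not part of man.el.  One
caveat: U+00AD SOFT HYPHEN has general category Cf (format), not Pd, so a
Pd test alone would not cover it.

```elisp
;; Hypothetical helper, not part of man.el.
(defun my-char-dash-punctuation-p (char)
  "Return non-nil if CHAR has Unicode general category Pd (dash punctuation)."
  (eq (get-char-code-property char 'general-category) 'Pd))

(my-char-dash-punctuation-p ?-)      ; HYPHEN-MINUS, U+002D
(my-char-dash-punctuation-p #x2010)  ; HYPHEN
(my-char-dash-punctuation-p #x058A)  ; ARMENIAN HYPHEN
(my-char-dash-punctuation-p #x00AD)  ; SOFT HYPHEN: category Cf, not Pd
```

If the soft hyphen is to be kept, it would presumably have to be tested
for separately from the Pd check.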

So how should we proceed from here?  We could add U+2010 to the regexp
in my patch, which would then be this: "[-‐­]" (hyphen-minus (ASCII 45),
hyphen (U+2010), soft hyphen (U+00AD) -- it seems harmless to retain the
latter, given that man.el already uses it elsewhere), but if these are
all included in the Unicode Pd character class along with other possible
hyphen characters, maybe a different approach is required.  I know
nothing about the Pd character class and how to detect it with Elisp; I
also don't know if doing that would lead to further changes in man.el,
making this a larger undertaking.  What do you suggest?
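
To make the three-character variant concrete, here is a sketch of the
extended regexp written with explicit escapes, so that the soft hyphen is
visible in the source.  The names `my-man-hyphen-regexp' and
`my-man-contains-hyphen-p' are hypothetical and only illustrate the
test; they are not the code from the patch.

```elisp
;; Hypothetical names; the character set mirrors the regexp quoted above.
(defconst my-man-hyphen-regexp "[-\u2010\u00AD]"
  "Match hyphen-minus (U+002D), hyphen (U+2010), or soft hyphen (U+00AD).")

(defun my-man-contains-hyphen-p (string)
  "Return non-nil if STRING contains one of the three hyphen characters."
  (string-match-p my-man-hyphen-regexp string))

(my-man-contains-hyphen-p "git-send-email")    ; contains hyphen-minus
(my-man-contains-hyphen-p "hyphen\u2010ated")  ; contains U+2010
(my-man-contains-hyphen-p "plain")             ; no hyphen character
```

Placing the hyphen-minus first inside the brackets keeps it literal rather
than forming a range.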

Steve Berman




