unofficial mirror of emacs-devel@gnu.org 
* Non-ASCII characters in man pages produced by groff 1.23
@ 2023-10-29 16:13 Jens Schmidt
  2023-10-29 16:54 ` Visuwesh
From: Jens Schmidt @ 2023-10-29 16:13 UTC (permalink / raw)
  To: emacs-devel

In analogy to Eli's recent Texinfo 7.1 thread ...


Since groff 1.23 (used in Debian testing, for example) we have:

o The an (man) and doc (mdoc) macro packages no longer remap the -, ',
  and ` input characters to Basic Latin code points on UTF-8 devices,
  but treat them as groff normally does (and AT&T troff before it did)
  for typesetting devices, where they become the hyphen, apostrophe or
  right single quotation mark, and left single quotation mark,
  respectively.  This change is expected to expose glyph usage errors in
  man pages.  See the "PROBLEMS" file for a recipe that will conceal
  these errors.  A better long-term approach is for man pages to adopt
  correct input practices; the man pages groff_man_style(7),
  groff_char(7), and man-pages(7) (subsection "Generating optimal
  glyphs"; from the Linux man-pages project) contain such instructions.
  Doing so also improves man page typography when formatting for PDF.

See source

  https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n206

and also the PROBLEMS entry

  https://git.savannah.gnu.org/cgit/groff.git/tree/PROBLEMS?h=1.23.0#n84

It seems, however, that at least Debian plans to conceal that issue again
later:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052675#18


To clarify what the problem is: If, for example, you search for

  --compressed-ssh

with

  ?\N{HYPHEN-MINUS}?\N{HYPHEN-MINUS}compressed?\N{HYPHEN-MINUS}ssh

in *Man curl* on Debian testing, you won't find that option, because
curl's author hasn't yet properly quoted all minus characters in the
generated man page source.  As a result, they are rendered as
?\N{HYPHEN} in man's output and occur as such in the Man-mode buffer.
(Well, that concrete example above got fixed already, but others are
still left.)
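
To make the mismatch concrete, here is a minimal sketch in Python (chosen
only for illustration; this is not Emacs code).  The sample string is
made up and merely stands in for what man would emit for an unescaped
option name; it is not copied from curl's man page.

```python
# Illustration only: the rendered sample is invented for this sketch.
HYPHEN_MINUS = "\N{HYPHEN-MINUS}"  # U+002D, the character typed on the keyboard
HYPHEN = "\N{HYPHEN}"              # U+2010, the character found in the Man-mode buffer

# Stand-in for an unescaped "--compressed-ssh" after formatting.
rendered = f"{HYPHEN}{HYPHEN}compressed{HYPHEN}ssh"

# What the user types into the search prompt.
query = f"{HYPHEN_MINUS}{HYPHEN_MINUS}compressed{HYPHEN_MINUS}ssh"

# A literal search compares code points, so the lookalike characters do not match.
found = query in rendered
code_points = [f"U+{ord(c):04X}" for c in rendered[:2]]
print(found, code_points)
```

The two strings look the same in most fonts, which is why the failed
search is so surprising.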


I have been bitten by that already, and I am not the only one:

  https://lists.debian.org/debian-devel/2023/10/msg00083.html


So this is not an Emacs issue.  It might gradually get better as man
page authors fix their page sources, and it will probably go away for
Debian when Debian freezes trixie.  Which means this is just an FYI,
and nothing that requires any action ... or what do you think?




* Re: Non-ASCII characters in man pages produced by groff 1.23
  2023-10-29 16:13 Non-ASCII characters in man pages produced by groff 1.23 Jens Schmidt
@ 2023-10-29 16:54 ` Visuwesh
From: Visuwesh @ 2023-10-29 16:54 UTC (permalink / raw)
  To: Jens Schmidt; +Cc: emacs-devel

[Sunday, October 29, 2023] Jens Schmidt wrote:

> [...]
> To clarify what the problem is: If, for example, you search for
>
>   --compressed-ssh
>
> with
>
>   ?\N{HYPHEN-MINUS}?\N{HYPHEN-MINUS}compressed?\N{HYPHEN-MINUS}ssh
>
> in *Man curl* on Debian testing, you won't find that option, because
> curl's author hasn't yet properly quoted all minus characters in the
> generated man page source.  As a result, they are rendered as
> ?\N{HYPHEN} in man's output and occur as such in the Man-mode buffer.
> (Well, that concrete example above got fixed already, but others are
> still left.)
>
>
> I have been bitten by that already, and not only me:
>
>   https://lists.debian.org/debian-devel/2023/10/msg00083.html
>
>
> So this is not an Emacs issue, it might get gradually better as man
> page authors improve their text, and it probably will go away for
> Debian when Debian freezes trixie.  Which means this is just an FYI,
> and nothing which requires any action ... or what do you think?

Before I pulled the Debian changes that switched back to the ASCII
hyphen, I thought about making Man-softhyphen-to-minus also change HYPHEN
to HYPHEN-MINUS, to make the man page buffers searchable again.
A solution like char-fold that works in every search command would
be better (external tools included if possible, but that's a big
ask).
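
The buffer-side idea can be sketched as follows, in Python rather than
Emacs Lisp and purely as an illustration.  hyphen_to_minus is a
hypothetical helper, not the actual Man-softhyphen-to-minus, and the
sample text is invented.

```python
def hyphen_to_minus(text: str) -> str:
    """Replace every U+2010 HYPHEN with U+002D HYPHEN-MINUS."""
    return text.replace("\N{HYPHEN}", "\N{HYPHEN-MINUS}")


# Invented sample standing in for formatted man output.
sample = "\N{HYPHEN}\N{HYPHEN}compressed\N{HYPHEN}ssh"
normalized = hyphen_to_minus(sample)

# After normalization a plain ASCII search succeeds.
print("--compressed-ssh" in normalized)
```

The cost of this approach is that genuine hyphens in running text are
rewritten too, so the buffer no longer distinguishes them from minus
signs.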

As a first step, perhaps we can use char-fold-to-regexp in read-regexp?
For external tools, POSIX equivalence classes might work, but from what I
can tell from a cursory internet search, they are not reliable.
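
The search-side alternative leaves the text alone and widens the
pattern instead.  A minimal sketch of that folding idea, again in Python
and not a description of how char-fold.el is implemented:
fold_hyphens_to_regexp is a hypothetical name, and treating only these
two characters as equivalent is an assumption made for the example.

```python
import re

# Assumption: only HYPHEN-MINUS and HYPHEN are folded together here.
HYPHEN_LIKE = "\N{HYPHEN-MINUS}\N{HYPHEN}"


def fold_hyphens_to_regexp(query: str) -> str:
    """Build a regexp in which each hyphen-like character matches either variant."""
    char_class = "[" + re.escape(HYPHEN_LIKE) + "]"
    parts = []
    for ch in query:
        if ch in HYPHEN_LIKE:
            parts.append(char_class)
        else:
            # Escape everything else so the query is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = fold_hyphens_to_regexp("--compressed-ssh")

# Invented samples: one as typed, one as it might appear after formatting.
ascii_text = "--compressed-ssh"
rendered_text = "\N{HYPHEN}\N{HYPHEN}compressed\N{HYPHEN}ssh"

print(bool(re.search(pattern, ascii_text)), bool(re.search(pattern, rendered_text)))
```

Because the pattern is ordinary regexp syntax, the same string could in
principle be handed to an external tool that understands UTF-8, although
that has not been tried here.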


