Coding system to encode arguments to groff?

unofficial mirror of help-gnu-emacs@gnu.org

* Coding system to encode arguments to groff?
@ 2021-09-29  8:01 Tim Landscheidt
From: Tim Landscheidt @ 2021-09-29  8:01 UTC (permalink / raw)
  To: help-gnu-emacs

Hi,

I pass text arguments from Emacs Lisp to a groff command
with the "-d" option.  For ASCII strings, this is trivial;
for strings with umlauts, I need to use:

| (encode-coding-string variable-to-pass 'iso-latin-1)

For strings with other Unicode characters like "–" (#x2013),
I need to call groff's preconv like:

| (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))

which for "ä–ö" returns something like:

| \[u00E4]\[u2013]\[u00F6]

Now in Emacs, this looks very much like what a coding system
would do.  The info documentation for elisp just laconically
says:

|    How to define a coding system is an arcane matter, and is not
| documented here.

Has someone implemented such a coding system for groff so
that something like:

| (encode-coding-string variable-to-pass 'x-groff)

would do what is needed?

TIA,
Tim

* Re: Coding system to encode arguments to groff?
@ 2021-09-29 12:02 ` Eli Zaretskii
From: Eli Zaretskii @ 2021-09-29 12:02 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Tim Landscheidt <tim@tim-landscheidt.de>
> Date: Wed, 29 Sep 2021 08:01:12 +0000
> 
> I pass text arguments from Emacs Lisp to a groff command
> with the "-d" option.  For ASCII strings, this is trivial;
> for strings with umlauts, I need to use:
> 
> | (encode-coding-string variable-to-pass 'iso-latin-1)

What is your default locale's codeset on that system?  In general, if
the default locale matches the encoding you need to use, the above
should happen automagically.

> For strings with other Unicode characters like "–" (#x2013),
> I need to call groff's preconv like:
> 
> | (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))
> 
> which for "ä–ö" returns something like:
> 
> | \[u00E4]\[u2013]\[u00F6]

This is just the original "ä–ö" string, so I'm not quite sure what
the above accomplished.

> Now in Emacs, this looks very much like what a coding system
> would do.  The info documentation for elisp just laconically
> says:
> 
> |    How to define a coding system is an arcane matter, and is not
> | documented here.
> 
> Has someone implemented such a coding system for groff so
> that something like:
> 
> | (encode-coding-string variable-to-pass 'x-groff)

I don't think you should need a new coding-system.  But you didn't
explain why you need to explicitly encode the command-line arguments,
so it's hard to give accurate advice.  What kind of Groff command
needs this jumping through hoops from you?  E.g., why isn't it enough
to bind coding-system-for-write to whatever you need, around the call
to call-process or whatever?

IOW, please describe in more detail the Groff-related context in which
this problem happens, so that we could have an intelligent discussion
of the issues you might have.

* Re: Coding system to encode arguments to groff?
@ 2021-10-03 13:14   ` Tim Landscheidt
From: Tim Landscheidt @ 2021-10-03 13:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> wrote:

>> I pass text arguments from Emacs Lisp to a groff command
>> with the "-d" option.  For ASCII strings, this is trivial;
>> for strings with umlauts, I need to use:

>> | (encode-coding-string variable-to-pass 'iso-latin-1)

> What is your default locale's codeset on that system?  In general, if
> the default locale matches the encoding you need to use, the above
> should happen automagically.

If I understand your question correctly, UTF-8:

| [tim@vagabond ~]$ locale
| LANG=de_DE.UTF-8
| LC_CTYPE="de_DE.UTF-8"
| LC_NUMERIC="de_DE.UTF-8"
| LC_TIME="de_DE.UTF-8"
| LC_COLLATE="de_DE.UTF-8"
| LC_MONETARY="de_DE.UTF-8"
| LC_MESSAGES="de_DE.UTF-8"
| LC_PAPER="de_DE.UTF-8"
| LC_NAME="de_DE.UTF-8"
| LC_ADDRESS="de_DE.UTF-8"
| LC_TELEPHONE="de_DE.UTF-8"
| LC_MEASUREMENT="de_DE.UTF-8"
| LC_IDENTIFICATION="de_DE.UTF-8"
| LC_ALL=
| [tim@vagabond ~]$

>> For strings with other Unicode characters like "–" (#x2013),
>> I need to call groff's preconv like:

>> | (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))

>> which for "ä–ö" returns something like:

>> | \[u00E4]\[u2013]\[u00F6]

> This is just the original "ä–ö" string, so I'm not quite sure what
> the above accomplished.

The output is literal, i.e.:

| 0000000   \   [   u   0   0   E   4   ]   \   [   u   2   0   1   3   ]
| 0000020   \   [   u   0   0   F   6   ]  \n

>> Now in Emacs, this looks very much like what a coding system
>> would do.  The info documentation for elisp just laconically
>> says:

>> |    How to define a coding system is an arcane matter, and is not
>> | documented here.

>> Has someone implemented such a coding system for groff so
>> that something like:

>> | (encode-coding-string variable-to-pass 'x-groff)

> I don't think you should need a new coding-system.  But you didn't
> explain why you need to explicitly encode the command-line arguments,
> so it's hard to give accurate advice.  What kind of Groff command
> needs this jumping through hoops from you?  E.g., why isn't it enough
> to bind coding-system-for-write to whatever you need, around the call
> to call-process or whatever?

> IOW, please describe in more detail the Groff-related context in which
> this problem happens, so that we could have an intelligent discussion
> of the issues you might have.

On Fedora 34 with GNU groff 1.22.4:

| (let
|     ((temp-ps-buffer (generate-new-buffer "*test ps*"))
|      (test-arg "a-o"))
|   (with-temp-buffer
|     (insert ".fam H\n\\*[test-arg]\n")
|     (call-process-region
|      (point-min)
|      (point-max)
|      "groff"
|      nil
|      temp-ps-buffer
|      nil
|      "-Tps"
|      "-d" (concat "test-arg=" test-arg)))
|   (switch-to-buffer temp-ps-buffer)
|   (ps-mode)
|   (doc-view-mode))

produces a PostScript buffer with the text "a-o".

With test-arg = "ä-ö" (ä minus ö), it produces gibberish minus
gibberish.

With test-arg = (encode-coding-string "ä-ö" 'iso-latin-1) (ä
minus ö), it produces the text "ä-ö".

With test-arg = (encode-coding-string "ä–ö" 'iso-latin-1) (ä
endash ö), it produces the text "ä[white space]ö".

With test-arg =

| (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument "ä–ö") ")"))

(ä endash ö), it produces the intended text "ä–ö".

(Passing "-k" as an additional option to groff does not
change the output as "-k" only converts standard input, not
macro definitions set as command line arguments.)
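The two Latin-1 results above can be reproduced inside Emacs without running groff. A small sketch, assuming (as groff's documentation describes) that troff reads its input as Latin-1 unless preconv is involved: encode-coding-string turns ä and ö into one byte each, while U+2013 (en dash) has no Latin-1 code at all, so that route cannot carry it.

```elisp
;; ä and ö each become a single Latin-1 byte in a unibyte string.
(let ((bytes (encode-coding-string "ä-ö" 'iso-latin-1)))
  (list (multibyte-string-p bytes)   ; nil: the result is unibyte
        (append bytes nil)))         ; byte values: (228 45 246)

;; U+2013 EN DASH has no Latin-1 code, so this returns nil.
(encode-coding-char ?\u2013 'iso-latin-1)
```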

Tim

* Re: Coding system to encode arguments to groff?
@ 2021-10-03 15:14     ` Eli Zaretskii
From: Eli Zaretskii @ 2021-10-03 15:14 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Tim Landscheidt <tim@tim-landscheidt.de>
> Cc: help-gnu-emacs@gnu.org
> Date: Sun, 03 Oct 2021 13:14:04 +0000
> 
> | (let
> |     ((temp-ps-buffer (generate-new-buffer "*test ps*"))
> |      (test-arg "a-o"))
> |   (with-temp-buffer
> |     (insert ".fam H\n\\*[test-arg]\n")
> |     (call-process-region
> |      (point-min)
> |      (point-max)
> |      "groff"
> |      nil
> |      temp-ps-buffer
> |      nil
> |      "-Tps"
> |      "-d" (concat "test-arg=" test-arg)))
> |   (switch-to-buffer temp-ps-buffer)
> |   (ps-mode)
> |   (doc-view-mode))
> 
> produces a PostScript buffer with the text "a-o".
> 
> With test-arg = "ä-ö" (ä minus ö), it produces gibberish minus
> gibberish.
> 
> With test-arg = (encode-coding-string "ä-ö" 'iso-latin-1) (ä
> minus ö), it produces the text "ä-ö".
> 
> With test-arg = (encode-coding-string "ä–ö" 'iso-latin-1) (ä
> endash ö), it produces the text "ä[white space]ö".
> 
> With test-arg =
>
> | (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument "ä–ö") ")"))
>
> (ä endash ö), it produces the intended text "ä–ö".

So the problem is that troff doesn't accept non-ASCII command-line
arguments, and so you want to convert non-ASCII characters into a
series of characters encoded in the \[uNNNN] form, is that right?

Then I guess mapconcat is your friend, something like

  (mapconcat (lambda (ch)
               (format "\\[u%04X]" ch))
             "ä–ö" "")

There's no need to use preconv at all, as Emacs can do that by itself.
And this isn't an encoding, because codepoints are not encoded in any
sense of that word.
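A slightly fuller sketch of the same mapconcat idea, wrapped in a function. It makes two assumptions beyond the message above: ASCII characters pass through untouched (matching the preconv -r output earlier in the thread, which escaped only ä, – and ö), and each escape is written as \[uXXXX], the form preconv produced. The name my-groff-escape-non-ascii is hypothetical; it is not part of Emacs or groff.

```elisp
;; Hypothetical helper, not part of Emacs or groff: replace each
;; non-ASCII character in STRING with a groff \[uXXXX] escape,
;; leaving ASCII characters (backslashes included) untouched.
(defun my-groff-escape-non-ascii (string)
  "Return STRING with non-ASCII characters as groff Unicode escapes."
  (mapconcat (lambda (ch)
               (if (< ch 128)
                   (string ch)
                 ;; %04X: upper-case hex, zero-padded to at least 4 digits.
                 (format "\\[u%04X]" ch)))
             string ""))

;; Example:
;;   (my-groff-escape-non-ascii "ä–ö")
;;   => "\\[u00E4]\\[u2013]\\[u00F6]"   (printed in Lisp string syntax)
```

In Tim's test this would go into the last argument, e.g. "-d" (concat "test-arg=" (my-groff-escape-non-ascii test-arg)). Because ASCII is passed through as-is, any troff escapes already present in the string are still interpreted by troff.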