* Coding system to encode arguments to groff?
From: Tim Landscheidt @ 2021-09-29 8:01 UTC
To: help-gnu-emacs
Hi,
I pass text arguments from Emacs Lisp to a groff command
with the "-d" option. For ASCII strings, this is trivial;
for strings with umlauts, I need to use:
| (encode-coding-string variable-to-pass 'iso-latin-1)
For strings with other Unicode characters like "–" (#x2013),
I need to call groff's preconv like:
| (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))
which for "ä–ö" returns something like:
| \[u00E4]\[u2013]\[u00F6]
Now in Emacs, this looks very much like what a coding system
would do. The info documentation for elisp just laconically
says:
| How to define a coding system is an arcane matter, and is not
| documented here.
Has someone implemented such a coding system for groff so
that something like:
| (encode-coding-string variable-to-pass 'x-groff)
would do what is needed?
TIA,
Tim
* Re: Coding system to encode arguments to groff?
From: Eli Zaretskii @ 2021-09-29 12:02 UTC
To: help-gnu-emacs
> From: Tim Landscheidt <tim@tim-landscheidt.de>
> Date: Wed, 29 Sep 2021 08:01:12 +0000
>
> I pass text arguments from Emacs Lisp to a groff command
> with the "-d" option. For ASCII strings, this is trivial;
> for strings with umlauts, I need to use:
>
> | (encode-coding-string variable-to-pass 'iso-latin-1)
What is your default locale's codeset on that system? In general, if
the default locale matches the encoding you need to use, the above
should happen automagically.
> For strings with other Unicode characters like "–" (#x2013),
> I need to call groff's preconv like:
>
> | (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))
>
> which for "ä–ö" returns something like:
>
> | \[u00E4]\[u2013]\[u00F6]
This is just the original "ä–ö" string, so I'm not quite sure what the
above accomplished.
> Now in Emacs, this looks very much like what a coding system
> would do. The info documentation for elisp just laconically
> says:
>
> | How to define a coding system is an arcane matter, and is not
> | documented here.
>
> Has someone implemented such a coding system for groff so
> that something like:
>
> | (encode-coding-string variable-to-pass 'x-groff)
I don't think you should need a new coding-system. But you didn't
explain why you need to explicitly encode the command-line arguments,
so it's hard to give accurate advice. What kind of Groff command
needs this jumping through hoops from you? E.g., why isn't it enough
to bind coding-system-for-write to whatever you need, around the call
to call-process or whatever?
IOW, please describe in more detail the Groff-related context in which
this problem happens, so that we could have an intelligent discussion
of the issues you might have.
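
For illustration, binding coding-system-for-write around the call could look like the sketch below. It assumes Emacs encodes the command-line arguments with that coding system, and that groff reads "-d" values as ISO Latin-1, as the umlaut workaround quoted above suggests. The function name is hypothetical and the argument list is only an example.

```elisp
;; Hypothetical helper; the name and signature are illustrative only.
(defun my-groff-call-with-latin-1-args (output-buffer &rest args)
  "Run groff on the current buffer, sending output to OUTPUT-BUFFER.
ARGS are passed to groff as command-line arguments.  While the
process is started, `coding-system-for-write' is bound to
`iso-latin-1'; the assumption is that this controls how non-ASCII
characters in ARGS are encoded."
  (let ((coding-system-for-write 'iso-latin-1))
    ;; Note that this binding also governs how the buffer text sent
    ;; to groff's standard input is encoded.
    (apply #'call-process-region (point-min) (point-max)
           "groff" nil output-buffer nil args)))
```

Characters that ISO Latin-1 cannot represent, such as the en dash, would still not survive this; the binding only replaces the explicit encode-coding-string call.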
* Re: Coding system to encode arguments to groff?
From: Tim Landscheidt @ 2021-10-03 13:14 UTC
To: Eli Zaretskii; +Cc: help-gnu-emacs
Eli Zaretskii <eliz@gnu.org> wrote:
>> I pass text arguments from Emacs Lisp to a groff command
>> with the "-d" option. For ASCII strings, this is trivial;
>> for strings with umlauts, I need to use:
>> | (encode-coding-string variable-to-pass 'iso-latin-1)
> What is your default locale's codeset on that system? In general, if
> the default locale matches the encoding you need to use, the above
> should happen automagically.
If I understand your question correctly, UTF-8:
| [tim@vagabond ~]$ locale
| LANG=de_DE.UTF-8
| LC_CTYPE="de_DE.UTF-8"
| LC_NUMERIC="de_DE.UTF-8"
| LC_TIME="de_DE.UTF-8"
| LC_COLLATE="de_DE.UTF-8"
| LC_MONETARY="de_DE.UTF-8"
| LC_MESSAGES="de_DE.UTF-8"
| LC_PAPER="de_DE.UTF-8"
| LC_NAME="de_DE.UTF-8"
| LC_ADDRESS="de_DE.UTF-8"
| LC_TELEPHONE="de_DE.UTF-8"
| LC_MEASUREMENT="de_DE.UTF-8"
| LC_IDENTIFICATION="de_DE.UTF-8"
| LC_ALL=
| [tim@vagabond ~]$
>> For strings with other Unicode characters like "–" (#x2013),
>> I need to call groff's preconv like:
>> | (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))
>> which for "ä–ö" returns something like:
>> | \[u00E4]\[u2013]\[u00F6]
> This is just the original "ä–ö" string, so I'm not quite sure what the
> above accomplished.
The output is literal, i.e.:
| 0000000 \ [ u 0 0 E 4 ] \ [ u 2 0 1 3 ]
| 0000020 \ [ u 0 0 F 6 ] \n
>> Now in Emacs, this looks very much like what a coding system
>> would do. The info documentation for elisp just laconically
>> says:
>> | How to define a coding system is an arcane matter, and is not
>> | documented here.
>> Has someone implemented such a coding system for groff so
>> that something like:
>> | (encode-coding-string variable-to-pass 'x-groff)
> I don't think you should need a new coding-system. But you didn't
> explain why you need to explicitly encode the command-line arguments,
> so it's hard to give accurate advice. What kind of Groff command
> needs this jumping through hoops from you? E.g., why isn't it enough
> to bind coding-system-for-write to whatever you need, around the call
> to call-process or whatever?
> IOW, please describe in more detail the Groff-related context in which
> this problem happens, so that we could have an intelligent discussion
> of the issues you might have.
On Fedora 34 with GNU groff 1.22.4:
| (let
|     ((temp-ps-buffer (generate-new-buffer "*test ps*"))
|      (test-arg "a-o"))
|   (with-temp-buffer
|     (insert ".fam H\n\\*[test-arg]\n")
|     (call-process-region
|      (point-min)
|      (point-max)
|      "groff"
|      nil
|      temp-ps-buffer
|      nil
|      "-Tps"
|      "-d" (concat "test-arg=" test-arg)))
|   (switch-to-buffer temp-ps-buffer)
|   (ps-mode)
|   (doc-view-mode))
produces a PostScript buffer with the text "a-o".
With test-arg = "ä-ö" (ä minus ö), it produces gibberish
minus gibberish.
With test-arg = (encode-coding-string "ä-ö" 'iso-latin-1) (ä
minus ö), it produces the text "ä-ö".
With test-arg = (encode-coding-string "ä–ö" 'iso-latin-1) (ä
endash ö), it produces the text "ä[white space]ö".
With test-arg = (shell-command-to-string (concat "preconv -r
<(echo " (shell-quote-argument "ä–ö") ")")) (ä endash ö), it
produces the intended text "ä–ö".
(Passing "-k" as an additional option to groff does not
change the output as "-k" only converts standard input, not
macro definitions set as command line arguments.)
Tim
* Re: Coding system to encode arguments to groff?
From: Eli Zaretskii @ 2021-10-03 15:14 UTC
To: help-gnu-emacs
> From: Tim Landscheidt <tim@tim-landscheidt.de>
> Cc: help-gnu-emacs@gnu.org
> Date: Sun, 03 Oct 2021 13:14:04 +0000
>
> | (let
> |     ((temp-ps-buffer (generate-new-buffer "*test ps*"))
> |      (test-arg "a-o"))
> |   (with-temp-buffer
> |     (insert ".fam H\n\\*[test-arg]\n")
> |     (call-process-region
> |      (point-min)
> |      (point-max)
> |      "groff"
> |      nil
> |      temp-ps-buffer
> |      nil
> |      "-Tps"
> |      "-d" (concat "test-arg=" test-arg)))
> |   (switch-to-buffer temp-ps-buffer)
> |   (ps-mode)
> |   (doc-view-mode))
>
> produces a PostScript buffer with the text "a-o".
>
> With test-arg = "ä-ö" (ä minus ö), it produces gibberish
> minus gibberish.
>
> With test-arg = (encode-coding-string "ä-ö" 'iso-latin-1) (ä
> minus ö), it produces the text "ä-ö".
>
> With test-arg = (encode-coding-string "ä–ö" 'iso-latin-1) (ä
> endash ö), it produces the text "ä[white space]ö".
>
> With test-arg = (shell-command-to-string (concat "preconv -r
> <(echo " (shell-quote-argument "ä–ö") ")")) (ä endash ö), it
> produces the intended text "ä–ö".
So the problem is that troff doesn't accept non-ASCII command-line
arguments, and so you want to convert non-ASCII characters into a
series of characters encoded in the \[uNNNN] form, is that right?
Then I guess mapconcat is your friend, something like
(mapconcat (lambda (ch)
             (format "\\[u%4.4X]" ch))
           "ä–ö" "")
There's no need to use preconv at all, as Emacs can do that by itself.
And this isn't an encoding, because codepoints are not encoded in any
sense of that word.
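
A slightly fuller sketch of the mapconcat approach follows, assuming the goal is to leave ASCII characters untouched and to emit the same \[uXXXX] escapes that preconv printed earlier in the thread. Both function names are hypothetical (they are not part of Emacs or groff), and backslashes already present in the input are passed through unchanged.

```elisp
;; Hypothetical helpers, not part of Emacs or groff.
(defun my-groff-escape-string (string)
  "Return STRING with each non-ASCII character written as a groff escape.
ASCII characters are copied unchanged."
  ;; Each non-ASCII character becomes backslash, "[u", at least four
  ;; uppercase hex digits of its code point, and "]".
  (mapconcat (lambda (ch)
               (if (< ch 128)
                   (char-to-string ch)
                 (format "\\[u%04X]" ch)))
             string ""))

(defun my-groff-string-definition-args (name value)
  "Return the groff arguments that define string NAME as VALUE.
VALUE is escaped with `my-groff-escape-string', so the resulting
argument is pure ASCII."
  (list "-d" (concat name "=" (my-groff-escape-string value))))
```

In the test case from the previous message, the last argument pair could then be produced with (my-groff-string-definition-args "test-arg" "ä–ö") and spliced into the call-process-region call with apply.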