From: Tim Landscheidt
Newsgroups: gmane.emacs.help
Subject: Re: Coding system to encode arguments to groff?
Date: Sun, 03 Oct 2021 13:14:04 +0000
Organization: http://www.tim-landscheidt.de/
Message-ID: <87bl469roj.fsf@vagabond.tim-landscheidt.de>
References: <87v92jyfnb.fsf@vagabond.tim-landscheidt.de> <83o88bio7g.fsf@gnu.org>
In-Reply-To: <83o88bio7g.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 29 Sep 2021 15:02:59 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
To: Eli Zaretskii
Cc: help-gnu-emacs@gnu.org

Eli Zaretskii wrote:

>> I pass text arguments from Emacs Lisp to a groff command
>> with the "-d" option.  For ASCII strings, this is trivial;
>> for strings with umlauts, I need to use:

>> | (encode-coding-string variable-to-pass 'iso-latin-1)

> What is your default locale's codeset on that system?  In general, if
> the default locale matches the encoding you need to use, the above
> should happen automagically.

If I understand your question correctly, UTF-8:

| [tim@vagabond ~]$ locale
| LANG=de_DE.UTF-8
| LC_CTYPE="de_DE.UTF-8"
| LC_NUMERIC="de_DE.UTF-8"
| LC_TIME="de_DE.UTF-8"
| LC_COLLATE="de_DE.UTF-8"
| LC_MONETARY="de_DE.UTF-8"
| LC_MESSAGES="de_DE.UTF-8"
| LC_PAPER="de_DE.UTF-8"
| LC_NAME="de_DE.UTF-8"
| LC_ADDRESS="de_DE.UTF-8"
| LC_TELEPHONE="de_DE.UTF-8"
| LC_MEASUREMENT="de_DE.UTF-8"
| LC_IDENTIFICATION="de_DE.UTF-8"
| LC_ALL=
| [tim@vagabond ~]$

>> For strings with other Unicode characters like "–" (#x2013),
>> I need to call groff's preconv like:

>> | (shell-command-to-string (concat "preconv -r <(echo " (shell-quote-argument variable-to-pass) ")"))

>> which for "ä–ö" returns something like:

>> | \[u00E4]\[u2013]\[u00F6]

> This is just the original "ä–ö" string, so I'm not quite sure what did
> the above accomplish.

The output is literal, i.e.:

| 0000000   \   [   u   0   0   E   4   ]   \   [   u   2   0   1   3   ]
| 0000020   \   [   u   0   0   F   6   ]  \n

>> Now in Emacs, this looks very much like what a coding system
>> would do.  The info documentation for elisp just laconically
>> says:

>> | How to define a coding system is an arcane matter, and is not
>> | documented here.

>> Has someone implemented such a coding system for groff so
>> that something like:

>> | (encode-coding-string variable-to-pass 'x-groff)

> I don't think you should need a new coding-system.  But you didn't
> explain why you need to explicitly encode the command-line arguments,
> so it's hard to give an accurate advice.  What kind of Groff command
> needs this jumping through hoops from you?  E.g., why isn't it enough
> to bind coding-system-for-write to whatever you need, around the call
> to call-process or whatever?

> IOW, please describe in more detail the Groff-related context in which
> this problem happens, so that we could have an intelligent discussion
> of the issues you might have.

On Fedora 34 with GNU groff 1.22.4:

| (let
|     ((temp-ps-buffer (generate-new-buffer "*test ps*"))
|      (test-arg "a-o"))
|   (with-temp-buffer
|     (insert ".fam H\n\\*[test-arg]\n")
|     (call-process-region
|      (point-min)
|      (point-max)
|      "groff"
|      nil
|      temp-ps-buffer
|      nil
|      "-Tps"
|      "-d" (concat "test-arg=" test-arg)))
|   (switch-to-buffer temp-ps-buffer)
|   (ps-mode)
|   (doc-view-mode))

produces a PostScript buffer with the text "a-o".

With test-arg = "ä-ö" (ä minus ö), it produces gibberish minus
gibberish.

With test-arg = (encode-coding-string "ä-ö" 'iso-latin-1) (ä minus ö),
it produces the text "ä-ö".

With test-arg = (encode-coding-string "ä–ö" 'iso-latin-1) (ä endash
ö), it produces the text "ä[white space]ö".

With test-arg = (shell-command-to-string (concat "preconv -r <(echo "
(shell-quote-argument "ä–ö") ")")) (ä endash ö), it produces the
intended text "ä–ö".

(Passing "-k" as an additional option to groff does not change the
output, as "-k" only converts standard input, not strings defined with
"-d" on the command line.)

Tim
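
The preconv call above is used only to turn each non-ASCII character
into a \[uXXXX] escape, so the same conversion can probably be done in
Emacs Lisp without a subprocess or a new coding system.  A minimal
sketch follows, assuming the input is an ordinary multibyte Emacs
string.  The name my-groff-escape-string is hypothetical and is not
part of Emacs or groff.  It mimics only the escape form shown in the
preconv output above and makes no attempt to reproduce everything
preconv does; in particular, a backslash already present in the input
is passed through unchanged.

```elisp
;; Hypothetical helper, not part of Emacs or groff: rewrite every
;; non-ASCII character of STRING as a groff \[uXXXX] escape.
(defun my-groff-escape-string (string)
  "Return STRING with each non-ASCII character written as \\[uXXXX].
ASCII characters are copied unchanged.  The hexadecimal code point is
upper case and padded to at least four digits."
  (mapconcat
   (lambda (char)
     (if (< char 128)
         ;; Plain ASCII needs no escaping.
         (char-to-string char)
       ;; Everything else becomes an escape such as \[u2013].
       (format "\\[u%04X]" char)))
   string
   ""))
```

In the test case above, this would replace the shell-command-to-string
form, i.e. "-d" (concat "test-arg=" (my-groff-escape-string test-arg)).
Since the result is pure ASCII, the encoding Emacs applies to
command-line arguments should no longer matter.  Unlike the
"preconv -r <(echo ...)" variant, no trailing newline is appended.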