* How to translate LaTeX into UTF-8 in Elisp? @ 2016-12-08 17:04 Marcin Borkowski 2016-12-08 18:21 ` Carlos Konstanski ` (3 more replies) 0 siblings, 4 replies; 25+ messages in thread From: Marcin Borkowski @ 2016-12-08 17:04 UTC (permalink / raw) To: Help Gnu Emacs mailing list Hi all, I have a string with embedded sequences like "\'e" or "\H{o}". The Emacs TeX input method knows how to convert them into "é" or "ő" (when typing, of course). Is there a way to use that to perform similar conversions in a string? TIA, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski @ 2016-12-08 18:21 ` Carlos Konstanski 2016-12-08 19:13 ` Marcin Borkowski 2016-12-08 22:12 ` Stefan Monnier ` (2 subsequent siblings) 3 siblings, 1 reply; 25+ messages in thread From: Carlos Konstanski @ 2016-12-08 18:21 UTC (permalink / raw) To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list This is not an answer to your question, but rather an alternative way to use non-ASCII chars in a tex file: \usepackage[german]{babel} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} (Replace "german" with the language of your choice) Now you can simply type the actual character rather than using an escape sequence. Carlos Marcin Borkowski <mbork@mbork.pl> writes: > Hi all, > > I have a string with embedded sequences like "\'e" or "\H{o}". The > Emacs TeX input method knows how to convert them into "é" or "ő" (when > typing, of course). Is there a way to use that to perform similar > conversions in a string? > > TIA, ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2016-12-08 18:21 ` Carlos Konstanski @ 2016-12-08 19:13 ` Marcin Borkowski 0 siblings, 0 replies; 25+ messages in thread From: Marcin Borkowski @ 2016-12-08 19:13 UTC (permalink / raw) To: Carlos Konstanski; +Cc: Help Gnu Emacs mailing list On 2016-12-08, at 19:21, Carlos Konstanski <ckonstanski@pippiandcarlos.com> wrote: > This is not an answer to your question, but rather an alternative way to > use non-ASCII chars in a tex file: > > \usepackage[german]{babel} > \usepackage[T1]{fontenc} > \usepackage[utf8]{inputenc} > > (Replace "german" with the language of your choice) I am aware of such solutions, however - as you noted - they solve a different problem. What I have is not a (La)TeX file - it is a UTF-8-encoded XML file with data pulled from LaTeX files (hence my problem). Best, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp?
From: Stefan Monnier @ 2016-12-08 22:12 UTC
To: help-gnu-emacs

> I have a string with embedded sequences like "\'e" or "\H{o}". The
> Emacs TeX input method knows how to convert them into "é" or "ő" (when
> typing, of course). Is there a way to use that to perform similar
> conversions in a string?

You can do something like:

    (with-temp-buffer
      (insert STRING)
      (iso-tex2iso (point-min) (point-max))
      (buffer-string))

-- Stefan
* Re: How to translate LaTeX into UTF-8 in Elisp? 2016-12-08 22:12 ` Stefan Monnier @ 2017-01-27 11:48 ` Marcin Borkowski 0 siblings, 0 replies; 25+ messages in thread From: Marcin Borkowski @ 2017-01-27 11:48 UTC (permalink / raw) To: Stefan Monnier; +Cc: help-gnu-emacs On 2016-12-08, at 23:12, Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> I have a string with embedded sequences like "\'e" or "\H{o}". The >> Emacs TeX input method knows how to convert them into "é" or "ő" (when >> typing, of course). Is there a way to use that to perform similar >> conversions in a string? > > You can do something like: > > (with-temp-buffer > (insert STRING) > (iso-tex2iso (point-min) (point-max)) > (buffer-string)) Hi, sorry for the delay - I somehow missed your answer. Thanks for the tip. It works, but not entirely. It did work for \'{e}, but not for \H{o} - probably because there is no "ő" in ISO 8859-1. So this won't do all the tricks that the TeX input method does. Still, if nothing else pops up, this is quite useful - thanks! Best, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski 2016-12-08 18:21 ` Carlos Konstanski 2016-12-08 22:12 ` Stefan Monnier @ 2017-01-28 8:15 ` Kendall Shaw 2017-07-03 4:56 ` Marcin Borkowski 3 siblings, 0 replies; 25+ messages in thread From: Kendall Shaw @ 2017-01-28 8:15 UTC (permalink / raw) To: help-gnu-emacs There is a variable tex--prettify-symbols-alist that maps some tex symbols to code points. I think you can use the function set-buffer-file-coding-system to make any file the buffer is saved to use utf-8, then use characters from tex--prettify-symbols-alist. Kendall On 12/08/2016 09:04 AM, Marcin Borkowski wrote: > Hi all, > > I have a string with embedded sequences like "\'e" or "\H{o}". The > Emacs TeX input method knows how to convert them into "é" or "ő" (when > typing, of course). Is there a way to use that to perform similar > conversions in a string? > > TIA, > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski ` (2 preceding siblings ...) 2017-01-28 8:15 ` Kendall Shaw @ 2017-07-03 4:56 ` Marcin Borkowski 2017-07-03 5:43 ` Emanuel Berg ` (2 more replies) 3 siblings, 3 replies; 25+ messages in thread From: Marcin Borkowski @ 2017-07-03 4:56 UTC (permalink / raw) To: Help Gnu Emacs mailing list On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote: > Hi all, > > I have a string with embedded sequences like "\'e" or "\H{o}". The > Emacs TeX input method knows how to convert them into "é" or "ő" (when > typing, of course). Is there a way to use that to perform similar > conversions in a string? Hi all, I'm revisiting this old thread now. Since I got no satisfying answers back then, here is my plan for a solution. I'm first going to map \', \` etc. onto /names/ (this is a rather short list!), construct a Unicode name of the character I want and then use =ucs-names=. For instance, \' maps to "ACUTE", then \'a will map to "LATIN SMALL LETTER A ACUTE" and this can be fed into =char-from-name=. It is a horrible hack, but it should work. Any better ideas? Best, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 4:56 ` Marcin Borkowski @ 2017-07-03 5:43 ` Emanuel Berg 2017-07-03 9:16 ` Marcin Borkowski 2017-07-03 8:37 ` Teemu Likonen 2017-07-04 11:18 ` Joost Kremers 2 siblings, 1 reply; 25+ messages in thread From: Emanuel Berg @ 2017-07-03 5:43 UTC (permalink / raw) To: help-gnu-emacs Marcin Borkowski <mbork@mbork.pl> writes: > I'm revisiting this old thread now. Since I got no > satisfying answers back then, here is my plan for > solution. I'm going first to map \', \` etc. > onto /names/ (this is a rather short list!), > construct a Unicode name of the character I want and > then use =ucs-names=. > > For instance, \' maps to "ACUTE", then \'a will map > to "LATIN SMALL LETTER A ACUTE" and this can be fed > into =char-from-name=. > > It is a horrible hack On the contrary. Add another layer of abstraction. If you setup the names consistently it is even a good-looking solution. -- underground experts united http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 5:43 ` Emanuel Berg @ 2017-07-03 9:16 ` Marcin Borkowski 2017-07-03 9:31 ` tomas 2017-07-03 10:24 ` Emanuel Berg 0 siblings, 2 replies; 25+ messages in thread From: Marcin Borkowski @ 2017-07-03 9:16 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs On 2017-07-03, at 07:43, Emanuel Berg <moasen@zoho.com> wrote: > Marcin Borkowski <mbork@mbork.pl> writes: > >> I'm revisiting this old thread now. Since I got no >> satisfying answers back then, here is my plan for >> solution. I'm going first to map \', \` etc. >> onto /names/ (this is a rather short list!), >> construct a Unicode name of the character I want and >> then use =ucs-names=. >> >> For instance, \' maps to "ACUTE", then \'a will map >> to "LATIN SMALL LETTER A ACUTE" and this can be fed >> into =char-from-name=. >> >> It is a horrible hack > > On the contrary. Add another layer of abstraction. > > If you setup the names consistently it is even > a good-looking solution. I'm not sure whether I follow you here. Why should *I* setup the names? They are in ucs-names (as I said), and they are official Unicode names. It is still a hack, since it relies on the Unicode names being correct. Have you seen this? https://codepoints.net/U+FE18?lang=en Notice the typo in the name. It's in the standard (somehow it slipped through;-)), so the typo is there forever (or rather, for as long as Unicode is going to be around). Best, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 9:16 ` Marcin Borkowski @ 2017-07-03 9:31 ` tomas 2017-07-04 5:55 ` Marcin Borkowski 2017-07-03 10:24 ` Emanuel Berg 1 sibling, 1 reply; 25+ messages in thread From: tomas @ 2017-07-03 9:31 UTC (permalink / raw) To: Marcin Borkowski; +Cc: help-gnu-emacs, Emanuel Berg On Mon, Jul 03, 2017 at 11:16:12AM +0200, Marcin Borkowski wrote: > > On 2017-07-03, at 07:43, Emanuel Berg <moasen@zoho.com> wrote: > > > Marcin Borkowski <mbork@mbork.pl> writes: > > > >> I'm revisiting this old thread now. Since I got no > >> satisfying answers back then, here is my plan for > >> solution [...] > > If you setup the names consistently it is even > > a good-looking solution. [...] > I'm not sure whether I follow you here. Why should *I* setup the names? > They are in ucs-names (as I said), and they are official Unicode names. > > It is still a hack, since it relies on the Unicode names being correct. > Have you seen this? > > https://codepoints.net/U+FE18?lang=en > > Notice the typo in the name. It's in the standard (somehow it slipped > through;-)), so the typo is there forever (or rather, for as long as > Unicode is going to be around). Yes: they even listed a "correction". Unicode (and Emacs) seem to have a mechanism in place for a code point to have more than one name (I don't know whether there can be more than two, though). So perhaps the only thing to cope with is that the mapping name -> code point isn't injective (whatever that means for your approach)? Cheers - -- tomás ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 9:31 ` tomas @ 2017-07-04 5:55 ` Marcin Borkowski 0 siblings, 0 replies; 25+ messages in thread From: Marcin Borkowski @ 2017-07-04 5:55 UTC (permalink / raw) To: tomas; +Cc: help-gnu-emacs, Emanuel Berg On 2017-07-03, at 11:31, tomas@tuxteam.de wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Mon, Jul 03, 2017 at 11:16:12AM +0200, Marcin Borkowski wrote: >> >> On 2017-07-03, at 07:43, Emanuel Berg <moasen@zoho.com> wrote: >> >> > Marcin Borkowski <mbork@mbork.pl> writes: >> > >> >> I'm revisiting this old thread now. Since I got no >> >> satisfying answers back then, here is my plan for >> >> solution [...] > >> > If you setup the names consistently it is even >> > a good-looking solution. > > [...] > >> I'm not sure whether I follow you here. Why should *I* setup the names? >> They are in ucs-names (as I said), and they are official Unicode names. >> >> It is still a hack, since it relies on the Unicode names being correct. >> Have you seen this? >> >> https://codepoints.net/U+FE18?lang=en >> >> Notice the typo in the name. It's in the standard (somehow it slipped >> through;-)), so the typo is there forever (or rather, for as long as >> Unicode is going to be around). > > Yes: they even listed a "correction". Unicode (and Emacs) seem to have > a mechanism in place for a code point to have more than one name (I > don't know whether there can be more than two, though). So perhaps the > only thing to cope with is that the mapping name -> code point isn't > injective (whatever that means for your approach)? Interesting. The "other" name doesn't appear in (ucs-names), though, so it's not useful for me here. In any case, I am aware of no such typos in accented letters anyway;-). Best, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 9:16 ` Marcin Borkowski 2017-07-03 9:31 ` tomas @ 2017-07-03 10:24 ` Emanuel Berg 2017-07-03 17:36 ` Marcin Borkowski 1 sibling, 1 reply; 25+ messages in thread From: Emanuel Berg @ 2017-07-03 10:24 UTC (permalink / raw) To: help-gnu-emacs Marcin Borkowski wrote: > It is still a hack, since it relies on the > Unicode names being correct. If it relied on the names being *in*correct, that would make it a hack in the negative sense. -- underground experts united http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp?
From: Marcin Borkowski @ 2017-07-03 17:36 UTC
To: Emanuel Berg; +Cc: help-gnu-emacs

On 2017-07-03, at 12:24, Emanuel Berg <moasen@zoho.com> wrote:

> Marcin Borkowski wrote:
>
>> It is still a hack, since it relies on the
>> Unicode names being correct.
>
> If it relied on the names being *in*correct,
> that would make it a hack in the
> negative sense.

OK, so here is a proof of concept:

--8<---------------cut here---------------start------------->8---
(defvar TeX-to-Unicode-accents-alist
  '((?` . "grave")
    (?' . "acute")
    (?^ . "circumflex")
    (?\" . "diaeresis")
    (?H . "double acute")
    (?~ . "tilde")
    (?c . "with cedilla")
    (?k . "ogonek")
    (?= . "macron")
    (?. . "with dot above")
    (?u . "with breve")
    (?v . "with caron"))
  "A mapping from TeX control characters to accent names used in Unicode.")

(defun combine-letter-diacritical-mark (letter mark)
  "Return a Unicode string of LETTER combined with MARK.
MARK can be any character that can be used in TeX accenting commands."
  (let* ((letter (if (stringp letter)
                     (string-to-char letter)
                   letter))
         (uppercase (= letter
                       (upcase letter))))
    (cdr (assoc-string
          (format "LATIN %s LETTER %c %s"
                  (if uppercase "CAPITAL" "SMALL")
                  letter
                  (cdr (assoc mark TeX-to-Unicode-accents-alist)))
          ucs-names
          t))))
--8<---------------cut here---------------end--------------->8---

As you can see from the mess in `TeX-to-Unicode-accents-alist', this
_is_ a hack. Still, it seems to work more or less fine.

Best,

-- 
Marcin Borkowski
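The proof of concept above looks up one letter and one accent at a time. A minimal sketch of how such a lookup could be applied to a whole string follows. It is not from the thread: it assumes Emacs 26 or later (where `char-from-name' is available), it uses the current Unicode names of the form "LATIN SMALL LETTER A WITH ACUTE" throughout, and the names `my-tex-accent-names', `my-tex-accent-regexp' and `my-tex-accents-to-utf8' are hypothetical. Only the forms \'e, \'{e} and \H{o} are handled; anything else is left alone.

```elisp
;; Sketch only; assumes Emacs 26+ for `char-from-name'.
;; All `my-' names are hypothetical and not part of Emacs.
(defvar my-tex-accent-names
  '((?` . "GRAVE") (?' . "ACUTE") (?^ . "CIRCUMFLEX") (?\" . "DIAERESIS")
    (?H . "DOUBLE ACUTE") (?~ . "TILDE") (?c . "CEDILLA") (?k . "OGONEK")
    (?= . "MACRON") (?. . "DOT ABOVE") (?u . "BREVE") (?v . "CARON"))
  "Map TeX accent command characters to Unicode accent names.")

(defvar my-tex-accent-regexp
  ;; Either a symbol accent followed by a bare letter (groups 1 and 2),
  ;; or any accent followed by a braced letter (groups 3 and 4).
  "\\\\\\(?:\\([`'^\"~=.]\\)\\([A-Za-z]\\)\\|\\([`'^\"~=.Hckuv]\\){\\([A-Za-z]\\)}\\)"
  "Regexp matching a TeX accent command applied to one ASCII letter.")

(defun my-tex-accents-to-utf8 (string)
  "Replace TeX accent commands in STRING with precomposed characters.
Commands with no matching Unicode character are left unchanged."
  (let ((case-fold-search nil))         ; keep \H distinct from \h
    (replace-regexp-in-string
     my-tex-accent-regexp
     (lambda (match)
       (let* ((accent (string-to-char (or (match-string 1 match)
                                          (match-string 3 match))))
              (letter (string-to-char (or (match-string 2 match)
                                          (match-string 4 match))))
              (name (format "LATIN %s LETTER %c WITH %s"
                            (if (= letter (upcase letter)) "CAPITAL" "SMALL")
                            (upcase letter)
                            (cdr (assq accent my-tex-accent-names))))
              ;; Protect the match data, which `replace-regexp-in-string'
              ;; may still need after this function returns.
              (char (save-match-data (char-from-name name))))
         (if char (string char) match)))
     string t t)))

;; (my-tex-accents-to-utf8 "Erd\\H{o}s and caf\\'e") ; => "Erdős and café"
```

Passing t for the LITERAL argument matters here: an unconverted match is returned verbatim, backslash included, and would otherwise be read as a replacement escape.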
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 17:36 ` Marcin Borkowski @ 2017-07-03 20:01 ` Emanuel Berg 2017-07-04 10:23 ` Héctor Lahoz 1 sibling, 0 replies; 25+ messages in thread From: Emanuel Berg @ 2017-07-03 20:01 UTC (permalink / raw) To: help-gnu-emacs Marcin Borkowski wrote: > As you can see from the mess in > `TeX-to-Unicode-accents-alist', this _is_ > a hack. Still, it seems to work more or > less fine. It is your background as a humble mathematician that plays you a trick. Here, altho most definitely helped by your math training, you are a practical engineer and engineering is never perfect in the math sense, and it doesn't make it a hack. But whatever makes you sleep at night :) -- underground experts united http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp?
From: Héctor Lahoz @ 2017-07-04 10:23 UTC
To: help-gnu-emacs

Marcin Borkowski wrote:

> OK, so here is a proof of concept:
>
> --8<---------------cut here---------------start------------->8---
> (defvar TeX-to-Unicode-accents-alist
>   '((?` . "grave")
>     (?' . "acute")
>     (?^ . "circumflex")
>     (?\" . "diaeresis")
>     (?H . "double acute")
>     (?~ . "tilde")
>     (?c . "with cedilla")
>     (?k . "ogonek")
>     (?= . "macron")
>     (?. . "with dot above")
>     (?u . "with breve")
>     (?v . "with caron"))
>   "A mapping from TeX control characters to accent names used in
> Unicode.")
>
> (defun combine-letter-diacritical-mark (letter mark)
>   "Return a Unicode string of LETTER combined with MARK.
> MARK can be any character that can be used in TeX accenting
> commands."
>   (let* ((letter (if (stringp letter)
>                      (string-to-char letter)
>                    letter))
>          (uppercase (= letter
>                        (upcase letter))))
>     (cdr (assoc-string
>           (format "LATIN %s LETTER %c %s"
>                   (if uppercase "CAPITAL" "SMALL")
>                   letter
>                   (cdr (assoc mark TeX-to-Unicode-accents-alist)))
>           ucs-names
>           t))))
> --8<---------------cut here---------------end--------------->8---
>

Great. Perhaps you could consider translating to unicode combining
characters. I think it is closer to the original TeX idea and could be
cleaner:

    0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
    0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
    0302;COMBINING CIRCUMFLEX ACCENT;Mn;230;NSM;;;;;N;NON-SPACING CIRCUMFLEX;;;;
    0303;COMBINING TILDE;Mn;230;NSM;;;;;N;NON-SPACING TILDE;;;;
    0304;COMBINING MACRON;Mn;230;NSM;;;;;N;NON-SPACING MACRON;;;;
    0305;COMBINING OVERLINE;Mn;230;NSM;;;;;N;NON-SPACING OVERSCORE;;;;
    0306;COMBINING BREVE;Mn;230;NSM;;;;;N;NON-SPACING BREVE;;;;
    0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
    0308;COMBINING DIAERESIS;Mn;230;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
    0309;COMBINING HOOK ABOVE;Mn;230;NSM;;;;;N;NON-SPACING HOOK ABOVE;;;;
    030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
    030B;COMBINING DOUBLE ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING DOUBLE ACUTE;;;;
    030C;COMBINING CARON;Mn;230;NSM;;;;;N;NON-SPACING HACEK;;;;
    030D;COMBINING VERTICAL LINE ABOVE;Mn;230;NSM;;;;;N;NON-SPACING VERTICAL LINE ABOVE;;;;

See the wikipedia article on unicode equivalence:

https://en.wikipedia.org/wiki/Unicode_equivalence

The difference is that unicode reverses the order. First you have the
base character and then all combining characters. For example, \'a
would be translated to either

    00E1;LATIN SMALL LETTER A WITH ACUTE

or

    0061;LATIN SMALL LETTER A
    0301;COMBINING ACUTE ACCENT

I don't know the implications of using unicode combining characters. I
guess the choice depends on the purpose of the output.
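The combining-character suggestion above could be sketched roughly as follows. This is an illustration, not code from the thread: the names `my-tex-combining-marks' and `my-tex-accent-to-combining' are hypothetical, the table only covers accents whose code points appear in the list above (so \c and \k are missing), and the optional composition step assumes the `ucs-normalize' library that ships with Emacs.

```elisp
;; Sketch only; the `my-' names are hypothetical.
(require 'ucs-normalize)

(defvar my-tex-combining-marks
  '((?` . #x0300)                       ; COMBINING GRAVE ACCENT
    (?' . #x0301)                       ; COMBINING ACUTE ACCENT
    (?^ . #x0302)                       ; COMBINING CIRCUMFLEX ACCENT
    (?~ . #x0303)                       ; COMBINING TILDE
    (?= . #x0304)                       ; COMBINING MACRON
    (?u . #x0306)                       ; COMBINING BREVE
    (?. . #x0307)                       ; COMBINING DOT ABOVE
    (?\" . #x0308)                      ; COMBINING DIAERESIS
    (?H . #x030B)                       ; COMBINING DOUBLE ACUTE ACCENT
    (?v . #x030C))                      ; COMBINING CARON
  "Map TeX accent command characters to Unicode combining marks.")

(defun my-tex-accent-to-combining (accent letter &optional compose)
  "Return LETTER followed by the combining mark for TeX ACCENT, as a string.
With COMPOSE non-nil, normalize the result to NFC, which yields a
precomposed character where Unicode defines one.
Return nil if ACCENT is not in `my-tex-combining-marks'."
  (let ((mark (cdr (assq accent my-tex-combining-marks))))
    (when mark
      ;; Unicode order: base character first, then the mark.
      (let ((decomposed (string letter mark)))
        (if compose
            (ucs-normalize-NFC-string decomposed)
          decomposed)))))

;; (my-tex-accent-to-combining ?' ?a)   ; => "a" followed by U+0301
;; (my-tex-accent-to-combining ?' ?a t) ; => "á" (U+00E1)
```

One point in favour of this design is that it needs no name lookup at all, and it still produces something sensible for letter and accent pairs that have no precomposed character.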
* Re: How to translate LaTeX into UTF-8 in Elisp?
From: Teemu Likonen @ 2017-07-03 8:37 UTC
To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list

Marcin Borkowski [2017-07-03 06:56:36+02] wrote:

> On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote:
>> I have a string with embedded sequences like "\'e" or "\H{o}". The
>> Emacs TeX input method knows how to convert them into "é" or "ő" (when
>> typing, of course). Is there a way to use that to perform similar
>> conversions in a string?

> I'm revisiting this old thread now.

> It is a horrible hack, but it should work. Any better ideas?

I would filter buffer's content through recode command.

    [Highlight a region.]

    C-u M-x shell-command-on-region RET recode tex.. RET

You wanted to do this for a string so we can write a function that uses
a temporary buffer and returns its content as a string. Here is a quick
example:

    (defun convert-from-latex (string)
      (with-temp-buffer
        (insert string)
        (call-process-region (point-min) (point-max)
                             "recode" t t nil "tex..")
        (buffer-substring-no-properties (point-min) (point-max))))

-- 
/// Teemu Likonen - .-.. <https://keybase.io/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///
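The quick example above assumes that recode is installed and that it exits successfully. A slightly more defensive variant might look like the sketch below; the function name `my-recode-tex-string' and its optional PROGRAM argument are hypothetical additions, and the "tex.." request is taken unchanged from the message above.

```elisp
;; Sketch only; `my-recode-tex-string' and PROGRAM are hypothetical.
(defun my-recode-tex-string (string &optional program)
  "Filter STRING through PROGRAM with the \"tex..\" request.
PROGRAM defaults to \"recode\".  Signal an error if PROGRAM cannot be
found or does not exit with status 0."
  (let ((program (or program "recode")))
    (unless (executable-find program)
      (error "Cannot find %s on `exec-path'" program))
    (with-temp-buffer
      (insert string)
      ;; Replace the buffer contents with the program's output.
      (let ((status (call-process-region (point-min) (point-max)
                                         program t t nil "tex..")))
        (unless (eq status 0)
          (error "%s exited with status %s" program status)))
      (buffer-string))))
```

Signalling an error keeps a missing or failing external tool from silently returning an empty or half-converted string.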
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 8:37 ` Teemu Likonen @ 2017-07-04 5:57 ` Marcin Borkowski 2017-07-04 7:13 ` Udyant Wig 0 siblings, 1 reply; 25+ messages in thread From: Marcin Borkowski @ 2017-07-04 5:57 UTC (permalink / raw) To: Teemu Likonen; +Cc: Help Gnu Emacs mailing list On 2017-07-03, at 10:37, Teemu Likonen <tlikonen@iki.fi> wrote: > Marcin Borkowski [2017-07-03 06:56:36+02] wrote: > >> On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote: >>> I have a string with embedded sequences like "\'e" or "\H{o}". The >>> Emacs TeX input method knows how to convert them into "é" or "ő" (when >>> typing, of course). Is there a way to use that to perform similar >>> conversions in a string? > >> I'm revisiting this old thread now. > >> It is a horrible hack, but it should work. Any better ideas? > > I would filter buffer's content through recode command. > > [Highlight a region.] > > C-u M-x shell-command-on-region RET recode tex.. RET > > You wanted to do this for a string so we can write a function that uses > a temporary buffer and returns its content as a string. Here is a quick > example: > > (defun convert-from-latex (string) > (with-temp-buffer > (insert string) > (call-process-region (point-min) (point-max) > "recode" t t nil "tex..") > (buffer-substring-no-properties (point-min) (point-max)))) Thanks, I didn't know about recode. But it doesn't work all that well: c{\c c}c does not remove braces, for instance, and what's even worse, it apparently doesn't know about \k. But thanks anyway, this is a good thing to remember, even though in my case I perceive it as even more hackish than my approach (I'd prefer Emacs to do the job, not an external utility). Best, -- Marcin Borkowski ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-04 5:57 ` Marcin Borkowski @ 2017-07-04 7:13 ` Udyant Wig 2017-07-04 9:27 ` Thien-Thi Nguyen 0 siblings, 1 reply; 25+ messages in thread From: Udyant Wig @ 2017-07-04 7:13 UTC (permalink / raw) To: help-gnu-emacs This is an interesting point, which I think is worth some thought. On 07/04/2017 11:27 AM, Marcin Borkowski wrote: > (I'd prefer Emacs to do the job, not an external utility). Would you say this ought to hold in general? For instance, both find(1) and grep(1) are external to Emacs, but have such good integration with it that they may as well be native to it. So also the package Magit which makes git(1) seem largely part of Emacs. However, I can see cases where your point is apt. If, say, one wanted to factor numbers, a workable (but horrifying(?)) solution is to call factor(1) in a subprocess and hand-hack the output. I think that as long as the layer between the given tool outside and Emacs is good, it may not matter that the work is obtained from an outsider. But what do you think? Udyant Wig -- ... while the ways of art are hard at the best, they will break you if you go unsustained by belief in what you are trying to do. -- Arthur Quiller-Couch ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-04 7:13 ` Udyant Wig @ 2017-07-04 9:27 ` Thien-Thi Nguyen 2017-07-04 20:37 ` Emanuel Berg 2017-07-05 7:05 ` Udyant Wig 0 siblings, 2 replies; 25+ messages in thread From: Thien-Thi Nguyen @ 2017-07-04 9:27 UTC (permalink / raw) To: help-gnu-emacs () Udyant Wig <udyant.wig@gmail.com> () Tue, 4 Jul 2017 12:43:34 +0530 I think that as long as the layer between the given tool outside and Emacs is good Could you explain what you mean by "good", here? -- Thien-Thi Nguyen ----------------------------------------------- (defun responsep (query) (pcase (context query) (`(technical ,ml) (correctp ml)) ...)) 748E A0E8 1CB8 A748 9BFA --------------------------------------- 6CE4 6703 2224 4C80 7502 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-04 9:27 ` Thien-Thi Nguyen @ 2017-07-04 20:37 ` Emanuel Berg 2017-07-05 7:05 ` Udyant Wig 1 sibling, 0 replies; 25+ messages in thread From: Emanuel Berg @ 2017-07-04 20:37 UTC (permalink / raw) To: help-gnu-emacs Thien-Thi Nguyen wrote: >> I think that as long as the layer between >> the given tool outside and Emacs is good > > Could you explain what you mean by > "good", here? Probably he meant a clean interface which is easy to understand and operate and does not require tons of hacking to get the cord onto the coil on the other side. -- underground experts united http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-04 9:27 ` Thien-Thi Nguyen 2017-07-04 20:37 ` Emanuel Berg @ 2017-07-05 7:05 ` Udyant Wig 2017-07-05 16:06 ` Emanuel Berg 2017-07-13 17:45 ` Thien-Thi Nguyen 1 sibling, 2 replies; 25+ messages in thread From: Udyant Wig @ 2017-07-05 7:05 UTC (permalink / raw) To: help-gnu-emacs On 07/04/2017 02:57 PM, Thien-Thi Nguyen wrote: > > () Udyant Wig <udyant.wig@gmail.com> > () Tue, 4 Jul 2017 12:43:34 +0530 > > I think that as long as the layer between the given tool > outside and Emacs is good > > Could you explain what you mean by "good", here? By 'good', I meant that, at least for some definite core functionality of the tool outside, the Emacs layer which interacts with it presents a clean interface to the user, integrated well with the rest of Emacs; one can then work as though the tool (or the above mentioned definite core functionality of it) were an indistinguishable part of Emacs. Of course, a great layer may also go beyond the basics and offer an enhanced experience within Emacs. I have mentioned Magit, which provides a very nice way to work with git. The dictionary.el package is another I use. Did this help? -- ... while the ways of art are hard at the best, they will break you if you go unsustained by belief in what you are trying to do. -- Arthur Quiller-Couch ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-05 7:05 ` Udyant Wig @ 2017-07-05 16:06 ` Emanuel Berg 2017-07-13 17:45 ` Thien-Thi Nguyen 1 sibling, 0 replies; 25+ messages in thread From: Emanuel Berg @ 2017-07-05 16:06 UTC (permalink / raw) To: help-gnu-emacs Udyant Wig wrote: > By 'good', I meant that, at least for some > definite core functionality of the tool > outside, the Emacs layer which interacts with > it presents a clean interface to the user, > integrated well with the rest of Emacs; one > can then work as though the tool (or the > above mentioned definite core functionality > of it) were an indistinguishable part > of Emacs. > > Of course, a great layer may also go beyond the > basics and offer an enhanced experience > within Emacs. I have mentioned Magit, which > provides a very nice way to work with git. > The dictionary.el package is another I use. > > Did this help? Indeed, TTN understood this all good and well - so the question is rather, what did *he* really mean by his question? -- underground experts united http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-05 7:05 ` Udyant Wig 2017-07-05 16:06 ` Emanuel Berg @ 2017-07-13 17:45 ` Thien-Thi Nguyen 2017-07-14 1:48 ` Udyant Wig 1 sibling, 1 reply; 25+ messages in thread From: Thien-Thi Nguyen @ 2017-07-13 17:45 UTC (permalink / raw) To: help-gnu-emacs () Udyant Wig <udyant.wig@gmail.com> () Wed, 5 Jul 2017 12:35:35 +0530 By 'good', I meant that [...] clean interface to the user, integrated well with the rest of Emacs; one can then work as though the tool (or the above mentioned definite core functionality of it) were an indistinguishable part of Emacs. [...] Did this help? Yes, thanks. Of late, i wonder a lot about how other people (programmers and non-programmers) perceive and define "good", and how those perceptions and definitions evolve (or not) over time. I'm happy to say your words make sense to me. -- Thien-Thi Nguyen ----------------------------------------------- (defun responsep (query) (pcase (context query) (`(technical ,ml) (correctp ml)) ...)) 748E A0E8 1CB8 A748 9BFA --------------------------------------- 6CE4 6703 2224 4C80 7502 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-13 17:45 ` Thien-Thi Nguyen @ 2017-07-14 1:48 ` Udyant Wig 0 siblings, 0 replies; 25+ messages in thread From: Udyant Wig @ 2017-07-14 1:48 UTC (permalink / raw) To: help-gnu-emacs On 07/13/2017 11:15 PM, Thien-Thi Nguyen wrote: > Yes, thanks. Of late, i wonder a lot about how other people > (programmers and non-programmers) perceive and define "good", and how > those perceptions and definitions evolve (or not) over time. I'm > happy to say your words make sense to me. I'm glad you found them helpful. In the belief that there can be no /final/ definition of 'good' software, I link the following. <URL:https://www.eskimo.com/~scs/readings/software_elegance.html> These are the words of a long-time C programmer; also the maintainer of the C FAQs. -- ... while the ways of art are hard at the best, they will break you if you go unsustained by belief in what you are trying to do. -- Arthur Quiller-Couch ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: How to translate LaTeX into UTF-8 in Elisp? 2017-07-03 4:56 ` Marcin Borkowski 2017-07-03 5:43 ` Emanuel Berg 2017-07-03 8:37 ` Teemu Likonen @ 2017-07-04 11:18 ` Joost Kremers 2 siblings, 0 replies; 25+ messages in thread From: Joost Kremers @ 2017-07-04 11:18 UTC (permalink / raw) To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list On Mon, Jul 03 2017, Marcin Borkowski wrote: > On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> > wrote: > >> Hi all, >> >> I have a string with embedded sequences like "\'e" or "\H{o}". >> The >> Emacs TeX input method knows how to convert them into "é" or >> "ő" (when >> typing, of course). Is there a way to use that to perform >> similar >> conversions in a string? > > Hi all, > > I'm revisiting this old thread now. Since I got no satisfying > answers > back then, here is my plan for solution. I'm going first to map > \', \` > etc. onto /names/ (this is a rather short list!), construct a > Unicode > name of the character I want and then use =ucs-names=. > > For instance, \' maps to "ACUTE", then \'a will map to "LATIN > SMALL > LETTER A ACUTE" and this can be fed into =char-from-name=. > > It is a horrible hack, but it should work. Any better ideas? Have you tried looking into the input method mechanism that translates "\'e" into "é"? The info on how to do such translations is stored somewhere somehow, so it should in principle be possible to use it to do the translations you want. I don't know very much about quail (I define a few custom input methods in my init files, but that's about it), but it looks like the info is stored as an alist somewhere of the form ("\\'e" ?é). Perhaps you've already explored this idea and found it too unwieldy, but you didn't mention it anywhere. -- Joost Kremers Life has its moments ^ permalink raw reply [flat|nested] 25+ messages in thread