How to translate LaTeX into UTF-8 in Elisp?

* How to translate LaTeX into UTF-8 in Elisp?
@ 2016-12-08 17:04 Marcin Borkowski
  2016-12-08 18:21 ` Carlos Konstanski
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Marcin Borkowski @ 2016-12-08 17:04 UTC (permalink / raw)
  To: Help Gnu Emacs mailing list

Hi all,

I have a string with embedded sequences like "\'e" or "\H{o}".  The
Emacs TeX input method knows how to convert them into "é" or "ő" (when
typing, of course).  Is there a way to use that to perform similar
conversions in a string?

TIA,

-- 
Marcin Borkowski

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to translate LaTeX into UTF-8 in Elisp?
  2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski
@ 2016-12-08 18:21 ` Carlos Konstanski
  2016-12-08 19:13   ` Marcin Borkowski
  2016-12-08 22:12 ` Stefan Monnier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Carlos Konstanski @ 2016-12-08 18:21 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list

This is not an answer to your question, but rather an alternative way to
use non-ASCII chars in a tex file:

\usepackage[german]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

(Replace "german" with the language of your choice)

Now you can simply type the actual character rather than using an escape
sequence.

Carlos

Marcin Borkowski <mbork@mbork.pl> writes:

> Hi all,
>
> I have a string with embedded sequences like "\'e" or "\H{o}".  The
> Emacs TeX input method knows how to convert them into "é" or "ő" (when
> typing, of course).  Is there a way to use that to perform similar
> conversions in a string?
>
> TIA,




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2016-12-08 18:21 ` Carlos Konstanski
@ 2016-12-08 19:13   ` Marcin Borkowski
  0 siblings, 0 replies; 25+ messages in thread
From: Marcin Borkowski @ 2016-12-08 19:13 UTC (permalink / raw)
  To: Carlos Konstanski; +Cc: Help Gnu Emacs mailing list

On 2016-12-08, at 19:21, Carlos Konstanski <ckonstanski@pippiandcarlos.com> wrote:

> This is not an answer to your question, but rather an alternative way to
> use non-ASCII chars in a tex file:
>
> \usepackage[german]{babel}
> \usepackage[T1]{fontenc}
> \usepackage[utf8]{inputenc}
>
> (Replace "german" with the language of your choice)

I am aware of such solutions; however, as you noted, they solve
a different problem.  What I have is not a (La)TeX file - it is a UTF-8
encoded XML file with data pulled from LaTeX files (hence my problem).

Best,

-- 
Marcin Borkowski


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski
  2016-12-08 18:21 ` Carlos Konstanski
@ 2016-12-08 22:12 ` Stefan Monnier
  2017-01-27 11:48   ` Marcin Borkowski
  2017-01-28  8:15 ` Kendall Shaw
  2017-07-03  4:56 ` Marcin Borkowski
  3 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2016-12-08 22:12 UTC (permalink / raw)
  To: help-gnu-emacs

> I have a string with embedded sequences like "\'e" or "\H{o}".  The
> Emacs TeX input method knows how to convert them into "é" or "ő" (when
> typing, of course).  Is there a way to use that to perform similar
> conversions in a string?

You can do something like:

   (with-temp-buffer
     (insert STRING)
     (iso-tex2iso (point-min) (point-max))
     (buffer-string))
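
A minimal sketch of the snippet above wrapped in a function, assuming
the iso-cvt library that provides iso-tex2iso is available in your
Emacs.  The name my-tex-string-to-iso is hypothetical, not something
that ships with Emacs.

```elisp
(require 'iso-cvt)  ; provides `iso-tex2iso'

(defun my-tex-string-to-iso (string)
  "Return STRING with TeX accent sequences converted by `iso-tex2iso'.
`my-tex-string-to-iso' is a hypothetical name used for this sketch."
  (with-temp-buffer
    (insert string)
    ;; Convert the whole temporary buffer in place.
    (iso-tex2iso (point-min) (point-max))
    (buffer-string)))

;; Example call; the backslash is doubled because this is a Lisp string.
(my-tex-string-to-iso "caf\\'e")
```

Going by its name, the conversion targets ISO 8859-1, so accents that
have no character in that set may well be left as typed.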


-- Stefan





* Re: How to translate LaTeX into UTF-8 in Elisp?
  2016-12-08 22:12 ` Stefan Monnier
@ 2017-01-27 11:48   ` Marcin Borkowski
  0 siblings, 0 replies; 25+ messages in thread
From: Marcin Borkowski @ 2017-01-27 11:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs

On 2016-12-08, at 23:12, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> I have a string with embedded sequences like "\'e" or "\H{o}".  The
>> Emacs TeX input method knows how to convert them into "é" or "ő" (when
>> typing, of course).  Is there a way to use that to perform similar
>> conversions in a string?
>
> You can do something like:
>
>    (with-temp-buffer
>      (insert STRING)
>      (iso-tex2iso (point-min) (point-max))
>      (buffer-string))

Hi,

sorry for the delay - I somehow missed your answer.

Thanks for the tip.  It works, but not entirely.  It did work for \'{e},
but not for \H{o} - probably because there is no "ő" in ISO 8859-1.  So
this won't do all the tricks that the TeX input method does.

Still, if nothing else pops up, this is quite useful - thanks!

Best,

-- 
Marcin Borkowski


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski
  2016-12-08 18:21 ` Carlos Konstanski
  2016-12-08 22:12 ` Stefan Monnier
@ 2017-01-28  8:15 ` Kendall Shaw
  2017-07-03  4:56 ` Marcin Borkowski
  3 siblings, 0 replies; 25+ messages in thread
From: Kendall Shaw @ 2017-01-28  8:15 UTC (permalink / raw)
  To: help-gnu-emacs

There is a variable tex--prettify-symbols-alist that maps some TeX
symbols to code points. I think you can use the function
set-buffer-file-coding-system to cause any file the buffer is saved to
to be in utf-8, then use characters from tex--prettify-symbols-alist.
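
A small sketch of such a lookup, assuming tex-mode.el defines
tex--prettify-symbols-alist as an alist of (MACRO . CHARACTER) pairs.
The double dash marks the variable as internal, so it may change
between Emacs versions; the function name is hypothetical.

```elisp
(require 'tex-mode)  ; defines `tex--prettify-symbols-alist'

(defun my-tex-symbol-to-char (macro)
  "Return the character that `tex--prettify-symbols-alist' lists for MACRO.
MACRO is a string naming a TeX macro, including its leading backslash.
Return nil when MACRO is not in the table.  The function name is
hypothetical."
  (cdr (assoc macro tex--prettify-symbols-alist)))

;; Example call; the backslash is doubled because this is a Lisp string.
(my-tex-symbol-to-char "\\alpha")
```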


Kendall


On 12/08/2016 09:04 AM, Marcin Borkowski wrote:
> Hi all,
>
> I have a string with embedded sequences like "\'e" or "\H{o}".  The
> Emacs TeX input method knows how to convert them into "é" or "ő" (when
> typing, of course).  Is there a way to use that to perform similar
> conversions in a string?
>
> TIA,
>





* Re: How to translate LaTeX into UTF-8 in Elisp?
  2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski
                   ` (2 preceding siblings ...)
  2017-01-28  8:15 ` Kendall Shaw
@ 2017-07-03  4:56 ` Marcin Borkowski
  2017-07-03  5:43   ` Emanuel Berg
                     ` (2 more replies)
  3 siblings, 3 replies; 25+ messages in thread
From: Marcin Borkowski @ 2017-07-03  4:56 UTC (permalink / raw)
  To: Help Gnu Emacs mailing list

On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote:

> Hi all,
>
> I have a string with embedded sequences like "\'e" or "\H{o}".  The
> Emacs TeX input method knows how to convert them into "é" or "ő" (when
> typing, of course).  Is there a way to use that to perform similar
> conversions in a string?

Hi all,

I'm revisiting this old thread now.  Since I got no satisfying answers
back then, here is my plan for a solution.  First I'm going to map \', \`
etc. onto /names/ (this is a rather short list!), construct the Unicode
name of the character I want and then look it up in =ucs-names=.

For instance, \' maps to "ACUTE", then \'a will map to "LATIN SMALL
LETTER A ACUTE" and this can be fed into =char-from-name=.

It is a horrible hack, but it should work.  Any better ideas?

Best,

--
Marcin Borkowski


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  4:56 ` Marcin Borkowski
@ 2017-07-03  5:43   ` Emanuel Berg
  2017-07-03  9:16     ` Marcin Borkowski
  2017-07-03  8:37   ` Teemu Likonen
  2017-07-04 11:18   ` Joost Kremers
  2 siblings, 1 reply; 25+ messages in thread
From: Emanuel Berg @ 2017-07-03  5:43 UTC (permalink / raw)
  To: help-gnu-emacs

Marcin Borkowski <mbork@mbork.pl> writes:

> I'm revisiting this old thread now. Since I got no
> satisfying answers back then, here is my plan for
> solution. I'm going first to map \', \` etc.
> onto /names/ (this is a rather short list!),
> construct a Unicode name of the character I want and
> then use =ucs-names=.
>
> For instance, \' maps to "ACUTE", then \'a will map
> to "LATIN SMALL LETTER A ACUTE" and this can be fed
> into =char-from-name=.
>
> It is a horrible hack

On the contrary. Add another layer of abstraction.

If you set up the names consistently it is even
a good-looking solution.

-- 
underground experts united
http://user.it.uu.se/~embe8573





* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  4:56 ` Marcin Borkowski
  2017-07-03  5:43   ` Emanuel Berg
@ 2017-07-03  8:37   ` Teemu Likonen
  2017-07-04  5:57     ` Marcin Borkowski
  2017-07-04 11:18   ` Joost Kremers
  2 siblings, 1 reply; 25+ messages in thread
From: Teemu Likonen @ 2017-07-03  8:37 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list

Marcin Borkowski [2017-07-03 06:56:36+02] wrote:

> On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote:
>> I have a string with embedded sequences like "\'e" or "\H{o}".  The
>> Emacs TeX input method knows how to convert them into "é" or "ő" (when
>> typing, of course).  Is there a way to use that to perform similar
>> conversions in a string?

> I'm revisiting this old thread now.

> It is a horrible hack, but it should work.  Any better ideas?

I would filter buffer's content through recode command.

    [Highlight a region.]

    C-u M-x shell-command-on-region RET recode tex.. RET

You wanted to do this for a string so we can write a function that uses
a temporary buffer and returns its content as a string. Here is a quick
example:

    (defun convert-from-latex (string)
      (with-temp-buffer
        (insert string)
        (call-process-region (point-min) (point-max)
                             "recode" t t nil "tex..")
        (buffer-substring-no-properties (point-min) (point-max))))
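
A hedged variant of the function above that checks whether recode is
installed before calling it.  The name my-recode-from-latex is
hypothetical, and the sketch assumes that "recode tex.." writes its
result in an encoding Emacs decodes correctly by default.

```elisp
(defun my-recode-from-latex (string)
  "Return STRING filtered through \"recode tex..\", or nil if unavailable.
Hypothetical variant of `convert-from-latex': it returns nil when the
recode program is not found or exits with a non-zero status."
  (when (executable-find "recode")
    (with-temp-buffer
      (insert string)
      ;; DELETE and BUFFER are both t, so the output replaces the input.
      (when (eq 0 (call-process-region (point-min) (point-max)
                                       "recode" t t nil "tex.."))
        (buffer-substring-no-properties (point-min) (point-max))))))
```

Returning nil lets the caller fall back to a pure Elisp conversion on
machines without recode.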

-- 
/// Teemu Likonen   - .-..   <https://keybase.io/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  5:43   ` Emanuel Berg
@ 2017-07-03  9:16     ` Marcin Borkowski
  2017-07-03  9:31       ` tomas
  2017-07-03 10:24       ` Emanuel Berg
  0 siblings, 2 replies; 25+ messages in thread
From: Marcin Borkowski @ 2017-07-03  9:16 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: help-gnu-emacs


On 2017-07-03, at 07:43, Emanuel Berg <moasen@zoho.com> wrote:

> Marcin Borkowski <mbork@mbork.pl> writes:
>
>> I'm revisiting this old thread now. Since I got no
>> satisfying answers back then, here is my plan for
>> solution. I'm going first to map \', \` etc.
>> onto /names/ (this is a rather short list!),
>> construct a Unicode name of the character I want and
>> then use =ucs-names=.
>>
>> For instance, \' maps to "ACUTE", then \'a will map
>> to "LATIN SMALL LETTER A ACUTE" and this can be fed
>> into =char-from-name=.
>>
>> It is a horrible hack
>
> On the contrary. Add another layer of abstraction.
>
> If you setup the names consistently it is even
> a good-looking solution.

I'm not sure whether I follow you here.  Why should *I* set up the names?
They are in ucs-names (as I said), and they are official Unicode names.

It is still a hack, since it relies on the Unicode names being correct.
Have you seen this?

https://codepoints.net/U+FE18?lang=en

Notice the typo in the name.  It's in the standard (somehow it slipped
through;-)), so the typo is there forever (or rather, for as long as
Unicode is going to be around).

Best,

--
Marcin Borkowski




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  9:16     ` Marcin Borkowski
@ 2017-07-03  9:31       ` tomas
  2017-07-04  5:55         ` Marcin Borkowski
  2017-07-03 10:24       ` Emanuel Berg
  1 sibling, 1 reply; 25+ messages in thread
From: tomas @ 2017-07-03  9:31 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: help-gnu-emacs, Emanuel Berg

On Mon, Jul 03, 2017 at 11:16:12AM +0200, Marcin Borkowski wrote:
> 
> On 2017-07-03, at 07:43, Emanuel Berg <moasen@zoho.com> wrote:
> 
> > Marcin Borkowski <mbork@mbork.pl> writes:
> >
> >> I'm revisiting this old thread now. Since I got no
> >> satisfying answers back then, here is my plan for
> >> solution [...]

> > If you setup the names consistently it is even
> > a good-looking solution.

[...]
 
> I'm not sure whether I follow you here.  Why should *I* setup the names?
> They are in ucs-names (as I said), and they are official Unicode names.
> 
> It is still a hack, since it relies on the Unicode names being correct.
> Have you seen this?
> 
> https://codepoints.net/U+FE18?lang=en
> 
> Notice the typo in the name.  It's in the standard (somehow it slipped
> through;-)), so the typo is there forever (or rather, for as long as
> Unicode is going to be around).

Yes: they even listed a "correction". Unicode (and Emacs) seem to have
a mechanism in place for a code point to have more than one name (I
don't know whether there can be more than two, though). So perhaps the
only thing to cope with is that the mapping name -> code point isn't
injective (whatever that means for your approach)?

Cheers
-- tomás

* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  9:16     ` Marcin Borkowski
  2017-07-03  9:31       ` tomas
@ 2017-07-03 10:24       ` Emanuel Berg
  2017-07-03 17:36         ` Marcin Borkowski
  1 sibling, 1 reply; 25+ messages in thread
From: Emanuel Berg @ 2017-07-03 10:24 UTC (permalink / raw)
  To: help-gnu-emacs

Marcin Borkowski wrote:

> It is still a hack, since it relies on the
> Unicode names being correct.

If it relied on the names being *in*correct,
that would make it a hack in the
negative sense.

-- 
underground experts united
http://user.it.uu.se/~embe8573





* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03 10:24       ` Emanuel Berg
@ 2017-07-03 17:36         ` Marcin Borkowski
  2017-07-03 20:01           ` Emanuel Berg
  2017-07-04 10:23           ` Héctor Lahoz
  0 siblings, 2 replies; 25+ messages in thread
From: Marcin Borkowski @ 2017-07-03 17:36 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: help-gnu-emacs


On 2017-07-03, at 12:24, Emanuel Berg <moasen@zoho.com> wrote:

> Marcin Borkowski wrote:
>
>> It is still a hack, since it relies on the
>> Unicode names being correct.
>
> If it relied on the names being *in*correct,
> that would make it a hack in the
> negative sense.

OK, so here is a proof of concept:

--8<---------------cut here---------------start------------->8---
(defvar TeX-to-Unicode-accents-alist
  '((?` . "grave")
    (?' . "acute")
    (?^ . "circumflex")
    (?\" . "diaeresis")
    (?H . "double acute")
    (?~ . "tilde")
    (?c . "with cedilla")
    (?k . "ogonek")
    (?= . "macron")
    (?. . "with dot above")
    (?u . "with breve")
    (?v . "with caron"))
  "A mapping from TeX control characters to accent names used in
Unicode.")

(defun combine-letter-diacritical-mark (letter mark)
  "Return the Unicode character for LETTER combined with MARK.
LETTER is a character or a one-character string.  MARK can be any
character that can be used in TeX accenting commands.  Return nil
if no such character is found."
  (let* ((letter (if (stringp letter)
                     (string-to-char letter)
                   letter))
         (uppercase (= letter
                       (upcase letter))))
    ;; `char-from-name' consults the table returned by `(ucs-names)';
    ;; the non-nil second argument makes the lookup ignore case.
    (char-from-name
     (format "LATIN %s LETTER %c %s"
             (if uppercase "CAPITAL" "SMALL")
             letter
             (cdr (assoc mark TeX-to-Unicode-accents-alist)))
     t)))
--8<---------------cut here---------------end--------------->8---

As you can see from the mess in `TeX-to-Unicode-accents-alist', this
_is_ a hack.  Still, it seems to work more or less fine.
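
A minimal sketch of how the same idea could be applied to a whole
string, assuming Emacs 26.1 or later (for `char-from-name').  All the
my-... names are hypothetical, and the table uses the official
"WITH ..." form of the Unicode names throughout.

```elisp
(defvar my-tex-accent-names
  '((?` . "GRAVE") (?' . "ACUTE") (?^ . "CIRCUMFLEX") (?\" . "DIAERESIS")
    (?H . "DOUBLE ACUTE") (?~ . "TILDE") (?c . "CEDILLA") (?k . "OGONEK")
    (?= . "MACRON") (?. . "DOT ABOVE") (?u . "BREVE") (?v . "CARON"))
  "Hypothetical table: TeX accent character to the tail of a Unicode name.")

(defun my-tex-accent-char (letter mark)
  "Return the precomposed character for LETTER under TeX accent MARK, or nil."
  (let ((accent (cdr (assq mark my-tex-accent-names))))
    (and accent
         (char-from-name
          (format "LATIN %s LETTER %c WITH %s"
                  (if (= letter (upcase letter)) "CAPITAL" "SMALL")
                  (upcase letter)
                  accent)))))

(defun my-tex-accents-to-unicode (string)
  "Replace TeX accent sequences in STRING with precomposed characters.
Accents named by a letter (H, c, k, u, v) are converted only when the
base letter is in braces or separated by a space.  Sequences without a
precomposed equivalent are left unchanged."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     ;; Groups: 1 accent, 2 braced letter, 3 optional spaces, 4 bare letter.
     "\\\\\\([`'^\"H~ckuv=.]\\)\\(?:{\\([A-Za-z]\\)}\\|\\( *\\)\\([A-Za-z]\\)\\)"
     (lambda (match)
       (let* ((mark (string-to-char (match-string 1 match)))
              (braced (match-string 2 match))
              (spaces (match-string 3 match))
              (letter (string-to-char (or braced (match-string 4 match))))
              (char (and (or braced
                             (not (memq mark '(?H ?c ?k ?u ?v)))
                             (> (length spaces) 0))
                         (my-tex-accent-char letter mark))))
         ;; Keep the original text when no character was found.
         (if char (string char) match)))
     string t t)))

;; Example calls; backslashes are doubled because these are Lisp strings.
(my-tex-accents-to-unicode "caf\\'e")
(my-tex-accents-to-unicode "Erd\\H{o}s")
```

The space-or-braces rule keeps a macro such as \create from being read
as \c followed by r.  Dotless \i and nested braces are not handled.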

Best,

-- 
Marcin Borkowski




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03 17:36         ` Marcin Borkowski
@ 2017-07-03 20:01           ` Emanuel Berg
  2017-07-04 10:23           ` Héctor Lahoz
  1 sibling, 0 replies; 25+ messages in thread
From: Emanuel Berg @ 2017-07-03 20:01 UTC (permalink / raw)
  To: help-gnu-emacs

Marcin Borkowski wrote:

> As you can see from the mess in
> `TeX-to-Unicode-accents-alist', this _is_
> a hack. Still, it seems to work more or
> less fine.

It is your background as a humble mathematician
that plays a trick on you. Here, although most
definitely helped by your math training, you
are a practical engineer, and engineering is
never perfect in the math sense; that doesn't
make it a hack. But whatever lets you sleep at
night :)

-- 
underground experts united
http://user.it.uu.se/~embe8573


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  9:31       ` tomas
@ 2017-07-04  5:55         ` Marcin Borkowski
  0 siblings, 0 replies; 25+ messages in thread
From: Marcin Borkowski @ 2017-07-04  5:55 UTC (permalink / raw)
  To: tomas; +Cc: help-gnu-emacs, Emanuel Berg


On 2017-07-03, at 11:31, tomas@tuxteam.de wrote:

> On Mon, Jul 03, 2017 at 11:16:12AM +0200, Marcin Borkowski wrote:
>> 
>> On 2017-07-03, at 07:43, Emanuel Berg <moasen@zoho.com> wrote:
>> 
>> > Marcin Borkowski <mbork@mbork.pl> writes:
>> >
>> >> I'm revisiting this old thread now. Since I got no
>> >> satisfying answers back then, here is my plan for
>> >> solution [...]
>
>> > If you setup the names consistently it is even
>> > a good-looking solution.
>
> [...]
>  
>> I'm not sure whether I follow you here.  Why should *I* setup the names?
>> They are in ucs-names (as I said), and they are official Unicode names.
>> 
>> It is still a hack, since it relies on the Unicode names being correct.
>> Have you seen this?
>> 
>> https://codepoints.net/U+FE18?lang=en
>> 
>> Notice the typo in the name.  It's in the standard (somehow it slipped
>> through;-)), so the typo is there forever (or rather, for as long as
>> Unicode is going to be around).
>
> Yes: they even listed a "correction". Unicode (and Emacs) seem to have
> a mechanism in place for a code point to have more than one name (I
> don't know whether there can be more than two, though). So perhaps the
> only thing to cope with is that the mapping name -> code point isn't
> injective (whatever that means for your approach)?

Interesting.  The "other" name doesn't appear in (ucs-names), though, so
it's not useful for me here.  In any case, I am aware of no such typos
in accented letters anyway;-).

Best,

-- 
Marcin Borkowski




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  8:37   ` Teemu Likonen
@ 2017-07-04  5:57     ` Marcin Borkowski
  2017-07-04  7:13       ` Udyant Wig
  0 siblings, 1 reply; 25+ messages in thread
From: Marcin Borkowski @ 2017-07-04  5:57 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: Help Gnu Emacs mailing list


On 2017-07-03, at 10:37, Teemu Likonen <tlikonen@iki.fi> wrote:

> Marcin Borkowski [2017-07-03 06:56:36+02] wrote:
>
>> On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote:
>>> I have a string with embedded sequences like "\'e" or "\H{o}".  The
>>> Emacs TeX input method knows how to convert them into "é" or "ő" (when
>>> typing, of course).  Is there a way to use that to perform similar
>>> conversions in a string?
>
>> I'm revisiting this old thread now.
>
>> It is a horrible hack, but it should work.  Any better ideas?
>
> I would filter buffer's content through recode command.
>
>     [Highlight a region.]
>
>     C-u M-x shell-command-on-region RET recode tex.. RET
>
> You wanted to do this for a string so we can write a function that uses
> a temporary buffer and returns its content as a string. Here is a quick
> example:
>
>     (defun convert-from-latex (string)
>       (with-temp-buffer
>         (insert string)
>         (call-process-region (point-min) (point-max)
>                              "recode" t t nil "tex..")
>         (buffer-substring-no-properties (point-min) (point-max))))

Thanks, I didn't know about recode.  But it doesn't work all that well:
c{\c c}c does not remove braces, for instance, and what's even worse, it
apparently doesn't know about \k.

But thanks anyway, this is a good thing to remember, even though in my
case I perceive it as even more hackish than my approach (I'd prefer
Emacs to do the job, not an external utility).

Best,

-- 
Marcin Borkowski




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-04  5:57     ` Marcin Borkowski
@ 2017-07-04  7:13       ` Udyant Wig
  2017-07-04  9:27         ` Thien-Thi Nguyen
  0 siblings, 1 reply; 25+ messages in thread
From: Udyant Wig @ 2017-07-04  7:13 UTC (permalink / raw)
  To: help-gnu-emacs

This is an interesting point, which I think is worth some thought.

On 07/04/2017 11:27 AM, Marcin Borkowski wrote:
> (I'd prefer Emacs to do the job, not an external utility).

Would you say this ought to hold in general?  For instance, both find(1)
and grep(1) are external to Emacs, but have such good integration with
it that they may as well be native to it.  So also the package Magit
which makes git(1) seem largely part of Emacs.

However, I can see cases where your point is apt.  If, say, one wanted
to factor numbers, a workable (but horrifying(?))  solution is to call
factor(1) in a subprocess and hand-hack the output.

I think that as long as the layer between the given tool outside and
Emacs is good, it may not matter that the work is obtained from an
outsider.

But what do you think?

Udyant Wig
-- 
... while the ways of art are hard at the best, they will break you if
you go unsustained by belief in what you are trying to do.
                                -- Arthur Quiller-Couch


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-04  7:13       ` Udyant Wig
@ 2017-07-04  9:27         ` Thien-Thi Nguyen
  2017-07-04 20:37           ` Emanuel Berg
  2017-07-05  7:05           ` Udyant Wig
  0 siblings, 2 replies; 25+ messages in thread
From: Thien-Thi Nguyen @ 2017-07-04  9:27 UTC (permalink / raw)
  To: help-gnu-emacs


() Udyant Wig <udyant.wig@gmail.com>
() Tue, 4 Jul 2017 12:43:34 +0530

   I think that as long as the layer between the given tool
   outside and Emacs is good

Could you explain what you mean by "good", here?

-- 
Thien-Thi Nguyen -----------------------------------------------
 (defun responsep (query)
   (pcase (context query)
     (`(technical ,ml) (correctp ml))
     ...))                              748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502



* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03 17:36         ` Marcin Borkowski
  2017-07-03 20:01           ` Emanuel Berg
@ 2017-07-04 10:23           ` Héctor Lahoz
  1 sibling, 0 replies; 25+ messages in thread
From: Héctor Lahoz @ 2017-07-04 10:23 UTC (permalink / raw)
  To: help-gnu-emacs

Marcin Borkowski wrote:
> OK, so here is a proof of concept:
> 
> --8<---------------cut here---------------start------------->8---
> (defvar TeX-to-Unicode-accents-alist
>   '((?` . "grave")
>     (?' . "acute")
>     (?^ . "circumflex")
>     (?\" . "diaeresis")
>     (?H . "double acute")
>     (?~ . "tilde")
>     (?c . "with cedilla")
>     (?k . "ogonek")
>     (?= . "macron")
>     (?. . "with dot above")
>     (?u . "with breve")
>     (?v . "with caron"))
>   "A mapping from TeX control characters to accent names used in
> Unicode.")
> 
> (defun combine-letter-diacritical-mark (letter mark)
>   "Return a Unicode string of LETTER combined with MARK.
> MARK can be any character that can be used in TeX accenting
> commands."
>   (let* ((letter (if (stringp letter)
>                      (string-to-char letter)
>                    letter))
>          (uppercase (= letter
>                        (upcase letter))))
>     (cdr (assoc-string
>           (format "LATIN %s LETTER %c %s"
>                   (if uppercase "CAPITAL" "SMALL")
>                   letter
>                   (cdr (assoc mark TeX-to-Unicode-accents-alist)))
>           ucs-names
>           t))))
> --8<---------------cut here---------------end--------------->8---
> 

Great.

Perhaps you could consider translating to Unicode combining characters.
I think it is closer to the original TeX idea and could be cleaner:

0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
0302;COMBINING CIRCUMFLEX ACCENT;Mn;230;NSM;;;;;N;NON-SPACING CIRCUMFLEX;;;;
0303;COMBINING TILDE;Mn;230;NSM;;;;;N;NON-SPACING TILDE;;;;
0304;COMBINING MACRON;Mn;230;NSM;;;;;N;NON-SPACING MACRON;;;;
0305;COMBINING OVERLINE;Mn;230;NSM;;;;;N;NON-SPACING OVERSCORE;;;;
0306;COMBINING BREVE;Mn;230;NSM;;;;;N;NON-SPACING BREVE;;;;
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
0308;COMBINING DIAERESIS;Mn;230;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
0309;COMBINING HOOK ABOVE;Mn;230;NSM;;;;;N;NON-SPACING HOOK ABOVE;;;;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
030B;COMBINING DOUBLE ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING DOUBLE ACUTE;;;;
030C;COMBINING CARON;Mn;230;NSM;;;;;N;NON-SPACING HACEK;;;;
030D;COMBINING VERTICAL LINE ABOVE;Mn;230;NSM;;;;;N;NON-SPACING VERTICAL LINE ABOVE;;;;

See the Wikipedia article on Unicode equivalence:
https://en.wikipedia.org/wiki/Unicode_equivalence

The difference is that Unicode reverses the order: first comes the
base character, then all combining characters. For example, \'a would
be translated to either

00E1;LATIN SMALL LETTER A WITH ACUTE

or

0061;LATIN SMALL LETTER A
0301;COMBINING ACUTE ACCENT

I don't know the implications of using Unicode combining characters.
I guess the choice depends on the purpose of the output.
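
A minimal sketch of the combining-character route, assuming the
ucs-normalize library bundled with Emacs.  The table and function
names are hypothetical; the code points are taken from the list above.
Normalizing to NFC yields the precomposed character whenever Unicode
defines one, which covers both forms shown for \'a.

```elisp
(require 'ucs-normalize)  ; provides `ucs-normalize-NFC-string'

(defvar my-tex-combining-marks
  '((?` . #x0300) (?' . #x0301) (?^ . #x0302) (?~ . #x0303)
    (?= . #x0304) (?u . #x0306) (?. . #x0307) (?\" . #x0308)
    (?H . #x030B) (?v . #x030C))
  "Hypothetical table: TeX accent character to a combining code point.")

(defun my-tex-combine (letter mark)
  "Return LETTER followed by the combining character for TeX accent MARK.
The result is a string in NFC form, so a precomposed character is
used when Unicode defines one.  Return nil for an unknown MARK."
  (let ((combining (cdr (assq mark my-tex-combining-marks))))
    (and combining
         ;; Base character first, combining mark second, then compose.
         (ucs-normalize-NFC-string (string letter combining)))))

;; Example calls.
(my-tex-combine ?a ?')   ; one precomposed character, U+00E1
(my-tex-combine ?q ?')   ; no precomposed form, so two characters remain
```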




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-03  4:56 ` Marcin Borkowski
  2017-07-03  5:43   ` Emanuel Berg
  2017-07-03  8:37   ` Teemu Likonen
@ 2017-07-04 11:18   ` Joost Kremers
  2 siblings, 0 replies; 25+ messages in thread
From: Joost Kremers @ 2017-07-04 11:18 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list


On Mon, Jul 03 2017, Marcin Borkowski wrote:
> On 2016-12-08, at 18:04, Marcin Borkowski <mbork@mbork.pl> wrote:
>
>> Hi all,
>>
>> I have a string with embedded sequences like "\'e" or "\H{o}".  The
>> Emacs TeX input method knows how to convert them into "é" or "ő" (when
>> typing, of course).  Is there a way to use that to perform similar
>> conversions in a string?
>
> Hi all,
>
> I'm revisiting this old thread now.  Since I got no satisfying answers
> back then, here is my plan for solution.  I'm going first to map \', \`
> etc. onto /names/ (this is a rather short list!), construct a Unicode
> name of the character I want and then use =ucs-names=.
>
> For instance, \' maps to "ACUTE", then \'a will map to "LATIN SMALL
> LETTER A ACUTE" and this can be fed into =char-from-name=.
>
> It is a horrible hack, but it should work.  Any better ideas?

Have you tried looking into the input method mechanism that
translates "\'e" into "é"? The info on how to do such translations is
stored somewhere somehow, so it should in principle be possible to
use it to do the translations you want. I don't know very much
about quail (I define a few custom input methods in my init files,
but that's about it), but it looks like the info is stored as an
alist somewhere of the form ("\\'e" ?é).

Perhaps you've already explored this idea and found it too 
unwieldy, but you didn't mention it anywhere. 



-- 
Joost Kremers
Life has its moments




* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-04  9:27         ` Thien-Thi Nguyen
@ 2017-07-04 20:37           ` Emanuel Berg
  2017-07-05  7:05           ` Udyant Wig
  1 sibling, 0 replies; 25+ messages in thread
From: Emanuel Berg @ 2017-07-04 20:37 UTC (permalink / raw)
  To: help-gnu-emacs

Thien-Thi Nguyen wrote:

>> I think that as long as the layer between
>> the given tool outside and Emacs is good
>
> Could you explain what you mean by
> "good", here?

Probably he meant a clean interface which is
easy to understand and operate and does not
require tons of hacking to get the cord onto
the coil on the other side.

-- 
underground experts united
http://user.it.uu.se/~embe8573





* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-04  9:27         ` Thien-Thi Nguyen
  2017-07-04 20:37           ` Emanuel Berg
@ 2017-07-05  7:05           ` Udyant Wig
  2017-07-05 16:06             ` Emanuel Berg
  2017-07-13 17:45             ` Thien-Thi Nguyen
  1 sibling, 2 replies; 25+ messages in thread
From: Udyant Wig @ 2017-07-05  7:05 UTC (permalink / raw)
  To: help-gnu-emacs

On 07/04/2017 02:57 PM, Thien-Thi Nguyen wrote:
> 
> () Udyant Wig <udyant.wig@gmail.com>
> () Tue, 4 Jul 2017 12:43:34 +0530
> 
>    I think that as long as the layer between the given tool
>    outside and Emacs is good
> 
> Could you explain what you mean by "good", here?

By 'good', I meant that, at least for some definite core functionality
of the tool outside, the Emacs layer which interacts with it presents a
clean interface to the user, integrated well with the rest of Emacs; one
can then work as though the tool (or the above mentioned definite core
functionality of it) were an indistinguishable part of Emacs.

Of course, a great layer may also go beyond the basics and offer an
enhanced experience within Emacs.  I have mentioned Magit, which
provides a very nice way to work with git.  The dictionary.el package is
another I use.

Did this help?

-- 
... while the ways of art are hard at the best, they will break you if
you go unsustained by belief in what you are trying to do.
                                -- Arthur Quiller-Couch


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-05  7:05           ` Udyant Wig
@ 2017-07-05 16:06             ` Emanuel Berg
  2017-07-13 17:45             ` Thien-Thi Nguyen
  1 sibling, 0 replies; 25+ messages in thread
From: Emanuel Berg @ 2017-07-05 16:06 UTC (permalink / raw)
  To: help-gnu-emacs

Udyant Wig wrote:

> By 'good', I meant that, at least for some
> definite core functionality of the tool
> outside, the Emacs layer which interacts with
> it presents a clean interface to the user,
> integrated well with the rest of Emacs; one
> can then work as though the tool (or the
> above mentioned definite core functionality
> of it) were an indistinguishable part
> of Emacs.
>
> Of course, a great layer may also go beyond the
> basics and offer an enhanced experience
> within Emacs. I have mentioned Magit, which
> provides a very nice way to work with git.
> The dictionary.el package is another I use.
>
> Did this help?

Indeed, TTN understood all this well enough,
so the question is rather: what did *he* really
mean by his question?

-- 
underground experts united
http://user.it.uu.se/~embe8573





* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-05  7:05           ` Udyant Wig
  2017-07-05 16:06             ` Emanuel Berg
@ 2017-07-13 17:45             ` Thien-Thi Nguyen
  2017-07-14  1:48               ` Udyant Wig
  1 sibling, 1 reply; 25+ messages in thread
From: Thien-Thi Nguyen @ 2017-07-13 17:45 UTC (permalink / raw)
  To: help-gnu-emacs

() Udyant Wig <udyant.wig@gmail.com>
() Wed, 5 Jul 2017 12:35:35 +0530

   By 'good', I meant that [...] clean interface to the user,
   integrated well with the rest of Emacs; one can then work as
   though the tool (or the above-mentioned definite core
   functionality of it) were an indistinguishable part of Emacs.

   [...]

   Did this help?

Yes, thanks.  Of late, I wonder a lot about how other people
(programmers and non-programmers) perceive and define "good",
and how those perceptions and definitions evolve (or not) over
time.  I'm happy to say your words make sense to me.

-- 
Thien-Thi Nguyen -----------------------------------------------
 (defun responsep (query)
   (pcase (context query)
     (`(technical ,ml) (correctp ml))
     ...))                              748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502


* Re: How to translate LaTeX into UTF-8 in Elisp?
  2017-07-13 17:45             ` Thien-Thi Nguyen
@ 2017-07-14  1:48               ` Udyant Wig
  0 siblings, 0 replies; 25+ messages in thread
From: Udyant Wig @ 2017-07-14  1:48 UTC (permalink / raw)
  To: help-gnu-emacs


On 07/13/2017 11:15 PM, Thien-Thi Nguyen wrote:
> Yes, thanks.  Of late, I wonder a lot about how other people
> (programmers and non-programmers) perceive and define "good", and how
> those perceptions and definitions evolve (or not) over time.  I'm
> happy to say your words make sense to me.

I'm glad you found them helpful.  In the belief that there can be no
/final/ definition of 'good' software, I link the following.

<URL:https://www.eskimo.com/~scs/readings/software_elegance.html>

These are the words of a long-time C programmer, who is also the
maintainer of the C FAQ.

-- 
... while the ways of art are hard at the best, they will break you if
you go unsustained by belief in what you are trying to do.
                                -- Arthur Quiller-Couch



Thread overview: 25+ messages
2016-12-08 17:04 How to translate LaTeX into UTF-8 in Elisp? Marcin Borkowski
2016-12-08 18:21 ` Carlos Konstanski
2016-12-08 19:13   ` Marcin Borkowski
2016-12-08 22:12 ` Stefan Monnier
2017-01-27 11:48   ` Marcin Borkowski
2017-01-28  8:15 ` Kendall Shaw
2017-07-03  4:56 ` Marcin Borkowski
2017-07-03  5:43   ` Emanuel Berg
2017-07-03  9:16     ` Marcin Borkowski
2017-07-03  9:31       ` tomas
2017-07-04  5:55         ` Marcin Borkowski
2017-07-03 10:24       ` Emanuel Berg
2017-07-03 17:36         ` Marcin Borkowski
2017-07-03 20:01           ` Emanuel Berg
2017-07-04 10:23           ` Héctor Lahoz
2017-07-03  8:37   ` Teemu Likonen
2017-07-04  5:57     ` Marcin Borkowski
2017-07-04  7:13       ` Udyant Wig
2017-07-04  9:27         ` Thien-Thi Nguyen
2017-07-04 20:37           ` Emanuel Berg
2017-07-05  7:05           ` Udyant Wig
2017-07-05 16:06             ` Emanuel Berg
2017-07-13 17:45             ` Thien-Thi Nguyen
2017-07-14  1:48               ` Udyant Wig
2017-07-04 11:18   ` Joost Kremers
