From: "Héctor Lahoz" <hectorlahoz@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: How to translate LaTeX into UTF-8 in Elisp?
Date: Tue, 4 Jul 2017 12:23:48 +0200
Message-ID: <20170704102348.GA3579@workstation>
In-Reply-To: <87tw2t1kyf.fsf@jane>
Marcin Borkowski wrote:
> OK, so here is a proof of concept:
>
> --8<---------------cut here---------------start------------->8---
> (defvar TeX-to-Unicode-accents-alist
>   '((?` . "grave")
>     (?' . "acute")
>     (?^ . "circumflex")
>     (?\" . "diaeresis")
>     (?H . "double acute")
>     (?~ . "tilde")
>     (?c . "with cedilla")
>     (?k . "ogonek")
>     (?= . "macron")
>     (?. . "with dot above")
>     (?u . "with breve")
>     (?v . "with caron"))
>   "A mapping from TeX control characters to accent names used in
> Unicode.")
>
> (defun combine-letter-diacritical-mark (letter mark)
>   "Return a Unicode string of LETTER combined with MARK.
> MARK can be any character that can be used in TeX accenting
> commands."
>   (let* ((letter (if (stringp letter)
>                      (string-to-char letter)
>                    letter))
>          (uppercase (= letter
>                        (upcase letter))))
>     (cdr (assoc-string
>           (format "LATIN %s LETTER %c %s"
>                   (if uppercase "CAPITAL" "SMALL")
>                   letter
>                   (cdr (assoc mark TeX-to-Unicode-accents-alist)))
>           ucs-names
>           t))))
> --8<---------------cut here---------------end--------------->8---
>
Great.
Perhaps you could consider translating to Unicode combining characters
instead. I think that is closer to the original TeX idea and could be
cleaner:
0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
0302;COMBINING CIRCUMFLEX ACCENT;Mn;230;NSM;;;;;N;NON-SPACING CIRCUMFLEX;;;;
0303;COMBINING TILDE;Mn;230;NSM;;;;;N;NON-SPACING TILDE;;;;
0304;COMBINING MACRON;Mn;230;NSM;;;;;N;NON-SPACING MACRON;;;;
0305;COMBINING OVERLINE;Mn;230;NSM;;;;;N;NON-SPACING OVERSCORE;;;;
0306;COMBINING BREVE;Mn;230;NSM;;;;;N;NON-SPACING BREVE;;;;
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
0308;COMBINING DIAERESIS;Mn;230;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
0309;COMBINING HOOK ABOVE;Mn;230;NSM;;;;;N;NON-SPACING HOOK ABOVE;;;;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
030B;COMBINING DOUBLE ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING DOUBLE ACUTE;;;;
030C;COMBINING CARON;Mn;230;NSM;;;;;N;NON-SPACING HACEK;;;;
030D;COMBINING VERTICAL LINE ABOVE;Mn;230;NSM;;;;;N;NON-SPACING VERTICAL LINE ABOVE;;;;
See the Wikipedia article on Unicode equivalence:
https://en.wikipedia.org/wiki/Unicode_equivalence
The difference is that Unicode reverses the order: the base character
comes first, followed by all of its combining characters. For example,
\'a would be translated to either
00E1;LATIN SMALL LETTER A WITH ACUTE
or
0061;LATIN SMALL LETTER A
0301;COMBINING ACUTE ACCENT
I don't know the implications of using Unicode combining characters.
I guess the choice depends on the purpose of the output.
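To make the suggestion concrete, here is a rough sketch of the
combining-character approach. It is only an illustration: the names
my-tex-accent-combining-alist, my-tex-accent-to-combining and
my-tex-accent-to-precomposed are made up for this example and do not
come from the code quoted above. The cedilla and ogonek code points
(U+0327, U+0328) lie outside the range listed earlier. The second
function assumes that the ucs-normalize library shipped with Emacs is
available and uses it to turn the combining sequence back into a
precomposed character where one exists.

```elisp
;; Sketch only: all `my-' names are hypothetical, chosen for this example.
(require 'ucs-normalize)

(defvar my-tex-accent-combining-alist
  '((?`  . #x0300)   ; COMBINING GRAVE ACCENT
    (?'  . #x0301)   ; COMBINING ACUTE ACCENT
    (?^  . #x0302)   ; COMBINING CIRCUMFLEX ACCENT
    (?~  . #x0303)   ; COMBINING TILDE
    (?=  . #x0304)   ; COMBINING MACRON
    (?u  . #x0306)   ; COMBINING BREVE
    (?.  . #x0307)   ; COMBINING DOT ABOVE
    (?\" . #x0308)   ; COMBINING DIAERESIS
    (?H  . #x030B)   ; COMBINING DOUBLE ACUTE ACCENT
    (?v  . #x030C)   ; COMBINING CARON
    (?c  . #x0327)   ; COMBINING CEDILLA (not in the list above)
    (?k  . #x0328))  ; COMBINING OGONEK (not in the list above)
  "Map TeX accent control characters to Unicode combining characters.")

(defun my-tex-accent-to-combining (letter mark)
  "Return LETTER followed by the combining character for TeX accent MARK.
LETTER may be a character or a one-character string.  Return nil if
MARK is not in `my-tex-accent-combining-alist'."
  (let ((letter (if (stringp letter) (string-to-char letter) letter))
        (combining (cdr (assq mark my-tex-accent-combining-alist))))
    (when combining
      ;; Unicode order: base character first, then the combining mark.
      (string letter combining))))

(defun my-tex-accent-to-precomposed (letter mark)
  "Like `my-tex-accent-to-combining', but normalize the result to NFC."
  (let ((decomposed (my-tex-accent-to-combining letter mark)))
    (when decomposed
      (ucs-normalize-NFC-string decomposed))))
```

With these definitions, (my-tex-accent-to-combining ?a ?') should give
the two-character sequence U+0061 U+0301, and
(my-tex-accent-to-precomposed ?a ?') the single character U+00E1, so
either output form mentioned above can be produced from the same table.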