From: Marcin Borkowski
Newsgroups: gmane.emacs.help
Subject: Re: How to translate LaTeX into UTF-8 in Elisp?
Date: Mon, 03 Jul 2017 19:36:40 +0200
Message-ID: <87tw2t1kyf.fsf@jane>
To: Emanuel Berg
Cc: help-gnu-emacs@gnu.org

On 2017-07-03, at 12:24, Emanuel Berg wrote:

> Marcin Borkowski wrote:
>
>> It is still a hack, since it relies on the
>> Unicode names being correct.
>
> If it relied on the names being *in*correct,
> that would make it a hack in the
> negative sense.

OK, so here is a proof of concept:

--8<---------------cut here---------------start------------->8---
(defvar TeX-to-Unicode-accents-alist
  '((?` . "grave")
    (?' . "acute")
    (?^ . "circumflex")
    (?\" . "diaeresis")
    (?H . "double acute")
    (?~ . "tilde")
    (?c . "with cedilla")
    (?k . "ogonek")
    (?= . "macron")
    (?. . "with dot above")
    (?u . "with breve")
    (?v . "with caron"))
  "A mapping from TeX control characters to accent names used in Unicode.")

(defun combine-letter-diacritical-mark (letter mark)
  "Return the Unicode character for LETTER combined with MARK.
LETTER is a character or a string whose first character is used.
MARK can be any character that can be used in TeX accenting
commands.  Return nil if no matching character name is found."
  (let* ((letter (if (stringp letter)
                     (string-to-char letter)
                   letter))
         (uppercase (= letter (upcase letter))))
    ;; Build a name such as "LATIN SMALL LETTER e acute" and look it
    ;; up in the table of Unicode names, which is treated as an alist
    ;; here.  Calling the function `ucs-names' makes sure the table
    ;; has been built.  The final t makes the lookup case-insensitive.
    (cdr (assoc-string
          (format "LATIN %s LETTER %c %s"
                  (if uppercase "CAPITAL" "SMALL")
                  letter
                  (cdr (assoc mark TeX-to-Unicode-accents-alist)))
          (ucs-names)
          t))))
--8<---------------cut here---------------end--------------->8---

As you can see from the mess in `TeX-to-Unicode-accents-alist' (some
entries need "with" and some do not), this _is_ a hack.  Still, it
seems to work more or less fine.

Best,

-- 
Marcin Borkowski
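
For comparison, here is a minimal sketch of the same name-based lookup
using only the current official Unicode names, which have the form
"LATIN SMALL LETTER E WITH ACUTE".  It is not the code from the post:
the names `my-tex-accent-names' and `my-tex-accent-to-char' are made up
for illustration, and it assumes an Emacs that provides
`char-from-name' (26.1 or later, as far as I know).

```elisp
;; Hypothetical table: TeX accent command character -> the accent part
;; of an official Unicode character name (the text after "WITH").
(defvar my-tex-accent-names
  '((?` . "GRAVE")
    (?' . "ACUTE")
    (?^ . "CIRCUMFLEX")
    (?\" . "DIAERESIS")
    (?H . "DOUBLE ACUTE")
    (?~ . "TILDE")
    (?c . "CEDILLA")
    (?k . "OGONEK")
    (?= . "MACRON")
    (?. . "DOT ABOVE")
    (?u . "BREVE")
    (?v . "CARON"))
  "Illustrative mapping from TeX accent characters to Unicode accent names.")

(defun my-tex-accent-to-char (letter mark)
  "Return the character for LETTER accented with TeX accent MARK, or nil.
LETTER is a character or a string whose first character is used.
This is a sketch that assumes `char-from-name' is available."
  (let* ((letter (if (stringp letter) (string-to-char letter) letter))
         (accent (cdr (assq mark my-tex-accent-names))))
    ;; Only attempt the lookup when MARK is a known accent command.
    (when accent
      (char-from-name
       (format "LATIN %s LETTER %c WITH %s"
               (if (= letter (upcase letter)) "CAPITAL" "SMALL")
               (upcase letter)
               accent)))))

(my-tex-accent-to-char ?e ?')   ; => 233, i.e. ?é
```

Since every entry uses the same "WITH ..." pattern, the table is
uniform, but the approach still depends on the Unicode names, so it
remains a hack in the sense discussed above.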