From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: "Unidecode" functionality in Emacs
Date: Tue, 20 Mar 2018 08:20:22 +0200
Message-ID: <83r2ofmfo9.fsf@gnu.org>
To: help-gnu-emacs@gnu.org
In-reply-to: (message from John Mastro on Mon, 19 Mar 2018 15:04:29 -0700)

> From: John Mastro
> Date: Mon, 19 Mar 2018 15:04:29 -0700
>
> There are "Unidecode" packages for Perl[1], Python[2], and Emacs[3]
> (derived from one another in that order). They each transliterate
> Unicode text to ASCII, e.g.:
>
>     (unidecode "Déjà vu")
>     ;=> "Deja vu"
>     (unidecode "北亰")
>     ;=> "Bei Jing "
>
> Does Emacs have equivalent functionality built-in?

It's possible to remove accents (the first example) using the
functionality in ucs-normalize.el.

Some transliteration is possible for scripts for which there exists a
"transliteration" input method, using the code by Michael Welsh Duggan
posted here:

  http://lists.gnu.org/archive/html/emacs-devel/2018-02/msg00387.html

For example, you can transliterate Cyrillic text using the
cyrillic-translit input method that comes with Emacs.

But there are no general-purpose transliteration capabilities in
Emacs, AFAIK. However, it looks like the Perl package is just a huge
database of precomputed transliterations, in which case doing the same
in Emacs Lisp should be almost trivial.
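
To illustrate the two approaches mentioned above, here is a minimal sketch. It is written in Python rather than Emacs Lisp and is not the code of any of the Unidecode packages. The first function removes accents by canonical decomposition (NFD) followed by dropping combining marks, which is the same idea as using the normalization in ucs-normalize.el. The second does a per-character lookup in a table of precomputed transliterations. The names strip_accents, to_ascii and TRANSLIT_TABLE are hypothetical, and the two-entry table is only a stand-in for the large database a real package would carry.

```python
import unicodedata

# Hypothetical, tiny stand-in for a large table of precomputed
# transliterations; a real package would have many thousands of entries.
TRANSLIT_TABLE = {
    "北": "Bei ",
    "亰": "Jing ",
}


def strip_accents(text):
    """Remove accents: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii(text, table=TRANSLIT_TABLE, unknown="?"):
    """Transliterate TEXT to ASCII, one character at a time.

    Table entries win; otherwise fall back to accent stripping.
    Anything still non-ASCII is replaced by UNKNOWN.
    """
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
            continue
        stripped = strip_accents(ch)
        # An empty string (a lone combining mark) also counts as ASCII here.
        out.append(stripped if stripped.isascii() else unknown)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Déjà vu"))
    print(to_ascii("北亰"))
```

With this toy table, the two examples from the quoted message come out as "Deja vu" and "Bei Jing ". Characters that are neither in the table nor decomposable to ASCII (Cyrillic letters, for instance) come out as "?", which is where a full table or an input-method-based approach would be needed.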