From: John Mastro
Newsgroups: gmane.emacs.help
Subject: Re: "Unidecode" functionality in Emacs
Date: Tue, 20 Mar 2018 10:23:03 -0700
To: Help Gnu Emacs mailing list
In-Reply-To: <83r2ofmfo9.fsf@gnu.org>
Eli Zaretskii wrote:
>> There are "Unidecode" packages for Perl[1], Python[2], and Emacs[3]
>> (derived from one another in that order). They each transliterate
>> Unicode text to ASCII, e.g.:
>>
>>     (unidecode "Déjà vu")
>>     ;=> "Deja vu"
>>     (unidecode "北亰")
>>     ;=> "Bei Jing "
>>
>> Does Emacs have equivalent functionality built-in?
>
> It's possible to remove accents (the first example) using the
> functionality in ucs-normalize.el. Some transliteration is possible
> for scripts for which there exists a "transliteration" input method,
> using the code by Michael Welsh Duggan posted here:
>
>   http://lists.gnu.org/archive/html/emacs-devel/2018-02/msg00387.html
>
> For example, you can transliterate Cyrillic text using the
> cyrillic-translit input method that comes with Emacs. But there are
> no general-purpose transliteration capabilities in Emacs, AFAIK.

Thanks, I'll take a look at those.

> However, it looks like the Perl package is just a huge database of
> precomputed transliterations, in which case doing the same in Emacs
> Lisp should be almost trivial.

Yep, that's how the Emacs package works too. It boils down to 25 lines
of Lisp[1] plus the database[2].

Thanks

John

[1]: https://github.com/sindikat/unidecode/blob/5502ada9287b4012eabb879f12f5b0a9df52c5b7/unidecode.el#L56-L82
[2]: https://github.com/sindikat/unidecode/tree/5502ada9287b4012eabb879f12f5b0a9df52c5b7/data
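
P.S. For anyone reading this in the archive, here is a rough sketch of the two approaches discussed above: removing accents via ucs-normalize.el, and a plain table lookup for everything else. This is not the code from unidecode.el. The names `my-strip-accents', `my-translit-table' and `my-transliterate' are made up for illustration, and the two-entry table is only a stand-in for the real database.

```elisp
(require 'ucs-normalize)
(require 'seq)

(defun my-strip-accents (string)
  "Return STRING with combining marks removed.
Hypothetical helper: decompose to NFD, then drop every character
whose Unicode general category is Mn (nonspacing mark)."
  (concat
   (seq-remove (lambda (c)
                 (eq (get-char-code-property c 'general-category) 'Mn))
               (ucs-normalize-NFD-string string))))

;; Hypothetical two-entry table; a real database would have to cover
;; far more characters than this.
(defvar my-translit-table
  '((?北 . "Bei ")
    (?亰 . "Jing "))
  "Alist mapping a non-ASCII character to an ASCII replacement string.")

(defun my-transliterate (string)
  "Replace each character of STRING using `my-translit-table'.
ASCII characters pass through unchanged; characters missing from
the table are dropped (an assumption made for this sketch)."
  (mapconcat (lambda (c)
               (cond ((< c 128) (string c))
                     ((cdr (assq c my-translit-table)))
                     (t "")))
             string ""))

(my-strip-accents "Déjà vu")
;=> "Deja vu"
(my-transliterate "北亰")
;=> "Bei Jing "
```

An alist is fine for two entries; for a table the size of a full transliteration database, a char-table or hash table would presumably be the better fit.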