From: Marcin Borkowski
Newsgroups: gmane.emacs.help
Subject: Re: Is there a way to "asciify" a string?
Date: Sun, 27 May 2018 14:36:27 +0200
Message-ID: <87y3g5l1h0.fsf@mbork.pl>
References: <87zi0llisj.fsf@mbork.pl> <20180527073645.GB17354@tuxteam.de>
To: tomas@tuxteam.de
Cc: help-gnu-emacs@gnu.org

On 2018-05-27, at 09:36, tomas@tuxteam.de wrote:

> On Sun, May 27, 2018 at 08:22:20AM +0200, Marcin Borkowski wrote:
>> Hi all,
>>
>> I want to convert e.g. "żółć" to "zolc", or "Poincaré" to "Poincare"
>> etc.  IOW, I want to replace all these funny Unicode accented characters
>> with their ASCII equivalents.
>>
>> Is there anything for that in Emacs?
>
> I haven't an answer to your direct question, just a warning: without a
> language context, you can't do it "correctly". For one illustrative
> example, in German "ü" -> "ue", but in Spanish "ü" -> "u" (those diaereses
> do have different functions in those languages).
> Transliterating "ü" with
> just "u" in German would be wrong (but the reader might make some sense
> of it), transliterating "ü" with "ue" in Spanish would not only be wrong,
> but would almost certainly throw off the reader's auto-correction feature
> (unless (s)he knows German and can recall that association).
>
> I'm sure there are tons of other examples like that.
>
> Heck, even up- and downcasing is strictly language context dependent
> (witness the Turkish dotless I).
>
> Sigh :-)

I understand that.  Still, I need something *simple*.  I have a person's
name (possibly with some national characters), and I want to derive
a filename from it.  It doesn't have to be correct in 100% of cases.  It
doesn't even have to be unambiguous (there will be a number for that in
the filename, too).

At worst, I might just reimplement `tr' in Emacs and use it to convert
Polish letters to their Latin equivalents (which will cover 99.9% of
cases), but I thought that with the (in)famous char folding etc. Emacs
could handle this out of the box.

(BTW, if there is some command-line utility to do that, that's fine
too.)

Best,

-- 
Marcin Borkowski
http://mbork.pl
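
For illustration only: a minimal sketch of the "simple, not always
correct" approach described above, written in Python as one possible
command-line route.  The names `asciify' and `EXTRA' are hypothetical and
do not come from this thread or from Emacs.  The sketch decomposes each
character (NFKD), drops the combining marks, and uses a small `tr'-style
table for letters such as "ł" that have no Unicode decomposition.  It
ignores language context, so "ü" always becomes "u".

```python
import unicodedata

# Letters that NFKD does not decompose into base letter + accent.
# This table is an illustrative assumption and is far from exhaustive.
EXTRA = str.maketrans({
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
    "ß": "ss",
})


def asciify(text: str) -> str:
    """Return a rough ASCII approximation of TEXT (language-agnostic)."""
    # 1. Replace the letters that have no decomposition at all.
    text = text.translate(EXTRA)
    # 2. Split accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # 3. Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 4. Silently discard anything that is still not ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(asciify("żółć"))      # zolc
    print(asciify("Poincaré"))  # Poincare
```

The explicit table is needed because normalization alone would leave "ł"
untouched, and step 4 would then drop it entirely ("żółć" would come out
as "zoc").  For filenames that already carry a disambiguating number,
losing the occasional unmapped character is probably tolerable.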