From: Marcin Borkowski
Newsgroups: gmane.emacs.help
Subject: Re: Is there a way to "asciify" a string?
Date: Sat, 02 Jun 2018 20:07:50 +0200
Message-ID: <871sdp2hah.fsf@mbork.pl>
References: <87zi0llisj.fsf@mbork.pl> <20180527073645.GB17354@tuxteam.de> <87y3g5l1h0.fsf@mbork.pl>
To: John Mastro
Cc: Help Gnu Emacs mailing list

On 2018-05-31, at 04:03, John Mastro wrote:

> Yuri Khan wrote:
>> > I have a person's name (possibly with
>> > some national characters), and I want to derive a filename from it. It
>> > doesn't have to be correct in 100% cases. It doesn't even have to be
>> > unambiguous (there will be a number for that in the filename, too).
>>
>> Technically you could use the name of a person as is, as long as it is
>> representable in Unicode and contains neither the null character nor the
>> slash character. But I assume you want a filename that is portable between
>> file systems, or a filename that can be represented in an URI path segment
>> without %-encoding, or any combination of the above.
>>
>> In that case, the Python unidecode library is probably the closest that you
>> can find. But make very sure that the people involved never see their own
>> name’s transliteration.
>
> There's also an Emacs Lisp port of unidecode[1]
>
> (unidecode "żółć")
> ;=> "zolc"
>
> [1]: https://github.com/sindikat/unidecode

Thanks, and thanks also to all the others for their input. I didn't
really intend to create such a storm.

My use case is much, much simpler than most of the stuff mentioned in
this thread. 99.5% (or more) of the cases are Polish names, where we
have only 9 "offending" letters, all easily asciified. I thought there
was a simple, general solution (and I learned that there isn't one, and
that there probably can't be).

Hence, I'm going to stick with Eli's suggestion (and manual conversion
of "ł" into "l"). And in case I encounter a non-Polish name with some
letters outside the English alphabet (this may very rarely happen), I
can just manually override this simple solution. IOW, KISS.

But thanks for the opportunity to learn a few things!

-- 
Marcin Borkowski
http://mbork.pl
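
For illustration only, here is a minimal sketch of the "Polish names only" idea, written in Python with just the standard library rather than in Emacs Lisp. Eli's suggestion is not quoted in this message, so this is not necessarily that approach: the sketch assumes Unicode NFD decomposition followed by dropping combining marks, plus an explicit mapping for "ł", which has no canonical decomposition. The name `asciify` and the `MANUAL` table are hypothetical.

```python
import unicodedata

# Assumption: the nine "offending" Polish letters are ą ć ę ł ń ó ś ź ż.
# Eight of them decompose under NFD into an ASCII base letter plus a
# combining mark; "ł" does not decompose, so it is mapped by hand.
MANUAL = str.maketrans({"ł": "l", "Ł": "L"})


def asciify(name: str) -> str:
    """Hypothetical helper: strip Polish diacritics from NAME."""
    # Replace the letters that NFD cannot split, then decompose the rest.
    decomposed = unicodedata.normalize("NFD", name.translate(MANUAL))
    # Drop combining marks (acute, ogonek, dot above); keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(asciify("żółć"))  # zolc
```

Letters that neither decompose nor appear in `MANUAL` (for example "ø") pass through unchanged, so the result is not guaranteed to be pure ASCII; those are the rare cases that would still need a manual override.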