From: "James K. Lowden"
Newsgroups: gmane.emacs.help
Subject: Re: Is there a way to "asciify" a string?
Date: Thu, 31 May 2018 19:23:48 -0400
Message-ID: <20180531192348.22baa2917129486248557378@speakeasy.net>
References: <87zi0llisj.fsf@mbork.pl> <20180527073645.GB17354@tuxteam.de> <87y3g5l1h0.fsf@mbork.pl>
To: help-gnu-emacs@gnu.org

On Thu, 31 May 2018 17:42:33 +0200
Marcin Borkowski wrote:

> > I really strongly recommend you try to solve this problem by doing
> > nothing: keep the name in its full glory.  Nowadays users *should*
> > expect this to work.
>
> It's tempting, but no: these files will eventually be sent to
> e.g. people on Windows XP and the like.  I don't want to take risks of
> unreadable filenames.

It's good advice, though treacherous.  If you use any encoding other
than ASCII, you'll need to indicate the encoding used, and put up with
recipients who don't know what "encoding" is, or can't re-encode the
names to their machine's preferred encoding.

For instance, if you send UTF-8, you can expect befuddlement from
Windows users, whose system implicitly recognizes UTF-16LE.

I can hardly blame you for not wanting to do that.

If Windows's filename rules are the actual constraint, the set of
characters allowed in a Windows filename is well defined.  The
prohibited characters could be URL-encoded or similar.  That would
yield a recognizable, unique name, and the original could be recovered
by reversing the process.

If I were solving your problem, I'd look for something similar to what
you describe, but wholly reversible.  I'd use ascii//TRANSLIT or
similar to get the "unaccented" version of each character, and after
each one insert a URL-style escape giving the original character's
Unicode code point in hex.
So,

	Jönköping
becomes
	Jo%F6nko%F6ping

If you escape literal percent signs, too ("%" becomes "%%25"), then
the reversal rule is simply "for every /%[[:xdigit:]]{2}/, replace the
previous character with the indicated codepoint".

This approach preserves uniqueness in the filename, so you can
dispense with "uniquifying" it with a meaningless integer.

--jkl
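
For what it's worth, the scheme above can be sketched in a few lines of
Python.  This is only a rough illustration under some assumptions of
mine, not a finished tool: the function names asciify and unasciify are
made up for the example; Unicode NFKD decomposition stands in for
iconv's ascii//TRANSLIT (it is not the same thing, and characters that
do not decompose to an ASCII letter or digit get "_" as a placeholder);
and, because the reversal rule uses exactly two hex digits, code points
above U+00FF are rejected rather than guessed at.

```python
import re
import unicodedata


def asciify(name: str) -> str:
    """Reversibly encode `name` as ASCII: base character + %XX escape.

    Hypothetical helper. NFKD decomposition is a stand-in for
    iconv's ascii//TRANSLIT, not an equivalent.
    """
    out = []
    for ch in name:
        if ch == "%":
            # A literal percent sign is its own base character,
            # followed by the escape for code point 0x25.
            out.append("%%25")
        elif ch.isascii():
            out.append(ch)
        else:
            code = ord(ch)
            if code > 0xFF:
                # Assumption: two hex digits only, as in the example.
                raise ValueError(f"U+{code:04X} does not fit in %XX")
            first = unicodedata.normalize("NFKD", ch)[0]
            # Fall back to "_" when no ASCII letter or digit is available.
            base = first if first.isascii() and first.isalnum() else "_"
            out.append(f"{base}%{code:02X}")
    return "".join(out)


# One character followed by "%" and exactly two hex digits.
_ESCAPE = re.compile(r".%([0-9A-Fa-f]{2})", re.DOTALL)


def unasciify(encoded: str) -> str:
    """Reverse asciify(): replace each 'X%HH' with code point 0xHH."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), encoded)
```

With this sketch, asciify("Jönköping") yields "Jo%F6nko%F6ping" and
unasciify() turns it back into the original.  The decoder throws the
base character away, so the choice of transliteration affects only how
readable the encoded name is, not whether it can be recovered.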