From: "James K. Lowden" <jklowden@speakeasy.net>
To: help-gnu-emacs@gnu.org
Subject: Re: Is there a way to "asciify" a string?
Date: Thu, 31 May 2018 19:23:48 -0400
Message-ID: <20180531192348.22baa2917129486248557378@speakeasy.net>
In-Reply-To: mailman.871.1527781438.1292.help-gnu-emacs@gnu.org

On Thu, 31 May 2018 17:42:33 +0200
Marcin Borkowski <mbork@mbork.pl> wrote:

> > I really strongly recommend you try to solve this problem by doing
> > nothing: keep the name in its full glory.  Nowadays users *should*
> > expect this to work.
> 
> It's tempting, but no: these files will eventually be sent to
> e.g. people on Windows XP and the like.  I don't want to take risks of
> unreadable filenames.

It's good advice, though treacherous.  If you use any encoding other
than ASCII, you'll need to indicate the encoding used, and put up with
recipients who don't know what "encoding" is, or can't re-encode the
names to their machine's preferred encoding.  

For instance, if you send UTF-8, you can expect befuddlement from
Windows users, whose system implicitly recognizes UTF-16LE.  

I can hardly blame you for not wanting to do that.  

If Windows's filename rules are the actual constraint, the set of
characters allowed in a Windows filename is well defined.  The
prohibited characters could be URL-encoded or similar.  That would
yield a recognizable, unique name, and the original could be recovered
by reversing the process.  
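
A minimal sketch of that idea in Python, not a definitive
implementation.  The function names are mine, and the forbidden set is
the usual Windows list (< > : " / \ | ? * plus control characters); it
does not deal with reserved device names such as CON, or with trailing
dots and spaces.  The percent sign itself is escaped too, so that
decoding stays unambiguous.

```python
import re

# Characters Windows does not allow in a filename (besides controls).
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def escape_for_windows(name):
    """Replace forbidden characters (and '%') with %XX hex escapes."""
    out = []
    for ch in name:
        if ch == "%" or ch in WINDOWS_FORBIDDEN or ord(ch) < 32:
            out.append(f"%{ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)

def unescape(name):
    """Reverse escape_for_windows: turn every %XX back into its character."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  name)

print(escape_for_windows("draft: v2?"))   # draft%3A v2%3F
```

Because every "%" in the output starts an escape, unescape() recovers
the original name exactly.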

If I were solving your problem, I'd look for something similar to what
you describe, but wholly reversible.  I'd use ascii//TRANSLIT or similar
to get the "unaccented" version of the character, and insert a
URL-style escape after each one representing the original
Unicode character in hex.  So, 

	Jönköping

becomes

	Jo%F6nko%F6ping

If you escape literal percent signs, too, ("%" becomes "%%25") then
the reversal rule is simply "for every /%[[:xdigit:]]{2}/, replace the
previous character with the indicated codepoint".  
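
Here is a rough Python sketch of the scheme, under some assumptions of
my own.  It uses Unicode decomposition (NFKD) as a stand-in for
ascii//TRANSLIT, falls back to "_" when no single ASCII letter results,
and rejects anything above U+00FF, since a two-digit hex escape cannot
hold it.  The names ascii_base, encode and decode are illustrative
only.

```python
import re
import unicodedata

def ascii_base(ch):
    """Best-effort unaccented ASCII stand-in for ch, or "_" if none."""
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    if len(base) == 1 and base.isascii() and base.isprintable():
        return base
    return "_"

def encode(name):
    """Write each non-ASCII character as <ASCII stand-in>%XX."""
    out = []
    for ch in name:
        code = ord(ch)
        if ch == "%":
            out.append("%%25")          # literal percent: "%" then its escape
        elif code < 0x80:
            out.append(ch)              # plain ASCII passes through
        elif code <= 0xFF:
            out.append(f"{ascii_base(ch)}%{code:02X}")
        else:
            raise ValueError(f"U+{code:04X} does not fit a two-digit escape")
    return "".join(out)

# One character followed by %XX: the escape replaces that character.
_ESCAPED = re.compile(r".%([0-9A-Fa-f]{2})", re.DOTALL)

def decode(encoded):
    """Replace each character-plus-escape pair with the escaped codepoint."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), encoded)

name = "J\u00f6nk\u00f6ping"            # "Jönköping", precomposed
print(encode(name))                     # Jo%F6nko%F6ping
assert decode(encode(name)) == name
```

The stand-in letter is only there for the human reader; decode() throws
it away and trusts the hex escape, which is what keeps the mapping
reversible.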

This approach preserves uniqueness in the filename, so you can dispense
with "uniquifying" it with a meaningless integer.  

--jkl


