From: Marko Rauhamaa <marko@pacujo.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: guile-user@gnu.org, dak@gnu.org
Subject: Re: guile can't find a chinese named file
Date: Mon, 30 Jan 2017 22:46:26 +0200
Message-ID: <87tw8gb7j1.fsf@elektro.pacujo.net>
In-Reply-To: <83r33kwd24.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 30 Jan 2017 21:41:23 +0200")

Eli Zaretskii <eliz@gnu.org>:
>> From: Marko Rauhamaa <marko@pacujo.net>
>>
>> UTF-8 beautifully bridges the interpretation gap between 8-bit character
>> strings and text. However, the interpretation step should be done in the
>> application and not in the programming language.
>
> You can't do that in an environment that specifically targets
> sophisticated multi-lingual text processing independent of the outside
> locale. Unless you can interpret byte sequences as characters, you
> will be unable to even count characters in a range of text,

If you need to operate on Unicode text, have the application invoke the
UTF-8 (or locale-specific) decoder. However, have the application
request it explicitly instead of having the language runtime assume
that the environment is all Unicode.
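To make "the application invokes the decoder" concrete, here is a minimal C sketch; utf8_count_codepoints is a hypothetical helper, not a libc or Guile API. The bytes stay bytes until the application explicitly asks for a code-point count, and ill-formed input is reported as -1 instead of being guessed at or silently replaced.

```c
#include <stddef.h>

/* Hypothetical helper (not a libc or Guile API): count the code points
   in a byte buffer that the application has decided to treat as UTF-8.
   Returns -1 for any ill-formed sequence (stray continuation byte,
   truncation, overlong form, surrogate, or value above U+10FFFF). */
long utf8_count_codepoints(const unsigned char *s, size_t n)
{
    long count = 0;
    size_t i = 0;

    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        unsigned long cp;

        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return -1;              /* continuation byte or 0xF8..0xFF */

        if (len > n - i)
            return -1;               /* sequence truncated at end of buffer */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;           /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and out-of-range values. */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        i += len;
        count++;
    }
    return count;
}
```

The caller decides what -1 means: reject the input, fall back to a locale-specific decoder, or keep treating the data as opaque bytes.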
> You do need "other typesetting effects", naturally, but that doesn't
> mean you can get away without more or less full support of Unicode
> nowadays.

Do support it, fully even, but let the application invoke the
conversion when appropriate.

> You are talking about programming, but we should instead think about
> applications -- those of them which need to process text, or even
> access files, as this discussion shows, do need decent Unicode
> support.

Why should opening a file require Unicode support if the underlying
operating system knows nothing about Unicode? Under Linux, at least, I
can open any given file from a tiny C program without any Unicode
support.
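A sketch of such a tiny C program, assuming Linux and an ordinary filesystem such as ext4 or tmpfs; roundtrip_raw_name is a hypothetical helper. It never calls setlocale() and never decodes anything: the kernel only requires that a name component contain no '/' and no NUL byte.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create, reopen and delete dir/name, treating
   name purely as bytes.  No locale, no decoding.
   Returns 0 on success, -1 on error. */
int roundtrip_raw_name(const char *dir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    fd = open(path, O_RDONLY);   /* the same bytes find the same file */
    if (fd < 0)
        return -1;
    close(fd);

    return unlink(path);
}
```

For example, roundtrip_raw_name("/tmp", "\xd6\xd0\xce\xc4.txt") works with a GBK-encoded "中文" file name, which is not valid UTF-8. Filesystems configured to enforce an encoding may still refuse such names.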
> E.g., users generally expect that decomposed and composed character
> sequences behave and are treated identically, although they are
> different byte-stream wise.

Linux begs to differ. Regardless of the locale, two different octet
sequences that are canonically equivalent in Unicode terms will be
considered different pathnames under Linux.

I don't need a helicopter to walk across the street.
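A C sketch of that experiment, assuming Linux and a filesystem that does no Unicode normalization (e.g. ext4 without casefolding, or tmpfs); the helper names are hypothetical. It creates one file named with precomposed U+00E9 (bytes C3 A9) and another named "e" followed by U+0301 COMBINING ACUTE ACCENT (bytes 65 CC 81), then compares device and inode numbers.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Canonically equivalent in Unicode, different as byte strings. */
static const char *const COMPOSED   = "\xc3\xa9";   /* U+00E9 */
static const char *const DECOMPOSED = "e\xcc\x81";  /* U+0065 U+0301 */

/* Hypothetical helper: build dir/name into path and create the file. */
static int make_file(const char *dir, const char *name, char *path, size_t size)
{
    int n = snprintf(path, size, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= size)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Hypothetical helper: returns 1 if the two spellings name two distinct
   files in dir, 0 if the filesystem folded them into one, -1 on error. */
int nfc_nfd_are_distinct(const char *dir)
{
    char p1[4096] = "", p2[4096] = "";
    struct stat s1, s2;
    int result = -1;

    if (make_file(dir, COMPOSED, p1, sizeof p1) == 0 &&
        make_file(dir, DECOMPOSED, p2, sizeof p2) == 0 &&
        stat(p1, &s1) == 0 && stat(p2, &s2) == 0)
        result = !(s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino);

    unlink(p1);   /* clean up; failures (e.g. already removed) are harmless */
    unlink(p2);
    return result;
}
```

On such a filesystem nfc_nfd_are_distinct("/tmp") should return 1; on a normalization-insensitive filesystem such as Apple's APFS the result may be 0.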
>> But it is also causing unnecessary grief in the computer-computer
>> interface, where the classic textual naming and textual protocols
>> are actually cutely chosen octet-aligned binary formats.
>
> The universal acceptance of UTF-8 nowadays makes this much less of an
> issue, IME.

You are jumping the gun. Linux won't get there for a long time, if
ever. Nothing prevents a pathname, a command-line argument, an
environment variable, or the standard input from containing illegal
UTF-8.

I also wouldn't want my SMTP server to throw a UTF-8 decoding exception
while parsing a command.
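One way to keep an SMTP server from throwing on stray bytes is to treat only the command verb as ASCII and pass the rest of the line through as opaque octets. A minimal sketch under that assumption (smtp_split_command is a hypothetical name, and real command parsing per RFC 5321 involves more rules than this):

```c
#include <stddef.h>

static int is_ascii_alpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Hypothetical helper: split one SMTP command line into an upper-cased
   ASCII verb and the raw argument bytes.  Only the verb has to be ASCII
   letters; the argument is returned untouched, so a stray 0xFF byte
   cannot make parsing fail.  Returns 0 on success, -1 if the line does
   not start with a verb of at most 15 letters followed by a space or
   the end of the line. */
int smtp_split_command(const unsigned char *line, size_t n, char verb[16],
                       const unsigned char **arg, size_t *arg_len)
{
    size_t i = 0;

    if (n >= 2 && line[n - 2] == '\r' && line[n - 1] == '\n')
        n -= 2;                              /* drop the CRLF terminator */
    while (i < n && i < 15 && is_ascii_alpha(line[i])) {
        verb[i] = (char)(line[i] & ~0x20);   /* ASCII upper-casing */
        i++;
    }
    if (i == 0 || (i < n && line[i] != ' '))
        return -1;
    verb[i] = '\0';
    if (i < n)
        i++;                                 /* skip the space separator */
    *arg = line + i;
    *arg_len = n - i;
    return 0;
}
```

A line such as "mail FROM:<\xff@x>\r\n" then yields the verb MAIL and a 10-byte argument; whether those bytes are acceptable is a later, deliberate decision by the application.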
(Also note that, if I'm not mistaken, even Windows allows pathnames
containing illegal Unicode, such as unpaired UTF-16 surrogates.)

Marko