From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marko Rauhamaa
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Mon, 30 Jan 2017 22:46:26 +0200
Message-ID: <87tw8gb7j1.fsf@elektro.pacujo.net>
References: <874m0gd3z4.fsf@gnu.org> <87wpdc8rx7.fsf@elektro.pacujo.net>
 <87poj4r04c.fsf@fencepost.gnu.org> <87k29c8q3b.fsf@elektro.pacujo.net>
 <87h94gqz34.fsf@fencepost.gnu.org> <87fuk0ctve.fsf@elektro.pacujo.net>
 <878tpsqtzl.fsf@fencepost.gnu.org> <87zii8bcdw.fsf@elektro.pacujo.net>
 <83r33kwd24.fsf@gnu.org>
In-Reply-To: <83r33kwd24.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 30 Jan 2017 21:41:23 +0200")
To: Eli Zaretskii
Cc: guile-user@gnu.org, dak@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
Mime-Version: 1.0
Content-Type: text/plain
List-Id: General Guile related discussions

Eli Zaretskii:

>> From: Marko Rauhamaa
>>
>> UTF-8 beautifully bridges the interpretation gap between 8-bit character
>> strings and text. However, the interpretation step should be done in the
>> application and not in the programming language.
>
> You can't do that in an environment that specifically targets
> sophisticated multi-lingual text processing independent of the outside
> locale. Unless you can interpret byte sequences as characters, you
> will be unable to even count characters in a range of text,

If you need to operate on Unicode text, have the application invoke the
UTF-8 (or locale-specific) decoder. However, have the application
request it instead of guessing that the environment is all Unicode.

> You do need "other typesetting effects", naturally, but that doesn't
> mean you can get away without more or less full support of Unicode
> nowadays.

Do support it, fully even, but let the application invoke the conversion
when appropriate.

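
A minimal sketch of that division of labour, assuming Python purely for
illustration (the thread itself concerns Guile) and a hypothetical
helper named count_characters: the data stays an octet string, and
decoding happens only when the caller explicitly names an encoding.

```python
from typing import Optional


def count_characters(raw: bytes, encoding: Optional[str] = None) -> int:
    """Count octets by default; count code points only on request.

    Hypothetical helper for illustration: nothing here guesses an
    encoding from the environment. The caller must ask for decoding.
    """
    if encoding is None:
        return len(raw)               # plain octet count, no interpretation
    return len(raw.decode(encoding))  # explicit, application-requested decode


# The UTF-8 octets of a Chinese file name (U+4E2D U+6587 followed by ".txt").
raw = "\u4e2d\u6587.txt".encode("utf-8")

print(count_characters(raw))           # 10 octets
print(count_characters(raw, "utf-8"))  # 6 code points
```

Note that the decoded count is a count of code points; whether that
matches what a user would call "characters" is a separate question.
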
> You are talking about programming, but we should instead think about
> applications -- those of them which need to process text, or even
> access files, as this discussion shows, do need decent Unicode
> support.

Why should opening a file require Unicode support if the underlying
operating system knows nothing about Unicode? I can open any given file
in a tiny C program without any Unicode support, under Linux, that is.

> E.g., users generally expect that decomposed and composed character
> sequences behave and are treated identically, although they are
> different byte-stream wise.

Linux begs to differ. Regardless of the locale, two different octet
sequences that ought to be equivalent UTF-8-wise will be considered
different pathnames under Linux.

I don't need a helicopter to walk across the street.

>> But is also causing unnecessary grief in the computer-computer
>> interface, where the classic textual naming and textual protocols
>> are actually cutely chosen octet-aligned binary formats.
>
> The universal acceptance of UTF-8 nowadays makes this much less of an
> issue, IME.

You are jumping the gun. Linux won't be there for a long time if ever.
Nothing prevents a pathname, or a command-line argument, or an
environment variable, or the standard input from containing illegal
UTF-8.

I also wouldn't like my SMTP server to throw a UTF-8 decoding exception
on parsing a command.

(Also note that even Windows allows pathnames with illegal Unicode in
them if I'm not mistaken.)


Marko
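
The pathname claims above can be checked with a short sketch. It is
written in Python for illustration only, assumes a typical Linux
filesystem such as ext4 or tmpfs that performs no name normalization,
and uses a hypothetical helper named create_and_list. All names are
passed as bytes, so no decoding takes place on the way to the kernel.

```python
import os
import tempfile
import unicodedata


def create_and_list(directory: bytes, names: list) -> set:
    """Create one empty file per octet-string name, then list the directory.

    Hypothetical helper for illustration: bytes in, bytes out, so the
    names reach the kernel as opaque octet sequences.
    """
    for name in names:
        fd = os.open(os.path.join(directory, name),
                     os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
    return set(os.listdir(directory))


# "e with acute accent" in composed (NFC) and decomposed (NFD) form.
composed = unicodedata.normalize("NFC", "\u00e9").encode("utf-8")    # b'\xc3\xa9'
decomposed = unicodedata.normalize("NFD", "\u00e9").encode("utf-8")  # b'e\xcc\x81'
invalid = b"\xff\xfe-not-utf8"  # not decodable as UTF-8

with tempfile.TemporaryDirectory() as tmp:
    found = create_and_list(os.fsencode(tmp), [composed, decomposed, invalid])

# On a filesystem that does not normalize names, all three are distinct entries.
print(sorted(found))
```

On filesystems that do normalize names, the composed and decomposed
forms may collapse into a single entry, so the result depends on the
assumption stated above.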