From: Marko Rauhamaa
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 14:14:41 +0200
Message-ID: <871suyuyby.fsf@elektro.pacujo.net>
In-Reply-To: <87wpcq2w58.fsf@fencepost.gnu.org> (David Kastrup's message of "Thu, 16 Feb 2017 12:49:23 +0100")
To: David Kastrup
Cc: guile-user@gnu.org

David Kastrup:

> Marko Rauhamaa writes:
>> And the point of bringing concatenation into the discussion was that
>> remapping byte sequences to byte sequences breaks concatenation
>> additivity:
>>
>>    U(x) + U(y) = U(x + y)
>
> But Emacs' implementation doesn't in any respect "break concatenation
> additivity".
>
> If you split an arbitrary byte stream (including material invalid as
> UTF-8) at an arbitrary point (including in the middle of an UTF-8
> character), decode the resulting pieces as UTF-8 (as one of several
> "reversible" encodings Emacs can interpret), concatenate the resulting
> Emacs strings and reencode the result as UTF-8 (since you actually
> need to provide a byte sequence to open(1) or similar), you will
> retain the original byte stream. No ifs and buts.
>
> The _decoded_ concatenated string might differ from decoding the
> unsplit byte string: it might contain "byte 0xc2, byte 0x80"
> (represented as 0xc1 0x82 0xc0 0x80) at the concatenation point rather
> than "character 0x80" (represented as 0xc2 0x80). But the moment you
> use this concatenation of half-sequences as a file name, it gets
> reencoded into the bytes 0xc2 and 0x80 and works just fine.

That is already a lot, maybe even enough.

(On the other side of the equation, expressing a filename in Unicode may
not produce an unambiguous code point sequence...)


Marko
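
P.S. The round-trip property described in the quoted text can be tried
out with a small sketch. This is only an analogy, not Emacs' code: it
assumes Python's "surrogateescape" error handler (PEP 383) as a stand-in
for a reversible decoding, so undecodable bytes become lone surrogates
rather than Emacs' internal raw-byte characters. The helper names
decode, encode and roundtrips are made up for this illustration.

```python
def decode(raw: bytes) -> str:
    # Reversible decode: bytes that are not valid UTF-8 are kept as
    # lone surrogates U+DC80..U+DCFF instead of raising an error.
    return raw.decode("utf-8", errors="surrogateescape")


def encode(text: str) -> bytes:
    # Inverse of decode(): lone surrogates turn back into raw bytes.
    return text.encode("utf-8", errors="surrogateescape")


def roundtrips(raw: bytes) -> bool:
    # Split at every possible offset (including mid-character), decode
    # both pieces, concatenate the strings, and re-encode the result.
    return all(
        encode(decode(raw[:i]) + decode(raw[i:])) == raw
        for i in range(len(raw) + 1)
    )


# A Chinese file name followed by bytes that are invalid as UTF-8.
name = "中文.txt".encode("utf-8") + b"\xff\xfe"
print(roundtrips(name))

# The example from the quote: "character 0x80" is the bytes 0xc2 0x80.
whole = decode(b"\xc2\x80")
halves = decode(b"\xc2") + decode(b"\x80")
print(whole == halves)                 # the decoded strings differ
print(encode(halves) == b"\xc2\x80")   # the re-encoded bytes do not
```

So under this stand-in, U(x) + U(y) and U(x + y) can be different
strings, yet both encode back to the same bytes, which is what matters
once the result is handed to the file system.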