From: David Kastrup
Newsgroups: gmane.lisp.guile.user
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 12:49:23 +0100
Message-ID: <87wpcq2w58.fsf@fencepost.gnu.org>
In-Reply-To: <877f4qv0a8.fsf@elektro.pacujo.net>
To: Marko Rauhamaa
Cc: guile-user@gnu.org

Marko Rauhamaa writes:

> David Kastrup:
>> It's still irrelevant since split does not _use_ the existing file name
>> for constructing new file names.
>
> Split was just an example of a command that concatenates bytes sequences
> to get pathnames, nothing more.
>
> Such concatenation is commonplace in Linux programs of all kinds.
>
> And the point of bringing concatenation into the discussion was that
> remapping byte sequences to byte sequences breaks concatenation
> additivity:
>
>    U(x) + U(y) = U(x + y)

But Emacs' implementation doesn't in any respect "break concatenation
additivity".
If you split an arbitrary byte stream (including material
invalid as UTF-8) at an arbitrary point (including in the middle of a
UTF-8 character), decode the resulting pieces as UTF-8 (as one of
several "reversible" encodings Emacs can interpret), concatenate the
resulting Emacs strings and reencode the result as UTF-8 (since you
actually need to provide a byte sequence to open(2) or similar), you
will retain the original byte stream.  No ifs and buts.

The _decoded_ concatenated string might differ from decoding the unsplit
byte string: it might contain "byte 0xc2, byte 0x80" (represented
internally as 0xc1 0x82 0xc0 0x80) at the concatenation point rather
than "character 0x80" (represented as 0xc2 0x80).  But the moment you
use this concatenation of half-sequences as a file name, it gets
reencoded into the bytes 0xc2 and 0x80 and works just fine.

-- 
David Kastrup
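
For readers without Emacs at hand, the round trip described above can be
sketched in Python.  This is only an analogy: it assumes that Python's
"surrogateescape" error handler is an acceptable stand-in for a
reversible decoding.  It is not Emacs' internal representation (Python
maps an undecodable byte 0xNN to the lone surrogate U+DCNN instead).
The helper names decode, encode and split_decode_join, and the sample
bytes, are made up for illustration.

```python
# Sketch only: Python's surrogateescape stands in for a reversible decoding.
# Helper names and sample data are hypothetical, not taken from Emacs or Guile.

def decode(raw: bytes) -> str:
    """Decode as UTF-8, keeping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode(text: str) -> bytes:
    """Reencode as UTF-8, turning escaped surrogates back into raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def split_decode_join(raw: bytes, cut: int) -> str:
    """Split the bytes at `cut`, decode both halves, concatenate the strings."""
    return decode(raw[:cut]) + decode(raw[cut:])


# Two Chinese characters, then U+0080 (bytes 0xc2 0x80), then an invalid byte.
raw = "\u6587\u4ef6".encode("utf-8") + b"\xc2\x80" + b"\xff"
whole = decode(raw)

# Every split point, including ones inside a multi-byte character,
# reencodes to the original byte stream.
for cut in range(len(raw) + 1):
    assert encode(split_decode_join(raw, cut)) == raw

# Cutting between 0xc2 and 0x80 gives a decoded string that differs from
# decoding the unsplit bytes, yet it still reencodes to the same bytes.
mid = raw.index(b"\xc2\x80") + 1
joined = split_decode_join(raw, mid)
assert joined != whole
assert encode(joined) == raw
```

The loop checks the byte-level round trip for every cut position; the
last three lines show the case from the message where only the decoded
form differs.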