From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
Subject: Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
Date: Fri, 27 Feb 2015 23:30:04 -0500
Message-ID: <87r3tay7xv.fsf@netris.org>
References: <20150228001057.17733.82336@vcs.savannah.gnu.org>
In-Reply-To: (Ludovic Courtès's message of "Sat, 28 Feb 2015 00:10:58 +0000")
To: Ludovic Courtès
Cc: guix-devel@gnu.org

Ludovic Courtès writes:

> commit ca1e3ad2faa59d5b32289f84e0937fa476e21a1a
> Author: Ludovic Courtès
> Date:   Sat Feb 28 01:01:51 2015 +0100
>
>     utils: Change 'patch-shebangs' to use binary input.
>
>     * guix/build/utils.scm (get-char*): New procedure.
>     (patch-shebang): Use it instead of 'read-char'.
>     (fold-port-matches): Remove local 'get-char' and use 'get-char*'
>     instead.
> ---
>  guix/build/utils.scm |   22 +++++++++++-----------
>  1 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
> index a3f8911..c98c4ca 100644
> --- a/guix/build/utils.scm
> +++ b/guix/build/utils.scm
> @@ -618,6 +618,14 @@ transferred and the continuation of the transfer as a thunk."
>             (stat:atimensec stat)
>             (stat:mtimensec stat)))
>
> +(define (get-char* p)
> +  ;; We call it `get-char', but that's really a binary version
> +  ;; thereof.  (The real `get-char' cannot be used here because our
> +  ;; bootstrap Guile is hacked to always use UTF-8.)
> +  (match (get-u8 p)
> +    ((? integer? x) (integer->char x))
> +    (x x)))
> +

This is equivalent to reading with the ISO-8859-1 encoding.  The problem
is that the procedures that use 'get-char*' will then typically use
UTF-8 to write these characters back, so all non-ASCII characters will
get corrupted by these filters.

For now, I would suggest just using ISO-8859-1 for all of these build
utilities that filter or substitute existing files, and then use the
textual I/O procedures.

A better solution going forward would be to implement and use a
permissive UTF-8 encoding in Guile.

What do you think?

    Mark
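
To illustrate the corruption described above, here is a minimal Guile
sketch.  It is not code from the patch: it assumes a Guile that provides
(ice-9 iconv), and the names 'original', 'chars', 'via-utf8' and
'via-latin1' are made up for the example.  It maps the two UTF-8 bytes
of "é" to characters the way 'get-char*' does, then encodes the result
with each of the two encodings:

```scheme
(use-modules (rnrs bytevectors)   ; u8-list->bytevector, bytevector->u8-list
             (ice-9 iconv))       ; string->bytevector, bytevector->string

;; "é" (U+00E9) encoded as UTF-8 is the two bytes #xC3 #xA9.
(define original (u8-list->bytevector '(#xC3 #xA9)))

;; Map each byte to the character with the same code point, which is
;; what 'get-char*' does with the result of 'get-u8'.
(define chars
  (list->string (map integer->char (bytevector->u8-list original))))

;; That mapping gives the same string as decoding with ISO-8859-1.
(format #t "same as ISO-8859-1 decoding: ~s~%"
        (string=? chars (bytevector->string original "ISO-8859-1")))

;; Writing the characters back as UTF-8 re-encodes each byte >= #x80
;; as two bytes, so the output no longer matches the input.
(define via-utf8 (string->bytevector chars "UTF-8"))

;; Writing them back as ISO-8859-1 restores the original bytes.
(define via-latin1 (string->bytevector chars "ISO-8859-1"))

(format #t "original:   ~s~%" (bytevector->u8-list original))   ; (195 169)
(format #t "via UTF-8:  ~s~%" (bytevector->u8-list via-utf8))   ; (195 131 194 169)
(format #t "via Latin1: ~s~%" (bytevector->u8-list via-latin1)) ; (195 169)
```

Only the ISO-8859-1 round trip gives back the input bytes, which is the
reason for reading and writing with the same single-byte encoding in
filters that must leave non-ASCII content untouched.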