From: ludo@gnu.org (Ludovic Courtès)
Subject: Re: Locale of build environments
Date: Sun, 01 Mar 2015 17:48:49 +0100
Message-ID: <877fv0br4e.fsf@gnu.org>
References: <20150210201452.GA15529@debian> <87h9urt50j.fsf@netris.org> <87mw4iq3uz.fsf_-_@gnu.org> <87bnkgw9fn.fsf@gnu.org> <87y4nk6xv1.fsf@netris.org> <87mw3ztzcr.fsf@gnu.org> <87385rsaqz.fsf@gnu.org>
In-Reply-To: <87385rsaqz.fsf@gnu.org> ("Ludovic Courtès"'s message of "Fri, 27 Feb 2015 15:13:40 +0100")
To: Mark H Weaver
Cc: guix-devel@gnu.org

ludo@gnu.org (Ludovic Courtès) skribis:

> Besides, commit e8c9f04 is interesting: ‘substitute*’ will now break
> non-UTF-8 files by default (replacing invalid UTF-8 sequences with
> question marks in the output.)

Based on that observation, commit dd0a8ef forced the ‘patch-*’
procedures to treat files as if they were ISO-8859-1, i.e., leaving
their byte sequence uninterpreted, and thus avoiding multibyte sequence
decoding errors.

Then, as Mark suggested, commit 4db8716 forces strict handling of
encoding/decoding errors.

The problem then is that we’re getting things like this:

--8<---------------cut here---------------start------------->8---
phase `unpack' succeeded after 0 seconds
starting phase `patch-usr-bin-file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
patch-/usr/bin/file: ./configure: changing `/usr/bin/file' to `/gnu/store/a31g38iykai59jqmcwknxyjddc5zxm9b-file-5.22/bin/file'
Backtrace:
[...]
 745: 10 [patch-/usr/bin/file "./configure" #:file-command ...]
In ice-9/boot-9.scm:
 171: 9 [with-throw-handler #t ...]
 867: 8 [call-with-input-file "./configure" ...]
In /gnu/store/wcrp88qjv5bfhwcsxhbiqfh29da8pg81-module-import/guix/build/utils.scm:
 474: 7 [# #]
 500: 6 [# # ...]
In srfi/srfi-1.scm:
 465: 5 [fold # ...]
In /gnu/store/wcrp88qjv5bfhwcsxhbiqfh29da8pg81-module-import/guix/build/utils.scm:
 503: 4 [# # ...]
In ice-9/regex.scm:
 189: 3 [list-matches # ...]
 176: 2 [fold-matches # ...]
In unknown file:
   ?: 1 [regexp-exec # ...]
In ice-9/boot-9.scm:
 106: 0 [# encoding-error ...]

ice-9/boot-9.scm:106:20: In procedure #:
ice-9/boot-9.scm:106:20: Throw to key `encoding-error' with args `("scm_to_stringn" "cannot convert narrow string to output locale" 84 #f #f)'.
--8<---------------cut here---------------end--------------->8---

The failure here occurs when using ‘guile-final’ (which has full iconv
support.)  When it stumbles upon the © sign in ‘configure’, it reads it,
with ‘read-line’, as the sequence #\302 #\251.

However, when passing that line back to ‘regexp-exec’, ‘regexp-exec’
calls ‘scm_to_locale_string’ on it, which fails with the error above:
this is because, in this build, we’re running in the C locale, and
#\302, a.k.a. #\Â, cannot be represented in ASCII (the encoding of the C
locale.)

To solve that problem, commit 87c8b92 makes UTF-8 locales available
right after ‘guile-final’ is built.  That way, calls to
‘scm_to_locale_string’ actually convert to UTF-8, which always works.

(Note that the bootstrap Guile doesn’t have this problem because it uses
UTF-8 for everything and ignores locale settings.)

Hopefully we can enable full builds of ‘core-updates’ very soon now.

Ludo’.
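
PS: For readers who want to see the byte-level effect in isolation, here
is a rough sketch in Python rather than Guile.  It only mimics the three
conversions discussed above (reading as ISO-8859-1, converting to an
ASCII locale, converting to a UTF-8 locale); the sample line and all
variable names are made up for illustration, and it does not call any
Guile or Guix code.

```python
# Illustration only: Python stand-ins for the conversions described in
# this message.  The sample line is hypothetical, not taken from a real
# ./configure script.
line = b"# Copyright \302\251 2015\n"   # octal escapes: the bytes #\302 #\251

# Reading the file as ISO-8859-1 never fails: every byte maps to exactly
# one character, so the UTF-8 encoded copyright sign becomes two characters.
text = line.decode("iso-8859-1")
two_chars = text[12:14]                 # U+00C2 followed by U+00A9

# Writing the string back as ISO-8859-1 restores the original bytes,
# which is why the file contents are left uninterpreted.
roundtrip = text.encode("iso-8859-1")

# Converting the string to the encoding of the C locale (ASCII) fails,
# because U+00C2 has no ASCII representation.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Converting to UTF-8 succeeds for any such string.  The resulting bytes
# differ from the file's bytes, but the conversion itself cannot fail.
utf8_bytes = text.encode("utf-8")

print(repr(two_chars), roundtrip == line, ascii_ok, utf8_bytes)
```

The point of the sketch is just that the failure comes from the
string-to-locale conversion step, not from reading the file.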