From: ludo@gnu.org (Ludovic Courtès)
Subject: libc upgrade vs. incompatible locales
Date: Sun, 30 Aug 2015 21:46:11 +0200
Message-ID: <87zj18h858.fsf_-_@gnu.org>
In-Reply-To: <87bne26h9b.fsf@gnu.org> ("Ludovic Courtès"'s message of "Thu, 20 Aug 2015 00:33:04 +0200")
To: guix-devel@gnu.org

ludo@gnu.org (Ludovic Courtès) skribis:

> (The branch is called ‘wip-’ because the glibc upgrade happens to cause
> trouble: since it has new locale category elements, the locale data is
> incompatible with what older libcs expect, which means the bootstrap
> binaries fail with an assertion failure when trying to load the new
> locale data, like:
>
>   xz: loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed.

I thought spelling out the details of why this is annoying might help
find a solution, so here we go.

The binary format for locales depends on the libc version.  Over the
last few releases it happened to stay compatible, but the format of
2.22 differs from that of 2.21 (a new element was added to locale
categories, according to ChangeLog.)

During bootstrapping, at some point we build ‘guile-final’ against the
latest libc (2.22.)  In gnu-build-system.scm we heavily use
‘regexp-exec’ (via ‘substitute*’), which calls C code, and thus uses
‘scm_to_locale_string’.

If we run in the “C” locale, we can only pass purely ASCII strings to
‘regexp-exec’.  However, it turns out that, occasionally, strings read
from files (in ‘patch-shebangs’ etc.) are not ASCII but rather UTF-8
(see commit 87c8b92.)  Thus, calls to ‘regexp-exec’ with these strings
lead to a “failed to convert to locale encoding” error.

So ‘guile-final’ needs to run in a UTF-8 locale (the bootstrap Guile
doesn’t have that problem thanks to the hacky ‘guile-default-utf8.patch’.)
However, if we set LOCPATH to point to the libc 2.22 locales, we satisfy
‘guile-final’, but we break all the bootstrap binaries, which were built
with an older libc; specifically, these binaries terminate with the
assertion failure above.  (If you’re still reading, I thank you for
your support.)

So we have some sort of “interesting” chicken-and-egg problem.

We could side-step the issue by using the pure-Scheme SRFI-115 regular
expressions instead of ‘regexp-exec’.  That may work to some extent, but
we cannot get rid of ‘substitute*’ entirely overnight, so it’s not clear
whether this would be enough.

Apart from that, I can only think of dirty hacks.

What do people think?

Ludo’.