From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#15608: kbd.scm unicode problem
Date: Wed, 16 Oct 2013 15:32:37 +0200
Message-ID: <871u3lfiiy.fsf@gnu.org>
References: <8911b57e425d728f6dd5087409c29e22.squirrel@fruiteater.riseup.net>
In-Reply-To: <8911b57e425d728f6dd5087409c29e22.squirrel@fruiteater.riseup.net> (Guy Ze Grant's message of "Mon, 14 Oct 2013 01:14:43 -0700")
To: Guy Ze Grant
Cc: 15608@debbugs.gnu.org

"Guy Ze Grant" writes:

> I was asked to post here, so here it is; Attached is my expression thus
> far for kbd, at the patch phase though -- the following issue pops up:
> "find-files: ./doc/utf/??????: No such file or directory", along with a
> spew of guile barf following (which you can see in the kbd.scm.log file).

As we discussed on IRC, the problem is that POSIX treats file names as
plain byte strings; the encoding is up to the application and is
locale-dependent by default.  In practice people typically use UTF-8
(GLib and friends expect that.)

Here’s the problem and fix:

--8<---------------cut here---------------start------------->8---
$ echo $LANG
en_US.UTF-8
$ ls -a
.  ..  ♪♬
$ guile -c '(use-modules (ice-9 ftw)) (pk (scandir "."))'

;;; (("." ".." "??????"))
$ guile -c '(use-modules (ice-9 ftw)) (setlocale LC_ALL "en_US.utf8") (pk (scandir "."))'

;;; (("." ".." "♪♬"))
--8<---------------cut here---------------end--------------->8---

In the first run Guile uses the “C” locale, so it’s unable to decode
“♪♬”.  In the second run it uses a UTF-8 locale, so everything works
fine.

Back to kbd: can you add a phase after the ‘unpack’ phase that just does
this?

  (setlocale LC_ALL "en_US.utf8")

If that works we can probably make it the default in the ‘core-updates’
branch.

> (define-module (gnu packages kbd)

Since kbd is Linux-specific, could you put it in (gnu packages linux)?

[...]

> 102: 1 [patch-source-shebangs # ...]
> In unknown file:
>    ?: 0 [sort # #]
>
> ERROR: In procedure sort:
> ERROR: In procedure list-copy: Wrong type argument in position 1: ("./config/ylwrap" "./config/install-sh" "./config/depcomp" "./config/config.sub" "./config/config.rpath" "./config/missing" "./config/config.guess" "./config/mkinstalldirs" "./config/compile" "./

[...]

> " "./doc/utf/README" "./doc/utf/ethiopic" . #f)

It’s also a bug that ‘find-files’ returns an improper list when
encountering such an issue.  I’ll fix it in ‘core-updates’.

Thanks,
Ludo’.
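
For reference, here is a rough shell sketch of why the first run printed exactly six question marks. It assumes the file name is the UTF-8 encoding of “♪♬” (three bytes per character) and imitates an ASCII-only reader with ‘tr’; it is not what Guile does internally. The names ‘name’, ‘byte_count’ and ‘ascii_view’ are made up for this illustration.

```shell
#!/bin/sh
# The two-character name from the listing, written as raw UTF-8 bytes
# (U+266A is E2 99 AA, U+266C is E2 99 AC).  Hypothetical variable name.
name=$(printf '\342\231\252\342\231\254')

# Count the bytes that make up the name (hypothetical helper).
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Imitate an ASCII-only view of the name: every byte that is not
# printable in the "C" locale is replaced with a question mark.
# This is an approximation for illustration, not Guile's decoder.
ascii_view() {
    printf '%s' "$1" | LC_ALL=C tr -c '[:print:]' '?'
}

byte_count "$name"        # prints 6
ascii_view "$name"; echo  # prints ??????
```

Two characters, six bytes, six substitutions: the name on disk was never damaged, only its decoding under the “C” locale.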