From: Ludovic Courtès
To: Timothy Sample
Cc: 48114@debbugs.gnu.org
Subject: bug#48114: Disarchive occasionally fails tests
Date: Mon, 03 May 2021 22:03:59 +0200
Message-ID: <874kfjwpn4.fsf@gnu.org>
In-Reply-To: <8735v4ea7y.fsf@ngyro.com> (Timothy Sample's message of "Mon, 03 May 2021 00:02:09 -0400")
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com> <8735v4ea7y.fsf@ngyro.com>

Hi!

Timothy Sample writes:

> Timothy Sample writes:
>
>> I’m still looking into this, but I wanted to quickly post this
>> reproducer for the Guile bug:
>>
>>     (use-modules (ice-9 regex))
>>     (define str
>>       "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>>     (match:substring (string-match "[0-8]+" str))
>>
>> This triggers the out-of-range error when run with “LC_ALL=C”.
>
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’.

Whaaat? “Number Eleven Full Stop”, I wonder how the Unicode folks
came up with that one.

  ㊷ = ㉚ + ⒓

> When Guile converts this to an ASCII C string using
> ‘u32_conv_from_encoding’, it becomes “11.”.  The regex (“[0-8]+”)
> matches the “11” part with start index 0 and end index 2.  The
> ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1].  Guile then builds the match
> vector with the original string but keeps the ASCII offsets.  In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
>
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
>
> I’m not sure there’s any way to solve this nicely in Guile.  It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.

Yeah, I don’t think there’s much we can do.  It’s a lot of fun anyway.

Thanks for investigating!

Ludo’.
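
The offset mismatch described in the quoted analysis can be modelled outside Guile. The following is only a rough sketch in Python, not Guile's code: `transliterate_to_ascii`, `match_offsets` and `offsets_fit` are hypothetical helper names, and NFKC normalization is used as a stand-in for whatever transliteration the C locale conversion performs (it happens to give the same “11.” for U+2492). It shows offsets computed on the converted string being checked against the original one-code-point string.

```python
import re
import unicodedata


def transliterate_to_ascii(s: str) -> str:
    """Stand-in for the locale conversion (assumption: NFKC folding).

    Anything that is still non-ASCII after folding is replaced by '?'.
    """
    folded = unicodedata.normalize("NFKC", s)
    return folded.encode("ascii", errors="replace").decode("ascii")


def match_offsets(pattern: str, original: str):
    """Match against the ASCII copy and return offsets into that copy."""
    m = re.search(pattern, transliterate_to_ascii(original))
    return None if m is None else (m.start(), m.end())


def offsets_fit(original: str, offsets) -> bool:
    """True if the offsets are valid indices into the original string."""
    start, end = offsets
    return 0 <= start <= end <= len(original)


original = "\u2492"                       # NUMBER ELEVEN FULL STOP
ascii_copy = transliterate_to_ascii(original)
offsets = match_offsets("[0-8]+", original)

# The ASCII copy has three characters, the original only one, so the
# (0, 2) range taken from the copy does not fit the original.
print(repr(ascii_copy), offsets, len(original), offsets_fit(original, offsets))
# prints: '11.' (0, 2) 1 False
```

A check like `offsets_fit` would only turn the out-of-range error into a detectable condition; it would not make the match meaningful, since the matched text (“11”) does not occur in the original string at all.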