From mboxrd@z Thu Jan 1 00:00:00 1970
From: Timothy Sample
To: Ludovic Courtès
Cc: 48114@debbugs.gnu.org
Subject: bug#48114: Disarchive occasionally fails tests
Date: Mon, 03 May 2021 00:02:09 -0400
Message-ID: <8735v4ea7y.fsf@ngyro.com>
In-Reply-To: <87a6pceerf.fsf@ngyro.com> (Timothy Sample's message of "Sun, 02 May 2021 22:24:04 -0400")
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com>
List-Id: Bug reports for GNU Guix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Timothy Sample writes:

> I’m still looking into this, but I wanted to quickly post this
> reproducer for the Guile bug:
>
>     (use-modules (ice-9 regex))
>     (define str
>       "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>     (match:substring (string-match "[0-8]+" str))
>
> This triggers the out-of-range error when run with “LC_ALL=C”.

It turns out that all that’s needed is the last code point, which is
“Number Eleven Full Stop”, or ‘⒒’.  When Guile converts this to an ASCII
C string using ‘u32_conv_from_encoding’, it becomes “11.”.  The regex
(“[0-8]+”) matches the “11” part with start index 0 and end index 2.
The ‘fixup_multibyte_match’ function does nothing (it only matters when
the locale encoding is multibyte) [1].  Guile then builds the match
vector with the original string but keeps the ASCII offsets.  In other
words, it thinks the match substring goes from 0 to 2 in a single code
point string:

    ,use (ice-9 regex)
    (string-match "11" "\u2492")
    => #("\u2492" (0 . 2))

I’m not sure there’s any way to solve this nicely in Guile.  It would be
clearer if the match vector included the string as libc matched it, but
it’s still surprising that the match happens with a different string.

In Disarchive, I can rewrite the generator without regex.  I’ll do that
and see what I can do about the “Gave up!” issue.

[1] It works on the converted-to-ASCII C string, which means that the
byte offsets and code point offsets are the same.  Hence, it has nothing
to do.


-- Tim
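
The offset mismatch described above can be modeled outside Guile.  What
follows is a minimal Python sketch, not Guile's code.  It assumes that
NFKC normalization is an acceptable stand-in for the transliterating
conversion to ASCII that happens under the C locale; the two are
different mechanisms that happen to agree on U+2492.  The helper name
lossy_ascii is hypothetical.

```python
import re
import unicodedata


def lossy_ascii(text: str) -> str:
    """Hypothetical stand-in for a transliterating conversion to ASCII.

    NFKC expands compatibility characters, so U+2492 becomes "11.".
    Anything still outside ASCII is replaced with "?".
    """
    expanded = unicodedata.normalize("NFKC", text)
    return expanded.encode("ascii", errors="replace").decode("ascii")


original = "\u2492"                # one code point: NUMBER ELEVEN FULL STOP
converted = lossy_ascii(original)  # three ASCII characters

# The regex runs on the converted string, so the offsets refer to it.
match = re.search(r"[0-8]+", converted)
start, end = match.span()

# Pairing those offsets with the original string reproduces the mismatch:
# the end offset lies past the end of the one-character original.
offsets_fit_original = end <= len(original)
```

In this model the offsets are valid only for the converted string, which
is why keeping them next to the original string yields an out-of-range
substring.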