From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id 8HdwHYF7z17RJwAA0tVLHw (envelope-from ) for ; Thu, 28 May 2020 08:51:13 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id 0C00GYF7z15+ZAAAbx9fmQ (envelope-from ) for ; Thu, 28 May 2020 08:51:13 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 9A36994038F for ; Thu, 28 May 2020 08:51:12 +0000 (UTC) Received: from localhost ([::1]:50636 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jeEFq-00037E-4L for larch@yhetil.org; Thu, 28 May 2020 04:51:10 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59786) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jeEFj-000355-3F for guix-patches@gnu.org; Thu, 28 May 2020 04:51:03 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:39398) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jeEFh-0002iY-PC for guix-patches@gnu.org; Thu, 28 May 2020 04:51:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jeEFh-0006GH-MQ for guix-patches@gnu.org; Thu, 28 May 2020 04:51:01 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#41119] [PATCH] fix some issues with (guix nar) Resent-From: Caleb Ristvedt Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Thu, 28 May 2020 08:51:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 41119 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: guix-sysadmin@gnu.org, 41119@debbugs.gnu.org Received: via spool by 41119-submit@debbugs.gnu.org id=B41119.159065585224050 (code B ref 41119); Thu, 28 May 2020 08:51:01 +0000 Received: (at 41119) by debbugs.gnu.org; 28 May 2020 08:50:52 +0000 Received: from localhost ([127.0.0.1]:50944 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jeEFS-0006Fk-CG for submit@debbugs.gnu.org; Thu, 28 May 2020 04:50:52 -0400 Received: from mail-il1-f194.google.com ([209.85.166.194]:32907) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jeEFQ-0006FT-Ck for 41119@debbugs.gnu.org; Thu, 28 May 2020 04:50:44 -0400 Received: by mail-il1-f194.google.com with SMTP id y17so24618997ilg.0 for <41119@debbugs.gnu.org>; Thu, 28 May 2020 01:50:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cune-org.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-transfer-encoding; bh=YN/2ALYPOHeIQ0JBw91PgZgk2usughOcWDS6dCss+tI=; b=qnuv2G5B1+2H3AsHk/3IuKOQ5h4el/dYxyoDP/A8D3u8Nmk8t+z8D2wlszyTfqwb5N WHEgYBu3lWdcT5vXhanIY3wlwafn5+qUvForuvuWwsd+Nfde/vcE362aPFJzykyRdj0g V3/+IKOY29xnMqPZEl71DQvsNLuOyXqFp4516lSQVhmspeQDz6YkgCovSV1/N0augi+i QeXXZnSC4z/fVI4F/aH9kSCRxm7MCcZW7lVKTEpxEVBcY01PTMIcVoEFXS2X+WsVkrx6 EXyellOzY94CfrFIA6B52XwOZn/+5DqCM2WLIH+2hwYgiCHTgYLuJNxdKWaDgc4htf0m Ea0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version:content-transfer-encoding; bh=YN/2ALYPOHeIQ0JBw91PgZgk2usughOcWDS6dCss+tI=; b=UAE02IKduPu/xzV6oWNcuiP3s/9JmX5Iu2aDOfuhYE83s6wxIPD6xYW0sV//jcq8i2 Kq6Ko2k6mHWC033WVXvY0EGSYT3PSJbFF3UurxAuGcuLUOXK628netsb9zkwDAOsNOqp +QZHMKf4vpBmxG9+M3YfjPXPnuIzDl9SRAncHtPwzb7rbARMb2DwwDkG7ulGzjkjhwTn NDdwUFHizoB3584pP0FFJhNb0KI1Kl6El3lSBwVIYmLp4doRV2SWmBqXqKxI8aKHBC35 8X9MJ1+9530Yb0MuxcuVuVz7tCN4RiYVCkoPkjqBR+dEuq73kcK14pqDYGbjSi3uSACf 8mrA== X-Gm-Message-State: AOAM5322TgEB+gJB5ogX9I9YsXeDegM+IMsSF/L17OSDJuuOsvKVuPLn pMw8rCT/e3cBI67MOaOEEDU4zw== X-Google-Smtp-Source: ABdhPJxcJByLBwU7op0xaPgFqPeVmRSGIEkXNj1/fN+rI6oItLWTcx+SDCICZJDAteHzOkfU1sIK4A== X-Received: by 2002:a92:d989:: with SMTP id r9mr1890281iln.30.1590655838488; Thu, 28 May 2020 01:50:38 -0700 (PDT) Received: from GuixPotato ([208.89.170.24]) by smtp.gmail.com with ESMTPSA id s26sm2365905iol.10.2020.05.28.01.50.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 May 2020 01:50:37 -0700 (PDT) From: Caleb Ristvedt References: <87h7wsqu50.fsf@cune.org> <87k10xgi1t.fsf@gnu.org> Date: Thu, 28 May 2020 03:50:36 -0500 In-Reply-To: <87k10xgi1t.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Wed, 27 May 2020 23:54:38 +0200") Message-ID: <87mu5se943.fsf@cune.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -1.0 (-) X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=fail (rsa verify failed) header.d=cune-org.20150623.gappssmtp.com header.s=20150623 header.b=qnuv2G5B; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-patches-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-patches-bounces@gnu.org X-Spam-Score: -0.01 X-TUID: 3NnJVdAAtrKN Ludovic Court=C3=A8s writes: > Hi, > > This change had an undesirable effect: the connection would be kept for > the body of =E2=80=98with-temporary-store-file=E2=80=99, during which we= =E2=80=99d call: > > finalize-store-file -> register-path > > which accesses the database. At this point, for each =E2=80=98guix offlo= ad=E2=80=99 > process, we=E2=80=99d thus have the database open twice: once for the ses= sion=E2=80=99s > guix-daemon, and once for that =E2=80=98register-path=E2=80=99 call. If the connection wasn't kept for the body of with-temporary-store-file, the temporary store file wouldn't be protected from GC during the body (the daemon treats unlocked temproots files as "stale"), thus rather defeating the purpose. It makes sense, then, that the connection was also kept for the body prior to this patch - indeed, unless emacs's parenthesis-matching capabilities are failing me, it appears that the body is solidly within the 'with-store' form in 37edbc91e34fb5658261e637e6ffccdb381e5271. > On berlin, the effect is that we see many =E2=80=98guix offload=E2=80=99 = processes > stalled because the SQLite database is busy: ... which makes this quite the mystery indeed. I assume you've tested with the patch reverted and found that this issue goes away? If so, I am very puzzled. One would expect that "database open twice" would tend to have *less* contention issues than "database open at least twice". AFAIK just having the database open doesn't by itself impose any locks. The daemon process we're connected to should have it open, but should just be blocked waiting for our next RPC. Database locks happen when transactions are started (either explicitly or implicitly), and implicitly-started transactions are automatically committed by sqlite (specifically when the statement that started the transaction is either reset or finalized). The only loose end I can think of right now is that call-with-transaction only catches exceptions of type 'sqlite-error, so in theory if a different type of exception were to be thrown, it could exit in a broken state where neither a commit nor a rollback has been performed. Really it should catch all exception types, and use match in the handler to pick out the sqlite-errors. If that were causing the problems, though, we'd expect to see some errors appearing in the offload output. Actually, come to think of it, there could be another issue with call-with-transaction: if somehow it's possible for SQLITE_BUSY errors to occur despite the connection having succeeded with a 'begin immediate;' (which immediately starts a write transaction), then the rollback wouldn't occur, and what should be a failed transaction followed by a successful transaction becomes one long, restarted-in-the-middle transaction. I'm not sure if that's a problem in practice, though. And now that I look at it again, it turns out that most of our database query procedures in (guix store database) aren't finalizing their statements in case of a nonlocal exit... which would tend to happen if, for example, an SQLITE_BUSY error occurred. Which would cause the statement to not be finalized until the garbage collector got ahold of it. But due to statement caching the garbage collector likely won't get ahold of it until the database connection itself is destroyed. The wording at https://www.sqlite.org/lang_transaction.html makes me think that this shouldn't be an issue because the errors we'd expect all seem to roll back automatically, but if we somehow got one that didn't roll back automatically, there would potentially be an extended amount of time before the statement was finalized and the implicit transaction committed. Also, I've noticed that with the way that finalize-store-file is written, we actually already have a database open when we call register-path. This is because it's needed in order to call path-id, but the scope of that with-database form is rather larger than it needs to be. We may have a situation here where things go fine until a single SQLITE_BUSY error is produced by chance, and that causes more SQLITE_BUSY errors, and so on. In summary, there are many things I could imagine going wrong to cause / contribute to the observed behavior, but the patch, barring some absurd guile compilation bug, is not one of them. I do, however, think that (guix store database) needs some love. > They loop pretty much indefinitely on this and nothing (or very little) > happens on the system. To be clear, the nothing-happening status is common to all processes that use the database, including daemon processes? That's quite severe. > I=E2=80=99ll revert this patch but I=E2=80=99m happy to hear what you thi= nk, Caleb. If the data says it's causing those problems, I'd tend to agree with that. I would really like to understand how, though, because even after a few hours of brainstorming bizarre edge cases I still can't come up with a satisfying explanation. > Another reason to implement temp roots in Scheme, as it would allow us > to not open a connection to the daemon from =E2=80=98guix offload=E2=80= =99! Soon=E2=84=A2. Conceptually the code is there, I'm working towards a rebase= that tries to first make the rest of daemon-side guix compatible with fibers - thread pools=E2=9C=93, eval-with-container=E2=9C=93, fibers-friendly wait= pid=E2=9C=93, etc. - reepca