From mboxrd@z Thu Jan 1 00:00:00 1970
From: Antoine Beaupré
To: Daniel Kahn Gillmor, notmuch@notmuchmail.org
Subject: Re: compacting the notmuch database through systemd
In-Reply-To:
 <871rtk6k00.fsf@fifthhorseman.net>
References: <87v9qz93b0.fsf@curie.anarc.at> <871rtk6k00.fsf@fifthhorseman.net>
Date: Wed, 04 Dec 2019 14:51:12 -0500
Message-ID: <8736dz98en.fsf@curie.anarc.at>
List-Id: "Use and development of the notmuch mail system."

On 2019-12-04 13:09:03, Daniel Kahn Gillmor wrote:
> Thanks for raising this, Anarcat!
>
> One more advantage that i think you haven't noted yet about regular
> database compaction:
>
> "notmuch compact" tends to get rid of a lot of lingering written data
> that is no longer referenced.  While this isn't robust "secure
> deletion", it's a lot better than not compacting.  see
> https://trac.xapian.org/ticket/742 for more discussion.

Cool.

> Some questions below…
>
> On Sun 2019-12-01 15:52:19 -0500, Antoine Beaupré wrote:
>
>> Thanks to Bremner, I just realized that notmuch-compact(1) is a thing,
>> and that thing allows me to compress my notmuch databases by about 50%.
>
> do you know why you get the large size/speed gain?

Not sure, but my guess is that I had never run notmuch-compact(1)
before, as far as I can remember.

> do you regularly delete files from your message archive?

Yes, all the time. I have had `d` mapped to `+deleted` basically
forever, and have a pre-new hook that actually deletes those messages
from the archive. Yes, I am a heretic. ;)

>> So I whipped together two systemd units (attached) that will run that
>> command every month on my notmuch database.
>> Just drop them in
>> `~/.config/systemd/user/` and run:
>>
>>     systemctl --user daemon-reload
>>     systemctl --user enable notmuch-compact.timer
>>     systemctl --user start notmuch-compact.timer
>
> ("systemctl --user enable --now notmuch-compact.timer" will suffice for
> the final two commands on any reasonably modern version of systemd)

Whoa. TIL.

> How long does it take for these the notmuch-compact.service to complete?

I don't remember... it took less than a minute on the first run, I
think.

> What happens if this is happening when, say, you put your machine to
> sleep, or you power it down?

No idea. I think it's an atomic process, as notmuch-compact(1) says:

    The compacted database is built in a temporary directory and is
    later moved into the place of the origin database. The original
    uncompacted database is discarded, unless the --backup= option
    is used.

> While notmuch-compact.service is running, does "notmuch new" or "notmuch
> insert" work?  If not, how do they fail (e.g. blocking indefinitely,
> returning a comprehensible error message)?

No idea. Manpage says:

    Note that the database write lock will be held during the
    compaction process (which may be quite long) to protect data
    integrity.

> Can you read your mail while notmuch-compact.service is running?

I don't see why not, but I haven't tried. Considering it only runs once
a month, it would seem like a small tradeoff even if that did cause
problems.

>> Maybe those could be shipped with the Debian package somehow? Not sure
>> how that works, but I think that's how gpg-agent gets started now, if
>> you want any inspiration...
>
> gpg-agent is socket-activated, which is different from the
> timer-activation you are proposing here.

I thought about socket activation, but I don't think it would work in
this case.

> We could easily ship these systemd user unit files in the notmuch
> package now that #764678 is resolved.  Do you think that the timer
> should be enabled by default?

Sure, I don't see why not, unless we have concerns about
notmuch-compact(1) being unsafe or counter-productive.

> What should happen if the user hasn't set up notmuch?  Maybe we need a
> ConditionPathExists= or something like that on either the .timer or the
> .service?

Maybe something like this? It uses the %h specifier for the home
directory, since systemd does not expand $HOME in condition paths:

    ConditionPathExists=%h/.notmuch-config

> Do we expect this to run even when the user isn't logged in at all (a
> background compaction?)

Maybe not? No idea.

> it always gets more complex when you think about trying to do it at
> scale :)

Yes.

>> It would be great if notmuch-new ran this on its own, when it
>> thought that this was "important", somehow like git-gc sometimes runs on
>> its own.
>
> I'm not convinced i like this idea without more profiling and an
> understanding of what it might cause.  I have grown to *really* dislike
> the highly variable latency and warnings caused by GnuPG's
> "auto-check-trustdb", for example (especially as the keyring grows
> larger).

Again, tradeoffs: I prefer to have my trustdb actually checked once in a
while (right?) and not pay that latency cost at some random gpg
invocation (which seems to happen all the time). So I disable the
built-in, inline checks and queue them in a timer instead.

>> [ notmuch-compact.timer: text/plain ]
>> [Unit]
>> Description=compact the notmuch database
>
> systemd timer unit descriptions typically include some mention of the
> duration.  See for example:
>
>     /lib/systemd/system/systemd-tmpfiles-clean.timer
>     "Daily Cleanup of Temporary Directories"
>
>     /lib/systemd/system/certbot.timer
>     "Run certbot twice daily"
>
>     /lib/systemd/system/phpsessionclean.timer
>     "Clean PHP session files every 30 mins"
>
> I recommend:
>
>     Description=Compact the notmuch database every month

Cool.

>> [ notmuch-compact.service: text/plain ]
>> [Unit]
>> Description=compact the notmuch database
>
> The convention is to lead with an upper-case letter:
>
>     Description=Compact the notmuch database

Yay!
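
For reference, here is a rough sketch of how the two units might look
with the review comments folded in. It is not a copy of the attached
files: the `OnCalendar=monthly` schedule, `Persistent=true`, the
`/usr/bin/notmuch` path and the `~/.notmuch-config` location are all
assumptions, and the two parts belong in two separate files.

```ini
# File 1 (assumed location): ~/.config/systemd/user/notmuch-compact.timer
[Unit]
Description=Compact the notmuch database every month
# Skip the timer if notmuch was never configured for this user.
# %h expands to the user's home directory; the config path is an assumption.
ConditionPathExists=%h/.notmuch-config

[Timer]
# Assumption: a monthly calendar schedule, matching the description above.
OnCalendar=monthly
# Assumption: catch up on a missed run once the user manager starts again.
Persistent=true

[Install]
WantedBy=timers.target

# File 2 (assumed location): ~/.config/systemd/user/notmuch-compact.service
[Unit]
Description=Compact the notmuch database
ConditionPathExists=%h/.notmuch-config

[Service]
Type=oneshot
# Assumption: notmuch is installed at /usr/bin/notmuch.
ExecStart=/usr/bin/notmuch compact
```

With both files in `~/.config/systemd/user/`, the
`systemctl --user enable --now notmuch-compact.timer` command suggested
earlier should be enough to activate the timer.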

> OK OK enough with the nit-picking!

Thanks for the review!

a.

-- 
The adversary of true liberty is an excessive desire for security.
                        - Jean de la Fontaine