From: David Bremner
To: notmuch@notmuchmail.org
CC: ukleinek@debian.org
Subject: Speedup for deleting files
Date: Thu, 20 Jul 2023 09:07:59 -0300
Message-Id: <20230720120801.2215538-1-david@tethera.net>

Thanks to discussion with Olly Betts and some perf runs, I realized the
current cleanup of deleted files from the database is somewhat
wasteful, since it modifies the message documents (by deleting the
filename) before, in most cases, deleting the whole record.

While trying this out, I triggered what seems to be a bug in existing
notmuch; the code checks for the existence of a certain Xapian term
after deleting the document.

Compared to git master [1], cleaning up 50k of 200k messages now takes
about 44s versus 80s. So not quite a 50% improvement, but not bad. I
would expect a larger relative improvement on larger databases.

[1]: https://notmuchmail.org/perf-test-results/2023-07-18-minkowski/
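
To make the wasted work concrete, here is a toy model of the two cleanup orders. It is only a sketch under stated assumptions: a Python dict stands in for the Xapian database, each counted "write" stands in for one document modification or deletion, and every name in it (ToyIndex, cleanup_edit_then_delete, cleanup_check_then_delete) is hypothetical rather than taken from notmuch.

```python
# Toy model only: a dict stands in for the Xapian database, and each
# counted "write" stands in for one document modification or deletion.
# None of these names come from notmuch.


class ToyIndex:
    def __init__(self, messages):
        # message id -> set of filenames recorded for that message
        self.docs = {mid: set(files) for mid, files in messages.items()}
        self.writes = 0

    def remove_filename(self, mid, filename):
        self.docs[mid].discard(filename)
        self.writes += 1  # rewriting the document costs one write

    def delete_document(self, mid):
        del self.docs[mid]
        self.writes += 1  # deleting the document costs one write


def cleanup_edit_then_delete(index, deleted):
    """Strip the filename first, then delete the document if none remain."""
    for mid, filename in deleted:
        index.remove_filename(mid, filename)
        if not index.docs[mid]:
            index.delete_document(mid)


def cleanup_check_then_delete(index, deleted):
    """Inspect the document first; only edit it if other filenames remain."""
    for mid, filename in deleted:
        # The check runs while the document still exists.
        if index.docs[mid] == {filename}:
            index.delete_document(mid)
        else:
            index.remove_filename(mid, filename)


messages = {"a": ["cur/1"], "b": ["cur/2"], "c": ["cur/3", "new/3"]}
deleted = [("a", "cur/1"), ("b", "cur/2"), ("c", "new/3")]

old = ToyIndex(messages)
cleanup_edit_then_delete(old, deleted)

new = ToyIndex(messages)
cleanup_check_then_delete(new, deleted)

print(old.writes, new.writes)  # prints "5 3"
```

Both orders leave the same final state; the second just skips the edit for any message whose only file was deleted. The toy says nothing about real Xapian costs, only about the number of operations issued.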