unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: [PATCH 3/4] CLI/git: current cache contents (file list) of index
Date: Sun,  3 Jul 2022 12:11:02 -0300
Message-ID: <20220703151103.1800726-4-david@tethera.net>
In-Reply-To: <20220703151103.1800726-1-david@tethera.net>

Rather than shelling out once per message to get the list of files
corresponding to tags, it is much faster (although potentially a bit
memory intensive) to read them all at once.
---
 notmuch-git.py | 58 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/notmuch-git.py b/notmuch-git.py
index b3ae044e..a3ae15f7 100644
--- a/notmuch-git.py
+++ b/notmuch-git.py
@@ -738,6 +738,7 @@ class PrivateIndex:
         self.lastmod = None
         self.checksum = None
         self._load_cache_file()
+        self.file_tree = None
         self._index_tags()
 
     def __enter__(self):
@@ -763,6 +764,43 @@ class PrivateIndex:
             _LOG.error("Error decoding cache")
             _sys.exit(1)
 
+    @timed
+    def _read_file_tree(self):
+        self.file_tree = {}
+
+        with _git(
+                args = ['ls-files', 'tags'],
+                additional_env = {'GIT_INDEX_FILE': self.index_path},
+                stdout = _subprocess.PIPE) as git:
+            for file in git.stdout:
+                dir = _os.path.dirname(file)
+                tag = _os.path.basename(file).rstrip()
+                if dir not in self.file_tree:
+                    self.file_tree[dir] = [tag]
+                else:
+                    self.file_tree[dir].append(tag)
+
+
+    def _clear_tags_for_message(self, id):
+        """
+        Clear any existing index entries for message 'id'
+
+        'id' should not be encoded/escaped.
+        """
+
+        if self.file_tree is None:
+            self._read_file_tree()
+
+        dir = _id_path(id)
+
+        if dir not in self.file_tree:
+            return
+
+        for file in self.file_tree[dir]:
+            line = '0 0000000000000000000000000000000000000000\t{:s}/{:s}\n'.format(dir,file)
+            yield line
+
+
     @timed
     def _index_tags(self):
         "Write notmuch tags to private git index."
@@ -798,7 +836,7 @@ class PrivateIndex:
                         if tag.startswith(prefix)]
                     id = _xapian_unquote(string=id)
                     if clear_tags:
-                        for line in _clear_tags_for_message(index=self.index_path, id=id):
+                        for line in self._clear_tags_for_message(id=id):
                             git.stdin.write(line)
                     for line in _index_tags_for_message(
                             id=id, status='A', tags=tags):
@@ -835,24 +873,6 @@ def _read_index_checksum (index_path):
     except FileNotFoundError:
         return None
 
-
-def _clear_tags_for_message(index, id):
-    """
-    Clear any existing index entries for message 'id'
-
-    Neither 'id' nor the tags in 'tags' should be encoded/escaped.
-    """
-
-    dir = _id_path(id)
-
-    with _git(
-            args=['ls-files', dir],
-            additional_env={'GIT_INDEX_FILE': index},
-            stdout=_subprocess.PIPE) as git:
-        for file in git.stdout:
-            line = '0 0000000000000000000000000000000000000000\t{:s}\n'.format(file.strip())
-            yield line
-
 def _read_database_lastmod():
     with _spawn(
             args=['notmuch', 'count', '--lastmod', '*'],
-- 
2.35.2
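
For readers who want to try the idea outside notmuch-git.py, here is a
minimal standalone sketch of the approach the patch takes: read one
listing, group it by directory in memory, and then answer each
per-message lookup from that dictionary instead of spawning git again.
The names build_file_tree, removal_lines and NULL_SHA1, and the sample
paths, are hypothetical and do not appear in the patch. The line format
mirrors the one the patch emits, which I believe is the
"mode SP sha1 TAB path" form read by `git update-index --index-info`,
where mode 0 removes the path.

```python
import os
from collections import defaultdict

# All-zero object id, as used in the removal lines the patch emits.
NULL_SHA1 = '0' * 40


def build_file_tree(lines):
    """Group `git ls-files` style output by directory.

    `lines` is any iterable of newline-terminated paths; the result
    maps each directory to the list of file names found in it.
    """
    tree = defaultdict(list)
    for line in lines:
        path = line.rstrip('\n')
        tree[os.path.dirname(path)].append(os.path.basename(path))
    return dict(tree)


def removal_lines(tree, directory):
    """Yield one removal line per cached file in `directory`."""
    for name in tree.get(directory, []):
        yield '0 {}\t{}/{}\n'.format(NULL_SHA1, directory, name)


# Hypothetical listing, standing in for the output of a single
# `git ls-files tags` call against the private index.
listing = [
    'tags/id-one/inbox\n',
    'tags/id-one/unread\n',
    'tags/id-two/inbox\n',
]
tree = build_file_tree(listing)
for line in removal_lines(tree, 'tags/id-one'):
    print(line, end='')
```

The trade-off is the one noted in the commit message: the whole listing
is held in memory, in exchange for one subprocess instead of one per
message.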

Thread overview: 14+ messages
2022-07-03 15:10 performance improvements for notmuch git checkout David Bremner
2022-07-03 15:11 ` [PATCH 1/4] debian: add git as a build-dependency, for the test suite David Bremner
2022-07-07 10:30   ` David Bremner
2022-07-03 15:11 ` [PATCH 2/4] perf-test: add tests for notmuch-git David Bremner
2022-07-05 16:40   ` Tomi Ollila
2022-07-07  9:47     ` David Bremner
2022-07-03 15:11 ` David Bremner [this message]
2022-07-03 15:11 ` [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access David Bremner
2022-07-07 14:51   ` Tomi Ollila
2022-07-07 15:59     ` David Bremner
2022-07-09 19:35       ` Michael J Gruber
2022-07-15 14:43   ` [PATCH v2] CLI/git: opportunistically use bindings to check for known messages David Bremner
2022-07-16 19:23     ` Tomi Ollila
2022-07-17  0:51       ` David Bremner
