unofficial mirror of notmuch@notmuchmail.org
* performance improvements for notmuch git checkout
@ 2022-07-03 15:10 David Bremner
  2022-07-03 15:11 ` [PATCH 1/4] debian: add git as a build-dependency, for the test suite David Bremner
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: David Bremner @ 2022-07-03 15:10 UTC (permalink / raw)
  To: notmuch

This series speeds up "notmuch git checkout" from "intolerably slow"
to "OKish" on my mail.

[PATCH 1/4] debian: add git as a build-dependency, for the test suite

       This is unrelated to the rest of the series, but needed to build the Debian package.

[PATCH 2/4] perf-test: add tests for notmuch-git

       Mainly commit and checkout the entire corpus.

[PATCH 3/4] CLI/git: current cache contents (file list) of index

       Remove many (1 per message) execs of git-ls-files

[PATCH 4/4] CLI/git: replace calls to notmuch-search with database access

       Remove many (1 per message) execs of notmuch-search


* [PATCH 1/4] debian: add git as a build-dependency, for the test suite
  2022-07-03 15:10 performance improvements for notmuch git checkout David Bremner
@ 2022-07-03 15:11 ` David Bremner
  2022-07-07 10:30   ` David Bremner
  2022-07-03 15:11 ` [PATCH 2/4] perf-test: add tests for notmuch-git David Bremner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: David Bremner @ 2022-07-03 15:11 UTC (permalink / raw)
  To: notmuch

This is needed to run (and test) notmuch-git.
---
 debian/control | 1 +
 1 file changed, 1 insertion(+)

diff --git a/debian/control b/debian/control
index 9706b0f7..0ffe958c 100644
--- a/debian/control
+++ b/debian/control
@@ -20,6 +20,7 @@ Build-Depends:
  dtach (>= 0.8) <!nocheck>,
  emacs-nox | emacs-gtk | emacs-lucid | emacs25-nox | emacs25 (>=25~) | emacs25-lucid (>=25~) | emacs24-nox | emacs24 (>=24~) | emacs24-lucid (>=24~),
  gdb [!ia64 !mips !mips64el !kfreebsd-any !alpha !hppa] <!nocheck>,
+ git <!nocheck>,
  gnupg <!nocheck>,
  gpgsm <!nocheck>,
  libgmime-3.0-dev (>= 3.0.3~),
-- 
2.35.2


* [PATCH 2/4] perf-test: add tests for notmuch-git
  2022-07-03 15:10 performance improvements for notmuch git checkout David Bremner
  2022-07-03 15:11 ` [PATCH 1/4] debian: add git as a build-dependency, for the test suite David Bremner
@ 2022-07-03 15:11 ` David Bremner
  2022-07-05 16:40   ` Tomi Ollila
  2022-07-03 15:11 ` [PATCH 3/4] CLI/git: current cache contents (file list) of index David Bremner
  2022-07-03 15:11 ` [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access David Bremner
  3 siblings, 1 reply; 14+ messages in thread
From: David Bremner @ 2022-07-03 15:11 UTC (permalink / raw)
  To: notmuch

The main focus of these initial tests is the (currently unacceptably
slow) checkout performance.
---
 performance-test/T07-git.sh | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100755 performance-test/T07-git.sh

diff --git a/performance-test/T07-git.sh b/performance-test/T07-git.sh
new file mode 100755
index 00000000..11dfec05
--- /dev/null
+++ b/performance-test/T07-git.sh
@@ -0,0 +1,23 @@
+#!/usr/bin/env bash
+
+test_description='notmuch-git'
+
+. $(dirname "$0")/perf-test-lib.sh || exit 1
+
+time_start
+
+time_run 'init' "notmuch git init"
+
+time_run 'commit --force' "notmuch git commit --force"
+time_run 'commit' "notmuch git -l error commit"
+time_run 'commit' "notmuch git -l error commit"
+
+time_run 'checkout' "notmuch git checkout"
+
+time_run 'tag -inbox' "notmuch tag -inbox '*'"
+
+time_run 'checkout --force' "notmuch git checkout --force"
+
+
+
+time_done
-- 
2.35.2


* [PATCH 3/4] CLI/git: current cache contents (file list) of index
  2022-07-03 15:10 performance improvements for notmuch git checkout David Bremner
  2022-07-03 15:11 ` [PATCH 1/4] debian: add git as a build-dependency, for the test suite David Bremner
  2022-07-03 15:11 ` [PATCH 2/4] perf-test: add tests for notmuch-git David Bremner
@ 2022-07-03 15:11 ` David Bremner
  2022-07-03 15:11 ` [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access David Bremner
  3 siblings, 0 replies; 14+ messages in thread
From: David Bremner @ 2022-07-03 15:11 UTC (permalink / raw)
  To: notmuch

Rather than shelling out once per message to get the list of files
corresponding to tags, it is much faster (although potentially a bit
memory intensive) to read them all at once.
---
 notmuch-git.py | 58 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/notmuch-git.py b/notmuch-git.py
index b3ae044e..a3ae15f7 100644
--- a/notmuch-git.py
+++ b/notmuch-git.py
@@ -738,6 +738,7 @@ class PrivateIndex:
         self.lastmod = None
         self.checksum = None
         self._load_cache_file()
+        self.file_tree = None
         self._index_tags()
 
     def __enter__(self):
@@ -763,6 +764,43 @@ class PrivateIndex:
             _LOG.error("Error decoding cache")
             _sys.exit(1)
 
+    @timed
+    def _read_file_tree(self):
+        self.file_tree = {}
+
+        with _git(
+                args = ['ls-files', 'tags'],
+                additional_env = {'GIT_INDEX_FILE': self.index_path},
+                stdout = _subprocess.PIPE) as git:
+            for file in git.stdout:
+                dir = _os.path.dirname(file)
+                tag = _os.path.basename(file).rstrip()
+                if dir not in self.file_tree:
+                    self.file_tree[dir] = [tag]
+                else:
+                    self.file_tree[dir].append(tag)
+
+
+    def _clear_tags_for_message(self, id):
+        """
+        Clear any existing index entries for message 'id'
+
+        Neither 'id' nor the tags in 'tags' should be encoded/escaped.
+        """
+
+        if self.file_tree is None:
+            self._read_file_tree()
+
+        dir = _id_path(id)
+
+        if dir not in self.file_tree:
+            return
+
+        for file in self.file_tree[dir]:
+            line = '0 0000000000000000000000000000000000000000\t{:s}/{:s}\n'.format(dir,file)
+            yield line
+
+
     @timed
     def _index_tags(self):
         "Write notmuch tags to private git index."
@@ -798,7 +836,7 @@ class PrivateIndex:
                         if tag.startswith(prefix)]
                     id = _xapian_unquote(string=id)
                     if clear_tags:
-                        for line in _clear_tags_for_message(index=self.index_path, id=id):
+                        for line in self._clear_tags_for_message(id=id):
                             git.stdin.write(line)
                     for line in _index_tags_for_message(
                             id=id, status='A', tags=tags):
@@ -835,24 +873,6 @@ def _read_index_checksum (index_path):
     except FileNotFoundError:
         return None
 
-
-def _clear_tags_for_message(index, id):
-    """
-    Clear any existing index entries for message 'id'
-
-    Neither 'id' nor the tags in 'tags' should be encoded/escaped.
-    """
-
-    dir = _id_path(id)
-
-    with _git(
-            args=['ls-files', dir],
-            additional_env={'GIT_INDEX_FILE': index},
-            stdout=_subprocess.PIPE) as git:
-        for file in git.stdout:
-            line = '0 0000000000000000000000000000000000000000\t{:s}\n'.format(file.strip())
-            yield line
-
 def _read_database_lastmod():
     with _spawn(
             args=['notmuch', 'count', '--lastmod', '*'],
-- 
2.35.2
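
The idea in this patch, reading the whole file list once and grouping it
by directory, can be sketched in isolation. This is only an illustrative
sketch, not the code from notmuch-git.py: the names build_file_tree and
clear_lines and the sample paths are hypothetical, and the input is assumed
to be text lines such as a text-mode pipe from "git ls-files" would yield.

```python
import os
from collections import defaultdict


def build_file_tree(lines):
    """Group path lines by directory: {dirname: [basename, ...]}."""
    tree = defaultdict(list)
    for line in lines:
        path = line.rstrip('\n')
        tree[os.path.dirname(path)].append(os.path.basename(path))
    return dict(tree)


def clear_lines(tree, directory):
    """Yield one removal line per cached file, in the format the patch emits."""
    for name in tree.get(directory, []):
        yield '0 {}\t{}/{}\n'.format('0' * 40, directory, name)


# Hypothetical sample input; real paths come from the private index.
sample = [
    'tags/ab/cd/msg1/inbox\n',
    'tags/ab/cd/msg1/unread\n',
    'tags/ef/gh/msg2/inbox\n',
]
tree = build_file_tree(sample)
```

Each later lookup is then a dictionary access rather than a new process,
at the cost of holding every path in memory.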


* [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access
  2022-07-03 15:10 performance improvements for notmuch git checkout David Bremner
                   ` (2 preceding siblings ...)
  2022-07-03 15:11 ` [PATCH 3/4] CLI/git: current cache contents (file list) of index David Bremner
@ 2022-07-03 15:11 ` David Bremner
  2022-07-07 14:51   ` Tomi Ollila
  2022-07-15 14:43   ` [PATCH v2] CLI/git: opportunistically use bindings to check for known messages David Bremner
  3 siblings, 2 replies; 14+ messages in thread
From: David Bremner @ 2022-07-03 15:11 UTC (permalink / raw)
  To: notmuch

This introduces a dependency on the (new) python bindings, but since
it also yields a 4x performance improvement on the large performance
corpus, I think it is worth it.
---
 debian/control   |  1 +
 notmuch-git.py   | 18 +++++++++---------
 test/T850-git.sh |  2 ++
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/debian/control b/debian/control
index 0ffe958c..7099fe97 100644
--- a/debian/control
+++ b/debian/control
@@ -73,6 +73,7 @@ Depends:
  git,
  notmuch,
  python3,
+ python3-notmuch2,
  ${misc:Depends}
 Description: thread-based email index, search and tagging
  Notmuch is a system for indexing, searching, reading, and tagging
diff --git a/notmuch-git.py b/notmuch-git.py
index a3ae15f7..eac24a46 100644
--- a/notmuch-git.py
+++ b/notmuch-git.py
@@ -701,21 +701,21 @@ def _is_unmerged(ref='@{upstream}'):
 
 @timed
 def get_status():
+    from notmuch2 import Database
+
     status = {
         'deleted': {},
         'missing': {},
         }
     with PrivateIndex(repo=NOTMUCH_GIT_DIR, prefix=TAG_PREFIX) as index:
         maybe_deleted = index.diff(filter='D')
-        for id, tags in maybe_deleted.items():
-            (_, stdout, stderr) = _spawn(
-                args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)],
-                stdout=_subprocess.PIPE,
-                wait=True)
-            if stdout:
-                status['deleted'][id] = tags
-            else:
-                status['missing'][id] = tags
+        with Database() as notmuch:
+            for id, tags in maybe_deleted.items():
+                try:
+                    _ =  notmuch.find(id)
+                    status['deleted'][id] = tags
+                except LookupError:
+                    status['missing'][id] = tags
         status['added'] = index.diff(filter='A')
 
     return status
diff --git a/test/T850-git.sh b/test/T850-git.sh
index 81400328..b1bc9e7e 100755
--- a/test/T850-git.sh
+++ b/test/T850-git.sh
@@ -7,6 +7,8 @@ if [ $NOTMUCH_HAVE_SFSEXP -ne 1 ]; then
     test_done
 fi
 
+export PYTHONPATH="$NOTMUCH_BUILDDIR/bindings/python-cffi/build/stage${PYTHONPATH:+:$PYTHONPATH}"
+
 # be very careful using backup_database / restore_database in this
 # file, as they fool the cache invalidation checks in notmuch-git.
 
-- 
2.35.2
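
The classification step in get_status() can be exercised without the
bindings installed. The sketch below is an illustration under assumptions,
not the patch itself: classify is a hypothetical helper, and the "database"
is a plain dict standing in for notmuch2.Database, relying only on the
behaviour the patch uses (a lookup that raises LookupError for unknown ids).

```python
def classify(maybe_deleted, find):
    """Split ids into 'deleted' (known to the database) and 'missing'.

    'find' is any callable that returns for a known id and raises
    LookupError otherwise.
    """
    status = {'deleted': {}, 'missing': {}}
    for msg_id, tags in maybe_deleted.items():
        try:
            find(msg_id)
            status['deleted'][msg_id] = tags
        except LookupError:
            status['missing'][msg_id] = tags
    return status


# Hypothetical stand-in for the database. dict.__getitem__ raises KeyError,
# which is a subclass of LookupError.
fake_db = {'known@example.org': object()}
result = classify(
    {'known@example.org': ['inbox'], 'gone@example.org': ['unread']},
    fake_db.__getitem__)
```

With the bindings available, the same helper could presumably be given the
find method of an open Database in place of the dict lookup.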


* Re: [PATCH 2/4] perf-test: add tests for notmuch-git
  2022-07-03 15:11 ` [PATCH 2/4] perf-test: add tests for notmuch-git David Bremner
@ 2022-07-05 16:40   ` Tomi Ollila
  2022-07-07  9:47     ` David Bremner
  0 siblings, 1 reply; 14+ messages in thread
From: Tomi Ollila @ 2022-07-05 16:40 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sun, Jul 03 2022, David Bremner wrote:

> The main focus of these initial tests is the (currently unacceptably
> slow) checkout performance.
> ---
>  performance-test/T07-git.sh | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>  create mode 100755 performance-test/T07-git.sh
>
> diff --git a/performance-test/T07-git.sh b/performance-test/T07-git.sh
> new file mode 100755
> index 00000000..11dfec05
> --- /dev/null
> +++ b/performance-test/T07-git.sh
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env bash
> +
> +test_description='notmuch-git'
> +
> +. $(dirname "$0")/perf-test-lib.sh || exit 1
> +
> +time_start
> +
> +time_run 'init' "notmuch git init"
> +
> +time_run 'commit --force' "notmuch git commit --force"
> +time_run 'commit' "notmuch git -l error commit"
> +time_run 'commit' "notmuch git -l error commit"
> +
> +time_run 'checkout' "notmuch git checkout"
> +
> +time_run 'tag -inbox' "notmuch tag -inbox '*'"
> +
> +time_run 'checkout --force' "notmuch git checkout --force"
> +
> +
> +

Is three empty lines a bit excessive...?

> +time_done
> -- 
> 2.35.2
>

* Re: [PATCH 2/4] perf-test: add tests for notmuch-git
  2022-07-05 16:40   ` Tomi Ollila
@ 2022-07-07  9:47     ` David Bremner
  0 siblings, 0 replies; 14+ messages in thread
From: David Bremner @ 2022-07-07  9:47 UTC (permalink / raw)
  To: Tomi Ollila, notmuch

Tomi Ollila <tomi.ollila@iki.fi> writes:

>> +
>> +
>
> Is three empty lines a bit excessive...?
>
>> +time_done

Fixed in git.

d


* Re: [PATCH 1/4] debian: add git as a build-dependency, for the test suite
  2022-07-03 15:11 ` [PATCH 1/4] debian: add git as a build-dependency, for the test suite David Bremner
@ 2022-07-07 10:30   ` David Bremner
  0 siblings, 0 replies; 14+ messages in thread
From: David Bremner @ 2022-07-07 10:30 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> This is needed to run (and test) notmuch-git.

applied this one patch to master


* Re: [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access
  2022-07-03 15:11 ` [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access David Bremner
@ 2022-07-07 14:51   ` Tomi Ollila
  2022-07-07 15:59     ` David Bremner
  2022-07-15 14:43   ` [PATCH v2] CLI/git: opportunistically use bindings to check for known messages David Bremner
  1 sibling, 1 reply; 14+ messages in thread
From: Tomi Ollila @ 2022-07-07 14:51 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sun, Jul 03 2022, David Bremner wrote:

> This introduces a dependency on the (new) python bindings, but since
> it also yields a 4x performance improvement on the large performance
> corpus, I think it is worth it.
> ---
>  debian/control   |  1 +
>  notmuch-git.py   | 18 +++++++++---------
>  test/T850-git.sh |  2 ++
>  3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/debian/control b/debian/control
> index 0ffe958c..7099fe97 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -73,6 +73,7 @@ Depends:
>   git,
>   notmuch,
>   python3,
> + python3-notmuch2,
>   ${misc:Depends}
>  Description: thread-based email index, search and tagging
>   Notmuch is a system for indexing, searching, reading, and tagging
> diff --git a/notmuch-git.py b/notmuch-git.py
> index a3ae15f7..eac24a46 100644
> --- a/notmuch-git.py
> +++ b/notmuch-git.py
> @@ -701,21 +701,21 @@ def _is_unmerged(ref='@{upstream}'):
>  
>  @timed
>  def get_status():
> +    from notmuch2 import Database
> +
>      status = {
>          'deleted': {},
>          'missing': {},
>          }
>      with PrivateIndex(repo=NOTMUCH_GIT_DIR, prefix=TAG_PREFIX) as index:
>          maybe_deleted = index.diff(filter='D')
> -        for id, tags in maybe_deleted.items():
> -            (_, stdout, stderr) = _spawn(
> -                args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)],
> -                stdout=_subprocess.PIPE,
> -                wait=True)
> -            if stdout:
> -                status['deleted'][id] = tags
> -            else:
> -                status['missing'][id] = tags
> +        with Database() as notmuch:
> +            for id, tags in maybe_deleted.items():
> +                try:
> +                    _ =  notmuch.find(id)

One extra space above.

My review stalled here, at the newly introduced Python bindings
dependency -- it makes it much harder to get a working notmuch-git
installation...

... currently it is enough that the notmuch(1) binary is in PATH -- I
personally just build new versions of notmuch and copy the binary to
$HOME/bin and it just works(tm). Same with nmbug. To get the Python
bindings to work one has to be able to build the C module, and then copy
the set of Python files (or make a zip archive) to a directory (along with
that C module -- where that got copied I cannot remember now...:)

I've been trying to think whether there is a way to somehow run only one
notmuch command instead of notmuch search on all maybe-deleted files -- or
alternatively attempt to load the Python bindings and in case of failure
use the notmuch-search method...

Tomi

> +                    status['deleted'][id] = tags
> +                except LookupError:
> +                    status['missing'][id] = tags
>          status['added'] = index.diff(filter='A')
>  
>      return status
> diff --git a/test/T850-git.sh b/test/T850-git.sh
> index 81400328..b1bc9e7e 100755
> --- a/test/T850-git.sh
> +++ b/test/T850-git.sh
> @@ -7,6 +7,8 @@ if [ $NOTMUCH_HAVE_SFSEXP -ne 1 ]; then
>      test_done
>  fi
>  
> +export PYTHONPATH="$NOTMUCH_BUILDDIR/bindings/python-cffi/build/stage${PYTHONPATH:+:$PYTHONPATH}"
> +
>  # be very careful using backup_database / restore_database in this
>  # file, as they fool the cache invalidation checks in notmuch-git.
>  
> -- 
> 2.35.2
>

* Re: [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access
  2022-07-07 14:51   ` Tomi Ollila
@ 2022-07-07 15:59     ` David Bremner
  2022-07-09 19:35       ` Michael J Gruber
  0 siblings, 1 reply; 14+ messages in thread
From: David Bremner @ 2022-07-07 15:59 UTC (permalink / raw)
  To: Tomi Ollila, notmuch

Tomi Ollila <tomi.ollila@iki.fi> writes:

> On Sun, Jul 03 2022, David Bremner wrote:
>
> I've been trying to think whether there is a way to somehow run only one
> notmuch command instead of notmuch search on all maybe-deleted files -- or
> alternatively attempt to load the Python bindings and in case of failure
> use the notmuch-search method...
>

We could run one notmuch-dump command to get a list of the message-ids
in the database, and build a dictionary in memory. Might be a bit slow?
Here it would take 10-15s just to do the dump. But certainly faster than
500K execs of notmuch search.
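
A rough sketch of that idea, collecting the ids from dump output into an
in-memory set, might look like the following. It is untested against a real
database and rests on assumptions: ids_from_dump is a hypothetical helper,
the input is assumed to be in the default batch-tag layout
("<tags> -- id:<message-id>" with "#" header lines), and quoted or
hex-encoded ids are kept verbatim rather than decoded.

```python
def ids_from_dump(lines):
    """Collect message ids from 'notmuch dump' style text lines."""
    known = set()
    for line in lines:
        line = line.rstrip('\n')
        # Skip blank lines and '#' header/comment lines.
        if not line or line.startswith('#'):
            continue
        _, sep, msg_id = line.partition('-- id:')
        if sep:
            known.add(msg_id)
    return known


# Hypothetical sample lines; real input would be read from one
# 'notmuch dump' process.
sample = [
    '#notmuch-dump batch-tag:3 config,properties,tags\n',
    '+inbox +unread -- id:one@example.org\n',
    ' -- id:two@example.org\n',
]
known = ids_from_dump(sample)
```

Membership tests against the set then replace the per-message search, with
the one-time cost of the dump itself.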


* Re: [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access
  2022-07-07 15:59     ` David Bremner
@ 2022-07-09 19:35       ` Michael J Gruber
  0 siblings, 0 replies; 14+ messages in thread
From: Michael J Gruber @ 2022-07-09 19:35 UTC (permalink / raw)
  To: David Bremner; +Cc: Tomi Ollila, notmuch

On Thu, 7 July 2022 at 17:59, David Bremner <david@tethera.net> wrote:
>
> Tomi Ollila <tomi.ollila@iki.fi> writes:
>
> > On Sun, Jul 03 2022, David Bremner wrote:
> >
> > I've been trying to think whether there is a way to somehow run only one
> > notmuch command instead of notmuch search on all maybe-deleted files -- or
> > alternatively attempt to load the Python bindings and in case of failure
> > use the notmuch-search method...
> >
>
> We could run one notmuch-dump command to get a list of the message-ids
> in the database, and build a dictionary in memory. Might be a bit slow?
> Here it would take 10-15s just to do the dump. But certainly faster than
> 500K execs of notmuch search.

When I first saw that `notmuch-git` is implemented in python and calls
out to `notmuch search` I wondered: Why doesn't it use the python
bindings?

I don't think "building the project partially and installing by
copying parts somewhere" is a use case that the design or implementation
has to cater for, especially if this incurs performance penalties.
Scripting around `notmuch dump` does not make things better.

I do understand that you want lean dependencies server side, but
having python there isn't really uncommon, is it?

If building and installing from git via `make install` is too much of
a hassle we should probably work on reducing the hassle ;)

Cheers
Michael


* [PATCH v2] CLI/git: opportunistically use bindings to check for known messages
  2022-07-03 15:11 ` [PATCH 4/4] CLI/git: replace calls to notmuch-search with database access David Bremner
  2022-07-07 14:51   ` Tomi Ollila
@ 2022-07-15 14:43   ` David Bremner
  2022-07-16 19:23     ` Tomi Ollila
  1 sibling, 1 reply; 14+ messages in thread
From: David Bremner @ 2022-07-15 14:43 UTC (permalink / raw)
  To: David Bremner, notmuch

If the bindings are installed, use them to avoid one exec of notmuch
search per message.
---
 notmuch-git.py | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

I decided to leave the old (slow) path in for now, since it is fast
enough for use of nmbug to manage notmuch development tags.


diff --git a/notmuch-git.py b/notmuch-git.py
index 4d9887c8..ceb86fbc 100644
--- a/notmuch-git.py
+++ b/notmuch-git.py
@@ -698,6 +698,32 @@ def _is_unmerged(ref='@{upstream}'):
         stdout=_subprocess.PIPE, wait=True)
     return base != fetch_head
 
+class DatabaseCache:
+    def __init__(self):
+        try:
+            from notmuch2 import Database
+            self._notmuch = Database()
+        except ImportError:
+            self._notmuch = None
+        self._known = {}
+
+    def known(self,id):
+        if id in self._known:
+            return self._known[id]
+
+        if self._notmuch:
+            try:
+                _ = self._notmuch.find(id)
+                self._known[id] = True
+            except LookupError:
+                self._known[id] = False
+        else:
+            (_, stdout, stderr) = _spawn(
+                args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)],
+                stdout=_subprocess.PIPE,
+                wait=True)
+            self._known[id] = bool(stdout)
+        return self._known[id]
 
 @timed
 def get_status():
@@ -705,14 +731,11 @@ def get_status():
         'deleted': {},
         'missing': {},
         }
+    db = DatabaseCache()
     with PrivateIndex(repo=NOTMUCH_GIT_DIR, prefix=TAG_PREFIX) as index:
         maybe_deleted = index.diff(filter='D')
         for id, tags in maybe_deleted.items():
-            (_, stdout, stderr) = _spawn(
-                args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)],
-                stdout=_subprocess.PIPE,
-                wait=True)
-            if stdout:
+            if db.known(id):
                 status['deleted'][id] = tags
             else:
                 status['missing'][id] = tags
-- 
2.35.1


* Re: [PATCH v2] CLI/git: opportunistically use bindings to check for known messages
  2022-07-15 14:43   ` [PATCH v2] CLI/git: opportunistically use bindings to check for known messages David Bremner
@ 2022-07-16 19:23     ` Tomi Ollila
  2022-07-17  0:51       ` David Bremner
  0 siblings, 1 reply; 14+ messages in thread
From: Tomi Ollila @ 2022-07-16 19:23 UTC (permalink / raw)
  To: David Bremner, notmuch

On Fri, Jul 15 2022, David Bremner wrote:

> If the bindings are installed, use them to avoid one exec of notmuch
> search per message.

tnx. continues to work for me where I have symlink to nmbug in ~/bin/.
some time in the future i'll investigate whether i get 
python3 path/to/nmbug.zip ... working but not today...

the series looks good to me.

Tomi

> ---
>  notmuch-git.py | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
>
> I decided to leave the old (slow) path in for now, since it is fast
> enough for use of nmbug to manage notmuch development tags.
>
>
> diff --git a/notmuch-git.py b/notmuch-git.py
> index 4d9887c8..ceb86fbc 100644
> --- a/notmuch-git.py
> +++ b/notmuch-git.py
> @@ -698,6 +698,32 @@ def _is_unmerged(ref='@{upstream}'):
>          stdout=_subprocess.PIPE, wait=True)
>      return base != fetch_head
>  
> +class DatabaseCache:
> +    def __init__(self):
> +        try:
> +            from notmuch2 import Database
> +            self._notmuch = Database()
> +        except ImportError:
> +            self._notmuch = None
> +        self._known = {}
> +
> +    def known(self,id):
> +        if id in self._known:
> +            return self._known[id]
> +
> +        if self._notmuch:
> +            try:
> +                _ = self._notmuch.find(id)
> +                self._known[id] = True
> +            except LookupError:
> +                self._known[id] = False
> +        else:
> +            (_, stdout, stderr) = _spawn(
> +                args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)],
> +                stdout=_subprocess.PIPE,
> +                wait=True)
> +            self._known[id] = bool(stdout)
> +        return self._known[id]
>  
>  @timed
>  def get_status():
> @@ -705,14 +731,11 @@ def get_status():
>          'deleted': {},
>          'missing': {},
>          }
> +    db = DatabaseCache()
>      with PrivateIndex(repo=NOTMUCH_GIT_DIR, prefix=TAG_PREFIX) as index:
>          maybe_deleted = index.diff(filter='D')
>          for id, tags in maybe_deleted.items():
> -            (_, stdout, stderr) = _spawn(
> -                args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)],
> -                stdout=_subprocess.PIPE,
> -                wait=True)
> -            if stdout:
> +            if db.known(id):
>                  status['deleted'][id] = tags
>              else:
>                  status['missing'][id] = tags
> -- 
> 2.35.1
>

* Re: [PATCH v2] CLI/git: opportunistically use bindings to check for known messages
  2022-07-16 19:23     ` Tomi Ollila
@ 2022-07-17  0:51       ` David Bremner
  0 siblings, 0 replies; 14+ messages in thread
From: David Bremner @ 2022-07-17  0:51 UTC (permalink / raw)
  To: Tomi Ollila, notmuch

Tomi Ollila <tomi.ollila@iki.fi> writes:

> On Fri, Jul 15 2022, David Bremner wrote:
>
>> If the bindings are installed, use them to avoid one exec of notmuch
>> search per message.
>
> tnx. continues to work for me where I have symlink to nmbug in ~/bin/.
> some time in the future i'll investigate whether i get 
> python3 path/to/nmbug.zip ... working but not today...
>
> the series looks good to me.
>
> Tomi

(remaining patches of) series applied to master

d
