unofficial mirror of notmuch@notmuchmail.org
* Correcting message references
From: Al Haji-Ali @ 2023-04-21 15:11 UTC (permalink / raw)
  To: notmuch


I am running into an issue where I created a draft (let's call it "D") as a reply to a message (let's call it "A") that I don't have in my database, by setting "In-Reply-To" in the draft to "A".
I accidentally left in a "References" field which contains the ID of another message ("B"). This grouped the new draft "D" with the message "B" when viewing search results. I saved the draft and ran `notmuch new` before realizing my mistake.

I changed the message, removed "B" from "References" and deleted the files of all old (and intermediate) drafts that have "B" in "References".  But no matter what I do, I have "B" grouped with "D" and any other messages which I create with "In-Reply-To" being "A".

I suspect that somewhere in the database the IDs of "A" and "B" are linked now. Is there a way (short of deleting the database and re-indexing) to correct this and remove this connection?

Thanks,
-- Al


* Re: Correcting message references
From: David Bremner @ 2023-04-22 11:09 UTC (permalink / raw)
  To: Al Haji-Ali, notmuch


Al Haji-Ali <abdo.haji.ali@gmail.com> writes:

> I changed the message, removed "B" from "References" and deleted the
> files of all old (and intermediate) drafts that have "B" in
> "References".  But no matter what I do, I have "B" grouped with "D"
> and any other messages which I create with "In-Reply-To" being "A".

How did you find the files to delete? One trap to watch out for: if you
use notmuch to find them, you should use notmuch search --exclude=false,
to make sure messages are not being hidden because of their tags.

> I suspect that somewhere in the database the IDs of "A" and "B" are
> linked now. Is there a way (short of deleting the database and
> re-indexing) to correct this and remove this connection?

The database does not store relationships explicitly, only via messages
with references to other messages. At a high level, you can try the
attached script to get a picture of the corresponding thread.
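As a conceptual illustration only (not notmuch code), "relationships via
references" can be sketched in C++ as connected components over the
In-Reply-To/References links. The `Msg` struct, the `Components` class and
`same_thread` are hypothetical names. This is a simplification: notmuch
records a thread id on each document at index time rather than recomputing
threads like this on every query.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an indexed message: its own id plus the ids
// named in its In-Reply-To and References headers (brackets stripped).
struct Msg {
    std::string id;
    std::vector<std::string> parents;
};

// Minimal union-find keyed by message-id; unknown ids start as singletons.
class Components {
public:
    std::string find(const std::string& x) {
        auto it = parent_.find(x);
        if (it == parent_.end()) {
            parent_[x] = x;
            return x;
        }
        std::string p = it->second;
        if (p == x) return x;
        std::string root = find(p);
        parent_[x] = root;  // path compression
        return root;
    }

    void unite(const std::string& a, const std::string& b) {
        std::string ra = find(a);
        std::string rb = find(b);
        if (ra != rb) parent_[ra] = rb;
    }

private:
    std::map<std::string, std::string> parent_;
};

// True if some chain of header references links ids a and b.
bool same_thread(const std::vector<Msg>& msgs, const std::string& a,
                 const std::string& b) {
    Components c;
    for (const auto& m : msgs)
        for (const auto& p : m.parents) c.unite(m.id, p);
    return c.find(a) == c.find(b);
}
```

With D naming both A and B, A and B fall into one component; once B is
dropped from D's headers (and no other message links them), they no longer
do. That is the behaviour you would expect if nothing else were stored.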

If you can't run the script, or it doesn't help, you can interrogate the
database directly without going through notmuch.

If the message-id of B is 'foo@example.org', you can search for replies
with xapian-delve (in xapian-tools on Debian and derivatives).

xapian-delve -d .local/share/notmuch/default/xapian \
             -t 'XREPLYTOfoo@example.org'

and for references

xapian-delve -d .local/share/notmuch/default/xapian \
             -t 'XREFERENCEfoo@example.org'
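If you want to script those lookups, here is a rough C++ sketch of turning a
raw References or In-Reply-To value into the terms used above. The function
names are hypothetical, and the parser is deliberately naive (it only looks
for <...> pairs); notmuch's own header parsing handles more edge cases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naively extract the <...> message-ids from a References or In-Reply-To
// value and return them without the angle brackets.
std::vector<std::string> parse_message_ids(const std::string& header) {
    std::vector<std::string> ids;
    std::string::size_type pos = 0;
    while ((pos = header.find('<', pos)) != std::string::npos) {
        std::string::size_type end = header.find('>', pos + 1);
        if (end == std::string::npos) break;  // unterminated id: give up
        ids.push_back(header.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return ids;
}

// Prepend a term prefix such as "XREPLYTO" or "XREFERENCE" to each id,
// giving strings suitable for `xapian-delve -t`.
std::vector<std::string> prefixed_terms(const std::string& prefix,
                                        const std::vector<std::string>& ids) {
    std::vector<std::string> terms;
    for (const auto& id : ids) terms.push_back(prefix + id);
    return terms;
}
```

For example, an In-Reply-To value of <jwvczk7opm8.fsf-monnier+emacs@gnu.org>
combined with the prefix "XREPLYTO" gives the term
XREPLYTOjwvczk7opm8.fsf-monnier+emacs@gnu.org.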

That will give you Xapian record numbers, and you can turn those into
files with something like

xapian-delve -d .local/share/notmuch/default/xapian -r 801793 -1 | \
             perl -ne  's/XF(D|O).*?:// && print'

For records with multiple files, you will have to figure out which file
goes with which directory (or just find the file names, which are supposed
to be unique).
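The perl filter above can be mirrored in C++ with std::regex, for example
when post-processing `xapian-delve -r N -1` output in a larger tool. This is
a sketch: `file_terms` is a hypothetical name, and like the one-liner it
assumes the interesting terms start with XFD... or XFO... and that the name
follows the first colon.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// For each term line, strip the first match of XF(D|O).*?: (the shortest
// run up to a colon) and keep only lines that matched, as the perl
// one-liner does.
std::vector<std::string> file_terms(const std::vector<std::string>& lines) {
    static const std::regex prefix("XF(D|O).*?:");
    std::vector<std::string> out;
    for (const auto& line : lines) {
        if (std::regex_search(line, prefix))
            out.push_back(std::regex_replace(
                line, prefix, "", std::regex_constants::format_first_only));
    }
    return out;
}
```

Because the match is non-greedy and only the first occurrence is replaced,
colons later in a maildir file name (such as ":2,S") are left intact.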



[-- Attachment #2: draw-thread --]
[-- Type: application/octet-stream, Size: 948 bytes --]

#!/bin/bash

# This script can be used like
# NOTMUCH_CONFIG=test/tmp.T580-thread-search/notmuch-config \
#    devel/draw-thread thread:0000000000000002 | dot -Tpdf > thread2.pdf

# In addition to notmuch, you will need the following tools installed
# - graphviz
# - mblaze (or replace the call to mhdr)

threadid=$1

declare -a edges

declare -a dest
echo "digraph \"$threadid\" {"
for messageid in $(notmuch search --exclude=false --output=messages "$threadid"); do
    echo "subgraph \"cluster_$messageid\" {"
    printf "\"%s\" [shape=folder];\n" "${messageid#id:}"
    for file in $(notmuch search --exclude=false --output=files "$messageid"); do
        node=$(basename "$file")
        printf "\"%s\" [shape=note];\n" "$node"

        # Turn "<id1> <id2>" from the References header into "id1" "id2"
        # (only References is drawn, not In-Reply-To)
        mapfile -t dest < <(mhdr -hReferences "$file" | tr '<>,' '"" ')
        # Store each edge as a single array element (no word splitting)
        edges+=("\"$node\" -> { ${dest[*]} }")
    done
    echo "}"
done

# Print one edge statement per line
for edge in "${edges[@]}"; do
    echo "$edge"
done

echo "}"


* Re: Correcting message references
From: Al Haji-Ali @ 2023-04-22 12:20 UTC (permalink / raw)
  To: David Bremner, notmuch


Thanks, David, for the script and the instructions. I am still not sure where notmuch is getting the association between my 5 messages in the thread.

The attached pdf is the output of the script. As you can see, the draft "draft-m25y9oqhep.fsf@gmail.com", which contains the header

,----
| From: Al Haji-Ali <abdo.haji.ali@gmail.com>
| To: bug-gnu-emacs@gnu.org, monnier@iro.umontreal.ca, Eli Zaretskii <eliz@gnu.org>
| Subject: bug#53632: Function definition history
| In-Reply-To: <jwvczk7opm8.fsf-monnier+emacs@gnu.org>
| Message-ID: <draft-m25y9oqhep.fsf@gmail.com>
| Date: Sat, 22 Apr 2023 12:34:54 +0100
| X-Notmuch-Emacs-Draft: True
| MIME-Version: 1.0
| Content-Type: text/plain
`----

is completely unconnected to the other 4 messages in the thread. Note that if I change the "In-Reply-To" field in this message to anything else, notmuch no longer groups these 5 messages into a single thread.

I tried searching for "jwvczk7opm8.fsf-monnier+emacs@gnu.org" using `xapian-delve` but got

,----
| term 'jwvczk7opm8.fsf-monnier+emacs@gnu.org' not in database
`----

Finally, I tried grepping for the same ID in my notmuch folder (with all mails and database) and got two hits (actually three including this message which I am currently writing):

,----
| ./Drafts/cur/1682165661.M337064P23717.m2air.local,U=151:2,DS:In-Reply-To: <jwvczk7opm8.fsf-monnier+emacs@gnu.org>
| Binary file ./.notmuch/xapian/postlist.glass matches
`----

The first one is the draft. The second hit is the reason I thought the only place left for notmuch to associate these messages is in the xapian database. Note that if I delete the draft and reindex, only the postlist.glass hit stubbornly remains and there seems to be no way to make notmuch forget about this ID. 

I am running notmuch 0.37 and xapian 1.4.21 if that's relevant.

-- Al


[-- Attachment #2: thread2.pdf --]
[-- Type: application/pdf, Size: 32820 bytes --]


* Re: Correcting message references
From: David Bremner @ 2023-04-22 16:37 UTC (permalink / raw)
  To: Al Haji-Ali, notmuch

Al Haji-Ali <abdo.haji.ali@gmail.com> writes:


> is completely unconnected to the other 4 messages in the thread. Note
> that if I change the "In-Reply-To" field in this message to anything
> else, notmuch no longer groups these 5 messages into a single thread.
>

Yes, that's puzzling. I did not think about "ghost messages" (see below)
when writing that script, so maybe that's the issue. 

> I tried searching for "jwvczk7opm8.fsf-monnier+emacs@gnu.org" using `xapian-delve` but got
>
> ,----
> | term 'jwvczk7opm8.fsf-monnier+emacs@gnu.org' not in database
> `----

You need to give the appropriate term prefix. Q for message id,
or XREPLYTO or XREFERENCE as in my last message.

> The first one is the draft. The second hit is the reason I thought the
> only place left for notmuch to associate these messages is in the
> xapian database. Note that if I delete the draft and reindex, only the
> postlist.glass hit stubbornly remains and there seems to be no way to
> make notmuch forget about this ID.

As long as some message refers to that ID, notmuch will create a "ghost
message", used for threading.
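A much-simplified model of how a ghost can keep threads glued together,
assuming that when a message is indexed it takes the thread id of a parent
that already exists (real or ghost), and that a missing parent is stored as
a ghost carrying that thread id. The names are hypothetical and the model
ignores thread merging; it illustrates the mechanism rather than reproducing
notmuch's code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical index entry: a real message or a ghost, plus the thread id
// it carries (notmuch keeps this as a G-prefixed term).
struct Doc {
    bool ghost;
    std::string thread;
};
using Index = std::map<std::string, Doc>;  // keyed by message-id

// Assumed rule: reuse the thread id of the first parent already in the
// index (real or ghost), otherwise use fresh_thread. Parents that are not
// in the index are recorded as ghosts carrying that thread id.
void add_message(Index& db, const std::string& id,
                 const std::vector<std::string>& parents,
                 const std::string& fresh_thread) {
    std::string thread = fresh_thread;
    for (const auto& p : parents) {
        auto it = db.find(p);
        if (it != db.end()) {
            thread = it->second.thread;
            break;
        }
    }
    for (const auto& p : parents)
        if (db.find(p) == db.end()) db[p] = Doc{true, thread};
    db[id] = Doc{false, thread};
}

// Deleting a message in this model leaves any ghosts it created in place.
void remove_message(Index& db, const std::string& id) {
    db.erase(id);
}
```

For example: index B in thread T1, then a draft D with parents A and B; A
becomes a ghost in T1. Deleting D leaves that ghost behind, so a new draft
replying only to A is assigned T1 again, next to B. In this model that would
be consistent with the grouping reported above.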


* Re: Correcting message references
From: David Bremner @ 2023-04-22 16:44 UTC (permalink / raw)
  To: Al Haji-Ali, notmuch

>
> As long as some message refers to that ID, notmuch will create a "ghost
> message", used for threading.

You can look for a specific ghost message with something like

$ quest -btype:T -b id:Q -d .local/share/notmuch/default/xapian \
     "type:ghost and id:jwvczk7opm8.fsf-monnier+emacs@gnu.org"

quest is also part of xapian-tools. Unfortunately I don't think quest
understands the way notmuch uses multiletter prefixes (without a :), so
to find references you still need to use xapian-delve.


* Re: Correcting message references
From: Al Haji-Ali @ 2023-04-23  8:19 UTC (permalink / raw)
  To: David Bremner, notmuch


On 22/04/2023, David Bremner wrote:
> You need to give the appropriate term prefix. Q for message id,
> or XREPLYTO or XREFERENCE as in my last message.
My apologies. I misunderstood the syntax.

>> The first one is the draft. The second hit is the reason I thought the
>> only place left for notmuch to associate these messages is in the
>> xapian database. Note that if I delete the draft and reindex, only the
>> postlist.glass hit stubbornly remains and there seems to be no way to
>> make notmuch forget about this ID.
>
> As long as some message refers to that ID, notmuch will create a "ghost
> message", used for threading.
I've deleted all messages/drafts referring to this message ID, and then got these results:


$ export MSG_ID=jwvczk7opm8.fsf-monnier+emacs@gnu.org
$ export NM_DB=~/.mail/.notmuch/xapian

$ xapian-delve -d $NM_DB -t "XREPLYTO${MSG_ID}"
term 'XREPLYTOjwvczk7opm8.fsf-monnier+emacs@gnu.org' not in database

$ xapian-delve -d $NM_DB -t "XREFERENCE${MSG_ID}"
term 'XREFERENCEjwvczk7opm8.fsf-monnier+emacs@gnu.org' not in database

$ quest -btype:T -b id:Q -d ~/.mail/.notmuch/xapian "id:${MSG_ID}"
Parsed Query: Query(0 * (Qjwvczk7opm8.fsf-monnier+emacs@gnu.org AND Tghost))
Exactly 1 matches
MSet:
75982: [0]

$ xapian-delve -d $NM_DB -r 75982 -1
Data for record #75982:

Term List for record #75982:
G000000000000cc96
Qjwvczk7opm8.fsf-monnier+emacs@gnu.org
Tghost


So it does seem to be a lingering ghost message, but I am sure that there are no messages in the database referring to this ID (except messages in this current thread, which have the ID in the message body).
I don't know why this particular ID is associated with messages in another, seemingly unrelated thread, as you can see in the pdf.

Is there a way to remove this ghost message record somehow, to test it? Or is there a better way of figuring this out?

Best regards,
-- Al


* Re: Correcting message references
From: David Bremner @ 2023-04-25 19:40 UTC (permalink / raw)
  To: Al Haji-Ali, notmuch


David Bremner <david@tethera.net> writes:

> Al Haji-Ali <abdo.haji.ali@gmail.com> writes:
>
>> So it does seem to be a lingering ghost message, but I am sure that there are no messages in the database referring to this ID (except messages in this current thread, which have the ID in the message body).
>> I don't know why this particular ID is associated with messages in another, seemingly unrelated thread, as you can see in the pdf.
>>
>> Is there a way to remove this ghost message record somehow, to test it? Or is there a better way of figuring this out?
>
> It turns out notmuch does not remove ghost messages until all the other
> messages in the thread are deleted. I guess if you temporarily move
> the other messages in the thread out of the way and run notmuch new, the
> ghost message should be deleted.
>
> I don't know how often this lazy deletion is a problem. Deleting
> messages is already a bottleneck in notmuch-new so I am a bit hesitant
> to make it more complicated. It is possible to "garbage collect"
> unreferenced ghost messages. I'll have to think about how big a
> performance hit it would be to add this to notmuch new.
>
> d
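The difference between the lazy rule and garbage collection can be sketched
on an in-memory model (hypothetical types and function names, not the
Xapian-backed program): the lazy rule frees a ghost only once its thread has
no real messages left, whereas collection frees any ghost that no real
message refers to.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical index entry: ghost flag, thread id, and (for real messages)
// the ids named in In-Reply-To/References.
struct Doc {
    bool ghost;
    std::string thread;
    std::vector<std::string> refs;
};
using Index = std::map<std::string, Doc>;  // keyed by message-id

// Lazy rule: a ghost is removable only once its thread has no real messages.
std::set<std::string> lazily_removable(const Index& db) {
    std::set<std::string> live_threads;
    for (const auto& kv : db)
        if (!kv.second.ghost) live_threads.insert(kv.second.thread);
    std::set<std::string> out;
    for (const auto& kv : db)
        if (kv.second.ghost && live_threads.count(kv.second.thread) == 0)
            out.insert(kv.first);
    return out;
}

// Collection rule: a ghost is removable as soon as no real message refers
// to its id.
std::set<std::string> unreferenced_ghosts(const Index& db) {
    std::set<std::string> referenced;
    for (const auto& kv : db)
        if (!kv.second.ghost)
            referenced.insert(kv.second.refs.begin(), kv.second.refs.end());
    std::set<std::string> out;
    for (const auto& kv : db)
        if (kv.second.ghost && referenced.count(kv.first) == 0)
            out.insert(kv.first);
    return out;
}
```

In this model a ghost left behind by a deleted draft is caught by collection
while other messages remain in its thread, but neither rule flags a ghost
that a current message (for example a draft's In-Reply-To) still names.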

Here is a prototype standalone program to find lingering unreferenced
ghosts.  I find 33 (out of about 60k total ghost messages) in about 0.3s
on this laptop. Currently it does not modify the database, but the next
step would be to delete the documents rather than just printing them
out.

If you have libxapian-dev (or equivalent) installed you can build it
with

$ c++ ggc.cc -o ggc -lxapian

and then run it

$ ./ggc ~/.local/share/notmuch/default/xapian

I would be interested if it finds your problematic ghost message (and
how long it takes).



[-- Attachment #2: ggc.cc --]
[-- Type: text/x-c++src, Size: 922 bytes --]

#include <xapian.h>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char **argv){
  if (argc != 2) {
    fprintf (stderr, "usage: ggc xapian-database\n");
    exit (1);
  }

  Xapian::Database db(argv[1]);
  Xapian::Enquire enquire(db);

  // Every ghost message carries the term "Tghost".
  enquire.set_query(Xapian::Query("Tghost"));

  // Ask for all matches at once.
  auto mset = enquire.get_mset (0,db.get_doccount ());

  for (auto iter=mset.begin (); iter != mset.end(); iter++){
      std::string mid;
      auto doc = iter.get_document ();
      auto term_iter = doc.termlist_begin ();

      // Terms are sorted, so skip to the first "Q" (message-id) term.
      term_iter.skip_to ("Q");
      if (term_iter == doc.termlist_end () || (*term_iter)[0] != 'Q')
          continue;  // no message-id term: nothing to check
      mid=(*term_iter).substr(1);

      // Count the documents that still refer to this message-id.
      std::string ref_term = "XREFERENCE" + mid;
      auto ref_count = db.get_termfreq (ref_term);

      std::string reply_term = "XREPLYTO" + mid;
      auto reply_count = db.get_termfreq (reply_term);

      if (ref_count+reply_count == 0){
          std::cout << "docid=" <<  *iter;
          std::cout << " mid=" << mid;
          std::cout << std::endl;
      }
  }
  return 0;
}


* Re: Correcting message references
From: Al Haji-Ali @ 2023-04-25 21:20 UTC (permalink / raw)
  To: David Bremner, notmuch


On 25/04/2023, David Bremner wrote:
> I would be interested if it finds your problematic ghost message (and
> how long it takes).

Thanks! This is much quicker than a script that I wrote using quest and xapian-delve (which took
minutes!)

Your code took 0.03 seconds to find 74 unreferenced ghost messages out of 9335 ghost messages.
I can't imagine why so many unreferenced ghost messages were
created. 47 of the 74 messages have "draft" in the ID (seemingly created by notmuch).

At first your code didn't find my problematic message (which caused a draft with the ID <jwvczk7opm8.fsf-monnier+emacs@gnu.org> in `In-Reply-To` to be grouped with unrelated messages from a completely separate thread).
But then I deleted the draft (including the file), ran `notmuch new` and re-ran the script and the problematic ghost message was correctly reported.

So this approach would work to find unreferenced messages, but not messages which are being erroneously grouped (without first deleting the offending message), correct?

-- Al


Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
