unofficial mirror of notmuch@notmuchmail.org
* subjects and duplicated message id's
@ 2017-12-14 14:03 David Bremner
  2017-12-14 14:32 ` [PATCH] test: add known broken test for regexp search of second subject David Bremner
  2017-12-14 16:57 ` subjects and duplicated message id's Daniel Kahn Gillmor
  0 siblings, 2 replies; 9+ messages in thread
From: David Bremner @ 2017-12-14 14:03 UTC (permalink / raw)
  To: notmuch


There are currently several somewhat related issues with notmuch's
handling of subject headers for messages with duplicate message-ids
(i.e. several files on disk with the same message id).  These are all
reflections of the fact that we use a value slot for subjects in the
database message document (i.e. the database object keyed by the
message-id).  Among other things, using a value slot is what makes
regular expression searching (and potentially sorting) by subject work.

When we have multiple files with the same message-id but different
subjects (probably indicating a "real" mid collision), the following
problems arise:

1. The output of notmuch-show can be inconsistent with notmuch-search

   - this is because show is reading from the lexicographically first
     file, while search is reading the database value slot.

   - in principle this could be fixed by making show read the value
     slot; but then the subject might not be consistent with the rest of
     the message content. Also, it looks like a bit of a pain to refactor
     so all that sprinter code has database access.

   - we could also force the value slot to hold the subject of the
     lexicographically first file during indexing. This would be a bit
     fiddly, but localized. It would have the surprising effect of
     updating the subject when new messages arrive.

2. Regular expression search doesn't work for subjects not in the value
   slot.

   - this could be fixed by putting all subjects in the value slot,
     perhaps as newline-separated strings. This would also be a
     potential solution for the "subject hiding" issue mentioned above,
     although it would take some front-end effort as well to deal with
     "multi-subjects".  This could be reported in e.g. json output as an
     array of subjects.
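
To make the proposal in (2) concrete, here is a rough sketch of the idea
in Python. It is only an illustration under stated assumptions: the
function names are hypothetical, nothing here is notmuch or Xapian API,
and the real change would live in notmuch's indexing and query code. It
packs all distinct subjects into one newline-separated string (standing
in for the value slot), matches a regular expression against each
subject separately, and reports the subjects as a json array.

```python
import json
import re


def pack_subjects(subjects):
    """Join distinct subjects into one newline-separated string.

    Hypothetical stand-in for what would be stored in the value slot.
    """
    seen = []
    for subject in subjects:
        # A stored subject must not contain the separator itself.
        subject = subject.replace("\n", " ").strip()
        if subject and subject not in seen:
            seen.append(subject)
    return "\n".join(seen)


def unpack_subjects(slot):
    """Split the stored string back into the individual subjects."""
    return slot.split("\n") if slot else []


def subject_matches(slot, pattern):
    """Return True if the regexp matches any one of the stored subjects."""
    regexp = re.compile(pattern)
    return any(regexp.search(s) for s in unpack_subjects(slot))


def to_json(slot):
    """Report "multi-subjects" as an array, as a front-end might see them."""
    return json.dumps({"subject": unpack_subjects(slot)})


if __name__ == "__main__":
    slot = pack_subjects(["First subject", "Second subject"])
    print(subject_matches(slot, r"^Second"))
    print(to_json(slot))
```

Matching each subject separately keeps anchors such as ^ and $ meaningful
per subject; matching the whole packed string at once would need
multi-line regexp semantics instead.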

I'm open to other, better ideas of how to do this. I'm also curious how
important people think these bugs are.

d
           
     


end of thread, other threads:[~2018-05-07  0:54 UTC | newest]

Thread overview: 9+ messages
2017-12-14 14:03 subjects and duplicated message id's David Bremner
2017-12-14 14:32 ` [PATCH] test: add known broken test for regexp search of second subject David Bremner
2018-05-03 10:52   ` David Bremner
2017-12-14 16:57 ` subjects and duplicated message id's Daniel Kahn Gillmor
2017-12-15  1:23   ` David Bremner
2017-12-19 14:15     ` WIP, all subjects in value slot David Bremner
2017-12-19 14:15       ` [PATCH] WIP: add all subjects to value David Bremner
2018-05-04 13:48         ` Daniel Kahn Gillmor
2018-05-07  0:54           ` David Bremner
