unofficial mirror of notmuch@notmuchmail.org
* [PATCH] allow to not sort the search results
@ 2010-04-14  6:30 Sebastian Spaeth
  2010-04-14  6:55 ` Jason White
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-14  6:30 UTC
  To: Notmuch developer list

Previously we always sorted the returned results by some string value,
but sometimes we are only interested in the number of results and don't
need any sorting.

Also add a --sort=unsorted command-line option to notmuch search to test this.
With a warm cache, a search that matches 1200 messages returns in 0.982 seconds
with the default sort and in 0.978 seconds unsorted, with very little variance.
Xapian contributor Olly Betts says that the speed gains with a cold cache are
likely to be much larger, though.

Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
 lib/notmuch.h    |    3 ++-
 lib/query.cc     |    2 ++
 notmuch-search.c |    2 ++
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index a7e66dd..bae48a6 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -346,7 +346,8 @@ notmuch_query_create (notmuch_database_t *database,
 typedef enum {
     NOTMUCH_SORT_OLDEST_FIRST,
     NOTMUCH_SORT_NEWEST_FIRST,
-    NOTMUCH_SORT_MESSAGE_ID
+    NOTMUCH_SORT_MESSAGE_ID,
+    NOTMUCH_SORT_UNSORTED
 } notmuch_sort_t;
 
 /* Specify the sorting desired for this query. */
diff --git a/lib/query.cc b/lib/query.cc
index 10f8dc8..4148f9b 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -148,6 +148,8 @@ notmuch_query_search_messages (notmuch_query_t *query)
 	case NOTMUCH_SORT_MESSAGE_ID:
 	    enquire.set_sort_by_value (NOTMUCH_VALUE_MESSAGE_ID, FALSE);
 	    break;
+        case NOTMUCH_SORT_UNSORTED:
+	    break;
 	}
 
 #if DEBUG_QUERY
diff --git a/notmuch-search.c b/notmuch-search.c
index 4e3514b..854a9ae 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -217,6 +217,8 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
 		sort = NOTMUCH_SORT_OLDEST_FIRST;
 	    } else if (strcmp (opt, "newest-first") == 0) {
 		sort = NOTMUCH_SORT_NEWEST_FIRST;
+	    } else if (strcmp (opt, "unsorted") == 0) {
+		sort = NOTMUCH_SORT_UNSORTED;
 	    } else {
 		fprintf (stderr, "Invalid value for --sort: %s\n", opt);
 		return 1;
-- 
1.6.3.3

* Re: [PATCH] allow to not sort the search results
  2010-04-14  6:30 [PATCH] allow to not sort the search results Sebastian Spaeth
@ 2010-04-14  6:55 ` Jason White
  2010-04-14  7:51   ` Sebastian Spaeth
  0 siblings, 1 reply; 11+ messages in thread
From: Jason White @ 2010-04-14  6:55 UTC
  To: notmuch

Sebastian Spaeth <Sebastian@SSpaeth.de> wrote:
> previously we were always sorting the returned results by some string value,
> but sometimes we might just be interested in the number of results, and don't
> need any sorting.
> 
> Also add a --sort=unsorted command line option to notmuch search to test this.

Does this provide relevance-ranked search results? I think relevance ranking
is the Xapian default if a sort order isn't specified. 

It would be useful to be able to obtain a relevance-ranked list of messages or
threads that satisfy a query.

* Re: [PATCH] allow to not sort the search results
  2010-04-14  6:55 ` Jason White
@ 2010-04-14  7:51   ` Sebastian Spaeth
  2010-04-15 12:54     ` Olly Betts
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-14  7:51 UTC
  To: Jason White, notmuch

On 2010-04-14, Jason White wrote:
> > Also add a --sort=unsorted command line option to notmuch search to test this.
> 
> Does this provide relevance-ranked search results? I think relevance ranking
> is the Xapian default if a sort order isn't specified. 

Yes, by default Xapian uses sort_by_relevance, so "unsorted" implies
just that. (In fact, a previous incarnation of this patch called it
--sort=relevance.)

However, given that many of our terms are boolean, relevance only
really comes into play when the query includes terms from the body of a
message. (And *I* have not yet found relevance sorting very useful
when I cared about sort order, but that might just be me.)
 
> It would be useful to be able to obtain a relevance-ranked list of messages or
> threads that satisfy a query.

I would be happy to have it called --sort=relevance too, but "unsorted"
points out the potential performance improvement a bit better, IMHO
(although it seems to be really small with a warm cache).

Sebastian

* Re: [PATCH] allow to not sort the search results
  2010-04-14  7:51   ` Sebastian Spaeth
@ 2010-04-15 12:54     ` Olly Betts
  2010-04-16  6:37       ` Sebastian Spaeth
  0 siblings, 1 reply; 11+ messages in thread
From: Olly Betts @ 2010-04-15 12:54 UTC
  To: notmuch

Sebastian Spaeth writes:
> On 2010-04-14, Jason White wrote:
> > > Also add a --sort=unsorted command line option to notmuch search to test
> > > this.
> > 
> > Does this provide relevance-ranked search results? I think relevance ranking
> > is the Xapian default if a sort order isn't specified. 
> 
> Yes, by default it is using sort_by_relevance, so "unsorted" implies
> just that. (in fact, a previous incarnation of this patch called it
> --sort=relevance)

Except notmuch (at least in the code I've looked at) sets the weighting scheme
to BoolWeight, so the ordering is actually just the raw docid ordering
(BoolWeight gives all matching docs a weight of 0).
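To make the BoolWeight point concrete, here is a minimal C model (a sketch, not Xapian's code) of relevance ordering: matches are ranked by weight, highest first, and equal weights fall back to ascending docid, which is Xapian's default docid order. The `match_t` type and the function names are hypothetical. When every weight is 0, as with BoolWeight, only the tie-break remains, so the ranking reduces to plain docid order.

```c
#include <stdlib.h>

/* Hypothetical model of a match: a document id plus its relevance weight. */
typedef struct {
    unsigned docid;
    double weight;
} match_t;

/* Order by weight, highest first; equal weights fall back to ascending docid. */
static int
compare_matches (const void *a, const void *b)
{
    const match_t *x = a;
    const match_t *y = b;

    if (x->weight != y->weight)
        return x->weight > y->weight ? -1 : 1;
    if (x->docid != y->docid)
        return x->docid < y->docid ? -1 : 1;
    return 0;
}

/* Rank the match set in place using the model ordering above. */
static void
rank_matches (match_t *matches, size_t n)
{
    qsort (matches, n, sizeof *matches, compare_matches);
}
```

For example, ranking four zero-weight matches with docids 42, 7, 19 and 3 yields 3, 7, 19, 42, i.e. the raw docid ordering Olly describes.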

> I would be happy to have it called --sort=relevance too, the unsorted
> points out potential performance improvements a bit better, IMHO
> (although they seem to be really small with a warm cache).

When using the results of a search to add/remove tags, there's likely to be
an additional win from --sort=unsorted, as documents will now be processed
in docid order, which will tend to give more cache-friendly locality of
access.
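A toy model of the locality argument, under loudly stated assumptions: suppose (hypothetically) that every `BLOCK_SIZE` consecutive docids share one storage block, and count how often a traversal has to move to a different block than the previous document's. The block size and the docid-to-block mapping are illustrative only, not Xapian's actual on-disk layout.

```c
#include <stddef.h>

/* Illustrative assumption: BLOCK_SIZE consecutive docids share one block. */
#define BLOCK_SIZE 4u

/* Count how many times a traversal moves to a different block than the
 * previous document's block (the first access counts as one switch). */
static size_t
count_block_switches (const unsigned *docids, size_t n)
{
    size_t switches = 0;
    unsigned current = 0;
    int have_current = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned block = docids[i] / BLOCK_SIZE;
        if (!have_current || block != current) {
            switches++;
            current = block;
            have_current = 1;
        }
    }
    return switches;
}
```

Under this model, visiting docids {1, 2, 5, 6, 9, 10} in docid order needs 3 block switches, while the same documents in an interleaved date order such as {9, 1, 5, 10, 2, 6} need 6.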

Also, sorting by relevance requires more calculations and may require fetching
additional data (document length for example).

So I think it would make sense for --sort=relevance and --sort=unsorted to be
separate options.

Cheers,
    Olly

* Re: [PATCH] allow to not sort the search results
  2010-04-15 12:54     ` Olly Betts
@ 2010-04-16  6:37       ` Sebastian Spaeth
  2010-04-16  6:38         ` [PATCH 1/3] query.cc: allow to return query results unsorted Sebastian Spaeth
                           ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-16  6:37 UTC
  To: Olly Betts, notmuch

On 2010-04-15, Olly Betts wrote:
 
> > I would be happy to have it called --sort=relevance too, the unsorted
> > points out potential performance improvements a bit better, IMHO
> > (although they seem to be really small with a warm cache).
> 
> When using the results of a search to add/remove tags, there's likely to be
> an additional win from --sort=unsorted as documents will now be processed
> in docid order which will tend to have a more cache friendly locality of
> access.

Olly was right: even for "notmuch tag" we were sorting the
results by date before applying tag changes. I have slightly reworked my
patch so that notmuch tag avoids doing that. I also split the patch into
3 patches that each do one thing.

The patches do:
1: Introduce NOTMUCH_SORT_UNSORTED
2: Introduce notmuch search --sort=unsorted
3: Make notmuch tag not sort results by date

#2 is the one I am least sure about; I don't know if there is a use case
for notmuch search returning unsorted results. But 1 & 3 are useful at
least.
 
> Also, sorting by relevance requires more calculations and may require fetching
> additional data (document length for example).
> 
> So I think it would make sense for --sort=relevance and --sort=unsorted to be
> separate options.

Now I am a bit confused. The API docs state that sort_by_relevance is
the default. So if we skip sort_by_value() entirely, will that incur the
additional calculations (with our BoolWeight set)? All I want is the fastest
way to return a searched set of docs :-).

Patches 1-3 follow as replies to this one.
Sebastian

* [PATCH 1/3] query.cc: allow to return query results unsorted
  2010-04-16  6:37       ` Sebastian Spaeth
@ 2010-04-16  6:38         ` Sebastian Spaeth
  2010-04-16  6:38         ` [PATCH 2/3] notmuch-search: Introduce --sort=unsorted Sebastian Spaeth
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-16  6:38 UTC
  To: Notmuch developer list

Previously, we always sorted the returned results by some string value
(newest-to-oldest by default). However, in some cases (such as when applying
tags to a search result) we are not interested in any particular order.

This introduces a NOTMUCH_SORT_UNSORTED value that does just that. It is
not used anywhere in the code at the moment.
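A minimal, self-contained sketch of what the new enum value changes in the `switch` in lib/query.cc: every other case ends up calling `enquire.set_sort_by_value (slot, reverse)`, while `NOTMUCH_SORT_UNSORTED` calls nothing and leaves Xapian's default ordering in place (docid order, given notmuch's BoolWeight). The `sort_plan_t` record, the `VALUE_*` slot constants and the slot/reverse pairs for the two date orders are assumptions for illustration; only the message-id case appears in the diff.

```c
#include <stdbool.h>

/* Mirrors the notmuch_sort_t values introduced by the patch. */
typedef enum {
    NOTMUCH_SORT_OLDEST_FIRST,
    NOTMUCH_SORT_NEWEST_FIRST,
    NOTMUCH_SORT_MESSAGE_ID,
    NOTMUCH_SORT_UNSORTED
} notmuch_sort_t;

/* Hypothetical stand-ins for the value slots used by query.cc. */
enum { VALUE_TIMESTAMP = 0, VALUE_MESSAGE_ID = 1 };

/* Hypothetical record of the arguments that would be passed to
 * Xapian's Enquire::set_sort_by_value (slot, reverse). */
typedef struct {
    bool sort_by_value;   /* false: no call, Xapian's default order stays */
    unsigned slot;
    bool reverse;
} sort_plan_t;

static sort_plan_t
plan_sort (notmuch_sort_t sort)
{
    sort_plan_t plan = { false, 0, false };

    switch (sort) {
    case NOTMUCH_SORT_OLDEST_FIRST:
        plan = (sort_plan_t) { true, VALUE_TIMESTAMP, false };
        break;
    case NOTMUCH_SORT_NEWEST_FIRST:
        plan = (sort_plan_t) { true, VALUE_TIMESTAMP, true };
        break;
    case NOTMUCH_SORT_MESSAGE_ID:
        plan = (sort_plan_t) { true, VALUE_MESSAGE_ID, false };
        break;
    case NOTMUCH_SORT_UNSORTED:
        /* Deliberately empty: skip set_sort_by_value () entirely. */
        break;
    }
    return plan;
}
```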

Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
 lib/notmuch.h |    3 ++-
 lib/query.cc  |    2 ++
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index a7e66dd..bae48a6 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -346,7 +346,8 @@ notmuch_query_create (notmuch_database_t *database,
 typedef enum {
     NOTMUCH_SORT_OLDEST_FIRST,
     NOTMUCH_SORT_NEWEST_FIRST,
-    NOTMUCH_SORT_MESSAGE_ID
+    NOTMUCH_SORT_MESSAGE_ID,
+    NOTMUCH_SORT_UNSORTED
 } notmuch_sort_t;
 
 /* Specify the sorting desired for this query. */
diff --git a/lib/query.cc b/lib/query.cc
index 10f8dc8..4148f9b 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -148,6 +148,8 @@ notmuch_query_search_messages (notmuch_query_t *query)
 	case NOTMUCH_SORT_MESSAGE_ID:
 	    enquire.set_sort_by_value (NOTMUCH_VALUE_MESSAGE_ID, FALSE);
 	    break;
+        case NOTMUCH_SORT_UNSORTED:
+	    break;
 	}
 
 #if DEBUG_QUERY
-- 
1.7.0.4

* [PATCH 2/3] notmuch-search: Introduce --sort=unsorted
  2010-04-16  6:37       ` Sebastian Spaeth
  2010-04-16  6:38         ` [PATCH 1/3] query.cc: allow to return query results unsorted Sebastian Spaeth
@ 2010-04-16  6:38         ` Sebastian Spaeth
  2010-04-16  6:38         ` [PATCH 3/3] notmuch-tag: don't sort messages before applying tag changes Sebastian Spaeth
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-16  6:38 UTC
  To: Notmuch developer list

In some cases we might not be interested in any particular sort order, so
this introduces a --sort=unsorted command-line option together with its
documentation.
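For illustration, the option handling added to notmuch-search.c can be sketched as a standalone helper. `parse_sort_option` follows the `strcmp` chain and error message from the patch; `parse_sort_argument` and its prefix handling are hypothetical additions so that the example can take a whole argument such as `--sort=unsorted`.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    NOTMUCH_SORT_OLDEST_FIRST,
    NOTMUCH_SORT_NEWEST_FIRST,
    NOTMUCH_SORT_MESSAGE_ID,
    NOTMUCH_SORT_UNSORTED
} notmuch_sort_t;

/* Map the text after "--sort=" to a sort order.  Returns 0 on success,
 * or prints an error and returns 1 for an unknown value. */
static int
parse_sort_option (const char *opt, notmuch_sort_t *sort)
{
    if (strcmp (opt, "oldest-first") == 0) {
        *sort = NOTMUCH_SORT_OLDEST_FIRST;
    } else if (strcmp (opt, "newest-first") == 0) {
        *sort = NOTMUCH_SORT_NEWEST_FIRST;
    } else if (strcmp (opt, "unsorted") == 0) {
        *sort = NOTMUCH_SORT_UNSORTED;
    } else {
        fprintf (stderr, "Invalid value for --sort: %s\n", opt);
        return 1;
    }
    return 0;
}

/* Hypothetical wrapper: accept a full argument such as "--sort=unsorted";
 * anything without the "--sort=" prefix is rejected. */
static int
parse_sort_argument (const char *arg, notmuch_sort_t *sort)
{
    const char *prefix = "--sort=";
    size_t len = strlen (prefix);

    if (strncmp (arg, prefix, len) != 0)
        return 1;
    return parse_sort_option (arg + len, sort);
}
```

With this helper, `parse_sort_argument ("--sort=unsorted", &sort)` returns 0 and sets `sort` to `NOTMUCH_SORT_UNSORTED`, while `--sort=bogus` prints the error and returns 1.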

Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
 notmuch-search.c |    2 ++
 notmuch.1        |   10 ++++++----
 notmuch.c        |    7 ++++---
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 4e3514b..854a9ae 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -217,6 +217,8 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
 		sort = NOTMUCH_SORT_OLDEST_FIRST;
 	    } else if (strcmp (opt, "newest-first") == 0) {
 		sort = NOTMUCH_SORT_NEWEST_FIRST;
+	    } else if (strcmp (opt, "unsorted") == 0) {
+		sort = NOTMUCH_SORT_UNSORTED;
 	    } else {
 		fprintf (stderr, "Invalid value for --sort: %s\n", opt);
 		return 1;
diff --git a/notmuch.1 b/notmuch.1
index 86830f4..6d4beaf 100644
--- a/notmuch.1
+++ b/notmuch.1
@@ -152,12 +152,14 @@ Presents the results in either JSON or plain-text (default).
 .RE
 .RS 4
 .TP 4
-.BR \-\-sort= ( newest\-first | oldest\-first )
+.BR \-\-sort= ( newest\-first | oldest\-first | unsorted )
 
 This option can be used to present results in either chronological order
-.RB ( oldest\-first )
-or reverse chronological order
-.RB ( newest\-first ).
+.RB ( oldest\-first ),
+reverse chronological order
+.RB ( newest\-first )
+or without any defined sort order
+.RB ( unsorted ).
 
 Note: The thread order will be distinct between these two options
 (beyond being simply reversed). When sorting by
diff --git a/notmuch.c b/notmuch.c
index dcfda32..e31dd88 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -165,11 +165,12 @@ command_t commands[] = {
       "\t\tPresents the results in either JSON or\n"
       "\t\tplain-text (default)\n"
       "\n"
-      "\t--sort=(newest-first|oldest-first)\n"
+      "\t--sort=(newest-first|oldest-first|unsorted)\n"
       "\n"
       "\t\tPresent results in either chronological order\n"
-      "\t\t(oldest-first) or reverse chronological order\n"
-      "\t\t(newest-first), which is the default.\n"
+      "\t\t(oldest-first), reverse chronological order\n"
+      "\t\t(newest-first), which is the default, or\n"
+      "\t\t(unsorted) without any particular sort order.\n"
       "\n"
       "\tSee \"notmuch help search-terms\" for details of the search\n"
       "\tterms syntax." },
-- 
1.7.0.4

* [PATCH 3/3] notmuch-tag: don't sort messages before applying tag changes
  2010-04-16  6:37       ` Sebastian Spaeth
  2010-04-16  6:38         ` [PATCH 1/3] query.cc: allow to return query results unsorted Sebastian Spaeth
  2010-04-16  6:38         ` [PATCH 2/3] notmuch-search: Introduce --sort=unsorted Sebastian Spaeth
@ 2010-04-16  6:38         ` Sebastian Spaeth
  2010-04-16  6:58         ` [PATCH] allow to not sort the search results Olly Betts
  2010-04-18 13:56         ` Sebastian Spaeth
  4 siblings, 0 replies; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-16  6:38 UTC
  To: Notmuch developer list

It's not necessary to sort the results before we apply tags. Xapian
contributor Olly Betts says that the savings might be bigger with a cold
file cache, and that (since "unsorted" really means sorted by document id)
we get better cache locality when applying tags to messages.
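A minimal sketch of why the order does not matter for tagging, modelling each message as a hypothetical `message_t` with a tag bitmask: each message is updated independently, so applying the same change in docid order or in any other order leaves the same final state. The `interrupted` flag mirrors the `!interrupted` check in the loop in notmuch-tag.c; the bitmask representation and `apply_tags` are assumptions for illustration, not notmuch's API.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for a message: its docid and a tag bitmask. */
typedef struct {
    unsigned docid;
    unsigned tags;
} message_t;

/* Would be set from a SIGINT handler in a real program; checked per message. */
static volatile sig_atomic_t interrupted = 0;

/* Apply "+add -remove" to each message in the given order (in this model a
 * tag in both masks ends up removed).  Each update touches only its own
 * message, so the final state is independent of the visiting order.
 * Returns the number of messages processed. */
static size_t
apply_tags (message_t **order, size_t n, unsigned add, unsigned remove)
{
    size_t done = 0;

    for (size_t i = 0; i < n && !interrupted; i++) {
        order[i]->tags = (order[i]->tags | add) & ~remove;
        done++;
    }
    return done;
}
```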

Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
 notmuch-tag.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/notmuch-tag.c b/notmuch-tag.c
index 8b6f7dc..fd54bc7 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -107,6 +107,9 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	return 1;
     }
 
+    /* tagging is not interested in any special sort order */
+    notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
+
     for (messages = notmuch_query_search_messages (query);
 	 notmuch_messages_valid (messages) && !interrupted;
 	 notmuch_messages_move_to_next (messages))
-- 
1.7.0.4

* Re: [PATCH] allow to not sort the search results
  2010-04-16  6:37       ` Sebastian Spaeth
                           ` (2 preceding siblings ...)
  2010-04-16  6:38         ` [PATCH 3/3] notmuch-tag: don't sort messages before applying tag changes Sebastian Spaeth
@ 2010-04-16  6:58         ` Olly Betts
  2010-04-18 13:56         ` Sebastian Spaeth
  4 siblings, 0 replies; 11+ messages in thread
From: Olly Betts @ 2010-04-16  6:58 UTC
  To: Sebastian Spaeth; +Cc: notmuch

On Fri, Apr 16, 2010 at 08:37:04AM +0200, Sebastian Spaeth wrote:
> On 2010-04-15, Olly Betts wrote:
> > Also, sorting by relevance requires more calculations and may require
> > fetching additional data (document length for example).
> > 
> > So I think it would make sense for --sort=relevance and --sort=unsorted to
> > be separate options.
> 
> Now I am a bit confused. The API docs state that sort_by_relevance is
> the default. So by skipping any sort_by_value() will that incur the additional
> calculations (with our BoolWeight set?). All I want is the fastest way
> to return a searched set of docs :-).

Yes, sort_by_relevance() is the default.  But if you set BoolWeight as the
weighting scheme then the relevance is simply zero, and Xapian doesn't have
to fetch any statistics and calculate a score from them.  When documents
have exactly equal relevance weight, then the docid order is used.  So
although sort_by_relevance() is technically still on with BoolWeight, by
"sorting by relevance" I wasn't talking about this case.

So --sort=unsorted and --sort=relevance would only differ in code by the former
setting BoolWeight and the latter not.

Cheers,
    Olly

* Re: [PATCH] allow to not sort the search results
  2010-04-16  6:37       ` Sebastian Spaeth
                           ` (3 preceding siblings ...)
  2010-04-16  6:58         ` [PATCH] allow to not sort the search results Olly Betts
@ 2010-04-18 13:56         ` Sebastian Spaeth
  2010-04-21 23:09           ` Carl Worth
  4 siblings, 1 reply; 11+ messages in thread
From: Sebastian Spaeth @ 2010-04-18 13:56 UTC
  To: Olly Betts, notmuch

On 2010-04-16, Sebastian Spaeth wrote:
> Olly was right in that even for "notmuch tag" we were sorting the
> results by date before applying tag changes. I have slightly reworked my
> patch to have notmuch tag avoid doing that. I also split up the patch in
> 3 patches that do one thing each.
> 
> The patches do:
> 1: Introduce NOTMUCH_SORT_UNSORTED
> 2: Introduce notmuch search --sort=unsorted
> 3: Make notmuch tag not sort results by date
> 
> #2 is the one I am least sure about, I don't know if there is a use case
> for notmuch search returning unsorted results. But 1 & 3 are useful at
> least.

> Patches 1-3 follow as reply to this one

May I advocate patches 1 & 3 for inclusion in 0.3? I've been using them
in my tree without problems. Patch 2 is left to your judgement as to
whether "--sort=unsorted" is useful for notmuch search. (It would
probably benefit more from a --sort=relevance, I guess.)

Sebastian

* Re: [PATCH] allow to not sort the search results
  2010-04-18 13:56         ` Sebastian Spaeth
@ 2010-04-21 23:09           ` Carl Worth
  0 siblings, 0 replies; 11+ messages in thread
From: Carl Worth @ 2010-04-21 23:09 UTC
  To: Sebastian Spaeth, Olly Betts, notmuch

On Sun, 18 Apr 2010 15:56:58 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> > The patches do:
> > 1: Introduce NOTMUCH_SORT_UNSORTED
> > 2: Introduce notmuch search --sort=unsorted
> > 3: Make notmuch tag not sort results by date
> > 
> > #2 is the one I am least sure about, I don't know if there is a use case
> > for notmuch search returning unsorted results. But 1 & 3 are useful at
> > least.
...
> May I advocate patches 1 & 3 for inclusion in 0.3? I've been using this
> in my tree without problems. patch 2 is left to your judgement as to
> whether a "--sort=unsorted" is useful for notmuch search. (it will
> probably rather benefit from a --sort=relevance, I guess).

Done.

I've pushed out 1 & 3 now. I can't find any use case where
--sort=unsorted is interesting for "notmuch search".

If people want to start playing with --sort=relevance then that could be
very interesting. (For me, and for almost all email searches I do, I
almost always want date-based sorting. But if I start having trouble
finding things with particular queries, I might want to play with the
relevance-based stuff.)

-Carl

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
