unofficial mirror of emacs-devel@gnu.org 
* [RFC] Gnus generalized search, part II
@ 2017-04-21 21:35 Eric Abrahamsen
From: Eric Abrahamsen @ 2017-04-21 21:35 UTC
  To: emacs-devel; +Cc: ding

[-- Attachment #1: Type: text/plain, Size: 5782 bytes --]

I've been working on generalized search for Gnus, where a single
query language gets translated into different search-engine-appropriate
strings. This allows searching multiple backends at once. It's more or
less working, and I'm attaching the new version of the nnir.el file here
which can be used as a drop-in replacement for the existing file.
Ideally, if accepted, this would get rebased on top of Andy Cohen's
reworking of nnir/nnselect.

How it works:

The query entered by the user is parsed into a sexp structure, and then
each engine is responsible for interpreting that.

For instance, say you mark one IMAP group and one maildir group (indexed
with notmuch). Then you enter a query:

"from:john after:1w or -mark:!"

Internally, this becomes:
((from . "john") (or (since 14 4 2017) (not (mark . "flag"))))
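The parsing step can be sketched in Python to show its general shape. This is not the nnir.el code. The tuple format ("and", [...]), ("or", a, b), ("not", x), ("term", key, value) is a hypothetical stand-in for the sexp. Bare words are assumed to search a "body" key. Values are left unnormalized, so there is no date or mark conversion.

```python
import re

# Tokens: parentheses, key:"quoted value", or any other run of non-space,
# non-paren characters (key:value, "or", "not", bare words).
TOKEN_RE = re.compile(r'\(|\)|-?[\w-]+:"[^"]*"|[^\s()]+')


class QueryParser:
    """Recursive-descent parser producing nested tuples (hypothetical format)."""

    def __init__(self, string):
        self.tokens = TOKEN_RE.findall(string)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def sequence(self):
        """Parse expressions up to ")" or end of input; they are implicitly ANDed."""
        items = []
        while self.peek() not in (None, ")"):
            expr = self.unary()
            # "or" and "near" combine the expression just parsed with the next one.
            while self.peek() in ("or", "near"):
                op = self.advance()
                if self.peek() in (None, ")"):
                    break  # dangling operator: silently dropped
                expr = (op, expr, self.unary())
            items.append(expr)
        return ("and", items)

    def unary(self):
        token = self.advance()
        if token == "not" and self.peek() not in (None, ")"):
            return ("not", self.unary())
        if token == "(":
            group = self.sequence()
            if self.peek() == ")":
                self.advance()
            return group
        if token.startswith("-") and ":" in token:
            return ("not", self.term(token[1:]))
        return self.term(token)

    def term(self, token):
        key, sep, value = token.partition(":")
        if not sep:
            key, value = "body", token  # assumption: bare words search the body
        return ("term", key, value.strip('"'))


def parse_query(string):
    # Tokens after an unmatched ")" are ignored in this sketch.
    return QueryParser(string).sequence()


print(parse_query("from:john after:1w or -mark:!"))
```

Here "or" binds only the expressions immediately on either side of it. That matches how the example groups after:1w with -mark:!.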

The imap engine turns that into:
"FROM john OR SINCE 14-Apr-2017 UNFLAGGED"
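The IMAP translation can be sketched the same way in Python. The sketch assumes a hypothetical tuple form of the sexp: ("and", [...]), ("or", a, b), ("not", x) and ("term", key, value), with dates as datetime.date. It covers only a few keys. The search keys themselves (FROM, SINCE, OR, NOT, UNFLAGGED, HEADER and so on) are standard IMAP4rev1. IMAP's OR is a prefix operator taking exactly two keys, so precedence never comes up. A multi-key AND group nested inside OR does have to be parenthesized.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Mark names -> IMAP system-flag search keys (hypothetical subset).
MARK_KEYS = {"flag": "FLAGGED", "seen": "SEEN", "read": "SEEN", "replied": "ANSWERED"}
NEGATED_MARK_KEYS = {"flag": "UNFLAGGED", "seen": "UNSEEN", "read": "UNSEEN",
                     "replied": "UNANSWERED"}
HEADER_KEYS = {"from", "to", "cc", "bcc", "subject", "body"}


def imap_date(d):
    # IMAP dates look like 14-Apr-2017; month names are fixed, not locale-dependent.
    return f"{d.day}-{MONTHS[d.month - 1]}-{d.year}"


def imap_string(value):
    if not value.isascii() or "\r" in value or "\n" in value:
        raise ValueError("value needs an IMAP literal; not handled in this sketch")
    if value and all(c.isalnum() or c in "-_.@" for c in value):
        return value  # safe as a bare atom
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_imap(expr, top=True):
    kind = expr[0]
    if kind == "and":
        parts = [to_imap(e, top=False) for e in expr[1]]
        if not parts:
            return "ALL"
        text = " ".join(parts)
        return text if top or len(parts) == 1 else f"({text})"
    if kind in ("or", "near"):  # IMAP has no proximity search, so near becomes OR
        return f"OR {to_imap(expr[1], top=False)} {to_imap(expr[2], top=False)}"
    if kind == "not":
        inner = expr[1]
        if inner[0] == "term" and inner[1] == "mark" and inner[2] in NEGATED_MARK_KEYS:
            return NEGATED_MARK_KEYS[inner[2]]
        return "NOT " + to_imap(inner, top=False)
    if kind == "term":
        key, value = expr[1], expr[2]
        if key in ("since", "before"):
            return f"{key.upper()} {imap_date(value)}"
        if key == "mark":
            return MARK_KEYS[value]  # unknown marks raise KeyError in this sketch
        if key in HEADER_KEYS:
            return f"{key.upper()} {imap_string(value)}"
        return f"HEADER {imap_string(key)} {imap_string(value)}"
    raise ValueError(f"unknown node: {expr!r}")


query = ("and", [("term", "from", "john"),
                 ("or", ("term", "since", date(2017, 4, 14)),
                        ("not", ("term", "mark", "flag")))])
print(to_imap(query))  # FROM john OR SINCE 14-Apr-2017 UNFLAGGED
```

IMAP's SINCE and BEFORE compare the server's internal date. SENTSINCE and SENTBEFORE compare the Date: header, which may be closer to "messages sent after".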

And the notmuch engine turns it into:
"from:john date:4/14/2017.. or not tag:flag"
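A comparable Python sketch for notmuch uses the same hypothetical tuple form. It passes mark names through as tag names, as the example does. The tags a real setup uses may differ; notmuch's maildir flag synchronization uses "flagged", for instance. Unlike the example string, the sketch parenthesizes "or" groups. That way the result does not depend on how the query parser ranks "or" against the implicit AND.

```python
from datetime import date, timedelta

PREFIXES = {"from": "from:", "to": "to:", "subject": "subject:", "body": ""}


def notmuch_date(d):
    # Month/day/year, matching the example string.
    return f"{d.month}/{d.day}/{d.year}"


def notmuch_value(value):
    if '"' in value:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{value}"' if " " in value else value


def to_notmuch(expr, top=True):
    kind = expr[0]
    if kind == "and":
        parts = [to_notmuch(e, top=False) for e in expr[1]]
        if not parts:
            return "*"  # notmuch's match-everything query
        text = " ".join(parts)
        return text if top or len(parts) == 1 else f"({text})"
    if kind in ("or", "near"):  # near degrades to or here
        text = f"{to_notmuch(expr[1], top=False)} or {to_notmuch(expr[2], top=False)}"
        return text if top else f"({text})"
    if kind == "not":
        return "not " + to_notmuch(expr[1], top=False)
    if kind == "term":
        key, value = expr[1], expr[2]
        if key == "since":
            return f"date:{notmuch_date(value)}.."
        if key == "before":
            # notmuch treats the end of a date range as inclusive (rounded up to
            # the end of that day), so step back one day to keep "before" exclusive.
            return f"date:..{notmuch_date(value - timedelta(days=1))}"
        if key in ("mark", "tag"):
            return f"tag:{notmuch_value(value)}"
        if key in PREFIXES:
            return PREFIXES[key] + notmuch_value(value)
    raise ValueError(f"unsupported node: {expr!r}")


query = ("and", [("term", "from", "john"),
                 ("or", ("term", "since", date(2017, 4, 14)),
                        ("not", ("term", "mark", "flag")))])
print(to_notmuch(query))  # from:john (date:4/14/2017.. or not tag:flag)
```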

Results from both servers are put in the same summary buffer.

That's pretty much it; I hope people will be interested in this. I've
started writing tests, and will do documentation if this is accepted.
I've pasted the complete docstring of nnir-search-parse-query below.

---------------------
Notes for the curious:

The search engines are now implemented as classes. This allowed for
factoring out a bunch of common code.
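A Python sketch illustrates the kind of factoring meant here. It is not the class definitions in the attached nnir.el, and every class, attribute and method name below is hypothetical. An abstract base defines the search entry point. A base class for locally indexed engines owns the shared command-line handling. A concrete engine then supplies only its program, switches and query translation.

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """Hypothetical base class: the driver code shared by every engine."""

    def search(self, query):
        return self.run(self.translate(query))

    @abstractmethod
    def translate(self, query):
        """Turn the parsed query into this engine's query string."""

    @abstractmethod
    def run(self, query_string):
        """Execute the search and return its results."""


class LocalIndexEngine(SearchEngine):
    """Hypothetical base for engines that shell out to a local indexer."""

    program = None
    switches = ()

    def command_line(self, query_string):
        return [self.program, *self.switches, query_string]

    def run(self, query_string):
        # A real engine would execute this with subprocess.run(...) and map the
        # returned file names back to Gnus groups; the sketch only returns it.
        return self.command_line(query_string)


class NotmuchEngine(LocalIndexEngine):
    program = "notmuch"
    switches = ("search", "--output=files")

    def translate(self, query):
        # Handles only a flat ("and", [("term", key, value), ...]) query.
        return " ".join(f"{key}:{value}" for _, key, value in query[1])


engine = NotmuchEngine()
print(engine.search(("and", [("term", "from", "john"), ("term", "tag", "inbox")])))
# ['notmuch', 'search', '--output=files', 'from:john tag:inbox']
```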

I nearly set this up to run multiple searches, each in its own
thread, allowing for limited search concurrency. I backed off at the
last minute because of weird IMAP behavior, but the code is pretty much
set up for threads, if IMAP can get sorted out.

I re-implemented a limited version of the IMAP LITERAL+ code I wrote
years ago. If the server advertises support, searches for non-ASCII
strings will use the LITERAL+ mechanism. ¡¡Turning this on enforces
CHARSET UTF-8!! I.e., the assumption is that if a server can handle
LITERAL+, it can handle CHARSET UTF-8. This is probably totally wrong,
but it would be easy to shut off, or fix if I can figure out how to
DTRT.
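For reference, here is the wire format involved. With LITERAL+ (RFC 2088, since replaced by RFC 7888), the client sends a non-synchronizing literal {N+}, where N counts octets rather than characters. It follows the literal immediately with the data instead of waiting for a continuation response. The Python sketch below builds a single-key SEARCH that way. The function name and fallback behaviour are illustrative only, and, like the code described above, it pairs LITERAL+ with CHARSET UTF-8.

```python
def imap_search_command(tag, key, value, literal_plus=True):
    """Build a one-key IMAP SEARCH command as bytes (illustrative sketch)."""
    if "\r" in value or "\n" in value:
        raise ValueError("CR/LF in search values is not handled here")
    if value.isascii():
        # Plain ASCII: a quoted string is enough, no CHARSET needed.
        quoted = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{tag} SEARCH {key} "{quoted}"\r\n'.encode("ascii")
    if not literal_plus:
        # Without LITERAL+ the client must send "{N}", wait for the server's
        # "+" continuation, then send the data; not sketched here.
        raise NotImplementedError("synchronizing literals")
    data = value.encode("utf-8")
    # "{N+}" announces N octets that follow on the next line immediately.
    header = f"{tag} SEARCH CHARSET UTF-8 {key} {{{len(data)}+}}\r\n"
    return header.encode("ascii") + data + b"\r\n"


print(imap_search_command("A1", "FROM", "José"))
# b'A1 SEARCH CHARSET UTF-8 FROM {5+}\r\nJos\xc3\xa9\r\n'
```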

So far as I can tell, Hyrex and Swish-e are defunct. They're still in
there, but their search transformation is lacking because there are no
good docs.

Namazu docs are also lacking: they give examples of searching on
"message-id", "from", and "subject" headers, but are there more? I don't
know. I can't test because mknmz errors on my machine.

Things I'd like to add:

1. Support for IMAP MULTISEARCH and FUZZY
2. A command to automatically update all engine indexes.
3. Regular expression searches for engines that support them.
4. Engines for lucene, solr, raw xapian, sphinx... What else are people
   using? There's a base class for locally-indexed search engines, so
   these should be easy to add.
5. Create an offline index of gmane messages, to be updated monthly. The
   gmane search engine would search locally but request remotely (only
   partly joking).

------------------------------
nnir-search-parse-query is a Lisp closure.

(nnir-search-parse-query STRING)

Turn STRING into an s-expression based query.

The resulting query structure is passed to the various search
backends, each of which adapts it as needed.

The search "language" is essentially a series of key:value
expressions.  Key is most often a mail header, but there are
other keys.  Value is a string, quoted if it contains spaces.
Key and value are separated by a colon, no space.  Expressions
are implicitly ANDed; the "or" keyword can be used to
OR them. "not" will negate the following expression, or keys can be
prefixed with a "-".  The "near" operator will work for
engines that understand it; other engines will convert it to
"or".  Parenthetical groups work as expected.
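The near-to-or fallback amounts to a small tree rewrite before translation. The Python sketch below assumes a hypothetical tuple form of the parsed query: ("and", [...]), ("or", a, b), ("near", a, b), ("not", x) and ("term", key, value). The function name is made up.

```python
def degrade_near(expr, supports_near=False):
    """Rewrite every ("near", a, b) node as ("or", a, b) unless the engine supports NEAR."""
    if supports_near:
        return expr
    kind = expr[0]
    if kind in ("near", "or"):
        return ("or", degrade_near(expr[1]), degrade_near(expr[2]))
    if kind == "not":
        return ("not", degrade_near(expr[1]))
    if kind == "and":
        return ("and", [degrade_near(e) for e in expr[1]])
    return expr  # ("term", key, value) leaves are unchanged


query = ("and", [("term", "subject", "emacs"),
                 ("near", ("term", "body", "gnus"), ("term", "body", "search"))])
print(degrade_near(query))
# ('and', [('term', 'subject', 'emacs'), ('or', ('term', 'body', 'gnus'), ('term', 'body', 'search'))])
```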

A key that matches the name of a mail header will search that
header.

Search keys can be abbreviated so long as they remain
unambiguous, e.g. "f" will search the "from" header, while "s" will raise an
error because it could mean either "subject" or "since".
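Abbreviation handling is ordinary unique-prefix completion. In this Python sketch the key list is a hypothetical subset; the real list includes every header name. An exact match wins, so "contact" is not ambiguous with "contact-from". Keys matching nothing pass through unchanged, since backends may understand keys the parser does not know.

```python
# Hypothetical subset of known keys; the real list would include every header.
KNOWN_KEYS = ["from", "subject", "to", "cc", "bcc", "address", "recipient",
              "before", "after", "since", "mark", "tag",
              "contact", "contact-from", "contact-to"]


def expand_key(abbrev, keys=KNOWN_KEYS):
    """Expand an abbreviated search key to the unique key it prefixes."""
    if abbrev in keys:
        return abbrev  # exact match wins: "contact" is not ambiguous
    matches = [key for key in keys if key.startswith(abbrev)]
    if len(matches) > 1:
        raise ValueError(f"ambiguous key {abbrev!r}: {', '.join(matches)}")
    # One match: use it.  No match: pass the key through for the backends.
    return matches[0] if matches else abbrev


print(expand_key("f"))  # from
```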

Other keys:

"address" will search all sender and recipient headers.

"recipient" will search "To", "Cc", and "Bcc".

"before" will search messages sent before the specified
date (date formats are described below).  Date is exclusive.

"after" (or its synonym "since") will search messages sent
after the specified date.  Date is inclusive.

"mark" will search messages that have some sort of mark.
Likely values include "flag", "seen", "read", "replied".
It’s also possible to use Gnus’ internal marks, e.g. "mark:R"
will be interpreted as mark:read.

"tag" will search tags -- right now that’s translated to
"keyword" in IMAP, and left as "tag" for notmuch. At some
point this should also be used to search marks in the Gnus
registry.

"contact" will search messages to/from a contact.  Contact
management packages must push a function onto
‘nnir-search-contact-sources’ (see its docstring) for
this to work.

"contact-from" does what you’d expect.

"contact-to" searches the same headers as "recipient".

Other keys can be specified, provided that the search backends
know how to interpret them.
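Keys such as "recipient" and "address" can be expanded into an OR over several headers before a backend sees them. The Python sketch below uses a hypothetical tuple form of the query. Treating From as the only sender header for "address" is an assumption, and the real list may be longer.

```python
RECIPIENT_HEADERS = ["to", "cc", "bcc"]
# Assumption: "address" covers From plus the recipient headers; the real
# list of sender headers may be longer.
PSEUDO_HEADERS = {"recipient": RECIPIENT_HEADERS,
                  "address": ["from"] + RECIPIENT_HEADERS}


def expand_term(key, value):
    """Turn a pseudo-header term into an OR over the real headers it stands for."""
    headers = PSEUDO_HEADERS.get(key)
    if headers is None:
        return ("term", key, value)  # ordinary header: unchanged
    terms = [("term", header, value) for header in headers]
    result = terms[-1]
    for term in reversed(terms[:-1]):  # fold right: (or a (or b c))
        result = ("or", term, result)
    return result


print(expand_term("recipient", "john"))
# ('or', ('term', 'to', 'john'), ('or', ('term', 'cc', 'john'), ('term', 'bcc', 'john')))
```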

Date values (any key in ‘nnir-search-date-keys’) can be provided
in any format that ‘parse-time-string’ can parse (note that this
can produce weird results).  Dates with missing bits will be
interpreted as the most recent occurrence thereof (i.e., "march 03"
is the most recent March 3rd).  Lastly, relative specifications
such as 1d (one day ago) are understood, as are the units w, m and
y (week, month, year).  m is assumed to be 30 days.
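The Python sketch below covers the two rules just described: relative offsets, and the most-recent-occurrence rule for dates without a year. With today set to 21 April 2017, it reproduces the 14 April date that after:1w became in the example at the top. It handles only these two forms, nothing like parse-time-string's full range. Treating y as 365 days is an assumption.

```python
import re
from datetime import date, timedelta

# Days per unit.  "m" = 30 days is documented above; "y" = 365 is an assumption.
UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def parse_date_value(text, today):
    """Parse "1w"-style relative dates and year-less "march 03" dates."""
    relative = re.fullmatch(r"(\d+)([dwmy])", text)
    if relative:
        count, unit = int(relative.group(1)), relative.group(2)
        return today - timedelta(days=count * UNIT_DAYS[unit])
    month_day = re.fullmatch(r"([a-z]+)\s+(\d{1,2})", text.lower())
    if month_day and month_day.group(1) in MONTHS:
        month, day = MONTHS[month_day.group(1)], int(month_day.group(2))
        candidate = date(today.year, month, day)
        # No year given: take the most recent occurrence, which is last year's
        # if this year's date has not arrived yet.
        if candidate > today:
            candidate = date(today.year - 1, month, day)
        return candidate
    raise ValueError(f"unrecognized date {text!r} (other formats not sketched)")


today = date(2017, 4, 21)  # the date of this message
print(parse_date_value("1w", today))        # 2017-04-14
print(parse_date_value("march 03", today))  # 2017-03-03
```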

This function will accept pretty much anything as input. Its only job is
to parse the query into a sexp, and pass that on -- it is the job of the
search backends to make sense of the structured query. Malformed,
unusable or invalid queries will typically be silently ignored.

[-- Attachment #2: nnir.el --]
[-- Type: application/emacs-lisp, Size: 89882 bytes --]



Thread overview: 24+ messages
2017-04-21 21:35 [RFC] Gnus generalized search, part II Eric Abrahamsen
2017-04-22  0:16 ` Andrew Cohen
2017-04-22  5:27   ` Eric Abrahamsen
2017-04-22  8:08     ` Eli Zaretskii
2017-04-22 15:08       ` Eric Abrahamsen
2017-04-22 15:17         ` Eli Zaretskii
2017-04-22 15:25           ` Eli Zaretskii
2017-04-22 19:25           ` Eric Abrahamsen
2017-04-22 20:06             ` Eli Zaretskii
2017-04-22 22:50               ` Eric Abrahamsen
2017-04-30 17:46                 ` Eric Abrahamsen
2017-04-24 17:17             ` Stephen Leake
2017-04-26  9:42               ` Eli Zaretskii
2017-04-22 16:00         ` Noam Postavsky
2017-04-22  7:50   ` Eli Zaretskii
2017-04-22  8:00     ` Andrew Cohen
2017-04-22 19:53 ` Lars Ingebrigtsen
2017-04-22 20:26   ` Eric Abrahamsen
2017-04-24 20:30   ` Eric Abrahamsen
2017-04-26  4:41     ` Andrew Cohen
2017-04-26  9:21       ` Joakim Jalap
2017-04-26  8:18     ` Andrew Cohen
2017-04-26  8:22     ` Andrew Cohen
2017-04-23 13:48 ` Dan Christensen
