From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
 by olra.theworths.org (Postfix) with ESMTP id A31734196F0
 for ; Sat, 6 Nov 2010 13:12:39 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -1.89
X-Spam-Level:
X-Spam-Status: No, score=-1.89 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, T_MIME_NO_TEXT=0.01] autolearn=ham
Received: from olra.theworths.org ([127.0.0.1])
 by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id aV2PvNtRPruO for ;
 Sat, 6 Nov 2010 13:12:25 -0700 (PDT)
Received: from tarap.cc.columbia.edu (tarap.cc.columbia.edu [128.59.29.7])
 by olra.theworths.org (Postfix) with ESMTP id 93AD340D14D
 for ; Sat, 6 Nov 2010 13:12:25 -0700 (PDT)
Received: from servo.finestructure.net (cpe-74-66-82-137.nyc.res.rr.com [74.66.82.137])
 (user=jgr2110 author=jrollins@finestructure.net mech=PLAIN bits=0)
 by tarap.cc.columbia.edu (8.14.4/8.14.3) with ESMTP id oA6KCNAj016264
 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT)
 for ; Sat, 6 Nov 2010 16:12:24 -0400 (EDT)
Received: from jrollins by servo.finestructure.net with local (Exim 4.72)
 (envelope-from ) id 1PEp7e-0000GJ-DN
 for notmuch@notmuchmail.org; Sat, 06 Nov 2010 16:12:22 -0400
From: Jameson Rollins
To: Notmuch Mail
Subject: notmuch for documents
User-Agent: Notmuch/0.4 (http://notmuchmail.org) Emacs/23.2.1 (i486-pc-linux-gnu)
Date: Sat, 06 Nov 2010 16:12:17 -0400
Message-ID: <87k4kqp5y6.fsf@servo.finestructure.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha256; protocol="application/pgp-signature"
X-No-Spam-Score: Local
X-Scanned-By: MIMEDefang 2.68 on 128.59.29.7
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 06 Nov 2010 20:12:39 -0000

--=-=-=

A little while ago on #notmuch, madduck and ojwb mentioned that they
thought notmuch was overly focused on mail. At the time I thought this
was a silly criticism and defended notmuch as doing what it does
*really* well, arguing that we shouldn't expect notmuch to be all
things to all people.

Yesterday, however, I had the profound realization that madduck and
ojwb are right.

Notmuch stores database entries for email messages. However, these
messages are nothing more than simple rfc5322 [0] structured documents.
They include nothing more than headers and a text body.

Imagine now that I have a collection of ebooks, each stored in a single
rfc5322-formatted text file:

------------------------------------------------------------
From: Italo Calvino <>
Subject: If on a winter's night a traveler
Date: 1979

You are about to begin reading Italo Calvino's new novel, ...
------------------------------------------------------------

I store them all in a directory. I now create a NOTMUCH_CONFIG with a
database.path that points to that directory, and run notmuch.

Notmuch works *out of the box* (almost) perfectly to index my
collection of ebooks. All the notmuch commands work exactly as
expected. I can search through the bodies, search the titles, search
for an author, search for a publication date, etc. The emacs interface
even works as expected. Try it: it really works!

Only a couple of small things are a little funky:

* The "headers" in my ebooks aren't exactly intuitive ("From" instead
  of "Author", "Subject" instead of "Title", etc.) and there are some
  missing headers ("Publisher"). I also had to format some of them in
  a strange way (I had to add "<>" in the "From" field in order to get
  it to index properly for some reason).

* The documentation keeps referring to "messages", even though my
  documents are books.
  And there are some subcommands that don't seem to make sense
  ("reply" to a book?).

But that's it! Everything else works as a perfect ebook indexer. I can
of course even add tags to my books. Beautiful. It's really quite
incredible how well it works for this out of the box.

The only other issue is that my ebooks don't come in rfc5322-formatted
files. I have to translate them for notmuch to work.

So what would have to be tweaked in notmuch to make it work even better
as an ebook indexer?

* add some sort of translator to extract the "headers" and "body" from
  my non-rfc5322-formatted ebook files

* allow me to specify which "headers" from my ebooks I want indexed
  ("Author", "Publisher", etc.)

* tweak notmuch show to just open the ebook itself in an ebook reader
  instead of outputting it to stdout

* tweak the documentation

Those are not very big changes. And yet, with these changes notmuch can
now work for *many* other large classes of structured documents.

Another real world example: I have hundreds of scientific journal
articles on my computer. They are all pdf files and each has a
corresponding bibtex entry in a flat text file. If notmuch could read
the headers from the bibtex file and the body from the text in the pdf
(ps2ascii), notmuch would work *perfectly* as an indexer for my
scientific journal articles.

So what do people think about this idea? Does it make sense to look
into extending notmuch to handle non-mail documents?

We definitely would *not* want to compromise notmuch as a mail
indexer/reader. Notmuch is the best damn mail system there ever was and
we wouldn't want to mess with that. Does abstracting everything in
notmuch from "messages" -> "documents" hurt it as a mail system? What
if just the back-end were abstracted, to allow for different front-ends
for different classes of documents, i.e. "messages", "articles",
"books", "rss feeds", etc.?

Are there any big problems with this proposal that I'm overlooking?
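
As a rough illustration of the "translator" step listed above, here is
a minimal Python sketch that renders a document's metadata and body as
an rfc5322-style text file. It is only a sketch under stated
assumptions: the function name document_to_rfc5322 and the
FIELD_TO_HEADER mapping are hypothetical and not part of notmuch, and
it does not attempt to handle the "From" formatting quirk mentioned
earlier.

```python
from email import message_from_string
from email.message import Message

# Hypothetical mapping from document metadata fields to the mail-style
# headers used in the ebook example ("From", "Subject", "Date").
FIELD_TO_HEADER = {"author": "From", "title": "Subject", "year": "Date"}


def document_to_rfc5322(metadata, body):
    """Render metadata plus body text as an rfc5322-style document.

    Assumption: metadata is a plain dict; fields without a mapping
    are silently skipped.
    """
    msg = Message()
    for field, header in FIELD_TO_HEADER.items():
        if field in metadata:
            msg[header] = str(metadata[field])
    msg.set_payload(body)
    # as_string() writes the headers, a blank line, then the body.
    return msg.as_string()


book = {
    "author": "Italo Calvino",
    "title": "If on a winter's night a traveler",
    "year": 1979,
}
text = document_to_rfc5322(book, "You are about to begin reading ...\n")
print(text)

# Round trip through the standard parser to check the result parses
# back into the same headers.
parsed = message_from_string(text)
assert parsed["Subject"] == book["title"]
```

Writing one such file per document into the directory that
database.path points to, and then running "notmuch new", should in
principle reproduce the setup described in this message. A real
translator would also need to pull the metadata from a source such as
a bibtex entry, which this sketch leaves out.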
I'm very interested to hear what others think about this idea.

jamie.

[0] http://tools.ietf.org/html/rfc5322

--=-=-=
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIcBAEBCAAGBQJM1bahAAoJEO00zqvie6q8dqAP/iWt3A64h397BZmg1jDmfLRq
QmYG7uqKYOhVAzx5nwyUishOgf7dyJnD3FJv/KYc5WH5XvBGu6zIjoSoo708Qt0d
JnG1upGysiarL+BDDii1N3EMicWuBBCFsIXW69opWlshQ7k7Jyk6fFmkBMNA4IS5
jhFuzGFlA/28mePVkRusqVOd0yPeddWwKjaFJAr8OnKJvuuqPrw6PQPEs5u/DCfF
Pp51z4gssYOukT/NoTH4oEHsiordZhlET7WAmGDoo9mj1MEI7VDPTTm4BMzBnVnf
WbFBKdlR4KvAb566A+0c5RssNLkx6Vsc9NkQQL/MgbwoJHLlHQmPnuzh+qSgE9EJ
nL1XWAejDL5dTtgt+L/qq4xHN1a5yUnWHpXkj7Msfz99gmQDnKzLypkCbA6wuK2g
PJGQtntuhTRvhqvVHFbAy/vbN7CDVygzKTb7OzyhhWunV04Nsi4b7hvnjLJla+ZG
bB3MtjQzvWRFym4/KyXURn0tEKWAEuMUksep0Nxul0goO6Ikbw9At3bdfixv2RxX
Hm30REQAIIBEbDPJ54o1A1EKugLCrd/NZq/oWxOFWtAk3WvLVEnXfYUVuCEjNyOl
8FyTVgP5f7ljz+6PVovhFJhlgb7L2a+32SCrBHr1Io59SsHYNs6A8lzg0FweObGp
H04BMMxuYJ0BAVZiPXHs
=vkhz
-----END PGP SIGNATURE-----
--=-=-=--