From: Ethan Glasser-Camp
To: notmuch@notmuchmail.org
Subject: [RFC
PATCH 00/14] modular mail stores based on URIs
Date: Mon, 25 Jun 2012 16:41:25 -0400
Message-Id: <1340656899-5644-1-git-send-email-ethan@betacantrips.com>

Hi guys,

Sorry for dropping off the mailing list after I sent my last patch
series (http://notmuchmail.org/pipermail/notmuch/2012/009470.html). I
haven't had the time or a stable enough email address to really follow
notmuch development :)

I signed onto #notmuch a week or two ago and asked what I would need
to do to get a feature like this one into mainline. j4ni told me that
he agreed with the feedback on my original patch series, and suggested
that I follow mjw1009's advice of having filenames encode all
information about mail storage transparently. This would solve the
original series' problem of sprinkling mail storage parameters all
over the place. bremner said that he had been thinking about how to
support mbox or other multiple-message archives, and also commented
that he wasn't crazy about so much of the API being in strings.

Based on this advice, I revised my approach to this patchset around
the stated desire to work with mbox formats. This approach, in
contrast to the mailstore approach that Michal Sojka proposed and I
revised, encodes all mail access information as URIs. These URIs are
stored in Xapian the way that relative paths are right now. Examples
might be:

    maildir:///home/ethan/Mail/folder/cur/filename:2,S
    mbox:///home/ethan/Mail/folder/file.mbox#byte-offset+length
    couchdb://ethan:password@localhost:8080/some-doc-id

Personally, this isn't my favorite approach, for the following
reasons:

1. Notmuch, at some point in its history, chose to store file paths
   relative to a "mail database", with the intent that if this mail
   database was moved, filenames would not change and everything would
   Just Work (tm). The above scheme reverses this design decision: the
   URIs contain absolute paths, so relocatability breaks. I don't see
   any easy way to handle this problem. This isn't just a wishlist
   feature; at least two things in the test suite (caching of
   corpus.mail, and the atomicity tests) rely on this behavior.

2. Mail access state (open connections, etc.) can only be stored in
   variables global to the mailstore code, and cannot be stored as
   private members of a mailstore object. This is more an aesthetic
   concern than a functional one.

Anyhow, the following (enormous) patch series implements this design.

I used uriparser as an external library to parse URIs. The API for
this library is a little idiosyncratic. uriparser supports parsing
Unicode URIs (strings of wchar_t), but I just used ASCII filenames
because I think that's what comes out of Xapian.

Patch 11 is borrowed directly from the last patch series.

The last four or five patches add mbox support, including a few tests.
That part of the series is still very first-draft: I added a new
config option to specify URIs to scan, and ">From " lines still need
to be unescaped. However, we support scanning mbox files whether or
not messages have a Content-Length header.

I will try to receive feedback on this series more gratefully than the
last one. :)

Thanks again for your time,

Ethan
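
To make the URI scheme above concrete, here is a minimal sketch in
Python. It is not code from the patch series (which is C and uses
uriparser); the names MailLocation, parse_mail_uri and read_message
are hypothetical, and it assumes the mbox fragment is written as two
decimal integers, "offset+length", which the example URI only hints
at.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class MailLocation(NamedTuple):
    """Where one message lives (hypothetical type, for illustration)."""
    scheme: str
    path: str
    offset: Optional[int] = None  # byte offset into an mbox file
    length: Optional[int] = None  # byte length of the message


def parse_mail_uri(uri: str) -> MailLocation:
    """Split a mail URI into scheme, path and optional byte range."""
    parts = urlsplit(uri)
    if parts.scheme == "mbox" and parts.fragment:
        # Assumption: the fragment is "<offset>+<length>" in decimal.
        offset_text, _, length_text = parts.fragment.partition("+")
        return MailLocation("mbox", parts.path,
                            int(offset_text), int(length_text))
    return MailLocation(parts.scheme, parts.path)


def read_message(uri: str) -> bytes:
    """Read the raw bytes of one message from a file-backed store."""
    loc = parse_mail_uri(uri)
    if loc.scheme not in ("maildir", "mbox"):
        raise ValueError("no file-backed reader for scheme %r" % loc.scheme)
    with open(loc.path, "rb") as f:
        if loc.offset is None:
            return f.read()       # maildir: one message per file
        f.seek(loc.offset)        # mbox: jump to the message start
        return f.read(loc.length)


if __name__ == "__main__":
    for example in (
        "maildir:///home/ethan/Mail/folder/cur/filename:2,S",
        "mbox:///home/ethan/Mail/folder/file.mbox#1024+512",
        "couchdb://ethan:password@localhost:8080/some-doc-id",
    ):
        print(parse_mail_uri(example))
```

Note that the parsed path is absolute in every case, which is the
relocatability problem from point 1: moving the mail directory
invalidates every stored URI.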