From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 35E7F431FAF for ; Thu, 28 Jun 2012 00:39:04 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.798 X-Spam-Level: X-Spam-Status: No, score=-0.798 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1MkKDWu5fN7y for ; Thu, 28 Jun 2012 00:39:02 -0700 (PDT) Received: from mail-vc0-f181.google.com (mail-vc0-f181.google.com [209.85.220.181]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id E9A63431FB6 for ; Thu, 28 Jun 2012 00:39:01 -0700 (PDT) Received: by vcbf1 with SMTP id f1so1528629vcb.26 for ; Thu, 28 Jun 2012 00:39:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=mLbRIJZDgr6d1TaGvW5+oCmWHJVufh5MZXpWEh0H90Y=; b=B5m+zqwX9HxzFtkqMGI7yjrd4IkOlXZFJYDFlprGcrtFtq6u57eyW5hGKkJmCb4pUV J3L4GbBLdePJcRxPFhg3Zr79YEJ2JvsjrwBakgJJTKRtJicRwrmC4WKBVmeruyvvZCH0 zD8dJ/IwWx/2J5WIoR75kXlag/DOiaALaUHVrbDBmI693/l7lsCgP9xP5nuTOlGLActr s5Qncl2LQ7cLBBH7jNW6BYQFW8SCPkodq8MxyhtJXIqvRyN3N3s2yjpxssf6emGdLxIR jo6PgtI6we7Nv1CmGx7vKvnzsHKxdGSzF5ICfDJx6VvMODWNpREbr4jxpb0xR/kux2EX RNnw== MIME-Version: 1.0 Received: by 10.52.75.99 with SMTP id b3mr492975vdw.75.1340869140085; Thu, 28 Jun 2012 00:39:00 -0700 (PDT) Received: by 10.220.6.3 with HTTP; Thu, 28 Jun 2012 00:39:00 -0700 (PDT) In-Reply-To: <877gutnmf1.fsf@qmul.ac.uk> References: <1340656899-5644-1-git-send-email-ethan@betacantrips.com> <877gutnmf1.fsf@qmul.ac.uk> Date: Thu, 28 Jun 
2012 03:39:00 -0400 Message-ID: Subject: Re: [RFC PATCH 00/14] modular mail stores based on URIs From: Ethan To: Mark Walters Content-Type: multipart/alternative; boundary=bcaec501604bc8fa8004c3836cee X-Mailman-Approved-At: Thu, 28 Jun 2012 07:51:08 -0700 Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jun 2012 07:39:04 -0000

--bcaec501604bc8fa8004c3836cee
Content-Type: text/plain; charset=ISO-8859-1

I sent this at first as a reply-only-to-sender. Oops! Sorry Mark for the
double send.

On Wed, Jun 27, 2012 at 5:17 AM, Mark Walters wrote:

> > Personally, this isn't my favorite approach, for the following reasons:
> >
> > 1. Notmuch, at some point in its history, chose to store file paths
> > relative to a "mail database", with the intent that if this mail
> > database was moved, filenames would not change and everything would
> > Just Work (tm). The above scheme completely reverses this design
> > decision, and in general completely breaks this relocatability. I
> > don't see any easy way to handle this problem. This isn't just a
> > wishlist feature; at least two things in the test suite (caching of
> > corpus.mail, and the atomicity tests) rely on this behavior.
>
> Why can't the URI just store a relative path, at least for maildir://
> and mbox:// ? It is purely internal to notmuch so it doesn't need to be
> very standard.
>

Well, relative to where? This is especially relevant now that we can
have multiple mail stores. It sounds like you are suggesting that all
mbox:// URIs are relative to an "mbox root", but the fundamental
question is how to pass that information from the configuration into
the library.

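To make the "relative to where?" question concrete, here is a minimal
sketch in Python. It is purely illustrative: resolve_store_uri and the
roots mapping are hypothetical names, not part of notmuch or of the patch
series, and the example paths are made up. It assumes everything after
"scheme://" is a path relative to a per-scheme root, and shows that the
library cannot produce a real filename unless the caller hands it that
root:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def resolve_store_uri(uri, roots):
    """Turn a store-relative URI into a filesystem path.

    Hypothetical helper: `roots` maps a scheme ("mbox", "maildir") to the
    directory its URIs are relative to. Nothing in the URI itself says
    where that directory is, so the caller has to supply it.
    """
    parts = urlsplit(uri)
    if parts.scheme not in roots:
        raise LookupError(f"no root configured for scheme {parts.scheme!r}")
    # Assumption: for a relative URI such as mbox://archive/2012.mbox the
    # text after "//" is all path, so the "host" slot is simply its first
    # component. Leading slashes are dropped so the root is never bypassed.
    relative = unquote(parts.netloc + parts.path).lstrip("/")
    return str(PurePosixPath(roots[parts.scheme]) / relative)


# Example with a made-up root directory.
roots = {"mbox": "/home/user/mail"}
print(resolve_store_uri("mbox://archive/2012.mbox", roots))
```

Under this scheme, moving the mail only means changing roots; the stored
URIs stay the same. What the sketch leaves open is exactly the problem
above: how roots would get from the configuration into the library.
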
Even using configuration itself may be problematic, because only the CLI
uses the configuration, and language bindings like Python and Ruby might
get out of sync! (But note also that the Python bindings currently use
.notmuch-config to find the database path, so maybe it's not a big deal.)

If I could do whatever I wanted, every mailstore would get registered
somehow and the URIs could use those registered names to specify what
they're relative to: maybe using hostname, such as
maildir://university-mail/some-mail-file,
mbox://old-unix-system/some.mbox. Then changing these names in
.notmuch-config would be fine. I just don't know how to pass that
configuration information without an approach like in the past patch
series.

> > 2. Mail access information, i.e. open connections, etc. can only be
> > stored in variables global to the mailstore code, and cannot be stored
> > as private members of a mailstore object. This is more an aesthetic
> > concern than a functional one.
> >
> > Anyhow, the following (enormous) patch series implement this design. I
> > used uriparser as an external library to parse URIs. The API for this
> > library is a little idiosyncratic. uriparser supports parsing Unicode
> > URIs (strings of wchar_t), but I just used ASCII filenames because I
> > think that's what comes out of Xapian.
>
> Why use a library? Isn't it just a question of does the string contain
> // and, if so, splitting it? I guess that // is a nice separator as I
> think we can assume that a true path does not contain it (since a
> filename cannot contain /).
>

The URIs are true URIs. Filenames are provided by the "path" segment of
the uri -- everything from the first slash after the hostname up to a ?
for query arguments. My concern was that filenames could (in theory)
contain # or ?, and in practice they contain : (maildir flags). I
figured it was better to do it right.

> > Patch 11 is borrowed directly from the last patch series.
> >
> > The last four or five patches add mbox support, including a few
> > tests. That part of the series is still very first-draft: I added a
> > new config option to specify URIs to scan, and ">From " lines still
> > need to be unescaped. However, we support scanning mbox files whether
> > messages have content-length or not.
>
> I have an idea that mbox byte-locations change when messages are marked
> as read (amongst other things). It might be worth saying that this
> initial implementation only works for unchanging mboxs (rather than the
> append only condition that you currently say). But I have not got as far
> as applying/testing the series yet.
>

Yeah, I don't even know how an mbox message gets flagged read and I
don't know how I would support it.

Ethan

--bcaec501604bc8fa8004c3836cee
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

--bcaec501604bc8fa8004c3836cee--