From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
        by olra.theworths.org (Postfix) with ESMTP id 49766431FAF
        for ; Sun, 1 Jul 2012 09:02:12 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -0.798
X-Spam-Level:
X-Spam-Status: No, score=-0.798 tagged_above=-999 required=5
        tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
        FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7]
        autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
        by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
        with ESMTP id ppDft23gV5PP for ;
        Sun, 1 Jul 2012 09:02:10 -0700 (PDT)
Received: from mail-vc0-f181.google.com (mail-vc0-f181.google.com [209.85.220.181])
        (using TLSv1 with cipher RC4-SHA (128/128 bits))
        (No client certificate requested)
        by olra.theworths.org (Postfix) with ESMTPS id EE8B7431FAE
        for ; Sun, 1 Jul 2012 09:02:09 -0700 (PDT)
Received: by vcbf1 with SMTP id f1so3668325vcb.26
        for ; Sun, 01 Jul 2012 09:02:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :cc:content-type;
        bh=UeoCrT7iWqZ6Z/rdq9M9nP8td9UW6zCNti7yveOS2lo=;
        b=TX0jkZbJQgm7exG1oTm8YIZj1IuG2umya2Rg8J0sAl36nFu6YpnI+yQl/TdOJK9+3R
         Xz9uDmaB/QbggCiLiN+n2kd6OSvJ9c0mfoJL7YCHipW9T4mKuAWmcx3zg5MiB2fhaZfr
         mbGMms5oVoY9+xDh8oWEdMD8VKApBBD7RmmghG3F4bVmLNnTVR/6uTfQXbdbYE4fjTRA
         +E3BVE16aFo+AAfVlUBRg5rl9t6HQHb4xmCy9paxhMdHLwBuOtlNJVcD1M9jPREB4HPY
         5H84BlTUinzBGrETQ56Q8I10R+vmypPJYUvvYF9Qktrv8TjE18XE6aSpPSNAlssnnVze
         6Gcw==
MIME-Version: 1.0
Received: by 10.52.17.207 with SMTP id q15mr3929265vdd.49.1341158528204;
        Sun, 01 Jul 2012 09:02:08 -0700 (PDT)
Received: by 10.220.6.3 with HTTP; Sun, 1 Jul 2012 09:02:08 -0700 (PDT)
In-Reply-To: <87k3yrmahu.fsf@qmul.ac.uk>
References: <1340656899-5644-1-git-send-email-ethan@betacantrips.com>
        <877gutnmf1.fsf@qmul.ac.uk>
        <87k3yrmahu.fsf@qmul.ac.uk>
Date: Sun, 1 Jul 2012 12:02:08 -0400
Message-ID:
Subject: Re: [RFC PATCH 00/14] modular mail stores based on URIs
From: Ethan
To: Mark Walters
Content-Type: multipart/alternative; boundary=bcaec5040a4ca93e8104c3c6cd04
Cc: notmuch@notmuchmail.org
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 01 Jul 2012 16:02:12 -0000

--bcaec5040a4ca93e8104c3c6cd04
Content-Type: text/plain; charset=ISO-8859-1

Thanks for going through it, I know there's a lot to go through..

On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters wrote:

> I was thinking of just having one mail root and inside that there could
> be maildirs and mboxes. Everything would still be relative to the root.
>

I'm hesitant to have directories that contain maildirs and mboxes. It
should be possible to unambiguously distinguish between a maildir file and
an mbox file (mboxes always start with "From ", no colon) but it sounds
kind of fragile.

> > 1. Are URIs the way to specify individual messages, despite bremner's
> > concerns about too much of the API being strings? Is adding another library
> > the easiest way to parse URIs?
>
> In my opinion the nice thing about using strings is that it does not require
> any changes to the Xapian database to store them. I think using URIs may
> not be best though as they seem to be annoying to parse (as filenames
> can contain the same characters) and you seem to need to work around the
> parser in some cases.
>

I think that's more the fault of the parser than of the URIs. If glib came
with a parser, that would be great. There aren't a lot of options for
pure-C URI parsing. Besides uriparser, there's also some code in the W3C
sample code library, but it looked like integrating it would be a pain so I
let it go.

> I wonder if the following would be practical: use // as the field
> separator:
>
> e.g. mbox://filename//start_of_message+length
>
> I think 2 consecutive slashes // is about the only thing we can assume
> is not in the path or filename. Since it is not in the filename I think
> parsing should be trivial (thus avoiding the extra library).
>

Can you explain what you mean when you say that two consecutive slashes
can't appear in a URL? Ordinary filesystem paths can contain them, and so
can file: URLs. (I just looked up file:///home/ethan///////tmp and Firefox
handled that OK.) I've sometimes seen machine-generated filenames with
double slashes because that way you don't have to make sure the incoming
filename was correctly terminated before adding another level.

> Secondly, I would prefer to keep maildirs as just the bare file name: so
> the existence of // can be the signal that there is some other
> scheme. This is asymmetric, but is rather more backwardly compatible.
>

Based on your and Jani's reasoning, I did this. Revised patch series
follows.

Ethan

--bcaec5040a4ca93e8104c3c6cd04
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks for going through it, I know there's a lot to go through..
On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters <markwalters1009@gmail.com> wrote:
> I was thinking of just having one mail root and inside that there could
> be maildirs and mboxes. Everything would still be relative to the root.

I'm hesitant to have directories that contain maildirs and mboxes. It should be possible to unambiguously distinguish between a maildir file and an mbox file (mboxes always start with "From ", no colon) but it sounds kind of fragile.
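
For illustration, the check described above might look like the sketch below. It is written in Python for brevity (the patch series itself is C), and `looks_like_mbox` is a hypothetical name, not something in notmuch. It also shows why the test feels fragile: it trusts the first five bytes of the file and nothing else.

```python
def looks_like_mbox(path):
    """Guess whether a file is an mbox by its first five bytes.

    Hypothetical helper, not part of notmuch. An mbox starts with a
    "From " separator line (no colon), whereas a single message file
    normally starts with a header such as "From: " or "Return-Path: ".
    """
    with open(path, "rb") as f:
        return f.read(5) == b"From "
```

A maildir message file that happens to begin with a "From " line would be misclassified as an mbox by this test, which is the fragility in question.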

> > 1. Are URIs the way to specify individual messages, despite bremner's
> > concerns about too much of the API being strings? Is adding another library
> > the easiest way to parse URIs?
>
> In my opinion the nice thing about using strings is that it does not require
> any changes to the Xapian database to store them. I think using URIs may not be best though as they seem to be annoying to parse (as filenames
> can contain the same characters) and you seem to need to work around the parser in some cases.

I think that's more the fault of the parser than of the URIs. If glib came with a parser, that would be great. There aren't a lot of options for pure-C URI parsing. Besides uriparser, there's also some code in the W3C sample code library, but it looked like integrating it would be a pain so I let it go.
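
To make the "filenames can contain the same characters" problem concrete, here is a small sketch using Python's standard `urllib.parse`, standing in for a C parser such as uriparser. The `mbox` scheme, the example filename, and the helper names `to_uri` and `from_uri` are assumptions made up for the illustration; none of this is taken from the patches.

```python
from urllib.parse import quote, unquote, urlsplit

def to_uri(path):
    """Build an mbox URI, percent-encoding characters such as '#' and '?'."""
    return "mbox://" + quote(path)

def from_uri(uri):
    """Recover the original filesystem path from a URI built by to_uri()."""
    return unquote(urlsplit(uri).path)

# Hypothetical filename containing characters that are special in URIs.
awkward = "/mail/weird#name?.mbox"

# Pasted in unescaped, the '#' starts a fragment and the path is cut short.
truncated = urlsplit("mbox://" + awkward).path

# Percent-encoding first lets the path survive a round trip.
restored = from_uri(to_uri(awkward))
```

Whether the escaping burden belongs in the parser or in the code that builds the URI is the point under debate; the sketch only shows that a generic parser needs the path to be encoded before it can be trusted to return it intact.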

> I wonder if the following would be practical: use // as the field
> separator:
>
> e.g. mbox://filename//start_of_message+length
>
> I think 2 consecutive slashes // is about the only thing we can assume
> is not in the path or filename. Since it is not in the filename I think
> parsing should be trivial (thus avoiding the extra library).

Can you explain what you mean when you say that two consecutive slashes can't appear in a URL? Ordinary filesystem paths can contain them, and so can file: URLs. (I just looked up file:///home/ethan///////tmp and Firefox handled that OK.) I've sometimes seen machine-generated filenames with double slashes because that way you don't have to make sure the incoming filename was correctly terminated before adding another level.
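
As a rough illustration of the concern, the sketch below parses the proposed `mbox://filename//start_of_message+length` form in Python. Splitting on every `//` breaks as soon as the path itself contains a double slash; splitting only on the last `//` copes with that, under the assumption (mine, not something stated in the proposal) that the trailing `start+length` field never contains a slash. `parse_mbox_ref` and `naive_fields` are hypothetical names.

```python
def naive_fields(ref):
    """Split on every '//' after the scheme, as if '//' never appeared in paths."""
    _scheme, _sep, rest = ref.partition("://")
    return rest.split("//")

def parse_mbox_ref(ref):
    """Parse 'mbox://<path>//<start>+<length>' using only the last '//'.

    Hypothetical helper. Assumes the '<start>+<length>' field contains
    no slashes, so the final '//' is always the field separator.
    """
    scheme, sep, rest = ref.partition("://")
    if not sep or scheme != "mbox":
        raise ValueError("not an mbox reference: %r" % ref)
    path, sep, span = rest.rpartition("//")
    if not sep:
        raise ValueError("missing '//' field separator: %r" % ref)
    start, plus, length = span.partition("+")
    if not plus:
        raise ValueError("missing '+' in offset field: %r" % ref)
    return path, int(start), int(length)
```

For a path such as `/home/ethan//tmp/inbox`, `naive_fields` yields three fields instead of two, while `parse_mbox_ref` still recovers the path, offset, and length.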

> Secondly, I would prefer to keep maildirs as just the bare file name: so the existence of // can be the signal that there is some other
> scheme. This is asymmetric, but is rather more backwardly compatible.

Based on your and Jani's reasoning, I did this. Revised patch series follows.

Ethan

--bcaec5040a4ca93e8104c3c6cd04--