From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ethan.glasser.camp@gmail.com>
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 49766431FAF
	for <notmuch@notmuchmail.org>; Sun,  1 Jul 2012 09:02:12 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -0.798
X-Spam-Level: 
X-Spam-Status: No, score=-0.798 tagged_above=-999 required=5
	tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
	FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7]
	autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id ppDft23gV5PP for <notmuch@notmuchmail.org>;
	Sun,  1 Jul 2012 09:02:10 -0700 (PDT)
Received: from mail-vc0-f181.google.com (mail-vc0-f181.google.com
	[209.85.220.181]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id EE8B7431FAE
	for <notmuch@notmuchmail.org>; Sun,  1 Jul 2012 09:02:09 -0700 (PDT)
Received: by vcbf1 with SMTP id f1so3668325vcb.26
	for <notmuch@notmuchmail.org>; Sun, 01 Jul 2012 09:02:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=UeoCrT7iWqZ6Z/rdq9M9nP8td9UW6zCNti7yveOS2lo=;
	b=TX0jkZbJQgm7exG1oTm8YIZj1IuG2umya2Rg8J0sAl36nFu6YpnI+yQl/TdOJK9+3R
	Xz9uDmaB/QbggCiLiN+n2kd6OSvJ9c0mfoJL7YCHipW9T4mKuAWmcx3zg5MiB2fhaZfr
	mbGMms5oVoY9+xDh8oWEdMD8VKApBBD7RmmghG3F4bVmLNnTVR/6uTfQXbdbYE4fjTRA
	+E3BVE16aFo+AAfVlUBRg5rl9t6HQHb4xmCy9paxhMdHLwBuOtlNJVcD1M9jPREB4HPY
	5H84BlTUinzBGrETQ56Q8I10R+vmypPJYUvvYF9Qktrv8TjE18XE6aSpPSNAlssnnVze
	6Gcw==
MIME-Version: 1.0
Received: by 10.52.17.207 with SMTP id q15mr3929265vdd.49.1341158528204; Sun,
	01 Jul 2012 09:02:08 -0700 (PDT)
Received: by 10.220.6.3 with HTTP; Sun, 1 Jul 2012 09:02:08 -0700 (PDT)
In-Reply-To: <87k3yrmahu.fsf@qmul.ac.uk>
References: <1340656899-5644-1-git-send-email-ethan@betacantrips.com>
	<877gutnmf1.fsf@qmul.ac.uk>
	<CAOJ+Ob0Kw0Kkhh9C27Xv9gvqtNowzQiNqrLAtvti7fL8NND2+w@mail.gmail.com>
	<87k3yrmahu.fsf@qmul.ac.uk>
Date: Sun, 1 Jul 2012 12:02:08 -0400
Message-ID: <CAOJ+Ob0MSOez2MvD2fCgF7t32kFPk4g2+xCud88QmBLt_b5pOA@mail.gmail.com>
Subject: Re: [RFC PATCH 00/14] modular mail stores based on URIs
From: Ethan <ethan.glasser.camp@gmail.com>
To: Mark Walters <markwalters1009@gmail.com>
Content-Type: multipart/alternative; boundary=bcaec5040a4ca93e8104c3c6cd04
Cc: notmuch@notmuchmail.org
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Jul 2012 16:02:12 -0000

--bcaec5040a4ca93e8104c3c6cd04
Content-Type: text/plain; charset=ISO-8859-1

Thanks for going through it; I know there's a lot to go through.

On Thu, Jun 28, 2012 at 4:45 PM, Mark Walters <markwalters1009@gmail.com> wrote:

> I was thinking of just having one mail root and inside that there could
> be maildirs and mboxes. Everything would still be relative to the root.
>

I'm hesitant to have directories that contain maildirs and mboxes. It
should be possible to unambiguously distinguish between a maildir file and
an mbox file (mboxes always start with "From ", no colon) but it sounds
kind of fragile.
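
To make that check concrete, here is a minimal sketch in Python (not notmuch
code; `looks_like_mbox` and `write_sample` are hypothetical names made up for
this example). It only tests for the leading "From " and nothing else, which
is the fragile part: any other file that happened to begin with those five
bytes would be classified as an mbox too.

```python
import os
import tempfile


def looks_like_mbox(path):
    """Guess whether `path` is an mbox by looking for a leading "From " line.

    An mbox starts with "From " (no colon); an ordinary header line such as
    "From: someone" does not match because of the colon.
    """
    with open(path, "rb") as f:
        return f.read(5) == b"From "


def write_sample(data):
    """Demo helper: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


mbox_path = write_sample(b"From mboxrd@z Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n")
single_path = write_sample(b"Return-Path: <someone@example.com>\nSubject: hi\n\nbody\n")

print(looks_like_mbox(mbox_path))    # True
print(looks_like_mbox(single_path))  # False

os.remove(mbox_path)
os.remove(single_path)
```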

> >  1. Are URIs the way to specify individual messages, despite bremner's
> >  concerns about too much of the API being strings? Is adding another
> >  library is the easiest way to parse URIs?
>
> In my opinion the nice thing about using strings is that it does not require
> any changes to the Xapian database to store them. I think using URIs may
> not be best though as they seem to be annoying to parse (as filenames
> can contain the same characters) and you seem to need to work around the
> parser in some cases.
>

I think that's more the fault of the parser than of the URIs. If glib came
with a parser, that would be great. There aren't a lot of options for
pure-C URI parsing. Besides uriparser, there's also some code in the W3C
sample code library, but it looked like integrating it would be a pain so I
let it go.

> I wonder if the following would be practical: use // as the field
> separator:
>
> e.g. mbox://filename//start_of_message+length
>
> I think 2 consecutive slashes // is about the only thing we can assume
> is not in the path or filename. Since it is not in the filename I think
> parsing should be trivial (thus avoiding the extra library).
>

Can you explain what you mean when you say that two consecutive slashes
can't appear in a path or filename? Ordinary filesystem paths can contain
them, and so can file: URLs. (I just opened file:///home/ethan///////tmp in
Firefox and it handled that OK.) I've sometimes seen machine-generated
filenames with double slashes, because that way you don't have to check
whether the incoming filename already ends in a slash before adding another
level.
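
To illustrate the concern, here is a rough Python sketch of parsing the
proposed mbox://filename//start_of_message+length form. This is not the
parser from the patch series; `parse_mbox_ref` is a hypothetical name, and
the sketch assumes the trailing start+length field never contains a slash.
Under that assumption, splitting on the last "//" instead of the first one
still works when the filename itself contains doubled slashes.

```python
def parse_mbox_ref(ref):
    """Parse 'mbox://<path>//<start>+<length>' into (path, start, length).

    Splits on the LAST '//' so that doubled slashes inside <path> survive.
    """
    prefix = "mbox://"
    if not ref.startswith(prefix):
        raise ValueError("not an mbox reference: %r" % ref)
    path, sep, span = ref[len(prefix):].rpartition("//")
    if not sep:
        raise ValueError("missing '//' field separator: %r" % ref)
    start, plus, length = span.partition("+")
    if not plus:
        raise ValueError("missing '+' between start and length: %r" % ref)
    return path, int(start), int(length)


# A path with no doubled slashes.
print(parse_mbox_ref("mbox:///home/ethan/mail/inbox//1024+512"))
# ('/home/ethan/mail/inbox', 1024, 512)

# A path that itself contains '//': splitting on the first '//' would cut
# the path at '/home/ethan', but splitting on the last one keeps it whole.
print(parse_mbox_ref("mbox:///home/ethan//tmp/inbox//0+100"))
# ('/home/ethan//tmp/inbox', 0, 100)
```

So the format can be parsed without an extra library, but only by relying on
where the separator sits, not on "//" being absent from the filename.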


> Secondly, I would prefer to keep maildirs as just the bare file name: so
> the existence of // can be the signal that there is some other
> scheme. This is asymmetric, but is rather more backwardly compatible.
>

Based on your and Jani's reasoning, I did this. Revised patch series
follows.
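
As a rough illustration of the asymmetric idea (not the code in the revised
series; `classify` and the regular expression are assumptions made for this
sketch), a stored filename could be dispatched like this in Python. The
sketch looks for a leading "scheme://" instead of any "//", since bare paths
can themselves contain doubled slashes.

```python
import re

# Hypothetical pattern: a URI-style scheme name followed by "://".
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")


def classify(stored):
    """Return (kind, rest) for a stored filename string.

    Strings that start with 'scheme://' are treated as belonging to that
    scheme; everything else is treated as a bare maildir file name.
    """
    m = _SCHEME_RE.match(stored)
    if m:
        return m.group(1).lower(), stored[m.end():]
    return "maildir", stored


print(classify("mbox:///var/mail/ethan//0+100"))
# ('mbox', '/var/mail/ethan//0+100')
print(classify("/home/ethan//mail/cur/1341158528.M1P2.host:2,S"))
# ('maildir', '/home/ethan//mail/cur/1341158528.M1P2.host:2,S')
```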

Ethan

--bcaec5040a4ca93e8104c3c6cd04--