From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id A45C1431FBC for ; Fri, 8 Jan 2010 21:51:26 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DbYt+p2Rk+Lz for ; Fri, 8 Jan 2010 21:51:26 -0800 (PST)
Received: from vuizook.err.no (vuizook.err.no [85.19.221.46]) by olra.theworths.org (Postfix) with ESMTP id A0F88431FAE for ; Fri, 8 Jan 2010 21:51:25 -0800 (PST)
Received: from cha92-13-88-165-248-19.fbx.proxad.net ([88.165.248.19] helo=jigen) by vuizook.err.no with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1NTUEL-0001y1-94 for notmuch@notmuchmail.org; Sat, 09 Jan 2010 06:51:24 +0100
Received: from mh by jigen with local (Exim 4.71) (envelope-from ) id 1NTUEK-0000x0-7b for notmuch@notmuchmail.org; Sat, 09 Jan 2010 06:51:20 +0100
Date: Sat, 9 Jan 2010 06:51:20 +0100
From: Mike Hommey
To: notmuch@notmuchmail.org
Message-ID: <20100109055120.GA3109@glandium.org>
References: <874oo7hex2.fsf@yoom.home.cworth.org> <87y6lewqtw.fsf@convex-new.cs.unb.ca> <87638i75sz.fsf@home.veldthuis.com> <1260227209-sup-184@riseup.net> <874oo22blf.fsf@yoom.home.cworth.org> <20100108025620.GB28357@lapse.rw.madduck.net> <20100108080636.GA26839@glandium.org> <20100108090317.GB735@lapse.rw.madduck.net> <20100108092019.GA6671@glandium.org> <20100108102631.GB11257@lapse.rw.madduck.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100108102631.GB11257@lapse.rw.madduck.net>
X-GPG-Fingerprint: A479 A824 265C B2A5 FC54 8D1E DE4B DA2C 54FD 2A58
User-Agent: Mutt/1.5.20 (2009-06-14)
Subject: Re: Quick thoughts on a notmuch daemon
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
X-List-Received-Date: Sat, 09 Jan 2010 05:51:26 -0000

On Fri, Jan 08, 2010 at 11:26:31PM +1300, martin f krafft wrote:
> also sprach Mike Hommey [2010.01.08.2220 +1300]:
> > FYI, I have a good experience writing fuse filesystems, both with
> > high-level and low-level APIs. I'd advise to use the low-level API,
> > which allows for better performance.
>
> I don't have any experience with FUSE yet, but the examples in
> /usr/share/doc/libfuse-dev/examples/ look trivial. This is where
> I would start, one function at a time. If you have a better
> suggestion, I'd love to hear it; or to clone your repo! ;)

As I said above, there are two sets of APIs in FUSE.

The high-level API passes the full path of the file being accessed for
every system call. Except for specific cases such as read(), write() or
readdir(), you have nothing else to identify the file you are referring
to, which means you have to parse the path and find the proper file
accordingly. In notmuch's case, that would mean doing a search for most
system calls. Try to imagine how many syscalls other than read(),
write() or readdir() mutt makes when opening a Maildir.

The low-level API, otoh, uses inode numbers extensively (again, except
for read, write and readdir). The lookup call is responsible for
resolving paths, one component at a time, given a parent inode and a
name. Its results are cached by the kernel. So, for example, reading
foo/bar from your fuse mount point will look up foo in inode 1
(FUSE_ROOT_ID) and then do another lookup for bar in the first result.

One of the problems with this API is that the inode number type is
unsigned long (only 32 bits on 32-bit platforms), which means you can't
necessarily map real inode numbers, which can be 64 bits. And even if
you could, afaik there is sadly no quick way to get a file from its
inode number.
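
To make the per-component lookup concrete, here is a minimal sketch in
Python. It is not FUSE code and calls no FUSE API; the InodeTable class,
its methods, and the example search string and file path are all made up
for illustration. Only the root inode number 1 (FUSE_ROOT_ID) comes from
FUSE itself.

```python
FUSE_ROOT_ID = 1  # inode number of the mount point root, as in FUSE


class InodeTable:
    """Toy table mapping (parent inode, name) pairs to inode numbers.

    Hypothetical helper for illustration only; a real low-level FUSE
    filesystem would answer lookup requests from a table like this.
    """

    def __init__(self):
        self._next_ino = FUSE_ROOT_ID + 1
        self._children = {}                 # (parent_ino, name) -> ino
        self._target = {FUSE_ROOT_ID: "/"}  # ino -> what the inode stands for

    def add(self, parent_ino, name, target):
        """Register a new entry under parent_ino and return its inode."""
        ino = self._next_ino
        self._next_ino += 1
        self._children[(parent_ino, name)] = ino
        self._target[ino] = target
        return ino

    def lookup(self, parent_ino, name):
        """Resolve one path component; return None if it does not exist."""
        return self._children.get((parent_ino, name))

    def target(self, ino):
        """Return what an inode stands for (a search or a real file)."""
        return self._target[ino]

    def resolve(self, path):
        """Walk a path one component at a time, starting from the root."""
        ino = FUSE_ROOT_ID
        for name in path.strip("/").split("/"):
            if not name:
                continue
            ino = self.lookup(ino, name)
            if ino is None:
                return None
        return ino


table = InodeTable()
# A virtual directory backed by a search (made-up query string).
foo = table.add(FUSE_ROOT_ID, "foo", "search: tag:inbox")
# A real file exposed inside it (made-up path).
bar = table.add(foo, "bar", "/home/user/Maildir/cur/1234.msg")

# Reading foo/bar means: look up "foo" in inode 1, then "bar" in the result.
assert table.resolve("foo/bar") == bar
```

The table only goes from names to inodes and from inodes to targets;
going from a real on-disk inode number back to a file is exactly the
part that has no quick answer.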

All in all, in the high-level API case, that means we would badly need
lookup caching, and in the low-level API case, some fast way to map
virtual directories to inode numbers on one hand, and real files to
inode numbers on the other.

Some quick thoughts about the whole thing:
- We will need to be careful about deduplication: if you copy a file
  from one directory to another, you don't want to have the copy in the
  underlying Maildir. But as you won't know until the file is totally
  written and closed...
- We should probably allow extra files to be stored in the virtual
  Maildir (for example, courierimap stores stuff in a Maildir).
- We may not need a client program at all: the "search directories"
  configuration could be handled via extended file attributes.

I also had another, not quite unrelated, idea a while ago that could
have its value here: a generic data store, very much like the git
object database (one idea would be to have the git object datastore be
a special case of this generic data store, for possibly interesting
compatibility), which would allow for better storage of the messages:
if the maildir is exposed via fuse, what would you need a raw maildir
for? It would also allow easier deduplication of messages that are
different but not quite:
- Mailing list replies you get both directly and from the mailing list
  software: their headers differ, but the files are mostly equivalent.
- Mail quotes, which are found in both the original message and its
  response.

Mike
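
As an illustration of the "direct copy vs. list copy" case above, here
is a rough Python sketch of a content-addressed store keyed on the
message body only. Everything in it is an assumption made for the
example: the body_key() and BodyStore names, the choice of SHA-1 (picked
only because git uses it), and the restriction to single-part text
messages. It is not how notmuch or git store anything.

```python
import hashlib
from email import message_from_string


def body_key(raw_message):
    """Return a SHA-1 hex digest of the message body, ignoring headers.

    Assumes a single-part text message; multipart is out of scope here.
    """
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        raise ValueError("multipart messages are not handled in this sketch")
    return hashlib.sha1(msg.get_payload().encode("utf-8")).hexdigest()


class BodyStore:
    """Toy object store: each distinct body is kept exactly once."""

    def __init__(self):
        self.objects = {}  # key -> body text

    def put(self, raw_message):
        """Store the body if it is new and return its key."""
        key = body_key(raw_message)
        if key not in self.objects:
            self.objects[key] = message_from_string(raw_message).get_payload()
        return key


# The same reply, received directly and through the list (made-up data).
direct = (
    "From: a@example.org\n"
    "To: me@example.org\n"
    "Subject: hi\n"
    "\n"
    "Hello there\n"
)
via_list = (
    "From: a@example.org\n"
    "To: list@example.org\n"
    "List-Id: <list.example.org>\n"
    "Subject: hi\n"
    "\n"
    "Hello there\n"
)

store = BodyStore()
assert store.put(direct) == store.put(via_list)
assert len(store.objects) == 1
```

An exact-match key like this breaks as soon as the list software appends
a footer to the body, and it does nothing for quoted text shared between
a message and its reply; both would need some finer-grained chunking.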