From: David Mazieres
To: Amadeusz Żołnowski, notmuch@notmuchmail.org
Subject: Re: muchsync files renames
In-Reply-To: <878u93ujdo.fsf@freja.aidecoe.name>
References: <878u93ujdo.fsf@freja.aidecoe.name>
Date: Sat, 22 Aug 2015 22:41:59 -0700
Message-ID: <876146o920.fsf@ta.scs.stanford.edu>
List-Id: "Use and development of the notmuch mail system."

Amadeusz Żołnowski writes:

> Hi,
>
> I am testing muchsync-2 and it looks to me that files names across
> machines are different. Moreover when syncing again after
> initialization it seems muchsync is working on something. I have
> canceled this and rerun muchsync. notmuch reported lots of files
> renames on server. What and why it happens?

What muchsync specifically synchronizes for messages is the mapping:

    (directory, SHA-1-hash, link-count)

So if a directory contains two copies of a file on one machine, it will
end up with two copies on the other machine.  However, the file names
themselves are not the same, but rather are created in accordance with
the maildir spec.  (Note SHA-1 wouldn't be my first choice of hash
function, but notmuch already uses this for messages with long message
IDs, so I figured I'd just be consistent with existing practice.)

In terms of what muchsync is working on, you can run it with "-vvvv" on
both sides to get an idea, as in "muchsync -vvvv server -vvvv".  Better
yet, you can just run it on one side with "muchsync -vvvv".  You'll get
a lot of output, so maybe run it inside the script command to save the
output.

If you have enabled maildir.synchronize_flags, it could be that notmuch
is initially renaming all of your files, in which case muchsync needs to
re-hash them to make sure they haven't changed.

How did you cancel muchsync?  If you send it a single SIGINT or SIGTERM,
it attempts to clean up after itself.  However, upon multiple signals or
other signals, it immediately exits.

Muchsync is conservative about updating the database, to avoid missing
tags or files that have been changed.  It always updates the notmuch
database first, then its own sqlite database with a version number.
That means if you kill muchsync, some number of files may get picked up
as changed again even though really they were just copied from a peer.
To mitigate this problem, the muchsync client syncs the database every
10 seconds, so that in theory you should only get 10 seconds of extra
work from killing the client.  However, the server does not sync
periodically, on the assumption that it is more likely to read an EOF
than get killed.  Currently, though, it doesn't appear to commit any
pending transactions to the sqlite database upon EOF, which may be an
oversight.

So to summarize:

  * File names are not the same across machines, only file contents and
    directory structure.

  * Give muchsync lots of "-v" options to see what it is doing.

  * Try to avoid killing muchsync.  Doing so is safe, but likely to
    generate extra work in the form of phantom renames or tag changes
    that get synchronized even though they don't need to be.

  * Possibly the server should handle EOF more gracefully and commit any
    pending transactions, or the client should periodically send a
    commit command to the server.

If you think something is wrong, I can help you figure it out, but I
need to know what maildir.synchronize_flags is set to on each replica,
what you mean by "canceled", and roughly what was happening when you
canceled (uploading or downloading).

David
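
To illustrate the (directory, SHA-1-hash, link-count) mapping described
above, here is a minimal Python sketch.  It is not muchsync's actual
code.  The function name maildir_mapping is hypothetical, and treating
link-count as "number of files in a directory with identical contents"
is an assumption drawn from the "two copies" remark.

```python
import hashlib
import os
from collections import Counter


def maildir_mapping(root):
    """Return a Counter mapping (directory, sha1_hex) -> copy count.

    Hypothetical helper for illustration only; file names are
    deliberately left out of the key, so two trees whose files have
    different names but the same contents produce the same mapping.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        # Directory is recorded relative to the mail root so that
        # replicas rooted at different paths can be compared.
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            counts[(rel, digest)] += 1
    return counts
```

Under these assumptions, two replicas would count as in sync when
maildir_mapping() returns equal results for both mail roots, even
though the file names inside each directory differ.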