From: "Eric S. Raymond"
Newsgroups: gmane.emacs.devel
Subject: Re: resolving ambiguity in action stamps
Date: Sun, 14 Sep 2014 13:12:11 -0400
Organization: Eric Conspiracy Secret Labs
Message-ID: <20140914171211.GA2521@thyrsus.com>
In-Reply-To: <878ulm434v.fsf@uwakimon.sk.tsukuba.ac.jp>
Reply-To: esr@thyrsus.com
To: "Stephen J. Turnbull"
Cc: Harald Hanche-Olsen, emacs-devel@gnu.org

Stephen J. Turnbull:
> Eric S. Raymond writes:
>
> > It is still a technical fact that no git translation containing SHA1s
> > can be built without passing through a VCS-independent representation
> > of commit refs on the way.
>
> Fact!?  I would use the bzr revid, and insert the revid, SHA1 pair
> after I commit each new revision in git on Pass 1.  What am I missing?

For one thing, variant forms of commit reference.  Somewhere in there
we'll need the equivalent of a canonicalization pass for the references.

If you go the database-of-pairs route, what you're actually doing is
temporarily creating a VCS-independent reference ID that mimics a bzr
reference number.  A subtle point, I know - but in principle there's no
actual win in the database-of-pairs that you wouldn't also get from
unique inline reference cookies generated in an intermediate pass.

In practice, the way my toolkit works, I basically have to have
something like a revision-stamp inline in intermediate versions (that
is, the database-of-pairs approach is out) even if it's massaged into a
SHA1 in the final version.  This is because my tools are an ecology of
import-stream processors built on the assumption that the stream
captures all relevant metadata.

Your instinct may be to come back that this approach is too limiting,
but there are very good reasons for it (beginning with the cross-VCS
portability of the stream files) and 22KLOC of algorithmically dense
tool code built around those reasons.  If you want a high-quality
conversion in reasonable time rather than an open-ended R&D project,
your odds of doing better are effectively nil.

> Speaking of databases: since AFAIK you're basically done creating git
> blobs and trees (ie, except for new commits to the public repo), I
> assume you are using a pre-primed object db when you run your
> conversion?  If not, you should get a 20% speed up or so.  You might
> be able to get a lot more speed up if you could just work with bzr log
> and git filter-branch.
> (That's a pretty crazy idea and quite possibly
> not at all worth the work even if possible.  But let me throw it out
> there....)

It's not crazy, but it is too much work.  I'd effectively have to throw
away the rest of my tools.

> > > Actually, I disagree.  It would be a really good thing if they
> > > are precise.  Do you really want to put anybody through the
> > > trouble of translating randomized format cookies, which may point
> > > to any of several commits, again?  Then revising their scripts
> > > every time a new variant shows up?
> >
> > It has yet to be demonstrated that this is a problem in a real use
> > case.  And, actually, I already checked this; the Emacs history
> > doesn't have any version-stamp collisions in actually referenced
> > revisions.
>
> That's not what I'm talking about.  I'm talking about
> 2014/09/15!esr@thyrsus.com vs. 2014-09-15/esr@thyrsus.com vs.
> 9/15/2014!esr vs. ....  People *will* handwrite those references,
> precisely because they're more or less human-readable.

Engineering is tradeoffs.  Readability (which is a good thing) comes
with this price.

> > Existence proof comes before characterization, please.
>
> Ie, I suppose you don't get any collisions in referenced revisions.
> But we know that there could be.  Maybe "almost correct" is good
> enough for you, but I think Emacs deserves better from its VCS.  Worse
> is not better when best already exists.

Engineering is tradeoffs.  "Best" by what metric?  Readability and
portability are not trivial features.

One significant disadvantage of building in SHA1s that I haven't
mentioned yet is that they make references brittle.  Editing metadata
invalidates all hashes downstream of it.  Yes, this is a real problem
which I have experienced before in big messy conversions like this one!

So, we put up a brand shiny new repo - and a few days (or weeks, or
months) later someone spots a conversion bug that has to be fixed.
It might be easy for you to say "oh, we just regenerate all the commit
references, then".  Actually doing that is a nasty, picky job even with
best-in-class tools like mine, especially on a repo this size.

I'm not sure anyone on this list but me properly groks the complexity
scale of this conversion when they talk so casually about changing how
it's done.  To get some idea, fetch

https://gitorious.org/emacs-transition/emacs-transition/raw/ca127b4e1a70cd17f2979b330b3f9dcedaf5bbd8:emacs.lift

and skim all 1018 lines of it - which doesn't count 2.5 Klines of
program-generated stuff included.  When I said this was the biggest,
nastiest conversion I've ever done, I wasn't kidding.  Nothing else has
even come close.
-- 
Eric S. Raymond
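
[Archive note] The brittleness argument in the message above can be illustrated with a small sketch. This is not the author's tooling and not how git actually serializes commits: `commit_id` is a hypothetical stand-in that hashes the parent id together with the metadata, and `action_stamp` assumes a timestamp!email shape like the examples quoted in the thread. The addresses and dates are made up. Under those assumptions, fixing the metadata of one early commit changes every hash after it, while the stamp-style references stay the same.

```python
import hashlib


def commit_id(parent_id, author, date, message):
    """Toy stand-in for a commit hash: SHA1 over parent id plus metadata."""
    payload = "\n".join([parent_id or "", author, date, message])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def build_chain(commits):
    """Return the ids of a linear history, oldest first."""
    ids = []
    parent = None
    for author, date, message in commits:
        parent = commit_id(parent, author, date, message)
        ids.append(parent)
    return ids


def action_stamp(author, date):
    """Hash-independent reference in an assumed timestamp!email shape."""
    return f"{date}!{author}"


# Hypothetical three-commit history (made-up data).
history = [
    ("alice@example.com", "2014-09-01T10:00:00Z", "Initial import"),
    ("bob@example.com", "2014-09-02T11:30:00Z", "Fix typo"),
    ("alice@example.com", "2014-09-03T09:15:00Z", "Add feature"),
]
before = build_chain(history)

# Simulate a late conversion fix: only the first commit's message changes.
fixed = [(history[0][0], history[0][1], "Initial import (fixed)")] + history[1:]
after = build_chain(fixed)

# Indices of commits whose hash changed after the fix.
changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]

stamps_before = [action_stamp(a, d) for a, d, _ in history]
stamps_after = [action_stamp(a, d) for a, d, _ in fixed]

print(changed)                         # [0, 1, 2]
print(stamps_before == stamps_after)   # True
```

In this toy model any reference embedded as a hash would have to be rewritten for all three commits after the fix, whereas references written as stamps survive untouched. That is the tradeoff the message argues for, at the cost of the ambiguity in handwritten variants raised in the quoted text.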