From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Goals for repo conversion day
Date: Sun, 26 Jan 2014 19:32:52 +0200
Message-ID: <83zjmiabsr.fsf@gnu.org>
References: <20140124170751.GA23376@thyrsus.com> <87mwils3b3.fsf@igel.home>
 <20140124185429.GA25191@thyrsus.com> <83k3dpcbpe.fsf@gnu.org>
 <20140125062551.GA2554@thyrsus.com> <83bnz0cxp8.fsf@gnu.org>
 <20140125140637.GA5631@thyrsus.com> <83vbx8azss.fsf@gnu.org>
 <20140125160124.GA8171@thyrsus.com> <83ppngasor.fsf@gnu.org>
 <20140125210132.GB13305@thyrsus.com>
In-reply-to: <20140125210132.GB13305@thyrsus.com>
Reply-To: Eli Zaretskii
To: esr@thyrsus.com
Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:169131

> Date: Sat, 25 Jan 2014 16:01:32 -0500
> From: "Eric S. Raymond"
> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
>
> But as the size and complexity of the repo goes up, so does the value
> of in-band references actually working. Emacs is an exceptionally *bad*
> case for relying solely on an external reference map, not an exceptionally
> good one.

There's no argument about the higher value of having all the
references resolved. What I fear is the inordinate amount of work
that might require, for too little benefit, and the unintended
consequences of a too-deep surgery on the history that will be needed.
Already the effort to get the list of references right devoured many
messages (and I'm sure each message caused a non-trivial amount of
work), and we are still not there (see below). In particular, it
worries me that you seem to be unable to extract the full list of bzr
revno references, after so many attempts.
Why this doesn't worry you, and why you still refuse to accept that
maybe, just maybe, this is a lot of effort for a relatively small
gain, is beyond me. If this is in any way indicative of the other
problematic issues of the conversion, then "Houston, we have a
problem", indeed.

> > I'd appreciate if you posted the final list of the references, when
> > you are finished with it, so we could have some QA.
>
> Here is the current list. It is not final because I expect to resolve
> at least a few more of these, and it is still possible more fossil
> references could turn up in odd places.

> ChangeLog:
> [...]

I found that at least these ones are missing:

  lisp/ChangeLog.15 references 103083
  lisp/ChangeLog.16 references 103471 and 107149
  src/ChangeLog.12 references 104015 and 103913

> Change comments:
> [...]

This list of 40 references in the commit messages to bzr revisions is
definitely incomplete. It misses many references (I counted more than
300 overall, including those you show). Here are just a few that you
missed, and only from the trunk branch:

  r116131 references 116113
  r116056 references 116055
  r115997 references 115992
  r115964 references 115961
  r115920 references 115918
  r115859 references 115838
  r115029 references 112851
  r114978 references 114965
  r114798 references 114795
  r112011 references 112010
  r106733.1.27 references 111919
  r110764.1.510 references 111040
  r110764.1.338 references 111367 and 111368
  r110879 references 110857 (from emacs-24 branch)
  r110306 references 110305
  r99375 references 99362

It sounds like the scripts or methods you are using to find such
references are not catching some of them. E.g., bare numbers, without
any leading "r" or "revno:" etc. are mostly (or maybe completely)
missing.

Given this quality, I once again question the need for all this work.
If we cannot guarantee coverage very close to 100%, what would be the
value of such a partial conversion? More importantly, do we have
reasonably effective methods of QA for the results?
I discovered these omissions with simple bzr commands followed by
manual inspection (to weed out quite a few false positives); unless we
can come up with better ways that don't involve manual labor, the
overall quality will not be high enough, as manual labor is inherently
error prone.

Btw, what about references to repositories of other projects? Here's
one example (from trunk):

  revno: 110764.1.388
  committer: Bastien Guerry
  branch nick: emacs-24
  timestamp: Tue 2013-01-08 19:49:37 +0100
  message:
    Merge Org up to commit 4cac75153. Some ChangeLog formatting fixes.

Are we going to replace the git sha1 here by something more universal?
If so, there's much more work around the corner; if not, why does it
make sense to insist on doing that for Emacs's own branches?

> Some of the remaining CVS references cannot be resolved within the Emacs
> history; they actually point to other projects. One particularly fertile
> source of these, which I think accounts for this group
>
> 1.85
> 1.878
> 1.113
> 1.244
> 1.34
> 1.233
>
> in ChangeLogs, is the CVS history of the erc files before they were merged
> into Emacs.

See above: this is just the tip of the iceberg. I think you will find
many more such references, with Org, CEDET, MH-E, and Gnus being the
most frequent sources. Doesn't leaving those out of this conversion
undermine the goal?

> > The problem is not the size of the repository alone. The problem is
> > that different portions of a single changeset were committed many
> > revisions apart. And I don't even understand (and you didn't explain)
> > how will you handle the situation I described above, where a single
> > commit checked in ChangeLog changes for several unrelated commits in
> > the same directory. Which commit clique will you assign the ChangeLog
> > commit to? The devil is in the details, but you haven't provided any
> > details about your plans in this matter. Would you please do that?
>
> I see we are using the term "changeset" slightly differently, and this has
> produced some confusion.
>
> The uncoalesced changesets I am looking for are not defined by "all
> share the same ChangeLog entry" (though usually that is the case).
> You are quite right that attempting to coalesce all of those would
> produce perverse results in cases of several unrelated commits.
>
> Fortunately, most of the unresolved cliques are not like this. The
> usual case, in this conversion as in others I've seen (such as groff)
> is that an unresolved clique consists of one or several closely
> related changes and one ChangeLog modification, without intervening
> commits by others. This is what I think of as a changeset.

I thought a "changeset" was well defined in the context of a VCS. My
definition is a set of changes made as part of working on a single
isolated issue. IOW, what would have constituted a single indivisible
commit with our current procedures. Your definition sounds subtly
different, and you didn't define "closely related changes", so it's
hard to judge its exact meaning. As for "one ChangeLog modification"
and "without intervening commits", see below.

> Normally tools such as parsecvs collect these into single changesets.
> But these converters have a maximum coalescence window. If such a span
> of commits took place over a longer period of time than the window, it
> won't be coalesced.

Judging by a cursory look at the current git mirror, no coalescing was
done there. But perhaps I'm missing something; Andreas, can you
please comment on this?

> When there is CVS in the history, a standard part of my cleanup is
> basically to run a coalescence pass with a very long window.
> Semi-automating this operation, so it (a) doesn't have to be done
> manually, but (b) is easily checked by skilled human judgment, was
> one of the purposes for which I originally wrote reposurgeon.
>
> Fortunately the bad cases aren't actually very common.
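
For concreteness, here is a minimal sketch of what a coalescence pass
with a time window could look like. It assumes a clique is a run of
consecutive commits by one author, each no more than `window` seconds
after the previous one, and that a commit by anyone else closes the
clique. The `Commit` type, the `coalesce` function and the sample data
are hypothetical; this is not the algorithm of parsecvs or reposurgeon.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    revno: int       # hypothetical revision number
    author: str
    timestamp: int   # seconds since some epoch
    message: str


def coalesce(commits: List[Commit], window: int) -> List[List[Commit]]:
    """Group commits into cliques.

    A commit joins the current clique only if it has the same author as
    the clique's last commit and follows it within `window` seconds.
    Any commit by another author starts a new clique.
    """
    cliques: List[List[Commit]] = []
    for commit in sorted(commits, key=lambda c: c.timestamp):
        last = cliques[-1][-1] if cliques else None
        if (last is not None
                and last.author == commit.author
                and commit.timestamp - last.timestamp <= window):
            cliques[-1].append(commit)
        else:
            cliques.append([commit])
    return cliques


# Invented sample history: alice's work is interrupted by bob, and her
# last ChangeLog commit arrives long after the others.
sample = [
    Commit(1, "alice", 0, "fix foo"),
    Commit(2, "alice", 100, "*** empty log message ***"),
    Commit(3, "bob", 150, "fix bar"),
    Commit(4, "alice", 200, "more foo"),
    Commit(5, "alice", 10000, "*** empty log message ***"),
]

short = [[c.revno for c in clique] for clique in coalesce(sample, 300)]
long_ = [[c.revno for c in clique] for clique in coalesce(sample, 20000)]
print(short)
print(long_)
```

With a 300-second window the sample yields the cliques [1, 2], [3],
[4] and [5]; with a 20000-second window commits 4 and 5 merge, but
bob's intervening commit still keeps 1-2 and 4-5 apart. A rule this
simple cannot by itself decide which clique a shared ChangeLog commit
belongs to.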

Can we take a real-life use case, please? Please show the cliques
produced by your analysis in this range of bzr revisions on the trunk:
39997..40058. You can see the details with these bzr commands:

 . This will show a 1-line summary for every revision in the range:

     bzr log --line -r39997..40058

 . This will show the full commit messages and other meta-data of a
   single revision, 40000 in the example (can also be used with a
   range -rNNN..MMM):

     bzr log --long --show-ids -c40000

 . This will show the files modified/added/deleted by a single
   revision (can also be used with a range -rNNN..MMM):

     bzr status -c40000

The above range of revisions shows a typical routine of commits when
Emacs was using CVS; in particular, "*** empty log message ***" are
most probably ChangeLog commits which usually followed commits of the
files whose log entries are in the ChangeLog change. Note that the
commit messages are almost always different (they are actually the
ChangeLog entries for the files being committed), although the changes
belong to the same changeset. Also note how commits by different
people working on separate changesets sometimes overlap, as in
revisions 40033..40038. How will these be handled during your
proposed conversion? And what will be the commit messages of the
coalesced commits?

> > > > > 5. Unconverted .bzrignores (and possibly .cvsignores) in the history.
> > > >
> > > > Why is that a problem?
> > >
> > > See "seamless history browsing".
> >
> > Sorry, I don't understand. Please elaborate: what is the relation
> > between these ignore files and history browsing?
>
> In a properly done conversion, file ignores don't abruptly stop working
> because you browsed back past the point of conversion and what should
> be .gitignore files are now .bzrignores or .cvsignores.

So you will be adding .gitignore to revisions where there was none?
If not, how do you plan on attacking this issue?

> > > The way this is working is that I am building a reposurgeon script that
> > > expresses a sequence of edits to Andreas's mirror. On conversion day
> > > we will apply that script once, after which everyone can re-clone and
> > > go on as before.
> >
> > Sorry, I don't see how this changes anything. You are still going to
> > make deep changes to the existing mirror.
>
> Yes, for arguable values of "deep". As Paul Eggert (I think) said, I'm
> after a result that is stainless steel rather than earthenware. With
> ugly cracks in it.

I have my doubts about the "stainless steel" part, sorry.
Unfortunately, nothing you've said so far contributes to my confidence
in the outcome. And if the outcome will be more like "earthenware"
than "stainless steel", then we might as well continue using what we
have now in the existing mirror.

> > > > Noble goals all of them, but I'm skeptical as to whether they can be
> > > > achieved in practice. What's worse, we won't know whether some issues
> > > > remained until much later.
> > >
> > > I know they can be achieved in practice because I have achieved them before,
> > > many times. Most recently in the conversion of the groff history, but
> > > you could check with the maintainers of NUT or Hercules or robotfindskitten
> > > or Roundup as well. Or the Blender Foundation - blender is a big reposurgeon
> > > conversion done by someone else.
> >
> > Sorry, been there done that. The CVS to bzr conversion also seemed
> > flawless until much later.
>
> There are several differences this time. One of the most important is that
> the state of the art has advanced. My tools do things that would have been
> impossible or impractical before they existed. I have auditing capabilities
> you would probably have to work a bit to even imagine.

The important question here is "is your best good enough?" I have
absolutely no idea what the answer to that question is, and frankly,
your way of promoting your tools and techniques doesn't help at all.
Neither do the apparent deficiencies in identifying revision
references shown above.

If you really want to build confidence in your methods and tools, some
kind of statistics about the conversion jobs done using them, and the
time passed since each conversion, would probably be a good start.
(Yes, time since conversion is important because the problems are
usually subtle and don't stick out until much later.) A detailed
description of the planned steps during the conversion, and of how you
intend to control the quality of each step, would also be appreciated.

> As a relatively trivial example - if Stefan or some other person with
> policy authority makes the call, I could reliably split elpa out into
> its own repo with one short command in the reposurgeon DSL.

This is great, but doesn't really address the worrisome aspects of the
conversion we care about. We no longer care about the elpa branch in
the bzr repository. We do care about the few other branches, such as
emacs-24. And it is not even clear what will become of those after
the conversion; the reposurgeon man page cites a limitation related to
that, allegedly stemming from some (imaginary) bzr confusion between
branches and repositories, but ends up saying nothing about the
branches after the conversion. Will they end up in a single git
repository, like any other git branches, or won't they? Will the
merges between those branches show up as expected in the git DAG? How
will merges from external branches (such as Org or MH-E) or from local
feature branches be represented? Those are much more important issues
than the ability to split elpa.

> > > If we find any problems afterwards, I have the tools to fix them. Part of
> > > my commitment is to do that.
> >
> > I don't think any of us can in good faith give such promises.
>
> The span of my contributions to Emacs is measured in decades. I do not
> think you need to fear that I will vanish before this job is done.

I was talking about the "problems afterwards" part.
I don't question your intentions, but life is not an entirely
predictable endeavor. Perhaps you have a way to tell the future, but
in that case, I may wish to hire you to help me with my stock
investments.
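
P.S. To illustrate the point made earlier about bare numbers, here is
a minimal sketch of a scan for candidate revno references in commit
messages. It is only an assumption about how such a scan might be
done, not the method used by either party in this thread. The names
`REVNO_RE` and `find_candidates`, the 5-to-6-digit heuristic and the
sample messages are all hypothetical, and the hits still need manual
inspection because false positives remain possible.

```python
import re
from typing import List

# Candidate reference: an optional "r" or "revno:" prefix, then a 5- or
# 6-digit number, optionally followed by a dotted merge suffix such as
# ".1.338".  The lookbehind rejects digits glued to a word, a dot or a
# "#", which skips git hashes like 4cac75153 and bug numbers like
# Bug#13400.
REVNO_RE = re.compile(
    r"(?<![\w.#])(?:r|revno:?\s*)?(\d{5,6}(?:\.\d+\.\d+)?)(?!\w)"
)


def find_candidates(message: str, max_revno: int) -> List[str]:
    """Return candidate revnos found in `message`.

    A candidate is kept only if its leading number does not exceed
    `max_revno`, the highest revision number assumed to exist.
    """
    found = []
    for match in REVNO_RE.finditer(message):
        revno = match.group(1)
        if int(revno.split(".")[0]) <= max_revno:
            found.append(revno)
    return found


# Invented sample messages, for illustration only.
hits_mixed = find_candidates("Fix last change (see r116113 and 116055).", 117000)
hits_hash = find_candidates("Merge Org up to commit 4cac75153.", 117000)
hits_dotted = find_candidates("Backport of revno: 110764.1.338 for Bug#13400", 117000)
print(hits_mixed, hits_hash, hits_dotted)
```

The scan deliberately accepts bare numbers, which is where the false
positives come from; the upper bound on the revno is the only sanity
check applied here.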