From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: 23.0.50; Diff refinement
Date: Thu, 08 Nov 2007 10:30:53 -0500
To: emacs-devel@gnu.org
In-Reply-To: (Johan Bockgård's message of "Wed, 07 Nov 2007 19:35:43 +0100")

>>> C-c C-b in the diff buffer produces this refinement:
>>
>>> (foo abc (bar abc x))
>>> ^^^^^^ ^^^^ ^^^^^^
>>
>> :-(
>> That sucks!
>> You can fix it by setting smerge-refine-weight-hack to nil.

> So is this just a bad side-effect, rather than a bug?

It's a bug alright, but it's not a simple coding bug: it's that the
assumptions made by smerge-refine-weight-hack about how `diff' works
aren't true.  And I can't think of any way to fix it right now.
The worst part is that this assumption seems to be correct most of the
time.

> It also goes away when using
>   (setq smerge-refine-ignore-whitespace nil)

But it's not clear that it wouldn't come back on a different example.

> or
>   -d  --minimal  Try hard to find a smaller set of changes.
> (In the latter case, it finds the second "abc" in the line.)

Hmm... that's interesting.  This might have a good chance to ensure the
hypothesis is correct.

For those who want to know:
The refined highlighting works by cutting a region into words.
So (foo bar) is cut into

   (
   foo
   bar
   )

and then the other side is cut similarly and the result is passed to
diff.  smerge-refine-weight-hack changes the way the region is cut into
words.
What it does is make the file that is passed to diff have as many lines
as the region has characters, i.e. for the above example it cuts the
region into:

   (
   foo
   foo
   foo
   bar
   bar
   bar
   )

This has 2 advantages:
1- it's easy to take diff's output (which has line numbers) and map it
   back into char-positions in the original region.
2- if diff tries to minimize the number of lines changed (which it
   appears is what it does) rather than the number of bytes changed,
   then the simple "cut into words" tends to give too much weight to
   spaces and punctuation.  smerge-refine-weight-hack counter-acts this.

The main assumption made by smerge-refine-weight-hack is that if one of
the three lines of "foo" in the above example appears in a change, then
the other two will appear there as well.  This makes sense: if it's in
a change, that means the other file didn't have "foo" there but
something else, so the other "foo" lines will also fail to match the
other file.

But in your example, we pass the following to diff:

   ..
   ..
   abc
   abc
   abc
   ..
   ..
   abc
   abc
   abc
   ..
   ..

and instead of diff saying that the following is added:

   abc
   abc
   abc
   ..
   ..

it says that the following is added one line further:

   abc
   ..
   ..
   abc
   abc

which is still correct because the place where this change is added
looks like

   abc
   abc
   abc

so inserting the first thing before the first line or inserting the
second thing after the second line both result in the same output.

The thing is: both outputs are equally valid and equally small, so diff
can't know that one is preferable.  And really there are 4 possible
equally valid and equally good outputs.  But it's likely that
"--minimal" will make the search return either the "first" or the
"last" one of those 4 (both of those are fine for us, only the middle 2
are problematic).


        Stefan
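
For readers who prefer code, here is a minimal Python sketch of the
chopping and of the ambiguity described above.  It is only an
illustration: the names chop_weighted and valid_insertions and the
tokenizing regexp are assumptions, not smerge-mode's actual code.  In
particular it assumes every whitespace character gets a line of its
own, so that the line count equals the character count, and it uses
the single letters p, q, r, s, x, y as stand-ins for the ".." lines.

```python
import re


def chop_weighted(region):
    """Cut REGION into tokens, repeating each token once per character.

    Assumed tokenization: a run of word characters is one token; every
    other character (punctuation, whitespace) is a token by itself.
    The result has exactly one line per character of REGION.
    """
    lines = []
    for token in re.findall(r"\w+|\W", region):
        lines.extend([token] * len(token))
    return lines


def valid_insertions(old, new):
    """Return every (position, block) such that inserting BLOCK into OLD
    at POSITION yields NEW.  All of them are equally small diffs."""
    size = len(new) - len(old)
    return [
        (k, new[k:k + size])
        for k in range(len(old) + 1)
        if old[:k] + new[k:k + size] + old[k:] == new
    ]


# Line N of the chopped file (1-based, as diff reports it) corresponds
# to character N-1 of the region.
print(chop_weighted("(foo bar)"))

# Hypothetical stand-in for the problematic case: a second run of "abc"
# lines is added after an identical run.
old = ["p", "q", "abc", "abc", "abc", "r", "s"]
new = ["p", "q", "abc", "abc", "abc", "x", "y",
       "abc", "abc", "abc", "r", "s"]
for position, block in valid_insertions(old, new):
    print(position, block)
```

The loop lists four placements of the added block.  Only the first and
the last keep the three "abc" lines together; the two in the middle
split them, which is the case the weight hack does not expect.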