unofficial mirror of emacs-devel@gnu.org 
* ediff feature request: diffing line by line
@ 2002-03-16 13:54 Karl Eichwalder
  2002-03-16 16:27 ` Carlo Traverso
  2002-03-17 19:20 ` Richard Stallman
  0 siblings, 2 replies; 10+ messages in thread
From: Karl Eichwalder @ 2002-03-16 13:54 UTC (permalink / raw)
  Cc: Carlo Traverso, Michael Kifer

[-- Attachment #1: Type: text/plain, Size: 43 bytes --]

Let me forward this enhancement proposal:


[-- Attachment #2: Type: message/rfc822, Size: 861 bytes --]

From: Carlo Traverso <traverso@dm.unipi.it>
To: "Project Gutenberg volunteer discussion" <gutvol-d@listserv.unc.edu>
Cc: gutvol-d@listserv.unc.edu
Subject: Re: Intelligent diffing (Re: britannica 1911)
Date: Sat, 16 Mar 2002 13:47:22 +0100 (CET)
Message-ID: <LISTMANAGERSQL-1207707-1176632-2002.03.16-07.47.23--ke#gnu.franken.de@listserv.unc.edu>

[...]

ediff is great (it is one of my favorite tools), but it is insufficient
if you have several differences in the same line, or differences in
several consecutive lines (and you have to choose differently for each
of them); it is clearly tuned for different versions of code, not for
OCR output. Initially I considered modifying ediff, but the code is too
complex; I hope to have something usable with much less effort.


[-- Attachment #3: Type: text/plain, Size: 531 bytes --]



This basically means a diff is required that compares line by line and
that allows to say "next word" from version A or "rest of line" from
version B is wanted.

Here is Carlo's prototype in Lisp:

    http://www.dm.unipi.it/~traverso/Ebooks/Lsp/ocrdiff.lsp

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)


* Re: ediff feature request: diffing line by line
  2002-03-16 13:54 ediff feature request: diffing line by line Karl Eichwalder
@ 2002-03-16 16:27 ` Carlo Traverso
  2002-03-16 17:43   ` Michael Kifer
  2002-03-17 19:20 ` Richard Stallman
  1 sibling, 1 reply; 10+ messages in thread
From: Carlo Traverso @ 2002-03-16 16:27 UTC (permalink / raw)
  Cc: emacs-devel, kifer

>>>>> "Karl" == Karl Eichwalder <ke@gnu.franken.de> writes:

    Karl> Let me forward this enhancement proposal:


    Karl> This basically means a diff is required that compares line
    Karl> by line and that allows to say "next word" from version A or
    Karl> "rest of line" from version B is wanted.

Another point concerns whitespace: the "ignore whitespace" mode should
consider equivalent any positive quantity of whitespace (including
line ends), but should not ignore the difference between existing and
non-existing whitespace. Currently ediff ignores spaces between word
and punctuation.

    Karl> Here is Carlo's prototype in Lisp:

    Karl>     http://www.dm.unipi.it/~traverso/Ebooks/Lsp/ocrdiff.lsp

I have put there a real-life sample of two different OCR outputs, with
the corresponding correct text.

Carlo


_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/emacs-devel



* Re: ediff feature request: diffing line by line
  2002-03-16 16:27 ` Carlo Traverso
@ 2002-03-16 17:43   ` Michael Kifer
  2002-03-16 23:04     ` Alex Schroeder
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Kifer @ 2002-03-16 17:43 UTC (permalink / raw)
  Cc: keichwa, emacs-devel

>>>>> "CT" == Carlo Traverso <of Sat, 16 Mar 2002 17:27:23 +0100> writes:

>>>>> "Karl" == Karl Eichwalder <ke@gnu.franken.de> writes:

    Karl> Let me forward this enhancement proposal:


    Karl> This basically means a diff is required that compares line
    Karl> by line and that allows to say "next word" from version A or
    Karl> "rest of line" from version B is wanted.

Ediff is designed to parse the output of diff and then present it in
different ways. What you are proposing is a kind of "incremental diff", if I
understand it correctly. I think it would require a major generalization of
the existing code to do that.

    CT> Another point concerns whitespace: the "ignore whitespace" mode should
    CT> consider equivalent any positive quantity of whitespace (including
    CT> line ends), but should not ignore the difference between existing and
    CT> non-existing whitespace. Currently ediff ignores spaces between word
    CT> and punctuation.

This is because punctuation is considered to be part of the word by
default. Somehow I feel this is more useful in general. However, you can
customize this.


	--michael  


* Re: ediff feature request: diffing line by line
  2002-03-16 17:43   ` Michael Kifer
@ 2002-03-16 23:04     ` Alex Schroeder
  2002-03-17  4:04       ` Karl Eichwalder
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Schroeder @ 2002-03-16 23:04 UTC (permalink / raw)
  Cc: traverso, keichwa, emacs-devel

kifer@cs.sunysb.edu (Michael Kifer) writes:

>     Karl> This basically means a diff is required that compares line
>     Karl> by line and that allows to say "next word" from version A or
>     Karl> "rest of line" from version B is wanted.
>
> Ediff is designed to parse the output of diff and then present it in
> different ways. What you are proposing is a kind of "incremental diff", if I
> understand it correctly. I think it would require a major generalization of
> the existing code to do that.

I'm not sure.  As I understand it, all Karl is saying is that we would
like to see more commands to act upon the differences -- replace the
first word of chunk A with the first word of chunk B and show me a new
diff output.  The diff output -- the underlying calls to diff and the
processing of the output -- and the ediff display need not be changed.
I assume that is the biggest part of the code... (without looking at
it).

Some of what Carlo Traverso wishes for could be implemented by other
people by writing some sort of "filters" -- functions that
get run when the files are loaded and not yet saved as temporary
files.  The diffing, parsing, mangling, displaying, etc. remains as it
is.  But authors of a specialized ocr-ediff could then write a
collection of functions (plus relevant customization variables) which
transform (normalize, canonicalize, whatever) the text of files,
before ediff gets started.  This might be nice to have in ediff itself
(some sort of hook) but is not really required.  Perhaps Carlo can
tell us whether that would be good enough?
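
The "filters" idea above can be sketched outside Emacs. The following
Python sketch is only an illustration and not ediff code: the filter
names (`collapse_spaces`, `unify_quotes`) and the `diff_with_filters`
helper are hypothetical, and Python's standard `difflib` stands in for
the external diff program. Each filter normalizes the text before any
diffing happens, so the diffing step itself stays unchanged.

```python
import difflib
import re


def collapse_spaces(text):
    """Hypothetical filter: turn each run of spaces or tabs into one space."""
    return re.sub(r"[ \t]+", " ", text)


def unify_quotes(text):
    """Hypothetical filter: map typographic apostrophes to plain ASCII."""
    return text.replace("\u2019", "'")


def apply_filters(text, filters):
    """Run the text through every filter, in order."""
    for f in filters:
        text = f(text)
    return text


def diff_with_filters(a, b, filters):
    """Normalize both texts, then diff them line by line with difflib."""
    a_lines = apply_filters(a, filters).splitlines()
    b_lines = apply_filters(b, filters).splitlines()
    return list(difflib.unified_diff(a_lines, b_lines, lineterm=""))


# Assumed filter set for OCR text; a real one would be customizable.
FILTERS = [collapse_spaces, unify_quotes]
```

With such a pipeline, two OCR outputs that differ only in the amount of
spacing produce an empty diff, while a genuine misreading still shows up.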

Alex.
-- 
http://www.emacswiki.org/


* Re: ediff feature request: diffing line by line
  2002-03-16 23:04     ` Alex Schroeder
@ 2002-03-17  4:04       ` Karl Eichwalder
  2002-03-17 15:40         ` Andreas Schwab
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Eichwalder @ 2002-03-17  4:04 UTC (permalink / raw)
  Cc: Michael Kifer, traverso, emacs-devel

Alex Schroeder <alex@gnu.org> writes:

>> Ediff is designed to parse the output of diff and then present it in
>> different ways.

Concerning presentation there isn't that much to change

>> What you are proposing is a kind of "incremental diff", if I
>> understand it correctly. I think it would require a major
>> generalization of the existing code to do that.

Yes, it is kind of "incremental"; let's consider these variants:

    ->>Everyone in the world is permitted to copy and distribute verbatim copies
                ^^^^^^^^^^^^
    of this license document, but changing it is not allowed.
       ^^^^
    <<-

    ->>Everyone is permitted to copy and distribute verbatim copies
    of the license document, but changing it is not allowed.
       ~~~
    <<-

Ediff sees only 1 difference, and the user may ask for a
`ediff-switch-to-line-mode' option able to make chunks of the hunks:

    ->>Everyone in the world is permitted to copy and distribute verbatim copies
                ^^^^^^^^^^^^
    <<-->>of this license document, but changing it is not allowed.
             ^^^^
    <<-

    ->>Everyone is permitted to copy and distribute verbatim copies
    <<-->>of the license document, but changing it is not allowed.
             ~~~
    <<-

Of course, this option is useful only as long as we have to compare the
same number of similar lines.
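
As a rough illustration of splitting one hunk into per-line chunks, here
is a Python sketch. It is not ediff code, and `split_hunk_by_line` is a
hypothetical helper. It applies only when both sides of the hunk have
the same number of lines, and it pairs the lines up by position so that
each pair can be decided separately.

```python
def split_hunk_by_line(hunk_a, hunk_b):
    """Split a two-sided hunk into per-line chunks.

    Returns a list of (line_a, line_b) pairs for the lines that differ,
    or None when the sides have different line counts, in which case no
    pairing by position is possible.
    """
    lines_a = hunk_a.splitlines()
    lines_b = hunk_b.splitlines()
    if len(lines_a) != len(lines_b):
        return None
    return [(a, b) for a, b in zip(lines_a, lines_b) if a != b]


A = ("Everyone in the world is permitted to copy and distribute verbatim copies\n"
     "of this license document, but changing it is not allowed.\n")
B = ("Everyone is permitted to copy and distribute verbatim copies\n"
     "of the license document, but changing it is not allowed.\n")

# Two chunks: one for each line of the example hunk.
chunks = split_hunk_by_line(A, B)
```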

> I'm not sure.  As I understand it, all Karl is saying is that we would
> like to see more commands to act upon the differences -- replace the
> first word of chunk A with the first word of chunk B and show me a
> new diff output.  The diff output -- the underlying calls to diff and
> the processing of the output -- and the ediff display need not be
> changed.

Yes, that's the "incremental" aspect of the proposal.  Thanks for
clarifying!

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)


* Re: ediff feature request: diffing line by line
  2002-03-17  4:04       ` Karl Eichwalder
@ 2002-03-17 15:40         ` Andreas Schwab
  2002-03-17 16:26           ` Carlo Traverso
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Schwab @ 2002-03-17 15:40 UTC (permalink / raw)
  Cc: Alex Schroeder, Michael Kifer, traverso, emacs-devel

Karl Eichwalder <ke@gnu.franken.de> writes:

|> Alex Schroeder <alex@gnu.org> writes:
|> 
|> >> Ediff is designed to parse the output of diff and then present it in
|> >> different ways.
|> 
|> Concerning presentation there isn't that much to change
|> 
|> >> What you are proposing is a kind of "incremental diff", if I
|> >> understand it correctly. I think it would require a major
|> >> generalization of the existing code to do that.
|> 
|> Yes, it is kind of "incremental"; let's consider these variants:
|> 
|>     ->>Everyone in the world is permitted to copy and distribute verbatim copies
|>                 ^^^^^^^^^^^^
|>     of this license document, but changing it is not allowed.
|>        ^^^^
|>     <<-
|> 
|>     ->>Everyone is permitted to copy and distribute verbatim copies
|>     of the license document, but changing it is not allowed.
|>        ~~~
|>     <<-
|> 
|> Ediff sees only 1 difference, and the user may ask for a
|> `ediff-switch-to-line-mode' option able to make chunks of the hunks:

Emerge has a command to split a difference into two hunks
(emerge-split-difference).  I have used this quite often before I
switched to ediff.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE GmbH, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: ediff feature request: diffing line by line
  2002-03-17 15:40         ` Andreas Schwab
@ 2002-03-17 16:26           ` Carlo Traverso
  2002-03-17 18:37             ` Michael Kifer
  0 siblings, 1 reply; 10+ messages in thread
From: Carlo Traverso @ 2002-03-17 16:26 UTC (permalink / raw)
  Cc: keichwa, alex, kifer, emacs-devel



I had missed ediff-regions-wordwise and ediff-windows-wordwise, that
solve a lot of my problems; however these three enhancements would
help:

1 - switching from ediff-buffers to ediff-regions-wordwise: a key
could be defined to select the current ediff regions in both
buffers and enter an ediff-regions-wordwise on them; the same for
ediff-windows-wordwise. This is currently possible, but not with one
key (this should be extremely easy to implement).

2 - the highlighting scheme should be revised, since entering
ediff-regions-wordwise from ediff-buffers removes highlighting from
the current word (i.e. the current region in ediff and the current
word in ediff-regions-wordwise are highlighted in the same color...)
ediff-windows-wordwise inside of ediff-buffers is even worse....
(this should be very easy too)

3 - enhancing ediff-regions-wordwise (ediff-windows-wordwise) allowing
to discover and reconcile whitespace "substantial" differences: I
consider "substantial" these differences:

- additional blank lines
- space between words vs no space between words (e.g. "one=1" vs "one = 1")

The amount of whitespace (e.g. "  " vs " ") or the type (space, tab,
newline) is inessential (but two consecutive newlines is not the same
as one newline...)

(I am uncertain about space at the beginning of a line...)
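
The whitespace rule above can be written down as a normalization step.
This Python sketch is an assumption-laden illustration, not ediff
behaviour: `normalize_whitespace` is a hypothetical name, it treats any
number of blank lines as one blank line (the text above does not settle
that), and it does nothing special for space at the beginning of a line.

```python
import re


def normalize_whitespace(text):
    """Collapse inessential whitespace, keep "substantial" differences.

    - A whitespace run with two or more newlines (at least one blank
      line) becomes exactly one blank line.
    - Any other whitespace run becomes a single space.
    - Whitespace is never invented, so "one=1" stays distinct from
      "one = 1".
    """
    def replace(match):
        # Two newlines in one run means the run spans a blank line.
        return "\n\n" if match.group(0).count("\n") >= 2 else " "

    return re.sub(r"\s+", replace, text)
```

Two texts would then count as equal for diffing purposes exactly when
their normalized forms are equal.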

----

Before sending this message (after composing it) I compiled Emacs
21.1.1 (I was using 20.7.1); unfortunately, I have to say that the new
version of ediff-*-wordwise is worse for my purpose.

In this example, the old version found two differences:


Que la dont je viens d outremer
               ^^^^^  ^
Que la dont je vieng d'outremer


which have collapsed into one larger difference:


Que la dont je viens d outremer
               ^^^^^^^^^^^^^^^^
Que la dont je vieng d'outremer

and of course here I have to choose one from each version...

Apparently the definition of "wordwise" has changed; maybe, what I
would like to have is an "ediff-*-characterwise" with the possibility
to switch from one "ediff-*-*wise" to the other.
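
To show what a character-wise comparison would report for the example
above, here is a small Python sketch using the standard `difflib`
module. It is not an ediff feature; `char_diffs` is a hypothetical
helper written for this illustration.

```python
import difflib


def char_diffs(a, b):
    """Return (a_part, b_part) pairs for every non-matching span,
    comparing the two strings character by character."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


diffs = char_diffs("Que la dont je viens d outremer",
                   "Que la dont je vieng d'outremer")
# diffs == [("s", "g"), (" ", "'")]
```

The two differences stay separate, so one can be taken from each
version.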

There is a small change in behaviour as far as point 2 of the above
enhancements is concerned, with a small (but not substantial)
improvement.

Carlo



* Re: ediff feature request: diffing line by line
  2002-03-17 16:26           ` Carlo Traverso
@ 2002-03-17 18:37             ` Michael Kifer
  2002-03-17 20:41               ` Carlo Traverso
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Kifer @ 2002-03-17 18:37 UTC (permalink / raw)
  Cc: schwab, keichwa, alex, emacs-devel

>>>>> "CT" == Carlo Traverso <of Sun, 17 Mar 2002 17:26:46 +0100> writes:

    CT> I had missed ediff-regions-wordwise and ediff-windows-wordwise, that
    CT> solve a lot of my problems; however these three enhancements would
    CT> help:

    CT> 1 - switching from ediff-buffers to ediff-regions-wordwise: a key
    CT> could be defined to select the current ediff regions in both
    CT> buffers and enter an ediff-regions-wordwise on them; the same for
    CT> ediff-windows-wordwise. This is currently possible, but not with one
    CT> key (this should be extremely easy to implement).

I didn't understand the original problem, but when Alex Schroeder explained
it I also thought about ediff-regions-wordwise. If I understand you and him
correctly, all that is needed is to be able to conveniently invoke this
function on the currently highlighted regions.
In fact this key already exists (=), but it asks you to select a region
instead of taking the currently highlighted diffs. 
I felt that having this key is not very useful, because one can simply run
ediff-regions-* from command line or from the menu, and this won't be any
more difficult. So, I am thinking of repurposing this key to run
ediff-regions-wordwise on the selected diff regions.

    CT> 2 - the highlighting scheme should be revised, since entering
    CT> ediff-regions-wordwise from ediff-buffers removes highlighting from
    CT> the current word (i.e. the current region in ediff and the current
    CT> word in ediff-regions-wordwise are highlighted in the same color...)
    CT> ediff-windows-wordwise inside of ediff-buffers is even worse....
    CT> (this should be very easy too)

I don't understand. Are you saying that the highlighting of the current
diff is not removed when you invoke ediff-regions-*? This is a bug, which I
noticed recently.


    CT> 3 - enhancing ediff-regions-wordwise (ediff-windows-wordwise) allowing
    CT> to discover and reconcile whitespace "substantial" differences: I
    CT> consider "substantial" these differences:

    CT> - additional blank lines
    CT> - space between words vs no space between words (e.g. "one=1" vs "one = 1")

    CT> The amount of whitespace (e.g. "  " vs " ") or the type (space, tab,
    CT> newline) is inessential (but two consecutive newlines is not the same
    CT> as one newline...)

What you are saying is that for word-wise operations the meaning of
ediff-word should be different from line-wise operations. This makes sense.
If somebody comes up with a better definition, I can incorporate it.
Ediff is using a simple heuristic to determine what should constitute a word
for the purpose of diffing. Take a look at ediff-forward-word in ediff-diff.el.
I found it to work very well for line-wise diffing, but I don't use
word-wise diffing much and have no opinion about it.
If you can come up with a good (and simple) heuristic for word-wise diffing, I can
incorporate it.


	--michael  


* Re: ediff feature request: diffing line by line
  2002-03-16 13:54 ediff feature request: diffing line by line Karl Eichwalder
  2002-03-16 16:27 ` Carlo Traverso
@ 2002-03-17 19:20 ` Richard Stallman
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Stallman @ 2002-03-17 19:20 UTC (permalink / raw)
  Cc: emacs-devel, traverso, kifer

    This basically means a diff is required that compares line by line and
    that allows to say "next word" from version A or "rest of line" from
    version B is wanted.

For something like this, and in particular for correcting OCR, I think
a completely different program with a different UI is what you want,
more like an extended M-x compare-windows than like an extended ediff.

I will send mail to fetch that URL.


* Re: ediff feature request: diffing line by line
  2002-03-17 18:37             ` Michael Kifer
@ 2002-03-17 20:41               ` Carlo Traverso
  0 siblings, 0 replies; 10+ messages in thread
From: Carlo Traverso @ 2002-03-17 20:41 UTC (permalink / raw)
  Cc: schwab, keichwa, alex, emacs-devel

>>>>> "Michael" == Michael Kifer <kifer@cs.sunysb.edu> writes:

>>>>> "CT" == Carlo Traverso <of Sun, 17 Mar 2002 17:26:46 +0100> writes:

    CT> I had missed ediff-regions-wordwise and
    CT> ediff-windows-wordwise, that solve a lot of my problems;
    CT> however these three enhancements would help:

    CT> 1 - switching from ediff-buffers to ediff-regions-wordwise: a
    CT> key could be defined to select the current ediff regions in
    CT> both buffers and enter an ediff-regions-wordwise on them; the
    CT> same for ediff-windows-wordwise. This is currently possible,
    CT> but not with one key (this should be extremely easy to
    CT> implement).

    Michael> I didn't understand the original problem, but when Alex
    Michael> Schroeder explained it I also thought about
    Michael> ediff-regions-wordwise. If I understand you and him
    Michael> correctly, all that is needed is to be able to
    Michael> conveniently invoke this function on the currently
    Michael> highlighted regions.  In fact this key already exists
    Michael> (=), but it asks you to select a region instead of taking
    Michael> the currently highlighted diffs.  I felt that having this
    Michael> key is not very useful, because one can simply run
    Michael> ediff-regions-* from command line or from the menu, and
    Michael> this won't be any more difficult. So, I am thinking of
    Michael> repurposing this key to run ediff-regions-wordwise on the
    Michael> selected diff regions.

Please don't. I hate it when a key I am used to changes; there are
other unused keys, e.g. + and -, that could run ediff-*-wordwise on the
current *.



    CT> 2 - the highlighting scheme should be revised, since entering
    CT> ediff-regions-wordwise from ediff-buffers removes highlighting
    CT> from the current word (i.e. the current region in ediff and
    CT> the current word in ediff-regions-wordwise are highlighted in
    CT> the same color...)  ediff-windows-wordwise inside of
    CT> ediff-buffers is even worse....  (this should be very easy
    CT> too)

    Michael> I don't understand. Are you saying that the highlighting
    Michael> of the current diff is not removed when you invoke
    Michael> ediff-regions-*? This is a bug, which I noticed recently.

Yes, not removing the highlighting makes the highlighting of the new
session ineffective.


    CT> 3 - enhancing ediff-regions-wordwise (ediff-windows-wordwise)
    CT> allowing to discover and reconcile whitespace "substantial"
    CT> differences: I consider "substantial" these differences:

    CT> - additional blank lines
    CT> - space between words vs no space between words (e.g. "one=1"
    CT>   vs "one = 1")

    CT> The amount of whitespace (e.g. "  " vs " ") or the type
    CT> (space, tab, newline) is inessential (but two consecutive
    CT> newlines is not the same as one newline...)

    Michael> What you are saying is that for word-wise operations the
    Michael> meaning of ediff-word should be different from line-wise
    Michael> operations. This makes sense.  If somebody comes up with
    Michael> a better definition, I can incorporate it.  Ediff is
    Michael> using a simple heuristic to determine what should
    Michael> constitute a word for the purpose of diffing. Take a look
    Michael> at ediff-forward-word in ediff-diff.el.  I found it to
    Michael> work very well for line-wise diffing, but I don't use
    Michael> word-wise diffing much and have no opinion about it.  If
    Michael> you can come up with a good (and simple) heuristic for
    Michael> word-wise diffing, I can incorporate it.

I'll look at that. 

I have also noticed some strange behaviour in ediff-*-wordwise; in
particular, if you accept one version of all the differences, you then
find a new set of differences; this is mainly due to the handling of
whitespace. I'll prepare a report on what I think is wrong (and maybe
a patch...).

Try for example a file consisting of the line

"one two three"

and one with the line

"one three"

(the result is "one twothree")
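
For comparison, a word-wise merge cannot glue words together if
whitespace is kept as tokens of its own rather than dropped. The Python
sketch below is not the ediff implementation: `tokenize` and `merge` are
hypothetical helpers, and it is only an assumption that token handling
of this sort is related to the behaviour reported above.

```python
import difflib
import re


def tokenize(text):
    """Split text into alternating word and whitespace tokens."""
    return re.findall(r"\s+|\S+", text)


def merge(a, b, choose):
    """Merge two texts token by token.

    `choose` is "A" or "B": the side whose tokens are taken for every
    difference. Whitespace tokens travel with the chosen side, so words
    are never glued together.
    """
    ta, tb = tokenize(a), tokenize(b)
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or choose == "A":
            out.extend(ta[i1:i2])
        else:
            out.extend(tb[j1:j2])
    return "".join(out)
```

Accepting version A everywhere then reproduces "one two three", and
accepting version B reproduces "one three".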


Carlo

