[Request for Mentor] subst-char-in-region

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* [Request for Mentor] subst-char-in-region
@ 2014-12-12 11:43 Phillip Lord
  2014-12-12 14:35 ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Phillip Lord @ 2014-12-12 11:43 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1553 bytes --]

I believe that subst-char-in-region is signally incorrectly to before
and after-change-functions. This is causing me difficulties with a
package I am writing (lentic, formerly linked-buffer), as
subst-char-in-region is called by fill-paragraph which breaks lentic.

In a clean emacs, launched with the file open.el (it and noisy-change.el
are attached), I create a single buffer with contents

ooo
bbb

Changing ooo to xxx reports the following from Emacs

before(1,8)
after(1,4,3)

But this does not really make sense to me because the start and stop
reported by the before-change-function is much wider than that reported
by the after-change-functions. The reason for this is that the
before-change-function gets from the first change to the end of the
entire region here (in editfns.c)

modify_text (pos, XINT (end));

while after-change gets from the start to the last changed character here.

signal_after_change (changed,
		   last_changed - changed, last_changed - changed);

I have created a (bad) fix for this, currently on branch...

origin/fix/subst-char-in-region

(hopefully adding a new branch is okay for this, obviously it can be
deleted later). After adding this fix I get...

before(1,8)
after(1,8,8)

This seems consistent and correct to me. My fix is not good though since
it is overly conservative. It's probably bad for other reasons also
since I don't know C (willing to learn though).

Assuming others agree with me that this behaviour is wrong, I would like
to fix the problem, but would welcome some help and code review.

[-- Attachment #2: open.el --]
[-- Type: application/emacs-lisp, Size: 379 bytes --]

[-- Attachment #3: noisy-change.el --]
[-- Type: application/emacs-lisp, Size: 702 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-12 11:43 [Request for Mentor] subst-char-in-region Phillip Lord
@ 2014-12-12 14:35 ` Stefan Monnier
  2014-12-12 15:05   ` Phillip Lord
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2014-12-12 14:35 UTC (permalink / raw)
  To: Phillip Lord; +Cc: emacs-devel

> ooo
> bbb

> Changing ooo to xxx reports the following from Emacs

> before(1,8)
> after(1,4,3)

That looks correct to me.

> origin/fix/subst-char-in-region
> (hopefully adding a new branch is okay for this, obviously it can be
> deleted later). After adding this fix I get...

Please use "scratch/<foo>" rather than "fix/<foo>".

> before(1,8)
> after(1,8,8)

This wouldn't be wrong, but would be less precise.

> Assuming others agree with me that this behaviour is wrong, I would like
> to fix the problem, but would welcome some help and code review.

Assuming a close&direct correct correspondence between
before-change-functions and after-change-functions is just a bad idea.
Does you code *really* need that?


        Stefan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-12 14:35 ` Stefan Monnier
@ 2014-12-12 15:05   ` Phillip Lord
  2014-12-12 16:17     ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Phillip Lord @ 2014-12-12 15:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> ooo
>> bbb
>
>> Changing ooo to xxx reports the following from Emacs
>
>> before(1,8)
>> after(1,4,3)
>
> That looks correct to me.

Why? The doc says "the positions of the beginning and end of the old
text to be changed" for before-change-function. But the text from 4 to 8
is not changed. As indeed the after-change-functions value says.

If the doc means "the position of the beginning and end of the old text
which is potentially open for change but may not actually change", then
that would have been a different issue.

Given that the change in this case is a substution why is it not:

before(1,4)
after (1,4,3)

This could be calculated, of course, by subst-char-in-region, although
it would potentially require scanning the region twice (once to find
start and stop, once to actually make changes).

>> origin/fix/subst-char-in-region
>> (hopefully adding a new branch is okay for this, obviously it can be
>> deleted later). After adding this fix I get...
>
> Please use "scratch/<foo>" rather than "fix/<foo>".

Will do.

>
>> before(1,8)
>> after(1,8,8)
>
> This wouldn't be wrong, but would be less precise.

Yep, I agree.

>
>> Assuming others agree with me that this behaviour is wrong, I would like
>> to fix the problem, but would welcome some help and code review.
>
> Assuming a close&direct correct correspondence between
> before-change-functions and after-change-functions is just a bad idea.
> Does you code *really* need that?

At the moment, yes, it does. I am keeping two buffers in sync by
transfering changes between them. It does this by removing the text in
"that buffer" between the before change values and replacing it with the
text in "this buffer" (it's slightly more complex than this, but that's
the basic idea).

In general, I have found that this works. For instance, replacing
subst-char-in-region with a more traditional search and replace
operation would work fine.

Whether it could be done in another way, not assuming that this
correspondence, I just do not know. At the moment, I do not see an
obvious way. Even if I did see an obvious way, I might be wrong. I've
already found quite a few edge cases which I didn't think about when I
started.

Phil

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-12 15:05   ` Phillip Lord
@ 2014-12-12 16:17     ` Stefan Monnier
  2014-12-15 12:15       ` Phillip Lord
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2014-12-12 16:17 UTC (permalink / raw)
  To: Phillip Lord; +Cc: emacs-devel

>>> before(1,8)
>>> after(1,4,3)
>> That looks correct to me.
> Why? The doc says "the positions of the beginning and end of the old
> text to be changed" for before-change-function. But the text from 4 to 8
> is not changed. As indeed the after-change-functions value says.

Similarly if your subst-char-in-region changes "oaaao" to "xaaax" the
aaa part isn't changed, so you could argue that we should call b-c-f and
a-c-f twice (once per "o->x" change).  But instead we call them on
a "superset" of the actually changed text.  A tighter superset is
preferable, all other things being equal, but making a single call
rather than many "smaller" calls also is preferable.

> Given that the change in this case is a substution why is it not:
> before(1,4)
> after (1,4,3)
> This could be calculated, of course, by subst-char-in-region, although
> it would potentially require scanning the region twice (once to find
> start and stop, once to actually make changes).

Exactly, it doesn't seem worth scanning the region twice just to give
a slightly tighter bound to the b-c-f.

> At the moment, yes, it does.  I am keeping two buffers in sync by
> transfering changes between them. It does this by removing the text in
> "that buffer" between the before change values and replacing it with the
> text in "this buffer" (it's slightly more complex than this, but that's
> the basic idea).

Why do it this way?  Why not rely exclusively on the a-c-f?

Try your code on a diff-mode buffer using commands such as
diff-unified->context or diff-context->unified (hint, they use
combine-after-change-calls).

> In general, I have found that this works.

I think you mean "usually" rather than "in general".

        Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-12 16:17     ` Stefan Monnier
@ 2014-12-15 12:15       ` Phillip Lord
  2014-12-15 14:29         ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Phillip Lord @ 2014-12-15 12:15 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>>> before(1,8)
>>>> after(1,4,3)
>>> That looks correct to me.
>> Why? The doc says "the positions of the beginning and end of the old
>> text to be changed" for before-change-function. But the text from 4 to 8
>> is not changed. As indeed the after-change-functions value says.
>
> Similarly if your subst-char-in-region changes "oaaao" to "xaaax" the
> aaa part isn't changed, so you could argue that we should call b-c-f and
> a-c-f twice (once per "o->x" change).  But instead we call them on
> a "superset" of the actually changed text.  A tighter superset is
> preferable, all other things being equal, but making a single call
> rather than many "smaller" calls also is preferable.

Sure, I understand that, and aggregating calls is fine. It's just that
the b-c-f call is something that is wrong or at least inconsistent with
the a-c-f. Wrong for me isn't the problem, actually, inconsistent is.

>> Given that the change in this case is a substution why is it not:
>> before(1,4)
>> after (1,4,3)
>> This could be calculated, of course, by subst-char-in-region, although
>> it would potentially require scanning the region twice (once to find
>> start and stop, once to actually make changes).
>
> Exactly, it doesn't seem worth scanning the region twice just to give
> a slightly tighter bound to the b-c-f.

It is possible to do with a single scan, actually. Scan from the start
to the first occurance. Scan from the end to the last occurance. Signal
b-c-f. Then scan between the first and last, making any changes, and
signal a-c-f. Add in a few if statements for edge cases and you're done.

The code is more complex, but running time is the same.

>> At the moment, yes, it does.  I am keeping two buffers in sync by
>> transfering changes between them. It does this by removing the text in
>> "that buffer" between the before change values and replacing it with the
>> text in "this buffer" (it's slightly more complex than this, but that's
>> the basic idea).
>
> Why do it this way?  Why not rely exclusively on the a-c-f?

Initially, because I wasn't sure that the a-c-f gave me all the
information that I needed. I thought it might do, but I was confused by
the third argument which is "length in bytes of the pre-change text". Is
"length in bytes" the same as "char position". I presumed note.

The second reason is more complex. I am trying to keep two buffers in
sync; to do this, I need to be able to convert between the position in
that buffer for the position in that buffer. In the simple case where
they contain exactly the same text this is easy (it's the same value),
but this is not necessarily the case; the two buffers map to each other,
but do not need to have identical content. So, I need to have a function
to do this conversion.

The difficulty is that in some cases the conversion function uses the
contents of this buffer to work out the equivalent location in that
buffer. When the b-c-f is called the two buffers are in sync, so this
conversion works. But when the a-c-f is called, the two buffers are not
in sync because we haven't percolated the change yet. So my conversion
functions tend to break. If I use the b-c-f values, then I can guarantee
the state of the two buffers is in-sync. Unfortunately, I depend on
b-c-f and a-c-f being consistent.

> Try your code on a diff-mode buffer using commands such as
> diff-unified->context or diff-context->unified (hint, they use
> combine-after-change-calls).

Yes, these break also. Although, I don't see why, since the
documentation says....

If `before-change-functions' is non-nil, then calls to the after-change
functions can't be deferred, so in that case this macro has no effect.

I had a quick look at combine-after-change-calls and
combine-after-change-execute and neither seem to check the value of
b-c-f; again, I've never learned C, so I could be wrong about the
latter.

I have got a partially working solution to this -- which is I can check
the value of this-command, on only use these values on this-command's
that I know to be safe (otherwise, I fall back to copying the whole
buffer). Unfortunately, this requires me to white list commands--rather
long winded, although this as this is mostly a performance optimisation
(albeit a significant one) and as self-insert-command is safe this might
be the way forward.

>> In general, I have found that this works.
>
> I think you mean "usually" rather than "in general".

Usually, I find, you understand things better than I, and specifically
in this case!

Phil

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-15 12:15       ` Phillip Lord
@ 2014-12-15 14:29         ` Stefan Monnier
  2014-12-16 11:30           ` Phillip Lord
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2014-12-15 14:29 UTC (permalink / raw)
  To: Phillip Lord; +Cc: emacs-devel

> Initially, because I wasn't sure that the a-c-f gave me all the
> information that I needed. I thought it might do, but I was confused by
> the third argument which is "length in bytes of the pre-change text". Is
> "length in bytes" the same as "char position". I presumed note.

Ha, nicely spotted, that's a bug in the docstring.  It passes (or
should anyway) the length in characters.

> The difficulty is that in some cases the conversion function uses the
> contents of this buffer to work out the equivalent location in that
> buffer.  When the b-c-f is called the two buffers are in sync, so this
> conversion works.  But when the a-c-f is called, the two buffers are not
> in sync because we haven't percolated the change yet.

So, rather than the length of the previous text, you'd need to actually
know the previous text?  Of course, you can use b-c-f to stash the
"previous text", but indeed it won't necessarily always be exactly
right, in the sense that the previous text stashed in b-c-f may not have
the exact same boundaries, so you'll need to massage it a bit.

The way I see it, you'll need to do something like

   (defvar-local my-before-change-text nil)

   (defun my-bcf (beg end)
     (setq my-before-change-text
           (if (null my-before-change-text)
               (cons beg (buffer-substring beg end))
             (let* ((oldbeg (car my-before-change-text))
                    (oldtext (cdr my-before-change-text))
                    (oldend (+ oldbeg (length oldtext)))
                    (newbeg (min beg oldbeg))
                    (newend (max end oldend)))
               (cl-assert (equal oldtext (buffer-substring oldbeg oldend)))
               (if (or (< newbeg oldbeg) (> newend oldend))
                   (cons newbeg (buffer-substring newbeg newend))
                 my-before-change-text)))))
   (add-hook 'before-change-functions #'my-bcf)

And then in after-change-functions, you'll need something like

   (defun my-acf (beg end oldlen)
     (let ((oldtext (substring (cdr my-before-change-text)
                               (- beg (car my-before-change-text))
                               (+ oldlen (- beg (car my-before-change-text))))))
       (setq my-before-change-text nil)
       ...))


-- Stefan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-15 14:29         ` Stefan Monnier
@ 2014-12-16 11:30           ` Phillip Lord
  2014-12-16 14:07             ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Phillip Lord @ 2014-12-16 11:30 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Initially, because I wasn't sure that the a-c-f gave me all the
>> information that I needed. I thought it might do, but I was confused by
>> the third argument which is "length in bytes of the pre-change text". Is
>> "length in bytes" the same as "char position". I presumed note.
>
> Ha, nicely spotted, that's a bug in the docstring.  It passes (or
> should anyway) the length in characters.

Ah, okay. Can I fix it? This started of as a RFM thread, and this would
give me my first commit to Emacs core, even if it is just documentation.

>
>> The difficulty is that in some cases the conversion function uses the
>> contents of this buffer to work out the equivalent location in that
>> buffer.  When the b-c-f is called the two buffers are in sync, so this
>> conversion works.  But when the a-c-f is called, the two buffers are not
>> in sync because we haven't percolated the change yet.
>
> So, rather than the length of the previous text, you'd need to actually
> know the previous text?  

Potentially, yes.

For example, imagine I have a buffer like this

====
Here is some documentation
====

which maps to another buffer where the documentation is commented out

====
;; Here is some documentation
====

How to map between "equivalent" positions in the two? I do this by
counting backward from the end of the line in both buffers; this works
because the two lines differ only at the beginning.

With this technique, to calculate the equivalent position for the end of
a change, I need to know whether there are any new lines in changed region.

> Of course, you can use b-c-f to stash the "previous text", but indeed
> it won't necessarily always be exactly right, in the sense that the
> previous text stashed in b-c-f may not have the exact same boundaries,
> so you'll need to massage it a bit.
>
> The way I see it, you'll need to do something like
>
>    (defvar-local my-before-change-text nil)
>
>    (defun my-bcf (beg end)
>      (setq my-before-change-text
>            (if (null my-before-change-text)
>                (cons beg (buffer-substring beg end))
>              (let* ((oldbeg (car my-before-change-text))
>                     (oldtext (cdr my-before-change-text))
>                     (oldend (+ oldbeg (length oldtext)))
>                     (newbeg (min beg oldbeg))
>                     (newend (max end oldend)))
>                (cl-assert (equal oldtext (buffer-substring oldbeg oldend)))
>                (if (or (< newbeg oldbeg) (> newend oldend))
>                    (cons newbeg (buffer-substring newbeg newend))
>                  my-before-change-text)))))
>    (add-hook 'before-change-functions #'my-bcf)
>
> And then in after-change-functions, you'll need something like
>
>    (defun my-acf (beg end oldlen)
>      (let ((oldtext (substring (cdr my-before-change-text)
>                                (- beg (car my-before-change-text))
>                                (+ oldlen (- beg (car my-before-change-text))))))
>        (setq my-before-change-text nil)
>        ...))

That's possible, I guess, but it's still messy. The problem is I now
need to analyse both the buffer and the text which has been removed. I
also use the same function (to convert locations in the two buffers)
outside of changes (on the post-command-hook) where there is no changed
text. It adds considerable complexity the function. Still I will give it
a try.

For the moment, though, I have a workable solution (which is only make
incremental changes after white-listing commands); I have this working
with diff-mode now (nice example, incidentally, cause I found another,
unrelated, bug in my code).

Would you accept a commit to subst-char-in-region, though, which signals
the correct values on b-c-f? As I outlined in my previous email, it can
be done with no increase in computational complexity. Then I could
white-list fill-paragraph (at least when emacs 25 comes out!).

Phil

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-16 11:30           ` Phillip Lord
@ 2014-12-16 14:07             ` Stefan Monnier
  2014-12-16 16:05               ` Phillip Lord
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2014-12-16 14:07 UTC (permalink / raw)
  To: Phillip Lord; +Cc: emacs-devel

> Ah, okay.  Can I fix it?  This started of as a RFM thread, and this would
> give me my first commit to Emacs core, even if it is just documentation.

Oh, sorry, I just fixed it yesterday.  But don't worry: we have plenty
of other problems to fix.

> Would you accept a commit to subst-char-in-region, though, which signals
> the correct values on b-c-f?

If the change is small enough, yes.


        Stefan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Request for Mentor] subst-char-in-region
  2014-12-16 14:07             ` Stefan Monnier
@ 2014-12-16 16:05               ` Phillip Lord
  0 siblings, 0 replies; 9+ messages in thread
From: Phillip Lord @ 2014-12-16 16:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Ah, okay.  Can I fix it?  This started of as a RFM thread, and this would
>> give me my first commit to Emacs core, even if it is just documentation.
>
> Oh, sorry, I just fixed it yesterday.  But don't worry: we have plenty
> of other problems to fix.

No worries! Thought you might have.

>
>> Would you accept a commit to subst-char-in-region, though, which signals
>> the correct values on b-c-f?
>
> If the change is small enough, yes.


Okay, will work on it.

Phil



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-12-16 16:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-12 11:43 [Request for Mentor] subst-char-in-region Phillip Lord
2014-12-12 14:35 ` Stefan Monnier
2014-12-12 15:05   ` Phillip Lord
2014-12-12 16:17     ` Stefan Monnier
2014-12-15 12:15       ` Phillip Lord
2014-12-15 14:29         ` Stefan Monnier
2014-12-16 11:30           ` Phillip Lord
2014-12-16 14:07             ` Stefan Monnier
2014-12-16 16:05               ` Phillip Lord

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).