unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Possible problem with looking-back function
@ 2010-08-19  1:15 Vinicius Jose Latorre
  2010-08-19  2:43 ` Davis Herring
  0 siblings, 1 reply; 9+ messages in thread
From: Vinicius Jose Latorre @ 2010-08-19  1:15 UTC (permalink / raw)
  To: GNU Emacs (devel)

Hi,


I'm not sure if there is a problem with looking-back function in Emacs 24.

Let me explain the problem.

Suppose the buffer content is:

| 1| .\n
| 2| \n
| 3| \ \ \ \n
| 7| \t\n
| 9| \n
|10| \n
|11| :

Where:
    .    represents the point position
    \n   represents the end of line
    \    represents a space character
    \t   represents a tab character
    :    represents the end of buffer

The numbers at first column indicates the point position at beginning of 
line.

Ok, executing the following code:

    (progn
      (looking-at "^\\([ \t\n]+\\)")
      (match-end 1))

It returns 11.

Now executing:

    (progn
      (goto-char 11)                     ; go to end of buffer
      (looking-back "^\\([ \t\n]+\\)" 1 t)
      (match-beginning 1))

It returns 9.

Shouldn't it return 1?




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-19  1:15 Possible problem with looking-back function Vinicius Jose Latorre
@ 2010-08-19  2:43 ` Davis Herring
  2010-08-19  6:09   ` Andreas Röhler
  2010-08-20  2:11   ` Vinicius Jose Latorre
  0 siblings, 2 replies; 9+ messages in thread
From: Davis Herring @ 2010-08-19  2:43 UTC (permalink / raw)
  To: Vinicius Jose Latorre; +Cc: GNU Emacs

> Shouldn't it return 1?

The algorithm searches backward until it finds a position from which there
is a match that extends to point.  Doing more would be quadratic in the
value of point (at least), so is considered too slow.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-19  2:43 ` Davis Herring
@ 2010-08-19  6:09   ` Andreas Röhler
  2010-08-19  8:01     ` Stefan Monnier
  2010-08-19  9:02     ` Stephen J. Turnbull
  2010-08-20  2:11   ` Vinicius Jose Latorre
  1 sibling, 2 replies; 9+ messages in thread
From: Andreas Röhler @ 2010-08-19  6:09 UTC (permalink / raw)
  To: herring; +Cc: Emacs developers

Am 19.08.2010 04:43, schrieb Davis Herring:
>> Shouldn't it return 1?
>>      
> The algorithm searches backward until it finds a position from which there
> is a match that extends to point.  Doing more would be quadratic

Hi,

that surprises me.
Check must be done against the string already found.
So workload should grow lineary resp. slightly ascending.

AFAIK exists a bug, resp. missing implementation in re-search-backward.

Cheers

Andreas

--
https://code.launchpad.net/~a-roehler/python-mode
https://code.launchpad.net/s-x-emacs-werkstatt/



>   in the
> value of point (at least), so is considered too slow.
>
> Davis
>
>    




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-19  6:09   ` Andreas Röhler
@ 2010-08-19  8:01     ` Stefan Monnier
  2010-08-19  9:02     ` Stephen J. Turnbull
  1 sibling, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2010-08-19  8:01 UTC (permalink / raw)
  To: Andreas Röhler; +Cc: Emacs developers

>>> Shouldn't it return 1?

Not according to the docstring:

   If GREEDY is non-nil, extend the match backwards as far as
   possible, stopping when a single additional previous character
   cannot be part of a match for REGEXP.  When the match is
   extended, its starting position is allowed to occur before
   LIMIT.

>> The algorithm searches backward until it finds a position from which there
>> is a match that extends to point.  Doing more would be quadratic

Actually our regexp matcher (i.e. "looking-at") is already much worse
than quadratic in worst-case complexity.

> AFAIK exists a bug, resp. missing implementation in re-search-backward.

Ignoring back-references, our regexps are pretty much within the
"regular language" domain, so in theory matching and searching can be
performed in linear time.  But our regexp engine uses a backtracking
implementation, so it's not even close to linear.  And the "backward"
matcher is not implemented, so we "simulate" it with a forward matcher,
which is why looking-back is slow and doesn't behave like we want.


        Stefan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-19  6:09   ` Andreas Röhler
  2010-08-19  8:01     ` Stefan Monnier
@ 2010-08-19  9:02     ` Stephen J. Turnbull
  2010-08-19  9:32       ` Andreas Röhler
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen J. Turnbull @ 2010-08-19  9:02 UTC (permalink / raw)
  To: Andreas Röhler; +Cc: Emacs developers

Andreas Röhler writes:

 > that surprises me.  Check must be done against the string already
 > found.  So workload should grow lineary resp. slightly ascending.

That's incorrect.  In regexp matching abstractly defined, there is no
"string already matched."  In general backtracking must be done to get
a correct POSIX match, and it's potentially very expensive.  It would
be sometimes possible (as in this case) to identify a non-backtracking
algorithm as you suggest, but that would mean that different regexps
would be treated differently, or that some regexps would be
ridiculously expensive.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-19  9:02     ` Stephen J. Turnbull
@ 2010-08-19  9:32       ` Andreas Röhler
  0 siblings, 0 replies; 9+ messages in thread
From: Andreas Röhler @ 2010-08-19  9:32 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Stefan Monnier, Emacs developers

Am 19.08.2010 11:02, schrieb Stephen J. Turnbull:
> Andreas Röhler writes:
>
>   >  that surprises me.  Check must be done against the string already
>   >  found.  So workload should grow lineary resp. slightly ascending.
>
> That's incorrect.  In regexp matching abstractly defined, there is no
> "string already matched."  In general backtracking must be done to get
> a correct POSIX match, and it's potentially very expensive.  It would
> be sometimes possible (as in this case) to identify a non-backtracking
> algorithm as you suggest, but that would mean that different regexps
> would be treated differently, or that some regexps would be
> ridiculously expensive.
>
>    

Hi,

just for the curious:
Coming upon the matter, as in markup languages point may be inside the 
start tag.
Solved it whith function below

--begstr is the startup-tag
Sometimes begstr is used inside beg-end function quoted: (regexp-quote 
begstr),
therefor the startup tag is delivered in both forms here, making 
looking-at working
- not sure, if its really necessary btw--


(defun ar-leave-begstr-backward (begstr unquoted-beg)
   (let* ((stringcount (length unquoted-beg))
          (collected (char-to-string (char-after)))
          (indx (string-match (regexp-quote collected) unquoted-beg)))
     (while (and indx (not (ignore-errors (looking-at begstr)))(< 1 
stringcount))
       (forward-char -1)
       (setq collected (concat (char-to-string (char-after)) collected))
       (setq indx (string-match (regexp-quote collected) unquoted-beg))
       (setq stringcount (1- stringcount)))))

See beg-end.el

at

https://code.launchpad.net/s-x-emacs-werkstatt/



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-19  2:43 ` Davis Herring
  2010-08-19  6:09   ` Andreas Röhler
@ 2010-08-20  2:11   ` Vinicius Jose Latorre
  2010-08-20 13:07     ` Stefan Monnier
  1 sibling, 1 reply; 9+ messages in thread
From: Vinicius Jose Latorre @ 2010-08-20  2:11 UTC (permalink / raw)
  To: herring; +Cc: GNU Emacs


>> Shouldn't it return 1?
>
> The algorithm searches backward until it finds a position from which there
> is a match that extends to point. Doing more would be quadratic in the
> value of point (at least), so is considered too slow.

Ok, so looking-back should be replaced by:

    (progn
      (goto-char 11)                     ; go to end of buffer
      ;; ---- looking-back replacement --- BEGIN
      (save-excursion
        (when (/= 0 (skip-chars-backward " \t\n" 1))
          (unless (bolp)
            (forward-line 1))
          (looking-at "^\\([ \t\n]+\\)")))
      ;; ---- looking-back replacement --- END
      (match-beginning 1))

Now it returns 1.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-20  2:11   ` Vinicius Jose Latorre
@ 2010-08-20 13:07     ` Stefan Monnier
  2010-08-21  0:08       ` Vinicius Jose Latorre
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2010-08-20 13:07 UTC (permalink / raw)
  To: Vinicius Jose Latorre; +Cc: GNU Emacs

> Ok, so looking-back should be replaced by:

>    (progn
>      (goto-char 11)                     ; go to end of buffer
>      ;; ---- looking-back replacement --- BEGIN
>      (save-excursion
>        (when (/= 0 (skip-chars-backward " \t\n" 1))
>          (unless (bolp)
>            (forward-line 1))
>          (looking-at "^\\([ \t\n]+\\)")))
>      ;; ---- looking-back replacement --- END
>      (match-beginning 1))

> Now it returns 1.

Or you can still use a regexp, with something like

  (defun other-looking-back (re limit)
    (save-restriction
      (narrow-to-region (point-min) (point))
      (goto-char limit)
      (re-search-forward (concat "\\(?:" re "\\)\\'") nil t)))

Note that it imposes a few different constraints on the regexp, mostly
it can't use things like \> or \< or \b at the end because
narrow-to-region will prevent it from peeking at the next char.


        Stefan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Possible problem with looking-back function
  2010-08-20 13:07     ` Stefan Monnier
@ 2010-08-21  0:08       ` Vinicius Jose Latorre
  0 siblings, 0 replies; 9+ messages in thread
From: Vinicius Jose Latorre @ 2010-08-21  0:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: GNU Emacs


> Or you can still use a regexp, with something like
>
>    (defun other-looking-back (re limit)
>      (save-restriction
>        (narrow-to-region (point-min) (point))
>        (goto-char limit)
>        (re-search-forward (concat "\\(?:" re "\\)\\'") nil t)))
>
> Note that it imposes a few different constraints on the regexp, mostly
> it can't use things like \>  or \<  or \b at the end because
> narrow-to-region will prevent it from peeking at the next char.
>    

Ok, thanks all for helping.




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-08-21  0:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-19  1:15 Possible problem with looking-back function Vinicius Jose Latorre
2010-08-19  2:43 ` Davis Herring
2010-08-19  6:09   ` Andreas Röhler
2010-08-19  8:01     ` Stefan Monnier
2010-08-19  9:02     ` Stephen J. Turnbull
2010-08-19  9:32       ` Andreas Röhler
2010-08-20  2:11   ` Vinicius Jose Latorre
2010-08-20 13:07     ` Stefan Monnier
2010-08-21  0:08       ` Vinicius Jose Latorre

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).