unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Dmitry Gutov <dgutov@yandex.ru>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Yuan Fu <casouri@gmail.com>, Emacs developers <emacs-devel@gnu.org>
Subject: Re: Add some aliases for re-related functions
Date: Mon, 4 May 2020 03:29:02 +0300	[thread overview]
Message-ID: <cf01f001-1391-6a63-28a0-43f787e0af04@yandex.ru> (raw)
In-Reply-To: <jwvwo5tk8az.fsf-monnier+emacs@gnu.org>

On 03.05.2020 06:55, Stefan Monnier wrote:
>>> Think of it as the case where the regexp starts with \` and ends with \'
>>> Then there's the relaxation of "finding the longest match" (what we
>>> call `looking-at`) and then "finding the leftmost longest match" (what
>>> we call `string-match`).
>>
>> looking-at being a special case of re-search-forward, I take?
> 
> Not sure what you mean by that.

More or less that (looking-at "abc") is basically the same as 
(re-search-forward "\\=abc"). Though maybe not internally.

> `re-search-forward` is in the same
> category as `string-match`, i.e. it's a *search* operation, whereas
> `looking-at` is a *match* operation.
> 
> IOW a "match" operation is like a "search" but where `match-beginning`.
> 
> Algorithmically, the two are close, but there's a bit more work to go
> from one to the other than meets the eye:

Thanks for the explanation.

> If you take a regexp and turn it into a DFA in the usual way, you get an
> automaton that can trivially (and in O(n) time) give you either the
> shortest match or the longest match.  But if you want it to search, you
> have to add a loop around it to try matching at every possible start
> position, which brings the complexity to O(n²) :-(
> 
> To fix that you can try and compile ".*RE" instead of "RE" and that will
> give you an automaton that can do the search or "RE" in O(n) time, but
> it won't directly give you the "leftmost longest match" (instead it can
> directly give you "the match whose match-end is closest" and "the match
> whose match-end is furthest").

But that's what we generally want in practice anyway. And in the cases 
where the desired out come is different, the regexp is probably 
ambiguous, which often means worse performance.

> Note: Emacs's regexp engine isn't based on a DFA, and doesn't try and
> use the second option: our engine basically only does "matching" and to
> get the search behavior we add a loop around it, so algorithmically,
> `looking-at` and `string-match/re-search-forward` are quite different.
> 
> Notice that we don't really have the equivalent of `looking-at`
> on strings currently :-(
> 
>>> Those two have traditionally be named `re_match` and `re_search`
>>> respectively in C libraries (as can be seen in `src/regexp-emacs.c`).
>> Yes, ok. But we also need names to distinguish that things happen in
>> a buffer. So far we've used 'search' for those.
>> Using the term 'search' for matching in strings might be a significant
>> change, given existing expectations.
> 
> Yes, it's unfortunate.  Maybe we could/should merge them to clarify:
> 
>      (re-match REGEXP &optional STRING LIMIT START)
>      (re-search REGEXP &optional STRING LIMIT START)
> 
> would be like `looking-at` but would operate on STRING instead of
> `current-buffer` if STRING is non-nil.  START defaults to point for
> current-buffer and 0 for a string.

re-search-forward also moves point, whereas string-match returns the 
index of the match start. I think it would be kinda ugly to choose among 
these behaviors based on the second argument. And if it returns point 
instead, without moving, we get an entirely different function now.

I'm not sure it's worth the changeover and retraining everybody if the 
main benefit is being more aware of the underlying algorithm complexity. 
After all, it's not too hard to make an educated guess that looking-at 
is (or at least can be) faster than re-search-forward.

>>> PS: BTW, `looking-back` doesn't do a "match" of the "longest match that
>>> ends at point" but a "search" for the "rightmost longest match that ends
>>> at point" since it uses `re-search-backward` internally.
>> It's a weird function, I agree. Though it's proved to be a handy one.
> 
> Yes.  The functionality it offers is important, but in reality one would
> want a "real" `looking-back` which uses a backward match, rather than
> the current "backward search for a forward match" hack.  It would be
> both more efficient and provide a cleaner behavior.

It suppose so. Yet, in all cases I had to rewrite looking-back calls to 
add the now-mandatory argument, the resulting time it took was fast 
enough to get lost in the measurement noise.

Maybe an optimized version could enable some new use cases, though.



  reply	other threads:[~2020-05-04  0:29 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-02 18:28 Add some aliases for re-related functions Yuan Fu
2020-05-02 18:39 ` Eli Zaretskii
2020-05-02 18:43   ` Yuan Fu
2020-05-02 21:21   ` Stefan Monnier
2020-05-02 22:27     ` Drew Adams
2020-05-03  8:33       ` Michael Albinus
2020-05-03 19:07         ` Drew Adams
2020-05-02 19:29 ` Alan Mackenzie
2020-05-02 19:48   ` Yuan Fu
2020-05-02 20:10   ` Philippe Vaucher
2020-05-02 20:13     ` Philippe Vaucher
2020-05-03 14:16       ` Eli Zaretskii
2020-05-04  3:09         ` Richard Stallman
2020-05-04 14:28           ` Eli Zaretskii
2020-05-04 17:12             ` Dmitry Gutov
2020-05-04 17:35               ` Eli Zaretskii
2020-05-04 17:42                 ` Dmitry Gutov
2020-05-04 17:46                   ` Eli Zaretskii
2020-05-04 17:53                     ` Dmitry Gutov
2020-05-02 21:09     ` Alan Mackenzie
2020-05-02 21:51       ` Philippe Vaucher
2020-05-03  9:43         ` Alan Mackenzie
2020-05-03 15:00           ` 조성빈
2020-05-02 22:41       ` Stefan Monnier
2020-05-03 17:14         ` Alan Mackenzie
2020-05-04 10:07       ` João Távora
2020-05-03 14:16     ` Eli Zaretskii
2020-05-03 16:20       ` Yuri Khan
2020-05-03 16:42         ` Eli Zaretskii
2020-05-03 16:50           ` Dmitry Gutov
2020-05-02 21:33 ` Stefan Monnier
2020-05-02 22:10   ` Dmitry Gutov
2020-05-02 22:18     ` Eric Abrahamsen
2020-05-02 22:49     ` Stefan Monnier
2020-05-02 23:13       ` Dmitry Gutov
2020-05-03  3:55         ` Stefan Monnier
2020-05-04  0:29           ` Dmitry Gutov [this message]
2020-05-04  3:11             ` Stefan Monnier
2020-05-02 22:44   ` Drew Adams
2020-05-03  3:26     ` Stefan Monnier
2020-05-03  4:37       ` Drew Adams
2020-05-03  8:05         ` Philippe Vaucher
2020-05-03  9:55           ` Alan Mackenzie
2020-05-03 10:26             ` tomas
2020-05-03 15:15             ` Eli Zaretskii
2020-05-03 19:47           ` Drew Adams
2020-05-04  7:32             ` Philippe Vaucher
2020-05-04  8:20               ` Sending plaintext with Gmail (was: Add some aliases for re-related functions) Kévin Le Gouguec
2020-05-04  8:45                 ` Philippe Vaucher
2020-05-04 15:09                 ` Sending plaintext with Gmail Stefan Monnier
2020-05-04 15:25                   ` Kévin Le Gouguec
2020-05-04 15:29                     ` Lars Ingebrigtsen
2020-05-05  8:13                       ` HTML display in Gnus (was: Sending plaintext with Gmail) Kévin Le Gouguec
2020-05-05  8:24                         ` HTML display in Gnus Lars Ingebrigtsen
2020-05-05  9:26                           ` Kévin Le Gouguec
2020-07-17 17:14                           ` Kévin Le Gouguec
2020-05-04 15:39                     ` Sending plaintext with Gmail Andreas Schwab
2020-05-04 16:51               ` Add some aliases for re-related functions Drew Adams
2020-05-04 17:10               ` Drew Adams
2020-05-04 18:17                 ` Philippe Vaucher
2020-05-04 18:33                   ` Drew Adams
2020-05-05  2:48             ` Richard Stallman
2020-05-04  3:08         ` Richard Stallman
2020-05-04 10:16       ` João Távora
2020-05-04  3:04 ` Richard Stallman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf01f001-1391-6a63-28a0-43f787e0af04@yandex.ru \
    --to=dgutov@yandex.ru \
    --cc=casouri@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).