unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Dmitry Gutov <dgutov@yandex.ru>
Cc: Yuan Fu <casouri@gmail.com>, Emacs developers <emacs-devel@gnu.org>
Subject: Re: Add some aliases for re-related functions
Date: Sat, 02 May 2020 23:55:08 -0400	[thread overview]
Message-ID: <jwvwo5tk8az.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <2a5344e3-999c-bb60-3c27-a9e9e6c256da@yandex.ru> (Dmitry Gutov's message of "Sun, 3 May 2020 02:13:42 +0300")

>> Think of it as the case where the regexp starts with \` and ends with \'
>> Then there's the relaxation of "finding the longest match" (what we
>> call `looking-at`) and then "finding the leftmost longest match" (what
>> we call `string-match`).
>
> looking-at being a special case of re-search-forward, I take?

Not sure what you mean by that.  `re-search-forward` is in the same
category as `string-match`, i.e. it's a *search* operation, whereas
`looking-at` is a *match* operation.

IOW a "match" operation is like a "search" but where `match-beginning`.

Algorithmically, the two are close, but there's a bit more work to go
from one to the other than meets the eye:

If you take a regexp and turn it into a DFA in the usual way, you get an
automaton that can trivially (and in O(n) time) give you either the
shortest match or the longest match.  But if you want it to search, you
have to add a loop around it to try matching at every possible start
position, which brings the complexity to O(n²) :-(

To fix that you can try and compile ".*RE" instead of "RE" and that will
give you an automaton that can do the search or "RE" in O(n) time, but
it won't directly give you the "leftmost longest match" (instead it can
directly give you "the match whose match-end is closest" and "the match
whose match-end is furthest").  So to get the desired "leftmost longest
match", you have to work a bit harder yet.

Note: Emacs's regexp engine isn't based on a DFA, and doesn't try and
use the second option: our engine basically only does "matching" and to
get the search behavior we add a loop around it, so algorithmically,
`looking-at` and `string-match/re-search-forward` are quite different.

Notice that we don't really have the equivalent of `looking-at`
on strings currently :-(

>> Those two have traditionally be named `re_match` and `re_search`
>> respectively in C libraries (as can be seen in `src/regexp-emacs.c`).
> Yes, ok. But we also need names to distinguish that things happen in
> a buffer. So far we've used 'search' for those.
> Using the term 'search' for matching in strings might be a significant
> change, given existing expectations.

Yes, it's unfortunate.  Maybe we could/should merge them to clarify:

    (re-match REGEXP &optional STRING LIMIT START)
    (re-search REGEXP &optional STRING LIMIT START)

would be like `looking-at` but would operate on STRING instead of
`current-buffer` if STRING is non-nil.  START defaults to point for
current-buffer and 0 for a string.

Compared to re-search-forward, this lacks the NOERROR and the COUNT args.
We could add yet more optional args, but this is getting ugly.  Not sure
how important these are, tho.
Or maybe we could change the optional START arg so it means "START" when used on
a string and it means something else (NOERROR or COUNT) when used in
a buffer (yucky)?

>> PS: BTW, `looking-back` doesn't do a "match" of the "longest match that
>> ends at point" but a "search" for the "rightmost longest match that ends
>> at point" since it uses `re-search-backward` internally.
> It's a weird function, I agree. Though it's proved to be a handy one.

Yes.  The functionality it offers is important, but in reality one would
want a "real" `looking-back` which uses a backward match, rather than
the current "backward search for a forward match" hack.  It would be
both more efficient and provide a cleaner behavior.


        Stefan




  reply	other threads:[~2020-05-03  3:55 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-02 18:28 Add some aliases for re-related functions Yuan Fu
2020-05-02 18:39 ` Eli Zaretskii
2020-05-02 18:43   ` Yuan Fu
2020-05-02 21:21   ` Stefan Monnier
2020-05-02 22:27     ` Drew Adams
2020-05-03  8:33       ` Michael Albinus
2020-05-03 19:07         ` Drew Adams
2020-05-02 19:29 ` Alan Mackenzie
2020-05-02 19:48   ` Yuan Fu
2020-05-02 20:10   ` Philippe Vaucher
2020-05-02 20:13     ` Philippe Vaucher
2020-05-03 14:16       ` Eli Zaretskii
2020-05-04  3:09         ` Richard Stallman
2020-05-04 14:28           ` Eli Zaretskii
2020-05-04 17:12             ` Dmitry Gutov
2020-05-04 17:35               ` Eli Zaretskii
2020-05-04 17:42                 ` Dmitry Gutov
2020-05-04 17:46                   ` Eli Zaretskii
2020-05-04 17:53                     ` Dmitry Gutov
2020-05-02 21:09     ` Alan Mackenzie
2020-05-02 21:51       ` Philippe Vaucher
2020-05-03  9:43         ` Alan Mackenzie
2020-05-03 15:00           ` 조성빈
2020-05-02 22:41       ` Stefan Monnier
2020-05-03 17:14         ` Alan Mackenzie
2020-05-04 10:07       ` João Távora
2020-05-03 14:16     ` Eli Zaretskii
2020-05-03 16:20       ` Yuri Khan
2020-05-03 16:42         ` Eli Zaretskii
2020-05-03 16:50           ` Dmitry Gutov
2020-05-02 21:33 ` Stefan Monnier
2020-05-02 22:10   ` Dmitry Gutov
2020-05-02 22:18     ` Eric Abrahamsen
2020-05-02 22:49     ` Stefan Monnier
2020-05-02 23:13       ` Dmitry Gutov
2020-05-03  3:55         ` Stefan Monnier [this message]
2020-05-04  0:29           ` Dmitry Gutov
2020-05-04  3:11             ` Stefan Monnier
2020-05-02 22:44   ` Drew Adams
2020-05-03  3:26     ` Stefan Monnier
2020-05-03  4:37       ` Drew Adams
2020-05-03  8:05         ` Philippe Vaucher
2020-05-03  9:55           ` Alan Mackenzie
2020-05-03 10:26             ` tomas
2020-05-03 15:15             ` Eli Zaretskii
2020-05-03 19:47           ` Drew Adams
2020-05-04  7:32             ` Philippe Vaucher
2020-05-04  8:20               ` Sending plaintext with Gmail (was: Add some aliases for re-related functions) Kévin Le Gouguec
2020-05-04  8:45                 ` Philippe Vaucher
2020-05-04 15:09                 ` Sending plaintext with Gmail Stefan Monnier
2020-05-04 15:25                   ` Kévin Le Gouguec
2020-05-04 15:29                     ` Lars Ingebrigtsen
2020-05-05  8:13                       ` HTML display in Gnus (was: Sending plaintext with Gmail) Kévin Le Gouguec
2020-05-05  8:24                         ` HTML display in Gnus Lars Ingebrigtsen
2020-05-05  9:26                           ` Kévin Le Gouguec
2020-07-17 17:14                           ` Kévin Le Gouguec
2020-05-04 15:39                     ` Sending plaintext with Gmail Andreas Schwab
2020-05-04 16:51               ` Add some aliases for re-related functions Drew Adams
2020-05-04 17:10               ` Drew Adams
2020-05-04 18:17                 ` Philippe Vaucher
2020-05-04 18:33                   ` Drew Adams
2020-05-05  2:48             ` Richard Stallman
2020-05-04  3:08         ` Richard Stallman
2020-05-04 10:16       ` João Távora
2020-05-04  3:04 ` Richard Stallman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwvwo5tk8az.fsf-monnier+emacs@gnu.org \
    --to=monnier@iro.umontreal.ca \
    --cc=casouri@gmail.com \
    --cc=dgutov@yandex.ru \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).