unofficial mirror of emacs-devel@gnu.org 
From: Lars Ingebrigtsen <larsi@gnus.org>
To: emacs-devel@gnu.org
Subject: Make regexp handling more regular
Date: Wed, 02 Dec 2020 10:05:25 +0100
Message-ID: <87lfeg60iy.fsf@gnus.org>

Today's idle shower thought:

A constant source of confusion and subtle bugs is the way Emacs handles
regexp matches: `string-match' (and the rest) set global state, and
having to catch that state "early" is often a challenge for new users.

Experienced Emacs Lisp programmers know to play it safe and will write:

(when (string-match "[a-z]" string)
  (let ((match (match-string 0 string)))
    (foo)
    (bar match)))

while people new to Emacs Lisp will expect this to work:

(when (string-match "[a-z]" string)
  (foo)
  (bar (match-string 0 string)))

And sometimes it does, and sometimes it doesn't, depending on whether
`foo' also messes with the match data.
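For reference, today's workaround is for helpers like `foo' to wrap their own searching in `save-match-data'.  The sketch below uses made-up function names to show a helper that clobbers the caller's match data next to one that doesn't.  Note that the clobbered case doesn't necessarily signal an error; it can just quietly return the wrong text.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helper that does its own matching and, like many real
;; helpers, overwrites the caller's match data as a side effect.
(defun my-foo-clobbering ()
  (string-match "[0-9]+" "abc 123"))

;; Hypothetical well-behaved helper: `save-match-data' restores the
;; caller's match data when the body exits.
(defun my-foo-polite ()
  (save-match-data
    (string-match "[0-9]+" "abc 123")))

(defun my-first-letter (string helper)
  "Match a lowercase letter in STRING, call HELPER, then read group 0.
This is the \"naive\" pattern from above, with HELPER playing `foo'."
  (when (string-match "[a-z]" string)
    (funcall helper)
    (match-string 0 string)))
```

With `my-foo-polite', (my-first-letter "--- hello world ---" #'my-foo-polite) returns "h".  With `my-foo-clobbering' it returns "hel", because `match-string' applies the helper's positions (4 to 7, from "abc 123") to the caller's string.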

So my idle shower thought for the day is: Is there any reasonable path
forward that the Emacs Lisp language could take here?

Well, we obviously can't alter functions like `string-match' and
`re-search-forward' -- they have well-defined semantics, and we can't
make them return a match object.  But we could make a new set of
functions that are more, er, functional.

Naming is, of course, the most difficult problem here.  I wondered
whether the namespace would allow us to just add -p to the functions,
but names like `string-match-p' are already taken for variations on the
non-p functions.

In any case, if we happen upon a good naming convention, the new
functions would return a "match object" that can then be used to look
at the details of the match.  I.e.,

(when (setq match (rx-string-match "[a-z]" string))
  (foo)
  (bar (match match 0)))
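Here's one possible shape for such a match object, as a minimal sketch in current Emacs Lisp.  The names `my-string-match', `my-match-group' and the `my-match' struct are hypothetical stand-ins for `rx-string-match' and `match' above, not an existing API.  The object just snapshots the match data right after the match, inside `save-match-data', so the global state is left alone.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical match object: the searched string plus a private copy
;; of the match data, taken before anything else can overwrite it.
(cl-defstruct (my-match (:constructor my-match--create))
  subject     ; the string that was searched
  positions)  ; (BEG0 END0 BEG1 END1 ...), as returned by `match-data'

(defun my-string-match (regexp string &optional start)
  "Like `string-match', but return a `my-match' object or nil.
The global match data is not modified."
  (save-match-data
    (when (string-match regexp string start)
      ;; `match-data' without REUSE returns a freshly allocated list,
      ;; so later matches can't mutate our copy.
      (my-match--create :subject string :positions (match-data)))))

(defun my-match-group (match n)
  "Return the text of group N in MATCH, or nil if group N didn't match."
  (let ((beg (nth (* 2 n) (my-match-positions match)))
        (end (nth (1+ (* 2 n)) (my-match-positions match))))
    (and beg end (substring (my-match-subject match) beg end))))
```

For instance, (my-match-group (my-string-match "\\([a-z]+\\)-\\([0-9]+\\)" "id: foo-42") 2) returns "42" no matter what other matching happens in between.  Storing positions rather than substrings keeps the object cheap; the substrings are only built when asked for.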

The match object would also know what it had matched.  The following
code is buggy, because `match-string' reads the text from the current
buffer, which by then is the temporary one:

(when (re-search-forward "p[a-z]+" nil t)
  (with-temp-buffer
    (insert (match-string 0))
    (buffer-string)))

But the following would work:

(when (setq match (rx-search-forward "p[a-z]+" nil t))
  (with-temp-buffer
    (insert (match match 0))
    (buffer-string)))
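The buffer case needs a bit more care, since the positions only mean something in the buffer that was searched.  One minimal sketch, again with hypothetical names (`my-search-forward', `my-buffer-match-group', `my-buffer-match-beginning', `my-buffer-match-end'), is to copy the matched text at match time, which is what lets the `with-temp-buffer' example above work:

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical match object for buffer searches.  Matched text is
;; copied eagerly, so it stays valid after switching buffers.
(cl-defstruct (my-buffer-match (:constructor my-buffer-match--create))
  groups)  ; vector with one (BEG END TEXT) entry per group, nil if unmatched

(defun my-search-forward (regexp &optional bound noerror count)
  "Like `re-search-forward', but return a `my-buffer-match' or nil.
Point moves as with `re-search-forward'; the global match data does not."
  (save-match-data
    (when (re-search-forward regexp bound noerror count)
      ;; With INTEGERS non-nil, `match-data' appends the buffer as an
      ;; extra final element; the integer division below drops it.
      (let* ((data (match-data t))
             (n (/ (length data) 2))
             (groups (make-vector n nil)))
        (dotimes (i n)
          (let ((beg (nth (* 2 i) data))
                (end (nth (1+ (* 2 i)) data)))
            (when (and beg end)
              (aset groups i (list beg end (buffer-substring beg end))))))
        (my-buffer-match--create :groups groups)))))

(defun my-buffer-match--entry (match n)
  "Return the (BEG END TEXT) entry for group N of MATCH, or nil."
  (let ((groups (my-buffer-match-groups match)))
    (and (< n (length groups)) (aref groups n))))

(defun my-buffer-match-group (match n)
  "Return the text matched by group N, like `match-string'."
  (nth 2 (my-buffer-match--entry match n)))

(defun my-buffer-match-beginning (match n)
  "Return the start of group N, like `match-beginning'."
  (nth 0 (my-buffer-match--entry match n)))

(defun my-buffer-match-end (match n)
  "Return the end of group N, like `match-end'."
  (nth 1 (my-buffer-match--entry match n)))
```

Copying eagerly costs one `buffer-substring' per group up front.  Storing the buffer plus markers instead would be lazier, but then later edits to that buffer would change what the object returns.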

And the same for functions working on strings, of course.  And
equivalent forms for match-beginning/-end.  And we could finally get rid
of the confusingly-named `match-string' function.

There's nothing but upsides, people!

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

Thread overview: 20+ messages
2020-12-02  9:05 Lars Ingebrigtsen [this message]
2020-12-02 10:44 ` Make regexp handling more regular Lars Ingebrigtsen
2020-12-02 11:12 ` Stefan Kangas
2020-12-02 11:21   ` Philipp Stephani
2020-12-03  8:31   ` Lars Ingebrigtsen
2020-12-02 17:17 ` Stefan Monnier
2020-12-02 17:45   ` Yuan Fu
2020-12-02 19:24     ` Stefan Monnier
2020-12-03  8:40       ` Lars Ingebrigtsen
2020-12-03  8:38   ` Lars Ingebrigtsen
2020-12-03 15:10     ` Stefan Monnier
2020-12-03 16:58       ` Lars Ingebrigtsen
2020-12-03 17:40         ` Stefan Monnier
2020-12-02 21:19 ` Juri Linkov
2020-12-03  8:41   ` Lars Ingebrigtsen
2020-12-03 15:00     ` Stefan Monnier
2020-12-03 21:02       ` Juri Linkov
2020-12-03 22:20         ` Vasilij Schneidermann
2020-12-02 21:28 ` Daniel Martín
2020-12-03  4:16 ` Adam Porter
