From: Lars Ingebrigtsen <larsi@gnus.org>
To: emacs-devel@gnu.org
Subject: Make regexp handling more regular
Date: Wed, 02 Dec 2020 10:05:25 +0100
Message-ID: <87lfeg60iy.fsf@gnus.org>
Today's idle shower thought:
A constant source of confusion and subtle bugs is the way Emacs does
regexp match handling: `string-match' (and the rest) set global state,
and having to catch that state "early", before anything else changes
it, is often a challenge for new users.
Experienced Emacs Lisp programmers know to be safe and will say:
  (when (string-match "[a-z]" string)
    (let ((match (match-string 0 string)))
      (foo)
      (bar match)))
while people new to Emacs Lisp will expect this to work:
  (when (string-match "[a-z]" string)
    (foo)
    (bar (match-string 0 string)))
And sometimes it does, and sometimes it doesn't, depending on whether
`foo' also messes with the match data.
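To make the failure mode concrete, here is a minimal sketch. The `my-demo-' functions are hypothetical stand-ins for `foo' and its caller; only `string-match', `match-string' and `save-match-data' are existing Emacs functions.

```elisp
;; Hypothetical stand-in for `foo': it runs its own regexp match and
;; thereby overwrites the global match data.
(defun my-demo-clobber ()
  (string-match "z+" "zzz"))

;; Same helper, but wrapped in `save-match-data' so the caller's
;; match data is restored afterwards.
(defun my-demo-polite ()
  (save-match-data
    (string-match "z+" "zzz")))

(defun my-demo-unsafe (string)
  "Read the match only after calling a helper that clobbers it."
  (when (string-match "[a-z]" string)
    (my-demo-clobber)
    ;; The match data now describes the match in "zzz" (0..3),
    ;; but it gets applied to STRING.
    (match-string 0 string)))

(defun my-demo-safe (string)
  "Same as `my-demo-unsafe', but the helper preserves the match data."
  (when (string-match "[a-z]" string)
    (my-demo-polite)
    (match-string 0 string)))

(my-demo-unsafe "abc")   ; => "abc", not the expected "a"
(my-demo-safe "abc")     ; => "a"
```

Whether the caller gets the right answer depends entirely on whether every function it calls in between happens to use `save-match-data'.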
So my idle shower thought for the day is: Is there any reasonable path
forward that the Emacs Lisp language could take here?
Well, we obviously can't alter functions like `string-match' and
`re-search-forward' -- they have well-defined semantics, and we can't
make them return a match object. But we could make a new set of
functions that are more, er, functional.
Naming is, of course, the most difficult problem here. I wondered
whether the namespace would allow us to just add -p to the functions,
but names like `string-match-p' are already taken for variations on the
non-p functions.
In any case, if we happen upon a naming convention that's good, the new
interface for these functions would then be to return a "match object",
that can then be used for looking at details of the match. I.e.,
  (when (setq match (rx-string-match "[a-z]" string))
    (foo)
    (bar (match match 0)))
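The match object would know what it had matched, too. The following
code is an error, because `match-string' reads from the temporary
buffer rather than from the buffer that was searched: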
  (when (re-search-forward "p[a-z]+" nil t)
    (with-temp-buffer
      (insert (match-string 0))
      (buffer-string)))
But the following would work:
  (when (setq match (rx-search-forward "p[a-z]+" nil t))
    (with-temp-buffer
      (insert (match match 0))
      (buffer-string)))
And the same for functions working on strings, of course. And
equivalent forms for match-beginning/-end. And we could finally get rid
of the confusingly-named `match-string' function.
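As a rough illustration of the string case, a match object can be sketched on top of the existing primitives. Everything prefixed `my-' is a hypothetical name chosen for this sketch, not an existing or proposed Emacs API; the representation (a cons of the string and a copy of the match data) is likewise just an assumption.

```elisp
;; Sketch only: all `my-' names are hypothetical.
(defun my-rx-string-match (regexp string &optional start)
  "Return a match object for REGEXP in STRING, or nil if there is no match.
The object is a cons (STRING . DATA), where DATA is a private copy of
the match data.  The global match data is left untouched."
  (save-match-data
    (when (string-match regexp string start)
      (cons string (match-data)))))

(defun my-match-beginning (match n)
  "Return the start index of group N in MATCH, or nil if it did not match."
  (nth (* 2 n) (cdr match)))

(defun my-match-end (match n)
  "Return the end index of group N in MATCH, or nil if it did not match."
  (nth (1+ (* 2 n)) (cdr match)))

(defun my-match (match n)
  "Return the text of group N in MATCH, or nil if it did not match."
  (let ((beg (my-match-beginning match n))
        (end (my-match-end match n)))
    (when (and beg end)
      (substring (car match) beg end))))

(let ((match (my-rx-string-match "\\([a-z]+\\)-\\([0-9]+\\)" "foo-42")))
  (when match
    ;; Unrelated matching in between no longer affects the result.
    (string-match "x" "xyz")
    (list (my-match match 0) (my-match match 2))))
;; => ("foo-42" "42")
```

A buffer-searching variant would need more care than this, since buffer text can change after the search; this sketch covers strings only.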
There's nothing but upsides, people!
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no