unofficial mirror of emacs-devel@gnu.org 
* Make regexp handling more regular
@ 2020-12-02  9:05 Lars Ingebrigtsen
From: Lars Ingebrigtsen @ 2020-12-02  9:05 UTC (permalink / raw)
  To: emacs-devel

Today's idle shower thought:

A constant source of confusion and subtle bugs is the way Emacs does
regexp match handling: `string-match' (and the rest) set global state,
and having to capture the match data "early", before anything else
overwrites it, is often a challenge for new users.

Experienced Emacs Lisp programmers know to play it safe and will write:

(when (string-match "[a-z]" string)
  (let ((match (match-string 0 string)))
    (foo)
    (bar match)))

while people new to Emacs Lisp will expect this to work:

(when (string-match "[a-z]" string)
  (foo)
  (bar (match-string 0 string)))

And sometimes it does, and sometimes it doesn't, depending on whether
`foo' also messes with the match data.
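
To make the failure concrete, here is a small sketch.  `my-foo' is a
hypothetical stand-in for `foo' that happens to run its own regexp
match; `my-naive' and `my-guarded' are likewise made-up names.  The
guarded version uses the existing `save-match-data' macro, which is
one way to protect the match data today:

```elisp
;; `my-foo' is a hypothetical stand-in for `foo': any function that
;; happens to run a regexp match internally.
(defun my-foo ()
  (string-match "[0-9]+" "abc 123"))

(defun my-naive (string)
  "Return where [a-z]+ matched in STRING, calling `my-foo' in between."
  (when (string-match "[a-z]+" string)
    (my-foo)
    ;; Reads whatever match data is current now, i.e. `my-foo's.
    (match-beginning 0)))

(defun my-guarded (string)
  "Like `my-naive', but shield the match data from `my-foo'."
  (when (string-match "[a-z]+" string)
    (save-match-data (my-foo))
    (match-beginning 0)))

(my-naive "hello")    ; => 4, the start of "123" in "abc 123"
(my-guarded "hello")  ; => 0, the start of "hello"
```

With `match-string' instead of `match-beginning', the naive version
would try to take a substring of "hello" using positions from the
other string, which is the "sometimes it doesn't" case.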

So my idle shower thought for the day is: Is there any reasonable path
forward that the Emacs Lisp language could take here?

Well, we obviously can't alter functions like `string-match' and
`re-search-forward' -- they have well-defined semantics, and we can't
make them return a match object.  But we could make a new set of
functions that are more, er, functional.

Naming is, of course, the most difficult problem here.  I wondered
whether the namespace would allow us to just add -p to the functions,
but names like `string-match-p' are already taken for variations on the
non-p functions.

In any case, if we happen upon a naming convention that's good, the new
interface for these functions would then be to return a "match object",
that can then be used for looking at details of the match.  I.e.,

(when (setq match (rx-string-match "[a-z]" string))
  (foo)
  (bar (match match 0)))
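
A minimal sketch of how the string case could look, built on the
existing primitives.  All the `my-' names are hypothetical and only
for illustration; this is not a proposed API, and it assumes that
wrapping `string-match' and copying `match-data' is acceptable:

```elisp
(require 'cl-lib)

;; A match object remembers both what was matched and where.
(cl-defstruct (my-match (:constructor my-match--create))
  source   ; the string that was searched
  data)    ; list of positions, as returned by `match-data'

(defun my-string-match (regexp string &optional start)
  "Return a `my-match' object if REGEXP matches STRING, else nil.
The global match data is left untouched."
  (save-match-data
    (when (string-match regexp string start)
      (my-match--create :source string :data (match-data)))))

(defun my-match-beginning (match n)
  "Return the start of group N in MATCH, or nil if it did not match."
  (nth (* 2 n) (my-match-data match)))

(defun my-match-end (match n)
  "Return the end of group N in MATCH, or nil if it did not match."
  (nth (1+ (* 2 n)) (my-match-data match)))

(defun my-match-string (match n)
  "Return the text of group N in MATCH, or nil if it did not match."
  (let ((beg (my-match-beginning match n))
        (end (my-match-end match n)))
    (when (and beg end)
      (substring (my-match-source match) beg end))))

(let ((m (my-string-match "\\([a-z]+\\)-\\([0-9]+\\)" "id: abc-42")))
  (string-match "x" "x")          ; an unrelated match in between
  (list (my-match-string m 0)     ; "abc-42"
        (my-match-string m 1)     ; "abc"
        (my-match-string m 2)))   ; "42"
```

Since the object carries the string along, the caller no longer has to
pass it to the accessor a second time.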

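The match object would know what it had matched, too.  The following
code is an error, because `match-string' is evaluated in the temporary
buffer rather than the buffer that was searched: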

(when (re-search-forward "p[a-z]+" nil t)
  (with-temp-buffer
    (insert (match-string 0))
    (buffer-string)))

But the following would work:

(when (setq match (rx-search-forward "p[a-z]+" nil t))
  (with-temp-buffer
    (insert (match match 0))
    (buffer-string)))
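
A rough sketch of the buffer case, again with hypothetical `my-' names.
It assumes the match object is just a plist holding the searched
buffer and the integer positions from `(match-data t)', and that the
buffer has not been edited or killed between the search and the lookup:

```elisp
(defun my-search-forward (regexp &optional bound noerror)
  "Like `re-search-forward', but return a match object or nil.
Point still moves; the global match data is left untouched."
  (save-match-data
    (when (re-search-forward regexp bound noerror)
      (list :buffer (current-buffer)
            ;; Integer positions; the list may end with the buffer.
            :data (match-data t)))))

(defun my-search-match-string (match n)
  "Return the text of group N in MATCH, read from the searched buffer."
  (let ((beg (nth (* 2 n) (plist-get match :data)))
        (end (nth (1+ (* 2 n)) (plist-get match :data))))
    ;; Guard against missing groups and the trailing buffer element.
    (when (and (integerp beg) (integerp end))
      (with-current-buffer (plist-get match :buffer)
        (buffer-substring-no-properties beg end)))))

(with-temp-buffer
  (insert "apple pie")
  (goto-char (point-min))
  (let ((m (my-search-forward "p[a-z]+" nil t)))
    (with-temp-buffer
      (insert (my-search-match-string m 0))
      (buffer-string))))   ; => "pple"
```

A real implementation would have to decide what happens when the
buffer changes after the search; copying the matched text eagerly or
keeping markers are two options, each with its own cost.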

And the same for functions working on strings, of course.  And
equivalent forms for match-beginning/-end.  And we could finally get rid
of the confusingly-named `match-string' function.

There's nothing but upsides, people!

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




