From: Lars Ingebrigtsen <larsi@gnus.org>
To: Stefan Kangas <stefankangas@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Make regexp handling more regular
Date: Thu, 03 Dec 2020 09:31:56 +0100
Message-ID: <87ft4nz3wj.fsf@gnus.org>
In-Reply-To: <CADwFkmknD5CmysmQFaPFGiQpSk4H_N-9NgbhCW0vFweZkNhHAA@mail.gmail.com> (Stefan Kangas's message of "Wed, 2 Dec 2020 05:12:25 -0600")

Stefan Kangas <stefankangas@gmail.com> writes:

> I like the idea of adding an entirely new built-in API based on the
> current state of the art.  I would begin such a project by looking into
> what other Lisps are doing, such as CL, Clojure, Guile and Racket.  Why
> shouldn't Emacs Lisp be best-in-class?

Sure.

Common Lisp doesn't have regexps, but (some) implementations do, and
there's a bunch of libraries, like cl-ppcre
(http://edicl.github.io/cl-ppcre/).  I'm not much in favour:

* (scan "(a)*b" "xaaabd")
1
5
#(3)
#(4)

* (let ((s (create-scanner "(([a-c])+)x")))
    (scan s "abcxy"))
0
4
#(0 2)
#(3 3)

And since it's Common Lisp, of course you have macros for
destructuring:

* (register-groups-bind (first second third fourth)
      ("((a)|(b)|(c))+" "abababc" :sharedp t)
    (list first second third fourth))
("c" "a" "b" "c")

Guile: https://www.gnu.org/software/guile/manual/html_node/Regexp-Functions.html

(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
⇒ #("blah2002" (4 . 8))

(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
⇒ ("abc" "def")

Clojure: https://purelyfunctional.tv/mini-guide/regexes-in-clojure/

(re-matches #"abc(.*)" "abcxyz")
   ["abcxyz" "xyz"]

I.e., if the regexp has no groups, we return the match substring,
otherwise a vector of the whole match followed by the groups.  It's nice
in one way, but the cleverness leads to errors when (re-)writing code.

(subs (re-matches #"[a-z]+" "fooo") 3)

but then you add some more and you have to rewrite to something like:

(let [[_ s1 s2] (re-matches #"([a-z]+) ([a-z]+)" full-name)]
  (subs s1 3))

I hate that.

The thing that makes looking at other languages here slightly less
useful is that Emacs has buffers.  We're often not interested in the
(sub-)matches themselves at all, but instead their buffer positions
(i.e., match-beginning/end).
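
To make the buffer-position point concrete, here is a minimal Emacs
Lisp sketch of the Guile example above.  The function name
`my-digits-span' is made up for illustration; everything it calls is
the existing match-data API.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-digits-span (text)
  "Return (BEG END) buffer positions of the first digit run in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; NOERROR is t, so a failed search returns nil instead of signalling.
    (when (re-search-forward "[0-9]+" nil t)
      ;; We ask for positions, not for the matched substring.
      (list (match-beginning 0) (match-end 0)))))

(my-digits-span "blah2002")
;; => (5 9)
```

Buffer positions start at 1, so the span is (5 9) where Guile's
string offsets were (4 . 8).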

> As for naming, how about just using a short prefix such as "re-"?
> AFAICT, we currently have only five functions using that prefix.

Sure.

> Tangentially, I have always been wondering if it's feasible to add a new
> regular expression type to `read' where you don't have to incessantly
> double quote all special characters.  (One could take inspiration from
> Python, for example, which adds an "r" character to strings to turn them
> into regexps: r"regexp".)

I'm all for adding a regexp object type (and a new read syntax), but I
think it's somewhat orthogonal?  Not totally, though: I've long wished
for match/searching functions to be generic, and work differently on
strings and regexps.  That is, if fed a string, then do comparison with
`string-equal' and when fed a regexp, do the comparison with
`string-match'.

So you could say

(search-forward "foo")

and

(search-forward #r"fo+")

or

(search-forward (re-make "fo+"))

-- no reason for there to be separate functions if we have regexp objects.
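
A rough sketch of how such dispatch could look with today's tools,
assuming a struct stands in for the regexp object.  Nothing here is an
existing or proposed Emacs API: `my-re', `my-re-make' and
`my-search-forward' are hypothetical names, and there is no #r"..."
read syntax.

```elisp
(require 'cl-lib)

;; Hypothetical regexp object: a struct that just wraps the source string.
(cl-defstruct (my-re (:constructor my-re-make (source)))
  source)

(cl-defgeneric my-search-forward (pattern &optional bound)
  "Search forward from point for PATTERN; return the new point, or nil.")

(cl-defmethod my-search-forward ((pattern string) &optional bound)
  ;; Plain strings are matched literally.
  (search-forward pattern bound t))

(cl-defmethod my-search-forward ((pattern my-re) &optional bound)
  ;; Regexp objects are matched as regexps.
  (re-search-forward (my-re-source pattern) bound t))

(with-temp-buffer
  (insert "xx fo+ foo")
  (goto-char (point-min))
  (list (my-search-forward "fo+")                 ; literal "fo+"
        (my-search-forward (my-re-make "fo+"))))  ; regexp, matches "foo"
;; => (7 11)
```

The same pattern text finds different things depending only on the
type of the argument, which is the whole idea.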

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


