unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Adam Porter <adam@alphapapa.net>
To: emacs-devel@gnu.org
Subject: Re: Improve `replace-regexp-in-string' ergonomics?
Date: Wed, 22 Sep 2021 02:33:03 -0500	[thread overview]
Message-ID: <87r1dhjc9c.fsf@alphapapa.net> (raw)
In-Reply-To: 878rzpw7jo.fsf@gnus.org

Lars Ingebrigtsen <larsi@gnus.org> writes:

> `replace-regexp-in-string' often leads to pretty awkward code.  I
> wonder whether we could improve it somehow.

That would be appreciated.  :)

> Here's a real life example:
>
> (defun org-babel-js-read (results)
> [...]
>        (org-babel-read
>         (concat "'"
>                 (replace-regexp-in-string
>                  "\\[" "(" (replace-regexp-in-string
>                             "\\]" ")" (replace-regexp-in-string
>                                        ",[[:space:]]" " "
> 				       (replace-regexp-in-string
> 					"'" "\"" results))))))
>
> That's kinda hard to read, but variations on this is pretty common.
> When you have one `replace-regexp-in-string', you often have another.
>
> We introduced `thread-last' in 2014, and there seems to be one (1) place
> in the Emacs code base, so I guess that didn't take off, but rewriting
> with that,

It doesn't seem that `thread-last' is very popular among Elispers, but
more among Clojurists (e.g. it's also implemented in dash.el as `->>').
But I've found it very useful in some cases, and I'm using `thread-last'
more often, trying to avoid adding dependencies on dash.el unless
necessary.

> we get:
>
>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (replace-regexp-in-string "'" "\"")
> 		  (replace-regexp-in-string ",[[:space:]]" " ")
> 		  (replace-regexp-in-string "\\]" ")")
>                   (replace-regexp-in-string "\\[" "("))))
>
> Which is somewhat more readable (but note that this totally breaks
> down if you want to mix in LITERAL etc). ... The length of the name of
> this common function is itself offputting.

Agreed, the name seems too long, and the function's signature is awkward
(I always have to check the argument list when I use it).  Most of the
time, I don't want to replace with automatic case matching, nor do I
want to substitute the original matched text, so I have to add the
FIXEDCASE argument, and then carefully re-read the docstring for LITERAL
and decide whether I need it, too.

The SUBEXP argument, I'm not so sure about.  Having it at the end would
break threading.  Having it after the replacement would mean having a
"nil" much of the time, which wouldn't be as pretty.  I suppose the
third argument could be either a SUBEXP or the string, and if a SUBEXP,
an optional fourth argument could be the string?  But since these are
likely called often and in loops, I suppose that might be undesirable.

> But I wonder whether we should consider renaming the function to
> something more palatable, and since we have `string-replace', why not
> `regexp-replace'?

Sounds good to me.  (Since it's also string-related, a
`string-replace-regexp' alias might be warranted, but I don't want to
get too bikesheddy now.)

>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (regexp-replace "'" "\"")
> 		  (regexp-replace ",[[:space:]]" " ")
> 		  (regexp-replace "\\]" ")")
>                   (regexp-replace "\\[" "("))))
>
> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common.  Like:
>
>        (org-babel-read
>         (concat "'"
> 		(regexp-replace "'" "\""
> 				",[[:space:]]" " "
> 				"\\]" ")"
> 				"\\[" "("
> 				results)))
>
> Or some variation thereupon with some more ()s to group pairs.

It is common, but IMHO, it would be better to use a separate function
for that case, e.g. maybe `regexp-replace-pairs'.  (Alternatively,
`pcase-dolist' makes it easy to call a function in a loop with paired
arguments, so maybe it's not really needed.)

> The most popular way to deal with the awkwardness is to just give up and
> go all imperative:
>
> (defun authors-canonical-author-name (author file pos)
> [...]
>   (when author
>     (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
>     (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
>     (setq author (replace-regexp-in-string "[ \t]+$" "" author))
>     (setq author (replace-regexp-in-string "[ \t]+" " " author))
>
> Which leads me to my other point -- about a quarter of the usages of the
> function in Emacs core has "" as the replacement, so perhaps that should
> have its own function?  `regexp-remove'?

IMHO, I'd lean toward not adding this unless it's really needed, but I
won't opine too strongly on it.




  parent reply	other threads:[~2021-09-22  7:33 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
2021-09-22  5:22 ` Yuri Khan
2021-09-22  6:36   ` Lars Ingebrigtsen
2021-09-22  7:47   ` Thierry Volpiatto
2021-09-22  5:24 ` Po Lu
2021-09-22  6:37   ` Lars Ingebrigtsen
2021-09-22 10:56     ` Po Lu
2021-09-22 20:08       ` Lars Ingebrigtsen
2021-09-23  0:11         ` Po Lu
2021-09-22  7:33 ` Adam Porter [this message]
2021-09-22  8:09   ` Lars Ingebrigtsen
2021-09-22  7:51 ` Andreas Schwab
2021-09-22  8:14 ` Augusto Stoffel
2021-09-22  8:21   ` Adam Porter
2021-09-22 18:01     ` Stefan Monnier
2021-09-22 18:24       ` Basil L. Contovounesios
2021-09-22 22:56       ` Adam Porter
2021-09-22 23:53         ` Eric Abrahamsen
2021-09-22 20:06   ` Lars Ingebrigtsen
2021-09-22 10:59 ` Dmitry Gutov
2021-09-22 20:18   ` Lars Ingebrigtsen
2021-09-22 22:23     ` Dmitry Gutov
2021-09-22 23:24       ` [External] : " Drew Adams
2021-09-22 18:14 ` Stefan Monnier
2021-09-22 19:30   ` Mattias Engdegård
2021-09-22 20:22   ` Lars Ingebrigtsen
2021-09-22 20:29     ` Lars Ingebrigtsen
2021-09-23  2:15     ` Stefan Monnier
2021-10-05 16:18 ` Juri Linkov
2021-10-12  6:53   ` Juri Linkov
2021-10-12 12:10     ` Lars Ingebrigtsen
2021-10-12 12:34       ` Stefan Monnier
2021-10-12 12:41         ` Lars Ingebrigtsen
2021-10-12 13:18           ` Lars Ingebrigtsen
2021-10-12 13:32             ` Mattias Engdegård
2021-10-12 15:48             ` Stefan Monnier
2021-10-12 13:33           ` Thierry Volpiatto
2021-10-12 19:16             ` Juri Linkov
2021-10-12 20:44               ` Thierry Volpiatto
2021-10-13  7:57                 ` Juri Linkov
2021-10-13  8:41                   ` Thierry Volpiatto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1dhjc9c.fsf@alphapapa.net \
    --to=adam@alphapapa.net \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).