From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Adam Porter <adam@alphapapa.net>
Newsgroups: gmane.emacs.devel
Subject: Re: Improve `replace-regexp-in-string' ergonomics?
Date: Wed, 22 Sep 2021 02:33:03 -0500
Message-ID: <87r1dhjc9c.fsf@alphapapa.net>
References: <878rzpw7jo.fsf@gnus.org>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="7857"; mail-complaints-to="usenet@ciao.gmane.io"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed Sep 22 09:34:08 2021
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1mSwlc-0001mn-64
	for ged-emacs-devel@m.gmane-mx.org; Wed, 22 Sep 2021 09:34:08 +0200
Original-Received: from localhost ([::1]:42766 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1mSwla-0000bU-7t
	for ged-emacs-devel@m.gmane-mx.org; Wed, 22 Sep 2021 03:34:06 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:40300)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ged-emacs-devel@m.gmane-mx.org>)
 id 1mSwkr-0008N9-Qs
 for emacs-devel@gnu.org; Wed, 22 Sep 2021 03:33:23 -0400
Original-Received: from ciao.gmane.io ([116.202.254.214]:44570)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ged-emacs-devel@m.gmane-mx.org>)
 id 1mSwkp-0000oo-VR
 for emacs-devel@gnu.org; Wed, 22 Sep 2021 03:33:21 -0400
Original-Received: from list by ciao.gmane.io with local (Exim 4.92)
 (envelope-from <ged-emacs-devel@m.gmane-mx.org>) id 1mSwkl-0000iN-I8
 for emacs-devel@gnu.org; Wed, 22 Sep 2021 09:33:15 +0200
X-Injected-Via-Gmane: http://gmane.org/
Received-SPF: pass client-ip=116.202.254.214;
 envelope-from=ged-emacs-devel@m.gmane-mx.org; helo=ciao.gmane.io
X-Spam_score_int: -15
X-Spam_score: -1.6
X-Spam_bar: -
X-Spam_report: (-1.6 / 5.0 requ) BAYES_00=-1.9,
 HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:275298
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/275298>

Lars Ingebrigtsen <larsi@gnus.org> writes:

> `replace-regexp-in-string' often leads to pretty awkward code.  I
> wonder whether we could improve it somehow.

That would be appreciated.  :)

> Here's a real life example:
>
> (defun org-babel-js-read (results)
> [...]
>        (org-babel-read
>         (concat "'"
>                 (replace-regexp-in-string
>                  "\\[" "(" (replace-regexp-in-string
>                             "\\]" ")" (replace-regexp-in-string
>                                        ",[[:space:]]" " "
> 				       (replace-regexp-in-string
> 					"'" "\"" results))))))
>
> That's kinda hard to read, but variations on this is pretty common.
> When you have one `replace-regexp-in-string', you often have another.
>
> We introduced `thread-last' in 2014, and there seems to be one (1) place
> in the Emacs code base, so I guess that didn't take off, but rewriting
> with that,

It doesn't seem that `thread-last' is very popular among Elispers, but
more among Clojurists (e.g. it's also implemented in dash.el as `->>').
But I've found it very useful in some cases, and I'm using `thread-last'
more often, trying to avoid adding dependencies on dash.el unless
necessary.

> we get:
>
>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (replace-regexp-in-string "'" "\"")
> 		  (replace-regexp-in-string ",[[:space:]]" " ")
> 		  (replace-regexp-in-string "\\]" ")")
>                   (replace-regexp-in-string "\\[" "("))))
>
> Which is somewhat more readable (but note that this totally breaks
> down if you want to mix in LITERAL etc). ... The length of the name of
> this common function is itself offputting.

Agreed, the name seems too long, and the function's signature is awkward
(I always have to check the argument list when I use it).  Most of the
time, I don't want to replace with automatic case matching, nor do I
want to substitute the original matched text, so I have to add the
FIXEDCASE argument, and then carefully re-read the docstring for LITERAL
and decide whether I need it, too.

The SUBEXP argument, I'm not so sure about.  Having it at the end would
break threading.  Having it after the replacement would mean having a
"nil" much of the time, which wouldn't be as pretty.  I suppose the
third argument could be either a SUBEXP or the string, and if a SUBEXP,
an optional fourth argument could be the string?  But since these are
likely called often and in loops, I suppose that might be undesirable.

> But I wonder whether we should consider renaming the function to
> something more palatable, and since we have `string-replace', why not
> `regexp-replace'?

Sounds good to me.  (Since it's also string-related, a
`string-replace-regexp' alias might be warranted, but I don't want to
get too bikesheddy now.)

>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (regexp-replace "'" "\"")
> 		  (regexp-replace ",[[:space:]]" " ")
> 		  (regexp-replace "\\]" ")")
>                   (regexp-replace "\\[" "("))))
>
> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common.  Like:
>
>        (org-babel-read
>         (concat "'"
> 		(regexp-replace "'" "\""
> 				",[[:space:]]" " "
> 				"\\]" ")"
> 				"\\[" "("
> 				results)))
>
> Or some variation thereupon with some more ()s to group pairs.

It is common, but IMHO, it would be better to use a separate function
for that case, e.g. maybe `regexp-replace-pairs'.  (Alternatively,
`pcase-dolist' makes it easy to call a function in a loop with paired
arguments, so maybe it's not really needed.)

> The most popular way to deal with the awkwardness is to just give up and
> go all imperative:
>
> (defun authors-canonical-author-name (author file pos)
> [...]
>   (when author
>     (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
>     (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
>     (setq author (replace-regexp-in-string "[ \t]+$" "" author))
>     (setq author (replace-regexp-in-string "[ \t]+" " " author))
>
> Which leads me to my other point -- about a quarter of the usages of the
> function in Emacs core has "" as the replacement, so perhaps that should
> have its own function?  `regexp-remove'?

IMHO, I'd lean toward not adding this unless it's really needed, but I
won't opine too strongly on it.