From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Adam Porter Newsgroups: gmane.emacs.devel Subject: Re: Improve `replace-regexp-in-string' ergonomics? Date: Wed, 22 Sep 2021 02:33:03 -0500 Message-ID: <87r1dhjc9c.fsf@alphapapa.net> References: <878rzpw7jo.fsf@gnus.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="7857"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed Sep 22 09:34:08 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mSwlc-0001mn-64 for ged-emacs-devel@m.gmane-mx.org; Wed, 22 Sep 2021 09:34:08 +0200 Original-Received: from localhost ([::1]:42766 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mSwla-0000bU-7t for ged-emacs-devel@m.gmane-mx.org; Wed, 22 Sep 2021 03:34:06 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:40300) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mSwkr-0008N9-Qs for emacs-devel@gnu.org; Wed, 22 Sep 2021 03:33:23 -0400 Original-Received: from ciao.gmane.io ([116.202.254.214]:44570) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mSwkp-0000oo-VR for emacs-devel@gnu.org; Wed, 22 Sep 2021 03:33:21 -0400 Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1mSwkl-0000iN-I8 for emacs-devel@gnu.org; Wed, 22 Sep 2021 09:33:15 +0200 X-Injected-Via-Gmane: http://gmane.org/ Received-SPF: pass client-ip=116.202.254.214; envelope-from=ged-emacs-devel@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: -15 X-Spam_score: -1.6 X-Spam_bar: - X-Spam_report: (-1.6 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:275298 Archived-At: Lars Ingebrigtsen writes: > `replace-regexp-in-string' often leads to pretty awkward code. I > wonder whether we could improve it somehow. That would be appreciated. :) > Here's a real life example: > > (defun org-babel-js-read (results) > [...] > (org-babel-read > (concat "'" > (replace-regexp-in-string > "\\[" "(" (replace-regexp-in-string > "\\]" ")" (replace-regexp-in-string > ",[[:space:]]" " " > (replace-regexp-in-string > "'" "\"" results)))))) > > That's kinda hard to read, but variations on this is pretty common. > When you have one `replace-regexp-in-string', you often have another. > > We introduced `thread-last' in 2014, and there seems to be one (1) place > in the Emacs code base, so I guess that didn't take off, but rewriting > with that, It doesn't seem that `thread-last' is very popular among Elispers, but more among Clojurists (e.g. it's also implemented in dash.el as `->>'). But I've found it very useful in some cases, and I'm using `thread-last' more often, trying to avoid adding dependencies on dash.el unless necessary. > we get: > > (org-babel-read > (concat "'" > (thread-last > results > (replace-regexp-in-string "'" "\"") > (replace-regexp-in-string ",[[:space:]]" " ") > (replace-regexp-in-string "\\]" ")") > (replace-regexp-in-string "\\[" "(")))) > > Which is somewhat more readable (but note that this totally breaks > down if you want to mix in LITERAL etc). ... The length of the name of > this common function is itself offputting. Agreed, the name seems too long, and the function's signature is awkward (I always have to check the argument list when I use it). Most of the time, I don't want to replace with automatic case matching, nor do I want to substitute the original matched text, so I have to add the FIXEDCASE argument, and then carefully re-read the docstring for LITERAL and decide whether I need it, too. The SUBEXP argument, I'm not so sure about. Having it at the end would break threading. Having it after the replacement would mean having a "nil" much of the time, which wouldn't be as pretty. I suppose the third argument could be either a SUBEXP or the string, and if a SUBEXP, an optional fourth argument could be the string? But since these are likely called often and in loops, I suppose that might be undesirable. > But I wonder whether we should consider renaming the function to > something more palatable, and since we have `string-replace', why not > `regexp-replace'? Sounds good to me. (Since it's also string-related, a `string-replace-regexp' alias might be warranted, but I don't want to get too bikesheddy now.) > (org-babel-read > (concat "'" > (thread-last > results > (regexp-replace "'" "\"") > (regexp-replace ",[[:space:]]" " ") > (regexp-replace "\\]" ")") > (regexp-replace "\\[" "(")))) > > We could also consider making `regexp-replace' take a series of pairs, > since this is so common. Like: > > (org-babel-read > (concat "'" > (regexp-replace "'" "\"" > ",[[:space:]]" " " > "\\]" ")" > "\\[" "(" > results))) > > Or some variation thereupon with some more ()s to group pairs. It is common, but IMHO, it would be better to use a separate function for that case, e.g. maybe `regexp-replace-pairs'. (Alternatively, `pcase-dolist' makes it easy to call a function in a loop with paired arguments, so maybe it's not really needed.) > The most popular way to deal with the awkwardness is to just give up and > go all imperative: > > (defun authors-canonical-author-name (author file pos) > [...] > (when author > (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author)) > (setq author (replace-regexp-in-string "\\`[ \t]+" "" author)) > (setq author (replace-regexp-in-string "[ \t]+$" "" author)) > (setq author (replace-regexp-in-string "[ \t]+" " " author)) > > Which leads me to my other point -- about a quarter of the usages of the > function in Emacs core has "" as the replacement, so perhaps that should > have its own function? `regexp-remove'? IMHO, I'd lean toward not adding this unless it's really needed, but I won't opine too strongly on it.