unofficial mirror of emacs-devel@gnu.org 
* Improve `replace-regexp-in-string' ergonomics?
@ 2021-09-22  4:36 Lars Ingebrigtsen
  2021-09-22  5:22 ` Yuri Khan
                   ` (7 more replies)
  0 siblings, 8 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22  4:36 UTC (permalink / raw)
  To: emacs-devel

`replace-regexp-in-string' often leads to pretty awkward code.  I wonder
whether we could improve it somehow.

Here's a real life example:

(defun org-babel-js-read (results)
[...]
       (org-babel-read
        (concat "'"
                (replace-regexp-in-string
                 "\\[" "(" (replace-regexp-in-string
                            "\\]" ")" (replace-regexp-in-string
                                       ",[[:space:]]" " "
				       (replace-regexp-in-string
					"'" "\"" results))))))

That's kinda hard to read, but variations on this are pretty common.
When you have one `replace-regexp-in-string', you often have another.

We introduced `thread-last' in 2014, and there seems to be one (1) place
in the Emacs code base that uses it, so I guess that didn't take off, but
rewriting with that, we get:

       (org-babel-read
        (concat "'"
		(thread-last
		  results
		  (replace-regexp-in-string "'" "\"")
		  (replace-regexp-in-string ",[[:space:]]" " ")
		  (replace-regexp-in-string "\\]" ")")
                  (replace-regexp-in-string "\\[" "("))))

Which is somewhat more readable (but note that this totally breaks down
if you want to mix in LITERAL etc).  But I wonder whether we should
consider renaming the function to something more palatable, and since we
have `string-replace', why not `regexp-replace'?  The length of the name
of this common function is itself offputting.

       (org-babel-read
        (concat "'"
		(thread-last
		  results
		  (regexp-replace "'" "\"")
		  (regexp-replace ",[[:space:]]" " ")
		  (regexp-replace "\\]" ")")
                  (regexp-replace "\\[" "("))))

We could also consider making `regexp-replace' take a series of pairs,
since this is so common.  Like:

       (org-babel-read
        (concat "'"
		(regexp-replace "'" "\""
				",[[:space:]]" " "
				"\\]" ")"
				"\\[" "("
				results)))

Or some variation thereupon with some more ()s to group pairs.

The most popular way to deal with the awkwardness is to just give up and
go all imperative:

(defun authors-canonical-author-name (author file pos)
[...]
  (when author
    (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
    (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
    (setq author (replace-regexp-in-string "[ \t]+$" "" author))
    (setq author (replace-regexp-in-string "[ \t]+" " " author))

Which leads me to my other point -- about a quarter of the usages of the
function in Emacs core have "" as the replacement, so perhaps that should
have its own function?  `regexp-remove'?

Then that could be:

  (when author
    (setq author (regexp-remove "[ \t]*[(<].*$" author))
    (setq author (regexp-remove "\\`[ \t]+" author))
    (setq author (regexp-remove "[ \t]+$" author))
    (setq author (regexp-replace "[ \t]+" " " author))

or

  (when author
    (setq author
	  (regexp-replace
	   "[ \t]+" " " (regexp-remove
			 "[ \t]*[(<].*$" (regexp-remove
					  "\\`[ \t]+" (regexp-remove
						       "[ \t]+$" author)))))))
or

  (when author
    (setq author
	  (thread-last author
		       (regexp-remove "[ \t]*[(<].*$")
		       (regexp-remove "\\`[ \t]+")
		       (regexp-remove "[ \t]+$")
		       (regexp-replace "[ \t]+" " ")))))


Or...  something else.  I'm sure nobody else has thought about this
issue before.  
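
A minimal sketch of the two proposed helpers, assuming they are nothing
more than thin wrappers around `replace-regexp-in-string'.  The names
`my-regexp-replace', `my-regexp-remove' and `my-canonical-author' are
made up for illustration (the `my-' prefix avoids clashing with anything
real), and the sample input is invented:

```elisp
(require 'subr-x)                       ; provides `thread-last'

;; Hypothetical wrappers, not functions that exist in Emacs.
(defun my-regexp-replace (regexp rep string)
  "Replace every match of REGEXP in STRING with REP."
  (replace-regexp-in-string regexp rep string))

(defun my-regexp-remove (regexp string)
  "Delete every match of REGEXP from STRING."
  (replace-regexp-in-string regexp "" string))

(defun my-canonical-author (author)
  "Strip a trailing address and surplus whitespace from AUTHOR."
  (thread-last author
    (my-regexp-remove "[ \t]*[(<].*$")   ; drop "<address>" or "(comment)"
    (my-regexp-remove "\\`[ \t]+")       ; drop leading whitespace
    (my-regexp-remove "[ \t]+$")         ; drop trailing whitespace
    (my-regexp-replace "[ \t]+" " ")))   ; collapse inner whitespace

(my-canonical-author "  Jane   Doe  <jane@example.org>")
;; => "Jane Doe"
```

Because STRING stays the last argument in both wrappers, they compose
with `thread-last' in the same way as the examples above.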

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
@ 2021-09-22  5:22 ` Yuri Khan
  2021-09-22  6:36   ` Lars Ingebrigtsen
  2021-09-22  7:47   ` Thierry Volpiatto
  2021-09-22  5:24 ` Po Lu
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Yuri Khan @ 2021-09-22  5:22 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Emacs developers

On Wed, 22 Sept 2021 at 11:37, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> `replace-regexp-in-string' often leads to pretty awkward code.  I wonder
> whether we could improve it somehow.

> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common.  Like:
>
>        (org-babel-read
>         (concat "'"
>                 (regexp-replace "'" "\""
>                                 ",[[:space:]]" " "
>                                 "\\]" ")"
>                                 "\\[" "("
>                                 results)))

I want to highlight a possible difference in semantics and usability
improvement with this interface, roughly parallel to (let …) vs (let*
…).

A nested/threaded/sequential imperative regexp-replace first changes
each occurrence of the first pattern into a first replacement, then
changes each occurrence of each subsequent pattern. This can lead to
interference when an earlier replacement matches a later pattern.

However, I have a hypothesis that what users really want is “I have
this list of pattern/replacement pairs, and I want to go through each
occurrence of each pattern in the original text, and replace them with
their respective replacements”, without thinking about intermediate
buffer contents. In simple cases, this can be simulated by
parenthesizing each pattern, joining them all with a \|, and using the
resulting super-pattern to iterate through occurrences, then,
depending on which group matched, doing the replacement. (Harder cases
include those where patterns have capturing groups and/or
backreferences.)
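
The simple case can be sketched roughly as follows.  This is only an
illustration under stated assumptions: `my-replace-simultaneously' is a
made-up name, PAIRS is assumed to be a list of (REGEXP . REPLACEMENT)
conses, and none of the regexps may contain capturing groups or
backreferences, since that would shift the group numbers:

```elisp
(defun my-replace-simultaneously (pairs string)
  "Replace each pattern in PAIRS within the original STRING in one pass.
PAIRS is a list of (REGEXP . REPLACEMENT).  The regexps must not
contain capturing groups or backreferences."
  (let ((combined (mapconcat (lambda (pair)
                               (concat "\\(" (car pair) "\\)"))
                             pairs "\\|")))
    (replace-regexp-in-string
     combined
     (lambda (_match)
       ;; Group N+1 having matched means the Nth pair's pattern was found.
       (let ((n 0))
         (while (not (match-beginning (1+ n)))
           (setq n (1+ n)))
         (cdr (nth n pairs))))
     string t t)))

;; The first replacement ("b") matches the second pattern, so chaining
;; two calls would yield "cc"; the single pass leaves no such interference.
(my-replace-simultaneously '(("a" . "b") ("b" . "c")) "ab")
;; => "bc"
```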




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
  2021-09-22  5:22 ` Yuri Khan
@ 2021-09-22  5:24 ` Po Lu
  2021-09-22  6:37   ` Lars Ingebrigtsen
  2021-09-22  7:33 ` Adam Porter
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Po Lu @ 2021-09-22  5:24 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Which is somewhat more readable (but note that this totally breaks down
> if you want to mix in LITERAL etc).  But I wonder whether we should
> consider renaming the function to something more palatable, and since we
> have `string-replace', why not `regexp-replace'?  The length of the name
> of this common function is itself offputting.

In my Emacs configuration, there is already a function named
`regexp-replace', that has about 45 callers.

I don't know what it does, and it would be nice to not have to find out.

How about an additional argument to `string-replace', that when non-nil
causes string-replace to behave like replace-regexp-in-string?




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  5:22 ` Yuri Khan
@ 2021-09-22  6:36   ` Lars Ingebrigtsen
  2021-09-22  7:47   ` Thierry Volpiatto
  1 sibling, 0 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22  6:36 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Emacs developers

Yuri Khan <yuri.v.khan@gmail.com> writes:

> However, I have a hypothesis that what users really want is “I have
> this list of pattern/replacement pairs, and I want to go through each
> occurrence of each pattern in the original text, and replace them with
> their respective replacements”, without thinking about intermediate
> buffer contents.

That's a good point.  That points towards a new `regexp-replace'
function that takes all the replacement pairs and does the right thing
here instead of chaining independent calls.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  5:24 ` Po Lu
@ 2021-09-22  6:37   ` Lars Ingebrigtsen
  2021-09-22 10:56     ` Po Lu
  0 siblings, 1 reply; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22  6:37 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> In my Emacs configuration, there is already a function named
> `regexp-replace', that has about 45 callers.

Do you know what package that comes from?

> How about an additional argument to `string-replace', that when non-nil
> causes string-replace to behave like replace-regexp-in-string?

No, that's not good language design.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
  2021-09-22  5:22 ` Yuri Khan
  2021-09-22  5:24 ` Po Lu
@ 2021-09-22  7:33 ` Adam Porter
  2021-09-22  8:09   ` Lars Ingebrigtsen
  2021-09-22  7:51 ` Andreas Schwab
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Adam Porter @ 2021-09-22  7:33 UTC (permalink / raw)
  To: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> `replace-regexp-in-string' often leads to pretty awkward code.  I
> wonder whether we could improve it somehow.

That would be appreciated.  :)

> Here's a real life example:
>
> (defun org-babel-js-read (results)
> [...]
>        (org-babel-read
>         (concat "'"
>                 (replace-regexp-in-string
>                  "\\[" "(" (replace-regexp-in-string
>                             "\\]" ")" (replace-regexp-in-string
>                                        ",[[:space:]]" " "
> 				       (replace-regexp-in-string
> 					"'" "\"" results))))))
>
> That's kinda hard to read, but variations on this are pretty common.
> When you have one `replace-regexp-in-string', you often have another.
>
> We introduced `thread-last' in 2014, and there seems to be one (1) place
> in the Emacs code base that uses it, so I guess that didn't take off, but
> rewriting with that,

`thread-last' doesn't seem to be very popular among Elispers; it is more
popular among Clojurists (it's also implemented in dash.el as `->>').
But I've found it very useful in some cases, and I'm using `thread-last'
more often, trying to avoid adding dependencies on dash.el unless
necessary.

> we get:
>
>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (replace-regexp-in-string "'" "\"")
> 		  (replace-regexp-in-string ",[[:space:]]" " ")
> 		  (replace-regexp-in-string "\\]" ")")
>                   (replace-regexp-in-string "\\[" "("))))
>
> Which is somewhat more readable (but note that this totally breaks
> down if you want to mix in LITERAL etc). ... The length of the name of
> this common function is itself offputting.

Agreed, the name seems too long, and the function's signature is awkward
(I always have to check the argument list when I use it).  Most of the
time, I don't want to replace with automatic case matching, nor do I
want to substitute the original matched text, so I have to add the
FIXEDCASE argument, and then carefully re-read the docstring for LITERAL
and decide whether I need it, too.
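
A small sketch of what the two flags change, using
`replace-regexp-in-string' as it exists today.  `my-fixedcase-demo' is a
made-up name for illustration, and `case-fold-search' is bound
explicitly so the result does not depend on local settings:

```elisp
(defun my-fixedcase-demo ()
  "Return four replacements showing the effect of FIXEDCASE and LITERAL."
  (let ((case-fold-search t))
    (list
     ;; Default: the replacement adopts the case pattern of the match.
     (replace-regexp-in-string "foo" "bar" "Foo")
     ;; FIXEDCASE non-nil: the replacement is inserted as written.
     (replace-regexp-in-string "foo" "bar" "Foo" t)
     ;; Default: \& in the replacement stands for the matched text.
     (replace-regexp-in-string "a" "\\&\\&" "cat")
     ;; LITERAL non-nil: backslashes in the replacement are not special.
     (replace-regexp-in-string "a" "\\&\\&" "cat" t t))))

(my-fixedcase-demo)
;; => ("Bar" "bar" "caat" "c\\&\\&t")
```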

The SUBEXP argument, I'm not so sure about.  Having it at the end would
break threading.  Having it after the replacement would mean having a
"nil" much of the time, which wouldn't be as pretty.  I suppose the
third argument could be either a SUBEXP or the string, and if a SUBEXP,
an optional fourth argument could be the string?  But since these are
likely called often and in loops, I suppose that might be undesirable.

> But I wonder whether we should consider renaming the function to
> something more palatable, and since we have `string-replace', why not
> `regexp-replace'?

Sounds good to me.  (Since it's also string-related, a
`string-replace-regexp' alias might be warranted, but I don't want to
get too bikesheddy now.)

>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (regexp-replace "'" "\"")
> 		  (regexp-replace ",[[:space:]]" " ")
> 		  (regexp-replace "\\]" ")")
>                   (regexp-replace "\\[" "("))))
>
> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common.  Like:
>
>        (org-babel-read
>         (concat "'"
> 		(regexp-replace "'" "\""
> 				",[[:space:]]" " "
> 				"\\]" ")"
> 				"\\[" "("
> 				results)))
>
> Or some variation thereupon with some more ()s to group pairs.

It is common, but IMHO, it would be better to use a separate function
for that case, e.g. maybe `regexp-replace-pairs'.  (Alternatively,
`pcase-dolist' makes it easy to call a function in a loop with paired
arguments, so maybe it's not really needed.)
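
For instance, a sketch of the `pcase-dolist' loop, where
`my-replace-pairs' is a made-up name and the pairs are assumed to be
(REGEXP . REPLACEMENT) conses applied one after the other:

```elisp
(require 'pcase)

(defun my-replace-pairs (pairs string)
  "Apply each (REGEXP . REPLACEMENT) in PAIRS to STRING, in order."
  (pcase-dolist (`(,regexp . ,rep) pairs)
    (setq string (replace-regexp-in-string regexp rep string)))
  string)

(my-replace-pairs '(("'" . "\"")
                    (",[[:space:]]" . " ")
                    ("\\]" . ")")
                    ("\\[" . "("))
                  "['foo', 'bar', 'baz']")
;; => "(\"foo\" \"bar\" \"baz\")"
```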

> The most popular way to deal with the awkwardness is to just give up and
> go all imperative:
>
> (defun authors-canonical-author-name (author file pos)
> [...]
>   (when author
>     (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
>     (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
>     (setq author (replace-regexp-in-string "[ \t]+$" "" author))
>     (setq author (replace-regexp-in-string "[ \t]+" " " author))
>
> Which leads me to my other point -- about a quarter of the usages of the
> function in Emacs core have "" as the replacement, so perhaps that should
> have its own function?  `regexp-remove'?

IMHO, I'd lean toward not adding this unless it's really needed, but I
won't opine too strongly on it.





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  5:22 ` Yuri Khan
  2021-09-22  6:36   ` Lars Ingebrigtsen
@ 2021-09-22  7:47   ` Thierry Volpiatto
  1 sibling, 0 replies; 41+ messages in thread
From: Thierry Volpiatto @ 2021-09-22  7:47 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Lars Ingebrigtsen, Emacs developers

Yuri Khan <yuri.v.khan@gmail.com> writes:

> However, I have a hypothesis that what users really want is “I have
> this list of pattern/replacement pairs, and I want to go through each
> occurrence of each pattern in the original text, and replace them with
> their respective replacements”, without thinking about intermediate
> buffer contents. In simple cases, this can be simulated by
> parenthesizing each pattern, joining them all with a \|, and using the
> resulting super-pattern to iterate through occurrences, then,
> depending on which group matched, doing the replacement. (Harder cases
> include those where patterns have capturing groups and/or
> backreferences.)

Instead of using a list like
(pattern replacement pattern replacement etc...)
perhaps use a list like (pattern subexp replacement ...).  Then you can
loop over this list, doing the replacement with those 3 elements,
something like this:

    (require 'cl-lib)

    (defun tv/replace-regexp-in-string (regexps string &optional fixedcase literal)
      "Apply the REGEXP SUBEXP REPLACEMENT triples in REGEXPS to STRING."
      (cl-assert (zerop (% (length regexps) 3)))
      (cl-loop with str = string
               ;; Walk the flat list three elements at a time.
               for (m s r) on regexps by #'cdddr
               do (setq str (replace-regexp-in-string
                             m r str
                             fixedcase literal s))
               finally return str))

With the previous example, it gives:

    (let ((results "['foo', 'bar', 'baz']"))
      (tv/replace-regexp-in-string
              '("\\["          0 "("
                "\\]"          0 ")"
                ",[[:space:]]" 0 " "
                "'"            0 "\"")
              results))
    "(\"foo\" \"bar\" \"baz\")"


-- 
Thierry




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
                   ` (2 preceding siblings ...)
  2021-09-22  7:33 ` Adam Porter
@ 2021-09-22  7:51 ` Andreas Schwab
  2021-09-22  8:14 ` Augusto Stoffel
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 41+ messages in thread
From: Andreas Schwab @ 2021-09-22  7:51 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

On Sep 22 2021, Lars Ingebrigtsen wrote:

> Here's a real life example:
>
> (defun org-babel-js-read (results)
> [...]
>        (org-babel-read
>         (concat "'"
>                 (replace-regexp-in-string
>                  "\\[" "(" (replace-regexp-in-string
>                             "\\]" ")" (replace-regexp-in-string
>                                        ",[[:space:]]" " "
> 				       (replace-regexp-in-string
> 					"'" "\"" results))))))

       (org-babel-read
        (let* ((t1 (replace-regexp-in-string "'" "\"" results))
               (t2 (replace-regexp-in-string ",[[:space:]]" " " t1))
	       (t3 (replace-regexp-in-string "\\]" ")" t2))
	       (results (replace-regexp-in-string "\\[" "(" t3)))
	  (concat "'" results)))

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  7:33 ` Adam Porter
@ 2021-09-22  8:09   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22  8:09 UTC (permalink / raw)
  To: Adam Porter; +Cc: emacs-devel

Adam Porter <adam@alphapapa.net> writes:

> Agreed, the name seems too long, and the function's signature is awkward
> (I always have to check the argument list when I use it).  Most of the
> time, I don't want to replace with automatic case matching, nor do I
> want to substitute the original matched text, so I have to add the
> FIXEDCASE argument, and then carefully re-read the docstring for LITERAL
> and decide whether I need it, too.

Functions in Emacs are generally on the DWIM side, so we could make a
new `regexp-replace' function be more well-defined.  That is, it could
be always-FIXEDCASE and never LITERAL.

> The SUBEXP argument, I'm not so sure about.

Yeah, that one is actually useful.  It's used in 5% of the cases in the
Emacs tree...

> Having it at the end would break threading.  Having it after the
> replacement would mean having a "nil" much of the time, which wouldn't
> be as pretty.  I suppose the third argument could be either a SUBEXP
> or the string, and if a SUBEXP, an optional fourth argument could be
> the string?  But since these are likely called often and in loops, I
> suppose that might be undesirable.

The replacement could be a cons like ("foo" . 2) to only replace subexp
2, perhaps.  If we allow a bunch of regexp/replacement pairs, then I
guess that'd be necessary anyway.
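
A rough sketch of how that could look, assuming the always-FIXEDCASE,
never-LITERAL behaviour suggested above.  `my-regexp-replace-subexp' is
a made-up name, and the (REPLACEMENT . SUBEXP) cons is only one possible
encoding:

```elisp
(defun my-regexp-replace-subexp (regexp rep string)
  "Replace matches of REGEXP in STRING with REP.
REP is a string, or a cons (REPLACEMENT . SUBEXP) that replaces only
subexpression SUBEXP of each match.  Always FIXEDCASE, never LITERAL."
  (if (consp rep)
      (replace-regexp-in-string regexp (car rep) string t nil (cdr rep))
    (replace-regexp-in-string regexp rep string t)))

;; Only the second group of each match is replaced.
(my-regexp-replace-subexp "\\(foo\\)-\\(bar\\)" '("BAZ" . 2) "foo-bar foo-bar")
;; => "foo-BAZ foo-BAZ"
```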

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
                   ` (3 preceding siblings ...)
  2021-09-22  7:51 ` Andreas Schwab
@ 2021-09-22  8:14 ` Augusto Stoffel
  2021-09-22  8:21   ` Adam Porter
  2021-09-22 20:06   ` Lars Ingebrigtsen
  2021-09-22 10:59 ` Dmitry Gutov
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Augusto Stoffel @ 2021-09-22  8:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

On Wed, 22 Sep 2021 at 06:36, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> We introduced `thread-last' in 2014, and there seems to be one (1) place
> in the Emacs code base that uses it, so I guess that didn't take off, but
> rewriting with that, we get:
>
>        (org-babel-read
>         (concat "'"
> 		(thread-last
> 		  results
> 		  (replace-regexp-in-string "'" "\"")
> 		  (replace-regexp-in-string ",[[:space:]]" " ")
> 		  (replace-regexp-in-string "\\]" ")")
>                   (replace-regexp-in-string "\\[" "("))))
>
> Which is somewhat more readable (but note that this totally breaks down
> if you want to mix in LITERAL etc).

Clojure has a moderately useful named variant of the threading macro for
this kind of situation:

        (as-> expr name & forms)

    Binds name to expr, evaluates the first form in the lexical context
    of that binding, then binds name to that result, repeating for each
    successive form, returning the result of the last form.

(I guess I would have chosen the opposite order of the 'expr' and 'name'
arguments, though.)




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  8:14 ` Augusto Stoffel
@ 2021-09-22  8:21   ` Adam Porter
  2021-09-22 18:01     ` Stefan Monnier
  2021-09-22 20:06   ` Lars Ingebrigtsen
  1 sibling, 1 reply; 41+ messages in thread
From: Adam Porter @ 2021-09-22  8:21 UTC (permalink / raw)
  To: emacs-devel

Augusto Stoffel <arstoffel@gmail.com> writes:

> Clojure has a moderately useful named variant of the threading macro for
> this kind of situation:
>
>         (as-> expr name & forms)
>
>     Binds name to expr, evaluates the first form in the lexical context
>     of that binding, then binds name to that result, repeating for each
>     successive form, returning the result of the last form.
>
> (I guess I would have chosen the opposite order of the 'expr' and 'name'
> arguments, though.)

Agreed, I'd be happy to have a `thread-as' macro, used like:

  (thread-as NAME EXPR &rest FORMS)

I considered proposing one recently, but it seems like anaphoric macros
aren't popular around here.  :)  Sometimes they really are the best way
to simplify code, since not all functions work with `thread-last' or
`thread-first'.





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  6:37   ` Lars Ingebrigtsen
@ 2021-09-22 10:56     ` Po Lu
  2021-09-22 20:08       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 41+ messages in thread
From: Po Lu @ 2021-09-22 10:56 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Do you know what package that comes from?

It appears to come from a file named 'BlowDir.el'.

This is at the top of the file:
;; BlowDir (make dired a Blow)
;; THIS DOES NOT WORK WITH VIP! PLEASE ESPECIALLY MAKE SURE
;; TERM-SETUP-HOOK DOES NOT CONTAIN VIP-MODE OR BAD THINGS WILL HAPPEN.
;; Copyright (C) 1996 Jeffery Wen

;; BlowDir is free software; you can redistribute it and/or modify it
;; under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.

;; BlowDir is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program; there should be a file named COPYING.  If
;; not, write to the Free Software Foundation, Inc., 59 Temple Place -
;; Suite 330, Boston, MA 02111-1307, USA.

It has no other commentary, and annoyingly enough, no docstrings either.

I don't know where it came from, or how it made its way into my setup,
but it seems to implement at least 2 commands that I use in Dired on a
regular basis: goto-next-dangling-link and correct-dangling-links-by-pattern.

Does anyone know what this is?




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
                   ` (4 preceding siblings ...)
  2021-09-22  8:14 ` Augusto Stoffel
@ 2021-09-22 10:59 ` Dmitry Gutov
  2021-09-22 20:18   ` Lars Ingebrigtsen
  2021-09-22 18:14 ` Stefan Monnier
  2021-10-05 16:18 ` Juri Linkov
  7 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2021-09-22 10:59 UTC (permalink / raw)
  To: Lars Ingebrigtsen, emacs-devel

On 22.09.2021 07:36, Lars Ingebrigtsen wrote:
> But I wonder whether we should
> consider renaming the function to something more palatable, and since we
> have `string-replace', why not `regexp-replace'?  The length of the name
> of this common function is itself offputting.
> 
>         (org-babel-read
>          (concat "'"
> 		(thread-last
> 		  results
> 		  (regexp-replace "'" "\"")
> 		  (regexp-replace ",[[:space:]]" " ")
> 		  (regexp-replace "\\]" ")")
>                    (regexp-replace "\\[" "("))))

This way makes it impossible to use any optional arguments, right?

But if we target thread-first instead and make the new function accept 
STRING in the first position, all optional arguments would be still 
available.
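
A sketch of that variant, where `my-string-regexp-replace' is a made-up
name for a wrapper that only reorders the arguments of
`replace-regexp-in-string':

```elisp
(require 'subr-x)                       ; provides `thread-first'

(defun my-string-regexp-replace (string regexp rep
                                        &optional fixedcase literal subexp)
  "Like `replace-regexp-in-string', but with STRING as the first argument."
  (replace-regexp-in-string regexp rep string fixedcase literal subexp))

;; The optional arguments stay usable on each step.
(thread-first "['foo', 'bar']"
  (my-string-regexp-replace "'" "\"")
  (my-string-regexp-replace ",[[:space:]]" " ")
  (my-string-regexp-replace "\\]" ")" t t)
  (my-string-regexp-replace "\\[" "(" t t))
;; => "(\"foo\" \"bar\")"
```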

> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common.  Like:
> 
>         (org-babel-read
>          (concat "'"
> 		(regexp-replace "'" "\""
> 				",[[:space:]]" " "
> 				"\\]" ")"
> 				"\\[" "("
> 				results)))
> 
> Or some variation thereupon with some more ()s to group pairs.

I'm not sure how to make it also accept the "normal" calling convention,
and we probably don't want to always have to wrap the args in an alist,
even when only one replacement is needed.

> The most popular way to deal with the awkwardness is to just give up and
> go all imperative:
> 
> (defun authors-canonical-author-name (author file pos)
> [...]
>    (when author
>      (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
>      (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
>      (setq author (replace-regexp-in-string "[ \t]+$" "" author))
>      (setq author (replace-regexp-in-string "[ \t]+" " " author))
> 
> Which leads me to my other point -- about a quarter of the usages of the
> function in Emacs core have "" as the replacement, so perhaps that should
> have its own function?  `regexp-remove'?
> 
> Then that could be:
> 
>    (when author
>      (setq author (regexp-remove "[ \t]*[(<].*$" author))
>      (setq author (regexp-remove "\\`[ \t]+" author))
>      (setq author (regexp-remove "[ \t]+$" author))
>      (setq author (regexp-replace "[ \t]+" " " author))

IDK, if that leads to no increase in efficiency, then probably not? 
Replacing with "" is an established pattern by now.




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  8:21   ` Adam Porter
@ 2021-09-22 18:01     ` Stefan Monnier
  2021-09-22 18:24       ` Basil L. Contovounesios
  2021-09-22 22:56       ` Adam Porter
  0 siblings, 2 replies; 41+ messages in thread
From: Stefan Monnier @ 2021-09-22 18:01 UTC (permalink / raw)
  To: Adam Porter; +Cc: emacs-devel

>   (thread-as NAME EXPR &rest FORMS)
>
> I considered proposing one recently, but it seems like anaphoric macros
> aren't popular around here.  :)

I thought "anaphoric" is used for macros where you don't specify the
name of the variable (it's typically the hardcoded `it`), whereas IIUC
here you do specify it.  I don't like the usual anaphoric macros, but if

    (thread-as FOO x
      (bar x 6)
      ...
      (toto 45 x))

turns into

    (let* ((x FOO)
           (x (bar x 6))
           (x ...))
      (toto 45 x))

then I wouldn't consider that as "anaphoric" and so wouldn't stay away
from it.
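
A sketch of a macro with that expansion, using the (EXPR NAME &rest
FORMS) argument order from the example above.  `my-thread-as' is a
made-up name, and nothing like it is claimed to exist in Emacs:

```elisp
(defmacro my-thread-as (expr name &rest forms)
  "Bind NAME to EXPR, then rebind it to each of FORMS in turn.
Return the value of the last form, or EXPR if FORMS is empty."
  (declare (indent 2))
  (if (null forms)
      expr
    `(let* ((,name ,expr)
            ;; Every form but the last becomes a rebinding of NAME.
            ,@(mapcar (lambda (form) (list name form)) (butlast forms)))
       ,(car (last forms)))))

;; NAME can sit in any argument position, so optional arguments work.
(my-thread-as "['foo', 'bar']" s
  (replace-regexp-in-string "'" "\"" s)
  (replace-regexp-in-string "\\[" "(" s t t)
  (replace-regexp-in-string "\\]" ")" s t t))
;; => "(\"foo\", \"bar\")"
```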


        Stefan





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
                   ` (5 preceding siblings ...)
  2021-09-22 10:59 ` Dmitry Gutov
@ 2021-09-22 18:14 ` Stefan Monnier
  2021-09-22 19:30   ` Mattias Engdegård
  2021-09-22 20:22   ` Lars Ingebrigtsen
  2021-10-05 16:18 ` Juri Linkov
  7 siblings, 2 replies; 41+ messages in thread
From: Stefan Monnier @ 2021-09-22 18:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

>                 (replace-regexp-in-string
>                  "\\[" "(" (replace-regexp-in-string
>                             "\\]" ")" (replace-regexp-in-string
>                                        ",[[:space:]]" " "
> 				       (replace-regexp-in-string
> 					"'" "\"" results))))))

To me "the right way" is to pass a function as the replacement.
I'm not sure the way REP currently works in `replace-regexp-in-string`
is the best we can do, but it works:

    (replace-regexp-in-string
     "[][']\\|,[[:space:]]\\(\\)"
     (lambda (s)
       (if (match-end 1) " "
         (pcase-exhaustive (aref s 0)
           (?\[ "(")
           (?\] ")")
           (?\' "\""))))
     results)

It'd be nice to have a front-end that lets you write a kind of lex-like
set of rules, but I think it'd be important for that front-end to allow
*computing* the replacement rather than only selecting it based on the
matched regexp.  Something like

    (foo-replace results
      ("\\[" "(")
      ("\\]" ")")
      ("[[:lower:]]+" (upcase (match-string 0)))
      ("siglo \\([0-9]+\\)"
       (integer-to-roman (string-to-number (match-string 1))))
      ...)
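
One way such a front-end might be approximated with a plain function
rather than a macro.  This is only a sketch under assumptions:
`my-rules-replace' is a made-up name, each rule is a
(REGEXP . FUNCTION) cons, and each regexp is assumed to match the same
way when re-matched alone against the matched text (so no
context-dependent constructs such as `^' or `\b'):

```elisp
(defun my-rules-replace (string rules)
  "Replace text in STRING according to RULES in a single pass.
RULES is a list of (REGEXP . FUNCTION).  FUNCTION is called with the
matched text, with the match data set for its own REGEXP, and returns
the replacement string."
  (let ((combined (mapconcat (lambda (rule)
                               (concat "\\(?:" (car rule) "\\)"))
                             rules "\\|")))
    (replace-regexp-in-string
     combined
     (lambda (match)
       ;; Re-matching clobbers the match data that
       ;; `replace-regexp-in-string' still needs, hence `save-match-data'.
       (save-match-data
         (catch 'done
           (dolist (rule rules)
             (when (string-match (concat "\\`\\(?:" (car rule) "\\)\\'")
                                 match)
               (throw 'done (funcall (cdr rule) match))))
           match)))
     string t t)))

(my-rules-replace
 "[v1, v9]"
 (list (cons "\\[" (lambda (_m) "("))
       (cons "\\]" (lambda (_m) ")"))
       ;; A computed replacement that uses the rule's own group 1.
       (cons "v\\([0-9]+\\)"
             (lambda (m)
               (format "version %d"
                       (1+ (string-to-number (match-string 1 m))))))))
;; => "(version 2, version 10)"
```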


-- Stefan





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 18:01     ` Stefan Monnier
@ 2021-09-22 18:24       ` Basil L. Contovounesios
  2021-09-22 22:56       ` Adam Porter
  1 sibling, 0 replies; 41+ messages in thread
From: Basil L. Contovounesios @ 2021-09-22 18:24 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Adam Porter, emacs-devel

Stefan Monnier [2021-09-22 14:01 -0400] wrote:

>>   (thread-as NAME EXPR &rest FORMS)
>>
>> I considered proposing one recently, but it seems like anaphoric macros
>> aren't popular around here.  :)
>
> I thought "anaphoric" is used for macros where you don't specify the
> name of the variable (it's typically the hardcoded `it`),

Someone must have confused the words "anaphora" and "anathema" ;).

> whereas IIUC
> here you do specify it.  I don't like the usual anaphoric macros, but if
>
>     (thread-as FOO x
>       (bar x 6)
>       ...
>       (toto 45 x))
>
> turns into
>
>     (let* ((x FOO)
>            (x (bar x 6))
>            (x ...))
>       (toto 45 x))
>
> then I wouldn't consider that as "anaphoric" and so wouldn't stay away
> from it.

That would be a lot more hygienic,

-- 
Basil




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 18:14 ` Stefan Monnier
@ 2021-09-22 19:30   ` Mattias Engdegård
  2021-09-22 20:22   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 41+ messages in thread
From: Mattias Engdegård @ 2021-09-22 19:30 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Lars Ingebrigtsen, emacs-devel

22 sep. 2021 kl. 20.14 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> It'd be nice to have a front-end that lets you write a kind of lex-like
> set of rules

In particular since sequential substitution (replace A with A', then B
with B', ...) is often very error-prone: it's order-sensitive, and
sometimes there is no correct order at all.

People use chains of replace-regexp-in-string because it's simple and
it's there, but when a simultaneous replacement is called for, we should
provide such a construct, not make it easier to do the wrong thing.
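
A small illustration of the "no correct order" case: swapping two
words.  The function names are made up for this sketch, and the
simultaneous version simply passes a function as the replacement:

```elisp
(require 'subr-x)                       ; provides `thread-last'

(defun my-sequential-swap (string)
  "Try to swap \"cat\" and \"dog\" in STRING with two chained calls."
  (thread-last string
    (replace-regexp-in-string "cat" "dog")
    (replace-regexp-in-string "dog" "cat")))

(defun my-simultaneous-swap (string)
  "Swap \"cat\" and \"dog\" in STRING in a single pass."
  (replace-regexp-in-string
   "cat\\|dog"
   (lambda (match) (if (equal match "cat") "dog" "cat"))
   string t t))

(my-sequential-swap "cat chases dog")    ; => "cat chases cat"
(my-simultaneous-swap "cat chases dog")  ; => "dog chases cat"
```

Reversing the two chained calls does not help; it yields "dog chases
dog" instead.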





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  8:14 ` Augusto Stoffel
  2021-09-22  8:21   ` Adam Porter
@ 2021-09-22 20:06   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22 20:06 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: emacs-devel

Augusto Stoffel <arstoffel@gmail.com> writes:

> Clojure has a moderately useful named variant of the threading macro for
> this kind of situation:
>
>         (as-> expr name & forms)
>
>     Binds name to expr, evaluates the first form in the lexical context
>     of that binding, then binds name to that result, repeating for each
>     successive form, returning the result of the last form.
>
> (I guess I would have chosen the opposite order of the 'expr' and 'name'
> arguments, though.)

Yeah, you really need something like that to actually do threading in a
Lispish language.  But it feels very hackish indeed.  Example from the
man page:

(as-> owners $ (nth $ 0) (:pets $) (deref $) ($ 1) ($ :type))

It's like...  revenge of the Perl.

Threading works a lot better in languages that are designed around it --
i.e., languages that have consistent parameter placement, and don't
have optional parameters.  (Instead they introduce new functions when a
Lispish language introduces an optional parameter.)

Threading feels more like a neat hack in Lispish languages than actually
useful.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 10:56     ` Po Lu
@ 2021-09-22 20:08       ` Lars Ingebrigtsen
  2021-09-23  0:11         ` Po Lu
  0 siblings, 1 reply; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22 20:08 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

>> Do you know what package that comes from?
>
> It appears to come from a file named 'BlowDir.el'.

Well, then I don't think we should feel constrained by its existence
when choosing function names for Emacs core.  (Not that I think that
`regexp-replace' is a good name -- but it's just hard to come up with
something that's short, not awkward and hasn't already been taken.
`re-replace'?  `string-replace-re'?  `string-replace-regexp'?
`string-re'?  Naming is hard.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 10:59 ` Dmitry Gutov
@ 2021-09-22 20:18   ` Lars Ingebrigtsen
  2021-09-22 22:23     ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22 20:18 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

> This way makes it impossible to use any optional arguments, right?

Indeed.

> But if we target thread-first instead and make the new function accept
> STRING in the first position, all optional arguments would be still
> available.

Yes, I've always found it weird that these functions have the object to
be worked upon as the last non-optional parameter.  I had to look it up
for years when using `replace-regexp-in-string'.  And it didn't help
that Emacs took this function from XEmacs, which had the string in a
different position...  But I don't remember where...

*Lars says "apt install xemacs21"*

I misremembered:

`replace-in-string' is a compiled Lisp function 
  -- loaded from "/build/xemacs21-rcHAYB/xemacs21-21.4.24/lisp/subr.elc" 
(replace-in-string STR REGEXP NEWTEXT &optional LITERAL) 

So it has the placement of STRING that seems logical, I think.

On the other hand, changing the placement in a new function like this
will probably be even more confusing.

>> 		(regexp-replace "'" "\""
>> 				",[[:space:]]" " "
>> 				"\\]" ")"
>> 				"\\[" "("
>> 				results)))
>> Or some variation thereupon with some more ()s to group pairs.
>
> I'm not sure how to also make it accept "normal" convention, and we
> probably don't want to always have to wrap the args in an alist, even
> when only one replacement is needed.

No, that's the problem.  We could hack it up by doing a &rest in
reality, and then checking if the first parameter is a list, but yuck.

>>      (setq author (regexp-remove "[ \t]*[(<].*$" author))
>>      (setq author (regexp-remove "\\`[ \t]+" author))
>>      (setq author (regexp-remove "[ \t]+$" author))
>>      (setq author (regexp-replace "[ \t]+" " " author))
>
> IDK, if that leads to no increase in efficiency, then probably not?
> Replacing with "" is an established pattern by now.

It helps with readability -- the function says what the intention is.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 18:14 ` Stefan Monnier
  2021-09-22 19:30   ` Mattias Engdegård
@ 2021-09-22 20:22   ` Lars Ingebrigtsen
  2021-09-22 20:29     ` Lars Ingebrigtsen
  2021-09-23  2:15     ` Stefan Monnier
  1 sibling, 2 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22 20:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> It'd be nice to have a front-end that lets you write a kind of lex-like
> set of rules, but I think it'd be important for that front-end to allow
> *computing* the replacement rather than only selecting it based on the
> matched regexp.  Something like
>
>     (foo-replace results
>       ("\\[" "(")
>       ("\\]" ")")
>       ("[[:lower:]]+" (upcase (match-string 0)))
>       ("siglo \\([0-9]+\\)"
>        (integer-to-roman (string-to-number (match-string 1))))
>       ...)

Perfect.  That looks like a really nice function signature to me.
Perhaps without the parentheses if there's only one replacement to be
done.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 20:22   ` Lars Ingebrigtsen
@ 2021-09-22 20:29     ` Lars Ingebrigtsen
  2021-09-23  2:15     ` Stefan Monnier
  1 sibling, 0 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-22 20:29 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

>>     (foo-replace results
>>       ("\\[" "(")
>>       ("\\]" ")")
>>       ("[[:lower:]]+" (upcase (match-string 0)))
>>       ("siglo \\([0-9]+\\)"
>>        (integer-to-roman (string-to-number (match-string 1))))
>>       ...)
>
> Perfect.  That looks like a really nice function signature to me.
> Perhaps without the parentheses if there's only one replacement to be
> done.

(And there could be an optional third element in the replacement lists
to do the SUBEXP action.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 20:18   ` Lars Ingebrigtsen
@ 2021-09-22 22:23     ` Dmitry Gutov
  2021-09-22 23:24       ` [External] : " Drew Adams
  0 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2021-09-22 22:23 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

On 22.09.2021 23:18, Lars Ingebrigtsen wrote:

>> But if we target thread-first instead and make the new function accept
>> STRING in the first position, all optional arguments would be still
>> available.
> 
> Yes, I've always found it weird that these functions have the object to
> be worked upon as the last non-optional parameter.  I had to look it up
> for years when using `replace-regexp-in-string'.  And it didn't help
> that Emacs took this function from XEmacs, which had the string in a
> different position...  But I don't remember where...
> 
> *Lars says "apt install xemacs21"*
> 
> I misremembered:
> 
> `replace-in-string' is a compiled Lisp function
>    -- loaded from "/build/xemacs21-rcHAYB/xemacs21-21.4.24/lisp/subr.elc"
> (replace-in-string STR REGEXP NEWTEXT &optional LITERAL)
> 
> So it has the placement of STRING that seems logical, I think.
> 
> On the other hand, changing the placement in a new function like this
> will probably be even more confusing.

Adding a new function is the only time we *can* change the argument
order. If we subsequently obsolete the current function, it could fly.

It's not the wildest among the alternatives anyway -- the idea about the 
argument being a list takes the first place, I think. And either could 
work, ultimately.

If we want to be able to use threading macros more consistently, it 
seems functions should expect the "main" argument in either the first or 
the last position, across the standard library. Or at least portions of it.

For example, in Clojure:

   By convention, core functions that operate on sequences expect the
   sequence as their last argument. Accordingly, pipelines containing
   map, filter, remove, reduce, into, etc usually call for the ->> macro.

   Core functions that operate on data structures, on the other hand,
   expect the value they work on as their first argument. These include
   assoc, update, dissoc, get and their -in variants. Pipelines that
   transform maps using these functions often require the -> macro.

(https://clojure.org/guides/threading_macros)

It seems to me that, with our penchant for optional arguments, it's
generally harder to put the "main" argument into the last position in
our case. I could be wrong, though. But STRING being in neither the
first nor the last position makes threading macros decidedly less useful.
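
To make the STRING-first idea concrete, here is a rough sketch.  `my-re-replace' and `my-js-list-to-lisp' are hypothetical names, not existing functions; the wrapper only reorders the arguments of `replace-regexp-in-string' so that `thread-first' can feed the string in while the optional arguments stay usable:

```elisp
(require 'subr-x)                       ; provides `thread-first'

;; Hypothetical wrapper: same behavior as `replace-regexp-in-string',
;; but with STRING moved to the first position.
(defun my-re-replace (string regexp rep
                             &optional fixedcase literal subexp start)
  "Replace matches for REGEXP in STRING with REP."
  (replace-regexp-in-string regexp rep string fixedcase literal subexp start))

;; The replacements from the start of the thread, as a `thread-first' chain.
(defun my-js-list-to-lisp (results)
  "Rewrite the JS-style list literal RESULTS as a Lisp-style one."
  (thread-first results
    (my-re-replace "'" "\"" t t)        ; FIXEDCASE and LITERAL are usable here
    (my-re-replace ",[[:space:]]" " ")
    (my-re-replace "\\]" ")")
    (my-re-replace "\\[" "(")))

(my-js-list-to-lisp "['a', 'b']")       ; => "(\"a\" \"b\")"
```

These steps are still applied sequentially, so the ordering caveats raised elsewhere in this thread apply unchanged.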

>>> 		(regexp-replace "'" "\""
>>> 				",[[:space:]]" " "
>>> 				"\\]" ")"
>>> 				"\\[" "("
>>> 				results)))
>>> Or some variation thereupon with some more ()s to group pairs.
>>
>> I'm not sure how to also make it accept "normal" convention, and we
>> probably don't want to always have to wrap the args in an alist, even
>> when only one replacement is needed.
> 
> No, that's the problem.  We could hack it up by doing a &rest in
> reality, and then checking if the first parameter is a list, but yuck.

Probably check that the number of &rest arguments divides by two as 
well. Or three, or four? FIXEDCASE, LITERAL and SUBEXP could apply to a 
single replacement. At best, it will create an ambiguity (do those args 
apply to all steps, or do I need to repeat them?), but at worst it can 
limit the applicability of the approach (when steps need different 
values of these). Threading solves it.

>>>       (setq author (regexp-remove "[ \t]*[(<].*$" author))
>>>       (setq author (regexp-remove "\\`[ \t]+" author))
>>>       (setq author (regexp-remove "[ \t]+$" author))
>>>       (setq author (regexp-replace "[ \t]+" " " author))
>>
>> IDK, if that leads to no increase in efficiency, then probably not?
>> Replacing with "" is an established pattern by now.
> 
> It helps with readability -- the function says what the intention is.

True. I'm not sold, though.




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 18:01     ` Stefan Monnier
  2021-09-22 18:24       ` Basil L. Contovounesios
@ 2021-09-22 22:56       ` Adam Porter
  2021-09-22 23:53         ` Eric Abrahamsen
  1 sibling, 1 reply; 41+ messages in thread
From: Adam Porter @ 2021-09-22 22:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Wed, Sep 22, 2021 at 1:02 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> >   (thread-as NAME EXPR &rest FORMS)
> >
> > I considered proposing one recently, but it seems like anaphoric macros
> > aren't popular around here.  :)
>
> I thought "anaphoric" is used for macros where you don't specify the
> name of the variable (it's typically the hardcoded `it`), whereas IIUC
> here you do specify it.  I don't like the usual anaphoric macros, but if
>
>     (thread-as FOO x
>       (bar x 6)
>       ...
>       (toto 45 x))
>
> turns into
>
>     (let* ((x FOO)
>            (x (bar x 6))
>            (x ...))
>       (toto 45 x))
>
> then I wouldn't consider that as "anaphoric" and so wouldn't stay away
> from it.

Maybe we need a new name for that kind of macro, then.  Euphemistic?  :)
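
For reference, a minimal sketch of a macro with the expansion quoted above.  It assumes the argument order of the quoted example (EXPR first, then the variable name); `my-thread-as' is a made-up name, and nothing like it is claimed to exist in Emacs:

```elisp
;; Hypothetical macro, for illustration only.
(defmacro my-thread-as (expr name &rest forms)
  "Bind NAME to EXPR, then rebind it to the value of each of FORMS in turn.
Return the value of the last form, or of EXPR if FORMS is empty."
  (declare (indent 2))
  (if (null forms)
      expr
    `(let* ((,name ,expr)
            ;; Every form but the last becomes a rebinding of NAME.
            ,@(mapcar (lambda (form) (list name form)) (butlast forms)))
       ;; The last form is the body, so its value is returned.
       ,@(last forms))))

(my-thread-as "bar" s
  (replace-regexp-in-string "b" "f" s)
  (replace-regexp-in-string "a" "o" s)
  (replace-regexp-in-string "r" "o" s))
;; => "foo"
```

Since the caller picks the variable name, the position of the threaded value is explicit in every form, which is what makes optional arguments usable.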




* RE: [External] : Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 22:23     ` Dmitry Gutov
@ 2021-09-22 23:24       ` Drew Adams
  0 siblings, 0 replies; 41+ messages in thread
From: Drew Adams @ 2021-09-22 23:24 UTC (permalink / raw)
  To: Dmitry Gutov, Lars Ingebrigtsen; +Cc: emacs-devel@gnu.org

> Probably check that the number of &rest arguments divides by two as
> well. Or three, or four? 

Giant apologies for not following this thread,
especially if this comment makes no sense in
context.
___

My comment is that &rest is best used for an
_arbitrary_ set (list) of unknown length and
_unknown structure_.  Among other things,
that allows for adding more, possibly
unrelated args later.

If instead you hard-code in the body an
expectation that the "rest" of the args will
be of a certain kind, in a certain order, or
in a certain kind of sequence (e.g. mod 2,
mod 3, with given meanings to each of the
pair or triplet items), then you lose this
property of &rest - which is an integral
part of its raison d'etre, IMO.

If you know the structure you want, and you
just want to allow it to have indeterminate
length, then use a single list arg (required
or optional), and specify in the doc string,
and control in the body, the actual form of
the list elements (e.g. triplets of a given
form).

Abusing &rest the way I think you might be
suggesting will ultimately bite you.  And
the more you do it, the more it will bite
you elsewhere.  Users won't know what to
expect, or they'll have unreasonable
expectations.

Just one opinion.  Again, sorry if not on
the real topic.


* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 22:56       ` Adam Porter
@ 2021-09-22 23:53         ` Eric Abrahamsen
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Abrahamsen @ 2021-09-22 23:53 UTC (permalink / raw)
  To: Adam Porter; +Cc: Stefan Monnier, emacs-devel

Adam Porter <adam@alphapapa.net> writes:

> On Wed, Sep 22, 2021 at 1:02 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>> >   (thread-as NAME EXPR &rest FORMS)
>> >
>> > I considered proposing one recently, but it seems like anaphoric macros
>> > aren't popular around here.  :)
>>
>> I thought "anaphoric" is used for macros where you don't specify the
>> name of the variable (it's typically the hardcoded `it`), whereas IIUC
>> here you do specify it.  I don't like the usual anaphoric macros, but if
>>
>>     (thread-as FOO x
>>       (bar x 6)
>>       ...
>>       (toto 45 x))
>>
>> turns into
>>
>>     (let* ((x FOO)
>>            (x (bar x 6))
>>            (x ...))
>>       (toto 45 x))
>>
>> then I wouldn't consider that as "anaphoric" and so wouldn't stay away
>> from it.
>
> Maybe we need a new name for that kind of macro, then.  Euphemistic?  :)

Eponymous!




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 20:08       ` Lars Ingebrigtsen
@ 2021-09-23  0:11         ` Po Lu
  0 siblings, 0 replies; 41+ messages in thread
From: Po Lu @ 2021-09-23  0:11 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Well, then I don't think we should feel constrained by its existence
> when choosing function names for Emacs core.  (Not that I think that
> `regexp-replace' is a good name -- but it's just hard to come up with
> something that's short, not awkward and hasn't already been taken.
> `re-replace'?  `string-replace-re'?  `string-replace-regexp'?
> `string-re'?  Naming is hard.)

Fair enough, I'd also say `re-replace' LGTM.

Thanks.




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22 20:22   ` Lars Ingebrigtsen
  2021-09-22 20:29     ` Lars Ingebrigtsen
@ 2021-09-23  2:15     ` Stefan Monnier
  1 sibling, 0 replies; 41+ messages in thread
From: Stefan Monnier @ 2021-09-23  2:15 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

>> It'd be nice to have a front-end that lets you write a kind of lex-like
>> set of rules, but I think it'd be important for that front-end to allow
>> *computing* the replacement rather than only selecting it based on the
>> matched regexp.  Something like
>>
>>     (foo-replace results
>>       ("\\[" "(")
>>       ("\\]" ")")
>>       ("[[:lower:]]+" (upcase (match-string 0)))
>>       ("siglo \\([0-9]+\\)"
>>        (integer-to-roman (string-to-number (match-string 1))))
>>       ...)
>
> Perfect.  That looks like a really nice function signature to me.
> Perhaps without the parentheses if there's only one replacement to be
> done.

If someone's interested, `syntax-propertize-rules` solves a similar
problem, so some of its code can be reused for that.
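
As a point of comparison, here is a deliberately naive sketch of the rule-based interface.  It does not use the `syntax-propertize-rules' approach of one combined regexp; it simply tries every rule at each step and takes the earliest match.  `my-rules-replace' is a hypothetical name, and computed replacements are passed as functions (called with the string) rather than as inline forms:

```elisp
;; Hypothetical function, for illustration only.
(defun my-rules-replace (string rules)
  "Return a copy of STRING rewritten by RULES in one left-to-right pass.
RULES is a list of (REGEXP REPLACEMENT) lists.  REPLACEMENT is either
a string, inserted literally, or a function called with STRING while
the match data describes the current match; its return value is
inserted.  Inserted text is never rescanned."
  (let ((pos 0)
        (parts nil)
        (done nil))
    (while (not done)
      (let ((best nil) (best-start nil) (best-data nil))
        ;; Pick the rule whose match starts earliest at or after POS;
        ;; on a tie, the rule listed first wins.
        (dolist (rule rules)
          (let ((start (string-match (car rule) string pos)))
            (when (and start (or (null best-start) (< start best-start)))
              (setq best rule
                    best-start start
                    best-data (match-data)))))
        (if (null best)
            (setq done t)
          ;; Restore the winner's match data, since later rules clobbered it.
          (set-match-data best-data)
          (let ((end (match-end 0))
                (rep (cadr best)))
            ;; An empty match would never advance POS; this sketch rejects it.
            (when (= end best-start)
              (error "Rule %S matched the empty string" (car best)))
            (push (substring string pos best-start) parts)
            (push (if (functionp rep) (funcall rep string) rep) parts)
            (setq pos end)))))
    (push (substring string pos) parts)
    (apply #'concat (nreverse parts))))

(my-rules-replace
 "siglo 21 [x]"
 (list '("\\[" "(")
       '("\\]" ")")
       (list "siglo \\([0-9]+\\)"
             (lambda (s) (concat "century " (match-string 1 s))))))
;; => "century 21 (x)"
```

Because each rule is matched on its own, group numbers inside a rule need no renumbering, at the cost of one search per rule per replacement.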


        Stefan





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
                   ` (6 preceding siblings ...)
  2021-09-22 18:14 ` Stefan Monnier
@ 2021-10-05 16:18 ` Juri Linkov
  2021-10-12  6:53   ` Juri Linkov
  7 siblings, 1 reply; 41+ messages in thread
From: Juri Linkov @ 2021-10-05 16:18 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

>        (org-babel-read
>         (concat "'"
>               (thread-last
>                 results
>                 (regexp-replace "'" "\"")
>                 (regexp-replace ",[[:space:]]" " ")
>                 (regexp-replace "\\]" ")")
>                 (regexp-replace "\\[" "("))))
>
> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common.  Like:
>
>        (org-babel-read
>         (concat "'"
>               (regexp-replace "'" "\""
>                               ",[[:space:]]" " "
>                               "\\]" ")"
>                               "\\[" "("
>                               results)))

When the user needs to apply more replacements to the results of previous
replacements, then 'thread-last' could be used like in the first example above.
Otherwise, after adding replacement pairs as args of 'regexp-replace',
replacements should be applied "simultaneously".  E.g. in the new
Julia 1.7 (https://lwn.net/SubscriberLink/871486/e4ae97b79d72bb25/
and https://github.com/JuliaLang/julia/pull/40484) the existing
function 'replace' has been enhanced to accept any number of
replacement patterns that are applied simultaneously, scanning left
to right, e.g.

  replace("abc", "c" => "Z", "Z" => "WWW")

returns "abZ", not "abWWW".




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-05 16:18 ` Juri Linkov
@ 2021-10-12  6:53   ` Juri Linkov
  2021-10-12 12:10     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 41+ messages in thread
From: Juri Linkov @ 2021-10-12  6:53 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

>> We could also consider making `regexp-replace' take a series of pairs,
>> since this is so common.  Like:
>>
>>        (org-babel-read
>>         (concat "'"
>>               (regexp-replace "'" "\""
>>                               ",[[:space:]]" " "
>>                               "\\]" ")"
>>                               "\\[" "("
>>                               results)))
>
> Otherwise, after adding replacement pairs as args of 'regexp-replace',
> replacements should be applied "simultaneously".

Such simultaneous replacement could be implemented the same way as
in mm-uu-configure that creates a composite regexp from all regexps:

(defun mm-uu-configure (&optional symbol value)
  (setq mm-uu-beginning-regexp
        (mapconcat #'mm-uu-beginning-regexp
	           (delq nil (mapcar
	                      (lambda (entry)
	                        (if (mm-uu-configure-p (mm-uu-type entry)
		                                       'disabled)
		                    nil entry))
		              mm-uu-type-alist))
		   "\\|")))

Then it's used like this:

      (while (re-search-forward mm-uu-beginning-regexp nil t)

In the case of replacements, it should replace each match of the
composite regexp with the string from the pair whose regexp matched.
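
A rough sketch of that idea, under the simplifying assumption that none of the individual regexps contains capture groups of its own (so the Nth wrapping group belongs to the Nth pair).  `my-composite-replace' is a hypothetical name:

```elisp
;; Hypothetical function, for illustration only.
(defun my-composite-replace (string pairs)
  "Replace matches in STRING according to PAIRS in a single pass.
PAIRS is a list of (REGEXP . REPLACEMENT), REPLACEMENT being a literal
string.  Assumes that no REGEXP contains capture groups of its own."
  (let ((composite
         ;; Wrap each regexp in its own group and join with alternation.
         (mapconcat (lambda (pair) (concat "\\(" (car pair) "\\)"))
                    pairs "\\|")))
    (replace-regexp-in-string
     composite
     (lambda (_match)
       ;; Group N matched means the Nth pair applies.
       (let ((n 0) (rep nil))
         (dolist (pair pairs)
           (setq n (1+ n))
           (when (and (null rep) (match-beginning n))
             (setq rep (cdr pair))))
         rep))
     string t t)))

(my-composite-replace "<&" '(("<" . "&lt;") ("&" . "&amp;")))
;; => "&lt;&amp;"
```

Supporting groups inside the individual regexps would require counting them and offsetting the group numbers accordingly.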




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12  6:53   ` Juri Linkov
@ 2021-10-12 12:10     ` Lars Ingebrigtsen
  2021-10-12 12:34       ` Stefan Monnier
  0 siblings, 1 reply; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-12 12:10 UTC (permalink / raw)
  To: Juri Linkov; +Cc: emacs-devel

Juri Linkov <juri@linkov.net> writes:

> Such simultaneous replacement could be implemented the same way as
> in mm-uu-configure that creates a composite regexp from all regexps:

Yup.  Or as Stefan M said, `syntax-propertize-rules' which does the
same, but tries to keep track of subregexp stuff, which the mm function
doesn't do at all (and we want to be able to work on subregexps).

But the reason I haven't started working on this yet is that I don't see
a way past the limit we have on nine back references, and it wouldn't be
cool to have such an arbitrary limit here.

Using one mega-regexp and then re-applying a smaller regexp to get at
the back references doesn't sound very appealing, either.  :-/

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 12:10     ` Lars Ingebrigtsen
@ 2021-10-12 12:34       ` Stefan Monnier
  2021-10-12 12:41         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Monnier @ 2021-10-12 12:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Juri Linkov, emacs-devel

> But the reason I haven't started working on this yet is that I don't see
> a way past the limit we have on nine back references, and it wouldn't be
> cool to have such an arbitrary limit here.

Basically (like in `syntax-propertize-rules`) you need to disallow backrefs.
The upside is that it gives the warm&fuzzy feeling that we could speed
it up with a DFA implementation ;-)

BTW, the current limit on backrefs is just due to a lack of syntax for
higher-numbered backrefs, so it would be easy to lift this limit: it's
a small matter of choosing an appropriate syntax and then adjusting the
code that parses the regexps in regex-emacs.c.


        Stefan





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 12:34       ` Stefan Monnier
@ 2021-10-12 12:41         ` Lars Ingebrigtsen
  2021-10-12 13:18           ` Lars Ingebrigtsen
  2021-10-12 13:33           ` Thierry Volpiatto
  0 siblings, 2 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-12 12:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, Juri Linkov

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> BTW, the current limit on backrefs is just due to a lack of syntax for
> higher-numbered backrefs, so it would be easy to lift this limit: it's
> a small matter of choosing an appropriate syntax and then adjusting the
> code that parses the regexps in regex-emacs.c.

A small matter indeed.  :-/

But perhaps it would be worth doing -- arbitrary limitations like that
go against everything GNU stands for!

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 12:41         ` Lars Ingebrigtsen
@ 2021-10-12 13:18           ` Lars Ingebrigtsen
  2021-10-12 13:32             ` Mattias Engdegård
  2021-10-12 15:48             ` Stefan Monnier
  2021-10-12 13:33           ` Thierry Volpiatto
  1 sibling, 2 replies; 41+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-12 13:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Juri Linkov, emacs-devel

But!  I totally thought that the limitation was an actual one on our
regexp libraries for groups, and not just a syntax thing.  So if we just
ignore the backref issue, then things should basically Just Work.

I.e.,

(new-and-spiffy-regexp-replace string
                               "\\(foo\\)" "bar\\1"
                               "\\(bar\\)" "foo\\1"
                               ...)

should work just fine, no matter how many of them you have -- you just
have to transpose the groups down to where they were again before doing
the replacement.  (I mean, after creating the mega-regexp and matching
with that.)

It will break down if the user has a lot of actual backrefs in the
matching regexps, but that's quite rare in practice.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 13:18           ` Lars Ingebrigtsen
@ 2021-10-12 13:32             ` Mattias Engdegård
  2021-10-12 15:48             ` Stefan Monnier
  1 sibling, 0 replies; 41+ messages in thread
From: Mattias Engdegård @ 2021-10-12 13:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel, Stefan Monnier, Juri Linkov

> you just
> have to transpose the groups down to where they were again before doing
> the replacement.

We don't even need to transpose the match state at run time; we can just as well do the transposing in the replacement string templates at macro-expansion time. If those exceed the syntactic limit of group 9, then generate Lisp code as mentioned before. Don't do more at run time than actually required.

Transposing the match state is only necessary when the user uses actual Lisp code for the replacement. Even that can be avoided by passing the group number offset as an argument to the user code -- I did that in one instance (although I recall Stefan didn't quite fancy it).
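
A sketch of what renumbering a replacement template could look like.  `my-shift-backrefs' is a hypothetical helper, and it assumes templates use only the \N and \\ escapes (\& needs no change, since the whole match is the same either way):

```elisp
;; Hypothetical helper, for illustration only.
(defun my-shift-backrefs (template offset)
  "Return TEMPLATE with every \\N group reference renumbered to N+OFFSET.
Escaped backslashes are left alone.  Signal an error when a renumbered
group would exceed 9, the highest group a template can name."
  (replace-regexp-in-string
   ;; A backslash followed by either another backslash or a digit 1-9.
   "\\\\\\(\\\\\\|[1-9]\\)"
   (lambda (m)
     (if (string= m "\\\\")
         m                              ; an escaped backslash: keep as is
       (let ((n (+ offset (string-to-number (substring m 1)))))
         (when (> n 9)
           (error "Group %d cannot be written in a template" n))
         (format "\\%d" n))))
   template t t))

;; If the second rule's own group 1 ends up as group 4 of the composite:
(my-shift-backrefs "foo\\1" 3)          ; => "foo\\4"
```

The offset for each rule would be the number of groups that precede it in the composite regexp, which is known when the regexp is built.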





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 12:41         ` Lars Ingebrigtsen
  2021-10-12 13:18           ` Lars Ingebrigtsen
@ 2021-10-12 13:33           ` Thierry Volpiatto
  2021-10-12 19:16             ` Juri Linkov
  1 sibling, 1 reply; 41+ messages in thread
From: Thierry Volpiatto @ 2021-10-12 13:33 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Juri Linkov, Stefan Monnier, emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> BTW, the current limit on backrefs is just due to a lack of syntax for
>> higher-numbered backrefs, so it would be easy to lift this limit: it's
>> a small matter of choosing an appropriate syntax and then adjusting the
>> code that parses the regexps in regex-emacs.c.
>
> A small matter indeed.  :-/
>
> But perhaps it would be worth doing -- arbitrary limitations like that
> go against everything GNU stands for!

What about providing a simple anaphoric `and`?

    (let ((bar "bar"))
      (helm-aand bar
                 (replace-regexp-in-string "b" "f" it)
                 (replace-regexp-in-string "a" "o" it)
                 (replace-regexp-in-string "r" "o" it)))
    =>"foo"

And in addition it can be used for something else.

-- 
Thierry




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 13:18           ` Lars Ingebrigtsen
  2021-10-12 13:32             ` Mattias Engdegård
@ 2021-10-12 15:48             ` Stefan Monnier
  1 sibling, 0 replies; 41+ messages in thread
From: Stefan Monnier @ 2021-10-12 15:48 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel, Juri Linkov

> (new-and-spiffy-regexp-replace string
>                                "\\(foo\\)" "bar\\1"
>                                "\\(bar\\)" "foo\\1"
>                                ...)
>
> should work just fine, no matter how many of them you have -- you just
> have to transpose the groups down to where they were again before doing
> the replacement.

That's right (and `syntax-propertize-rules` does indeed do that, tho
for the above example it would probably be better to do it differently).

> It will break down if the user has a lot of actual backrefs in the
> matching regexps, but that's quite rare in practice.

Indeed, tho IIRC the problem can occur even with just a single backref.


        Stefan





* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 13:33           ` Thierry Volpiatto
@ 2021-10-12 19:16             ` Juri Linkov
  2021-10-12 20:44               ` Thierry Volpiatto
  0 siblings, 1 reply; 41+ messages in thread
From: Juri Linkov @ 2021-10-12 19:16 UTC (permalink / raw)
  To: Thierry Volpiatto; +Cc: Lars Ingebrigtsen, Stefan Monnier, emacs-devel

> What about providing a simple anaphoric `and`?
>
>     (let ((bar "bar"))
>       (helm-aand bar
>                  (replace-regexp-in-string "b" "f" it)
>                  (replace-regexp-in-string "a" "o" it)
>                  (replace-regexp-in-string "r" "o" it)))
>     =>"foo"

What does the following return?

    (let ((bar "bar"))
      (helm-aand bar
                 (replace-regexp-in-string "b" "f" it)
                 (replace-regexp-in-string "f" "o" it)))

If it returns "oar" then it applies replacements sequentially,
and we have no problem with such implementations.

But we need an alternative version that performs simultaneous
replacements and returns "far".




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 19:16             ` Juri Linkov
@ 2021-10-12 20:44               ` Thierry Volpiatto
  2021-10-13  7:57                 ` Juri Linkov
  0 siblings, 1 reply; 41+ messages in thread
From: Thierry Volpiatto @ 2021-10-12 20:44 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Lars Ingebrigtsen, Stefan Monnier, emacs-devel



Juri Linkov <juri@linkov.net> writes:

>> What about providing a simple anaphoric `and`?
>>
>>     (let ((bar "bar"))
>>       (helm-aand bar
>>                  (replace-regexp-in-string "b" "f" it)
>>                  (replace-regexp-in-string "a" "o" it)
>>                  (replace-regexp-in-string "r" "o" it)))
>>     =>"foo"
>
> What does the following return?
>
>     (let ((bar "bar"))
>       (helm-aand bar
>                  (replace-regexp-in-string "b" "f" it)
>                  (replace-regexp-in-string "f" "o" it)))
>
> If it returns "oar" then it applies replacements sequentially,
> and we have no problem with such implementations.

Yes, it does.  I thought you wanted something easy to read (and write);
that was the initial question, wasn't it?

> But we need an alternative version that performs simultaneous
> replacements and returns "far".

So I don't understand what you want to achieve.

-- 
Thierry



* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-12 20:44               ` Thierry Volpiatto
@ 2021-10-13  7:57                 ` Juri Linkov
  2021-10-13  8:41                   ` Thierry Volpiatto
  0 siblings, 1 reply; 41+ messages in thread
From: Juri Linkov @ 2021-10-13  7:57 UTC (permalink / raw)
  To: Thierry Volpiatto; +Cc: Lars Ingebrigtsen, Stefan Monnier, emacs-devel

>> What does the following return?
>>
>>     (let ((bar "bar"))
>>       (helm-aand bar
>>                  (replace-regexp-in-string "b" "f" it)
>>                  (replace-regexp-in-string "f" "o" it)))
>>
>> If it returns "oar" then it applies replacements sequentially,
>> and we have no problem with such implementations.
>
> Yes, it does. I thought you wanted something easy to read (and write);
> that was the initial question, wasn't it?

General-purpose threading like you proposed is a nice feature.
But it supports only sequential replacements.
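
To make "sequential" concrete, here is a minimal sketch using
`thread-last' (mentioned at the start of this thread) rather than
`helm-aand'.  The function name `my-sequential-example' is hypothetical
and exists only for illustration.

```elisp
(require 'subr-x)                       ; provides `thread-last'

(defun my-sequential-example (string)
  "Apply two replacements to STRING one after the other.
Hypothetical helper, for illustration only."
  (thread-last string
    ;; Step 1: every "b" becomes "f".
    (replace-regexp-in-string "b" "f")
    ;; Step 2 scans the *result* of step 1, so the "f" that step 1
    ;; just produced is replaced again.
    (replace-regexp-in-string "f" "o")))

(my-sequential-example "bar")           ; => "oar", not "far"
```

Each step sees the output of the previous one, which is why the "f"
introduced by the first replacement does not survive the second.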

>> But we need an alternative version that performs simultaneous
>> replacements and returns "far".
>
> So I don't understand what you want to achieve.

Most replacements are intended to be simultaneous.
But in practice most simultaneous replacements
can be performed sequentially,
because often the result of each replacement step
doesn't contain matches for the next replacement step.

But sometimes simultaneous replacement is required.
For example,

    (let ((bar "<&"))
      (helm-aand bar
                 (replace-regexp-in-string "<" "&lt;" it)
                 (replace-regexp-in-string "&" "&amp;" it)))

will do the wrong thing (and will return "&amp;lt;&amp;" instead of the
intended "&lt;&amp;") because these replacements should be performed
simultaneously.
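
One possible way to get the simultaneous behavior, as a rough sketch
only: build a single regexp that matches any of the source strings and
pass a function as the replacement, so the input is scanned exactly once.
The name `my-replace-simultaneously' and its alist argument are
assumptions made for this example, not an existing or proposed API, and
the sketch handles only literal strings, not arbitrary regexps.

```elisp
(defun my-replace-simultaneously (pairs string)
  "Replace in STRING each literal FROM with TO, for every (FROM . TO) in PAIRS.
All matches are found in one pass over the original STRING, so text
inserted by one replacement is never rescanned by another.
Hypothetical helper, for illustration only."
  (let ((case-fold-search nil))         ; match the FROM strings exactly
    (replace-regexp-in-string
     ;; One regexp that matches any of the FROM strings.
     (regexp-opt (mapcar #'car pairs))
     ;; Called with the text of each match; look up its replacement.
     (lambda (match) (cdr (assoc match pairs)))
     string
     t                                  ; FIXEDCASE: keep replacement case
     t)))                               ; LITERAL: insert replacement verbatim

(my-replace-simultaneously '(("<" . "&lt;") ("&" . "&amp;")) "<&")
;; => "&lt;&amp;"
```

With the earlier pairs, (("b" . "f") ("f" . "o")) applied to "bar", the
same helper returns "far".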




* Re: Improve `replace-regexp-in-string' ergonomics?
  2021-10-13  7:57                 ` Juri Linkov
@ 2021-10-13  8:41                   ` Thierry Volpiatto
  0 siblings, 0 replies; 41+ messages in thread
From: Thierry Volpiatto @ 2021-10-13  8:41 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Lars Ingebrigtsen, Stefan Monnier, emacs-devel


Juri Linkov <juri@linkov.net> writes:

>>> What does the following return?
>>>
>>>     (let ((bar "bar"))
>>>       (helm-aand bar
>>>                  (replace-regexp-in-string "b" "f" it)
>>>                  (replace-regexp-in-string "f" "o" it)))
>>>
>>> If it returns "oar" then it applies replacements sequentially,
>>> and we have no problem with such implementations.
>>
>> Yes, it does. I thought you wanted something easy to read (and write);
>> that was the initial question, wasn't it?
>
> General-purpose threading like you proposed is a nice feature.
> But it supports only sequential replacements.
>
>>> But we need an alternative version that performs simultaneous
>>> replacements and returns "far".
>>
>> So I don't understand what you want to achieve.
>
> Most replacements are intended to be simultaneous.
> But in practice most simultaneous replacements
> can be performed sequentially,
> because often the result of each replacement step
> doesn't contain matches for the next replacement step.
>
> But sometimes simultaneous replacement is required.
> For example,
>
>     (let ((bar "<&"))
>       (helm-aand bar
>                  (replace-regexp-in-string "<" "&lt;" it)
>                  (replace-regexp-in-string "&" "&amp;" it)))
>
> will do the wrong thing (and will return "&amp;lt;&amp;" instead of the
> intended "&lt;&amp;") because these replacements should be performed
> simultaneously.

I see what you mean now, thanks for the explanation.

-- 
Thierry


end of thread

Thread overview: 41+ messages
2021-09-22  4:36 Improve `replace-regexp-in-string' ergonomics? Lars Ingebrigtsen
2021-09-22  5:22 ` Yuri Khan
2021-09-22  6:36   ` Lars Ingebrigtsen
2021-09-22  7:47   ` Thierry Volpiatto
2021-09-22  5:24 ` Po Lu
2021-09-22  6:37   ` Lars Ingebrigtsen
2021-09-22 10:56     ` Po Lu
2021-09-22 20:08       ` Lars Ingebrigtsen
2021-09-23  0:11         ` Po Lu
2021-09-22  7:33 ` Adam Porter
2021-09-22  8:09   ` Lars Ingebrigtsen
2021-09-22  7:51 ` Andreas Schwab
2021-09-22  8:14 ` Augusto Stoffel
2021-09-22  8:21   ` Adam Porter
2021-09-22 18:01     ` Stefan Monnier
2021-09-22 18:24       ` Basil L. Contovounesios
2021-09-22 22:56       ` Adam Porter
2021-09-22 23:53         ` Eric Abrahamsen
2021-09-22 20:06   ` Lars Ingebrigtsen
2021-09-22 10:59 ` Dmitry Gutov
2021-09-22 20:18   ` Lars Ingebrigtsen
2021-09-22 22:23     ` Dmitry Gutov
2021-09-22 23:24       ` [External] : " Drew Adams
2021-09-22 18:14 ` Stefan Monnier
2021-09-22 19:30   ` Mattias Engdegård
2021-09-22 20:22   ` Lars Ingebrigtsen
2021-09-22 20:29     ` Lars Ingebrigtsen
2021-09-23  2:15     ` Stefan Monnier
2021-10-05 16:18 ` Juri Linkov
2021-10-12  6:53   ` Juri Linkov
2021-10-12 12:10     ` Lars Ingebrigtsen
2021-10-12 12:34       ` Stefan Monnier
2021-10-12 12:41         ` Lars Ingebrigtsen
2021-10-12 13:18           ` Lars Ingebrigtsen
2021-10-12 13:32             ` Mattias Engdegård
2021-10-12 15:48             ` Stefan Monnier
2021-10-12 13:33           ` Thierry Volpiatto
2021-10-12 19:16             ` Juri Linkov
2021-10-12 20:44               ` Thierry Volpiatto
2021-10-13  7:57                 ` Juri Linkov
2021-10-13  8:41                   ` Thierry Volpiatto
