unofficial mirror of help-gnu-emacs@gnu.org
* (rx regexp to remove space and new lines
@ 2022-09-12  9:25 Jean Louis
  2022-09-12 10:13 ` tomas
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Louis @ 2022-09-12  9:25 UTC (permalink / raw)
  To: Help GNU Emacs

I would like to construct an rx regexp that replaces runs of whitespace
characters and newlines ("\n") with a single space " ".

How can I do it?

(string-replace (rx (one-or-more (any "\n" "[[:space:]]"))) " " sql)))))



Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/




* Re: (rx regexp to remove space and new lines
  2022-09-12  9:25 (rx regexp to remove space and new lines Jean Louis
@ 2022-09-12 10:13 ` tomas
  2022-09-12 10:25   ` Jean Louis
  2022-09-13  2:36   ` Michael Heerdegen
  0 siblings, 2 replies; 15+ messages in thread
From: tomas @ 2022-09-12 10:13 UTC (permalink / raw)
  To: help-gnu-emacs


On Mon, Sep 12, 2022 at 12:25:58PM +0300, Jean Louis wrote:
> I would like to construct rx regexp to remove both whitespace
> characters and new line "\n" to replace it with only one space " ".
> 
> How to do it?
> 
> (string-replace (rx (one-or-more (any "\n" "[[:space:]]"))) " " sql)))))

This looks about correct, so I don't understand your question. In what way
does this fail? Don't make us solve riddles ;-)

Note that [:space:] includes "\n" (as well as vertical tab and other
space-like characters), so the (any "\n" "[[:space:]]") seems redundant
and "[[:space:]]" should suffice.

I'm not very fluent in rx, but I'd double-check whether it accepts
"[[:space:]]" as syntax.

Cheers
-- 
t


* Re: (rx regexp to remove space and new lines
  2022-09-12 10:13 ` tomas
@ 2022-09-12 10:25   ` Jean Louis
  2022-09-12 10:43     ` [SOLVED] " Jean Louis
  2022-09-13  2:36   ` Michael Heerdegen
  1 sibling, 1 reply; 15+ messages in thread
From: Jean Louis @ 2022-09-12 10:25 UTC (permalink / raw)
  To: tomas; +Cc: help-gnu-emacs

* tomas@tuxteam.de <tomas@tuxteam.de> [2022-09-12 13:15]:
> On Mon, Sep 12, 2022 at 12:25:58PM +0300, Jean Louis wrote:
> > I would like to construct rx regexp to remove both whitespace
> > characters and new line "\n" to replace it with only one space " ".
> > 
> > How to do it?
> > 
> > (string-replace (rx (one-or-more (any "\n" "[[:space:]]"))) " " sql)))))
> 
> This looks about correct, so I don't understand your question. In which way
> this does fail? Don't make us solve riddles ;-)

(replace-regexp-in-string (rx (one-or-more (any whitespace))) " "
"Hello    there

and    here") ⇒ "Hello there

and here"

The newline is not removed by the character class `whitespace'.

(replace-regexp-in-string (rx (one-or-more (or "\n" (any whitespace)))) " "
"Hello    there

and    here") ⇒ "Hello there and here"

So it works when "\n" is added explicitly.

-- 
Jean


* [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-12 10:25   ` Jean Louis
@ 2022-09-12 10:43     ` Jean Louis
  2022-09-13  2:44       ` Michael Heerdegen
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Louis @ 2022-09-12 10:43 UTC (permalink / raw)
  To: tomas, help-gnu-emacs

If there is a better way, let me know.

(defun rcd-string-clean-whitespace (s)
  "Return S trimmed, with each run of whitespace and newlines replaced by one space."
  (replace-regexp-in-string 
   (rx (one-or-more (or "\n" (any whitespace))))
   " "
   (string-trim s)))


-- 
Jean


* Re: (rx regexp to remove space and new lines
  2022-09-12 10:13 ` tomas
  2022-09-12 10:25   ` Jean Louis
@ 2022-09-13  2:36   ` Michael Heerdegen
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Heerdegen @ 2022-09-13  2:36 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:


> > (string-replace (rx (one-or-more (any "\n" "[[:space:]]"))) " " sql)))))
>
> This looks about correct, so I don't understand your question. In which way
> this does fail? Don't make us solve riddles ;-)

When you eval that `rx' expression you see that you get ... something.

> I'm not very fluent on Rx, but I'd double check whether it takes
> "[[:space:]]" as syntax.

No, strings match literally.

Michael.
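
A minimal sketch of what "strings match literally" means here. Inside
`any', the string "[[:space:]]" is read as a set of the individual
characters [ : s p a c e ], not as a character class. The test strings
below are made up for illustration. Separately, `string-replace' (Emacs
28 and later) replaces a literal string and does not interpret regexps,
so `replace-regexp-in-string' is the function to pair with `rx'.

```elisp
;; The string "[[:space:]]" inside `any' contributes literal characters only.
(let ((re (rx (one-or-more (any "\n" "[[:space:]]")))))
  (list (string-match-p re "s")     ; the letter s is in the set
        (string-match-p re " ")     ; a plain space is not
        (string-match-p re "\n")))  ; newline was listed explicitly
;; ⇒ (0 nil 0)
```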





* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-12 10:43     ` [SOLVED] " Jean Louis
@ 2022-09-13  2:44       ` Michael Heerdegen
  2022-09-13  4:01         ` Jean Louis
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Heerdegen @ 2022-09-13  2:44 UTC (permalink / raw)
  To: help-gnu-emacs

Jean Louis <bugs@gnu.support> writes:

>    (rx (one-or-more (or "\n" (any whitespace))))

It had been mentioned that newlines count as whitespace.  And
(any CHARCLASS) can be simplified to CHARCLASS.  And you probably want
greedy matching.  You get:

  (rx (+ whitespace))

Michael.
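
The simplification can be checked by evaluating the forms. This is only
a sketch; the sample string is an assumption, and the last form relies
on space and tab having whitespace syntax in the current buffer, which
is the usual case.

```elisp
;; (any CHARCLASS) and CHARCLASS translate to the same regexp.
(rx (any whitespace))          ; ⇒ "[[:space:]]"
(rx whitespace)                ; ⇒ "[[:space:]]"

;; `+' is shorthand for `one-or-more'.
(rx (+ whitespace))            ; ⇒ "[[:space:]]+"
(rx (one-or-more whitespace))  ; ⇒ "[[:space:]]+"

;; The match is greedy: it takes the whole run, not one character.
(let ((s "a \t  b"))
  (string-match (rx (+ whitespace)) s)
  (match-string 0 s))          ; ⇒ " \t  "
```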





* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13  2:44       ` Michael Heerdegen
@ 2022-09-13  4:01         ` Jean Louis
  2022-09-13 10:19           ` Michael Heerdegen
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Louis @ 2022-09-13  4:01 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: help-gnu-emacs

* Michael Heerdegen <michael_heerdegen@web.de> [2022-09-13 05:45]:
> Jean Louis <bugs@gnu.support> writes:
> 
> >    (rx (one-or-more (or "\n" (any whitespace))))
> 
> It had been mentioned that newlines count as whitespace.  And
> (any CHARCLASS) can be simplified to CHARCLASS.  And you probably want
> greedy matching.  You get:
> 
>   (rx (+ whitespace))

That makes sense, but how?

(defun rcd-string-clean-whitespace (s)
  "Return trimmed string S after cleaning whitespaces greedy."
  (replace-regexp-in-string 
   ;; (rx (one-or-more (or "\n" (any whitespace))))
   (rx (one-or-more (+ whitespace)))
   " "
   (string-trim s)))

(rcd-string-clean-whitespace "   hello   there  


and I    am her e") ⇒ "hello there and I am her e"

It works here in this buffer, but in another Emacs buffer it does not
work. Maybe this is because whitespace is defined differently per
buffer?

Anyway, in emacs -Q it does not work when I use:
   (rx (one-or-more (+ whitespace)))

Until I understand it, I have to stay with this version:

(defun rcd-string-clean-whitespace (s)
  "Return trimmed string S after cleaning whitespaces greedy."
  (replace-regexp-in-string 
   (rx (one-or-more (or "\n" (any whitespace))))
   ;; (rx (one-or-more (+ whitespace)))
   " "
   (string-trim s)))

as it works in this buffer and in other buffers.

(rcd-string-clean-whitespace "   hello   there  


and I    am her e") ⇒ "hello there and I am her e"

-- 
Jean


* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13  4:01         ` Jean Louis
@ 2022-09-13 10:19           ` Michael Heerdegen
  2022-09-13 10:33             ` tomas
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Heerdegen @ 2022-09-13 10:19 UTC (permalink / raw)
  To: help-gnu-emacs

Jean Louis <bugs@gnu.support> writes:

> it works here in this buffer, but in other Emacs buffer it does not
> work. This is maybe because whitespace is defined differently per
> buffer?

Eh - oh, of course, depends on the syntax table, sorry.

But
 
  (rx (+ (or "\n" whitespace)))

should hopefully be correct.

Michael.
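
The dependence on the syntax table can be reproduced without switching
buffers. The sketch below uses a hypothetical helper,
`my-space-class-matches-newline-p', and a hand-made syntax table in
which newline ends a comment. To my understanding that is how the Lisp
modes treat newline, which would explain the failure in the *scratch*
buffer of emacs -Q.

```elisp
(defun my-space-class-matches-newline-p (table)
  "Return t if the `whitespace' class matches a newline under syntax TABLE."
  (with-syntax-table table
    (and (string-match-p (rx whitespace) "\n") t)))

;; Standard syntax table: newline has whitespace syntax.
(my-space-class-matches-newline-p (standard-syntax-table))  ; ⇒ t

;; A table where newline is a comment ender instead of whitespace.
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?\n ">" table)
  (my-space-class-matches-newline-p table))                 ; ⇒ nil
```

Listing "\n" explicitly, as in (rx (+ (or "\n" whitespace))), sidesteps
the difference for newlines, although the class itself still follows the
buffer's syntax table for other characters.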






* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13 10:19           ` Michael Heerdegen
@ 2022-09-13 10:33             ` tomas
  2022-09-13 12:32               ` Michael Heerdegen
  0 siblings, 1 reply; 15+ messages in thread
From: tomas @ 2022-09-13 10:33 UTC (permalink / raw)
  To: help-gnu-emacs


On Tue, Sep 13, 2022 at 12:19:08PM +0200, Michael Heerdegen wrote:
> Jean Louis <bugs@gnu.support> writes:
> 
> > it works here in this buffer, but in other Emacs buffer it does not
> > work. This is maybe because whitespace is defined differently per
> > buffer?
> 
> Eh - oh, of course, depends on the syntax table, sorry.

This would be unfortunate. What would correspond to "[[:space:]]", i.e.
the syntax-independent POSIX whitespace?

(Not saying that you aren't right, just not the time to play around at
the moment).

Cheers
-- 
t


* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13 10:33             ` tomas
@ 2022-09-13 12:32               ` Michael Heerdegen
  2022-09-13 12:48                 ` tomas
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Heerdegen @ 2022-09-13 12:32 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

> > Eh - oh, of course, depends on the syntax table, sorry.
>
> This would be unfortunate. What would correspond to "[[:space:]]", i.e.
> the syntax-independent POSIX whitespace?

rx transforms 'whitespace' to [:space:].

But (info "(elisp) Char Classes") told me that "[:space:] [...] matches
any character that has whitespace syntax (*note Syntax Class Table)", so I
assumed that the meaning is not constant... or is it?

Michael.





* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13 12:32               ` Michael Heerdegen
@ 2022-09-13 12:48                 ` tomas
  2022-09-13 14:29                   ` Michael Albinus
  0 siblings, 1 reply; 15+ messages in thread
From: tomas @ 2022-09-13 12:48 UTC (permalink / raw)
  To: help-gnu-emacs


On Tue, Sep 13, 2022 at 02:32:58PM +0200, Michael Heerdegen wrote:
> <tomas@tuxteam.de> writes:
> 
> > > Eh - oh, of course, depends on the syntax table, sorry.
> >
> > This would be unfortunate. What would correspond to "[[:space:]]", i.e.
> > the syntax-independent POSIX whitespace?
> 
> rx transforms 'whitespace' to [:space:].
> 
> But (info "(elisp) Char Classes") told me that "[:space:] [...] matches
> any character that has whitespace syntax (*note Syntax Class Table)", so I
> assumed that the meaning is not constant... or is it?

Good question. I'll check once I'm off work (unless someone else beats me
to it, that is).

Cheers
-- 
t


* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13 12:48                 ` tomas
@ 2022-09-13 14:29                   ` Michael Albinus
  2022-09-14 11:00                     ` Jean Louis
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Albinus @ 2022-09-13 14:29 UTC (permalink / raw)
  To: tomas; +Cc: help-gnu-emacs

<tomas@tuxteam.de> writes:

>> rx transforms 'whitespace' to [:space:].
>>
>> But (info "(elisp) Char Classes") told me that "[:space:] [...] matches
>> any character that has whitespace syntax (*note Syntax Class Table)", so I
>> assumed that the meaning is not constant... or is it?
>
> Good question. I'll check once I'm off work (unless someone else beats me
> to it, that is).

(rx blank) might be the better choice. Adding CR and LF to the puzzle,
I'd use (rx (any "\n\r" blank))

> Cheers

Best regards, Michael.
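
A short sketch of how `blank' behaves, assuming the description in the
Elisp manual that [:blank:] covers horizontal whitespace such as space
and tab. It does not match a newline by itself, which is why "\n\r" is
added to the set, and the result does not change when newline loses its
whitespace syntax. The syntax table built here is made up for the
demonstration.

```elisp
(list (string-match-p (rx blank) " ")     ; space is horizontal whitespace
      (string-match-p (rx blank) "\t")    ; so is tab
      (string-match-p (rx blank) "\n"))   ; newline is not
;; ⇒ (0 0 nil)

;; With "\n\r" in the set, newline matches even under a syntax table
;; where it is a comment ender rather than whitespace.
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?\n ">" table)
  (with-syntax-table table
    (string-match-p (rx (any "\n\r" blank)) "\n")))
;; ⇒ 0
```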




* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-13 14:29                   ` Michael Albinus
@ 2022-09-14 11:00                     ` Jean Louis
  2022-09-15  2:28                       ` Michael Heerdegen
  0 siblings, 1 reply; 15+ messages in thread
From: Jean Louis @ 2022-09-14 11:00 UTC (permalink / raw)
  To: Michael Albinus; +Cc: tomas, help-gnu-emacs

* Michael Albinus <michael.albinus@gmx.de> [2022-09-13 17:33]:
> <tomas@tuxteam.de> writes:
> 
> >> rx transforms 'whitespace' to [:space:].
> >>
> >> But (info "(elisp) Char Classes") told me that "[:space:] [...] matches
> >> any character that has whitespace syntax (*note Syntax Class Table)", so I
> >> assumed that the meaning is not constant... or is it?
> >
> > Good question. I'll check once I'm off work (unless someone else beats me
> > to it, that is).
> 
> (rx blank) might be the better choice. Adding CR and LF to the puzzle,
> I'd use (rx (any "\n\r" blank))

Thanks, I have tried it here:

(defun rcd-string-clean-whitespace (s)
  "Return trimmed string S after cleaning whitespaces."
  (replace-regexp-in-string 
   (rx (one-or-more blank))
   ;; (rx (one-or-more (or "\n" (any whitespace))))
   ;; (rx (one-or-more (+ whitespace)))
   " "
   (string-trim s)))

But it seems it does not work well:

(rcd-string-clean-whitespace "H   elllo  
there") ⇒ "H elllo 
there"

This version works well:

(defun rcd-string-clean-whitespace (s)
  "Return trimmed string S after cleaning whitespaces."
  (replace-regexp-in-string 
   (rx (one-or-more (or "\n" (any whitespace))))
   " "
   (string-trim s)))

(rcd-string-clean-whitespace "H   elllo  
there") ⇒ "H elllo there"

and I can replace `whitespace' with `blank' to get the same effect.

-- 
Jean


* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-14 11:00                     ` Jean Louis
@ 2022-09-15  2:28                       ` Michael Heerdegen
  2022-09-15  8:22                         ` Jean Louis
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Heerdegen @ 2022-09-15  2:28 UTC (permalink / raw)
  To: help-gnu-emacs

Jean Louis <bugs@gnu.support> writes:

> > (rx blank) might be the better choice. Adding CR and LF to the
> > puzzle, I'd use (rx (any "\n\r" blank))
>
> Thanks, I have tried it here:
>
> (defun rcd-string-clean-whitespace (s)
>   "Return trimmed string S after cleaning whitespaces."
>   (replace-regexp-in-string 
>    (rx (one-or-more blank))

The newline is missing.

Michael.
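
Putting the suggested (any "\n\r" blank) into the function gives
something like the sketch below. The name `my-string-squeeze-whitespace'
is hypothetical and is used only to avoid redefining the function from
this thread.

```elisp
(require 'subr-x) ; `string-trim' lives here on some Emacs versions; harmless otherwise

(defun my-string-squeeze-whitespace (s)
  "Return S trimmed, with runs of blanks and line breaks squeezed to one space."
  (replace-regexp-in-string
   ;; `blank' covers space and tab; "\n\r" adds the line breaks explicitly,
   ;; so the result does not depend on the buffer's syntax table.
   (rx (one-or-more (any "\n\r" blank)))
   " "
   (string-trim s)))

(my-string-squeeze-whitespace "  H   elllo  \nthere\n")
;; ⇒ "H elllo there"
```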





* Re: [SOLVED] Re: (rx regexp to remove space and new lines
  2022-09-15  2:28                       ` Michael Heerdegen
@ 2022-09-15  8:22                         ` Jean Louis
  0 siblings, 0 replies; 15+ messages in thread
From: Jean Louis @ 2022-09-15  8:22 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: help-gnu-emacs

* Michael Heerdegen <michael_heerdegen@web.de> [2022-09-15 05:29]:
> Jean Louis <bugs@gnu.support> writes:
> 
> > > (rx blank) might be the better choice. Adding CR and LF to the
> > > puzzle, I'd use (rx (any "\n\r" blank))
> >
> > Thanks, I have tried it here:
> >
> > (defun rcd-string-clean-whitespace (s)
> >   "Return trimmed string S after cleaning whitespaces."
> >   (replace-regexp-in-string 
> >    (rx (one-or-more blank))
> 
> The newline is missing.

This is the version that works. I can interchange (any whitespace) with
(any blank), but neither of them recognizes "\n" as whitespace so far.

(defun rcd-string-clean-whitespace (s)
  "Return trimmed string S after cleaning whitespaces."
  (replace-regexp-in-string 
   (rx (one-or-more (or "\n" (any whitespace))))
   " "
   (string-trim s)))

-- 
Jean
