regular expression

unofficial mirror of help-gnu-emacs@gnu.org

* regular expression
@ 2010-01-25 11:00 Burkhard Schultheis
  2010-01-25 12:35 ` Nuno J. Silva
  2010-01-25 18:12 ` Stefan Monnier
  0 siblings, 2 replies; 17+ messages in thread
From: Burkhard Schultheis @ 2010-01-25 11:00 UTC (permalink / raw)
  To: help-gnu-emacs

I want to search for the following string: A hyphen not surrounded by 
spaces. Therefore I tried the following pattern:
[^ ]-[^ ].
But this pattern finds a hyphen preceded by a letter and followed by a 
newline character, too.

How to exclude the newline character? I tried
[^ ]-[^ \n]
but that does not work. Why? And how to search for this?

Thank you in advance!

Regards
Burkhard


* Re: regular expression
  2010-01-25 11:00 Burkhard Schultheis
@ 2010-01-25 12:35 ` Nuno J. Silva
  2010-01-25 12:53   ` Helmut Eller
  2010-01-25 16:34   ` Burkhard Schultheis
  2010-01-25 18:12 ` Stefan Monnier
  1 sibling, 2 replies; 17+ messages in thread
From: Nuno J. Silva @ 2010-01-25 12:35 UTC (permalink / raw)
  To: help-gnu-emacs

Burkhard Schultheis <burkhard.schultheis@web.de> writes:

> I want to search for the following string: A hyphen not surrounded by
> spaces. Therefore I tried the following pattern:
> [^ ]-[^ ].
> But this pattern finds a hyphen preceded by a letter and followed by a
> newline character, too.
>
> How to exclude the newline character? I tried
> [^ ]-[^ \n]
> but that does not work. Why? And how to search for this?

I made up some test text in a buffer, and tried search-forward-regexp
with 

[^ ]-[^ \n]

and

[^ ]-[^ 
]

The second one works here. It has the result of hitting C-q C-j, instead
of \n, because there are other places where emacs won't match a newline
with \n, and needs this.

-- 
Nuno J. Silva
gopher://sdf-eu.org/1/users/njsg
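A minimal sketch of why the two patterns behave differently, written as Elisp (evaluate it with M-: or in emacs --batch; the my- names are hypothetical). Text typed at a minibuffer prompt is taken literally, so typing [^ \n] there produces a regexp containing a backslash followed by the letter n; written as an Elisp string literal that is "[^ \\n]". Inside a bracket expression Emacs treats the backslash as an ordinary character, so that set excludes space, backslash and n, but not newline. Typing C-q C-j instead inserts a real newline, which corresponds to "[^ \n]" in Lisp.

```elisp
;; What typing [^ \n] at a prompt produces: the characters
;; [ ^ SPC \ n ].  In an Elisp literal the backslash must be doubled.
(defconst my-typed-regexp "[^ \\n]"
  "Hypothetical name: the regexp as typed, with a literal backslash.")

;; What typing [^ C-q C-j ] produces: a real newline inside the set.
(defconst my-newline-regexp "[^ \n]"
  "Hypothetical name: the regexp containing an actual newline.")

;; Backslash is not special inside [...], so `my-typed-regexp'
;; still matches a newline but rejects the letter "n".
(list (string-match-p my-typed-regexp "\n")    ; 0: newline is matched
      (string-match-p my-typed-regexp "n")     ; nil: "n" is excluded
      (string-match-p my-newline-regexp "\n")) ; nil: newline is excluded
```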



* Re: regular expression
  2010-01-25 12:35 ` Nuno J. Silva
@ 2010-01-25 12:53   ` Helmut Eller
  2010-01-26 19:46     ` Nuno J. Silva
  2010-01-25 16:34   ` Burkhard Schultheis
  1 sibling, 1 reply; 17+ messages in thread
From: Helmut Eller @ 2010-01-25 12:53 UTC (permalink / raw)
  To: help-gnu-emacs

* Nuno J. Silva [2010-01-25 13:35+0100] writes:

> The second one works here. It has the result of hitting C-q C-j, instead
> of \n, because there are other places where emacs won't match a newline
> with \n, and needs this.

Have you tried to include \r in the regexp?

Helmut



* Re: regular expression
  2010-01-25 12:35 ` Nuno J. Silva
  2010-01-25 12:53   ` Helmut Eller
@ 2010-01-25 16:34   ` Burkhard Schultheis
  1 sibling, 0 replies; 17+ messages in thread
From: Burkhard Schultheis @ 2010-01-25 16:34 UTC (permalink / raw)
  To: help-gnu-emacs

On 25.01.2010 13:35, Nuno J. Silva wrote:
> Burkhard Schultheis <burkhard.schultheis@web.de> writes:
>
>> I want to search for the following string: A hyphen not surrounded by
>> spaces. Therefore I tried the following pattern:
>> [^ ]-[^ ].
>> But this pattern finds a hyphen preceded by a letter and followed by a
>> newline character, too.
>>
>> How to exclude the newline character? I tried
>> [^ ]-[^ \n]
>> but that does not work. Why? And how to search for this?
>
> I made up some test text in a buffer, and tried search-forward-regexp
> with
>
> [^ ]-[^ \n]
>
> and
>
> [^ ]-[^
> ]
>
> The second one works here. It has the result of hitting C-q C-j, instead
> of \n, because there are other places where emacs won't match a newline
> with \n, and needs this.
>

Here too! Thank you very much!

Regards
Burkhard



* Re: regular expression
  2010-01-25 11:00 Burkhard Schultheis
  2010-01-25 12:35 ` Nuno J. Silva
@ 2010-01-25 18:12 ` Stefan Monnier
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Monnier @ 2010-01-25 18:12 UTC (permalink / raw)
  To: help-gnu-emacs

> How to exclude the newline character? I tried
> [^ ]-[^ \n]
> but that does not work. Why? And how to search for this?

\n is the representation of a newline inside an Elisp string, so it will
work if you do (re-search-forward "[^ ]-[^ \n]").  If it didn't work for
you, it's most likely because you typed that text not within an Elisp
string but at a minibuffer prompt, where there are no such backslash
escapes.  Instead, you want to hit C-q C-j to insert an actual
newline.

        Stefan
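As Stefan says, inside an Elisp string \n is read as a newline, so the original pattern works from Lisp. A small sketch using only built-in functions (my-hyphen-positions is a hypothetical name) that collects every place where a hyphen is not surrounded by spaces:

```elisp
(defun my-hyphen-positions (text)
  "Return start positions in TEXT of a hyphen not surrounded by spaces.
Each match is the character before the hyphen, the hyphen, and the
character after it, which may be neither a space nor a newline.
Hypothetical helper for illustration."
  (let ((regexp "[^ ]-[^ \n]")   ; \n is read here as a real newline
        (start 0)
        (hits '()))
    (while (string-match regexp text start)
      (push (match-beginning 0) hits)
      ;; Step one character past the match start so that
      ;; overlapping matches such as "a-b-c" are all found.
      (setq start (1+ (match-beginning 0))))
    (nreverse hits)))
```

For example, (my-hyphen-positions "well-known") returns (3), while "line-\nbreak" and "a - b" give nil. Note that the [^ ] on the left still accepts a newline, so a hyphen at the start of a line can match; "[^ \n]-[^ \n]" would reject that case too.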


* Re: regular expression
  2010-01-25 12:53   ` Helmut Eller
@ 2010-01-26 19:46     ` Nuno J. Silva
  0 siblings, 0 replies; 17+ messages in thread
From: Nuno J. Silva @ 2010-01-26 19:46 UTC (permalink / raw)
  To: help-gnu-emacs

Helmut Eller <eller.helmut@gmail.com> writes:

> * Nuno J. Silva [2010-01-25 13:35+0100] writes:
>
>> The second one works here. It has the result of hitting C-q C-j, instead
>> of \n, because there are other places where emacs won't match a newline
>> with \n, and needs this.
>
> Have you tried to include \r in the regexp?

I don't know if parentheses are the right way to group strings in
regexps, but none of the following worked (in the place where \n was):

\n\r

(\n\r)

(\r\n)

The last one looks silly; I wonder if any system uses such a line
break. The first one would catch either of the characters. As this is
UNIX, I doubt there are carriage returns in my files (but Emacs could
use its own internal convention for line breaks).

-- 
Nuno J. Silva
gopher://sdf-eu.org/1/users/njsg
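Two points here, sketched in Elisp (the helper name is hypothetical). In Emacs regexps, bare parentheses are ordinary characters; grouping is written \( ... \), doubled to \\( ... \\) inside an Elisp string. And since a bracket expression is a set of single characters, excluding both carriage return and newline just means putting both into the set. (CR LF is the DOS/Windows line-break convention; Emacs normally converts it when visiting such a file, so buffer text usually contains only newlines.)

```elisp
(defun my-paren-and-cr-demo ()
  "Contrast literal parentheses with \\( \\) groups, and exclude CR and LF.
Hypothetical helper for illustration."
  (list
   (string-match-p "(ab)" "(ab)")             ; 0: parens match themselves
   (string-match-p "(ab)" "ab")               ; nil: text has no parens
   (string-match-p "\\(ab\\)" "ab")           ; 0: a real group
   ;; Space, CR and LF in one set: no match when the hyphen is
   ;; followed by a CR LF line ending.
   (string-match-p "[^ ]-[^ \r\n]" "x-\r\n")  ; nil
   (string-match-p "[^ ]-[^ \r\n]" "x-y")))   ; 0
```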


* Re: regular expression
       [not found] <d5b8df44-60fd-4b8f-83d1-cb7d04b2a7b4@googlegroups.com>
@ 2014-06-30 20:04 ` Emanuel Berg
  2014-06-30 20:13   ` Emanuel Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Emanuel Berg @ 2014-06-30 20:04 UTC (permalink / raw)
  To: help-gnu-emacs

renato.pontefice@gmail.com writes:

> Hi, I'm new to this group.
>
> I know that I can use regexps in Emacs, and I would
> like to. Can someone help me?
>
> I have a text file that is a converted .pdf
> file, so it has many stray characters inside.
>
> I've found some regular expressions.
>
> E.g., in a line like this:
>
> 40 STREET DW...
>
> I want to make a substitution like this:
>
> 40#STREEDW...
>
> Can someone help me build this expression?

I suspect it is better to use gnu.emacs.help for this
kind of question, as that group is much more
active. Therefore, I am posting this to both groups.
You can remove the crosspost later, depending on where
the action is from now on.

As for your question, you only give one example, so I
had to guess a bit at what the general case is. For a
single case, you might as well use one plain (non-regexp)
search-and-replace, right? But I suspect you want to do
this for all cases like these:

40 STREET DW
6 ROAD EW
666 A Z
666 a z

So try the below command:

(replace-regexp
 "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)"
 "\\1#\\2\\3")

Here is how it works:

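[...] is a character set, which matches any one of the
characters or ranges inside it: [0-9] is any digit,
[A-Z] any letter from A to Z

+ means "one or more (but never zero) of the preceding
item"

a space matches a literal space

\\(...\\) is a group. Groups are numbered from left to
right, and the "replace with" expression can refer to
them: \\1 inserts the text matched by group 1, and so on.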

Note that [A-Z] matches [a-z] as well (the lowercase
equivalent) unless the variable case-fold-search is
nil. If you want case-sensitive matching (where [A-Z]
makes sense), you can wrap the command like this:

(let ((case-fold-search nil))
  (replace-regexp
   "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)"
   "\\1#\\2\\3"))

You can watch this in action by running it on the
examples above - see how now, the "a z" one is left
alone!

Yes, you can do this without writing code, but it is
easier to write it in code and execute it. The reason
is that you get a better overview, and it is easier to
adjust the regexp (both the match and replacement parts),
which you often have to do a couple of times to get it
right. That is easier than typing it all in again and
again interactively.

Come back with more questions if you have any. Otherwise,
tell us if you got it to work. Good luck!

-- 
underground experts united:
http://user.it.uu.se/~embe8573
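The same regexp and replacement can be tried on a string, without touching any buffer, using the built-in replace-regexp-in-string. A sketch (my-join-address is a hypothetical name): it binds case-fold-search to nil as in the second version above, and passes t as the FIXEDCASE argument so the replacement is inserted exactly as written.

```elisp
(defun my-join-address (line)
  "Rewrite LINE like \"40 STREET DW\" as \"40#STREETDW\".
Hypothetical helper: the regexp from the message, applied to a string."
  (let ((case-fold-search nil))        ; make [A-Z] uppercase-only
    (replace-regexp-in-string
     "\\([0-9]+\\) \\([A-Z]+\\) \\([A-Z]+\\)"
     "\\1#\\2\\3"
     line
     t)))                              ; FIXEDCASE: keep replacement case
```

For example, (my-join-address "6 ROAD EW") returns "6#ROADEW", while "666 a z" comes back unchanged.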


* Re: regular expression
  2014-06-30 20:04 ` Emanuel Berg
@ 2014-06-30 20:13   ` Emanuel Berg
  2014-06-30 20:36     ` Teemu Likonen
       [not found]     ` <mailman.4605.1404160609.1147.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 17+ messages in thread
From: Emanuel Berg @ 2014-06-30 20:13 UTC (permalink / raw)
  To: help-gnu-emacs

Emanuel Berg <embe8573@student.uu.se> writes:

> Note that [A-Z] matches [a-z] as well (the lowercase
> equivalent) unless the variable case-fold-search is
> nil.

It seems this isn't so. Well, all the better as it is
less confusing. [A-Za-z] is all the a-z chars in both
upper- and lowercase, and [A-Z] isn't [a-z].

The guys at gnu.emacs.help will clarify this if
needed...

-- 
underground experts united:
http://user.it.uu.se/~embe8573
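Both observations can probably be reconciled; a minimal sketch, assuming a reasonably recent Emacs (the helper name is hypothetical). With plain string-match-p or re-search-forward, a non-nil case-fold-search does make [A-Z] match lowercase letters too. The replace commands, however, are documented to match independently of case only when case-fold-search is non-nil and the regexp has no uppercase letters; [A-Z] contains two, which is likely why replace-regexp left the lowercase line alone.

```elisp
(defun my-range-matches-p (regexp text fold)
  "Return non-nil if REGEXP matches TEXT with `case-fold-search' bound to FOLD.
Hypothetical helper for illustration."
  (let ((case-fold-search fold))
    (and (string-match-p regexp text) t)))

(list (my-range-matches-p "[A-Z]" "a" t)     ; t: folding applies to ranges
      (my-range-matches-p "[A-Z]" "a" nil))  ; nil: case-sensitive
```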


* Re: regular expression
  2014-06-30 20:13   ` Emanuel Berg
@ 2014-06-30 20:36     ` Teemu Likonen
       [not found]     ` <mailman.4605.1404160609.1147.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 17+ messages in thread
From: Teemu Likonen @ 2014-06-30 20:36 UTC (permalink / raw)
  To: help-gnu-emacs


Emanuel Berg [2014-06-30 22:13:09 +02:00] wrote:

> Emanuel Berg <embe8573@student.uu.se> writes:
>> Note that [A-Z] matches [a-z] as well (the lowercase equivalent)
>> unless the variable case-fold-search is nil.
>
> It seems this isn't so. Well, all the better as it is less confusing.
> [A-Za-z] is all the a-z chars in both upper- and lowercase, and [A-Z]
> isn't [a-z].

It's very likely that when one writes [A-Za-z] he actually means "match
any letter". So it's often better to write [[:alpha:]] instead since it
also matches other letters than A-Z. Sometimes a good option is \w which
matches word-constituent characters as defined in the current syntax
table.



* Re: regular expression
       [not found]     ` <mailman.4605.1404160609.1147.help-gnu-emacs@gnu.org>
@ 2014-06-30 20:52       ` Emanuel Berg
  2014-06-30 21:04         ` Teemu Likonen
       [not found]         ` <mailman.4609.1404162300.1147.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 17+ messages in thread
From: Emanuel Berg @ 2014-06-30 20:52 UTC (permalink / raw)
  To: help-gnu-emacs

Teemu Likonen <tlikonen@iki.fi> writes:

> It's very likely that when one writes [A-Za-z] he
> actually means "match any letter". So it's often
> better to write [[:alpha:]] instead since it also
> matches other letters than A-Z. Sometimes a good
> option is \w which matches word-constituent
> characters as defined in the current syntax table.

I actually think A-Z and so on is clearer to read and
faster to write, but good point: the character classes
probably make for more "portable" code, because
[[:alpha:]] includes the Swedish letters ä, å, and ö as
well (and the German, Finnish, etc., I would suppose).
I don't know if this somehow depends on your locale
(that'd be impressive) or if [[:alpha:]] is just very
generous. Like I said, A-Z is clearer! But you should
use [[:alpha:]], of course. Relevant to this particular
question, there are also [[:upper:]], [[:lower:]],
[[:digit:]], and [[:space:]]. Note the double square
brackets: otherwise [:digit:] would be the set of the
characters :, d, i, g, i, t. So it could look like
this: [[:alpha:][:digit:]]

(replace-regexp "[[:alpha:][:digit:]]" "x")

One1 (will be xxxx)

PS. Boy, I really hope the OP did as I said and came to
gnu.emacs.help! DS.

-- 
underground experts united:
http://user.it.uu.se/~embe8573
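A short check of the [[:alpha:]] point (a sketch; the helper name is hypothetical). According to the Emacs Lisp manual, for non-ASCII characters [:alpha:] is decided by Emacs's own character data (the Unicode general category in recent versions), not by the system locale. The Swedish letters are written with \u escapes so the example does not depend on the file's encoding. Passing t as FIXEDCASE keeps the "x" lowercase; without it, replace-match may change the case of the replacement to follow an uppercase letter such as the "O".

```elisp
(defun my-class-demo ()
  "Compare an ASCII range with [[:alpha:]] on Swedish letters.
Hypothetical helper for illustration."
  (let ((case-fold-search nil)
        (swedish "\u00e5\u00e4\u00f6"))   ; the letters å, ä and ö
    (list
     ;; \\` and \\' anchor the match at the start and end of the string.
     (string-match-p "\\`[A-Za-z]+\\'" swedish)     ; nil: ASCII only
     (string-match-p "\\`[[:alpha:]]+\\'" swedish)  ; 0: any letter
     ;; The example from the message, applied to a string.
     (replace-regexp-in-string "[[:alpha:][:digit:]]" "x" "One1" t))))
```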


* Re: regular expression
  2014-06-30 20:52       ` Emanuel Berg
@ 2014-06-30 21:04         ` Teemu Likonen
       [not found]         ` <mailman.4609.1404162300.1147.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 17+ messages in thread
From: Teemu Likonen @ 2014-06-30 21:04 UTC (permalink / raw)
  To: help-gnu-emacs


Emanuel Berg [2014-06-30 22:52:41 +02:00] wrote:

> (replace-regexp "[[:alpha:][:digit:]]" "x")
>
> One1 (will be xxxx)

I think [[:alpha:][:digit:]] is the same thing as [[:alnum:]].
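For ASCII text the two forms are interchangeable, which a quick sketch can check by trying every printable ASCII character (the helper name is hypothetical). Per the Emacs Lisp manual, [:digit:] matches only 0 through 9, while in recent versions [:alnum:] also covers non-ASCII decimal digits, so the equivalence is safest to rely on for ASCII.

```elisp
(defun my-classes-agree-on-ascii-p ()
  "Non-nil if [[:alpha:][:digit:]] and [[:alnum:]] agree on printable ASCII.
Hypothetical helper for illustration."
  (let ((agree t))
    (dotimes (i 95)                       ; characters 32 (SPC) .. 126 (~)
      (let ((s (string (+ 32 i))))
        (unless (eq (null (string-match-p "[[:alpha:][:digit:]]" s))
                    (null (string-match-p "[[:alnum:]]" s)))
          (setq agree nil))))
    agree))
```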



* Re: regular expression
       [not found]         ` <mailman.4609.1404162300.1147.help-gnu-emacs@gnu.org>
@ 2014-06-30 21:11           ` Emanuel Berg
  0 siblings, 0 replies; 17+ messages in thread
From: Emanuel Berg @ 2014-06-30 21:11 UTC (permalink / raw)
  To: help-gnu-emacs

Teemu Likonen <tlikonen@iki.fi> writes:

>> (replace-regexp "[[:alpha:][:digit:]]" "x") One1
>> (will be xxxx)
>
> I think [[:alpha:][:digit:]] is the same thing as
> [[:alnum:]].

OK, I made up the example so the OP would understand
the brackets: the outer pair makes it a character set,
and each inner [:name:] adds a class of characters to
that set. So you'd not think [[:alnum:]] is "atomic"
syntax... I don't know how to explain it better. The
example is clearer, but of course examples should make
sense as well...

-- 
underground experts united:
http://user.it.uu.se/~embe8573
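The bracket structure can be checked directly (a sketch; the helper name is hypothetical): the outer brackets form the set, and [:digit:] only means "a digit" inside such a set. On its own, [:digit:] is just the set of the characters :, d, i, g and t, as noted earlier in the thread.

```elisp
(defun my-bracket-demo ()
  "Show that [:digit:] is only a class inside an outer bracket pair.
Hypothetical helper for illustration."
  (list
   (string-match-p "[[:digit:]]" "5")  ; 0: class inside a set matches digits
   (string-match-p "[:digit:]" "5")    ; nil: a set of : d i g t, no digits
   (string-match-p "[:digit:]" "g")))  ; 0: "g" is in that set
```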


* Re: regular expression
@ 2014-06-30 23:14 Tak Kunihiro
  0 siblings, 0 replies; 17+ messages in thread
From: Tak Kunihiro @ 2014-06-30 23:14 UTC (permalink / raw)
  To: renato.pontefice; +Cc: help-gnu-emacs

> 40 STREET DW...
>
> I want to make a substitution like this:
>
> 40#STREEDW...
>
> Can someone help me build this expression?

Use `re-builder' and `query-replace-regexp'.

Experiment in (re-builder) with the following setup, then copy the
regexp that is inside the double quotes. Call (query-replace-regexp)
for the substitution.

(setq reb-re-syntax 'string)




* Re: regular expression
       [not found] <mailman.4622.1404173952.1147.help-gnu-emacs@gnu.org>
@ 2014-07-02  8:33 ` Emanuel Berg
  0 siblings, 0 replies; 17+ messages in thread
From: Emanuel Berg @ 2014-07-02  8:33 UTC (permalink / raw)
  To: help-gnu-emacs

Tak Kunihiro <tak.kunihiro@gmail.com> writes:

> Use `re-builder' and `query-replace-regexp'.
>
> Play on (re-builder) with following setup, then copy
> regexp that is inside of ".  Call
> (query-replace-regexp) for substitution.
>
> (setq reb-re-syntax 'string)

re-builder is a gorgeous tool; if it were a girl I would
hit on it (her). However, compare two workflows:

Method one:

1. bring up re-builder
2. start typing - ah, lots of highlights
   immediately...!
3. maybe my idea for a regexp wasn't that good, I have
   to rethink it (?)
4. repeat 2-3 (the highlights of an incomplete regexp
   bouncing around distract me when I should be typing
   and thinking about what I type)
5. OK, that looks fine; check the screen to see that it
   caught all cases (the point of regexps is that the
   computer does this, but OK, I'll do it - why else
   would it show them?)
6. OK, I think it did catch all (I didn't check them all,
   I'm not a machine, but it looks good)
7. put the regexp into query-replace-regexp
8. answer "yes" or "no" to them one by one

Method two:

1. think/write code
2. execute it

We shouldn't be afraid of the batch, one-hit
tools. They are not more difficult to master, and even
before we get there, they are less error-prone than the
interactive pop-shooting all over the place, "I
caught them all (almost, I think)". Be brave!

But that's not all; check out this, which I just wrote:

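(require 'cl-lib)                ; provides `cl-incf' (the old `incf')

(progn
  (insert
   (let ((case-fold-search nil)
         (label "Totalt: ")
         (birds 0))
     (save-excursion
       ;; Remove the previous "Totalt: N" line and the blank line after it.
       (replace-regexp (format "^%s[[:digit:]]*\n\n" label) "")
       ;; Count lines starting with a lowercase letter (one bird per
       ;; line), searching forward from point after the replacement.
       (while (re-search-forward "^[a-z]" (point-max) t)
         (cl-incf birds)))
     ;; Build the new total; `insert' puts it at the original point.
     (format "\n\n%s%d" label birds)))
  (save-buffer))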

Check it out in context here to see what it does:

http://user.it.uu.se/~embe8573/birds.txt

Yes: by getting better at regexps through simple cases
in code (like search-and-replace with patterns), it
turns out the regexp can be used to keep track of a
small database of birds and keep a count in sync! (I've
heard you need 300 to be a big boy - but even they had
to pass 31 at some point, so I'll be there in a couple
of decades.)

Lesson to the computer kids of the world: do the easy
things the (seemingly) difficult but in reality best
way; very soon the difficult things will be easy with
the exact same method. Remember, "What you once feared,
now makes you free. Do it today - in a different way!"

I'm getting off the soapbox; thanks for this session,
Emacs people.

-- 
underground experts united:
http://user.it.uu.se/~embe8573
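The counting loop in the code above could also be expressed with the built-in how-many, which returns the number of matches for a regexp between two positions. A sketch that counts over the whole buffer rather than from point (my-count-birds is a hypothetical name); case-fold-search is bound to nil so that the "Totalt: ..." line, which starts with an uppercase letter, is not counted.

```elisp
(defun my-count-birds ()
  "Count lines in the current buffer that start with a lowercase letter.
Hypothetical helper: an alternative to the `while' loop, using `how-many'."
  (let ((case-fold-search nil))          ; so ^[a-z] skips "Totalt: ..."
    (how-many "^[a-z]" (point-min) (point-max))))
```

For example, in a buffer containing "crow\nmagpie\n\nTotalt: 2\n", (my-count-birds) returns 2.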


* Re: regular expression
@ 2014-07-02 13:10 Tak Kunihiro
  2014-07-02 13:43 ` Stefan Monnier
  0 siblings, 1 reply; 17+ messages in thread
From: Tak Kunihiro @ 2014-07-02 13:10 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: embe8573

I think a work flow from (re-builder) to (query-replace-regexp) should
be by default in Emacs.

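Some tweaks are necessary, but the idea is demonstrated in
 https://gist.github.com/mooz/890562

I suggest the following key bindings.
 (global-set-key (kbd "C-M-%") #'re-builder)
 (with-eval-after-load 're-builder
   ;; `my-reb-query-replace-regexp' is not part of Emacs; it has to be
   ;; defined separately (e.g. adapted from the gist above).
   (define-key reb-mode-map (kbd "<return>") #'my-reb-query-replace-regexp))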


* Re: regular expression
  2014-07-02 13:10 Tak Kunihiro
@ 2014-07-02 13:43 ` Stefan Monnier
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Monnier @ 2014-07-02 13:43 UTC (permalink / raw)
  To: help-gnu-emacs

> I think a work flow from (re-builder) to (query-replace-regexp) should
> be by default in Emacs.

Maybe the right way to do it is to integrate re-builder into isearch
(when searching with a regexp), to benefit from the existing work flow
from isearch to query-replace.


        Stefan





* Re: regular expression
@ 2014-07-02 23:14 Tak Kunihiro
  0 siblings, 0 replies; 17+ messages in thread
From: Tak Kunihiro @ 2014-07-02 23:14 UTC (permalink / raw)
  To: help-gnu-emacs

> > I think a work flow from (re-builder) to (query-replace-regexp) should
> > be by default in Emacs.
> 
> Maybe the right way to do it is to integrate re-builder into isearch
> (when searching with a regexp), to benefit from the existing work flow
> from isearch to query-replace.

A window dedicated to re-builder can be regarded as a minibuffer.

With the following setup, it behaves almost like isearch with a regexp.
(with-eval-after-load 're-builder
  (define-key reb-mode-map (kbd "C-s") #'reb-next-match)
  (define-key reb-mode-map (kbd "C-r") #'reb-prev-match)
  (define-key reb-mode-map (kbd "C-g") #'reb-quit)
  ;; `my-query-replace-regexp' is a user-defined command, not part of Emacs.
  (define-key reb-mode-map (kbd "<return>") #'my-query-replace-regexp))

A minor concern is that re-builder searches from the beginning of the
current buffer instead of from point.

I think binding re-builder to C-M-s is a good idea, although writing
the code is beyond my capability.

