* Documentation on debugging regexp performance
@ 2016-01-21 5:29 Clément Pit--Claudel
2016-01-21 6:36 ` Yuri Khan
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Clément Pit--Claudel @ 2016-01-21 5:29 UTC (permalink / raw)
To: Emacs developers
[-- Attachment #1.1: Type: text/plain, Size: 571 bytes --]
Hi emacs-devel,
I'm running into a surprising regular expressions issue. I have attached a file (~50k) in which (re-search-forward " +[^:=]+ +:=?") seems to be extremely slow. (I killed it after 30 seconds). Truncating the file to its first 20 lines reduces the time for re-search-forward to about a second, which is still extremely slow.
Are there good resources on how to rewrite regexps to make them Emacs-friendly? I didn't find such documentation, and I'm puzzled as to what could make the regexp above hard to re-search-forward for.
Cheers,
Clément.
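One way to reproduce timings like the ones described above is to time a single search with `benchmark-run` from Emacs's standard `benchmark` library. This is a sketch. The helper name `my/time-regexp-search` and its optional LINES argument, which mimics truncating the file to its first lines, are hypothetical and exist only for illustration.

```elisp
(require 'benchmark)

(defun my/time-regexp-search (regexp &optional lines)
  "Return the seconds one `re-search-forward' for REGEXP takes.
The search starts at the beginning of the accessible part of the
current buffer.  If LINES is non-nil, temporarily narrow to the
first LINES lines, which mimics truncating the file.
Hypothetical helper for illustration only."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (when lines
        ;; With argument N, `line-beginning-position' first moves
        ;; N - 1 lines forward, so this is the start of line LINES + 1
        ;; (or the end of the buffer if it is shorter).
        (narrow-to-region (point-min) (line-beginning-position (1+ lines))))
      (goto-char (point-min))
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
      (car (benchmark-run 1 (re-search-forward regexp nil t))))))
```

In a buffer visiting the attached file, `(my/time-regexp-search " +[^:=]+ +:=?" 20)` would time the search on the first 20 lines only. A pathological search still blocks Emacs until it returns, so it is worth starting with a small LINES value.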
[-- Attachment #1.2: large-goal --]
[-- Type: text/plain, Size: 48957 bytes --]
Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (.. -> ..) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig .. => Fin.t n0 -> methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n0 t0 => fun p1 : Fin.t n0 => nth_fix n0 t0 p1
end p'0
end v') n t p0
end p'
end
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true false true true true false)
(String (Ascii true true true true false true true false)
(String (Ascii true true true true true false true false)
(String (Ascii true true false false true true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true false false true false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false) EmptyString)))))))))
(@nil Type)
(@Some Type
match HSLM return Type with
| Build_StringLikeMin String0 _ _ => String0
end)) (S (S (S (S (S (S O))))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true false false false true true false)
(String (Ascii false false false true false true true false)
(String (Ascii true false false false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false true true false true true false)
(String (..) (..)))))))))))
(@cons Type nat (@cons Type (ascii -> bool) (@nil Type)))
(@Some Type bool))
(S (S (S (S (S O)))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true true false false true true false)
(String (Ascii true false true false false true true false)
(String (Ascii false false true false true true true false) EmptyString)))
(@cons Type nat (@nil Type))
(@Some Type ascii))
(S (S (S (S O))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true true false true true false)
(String
(Ascii true false true false false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false false false true false true true false) EmptyString))))))
(@nil Type)
(@Some Type nat))
(S (S (S O)))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false true true true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii true true false true false true true false)
(String
(Ascii true false true false false true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S (S O))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true false true true false)
(String
(Ascii false false false false true true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S O)
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii true true false false true true true false)
(String
(Ascii false false false false true true true false)
(String
(Ascii false false true true false true true false)
(String
(Ascii true false false true false true true false)
(String (..) (..))))))
(@cons
Type
(prod nat (prod nat nat))
(@cons Type nat (@cons Type nat (@nil Type))))
(@Some Type (list nat))) O
(VectorDef.nil methSig))))))))
return (list Type)
with
| Build_methSig _ methDom _ => methDom
end
match
match idx in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n _ => h
end
| Fin.FS q p' =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => Fin.t n -> methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n t =>
fun p0 : Fin.t n =>
(fix nth_fix (m : nat) (v' : Vector.t methSig m) (p : Fin.t m) {struct v'} :
methSig :=
match p in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q0 =>
fun v0 : Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (Vector.t methSig m1 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig (..) => methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n0 _ => h
end
| Fin.FS q0 p'0 =>
fun v0 : Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (.. -> ..) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig .. => Fin.t n0 -> methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n0 t0 => fun p1 : Fin.t n0 => nth_fix n0 t0 p1
end p'0
end v') n t p0
end p'
end
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true false true true true false)
(String (Ascii true true true true false true true false)
(String (Ascii true true true true true false true false)
(String (Ascii true true false false true true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true false false true false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false) EmptyString)))))))))
(@nil Type)
(@Some Type
match HSLM return Type with
| Build_StringLikeMin String0 _ _ => String0
end)) (S (S (S (S (S (S O))))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true false false false true true false)
(String (Ascii false false false true false true true false)
(String (Ascii true false false false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false true true false true true false)
(String (..) (..)))))))))))
(@cons Type nat (@cons Type (ascii -> bool) (@nil Type)))
(@Some Type bool))
(S (S (S (S (S O)))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true true false false true true false)
(String (Ascii true false true false false true true false)
(String (Ascii false false true false true true true false) EmptyString)))
(@cons Type nat (@nil Type))
(@Some Type ascii))
(S (S (S (S O))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true true false true true false)
(String
(Ascii true false true false false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false false false true false true true false) EmptyString))))))
(@nil Type)
(@Some Type nat))
(S (S (S O)))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false true true true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii true true false true false true true false)
(String
(Ascii true false true false false true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S (S O))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true false true true false)
(String
(Ascii false false false false true true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S O)
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii true true false false true true true false)
(String
(Ascii false false false false true true true false)
(String
(Ascii false false true true false true true false)
(String
(Ascii true false false true false true true false)
(String (..) (..))))))
(@cons
Type
(prod nat (prod nat nat))
(@cons Type nat (@cons Type nat (@nil Type))))
(@Some Type (list nat))) O
(VectorDef.nil methSig))))))))
return (option Type)
with
| Build_methSig _ _ methCod => methCod
end)
(@Fin.FS (S (S (S (S (S (S O))))))
(@Fin.FS (S (S (S (S (S O)))))
(@Fin.FS (S (S (S (S O))))
(@Fin.FS (S (S (S O))) (@Fin.FS (S (S O)) (@Fin.FS (S O) (@Fin.F1 O)))))))))
(@snd (list Type) (option Type)
((fun idx : Fin.t (S (S (S (S (S (S (S O))))))) =>
@pair (list Type) (option Type)
match
match idx in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n _ => h
end
| Fin.FS q p' =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => Fin.t n -> methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n t =>
fun p0 : Fin.t n =>
(fix nth_fix (m : nat) (v' : Vector.t methSig m) (p : Fin.t m) {struct v'} :
methSig :=
match p in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q0 =>
fun v0 : Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (Vector.t methSig m1 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig (..) => methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n0 _ => h
end
| Fin.FS q0 p'0 =>
fun v0 : Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (.. -> ..) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig .. => Fin.t n0 -> methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n0 t0 => fun p1 : Fin.t n0 => nth_fix n0 t0 p1
end p'0
end v') n t p0
end p'
end
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true false true true true false)
(String (Ascii true true true true false true true false)
(String (Ascii true true true true true false true false)
(String (Ascii true true false false true true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true false false true false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false) EmptyString)))))))))
(@nil Type)
(@Some Type
match HSLM return Type with
| Build_StringLikeMin String0 _ _ => String0
end)) (S (S (S (S (S (S O))))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true false false false true true false)
(String (Ascii false false false true false true true false)
(String (Ascii true false false false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false true true false true true false)
(String (..) (..)))))))))))
(@cons Type nat (@cons Type (ascii -> bool) (@nil Type)))
(@Some Type bool))
(S (S (S (S (S O)))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true true false false true true false)
(String (Ascii true false true false false true true false)
(String (Ascii false false true false true true true false) EmptyString)))
(@cons Type nat (@nil Type))
(@Some Type ascii))
(S (S (S (S O))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true true false true true false)
(String
(Ascii true false true false false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false false false true false true true false) EmptyString))))))
(@nil Type)
(@Some Type nat))
(S (S (S O)))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false true true true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii true true false true false true true false)
(String
(Ascii true false true false false true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S (S O))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true false true true false)
(String
(Ascii false false false false true true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S O)
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii true true false false true true true false)
(String
(Ascii false false false false true true true false)
(String
(Ascii false false true true false true true false)
(String
(Ascii true false false true false true true false)
(String (..) (..))))))
(@cons
Type
(prod nat (prod nat nat))
(@cons Type nat (@cons Type nat (@nil Type))))
(@Some Type (list nat))) O
(VectorDef.nil methSig))))))))
return (list Type)
with
| Build_methSig _ methDom _ => methDom
end
match
match idx in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n _ => h
end
| Fin.FS q p' =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => Fin.t n -> methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n t =>
fun p0 : Fin.t n =>
(fix nth_fix (m : nat) (v' : Vector.t methSig m) (p : Fin.t m) {struct v'} :
methSig :=
match p in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q0 =>
fun v0 : Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (Vector.t methSig m1 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig (..) => methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n0 _ => h
end
| Fin.FS q0 p'0 =>
fun v0 : Vector.t methSig (S q0) =>
match
v0 as v'0 in (Vector.t _ m0)
return
(match m0 as m1 return (.. -> ..) with
| O => fun _ : Vector.t methSig O => False -> True
| S n0 => fun _ : Vector.t methSig .. => Fin.t n0 -> methSig
end v'0)
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons _ n0 t0 => fun p1 : Fin.t n0 => nth_fix n0 t0 p1
end p'0
end v') n t p0
end p'
end
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true false true true true false)
(String (Ascii true true true true false true true false)
(String (Ascii true true true true true false true false)
(String (Ascii true true false false true true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true false false true false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false) EmptyString)))))))))
(@nil Type)
(@Some Type
match HSLM return Type with
| Build_StringLikeMin String0 _ _ => String0
end)) (S (S (S (S (S (S O))))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true false false false true true false)
(String (Ascii false false false true false true true false)
(String (Ascii true false false false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii true true true true true false true false)
(String
(Ascii true false true true false true true false)
(String (..) (..)))))))))))
(@cons Type nat (@cons Type (ascii -> bool) (@nil Type)))
(@Some Type bool))
(S (S (S (S (S O)))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii true true true false false true true false)
(String (Ascii true false true false false true true false)
(String (Ascii false false true false true true true false) EmptyString)))
(@cons Type nat (@nil Type))
(@Some Type ascii))
(S (S (S (S O))))
(VectorDef.cons methSig
(Build_methSig
(String (Ascii false false true true false true true false)
(String
(Ascii true false true false false true true false)
(String
(Ascii false true true true false true true false)
(String
(Ascii true true true false false true true false)
(String
(Ascii false false true false true true true false)
(String
(Ascii false false false true false true true false) EmptyString))))))
(@nil Type)
(@Some Type nat))
(S (S (S O)))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false true true true false)
(String
(Ascii true false false false false true true false)
(String
(Ascii true true false true false true true false)
(String
(Ascii true false true false false true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S (S O))
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii false false true false false true true false)
(String
(Ascii false true false false true true true false)
(String
(Ascii true true true true false true true false)
(String
(Ascii false false false false true true true false) EmptyString))))
(@cons Type nat (@nil Type))
(@None Type))
(S O)
(VectorDef.cons methSig
(Build_methSig
(String
(Ascii true true false false true true true false)
(String
(Ascii false false false false true true true false)
(String
(Ascii false false true true false true true false)
(String
(Ascii true false false true false true true false)
(String (..) (..))))))
(@cons
Type
(prod nat (prod nat nat))
(@cons Type nat (@cons Type nat (@nil Type))))
(@Some Type (list nat))) O
(VectorDef.nil methSig))))))))
return (option Type)
with
| Build_methSig _ _ methCod => methCod
end)
(@Fin.FS (S (S (S (S (S (S O))))))
(@Fin.FS (S (S (S (S (S O)))))
(@Fin.FS (S (S (S (S O))))
(@Fin.FS (S (S (S O))) (@Fin.FS (S (S O)) (@Fin.FS (S O) (@Fin.F1 O)))))))))))
(methCod
(Build_methSig
(String (Ascii true true false false true true true false)
(String (Ascii false false false false true true true false)
(String (Ascii false false true true false true true false)
(String (Ascii true false false true false true true false)
(String (Ascii false false true false true true true false)
(String (Ascii true true false false true true true false) EmptyString))))))
(@fst (list Type) (option Type)
((fun idx : Fin.t (S (S (S (S (S (S (S O))))))) =>
@pair (list Type) (option Type)
match
match idx in (Fin.t m') return (Vector.t methSig m' -> methSig) with
| Fin.F1 q =>
fun v : Vector.t methSig (S q) =>
match
v as v' in (Vector.t _ m)
return
(match m as m0 return (Vector.t methSig m0 -> Type) with
| O => fun _ : Vector.t methSig O => False -> True
| S n => fun _ : Vector.t methSig (S n) => methSig
end v')
with
| Vector.nil => fun devil : False => match devil return True with
end
| Vector.cons h n _ => h
end
| Fin.FS q p' =>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Documentation on debugging regexp performance
2016-01-21 5:29 Documentation on debugging regexp performance Clément Pit--Claudel
@ 2016-01-21 6:36 ` Yuri Khan
2016-01-21 9:39 ` Alexis
2016-01-21 11:42 ` Wolfgang Jenkner
2016-01-21 15:27 ` Alan Mackenzie
2 siblings, 1 reply; 13+ messages in thread
From: Yuri Khan @ 2016-01-21 6:36 UTC (permalink / raw)
To: Clément Pit--Claudel; +Cc: Emacs developers
On Thu, Jan 21, 2016 at 11:29 AM, Clément Pit--Claudel
<clement.pit@gmail.com> wrote:
> Hi emacs-devel,
>
> I'm running into a surprising regular expressions issue. I have attached a file (~50k) in which (re-search-forward " +[^:=]+ +:=?") seems to be extremely slow. (I killed it after 30 seconds). Truncating the file to its first 20 lines reduces the time for re-search-forward to about a second, which is still extremely slow.
I’m no expert on the Emacs regexp implementation, but this part is
ambiguous: "[^:=]+ +". The engine will have to backtrack at least once
because the first part will greedily slurp all spaces, then the second
part will not match. You might want to add the space to the exclusion
character class: "[^:= ]+ +".
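Adding the space to the character class changes which strings match, not just how fast the search runs: the middle part can no longer contain spaces. A quick `string-match` check on a made-up sample string makes the difference visible.

```elisp
(let ((sample "  foo bar :"))   ; made-up sample: two words before the colon
  (list
   ;; Original pattern: [^:=]+ may contain spaces, so it spans "foo bar".
   (and (string-match " +[^:=]+ +:=?" sample) (match-string 0 sample))
   ;; Suggested variant: [^:= ]+ cannot contain spaces, so the earliest
   ;; match starts at the space before "bar".
   (and (string-match " +[^:= ]+ +:=?" sample) (match-string 0 sample))))
;; => ("  foo bar :" " bar :")
```

Whether the narrower pattern is acceptable depends on whether the text before the colon can contain several words.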
* Re: Documentation on debugging regexp performance
2016-01-21 6:36 ` Yuri Khan
@ 2016-01-21 9:39 ` Alexis
2016-01-21 13:22 ` Clément Pit--Claudel
2016-01-21 22:10 ` Marcin Borkowski
0 siblings, 2 replies; 13+ messages in thread
From: Alexis @ 2016-01-21 9:39 UTC (permalink / raw)
To: emacs-devel
Yuri Khan <yuri.v.khan@gmail.com> writes:
>> I'm running into a surprising regular expressions issue. I have
>> attached a file (~50k) in which (re-search-forward " +[^:=]+
>> +:=?") seems to be extremely slow. (I killed it after 30
>> seconds). Truncating the file to its first 20 lines reduces the
>> time for re-search-forward to about a second, which is still
>> extremely slow.
>
> I’m no expert on the Emacs regexp implementation, but this part
> is ambiguous: "[^:=]+ +". The engine will have to backtrack at
> least once because the first part will greedily slurp all
> spaces, then the second part will not match. You might want to
> add the space to the exclusion character class: "[^:= ]+ +".
More generally, i highly recommend Jeffrey Friedl's book
"Mastering Regular Expressions". It's not Emacs-specific, but it
provides in-depth explanations of why certain regexen are time-
and/or space-hungry.
Alexis.
* Re: Documentation on debugging regexp performance
2016-01-21 5:29 Documentation on debugging regexp performance Clément Pit--Claudel
2016-01-21 6:36 ` Yuri Khan
@ 2016-01-21 11:42 ` Wolfgang Jenkner
2016-01-21 16:38 ` Clément Pit--Claudel
2016-01-21 15:27 ` Alan Mackenzie
2 siblings, 1 reply; 13+ messages in thread
From: Wolfgang Jenkner @ 2016-01-21 11:42 UTC (permalink / raw)
To: Clément Pit--Claudel; +Cc: Emacs developers
On Thu, Jan 21 2016, Clément Pit--Claudel wrote:
> I'm running into a surprising regular expressions issue. I have attached a file (~50k) in which (re-search-forward " +[^:=]+ +:=?") seems to be extremely slow. (I killed it after 30 seconds). Truncating the file to its first 20 lines reduces the time for re-search-forward to about a second, which is still extremely slow.
Perhaps you meant
(re-search-forward " +[^:=\n]+ +:=?")
Cf. (info "(elisp) Regexp Special"), in particular the section about
"complemented character alternative".
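The point about complemented character alternatives is that `[^:=]` also matches a newline, so the original pattern can run across line breaks. A small check on a made-up two-line sample shows this:

```elisp
(let ((sample "  foo\nbar :"))   ; made-up sample containing a line break
  (list
   ;; [^:=] matches newline too, so this match spans both lines.
   (and (string-match " +[^:=]+ +:=?" sample) (match-string 0 sample))
   ;; Excluding \n keeps each match on a single line; here none exists.
   (string-match " +[^:=\n]+ +:=?" sample)))
;; => ("  foo\nbar :" nil)
```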
* Re: Documentation on debugging regexp performance
2016-01-21 9:39 ` Alexis
@ 2016-01-21 13:22 ` Clément Pit--Claudel
2016-01-21 22:10 ` Marcin Borkowski
1 sibling, 0 replies; 13+ messages in thread
From: Clément Pit--Claudel @ 2016-01-21 13:22 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 436 bytes --]
On 01/21/2016 04:39 AM, Alexis wrote:
> More generally, i highly recommend Jeffrey Friedl's book "Mastering
> Regular Expressions". It's not Emacs-specific, but it provides
> in-depth explanations of why certain regexen are time- and/or
> space-hungry.
Thanks for the suggestion. I think I do need something Emacs-specific, however: Python's regexp engine has no trouble at all with the example provided; neither does grep's.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
* Re: Documentation on debugging regexp performance
2016-01-21 5:29 Documentation on debugging regexp performance Clément Pit--Claudel
2016-01-21 6:36 ` Yuri Khan
2016-01-21 11:42 ` Wolfgang Jenkner
@ 2016-01-21 15:27 ` Alan Mackenzie
2016-01-21 16:37 ` Clément Pit--Claudel
2 siblings, 1 reply; 13+ messages in thread
From: Alan Mackenzie @ 2016-01-21 15:27 UTC (permalink / raw)
To: Clément Pit--Claudel; +Cc: Emacs developers
Hello, Clément.
On Thu, Jan 21, 2016 at 12:29:58AM -0500, Clément Pit--Claudel wrote:
> Hi emacs-devel,
> I'm running into a surprising regular expressions issue. I have
> attached a file (~50k) in which (re-search-forward " +[^:=]+ +:=?")
> seems to be extremely slow. (I killed it after 30 seconds). Truncating
> the file to its first 20 lines reduces the time for re-search-forward
> to about a second, which is still extremely slow.
> Are there good resources on how to rewrite regexps to make them
> Emacs-friendly? I didn't find such documentation, and I'm puzzled as to
> what could make the regexp above hard to re-search-forward for.
> Cheers,
> Clément.
" +[^:=]+ +:=?" is an ill-formed regexp - if you get lots of spaces in
a non-match, the Emacs regexp engine will try all possible ways of
matching these spaces before giving up. You have three concatenated
sub-expressions, all of which match any number of spaces, namely:
" +[^:=]+ +"
 1122222233
I would suggest reformulating it thus:
" +[^:= ][^:=]+ "
 112222223333334
Subexpression 1 matches ALL the leading spaces. Subexp 2 is exactly one
character which can't be a space. Subexp 3 matches almost anything,
including spaces, and subexp 4 matches a single space at the end (to make
sure there is at least one space there).
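The fragment above leaves out the original trailing `:=?`. Appending it (an assumption made here only so the fragment can be tried on whole strings) and testing two made-up samples shows the intended match and one edge case: subexpressions 2 and 3 together need at least two characters, so a one-character middle part is rejected.

```elisp
;; The fragment from above with ":=?" appended; this combined pattern
;; is an illustration, not a pattern proposed in the thread.
(let ((two-words "  foo bar :=")
      (one-char  "  x :="))
  (list
   (and (string-match " +[^:= ][^:=]+ :=?" two-words)
        (match-string 0 two-words))
   ;; [^:= ][^:=]+ needs two or more characters before the final
   ;; space, so "x" alone cannot match.
   (string-match " +[^:= ][^:=]+ :=?" one-char)))
;; => ("  foo bar :=" nil)
```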
All the best with your regexp!
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Documentation on debugging regexp performance
2016-01-21 15:27 ` Alan Mackenzie
@ 2016-01-21 16:37 ` Clément Pit--Claudel
2016-01-21 17:16 ` Alan Mackenzie
0 siblings, 1 reply; 13+ messages in thread
From: Clément Pit--Claudel @ 2016-01-21 16:37 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 1405 bytes --]
On 01/21/2016 10:27 AM, Alan Mackenzie wrote:
> Hello, Clément.
Hi Alan!
> " +[^:=]+ +:=?" is an ill-formed regexp - if you get lots of spaces in
> a non-match, the Emacs regexp engine will try all possible ways of
> matching these spaces before giving up. You have three concatenated
> sub-expressions, all of which match any number of spaces, namely:
>
> " +[^:=]+ +"
>  1122222233
>
> I would suggest reformulating it thus:
>
> " +[^:= ][^:=]+ "
>  112222223333334
I think this has different semantics: my original regexp requires at least three spaces. But I think prepending spaces to yours fixes that.
>
> Subexpression 1 matches ALL the leading spaces. Subexp 2 is exactly one
> character which can't be a space. Subexp 3 matches almost anything,
> including spaces, and subexp 4 matches a single space at the end (to make
> sure there is at least one space there).
This is helpful, thanks! I realize however that maybe I oversimplified. The issue is that what I really want is something like this:
" +\\([^:=]+\\) +:=?"
IOW, I want to capture that first group.
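With the capturing version, the ambiguity also shows up in what group 1 holds. Because `[^:=]+` is greedy and may contain spaces, it keeps every space before the colon except the one that the following ` +` needs. A made-up sample:

```elisp
(let ((sample "  foo bar  :="))   ; made-up sample: two spaces before the colon
  (when (string-match " +\\([^:=]+\\) +:=?" sample)
    ;; Group 1 backtracks only as far as needed for " +" to match
    ;; once, so one of the two spaces stays inside the group.
    (match-string 1 sample)))
;; => "foo bar "
```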
> All the best with your regexp!
Thanks. Your points about backtracking were helpful as well. Do you know if there are technical reasons why Emacs chooses a backtracking implementation for this regexp (instead of compiling it to a linear-time matcher)?
Clément.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
* Re: Documentation on debugging regexp performance
2016-01-21 11:42 ` Wolfgang Jenkner
@ 2016-01-21 16:38 ` Clément Pit--Claudel
0 siblings, 0 replies; 13+ messages in thread
From: Clément Pit--Claudel @ 2016-01-21 16:38 UTC (permalink / raw)
To: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 560 bytes --]
On 01/21/2016 06:42 AM, Wolfgang Jenkner wrote:
> On Thu, Jan 21 2016, Clément Pit--Claudel wrote:
>
>> I'm running into a surprising regular expressions issue. I have attached a file (~50k) in which (re-search-forward " +[^:=]+ +:=?") seems to be extremely slow. (I killed it after 30 seconds). Truncating the file to its first 20 lines reduces the time for re-search-forward to about a second, which is still extremely slow.
>
> Perhaps you meant
>
> (re-search-forward " +[^:=\n]+ +:=?")
I don't think so; I do want newlines in there.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
* Re: Documentation on debugging regexp performance
2016-01-21 16:37 ` Clément Pit--Claudel
@ 2016-01-21 17:16 ` Alan Mackenzie
2016-01-23 6:12 ` Stefan Monnier
0 siblings, 1 reply; 13+ messages in thread
From: Alan Mackenzie @ 2016-01-21 17:16 UTC (permalink / raw)
To: Clément Pit--Claudel; +Cc: Emacs developers
Hello again Clément.
On Thu, Jan 21, 2016 at 11:37:48AM -0500, Clément Pit--Claudel wrote:
> On 01/21/2016 10:27 AM, Alan Mackenzie wrote:
> Hi Alan!
> > " +[^:=]+ +:=?" is an ill-formed regexp - if you get lots of spaces in
> > a non-match, the Emacs regexp engine will try all possible ways of
> > matching these spaces before giving up. You have three concatenated
> > sub-expressions, all of which match any number of spaces, namely:
> > " +[^:=]+ +"
> >  1122222233
> > I would suggest reformulating it thus:
> > " +[^:= ][^:=]+ "
> >  112222223333334
> I think this has different semantics: my original regexp requires at
> least three spaces. But I think prepending spaces to yours fixes that.
Sorry, yes, I'd extracted the interesting bit of your regexp, and forgot
that I'd done so.
> > Subexpression 1 matches ALL the leading spaces.
> > Subexp 2 is exactly one
> > character which can't be a space. Subexp 3 matches almost anything,
> > including spaces, and subexp 4 matches a single space at the end (to make
> > sure there is at least one space there).
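The cost of overlapping quantifiers can be made concrete with a toy greedy backtracking matcher in Python that counts how often it tries to match a pattern element. This is a simplified model, not Emacs's regexp engine, which differs in detail. Each element is a character class matched either once or with `+`. The trailing `=?` is dropped because it cannot change whether a match exists. The names `match_at`, `search`, `ORIGINAL` and `REWRITE` are hypothetical.

```python
# Toy model of a greedy backtracking matcher (not Emacs's regex engine).
# Each pattern element is (predicate, repeat); repeat=True means "one or more" (+),
# repeat=False means exactly one character.

def space(c): return c == " "
def not_colon_eq(c): return c not in ":="
def not_colon_eq_space(c): return c not in ":= "
def colon(c): return c == ":"

# " +[^:=]+ +:"  (Clément's pattern without the final =?)
ORIGINAL = [(space, True), (not_colon_eq, True), (space, True), (colon, False)]
# " +[^:= ][^:=]+ :"  (Alan's rewrite of the core, followed by ":")
REWRITE = [(space, True), (not_colon_eq_space, False), (not_colon_eq, True),
           (space, False), (colon, False)]


def match_at(items, text, pos, steps):
    """Try to match items at text[pos:]; steps[0] counts element attempts."""
    if not items:
        return True                      # every element has matched
    steps[0] += 1                        # one attempt to match items[0] at pos
    pred, repeat = items[0]
    end = pos
    while end < len(text) and pred(text[end]):
        end += 1
        if not repeat:
            break                        # a plain class consumes one character
    # Greedy: try the longest run first, then give back one character at a time.
    for stop in range(end, pos, -1):
        if match_at(items[1:], text, stop, steps):
            return True
    return False


def search(items, text):
    """Try every start position in turn; return (start or None, total attempts)."""
    steps = [0]
    for start in range(len(text) + 1):
        if match_at(items, text, start, steps):
            return start, steps[0]
    return None, steps[0]


for n in (10, 20, 40):
    blanks = " " * n                     # a run of spaces with no ':' after it
    print(n, search(ORIGINAL, blanks)[1], search(REWRITE, blanks)[1])
```

On a run of n blanks, the original pattern makes C(n+1,1) + C(n+1,2) + C(n+1,3) + C(n+1,4) attempts (561, 7546 and 112791 for n = 10, 20, 40). That count grows like the fourth power of n. The rewrite makes C(n+1,1) + C(n+1,2) attempts (66, 231, 861), and it needs even that many only because a forward search restarts at every position.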
> This is helpful, thanks! I realize however that maybe I
> oversimplified. The issue is that what I really want is something like
> this:
> " +\\([^:=]+\\) +:=?"
> IOW, I want to capture that first group.
That is ambiguous: spaces at either edge of the group could belong either
to the group or to the surrounding " +". But if we can assume that the
first group always begins and ends with a non-space, then we can
reformulate the above as:
" +\\([^:= ]\\([^:=]+[^:= ]\\)?\\) +:=?"
                              ^
(or something similar; I've not actually tested it). The ? marked above
makes the inner group optional, so that the outer group can also match a
single character.
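Because the regexp above is explicitly untested, a quick check in Python may help. This sketch assumes that Python's `re` behaves like Emacs here once the Emacs-string groups `\\(`...`\\)` are written as `(`...`)` (outer, capturing) and `(?:`...`)` (inner, non-capturing). On the made-up samples below, a group of exactly two characters such as `ab` produces no match, because the inner `[^:=]+` needs at least one character before the closing `[^:= ]`. Changing that `+` to `*` is one possible fix. This variant is an assumption, not something proposed in the thread.

```python
import re

# Alan's regexp in Python syntax: \\( \\) become ( ) for group 1
# and (?: ) for the inner group.
ALAN = r" +([^:= ](?:[^:=]+[^:= ])?) +:=?"
# Hypothetical variant: [^:=]* also allows a two-character group.
VARIANT = r" +([^:= ](?:[^:=]*[^:= ])?) +:=?"


def group1(pattern, line):
    """Return the first captured group, or None when there is no match."""
    m = re.search(pattern, line)
    return m.group(1) if m else None


for line in ["    foo bar  := x", "  a :", "  ab :"]:
    print(repr(line), repr(group1(ALAN, line)), repr(group1(VARIANT, line)))
```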
> > All the best with your regexp!
> Thanks. Your points about backtracking were helpful as well. Do you
> know if there are technical reasons why Emacs chooses a backtracking
> implementation for this regexp (instead of compiling it to a
> linear-time matcher)?
I'm afraid I don't know. It might be that compiling a regexp for a
linear-time matcher would be slower. Or, possibly, nobody has sat down
and hacked out a better regexp engine.
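To contrast with backtracking, here is a short Python sketch of the idea behind a linear-time matcher, the Thompson-style simulation described in the swtch.com article cited later in this thread. Instead of trying one way to split the input and then backing up, it tracks the set of all pattern positions that are still alive after each character. This toy handles only a sequence of character classes, each matched once or with `+`, and it reports only where the earliest-ending match ends, with no capture groups. It is not how Emacs works, and the names are hypothetical.

```python
# Each pattern element is (predicate, repeat); repeat=True means "one or more" (+).
def space(c): return c == " "
def not_colon_eq(c): return c not in ":="
def colon(c): return c == ":"

# " +[^:=]+ +:"  (the trailing =? cannot change whether a match exists)
ORIGINAL = [(space, True), (not_colon_eq, True), (space, True), (colon, False)]


def nfa_search(items, text):
    """Return (end offset of the earliest-ending match or None, steps).

    State i means "the next character must be matched by items[i]";
    state len(items) means a match has just been completed.
    """
    goal = len(items)
    states = set()
    steps = 0
    for pos in range(len(text) + 1):
        states.add(0)                    # a match may start at any position
        if goal in states:
            return pos, steps
        if pos == len(text):
            break
        c = text[pos]
        nxt = set()
        for i in states:
            steps += 1                   # one predicate test per live state
            pred, repeat = items[i]
            if pred(c):
                nxt.add(i + 1)           # move on to the next element
                if repeat:
                    nxt.add(i)           # or keep consuming for this element
        states = nxt
    return None, steps


for n in (10, 100, 1000):
    end, steps = nfa_search(ORIGINAL, " " * n)
    print(n, end, steps)                 # steps grow linearly with n
```

Each character costs at most one test per pattern element, so a failing search over n blanks takes about 4n steps here. Naive backtracking, by contrast, grows with the fourth power of n. This sketch omits the extra bookkeeping needed to report capture groups. Backreferences, which Emacs regexps support, cannot be handled by this kind of simulation at all.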
> Clément.
--
Alan Mackenzie (Nuremberg, Germany).
* Re: Documentation on debugging regexp performance
2016-01-21 9:39 ` Alexis
2016-01-21 13:22 ` Clément Pit--Claudel
@ 2016-01-21 22:10 ` Marcin Borkowski
2016-01-22 7:02 ` Alexis
2016-01-22 14:32 ` Clément Pit--Claudel
1 sibling, 2 replies; 13+ messages in thread
From: Marcin Borkowski @ 2016-01-21 22:10 UTC
To: Alexis; +Cc: emacs-devel
On 2016-01-21, at 10:39, Alexis <flexibeast@gmail.com> wrote:
> More generally, I highly recommend Jeffrey Friedl's book
> "Mastering Regular Expressions". It's not Emacs-specific, but it
> provides in-depth explanations of why certain regexen are time-
> and/or space-hungry.
Also, this: https://swtch.com/~rsc/regexp/regexp1.html . (Btw, the author
criticizes Friedl very strongly at the end; I am not sure whether this
is deserved. Still, a very good read it is.)
> Alexis.
Hth,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: Documentation on debugging regexp performance
2016-01-21 22:10 ` Marcin Borkowski
@ 2016-01-22 7:02 ` Alexis
2016-01-22 14:32 ` Clément Pit--Claudel
1 sibling, 0 replies; 13+ messages in thread
From: Alexis @ 2016-01-22 7:02 UTC
To: Marcin Borkowski; +Cc: emacs-devel
Marcin Borkowski <mbork@mbork.pl> writes:
> Also, this: https://swtch.com/~rsc/regexp/regexp1.html . (Btw,
> the author criticizes Friedl very strongly at the end; I am not
> sure whether this is deserved. Still, a very good read it is.)
That looks very interesting indeed - thanks!
Alexis.
* Re: Documentation on debugging regexp performance
2016-01-21 22:10 ` Marcin Borkowski
2016-01-22 7:02 ` Alexis
@ 2016-01-22 14:32 ` Clément Pit--Claudel
1 sibling, 0 replies; 13+ messages in thread
From: Clément Pit--Claudel @ 2016-01-22 14:32 UTC
To: emacs-devel
On 01/21/2016 05:10 PM, Marcin Borkowski wrote:
>
> On 2016-01-21, at 10:39, Alexis <flexibeast@gmail.com> wrote:
>
>> More generally, I highly recommend Jeffrey Friedl's book
>> "Mastering Regular Expressions". It's not Emacs-specific, but it
>> provides in-depth explanations of why certain regexen are time-
>> and/or space-hungry.
>
> Also, this: https://swtch.com/~rsc/regexp/regexp1.html . (Btw, the author
> criticizes Friedl very strongly at the end; I am not sure whether this
> is deserved. Still, a very good read it is.)
Indeed, a great read!
* Re: Documentation on debugging regexp performance
2016-01-21 17:16 ` Alan Mackenzie
@ 2016-01-23 6:12 ` Stefan Monnier
0 siblings, 0 replies; 13+ messages in thread
From: Stefan Monnier @ 2016-01-23 6:12 UTC
To: emacs-devel
> I'm afraid I don't know. It might be that compiling a regexp for a
> linear-time matcher would be slower. Or, possibly, nobody has sat down
> and hacked out a better regexp engine.
That's about right. I'd love to use some newer linear-time
regexp-engine.
Stefan
Thread overview: 13+ messages
2016-01-21 5:29 Documentation on debugging regexp performance Clément Pit--Claudel
2016-01-21 6:36 ` Yuri Khan
2016-01-21 9:39 ` Alexis
2016-01-21 13:22 ` Clément Pit--Claudel
2016-01-21 22:10 ` Marcin Borkowski
2016-01-22 7:02 ` Alexis
2016-01-22 14:32 ` Clément Pit--Claudel
2016-01-21 11:42 ` Wolfgang Jenkner
2016-01-21 16:38 ` Clément Pit--Claudel
2016-01-21 15:27 ` Alan Mackenzie
2016-01-21 16:37 ` Clément Pit--Claudel
2016-01-21 17:16 ` Alan Mackenzie
2016-01-23 6:12 ` Stefan Monnier