* regexp-quote missing escapes in grouping constructs - Bug?
@ 2008-06-12 23:39 St/n_P/rm/n
2008-06-13 6:17 ` Miles Bader
2008-06-13 6:20 ` Herbert Euler
0 siblings, 2 replies; 6+ messages in thread
From: St/n_P/rm/n @ 2008-06-12 23:39 UTC (permalink / raw)
To: emacs-devel
(regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
Am I misunderstanding something?
Shouldn't passing that string to regexp-quote give back something more
like this:
---> "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+(-\\|/)[0-9]\\{2,4\\}"
or *flinches at the thought*
---> "[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)[0-9]?+(-\\\\|/)[0-9]\\\\{2,4\\\\}"
-
It's possible that I am misunderstanding the function, and I'd rather
not file this as a bug report because I am using Lennart's recent W32
patched build:
GNU Emacs 23.0.60.1 (i386-mingw-nt5.1.2600)
of 2008-05-12 on LENNART-69DE564 (patched)
However, I am currently building a derived mode, and regexp-opt and
regexp-quote are more or less required, esp. as there doesn't seem to
be a clean way to avoid passing everything through multiple instances
of defconst, defvar, defcustom, etc. just to "cache" keyword regexps
for font-lock.
---
I find the following two most relevant to the matter at hand.
case a)
We get the requisite Lisp reader 4x \\\\ for the group construct, but
the function not only fails to escape the interior alternative, it
drops the backslash altogether, e.g.
(regexp-quote "\\(123\|567\\)")
---> "\\\\(123|567\\\\)"
case b)
In contrast, when we give it the double escape "\\" inside the
group, it DOES catch the escape and gives us 4x the \
(regexp-quote "\\(123\\|567\\)")
---> "\\\\(123\\\\|567\\\\)"
---
This doesn't seem like consistent behavior esp. as regexp-quote is
feeding regexp-opt elsewhere.
---
These other examples do not strike me as edge cases when
'manually optimizing' a regexp for font-lock:
(regexp-quote "\(123\|567\)")
---> "(123|567)"
(regexp-quote `(,"\(123\|567\)")
---> ("(123|567)")
(regexp-quote '("\(123|567\)")
---> ("(123|567)")
(regexp-quote "(123|567)")
---> "(123|567)"
(regexp-quote '"(123|567)")
---> "(123|567)"
---
again, maybe I am missing something but my head hurts... despite
having really come to appreciate emacs regexps :)
* Re: regexp-quote missing escapes in grouping constructs - Bug?
From: Miles Bader @ 2008-06-13 6:17 UTC (permalink / raw)
To: St/n_P/rm/n; +Cc: emacs-devel
"St/n_P/rm/n" <Stan@SandPframing.com> writes:
> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>
> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>
> Am I misunderstanding something?
The backslashes you entered in the original lisp string were eaten by
the lisp reader, so there are no backslashes in the string. Since (, ),
|, etc., are not emacs regexp metacharacters (without a preceding
backslash), there's no need to quote them.
Here's what you probably meant:
(regexp-quote "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+\\(-\\|/\\)[0-9]\\{2,4\\}")
=> "\\[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)\\[0-9]\\?\\+\\\\(-\\\\|/\\\\)\\[0-9]\\\\{2,4\\\\}"
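The reader behaviour described above can be checked directly. This is a minimal sketch, assuming a stock Emacs and evaluating each form with C-x C-e or in M-: (the short strings are made up for illustration and are not from the original report):

```elisp
;; Inside a string literal, a backslash before an ordinary character is
;; dropped by the Lisp reader, so "\(" is the one-character string "(".
(string= "\(" "(")                                         ; t
(length "\(")                                              ; 1

;; A doubled backslash reads as one real backslash, so "\\(" has two
;; characters: a backslash followed by an open paren.
(length "\\(")                                             ; 2

;; regexp-quote only sees the string as it exists after reading.
;; Here it sees "(a|b)", which contains nothing special to escape.
(string= (regexp-quote "\(a\|b\)") "(a|b)")                ; t

;; Here it sees real backslashes and escapes each of them.
(string= (regexp-quote "\\(a\\|b\\)") "\\\\(a\\\\|b\\\\)") ; t
```

Note that the four-backslash runs in the last result are only the printed form: each pair stands for one backslash in the actual string.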
-Miles
--
Joy, n. An emotion variously excited, but in its highest degree arising from
the contemplation of grief in another.
* RE: regexp-quote missing escapes in grouping constructs - Bug?
From: Herbert Euler @ 2008-06-13 6:20 UTC (permalink / raw)
To: St/n_P/rm/n, emacs-devel
> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>
> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>
> Am I misunderstanding something?
> Shouldn't passing that string to regexp-quote give back something more
> like this:
>
> ---> "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+(-\\|/)[0-9]\\{2,4\\}"
I think this is because `regexp-quote' sees the string only after it
has been processed by the Lisp reader, i.e. "\{" ==> "{" in the
internal representation, while "\\{" ==> "\{":
ELISP> (regexp-quote "[0-9]\{2,4\}")
"\\[0-9]{2,4}"
ELISP> (regexp-quote "[0-9]\\{2,4\\}")
"\\[0-9]\\\\{2,4\\\\}"
ELISP>
Regards,
Guanpeng Xu
* Re: regexp-quote missing escapes in grouping constructs - Bug?
From: MON KEY @ 2008-06-13 17:36 UTC (permalink / raw)
To: Miles Bader; +Cc: emacs-devel
Yes, but why are the "?", the "[" and the "+" getting escaped regardless of the
presence of a preceding \ whereas the alternative "|" inside the
grouping construct isn't?
e.g.
example 1)
(regexp-quote "[0-9]{2,4}(-|/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
as compared to 2);
(regexp-quote "[0-9]{2,4}(-?+/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-\\?\\+/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
in the second case the ?+ nested inside the group is getting escaped.
Is the "|" not considered a special operator or emacs regexp
metacharacter in the regexp-quote situation?
And if so, why not?
The issue is that regexp-opt.el is calling regexp-quote.
If I understand the implications of regexp-opt, it is meant as a
helper function for passing regexps to font-lock-keywords and isn't
intended to accept or 'optimize' existing regexp "words". So, to feed
a well-formed regexp to font-lock-add-keywords I need to build the
regexp by hand. Likewise, that regexp needs to be passed as a string
with all special characters properly escaped, e.g.
(defconst stupid-mode-keywords
'("^[A-z]\\?\\+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))
Am I to understand that the ? and + should be escaped but the |
shouldn't be in order for the regexp to work with font-lock?
FWIW my experience is otherwise, and the previous case doesn't work,
whereas the following does:
(defconst stupid-mode-keywords
'("^[A-z]?+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))
For my purposes, the larger issue is that I can't find a sensible way
to cons or append a well-formed regexp to an existing one without
running into regexp-quote and regexp-opt confusion, esp. as I am
unclear as to the correctness of the quoting and escaping of regexps
for font-locking by the two respective functions.
The only solution that seems approachable is to make a new defconst,
defvar and defface for each new regexp I wish to font-lock. This
approach is not really maintainable over the long term.
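One possible way to combine a hand-written regexp with a list of literal keywords is to concatenate the two, letting regexp-opt handle only the literals. This is a rough sketch, not a recommendation from the thread: the names my-stupid-words, my-stupid-keyword-regexp and my-stupid-mode-keywords are hypothetical, the prefix is simplified to "^[A-Za-z]?" for illustration, and my-stupid-mode-face is assumed to be defined elsewhere as in the message above.

```elisp
;; Hypothetical names throughout; only regexp-opt, concat and
;; string-match-p are real Emacs functions.
(defconst my-stupid-words '("some" "stupid" "regexp")
  "Literal keywords.  These are plain strings, not regexps.")

(defconst my-stupid-keyword-regexp
  (concat "^[A-Za-z]?"                      ; hand-written regexp fragment
          (regexp-opt my-stupid-words t))   ; literals, wrapped in \( ... \)
  "Regexp built once from a hand-written prefix and the literal keywords.")

(defconst my-stupid-mode-keywords
  ;; Assumes `my-stupid-mode-face' is defined elsewhere.
  (list (cons my-stupid-keyword-regexp 'my-stupid-mode-face))
  "Keyword list in the (MATCHER . FACENAME) form used above.")

(string-match-p my-stupid-keyword-regexp "xstupid")   ; 0
(string-match-p my-stupid-keyword-regexp "other")     ; nil
```

Because regexp-opt quotes its arguments itself, the literal words never need manual escaping, and only the hand-written fragment has to follow the regexp syntax rules.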
* Re: regexp-quote missing escapes in grouping constructs - Bug?
From: Stefan Monnier @ 2008-06-13 22:21 UTC (permalink / raw)
To: MON KEY; +Cc: emacs-devel, Miles Bader
> Yes, but why are the "?", the "[" and the "+" getting escaped regardless of the
> presence of a preceding \ whereas the alternative "|" inside the
> grouping construct isn't?
Please read the Elisp manual's section about regular expressions.
Stefan
* Re: regexp-quote missing escapes in grouping constructs - Bug?
From: tomas @ 2008-06-14 4:16 UTC (permalink / raw)
To: MON KEY; +Cc: emacs-devel, Miles Bader
On Fri, Jun 13, 2008 at 01:36:07PM -0400, MON KEY wrote:
> Yes, but why are the "?", the "[" and the "+" getting escaped regardless of the
> presence of a preceding \ whereas the alternative "|" inside the
> grouping construct isn't?
Ah -- maybe somewhat unexpectedly to you (e.g. in Perl it is), the "|"
on its own isn't special in Emacs regexps. You have to spell "\|" to
say "alternative".
HTH
-- tomás
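The difference between a bare "|" and "\|" can be seen with a few short matches. This is only an illustrative sketch with made-up strings, assuming an Emacs that provides string-match-p:

```elisp
;; A bare "|" is an ordinary character, so "a|b" matches only the
;; three literal characters a, |, b.
(string-match-p "a|b" "b")       ; nil
(string-match-p "a|b" "xa|b")    ; 1

;; Backslash-bar is the alternation operator.  In Lisp source the
;; backslash must be doubled to survive the reader.
(string-match-p "a\\|b" "b")     ; 0

;; This is why regexp-quote leaves "|" alone but escapes "?".
(regexp-quote "a|b")             ; "a|b"
(regexp-quote "a?")              ; "a\\?"
```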