* bug#16046: Bug with Regexp Containing only a Character Class with a Caret
From: Cameron Desautels @ 2013-12-04 4:57 UTC (permalink / raw)
To: 16046
Hi all,
I've run across a dilemma, in the most literal sense: either there's a
problem in Emacs's regexp engine or there's an issue with
`regexp-opt-charset`---I'm not sure which.
The issue has to do with regular expressions containing character
classes with only a caret character. I know this seems like a rather
silly case (why not just use "\\^"?) but it came up in the context of
trying to track down a bug in ruby-mode, so it does occur in real (and
particularly *programmatic*) settings.
The simplest case to reproduce is the following:
(re-search-forward "[^]")
; => Debugger entered--Lisp error: (invalid-regexp "Unmatched [ or [^")
; re-search-forward("[^]")
; eval((re-search-forward "[^]") nil)
; eval-last-sexp-1(t)
; eval-last-sexp(t)
; eval-print-last-sexp()
; call-interactively(eval-print-last-sexp record nil)
; command-execute(eval-print-last-sexp record)
; execute-extended-command(nil "eval-print-last-sexp")
; call-interactively(execute-extended-command nil nil)
Now, you can make a compelling case that that's not a valid regexp
(and the Emacs Lisp Reference Manual doesn't seem to *directly*
contradict this argument), but that presents a problem when paired
with `regexp-opt-charset`:
(regexp-opt-charset '(?^))
=> "[^]"
Note that this produces the problematic regexp, which means the
following code is bound to fail when it should succeed:
(re-search-forward (regexp-opt-charset '(?^)))
What's the correct behavior? I'd be happy to offer a patch for either
side of the equation but I'm not sure which one to target.
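As a stopgap while that question is open, callers that build regexps programmatically could check whether the result compiles before searching with it. The following is only a minimal sketch: `my/regexp-valid-p` is a hypothetical helper name, not an existing Emacs function, and it assumes that an uncompilable regexp signals `invalid-regexp`, as in the backtrace above.

```elisp
;; Hypothetical helper (not part of Emacs): report whether REGEXP compiles.
(defun my/regexp-valid-p (regexp)
  "Return non-nil if REGEXP can be compiled by the Emacs regexp engine."
  (condition-case nil
      ;; Matching against the empty string forces compilation;
      ;; the match result itself is irrelevant here.
      (progn (string-match-p regexp "") t)
    (invalid-regexp nil)))

(my/regexp-valid-p "[^]")   ; nil: the bracket expression is never closed
(my/regexp-valid-p "[a^]")  ; t
(my/regexp-valid-p "\\^")   ; t
```

This only detects the problem; it does not decide which side (the regexp engine or `regexp-opt-charset`) should change.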
All the best.
-- Cameron
In GNU Emacs 24.3.1 (x86_64-apple-darwin11.4.2, Carbon Version 1.6.0
AppKit 1138.51)
of 2013-05-13 on atago
Windowing system distributor `Apple Inc.', version 10.9.0
Configured using:
`configure '--with-mac'
'--enable-mac-app=/Users/xin/Documents/emacs-mac-port/build'
'--prefix=/Users/xin/Documents/emacs-mac-port/build''
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
/Applications/Emacs.app/Contents/Resources/lisp/.dir-locals hides
/Applications/Emacs.app/Contents/Resources/lisp/gnus/.dir-locals
Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils help-mode easymenu debug time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel mac-win tool-bar dnd fontset
image regexp-opt fringe tabulated-list newcomment lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote mac multi-tty make-network-process emacs)
* bug#16046: Bug with Regexp Containing only a Character Class with a Caret (PATCH)
From: Cameron Desautels @ 2013-12-05 19:26 UTC (permalink / raw)
To: 16046
After further experimentation, I suspect that "[^]" is simply not
a valid regular expression. For instance, grep(1) gives the
following behavior:
$ echo "^" | grep "[^]"
grep: brackets ([ ]) not balanced
This suggests that the broken behavior is within
`regexp-opt-charset`. I've attached a patch for that function.
Here are some test cases which reveal the behavior of the unpatched
and patched versions of the function (the only difference is the
handling of the "[^]" case):
;; Pre-patch
(regexp-opt-charset (list ?^)) ; "[^]"
(regexp-opt-charset (list ?^ ?a)) ; "[a^]"
(regexp-opt-charset (list ?^ ?-)) ; "[-^]"
(regexp-opt-charset (list ?^ ?\])) ; "[]^]"
(regexp-opt-charset (list ?^ ?- ?\])) ; "[]^-]"
;; Post-patch
(regexp-opt-charset (list ?^)) ; "\\^"
(regexp-opt-charset (list ?^ ?a)) ; "[a^]"
(regexp-opt-charset (list ?^ ?-)) ; "[-^]"
(regexp-opt-charset (list ?^ ?\])) ; "[]^]"
(regexp-opt-charset (list ?^ ?- ?\])) ; "[]^-]"
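The cases above can also be checked mechanically rather than by eye. This is a rough sketch, assuming the property we care about is "the generated regexp compiles and matches every character it was built from"; `my/charset-regexp-ok-p` is a hypothetical helper name, not something in regexp-opt.el.

```elisp
(require 'regexp-opt)

;; Hypothetical helper (not part of Emacs), for illustration only.
(defun my/charset-regexp-ok-p (chars)
  "Return non-nil if the regexp built from CHARS matches each of CHARS.
The regexp comes from `regexp-opt-charset'; nil is also returned when
that regexp fails to compile."
  (condition-case nil
      (let ((re (regexp-opt-charset chars))
            (ok t))
        (dolist (c chars ok)
          ;; Anchor at both ends so the whole one-character string must match.
          (unless (string-match-p (concat "\\`" re "\\'") (string c))
            (setq ok nil))))
    (invalid-regexp nil)))

(mapcar #'my/charset-regexp-ok-p
        (list (list ?^)
              (list ?^ ?a)
              (list ?^ ?-)
              (list ?^ ?\])
              (list ?^ ?- ?\])))
```

With the patched function every element of the result should be non-nil; with the unpatched function the first element is expected to be nil, since "[^]" does not compile.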
--
Cameron Desautels <camdez@gmail.com>
[-- Attachment #2: regexp-opt.el.diff --]
[-- Type: text/plain, Size: 808 bytes --]
*** regexp-opt.el.orig Thu Dec 5 11:17:19 2013
--- regexp-opt.el Thu Dec 5 11:19:31 2013
*************** CHARS should be a list of characters."
*** 285,291 ****
;;
;; Make sure a caret is not first and a dash is first or last.
(if (and (string-equal charset "") (string-equal bracket ""))
! (concat "[" dash caret "]")
(concat "[" bracket charset caret dash "]"))))
(provide 'regexp-opt)
--- 285,293 ----
;;
;; Make sure a caret is not first and a dash is first or last.
(if (and (string-equal charset "") (string-equal bracket ""))
! (if (string-equal dash "")
! "\\^" ; [^] is not a valid regexp
! (concat "[" dash caret "]"))
(concat "[" bracket charset caret dash "]"))))
(provide 'regexp-opt)
* bug#16046: Bug with Regexp Containing only a Character Class with a Caret (PATCH)
From: Stefan Monnier @ 2013-12-05 20:26 UTC (permalink / raw)
To: Cameron Desautels; +Cc: 16046-done
> After further experimentation, I suspect that "[^]" is simply not
> a valid regular expression.
Indeed, according to the documentation, for ^ to be treated as itself,
it needs to be "not the first char", but since we have nothing else to
put there, we're kind of screwed.
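To make the positional rule concrete, here is a small sketch of how the caret and the closing bracket are read depending on where they sit inside the brackets. It uses only `string-match-p`; the comments describe my understanding of Emacs regexp syntax rather than anything stated in this thread.

```elisp
;; Caret not first: it is an ordinary member of the set.
(string-match-p "[a^]" "^")   ; => 0

;; Caret first: it negates the set, so this is "anything but a".
(string-match-p "[^a]" "^")   ; => 0
(string-match-p "[^a]" "a")   ; => nil

;; A "]" immediately after "[^" is taken as a literal member, so
;; "[^]]" means "anything but ]".  In "[^]" that same "]" is consumed
;; as a member and nothing is left to close the bracket expression.
(string-match-p "[^]]" "]")   ; => nil
(string-match-p "[^]]" "x")   ; => 0
```

That is why a class holding only a caret has no valid bracket spelling and has to be written as "\\^" instead.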
> This suggests that the broken behavior is within
> `regexp-opt-charset`. I've attached a patch for that function.
Thank you for tracking down the problem and providing a fix. I just
installed it in trunk, closing,
Stefan