unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#11216: 23.4; parenthesis matching breaks on certain complex expressions
@ 2012-04-11  0:18 Nathan Trapuzzano
  2012-04-11  7:09 ` Andreas Schwab
  2012-04-12  2:08 ` Stefan Monnier
  0 siblings, 2 replies; 4+ messages in thread
From: Nathan Trapuzzano @ 2012-04-11  0:18 UTC
  To: 11216

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

Here's a complex regular expression that breaks parenthesis matching
(and yes, that's a real regular expression generated from a real perl
program).


M[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*H[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*\=[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*N[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*I[\=\/\\]?[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*N(?![\x21\x27\x2a\x2d\x2f\x3d\x41-\x5a\x5c\x61-\x7a\x7c]) (?<!S\d)(?<!\-\ [@"]\d\ [\x80-\xff])(?<!\-[\x80-\xff][\x80-\xff])(?<!\-[\x80-\xff])(?<![\x27-\x29\x2f\x3d\x41-\x5a\x7c]\*)(?<![\x27-\x29\x2f\x3d\x41-\x5a\x7c])(?:
 A\)[\/\\]|\*\)[\/\\]A[\=\/\\]?)[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*E[\=\/\\]?[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*I[\=\/\\]?[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*D[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*E[\=\/\\]?(?![\x21\x27\x2a\x2d\x2f\x3d\x41-\x5a\x5c\x61-\x7a\x7c]) (?<!S\d)(?<!\-\ [@"]\d\ [\x80-\xff])(?<!\-[\x80-\xff][\x80-\xff])(?<!\-[\x80-\xff])(?<![\x27-\x29\x2f\x3d\x41-\x5a
 \x7c]\*)(?<![\x27-\x29\x2f\x3d\x41-\x5a\x7c])Q[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*E[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*A[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*[\/\\]

Matching gets messed up with the open parenthesis immediately following
the first (?<!S\d). I suspect this is due to the opening double-quote
about 10 characters later.

I noticed that the behavior of show-paren-mode changes depending on the
major mode. For example, the behavior described above happens in
Fundamental mode, whereas in Text mode quotation marks are ignored.
However, Text mode also causes paren matching to ignore backslashes,
so escaped parentheses and brackets are treated as real ones. I think
the best fix would be to make show-paren-mode customizable, so that
the user can specify which characters should be ignored when
matching parentheses.
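
To illustrate the two failure modes described above, here is a minimal
sketch of a paren scanner in Python. It is a simplified model, not Emacs's
actual algorithm: it tracks only round parentheses, and the function name
and the `string_quotes` / `escapes` parameters are hypothetical stand-ins
for the kind of per-character customization requested here. It assumes
that Fundamental mode treats `"` as a string delimiter and `\` as an
escape, and that Text mode treats both as ordinary characters.

```python
def find_matching_paren(text, open_pos, string_quotes="", escapes=""):
    """Return the index of the ')' that closes text[open_pos], or None.

    string_quotes: characters that start and end a string (hypothetical knob).
    escapes: characters that quote the following character (hypothetical knob).
    """
    if text[open_pos] != "(":
        raise ValueError("open_pos must point at '('")
    depth = 0
    in_string = None  # the quote character that opened the current string
    i = open_pos
    while i < len(text):
        ch = text[i]
        if ch in escapes:
            i += 2  # skip the escape character and the character it quotes
            continue
        if in_string is not None:
            if ch == in_string:
                in_string = None  # string closed; parens count again
        elif ch in string_quotes:
            in_string = ch  # parens are ignored until the string closes
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return None


# Fragment of the reported regexp around the first (?<!S\d).
sample = r'(?<!S\d)(?<!\-\ [@"]\d\ [\x80-\xff])(?<!\-[\x80-\xff])'
start = 8  # the open paren immediately following the first (?<!S\d)

# Quote opens a string: the real closing paren is swallowed.
fundamental_like = find_matching_paren(sample, start, string_quotes='"', escapes="\\")

# Quote ignored, backslash still escapes: the intended match is found.
customized = find_matching_paren(sample, start, string_quotes="", escapes="\\")

# Backslash ignored: an escaped paren is mistaken for the closing one.
fragment = r'(?:A\)[\/\\]|\*\)[\/\\]A)'
text_like = find_matching_paren(fragment, 0, string_quotes="", escapes="")
escaped_ok = find_matching_paren(fragment, 0, string_quotes="", escapes="\\")
```

In this model, only the combination "ignore quotes, keep backslash as an
escape" finds the intended closing parenthesis in both fragments, which is
the behavior neither major mode provides on its own.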

I've also attached a file containing the regexp in question, in case
the long line gets broken up during mail transmission.

[-- Attachment #2: regexp.txt --]
[-- Type: text/plain, Size: 1996 bytes --]

M[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*H[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*\=[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*N[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*I[\=\/\\]?[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*N(?![\x21\x27\x2a\x2d\x2f\x3d\x41-\x5a\x5c\x61-\x7a\x7c]) (?<!S\d)(?<!\-\ [@"]\d\ [\x80-\xff])(?<!\-[\x80-\xff][\x80-\xff])(?<!\-[\x80-\xff])(?<![\x27-\x29\x2f\x3d\x41-\x5a\x7c]\*)(?<![\x27-\x29\x2f\x3d\x41-\x5a\x7c])(?:
 A\)[\/\\]|\*\)[\/\\]A[\=\/\\]?)[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*E[\=\/\\]?[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*I[\=\/\\]?[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*D[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*E[\=\/\\]?(?![\x21\x27\x2a\x2d\x2f\x3d\x41-\x5a\x5c\x61-\x7a\x7c]) (?<!S\d)(?<!\-\ [@"]\d\ [\x80-\xff])(?<!\-[\x80-\xff][\x80-\xff])(?<!\-[\x80-\xff])(?<![\x27-\x29\x2f\x3d\x41-\x5a
 \x7c]\*)(?<![\x27-\x29\x2f\x3d\x41-\x5a\x7c])Q[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*E[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*A[?+]?(?:-(?:[\x01-\x7f]*[\x00\x80-\xff]+))?[\x02-\x19\x22-\x27\x28-\x2e\x30-\x3c\x3e-\x40\x5b\x5d-\x7b\x7d-\xff]*[\/\\]

Thread overview: 4+ messages
2012-04-11  0:18 bug#11216: 23.4; parenthesis matching breaks on certain complex expressions Nathan Trapuzzano
2012-04-11  7:09 ` Andreas Schwab
2019-11-01 20:11   ` Stefan Kangas
2012-04-12  2:08 ` Stefan Monnier
