unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#17373: 24.3.50; match data is incorrect if there are too many groups
From: Nicolas Richard @ 2014-04-29 19:19 UTC
  To: 17373

Hi,

The following form returns 2. Replace 255 with 254 and it returns 512, as expected:
#+BEGIN_SRC emacs-lisp
  (with-temp-buffer
    (insert "bar")
    (when
        (re-search-backward
         (concat
          (mapconcat (lambda (x) (format "\\(%s\\)" x)) (make-list 255 "foo") "\\|")
          "\\|"
          "\\(bar\\)")
         nil t)
      (length (match-data))))
#+END_SRC

Regexps with many groups are the kind of thing that is used in AUCTeX, in
TeX-auto-parse-region. That function constructs a big regexp out of a
list of smaller ones (each small one is made into a group); when the big
regexp matches, it then finds out which of the smaller regexps actually
matched by checking which group is non-nil.
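
As a rough illustration of that technique (a minimal sketch, not AUCTeX's
actual code; the helper name my/first-matching-group and the three sample
sub-regexps are made up for this example), one can join the small regexps
as groups and then look for the first group whose match-beginning is
non-nil:
#+BEGIN_SRC emacs-lisp
  ;; Hypothetical helper, not part of Emacs or AUCTeX.
  (defun my/first-matching-group (count)
    "Return the number of the first group in 1..COUNT that matched, or nil."
    (let ((n 1) found)
      (while (and (not found) (<= n count))
        (if (match-beginning n)
            (setq found n)
          (setq n (1+ n))))
      found))

  (with-temp-buffer
    (insert "bar")
    (let* ((parts '("foo" "bar" "baz"))   ; sample sub-regexps (assumed)
           ;; Wrap each sub-regexp in a group and join them with \|.
           (regexp (mapconcat (lambda (x) (format "\\(%s\\)" x))
                              parts "\\|")))
      (when (re-search-backward regexp nil t)
        ;; Group 2 corresponds to "bar", so this evaluates to 2.
        (my/first-matching-group (length parts)))))
#+END_SRC
With only a handful of groups this works; the problem above appears once
the number of groups grows past the point shown in the first example.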

In GNU Emacs 24.3.50.7 (i686-pc-linux-gnu, GTK+ Version 2.24.20)
 of 2014-04-10 on LDLC-portable
Windowing system distributor `The X.Org Foundation', version 11.0.11405000
System Description:	Ubuntu 13.10

Configured using:
 `configure 'CFLAGS=-g3 -O2''

Important settings:
  value of $LANG: fr_BE.UTF-8
  locale-coding-system: utf-8-unix

-- 
Nico.


* bug#17373: 24.3.50; match data is incorrect if there are too many groups
From: Paul Eggert @ 2014-05-19  5:47 UTC
  To: 17373

Yes, unfortunately Emacs currently has a limit of at most 256 groups of 
match data: one for the entire pattern, and 255 for parenthesized 
subpatterns.  If you go over the limit, the excess matches are silently 
discarded.  I don't see this limitation documented anywhere; it should 
be.  Or better yet, the limitation should be removed.

The limitation is wired into the representation of the 'start_memory'
code in compiled regular expressions: this code has a one-byte operand.
As far as I know, the limitation is specific to Emacs, and is not
present in the Gnulib or glibc versions of the regexp matcher.


* bug#17373: 24.3.50; match data is incorrect if there are too many groups
From: Drew Adams @ 2014-05-19 13:48 UTC
  To: Paul Eggert, 17373

> Yes, unfortunately Emacs currently has a limit of at most 256 groups of
> match data: one for the entire pattern, and 255 for parenthesized
> subpatterns.  If you go over the limit, the excess matches are silently
> discarded.  I don't see this limitation documented anywhere; it should
> be.  Or better yet, the limitation should be removed.

Good to know.  +1, to documenting it, at least.


* bug#17373: 24.3.50; match data is incorrect if there are too many groups
From: Marcin Borkowski @ 2016-02-10 17:11 UTC
  To: 17373; +Cc: Paul Eggert

On 2014-05-19, at 07:48, Drew Adams <drew.adams@oracle.com> wrote:

>> Yes, unfortunately Emacs currently has a limit of at most 256 groups of
>> match data: one for the entire pattern, and 255 for parenthesized
>> subpatterns.  If you go over the limit, the excess matches are silently
>> discarded.  I don't see this limitation documented anywhere; it should
>> be.  Or better yet, the limitation should be removed.
>
> Good to know.  +1, to documenting it, at least.

I can write a patch to the manual, but I'm a bit afraid that if this
gets documented, the limit will stay there forever.  Is there any chance
that someone fluent in C could fix this?

(Incidentally, I have one package of mine where this limit could strike,
too.)

Best,

--
Marcin Borkowski