unofficial mirror of emacs-devel@gnu.org 
* Limits on the regexp string length
From: Ihor Radchenko @ 2022-12-21 12:51 UTC
  To: emacs-devel

Hi,

I am writing as a follow-up to a recent bug report we got in Org.

Rudolf Adamkovič <salutis@me.com> (December 14, 2022)
Subject: Radio links work only in small numbers
https://orgmode.org/list/m2lenax5m6.fsf@me.com

It looks like the length of regular expressions in Emacs is limited, and
regexps exceeding this limit cause an error to be signaled: "Regular
expression too big".

Is there any rationale behind this limit? Can we increase it somehow
from Elisp?

The regexps in question are giant alternations of the form
(or re1 re2 ...), which we use to search for occurrences of word
combinations from a list.

The compiled automaton should not occupy too much memory: no more
than ~ max_phrase_length * char_table_size.

P.S. Note that `regexp-opt' is not suitable because we need to treat
any run of newlines/spaces inside the word combinations as
equivalent.
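
For illustration, a minimal sketch of how such a pattern could be
assembled. It is written in C purely as an example (Org builds its
regexps in Elisp), and the names build_alternation, append_phrase and
WS_RE are hypothetical, not Org's actual code. Each run of blanks inside
a phrase is replaced by a whitespace sub-regexp; regexp special
characters are not escaped, for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical whitespace matcher in Emacs regexp syntax:
   one or more spaces, tabs or newlines.  */
#define WS_RE "[ \t\n]+"

static int
is_blank (char c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Append PHRASE to OUT, replacing each run of blanks with WS_RE.
   Return the number of bytes written, not counting the final NUL.
   Regexp special characters are NOT escaped in this sketch.  */
size_t
append_phrase (char *out, const char *phrase)
{
  size_t n = 0;
  while (*phrase)
    {
      if (is_blank (*phrase))
        {
          while (is_blank (*phrase))
            phrase++;
          memcpy (out + n, WS_RE, strlen (WS_RE));
          n += strlen (WS_RE);
        }
      else
        out[n++] = *phrase++;
    }
  out[n] = '\0';
  return n;
}

/* Join COUNT phrases as \(?:p1\|p2\|...\).  The caller frees the
   result.  Return NULL if memory allocation fails.  */
char *
build_alternation (const char *const *phrases, size_t count)
{
  /* Worst case: every input byte expands to WS_RE, plus two bytes
     for each "\|" separator, plus the wrapper and the final NUL.  */
  size_t cap = 8;
  for (size_t i = 0; i < count; i++)
    cap += strlen (phrases[i]) * strlen (WS_RE) + 2;

  char *re = malloc (cap);
  if (!re)
    return NULL;

  size_t n = 4;
  memcpy (re, "\\(?:", 4);
  for (size_t i = 0; i < count; i++)
    {
      if (i > 0)
        {
          re[n++] = '\\';
          re[n++] = '|';
        }
      n += append_phrase (re + n, phrases[i]);
    }
  assert (n + 3 <= cap);
  memcpy (re + n, "\\)", 3);	/* Copies the terminating NUL too.  */
  return re;
}
```

Every blank run in a phrase costs strlen (WS_RE) bytes of pattern text
instead of one, so the pattern string grows quickly with the number of
phrases. Since each alternative now contains a sub-regexp rather than a
literal string, the list can no longer be handed to `regexp-opt', which
accepts only literal strings.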

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




* Re: Limits on the regexp string length
From: Eli Zaretskii @ 2022-12-21 13:39 UTC
  To: Ihor Radchenko; +Cc: emacs-devel

> From: Ihor Radchenko <yantar92@posteo.net>
> Date: Wed, 21 Dec 2022 12:51:11 +0000
> 
> It looks like the length of regular expressions in Emacs is limited, and
> regexps exceeding this limit cause an error to be signaled: "Regular
> expression too big".
> 
> Is there any rationale behind this limit? Can we increase it somehow
> from Elisp?

See this part of regex-emacs.c:

  /* This is not an arbitrary limit: the arguments which represent offsets
     into the pattern are two bytes long.  So if 2^15 bytes turns out to
     be too small, many things would have to change.  */
  # define MAX_BUF_SIZE (1 << 15)

  /* Extend the buffer by at least N bytes via realloc and
     reset the pointers that pointed into the old block to point to the
     correct places in the new one.  If extending the buffer results in it
     being larger than MAX_BUF_SIZE, then flag memory exhausted.  */
  #define EXTEND_BUFFER(n)						\
    do {								\
      ptrdiff_t requested_extension = n;				\
      unsigned char *old_buffer = bufp->buffer;				\
      if (MAX_BUF_SIZE - bufp->allocated < requested_extension)		\
	return REG_ESIZE;						\

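To make the comment in the excerpt concrete, here is a minimal sketch
of why two-byte offsets bound the size of the compiled pattern. This is
not the regex-emacs.c implementation: the helper names store_offset,
load_offset and can_extend are hypothetical, the low-byte-first layout
is an assumption, and the offsets are assumed to be signed (which is
what a limit of 2^15 rather than 2^16 suggests).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as in the excerpt above.  */
#define MAX_BUF_SIZE (1 << 15)

/* Store OFFSET in the two bytes at DEST, low byte first.
   Hypothetical helper; the byte order is an assumption.  */
void
store_offset (unsigned char *dest, int offset)
{
  /* Anything outside the signed 16-bit range cannot be encoded.  */
  assert (offset >= INT16_MIN && offset <= INT16_MAX);
  uint16_t u = (uint16_t) offset;
  dest[0] = u & 0xff;
  dest[1] = u >> 8;
}

/* Read back a two-byte offset written by store_offset,
   sign-extending it from 16 bits.  */
int
load_offset (const unsigned char *src)
{
  uint16_t u = (uint16_t) (src[0] | (src[1] << 8));
  return u >= 0x8000 ? (int) u - 0x10000 : (int) u;
}

/* Return true if a buffer that already holds ALLOCATED bytes may grow
   by N more without exceeding MAX_BUF_SIZE.  This is the same
   comparison as in the EXTEND_BUFFER excerpt, with the result
   inverted.  */
bool
can_extend (ptrdiff_t allocated, ptrdiff_t n)
{
  return MAX_BUF_SIZE - allocated >= n;
}
```

With this encoding the largest forward offset is 32767 and the largest
backward offset is -32768, so a jump across a compiled pattern larger
than MAX_BUF_SIZE might not fit. Refusing to grow the buffer at that
point (REG_ESIZE) is presumably what surfaces as "Regular expression
too big".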


