From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Limits on the regexp string length
Date: Wed, 21 Dec 2022 15:39:27 +0200
Message-ID: <83zgbganao.fsf@gnu.org>
References: <87h6xox6m8.fsf@localhost>
In-Reply-To: <87h6xox6m8.fsf@localhost> (message from Ihor Radchenko on Wed, 21 Dec 2022 12:51:11 +0000)
To: Ihor Radchenko
Cc: emacs-devel@gnu.org

> From: Ihor Radchenko
> Date: Wed, 21 Dec 2022 12:51:11 +0000
>
> It looks like the length of regular expressions in Emacs is limited and
> regexps exceeding this length cause error being thrown: "Regular
> expression too big".
>
> Is there any rationale behind this limit? Can we increase it somehow
> from Elisp?

See this part of regex-emacs.c:

  /* This is not an arbitrary limit: the arguments which represent offsets
     into the pattern are two bytes long.  So if 2^15 bytes turns out to
     be too small, many things would have to change.  */
  # define MAX_BUF_SIZE (1 << 15)

  /* Extend the buffer by at least N bytes via realloc and
     reset the pointers that pointed into the old block
     to point to the correct places in the new one.
     If extending the buffer results in it being larger than
     MAX_BUF_SIZE, then flag memory exhausted.  */
  #define EXTEND_BUFFER(n)                                          \
    do {                                                            \
      ptrdiff_t requested_extension = n;                            \
      unsigned char *old_buffer = bufp->buffer;                     \
      if (MAX_BUF_SIZE - bufp->allocated < requested_extension)     \
        return REG_ESIZE;                                           \