From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Nazarewicz
Newsgroups: gmane.emacs.bugs
Subject: bug#24100: [PATCH 4/4] Hardcode regex syntax to remove dead code handling different syntax
Date: Thu, 28 Jul 2016 20:07:17 +0200
Message-ID: <1469729237-14208-4-git-send-email-mina86@mina86.com>
In-Reply-To: <1469729237-14208-1-git-send-email-mina86@mina86.com>
References: <1469729237-14208-1-git-send-email-mina86@mina86.com>
To: 24100@debbugs.gnu.org

Emacs only ever uses its own regex syntax, so support for other
syntaxes is never used.  Hardcode the syntax so that the compiler
can detect such dead code and remove it from compiled code.

The only exception is RE_NO_POSIX_BACKTRACKING, which can be specified
separately.  Handle this separately with a function argument (replacing
the now unnecessary syntax argument).

With this patchset, the size of the Emacs binary on an x86_64 machine
is reduced by around 60 kB:

new-sizes:-rwx------ 3 mpn eng 30254720 Jul 27 23:31 src/emacs
old-sizes:-rwx------ 3 mpn eng 30314828 Jul 27 23:29 src/emacs

* src/regex.h (re_pattern_buffer): Don’t define syntax field #ifdef emacs.
(re_compile_pattern): Replace syntax with posix_backtracking argument.

* src/regex.c (print_compiled_pattern): Don’t print syntax #ifdef emacs.
(regex_compile): #ifdef emacs, replace syntax argument with
posix_backtracking which is now used instead of testing for
RE_NO_POSIX_BACKTRACKING syntax.
(re_match_2_internal): Don’t access bufp->syntax #ifdef emacs.
(re_compile_pattern): Replace syntax with posix_backtracking argument.

* src/search.c (compile_pattern_1): Pass boolean posix_backtracking
instead of syntax to re_compile_pattern.
---
 src/regex.c  | 40 +++++++++++++++++++++++++++++++---------
 src/regex.h  |  5 +++--
 src/search.c |  4 +---
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/src/regex.c b/src/regex.c
index c32a62f..8dafb11 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -1108,7 +1108,9 @@ print_compiled_pattern (struct re_pattern_buffer *bufp)
   printf ("no_sub: %d\t", bufp->no_sub);
   printf ("not_bol: %d\t", bufp->not_bol);
   printf ("not_eol: %d\t", bufp->not_eol);
+#ifndef emacs
   printf ("syntax: %lx\n", bufp->syntax);
+#endif
   fflush (stdout);
   /* Perhaps we should print the translate table? */
 }
@@ -1558,9 +1560,11 @@ do { \
 /* Subroutine declarations and macros for regex_compile. */
 
 static reg_errcode_t regex_compile (re_char *pattern, size_t size,
-                                    reg_syntax_t syntax,
 #ifdef emacs
+                                    bool posix_backtracking,
                                     const char *whitespace_regexp,
+#else
+                                    reg_syntax_t syntax,
 #endif
                                     struct re_pattern_buffer *bufp);
 static void store_op1 (re_opcode_t op, unsigned char *loc, int arg);
@@ -2426,9 +2430,14 @@ do { \
 } while (0)
 
 static reg_errcode_t
-regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax,
+regex_compile (const_re_char *pattern, size_t size,
 #ifdef emacs
+# define syntax RE_SYNTAX_EMACS
+               bool posix_backtracking,
                const char *whitespace_regexp,
+#else
+               reg_syntax_t syntax,
+# define posix_backtracking (!(syntax & RE_NO_POSIX_BACKTRACKING))
 #endif
                struct re_pattern_buffer *bufp)
 {
@@ -2518,7 +2527,9 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax,
   range_table_work.allocated = 0;
 
   /* Initialize the pattern buffer. */
+#ifndef emacs
   bufp->syntax = syntax;
+#endif
   bufp->fastmap_accurate = 0;
   bufp->not_bol = bufp->not_eol = 0;
   bufp->used_syntax = 0;
@@ -3645,7 +3656,7 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax,
 
   /* If we don't want backtracking, force success
      the first time we reach the end of the compiled pattern. */
-  if (syntax & RE_NO_POSIX_BACKTRACKING)
+  if (!posix_backtracking)
     BUF_PUSH (succeed);
 
   /* We have succeeded; set the length of the buffer. */
@@ -3680,6 +3691,12 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax,
 #endif /* not MATCH_MAY_ALLOCATE */
 
   FREE_STACK_RETURN (REG_NOERROR);
+
+#ifdef emacs
+# undef syntax
+#else
+# undef posix_backtracking
+#endif
 } /* regex_compile */
 
 /* Subroutines for `regex_compile'. */
@@ -5442,6 +5459,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1,
           {
             int buf_charlen;
             re_wchar_t buf_ch;
+            reg_syntax_t syntax;
 
             DEBUG_PRINT ("EXECUTING anychar.\n");
 
@@ -5450,10 +5468,14 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1,
                                                 target_multibyte);
             buf_ch = TRANSLATE (buf_ch);
 
-            if ((!(bufp->syntax & RE_DOT_NEWLINE)
-                 && buf_ch == '\n')
-                || ((bufp->syntax & RE_DOT_NOT_NULL)
-                    && buf_ch == '\000'))
+#ifdef emacs
+            syntax = RE_SYNTAX_EMACS;
+#else
+            syntax = bufp->syntax;
+#endif
+
+            if ((!(syntax & RE_DOT_NEWLINE) && buf_ch == '\n')
+                || ((syntax & RE_DOT_NOT_NULL) && buf_ch == '\000'))
               goto fail;
 
             DEBUG_PRINT (" Matched \"%d\".\n", *d);
@@ -6281,7 +6303,7 @@ bcmp_translate (const_re_char *s1, const_re_char *s2, register ssize_t len,
 const char *
 re_compile_pattern (const char *pattern, size_t length,
 #ifdef emacs
-                    reg_syntax_t syntax, const char *whitespace_regexp,
+                    bool posix_backtracking, const char *whitespace_regexp,
 #endif
                     struct re_pattern_buffer *bufp)
 {
@@ -6298,7 +6320,7 @@ re_compile_pattern (const char *pattern, size_t length,
 
   ret = regex_compile ((re_char*) pattern, length,
 #ifdef emacs
-                       syntax,
+                       posix_backtracking,
                        whitespace_regexp,
 #else
                        re_syntax_options,
diff --git a/src/regex.h b/src/regex.h
index af9480d..b672d3f 100644
--- a/src/regex.h
+++ b/src/regex.h
@@ -354,9 +354,10 @@ struct re_pattern_buffer
         /* Number of bytes actually used in `buffer'. */
   size_t used;
 
+#ifndef emacs
         /* Syntax setting with which the pattern was compiled. */
   reg_syntax_t syntax;
-
+#endif
         /* Pointer to a fastmap, if any, otherwise zero. re_search uses
            the fastmap, if there is one, to skip over impossible
            starting points for matches. */
@@ -473,7 +474,7 @@ extern reg_syntax_t re_set_syntax (reg_syntax_t __syntax);
    BUFFER. Return NULL if successful, and an error string if not. */
 extern const char *re_compile_pattern (const char *__pattern, size_t __length,
 #ifdef emacs
-                                       reg_syntax_t syntax,
+                                       bool posix_backtracking,
                                        const char *whitespace_regexp,
 #endif
                                        struct re_pattern_buffer *__buffer);
diff --git a/src/search.c b/src/search.c
index c7556a9..7f2b4f9 100644
--- a/src/search.c
+++ b/src/search.c
@@ -114,7 +114,6 @@ compile_pattern_1 (struct regexp_cache *cp, Lisp_Object pattern,
                    Lisp_Object translate, bool posix)
 {
   const char *whitespace_regexp;
-  reg_syntax_t syntax;
   char *val;
 
   cp->regexp = Qnil;
@@ -133,12 +132,11 @@ compile_pattern_1 (struct regexp_cache *cp, Lisp_Object pattern,
      So let's turn it off. */
   /* BLOCK_INPUT; */
 
-  syntax = RE_SYNTAX_EMACS | (posix ? 0 : RE_NO_POSIX_BACKTRACKING);
   whitespace_regexp = STRINGP (Vsearch_spaces_regexp) ?
     SSDATA (Vsearch_spaces_regexp) : NULL;
 
   val = (char *) re_compile_pattern (SSDATA (pattern), SBYTES (pattern),
-                                     syntax, whitespace_regexp, &cp->buf);
+                                     posix, whitespace_regexp, &cp->buf);
 
   /* If the compiled pattern hard codes some of the contents of the
      syntax-table, it can only be reused with *this* syntax table. */
-- 
2.8.0.rc3.226.g39d4020
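
For readers who want to see the idea outside of regex.c, here is a minimal standalone sketch of the technique the patch relies on: once the syntax is a compile-time constant, tests against it can be folded away by the compiler, and the one setting that still varies travels as a plain bool. All demo_* and DEMO_* names are hypothetical and are not part of regex.h. In particular, DEMO_SYNTAX being 0 is an assumption made for illustration, not a statement about the value of RE_SYNTAX_EMACS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the RE_* syntax bits.  */
#define DEMO_DOT_NEWLINE  (1u << 0)
#define DEMO_DOT_NOT_NULL (1u << 1)

/* Defining this mimics the `#ifdef emacs' build of the patch.  */
#define DEMO_FIXED_SYNTAX 1

#ifdef DEMO_FIXED_SYNTAX
/* Assumed constant syntax: no flag bits set.  */
# define DEMO_SYNTAX 0u
#endif

struct demo_pattern
{
  size_t used;          /* bytes used, as in a pattern buffer */
#ifndef DEMO_FIXED_SYNTAX
  unsigned syntax;      /* only stored when the syntax can vary */
#endif
};

/* Return true if `.' may match CH.  With a fixed syntax both flag
   tests have constant operands, so the compiler can drop the branch
   that can never be taken.  */
static bool
demo_dot_matches (const struct demo_pattern *p, char ch)
{
#ifdef DEMO_FIXED_SYNTAX
  const unsigned syntax = DEMO_SYNTAX;
  (void) p;             /* unused in the fixed-syntax build */
#else
  const unsigned syntax = p->syntax;
#endif

  if (!(syntax & DEMO_DOT_NEWLINE) && ch == '\n')
    return false;
  if ((syntax & DEMO_DOT_NOT_NULL) && ch == '\0')
    return false;
  return true;
}

/* The one remaining run-time choice is passed as a bool instead of
   a flag bit.  Returns how many trailing "succeed" opcodes a
   compiler front end would emit: one when backtracking is off.  */
static int
demo_trailing_succeed (bool posix_backtracking)
{
  return posix_backtracking ? 0 : 1;
}
```

With DEMO_FIXED_SYNTAX defined, an optimizing compiler can typically reduce demo_dot_matches to a single comparison against '\n'; whether a particular compiler does so depends on its optimization settings. Removing the DEMO_FIXED_SYNTAX definition restores the variable-syntax path, which corresponds to the #ifndef emacs side of the patch.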