From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Paul Eggert Newsgroups: gmane.comp.lib.gnulib.bugs,gmane.emacs.devel Subject: [PATCH 3/4] dfa: shrink constraints from 4 bits to 3 Date: Tue, 10 Jan 2017 02:13:31 -0800 Message-ID: <20170110101332.15354-3-eggert@cs.ucla.edu> References: <20170110101332.15354-1-eggert@cs.ucla.edu> NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1484043254 28230 195.159.176.226 (10 Jan 2017 10:14:14 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 10 Jan 2017 10:14:14 +0000 (UTC) Cc: Paul Eggert To: bug-gnulib@gnu.org, emacs-devel@gnu.org Original-X-From: bug-gnulib-bounces+gnu-bug-gnulib=m.gmane.org@gnu.org Tue Jan 10 11:14:11 2017 Return-path: Envelope-to: gnu-bug-gnulib@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cQtRO-0006Ly-8z for gnu-bug-gnulib@m.gmane.org; Tue, 10 Jan 2017 11:14:06 +0100 Original-Received: from localhost ([::1]:46116 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cQtRS-0007KB-JF for gnu-bug-gnulib@m.gmane.org; Tue, 10 Jan 2017 05:14:10 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:49945) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cQtR7-0007IB-FF for bug-gnulib@gnu.org; Tue, 10 Jan 2017 05:13:50 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cQtR6-0002pP-8k for bug-gnulib@gnu.org; Tue, 10 Jan 2017 05:13:49 -0500 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:57544) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cQtR5-0002p1-U3; Tue, 10 Jan 2017 05:13:48 -0500 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E0583160112; Tue, 10 Jan 2017 02:13:46 -0800 (PST) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 95O0t1UJ6AYe; Tue, 10 Jan 2017 02:13:42 -0800 (PST) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E0D69160110; Tue, 10 Jan 2017 02:13:42 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id vhwryZdhTmtR; Tue, 10 Jan 2017 02:13:42 -0800 (PST) Original-Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id BFE881600F3; Tue, 10 Jan 2017 02:13:42 -0800 (PST) X-Mailer: git-send-email 2.9.3 In-Reply-To: <20170110101332.15354-1-eggert@cs.ucla.edu> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 131.179.128.68 X-BeenThere: bug-gnulib@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Gnulib discussion list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnulib-bounces+gnu-bug-gnulib=m.gmane.org@gnu.org Original-Sender: "bug-gnulib" Xref: news.gmane.org gmane.comp.lib.gnulib.bugs:36712 gmane.emacs.devel:211182 Archived-At: * lib/dfa.c (newline_constraint, letter_constraint) (other_constraint, prev_newline_dependent) (prev_letter_dependent, NO_CONSTRAINT, BEGLINE_CONSTRAINT) (ENDLINE_CONSTRAINT, BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT) (LIMWORD_CONSTRAINT, NOTLIMWORD_CONSTRAINT): Constraints need only 3 bits, not 4. Using smaller integers shrinks the code a bit and makes grep a tad faster on x86-64. --- ChangeLog | 9 +++++++++ lib/dfa.c | 32 ++++++++++++++++---------------- 2 files changed, 25 insertions(+), 16 deletions(-) diff --git a/ChangeLog b/ChangeLog index 91cdc6d..e0b73b5 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,14 @@ 2017-01-09 Paul Eggert + dfa: shrink constraints from 4 bits to 3 + * lib/dfa.c (newline_constraint, letter_constraint) + (other_constraint, prev_newline_dependent) + (prev_letter_dependent, NO_CONSTRAINT, BEGLINE_CONSTRAINT) + (ENDLINE_CONSTRAINT, BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT) + (LIMWORD_CONSTRAINT, NOTLIMWORD_CONSTRAINT): + Constraints need only 3 bits, not 4. Using smaller integers + shrinks the code a bit and makes grep a tad faster on x86-64. + dfa: omit unnecessary ptrdiff_t check * lib/dfa.c (alloc_position_set): Do not worry about ptrdiff_t overflow, since xnmalloc does that now. diff --git a/lib/dfa.c b/lib/dfa.c index 509d6d1..28678c2 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -137,13 +137,13 @@ enum /* Sometimes characters can only be matched depending on the surrounding context. Such context decisions depend on what the previous character was, and the value of the current (lookahead) character. Context - dependent constraints are encoded as 12-bit integers. Each bit that + dependent constraints are encoded as 9-bit integers. Each bit that is set indicates that the constraint succeeds in the corresponding context. - bit 8-11 - valid contexts when next character is CTX_NEWLINE - bit 4-7 - valid contexts when next character is CTX_LETTER - bit 0-3 - valid contexts when next character is CTX_NONE + bit 6-8 - valid contexts when next character is CTX_NEWLINE + bit 3-5 - valid contexts when next character is CTX_LETTER + bit 0-2 - valid contexts when next character is CTX_NONE succeeds_in_context determines whether a given constraint succeeds in a particular context. Prev is a bitmask of possible @@ -152,17 +152,17 @@ enum static int newline_constraint (int constraint) { - return (constraint >> 8) & 0xf; + return (constraint >> 6) & 7; } static int letter_constraint (int constraint) { - return (constraint >> 4) & 0xf; + return (constraint >> 3) & 7; } static int other_constraint (int constraint) { - return constraint & 0xf; + return constraint & 7; } static bool @@ -178,12 +178,12 @@ succeeds_in_context (int constraint, int prev, int curr) static bool prev_newline_dependent (int constraint) { - return ((constraint ^ constraint >> 2) & 0x111) != 0; + return ((constraint ^ constraint >> 2) & 0111) != 0; } static bool prev_letter_dependent (int constraint) { - return ((constraint ^ constraint >> 1) & 0x111) != 0; + return ((constraint ^ constraint >> 1) & 0111) != 0; } /* Tokens that match the empty string subject to some constraint actually @@ -192,13 +192,13 @@ prev_letter_dependent (int constraint) the constraints corresponding to the special tokens previously defined. */ enum { - NO_CONSTRAINT = 0x777, - BEGLINE_CONSTRAINT = 0x444, - ENDLINE_CONSTRAINT = 0x700, - BEGWORD_CONSTRAINT = 0x050, - ENDWORD_CONSTRAINT = 0x202, - LIMWORD_CONSTRAINT = 0x252, - NOTLIMWORD_CONSTRAINT = 0x525 + NO_CONSTRAINT = 0777, + BEGLINE_CONSTRAINT = 0444, + ENDLINE_CONSTRAINT = 0700, + BEGWORD_CONSTRAINT = 0050, + ENDWORD_CONSTRAINT = 0202, + LIMWORD_CONSTRAINT = 0252, + NOTLIMWORD_CONSTRAINT = 0525 }; /* The regexp is parsed into an array of tokens in postfix form. Some tokens -- 2.9.3