From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Matching regex case-sensitively in C strings?
Date: Tue, 8 Nov 2022 12:59:12 -0800
Message-ID: <2338A13C-A6E3-49B6-9E5C-A366022E4D29@gmail.com>
References: <218795BA-107D-4A86-9ACF-0A44BD2EC3D2@gmail.com>
 <83edufyoad.fsf@gnu.org>
 <580E87E6-DCFD-42AE-807A-339BBB3878C2@acm.org>
 <5711A9D3-7BCB-44AE-8911-5E039FF5FBB8@gmail.com>
 <838rklxmnm.fsf@gnu.org>
In-Reply-To: <838rklxmnm.fsf@gnu.org>
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\))
Content-Type: multipart/mixed;
 boundary="Apple-Mail=_FA54678E-1542-4FDB-8DC0-F655B09A6A52"
Cc: mattiase@acm.org, emacs-devel@gnu.org
To: Eli Zaretskii

--Apple-Mail=_FA54678E-1542-4FDB-8DC0-F655B09A6A52
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

> On Nov 8, 2022, at 11:37 AM, Eli Zaretskii wrote:
>
>> From: Yuan Fu
>> Date: Tue, 8 Nov 2022 11:31:35 -0800
>> Cc: Eli Zaretskii,
>>  emacs-devel
>>
>> +  Lisp_Object translate_table = ignore_case ? Vascii_canon_table : Qnil;
>
> Please don't call this "translate_table", as it isn't.  The
> documentation elsewhere calls this "canonicalize_table" or just
> "canonicalize".
>
>> fast_c_string_match_ignore_case (Lisp_Object regexp,
>>                                  const char *string, ptrdiff_t len)
>> {
>> +  return fast_c_string_match (regexp, string, len, true);
>> +}
>
> I'm bothered that this function, which is supposed to be fast, will
> now be slower due to an extra function call.

How about this: Now c_string_match functions mirror their string_match
counterparts.

Yuan

--Apple-Mail=_FA54678E-1542-4FDB-8DC0-F655B09A6A52
Content-Disposition: attachment; filename=c_string_match.diff
Content-Type: application/octet-stream; x-unix-mode=0644;
 name="c_string_match.diff"
Content-Transfer-Encoding: 7bit

diff --git a/src/lisp.h b/src/lisp.h
index 1e41e2064c9..9f497d92270 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4757,6 +4757,8 @@ XMODULE_FUNCTION (Lisp_Object o)
 extern void record_unwind_save_match_data (void);
 extern ptrdiff_t fast_string_match_internal (Lisp_Object, Lisp_Object,
                                              Lisp_Object);
+extern ptrdiff_t fast_c_string_match_internal (Lisp_Object, const char *,
+                                               ptrdiff_t, Lisp_Object);
 
 INLINE ptrdiff_t
 fast_string_match (Lisp_Object regexp, Lisp_Object string)
@@ -4770,8 +4772,21 @@ fast_string_match_ignore_case (Lisp_Object regexp, Lisp_Object string)
   return fast_string_match_internal (regexp, string, Vascii_canon_table);
 }
 
-extern ptrdiff_t fast_c_string_match_ignore_case (Lisp_Object, const char *,
-                                                  ptrdiff_t);
+INLINE ptrdiff_t
+fast_c_string_match (Lisp_Object regexp,
+                     const char *string, ptrdiff_t len)
+{
+  return fast_c_string_match_internal (regexp, string, len, Qnil);
+}
+
+INLINE ptrdiff_t
+fast_c_string_match_ignore_case (Lisp_Object regexp,
+                                 const char *string, ptrdiff_t len)
+{
+  return fast_c_string_match_internal (regexp, string, len,
+                                       Vascii_canon_table);
+}
+
 extern ptrdiff_t fast_looking_at (Lisp_Object, ptrdiff_t, ptrdiff_t,
                                   ptrdiff_t, ptrdiff_t, Lisp_Object);
 extern ptrdiff_t find_newline1 (ptrdiff_t, ptrdiff_t, ptrdiff_t, ptrdiff_t,
diff --git a/src/search.c b/src/search.c
index b5d6a442c0f..f7a28202b23 100644
--- a/src/search.c
+++ b/src/search.c
@@ -496,19 +496,26 @@ fast_string_match_internal (Lisp_Object regexp, Lisp_Object string,
   return val;
 }
 
-/* Match REGEXP against STRING, searching all of STRING ignoring case,
-   and return the index of the match, or negative on failure.
-   This does not clobber the match data.
+/* Match REGEXP against STRING, searching all of STRING and return the
+   index of the match, or negative on failure.  This does not clobber
+   the match data.  Table is a canonicalize table.
+
    We assume that STRING contains single-byte characters.  */
 
 ptrdiff_t
-fast_c_string_match_ignore_case (Lisp_Object regexp,
-                                 const char *string, ptrdiff_t len)
+fast_c_string_match_internal (Lisp_Object regexp,
+                              const char *string, ptrdiff_t len,
+                              Lisp_Object table)
 {
+  /* FIXME: This is expensive and not obviously correct when it makes
+     a difference.  I.e., no longer "fast", and may hide bugs.
+     Something should be done about this.  */
   regexp = string_make_unibyte (regexp);
+  /* Record specpdl index because freeze_pattern pushes an
+     unwind-protect on the specpdl.  */
   specpdl_ref count = SPECPDL_INDEX ();
   struct regexp_cache *cache_entry
-    = compile_pattern (regexp, 0, Vascii_canon_table, 0, 0);
+    = compile_pattern (regexp, 0, table, 0, 0);
   freeze_pattern (cache_entry);
   re_match_object = Qt;
   ptrdiff_t val = re_search (&cache_entry->buf, string, len, 0, len, 0);

--Apple-Mail=_FA54678E-1542-4FDB-8DC0-F655B09A6A52--
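
For readers outside the Emacs tree, the shape of the patch (one internal
function that takes a canonicalize table, plus two thin inline wrappers
that pick the table) can be sketched in standalone C.  This is only an
illustration under stated assumptions, not the Emacs code: a plain
substring search stands in for the regexp engine, and every name below
(ascii_canon, init_ascii_canon, c_string_find_internal, c_string_find,
c_string_find_ignore_case) is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a canonicalize table: maps each byte to
   its canonical (here: lower-case) form.  */
static unsigned char ascii_canon[256];

/* Fill ascii_canon.  Must be called once before the ignore-case
   wrapper is used.  */
static void
init_ascii_canon (void)
{
  for (int c = 0; c < 256; c++)
    ascii_canon[c] = (unsigned char) tolower (c);
}

/* Return the index of the first occurrence of NEEDLE in the LEN bytes
   at STRING, or -1 if there is none.  CANON is a canonicalize table,
   or NULL to compare bytes as they are.  A substring search stands in
   for the regexp matcher (assumption made for this sketch).  */
static ptrdiff_t
c_string_find_internal (const char *needle, const char *string,
                        ptrdiff_t len, const unsigned char *canon)
{
  ptrdiff_t nlen = (ptrdiff_t) strlen (needle);
  for (ptrdiff_t i = 0; i + nlen <= len; i++)
    {
      ptrdiff_t j = 0;
      while (j < nlen)
        {
          unsigned char a = (unsigned char) string[i + j];
          unsigned char b = (unsigned char) needle[j];
          if (canon ? canon[a] != canon[b] : a != b)
            break;
          j++;
        }
      if (j == nlen)
        return i;
    }
  return -1;
}

/* Case-sensitive wrapper: no canonicalize table.  */
static inline ptrdiff_t
c_string_find (const char *needle, const char *string, ptrdiff_t len)
{
  return c_string_find_internal (needle, string, len, NULL);
}

/* Case-insensitive wrapper: same internal function, ASCII table.  */
static inline ptrdiff_t
c_string_find_ignore_case (const char *needle, const char *string,
                           ptrdiff_t len)
{
  return c_string_find_internal (needle, string, len, ascii_canon);
}
```

Because both wrappers are inline and only forward their arguments, a
compiler can typically reduce each call site to a direct call of the
internal function, which is the point of defining them in the header
rather than as a second out-of-line function.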