From: Mattias Engdegård
Newsgroups: gmane.emacs.devel
Subject: Re: Matching regex case-sensitively in C strings?
Date: Tue, 8 Nov 2022 11:18:10 +0100
Message-ID: <580E87E6-DCFD-42AE-807A-339BBB3878C2@acm.org>
References: <218795BA-107D-4A86-9ACF-0A44BD2EC3D2@gmail.com> <83edufyoad.fsf@gnu.org>
To: Yuan Fu
Cc: Eli Zaretskii, emacs-devel

On 7 Nov 2022, at 21:35, Yuan Fu wrote:

> fast_c_string_match_ignore_case (Lisp_Object regexp,
>                                  const char *string, ptrdiff_t len)
> {
>   regexp = string_make_unibyte (regexp);

This is expensive and not obviously correct when it makes a difference. I.e., no longer "fast", and may hide bugs. Something should be done about that.

>   // Why do we need to unwind stack?
>   specpdl_ref count = SPECPDL_INDEX ();

Because freeze_pattern pushes an unwind-protect on the specpdl.

>   struct regexp_cache *cache_entry
>     = compile_pattern (regexp, 0, Vascii_canon_table, 0, 0);

`Vascii_canon_table` is what makes it case-insensitive; you want to use Qnil (but you probably already know that now).
Since this is the only thing that differs from your intended use, I suggest you generalise this subroutine with a boolean parameter.

>   // What does freezing a pattern do?
>   freeze_pattern (cache_entry);

It locks the compiled pattern record to make the regexp engine reentrant (but here it also seems to be used for GC purposes; not sure about that).

>   // What is re_match_object for? I see that it can be t, nil or a string.
>   re_match_object = Qt;

Described in regex-emacs.h:

> /* The string or buffer being matched.
>    It is used for looking up syntax properties.
>
>    If the value is a Lisp string object, match text in that string; if
>    it's nil, match text in the current buffer; if it's t, match text
>    in a C string.
>
>    This value is effectively another parameter to re_search_2 and
>    re_match_2.  No calls into Lisp or thread switches are allowed
>    before setting re_match_object and calling into the regex search
>    and match functions.  These functions capture the current value of
>    re_match_object into gl_state on entry.
>