From: Po Lu
Newsgroups: gmane.emacs.devel
Subject: Re: prior work on non-backtracking regex engine?
Date: Mon, 08 Apr 2024 22:00:27 +0800
Message-ID: <87zfu43qf8.fsf@yahoo.com>
In-Reply-To: <8eHWNAHASGpm2Qhs7mldB0VTdFaRW0NZJWi9ppNiCPZQ-QO8E7YLUXzJfT7LXYe1mTnwFGACouy7YroB8qsCFDbRuWYHr2gvOTagQY3TW7w=@hypnicjerk.ai> (Danny McClanahan's message of "Sun, 07 Apr 2024 04:42:13 +0000")
To: Danny McClanahan
Cc: Ihor Radchenko, "emacs-devel@gnu.org"

Danny McClanahan writes:

>> P.S. Better regexp engine would be very welcome.
>
> ^_^ !!! This makes me so happy to hear! I took the time to learn how
> regex-emacs.c as well as other emacs C code interacts with the gap
> buffer last week, and was super glad to see how simple the gap buffer
> data structure is.
> I am not an expert on string search performance
> yet, but I believe the gap buffer (which is always allocated within a
> single block, and has at most two non-adjacent data sections) is
> likely much more amenable to high-performance search techniques than a
> more complex data structure such as a rope. And I was also *super*
> pleased to see that regex-emacs.h itself doesn't expose any dependency
> on the gap buffer or other internal emacs representations (except
> regarding multibyte encoding). So in my amateur evaluation, emacs
> actually seems very well-placed to take advantage of high-performance
> regex engine techniques without any big structural changes.

[...]

> I would *really like* to eventually have emacs depend on an existing
> regex engine like re2 or rust regex to take advantage of their
> bugfixes and optimizations, but both of those engines (and most
> others) require utf-8 input, and (I'm pretty sure) can't easily be
> made to support emacs's multibyte functionality. So I think there is a
> strong case for a new engine here, especially one licensed with the
> GPLv3 (or any later version) as opposed to the LGPL or other more
> permissive license.

I hate to rain on people's parades, but from where I stand, introducing
one more mandatory dependency, not least a dependency for so
fundamental a component of Emacs as the regexp matcher, is not
acceptable, even if this library is maintained under the auspices of
the GNU project or under the GPL, license or stewardship being really
immaterial issues.

The options being tendered are still less so: the one is written in
C++, and relies on a library that gives an awful impression of being a
boost-in-the-making, and the other is written in an immature language
that is not portable, especially to older systems we support for users
with antiquarian interests.

Don't let's delude ourselves that a solution is waiting in the wings in
the shape of some mystical library, and stake our hopes on this library
alone, to the detriment of improving the regex engine we already have.
It's a recurring trap that we should have learned to avoid by now.

If the sabotage of xz, for example, has taught us anything, it's that
software is generally improved by a reduction in dependencies.

Thanks.