From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: master ea93326: Add `union' and `intersection' to rx (bug#37849)
Date: Sun, 15 Dec 2019 15:04:29 -0500
To: Mattias Engdegård
Cc: Emacs developers
In-Reply-To: <379396AE-D709-4F6F-AE7C-30321A5452C4@acm.org>

>>> A bit overkill just for matching a set of constant strings, don't you think?
>> I think there's a lot of implicit assumptions here.
>> Yes, there are cases where you may want the "longest match" rule and
>> where `posix-string-match` can be too costly, but the ones I can think
>> of seem to be fairly contrived.
> Perhaps I should have underlined that it is only literal strings that is of
> immediate concern, since that is what regexp-opt is used for. It is not
> a contrived situation to have a set of strings -- keywords, for instance --
> not necessarily anchored by something else at the end.

We need more elements for a realistic scenario.  E.g. when the regexp
match fails, `posix-string-match` has the same cost as `string-match`,
so not only do you need the end of the regexp not to be anchored to
something else, but you also need all of the below:
- the match should be frequent enough for performance to matter
- the match should almost always succeed
- it needs to matter exactly where the match ends
- one of the matched words needs to be a prefix of another
- you can "extract the next word" and look it up in a hash-table instead
  of performing a regexp match

FWIW, I think we can fix this by using a non-backtracking regexp
matcher, but I don't see it as a strong motivation for such a change
(there are good motivations for that, but this one is a pretty weak one
in my book).


        Stefan
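
To make the "prefix of another" and "hash-table" conditions above concrete,
here is a minimal sketch in Python rather than Emacs Lisp.  It assumes that
Python's `re` alternation behaves like a backtracking matcher (the first
alternative that succeeds wins).  The keyword list is made up, and
`longest_end` is a hypothetical stand-in for the "longest match" rule, not
`posix-string-match` itself.

```python
import re

# Hypothetical keyword set; "for" is a prefix of "foreach".
KEYWORDS = ["for", "foreach", "if"]

# Alternation in listed order: a backtracking matcher tries the
# alternatives left to right and stops at the first one that succeeds.
FIRST_MATCH = re.compile("|".join(re.escape(k) for k in KEYWORDS))


def leftmost_first_end(text, pos=0):
    """End of the match under the first-alternative-wins rule, or None."""
    m = FIRST_MATCH.match(text, pos)
    return m.end() if m else None


def longest_end(text, pos=0):
    """End of the longest keyword starting at pos, or None.

    Emulates the longest-match rule by checking every keyword.
    """
    ends = [pos + len(k) for k in KEYWORDS if text.startswith(k, pos)]
    return max(ends) if ends else None


# The alternative from the last bullet: extract the next word, then
# look it up in a hash table instead of running the keyword regexp.
WORD = re.compile(r"[A-Za-z_]+")
KEYWORD_SET = frozenset(KEYWORDS)


def keyword_by_lookup(text, pos=0):
    """End of the next word if that whole word is a keyword, else None."""
    m = WORD.match(text, pos)
    if m and m.group() in KEYWORD_SET:
        return m.end()
    return None


if __name__ == "__main__":
    print(leftmost_first_end("foreach x"))  # stops after "for"
    print(longest_end("foreach x"))         # covers "foreach"
    print(keyword_by_lookup("foreach x"))   # covers "foreach"
```

The two rules only disagree because "for" is listed before "foreach" and
nothing anchors the end of the match.  Note that the lookup variant is not
a drop-in replacement: it only accepts a whole word, so it rejects "format",
whereas both unanchored matchers accept the leading "for".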