From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Add some aliases for re-related functions
Date: Sat, 02 May 2020 23:55:08 -0400
To: Dmitry Gutov
Cc: Yuan Fu, Emacs developers
In-Reply-To: <2a5344e3-999c-bb60-3c27-a9e9e6c256da@yandex.ru> (Dmitry Gutov's message of "Sun, 3 May 2020 02:13:42 +0300")
>> Think of it as the case where the regexp starts with \` and ends with \'.
>> Then there's the relaxation of "finding the longest match" (what we
>> call `looking-at`) and then "finding the leftmost longest match" (what
>> we call `string-match`).
>
> looking-at being a special case of re-search-forward, I take it?

Not sure what you mean by that. `re-search-forward` is in the same
category as `string-match`, i.e. it's a *search* operation, whereas
`looking-at` is a *match* operation. IOW a "match" operation is like
a "search" but where `match-beginning` is fixed to the starting position.

Algorithmically, the two are close, but there's a bit more work to go
from one to the other than meets the eye: if you take a regexp and turn
it into a DFA in the usual way, you get an automaton that can trivially
(and in O(n) time) give you either the shortest match or the longest
match. But if you want it to search, you have to add a loop around it
to try matching at every possible start position, which brings the
complexity to O(n²) :-(

To fix that you can try and compile ".*RE" instead of "RE", and that
will give you an automaton that can search for "RE" in O(n) time, but
it won't directly give you the "leftmost longest match" (instead it can
directly give you "the match whose match-end is closest" and "the match
whose match-end is furthest"). So to get the desired "leftmost longest
match", you have to work a bit harder yet.

Note: Emacs's regexp engine isn't based on a DFA, and doesn't try to
use the second option: our engine basically only does "matching", and
to get the search behavior we add a loop around it, so algorithmically,
`looking-at` and `string-match`/`re-search-forward` are quite different.
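To make the match/search distinction and the ".*RE" trick concrete, here is a minimal sketch in Python (used only for illustration; it is not how Emacs's backtracking engine works). It assumes the hypothetical regexp "ab*" has already been hand-compiled into a two-state DFA, and the helper names `match_end`, `search_by_loop` and `match_ends_via_dot_star` are made up for this example.

```python
# Illustrative sketch only: NOT how Emacs's regexp engine works.
# Assumption: the hypothetical regexp "ab*" has been hand-compiled into
# the tiny DFA below; state 0 is the start state, state 1 is accepting,
# and any (state, char) pair missing from the table goes to a dead state.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1}
START = 0
ACCEPTING = {1}


def match_end(text, pos, longest=True):
    """Anchored "match" at POS (the `looking-at` kind of operation).

    One left-to-right pass from POS; returns the end of the longest (or,
    with longest=False, the shortest) match starting at POS, else None.
    """
    state = START
    best = pos if state in ACCEPTING else None
    if best is not None and not longest:
        return best
    for i in range(pos, len(text)):
        state = TRANSITIONS.get((state, text[i]))
        if state is None:          # dead state: no longer match is possible
            break
        if state in ACCEPTING:
            best = i + 1
            if not longest:        # shortest match: stop at first accept
                return best
    return best


def search_by_loop(text):
    """A "search" (the `string-match` kind) as a loop around the matcher.

    Returns (start, end) of the leftmost longest match, or None.  Each
    attempt may scan to the end of TEXT, hence O(n^2) in the worst case.
    """
    for start in range(len(text) + 1):
        end = match_end(text, start)
        if end is not None:
            return start, end
    return None


def match_ends_via_dot_star(text):
    """Single O(n) pass of the automaton for ".*RE".

    The DFA for ".*RE" is simulated by tracking the set of live states of
    the "RE" DFA (subset construction done on the fly).  Returns every
    position where some match of RE ends; the match starts are lost.
    """
    ends, live = [], set()
    for i in range(len(text) + 1):
        live.add(START)            # the ".*" prefix lets a match begin at i
        if live & ACCEPTING:
            ends.append(i)
        if i == len(text):
            break
        live = {TRANSITIONS[(s, text[i])] for s in live
                if (s, text[i]) in TRANSITIONS}
    return ends


if __name__ == "__main__":
    text = "xabbab"
    print(search_by_loop(text))    # leftmost longest: (1, 4), i.e. "abb"
    ends = match_ends_via_dot_star(text)
    print(min(ends), max(ends))    # closest and furthest match-end: 2 6
```

On "xabbab" the loop finds the leftmost longest match (1, 4), while the single ".*RE" pass only reports the match-ends 2 through 6: the closest end (2) belongs to the leftmost but shortest match "a", and the furthest (6) belongs to "ab" starting at 4, so neither directly yields (1, 4). That is the "bit harder yet" part.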
Notice that we don't really have the equivalent of `looking-at` on
strings currently :-(

>> Those two have traditionally been named `re_match` and `re_search`
>> respectively in C libraries (as can be seen in `src/regex-emacs.c`).
>
> Yes, ok. But we also need names to distinguish that things happen in
> a buffer. So far we've used 'search' for those.
>
> Using the term 'search' for matching in strings might be a significant
> change, given existing expectations.

Yes, it's unfortunate. Maybe we could/should merge them to clarify:

    (re-match REGEXP &optional STRING LIMIT START)
    (re-search REGEXP &optional STRING LIMIT START)

where `re-match` would be like `looking-at` (and `re-search` like
`re-search-forward`) but would operate on STRING instead of
`current-buffer` if STRING is non-nil. START defaults to point for
current-buffer and 0 for a string.

Compared to `re-search-forward`, this lacks the NOERROR and the COUNT
args. We could add yet more optional args, but this is getting ugly.
Not sure how important these are, tho. Or maybe we could change the
optional START arg so it means "START" when used on a string and it
means something else (NOERROR or COUNT) when used in a buffer (yucky)?

>> PS: BTW, `looking-back` doesn't do a "match" of the "longest match that
>> ends at point" but a "search" for the "rightmost longest match that ends
>> at point" since it uses `re-search-backward` internally.
>
> It's a weird function, I agree. Though it's proved to be a handy one.

Yes. The functionality it offers is important, but in reality one would
want a "real" `looking-back` which uses a backward match, rather than
the current "backward search for a forward match" hack. It would be
both more efficient and provide a cleaner behavior.


        Stefan