From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.devel Subject: Re: Add some aliases for re-related functions Date: Mon, 4 May 2020 03:29:02 +0300 Message-ID: References: <7976B8C1-AFC7-4662-B750-6492EB70C0D5@gmail.com> <1f156067-9bf3-e588-4306-9d673a2a27b9@yandex.ru> <2a5344e3-999c-bb60-3c27-a9e9e6c256da@yandex.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202"; logging-data="109504"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 Cc: Yuan Fu , Emacs developers To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Mon May 04 02:29:42 2020 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jVOzN-000SPX-86 for ged-emacs-devel@m.gmane-mx.org; Mon, 04 May 2020 02:29:41 +0200 Original-Received: from localhost ([::1]:52422 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jVOzM-0003h8-Al for ged-emacs-devel@m.gmane-mx.org; Sun, 03 May 2020 20:29:40 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:38298) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jVOyp-0003GU-7S for emacs-devel@gnu.org; Sun, 03 May 2020 20:29:07 -0400 Original-Received: from mail-wm1-x329.google.com ([2a00:1450:4864:20::329]:36331) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jVOyo-0001e9-2i for emacs-devel@gnu.org; Sun, 03 May 2020 20:29:06 -0400 Original-Received: by mail-wm1-x329.google.com with SMTP id u127so6954239wmg.1 for ; Sun, 03 May 2020 17:29:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=xqq4qEkL6TYGMiEKkKJjCcK5/I107SjTigtMAUK77Nc=; b=KimNzUDiAm6o7Lcv6KD/u7bUjPEA/lvxMwdImHNDkgfPocQCiTCeN73OVGAIhztola 2/2e2+Gf7AMO1QtVbRH7wwo4JtnE4DOCXjAOlcfG9AzabjMM1Z1miQmd0ok6pErBpuAf 2b7FTIrZ7ie6Yed2UGXw5meYrxK8O+6CBtPw0akoULWozWrqnWymzT3PvozphZ1pGHKK gjMup2TruH8dHZ7HwKz7tznX4plJcl2/HECb6gG/9s3c0i4vjfoJXNu+bC2QnE1T7Dxc cjx3BeczLfI+WVOv1vfqWn76oEBPwA51d5TiyIiIgGItawgHw/cU5HnjBj7CBWbEYCd5 S6/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:subject:to:cc:references:from:message-id :date:user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=xqq4qEkL6TYGMiEKkKJjCcK5/I107SjTigtMAUK77Nc=; b=Ds8MlzrB3Skt24Ax13v2LdDjw6qAVSP6RqVnyoQs4+0i0jYtqzpUVhgW3FUZa4u1+v ezsKsK6r8ctBItASzRwmnQwOGn+CoFVU0c4mZq4cgWGXoMiRZr93ciNThrYJAMJ5tth7 ZAX/nDOqqiEDFTFI1E81HHHlTBeehDZdEdPS0mJQco8AfQr5ShVDWkprtfZir6+OP8Z0 SHSqE037vqcxaYWcHy3TZIghdidfXstFgxrwxIFEF+vbMoR2+HwyMZWuYO1o7O1pcsPW KTW7gCDfirnoGWPkLHq83fANwfydd6zUVcpfOVT9KZ6ZdQH5vhQrbPONUOeXm0MpyK1j zD9g== X-Gm-Message-State: AGi0PuZbzu7GWCAY+ad1cwe3XnyPgsTTf2hoacUaMTVXcs82rM4K9o2c G6uKg6DcuINNd0KZJbTtoftppijG X-Google-Smtp-Source: APiQypKWwPBuubfIf64Na85oUZstxT75AKM5JA6o/l3pGsh/Uf4Emz1MzlXkT4STC0Az20Psaz/qRg== X-Received: by 2002:a7b:cf25:: with SMTP id m5mr12478110wmg.65.1588552144232; Sun, 03 May 2020 17:29:04 -0700 (PDT) Original-Received: from [192.168.0.3] ([66.205.73.129]) by smtp.googlemail.com with ESMTPSA id b22sm10907805wmj.1.2020.05.03.17.29.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 03 May 2020 17:29:03 -0700 (PDT) In-Reply-To: Content-Language: en-US Received-SPF: pass client-ip=2a00:1450:4864:20::329; envelope-from=raaahh@gmail.com; helo=mail-wm1-x329.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.001, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:248753 Archived-At: On 03.05.2020 06:55, Stefan Monnier wrote: >>> Think of it as the case where the regexp starts with \` and ends with \' >>> Then there's the relaxation of "finding the longest match" (what we >>> call `looking-at`) and then "finding the leftmost longest match" (what >>> we call `string-match`). >> >> looking-at being a special case of re-search-forward, I take? > > Not sure what you mean by that. More or less that (looking-at "abc") is basically the same as (re-search-forward "\\=abc"). Though maybe not internally. > `re-search-forward` is in the same > category as `string-match`, i.e. it's a *search* operation, whereas > `looking-at` is a *match* operation. > > IOW a "match" operation is like a "search" but where `match-beginning`. > > Algorithmically, the two are close, but there's a bit more work to go > from one to the other than meets the eye: Thanks for the explanation. > If you take a regexp and turn it into a DFA in the usual way, you get an > automaton that can trivially (and in O(n) time) give you either the > shortest match or the longest match. But if you want it to search, you > have to add a loop around it to try matching at every possible start > position, which brings the complexity to O(n²) :-( > > To fix that you can try and compile ".*RE" instead of "RE" and that will > give you an automaton that can do the search or "RE" in O(n) time, but > it won't directly give you the "leftmost longest match" (instead it can > directly give you "the match whose match-end is closest" and "the match > whose match-end is furthest"). But that's what we generally want in practice anyway. And in the cases where the desired out come is different, the regexp is probably ambiguous, which often means worse performance. > Note: Emacs's regexp engine isn't based on a DFA, and doesn't try and > use the second option: our engine basically only does "matching" and to > get the search behavior we add a loop around it, so algorithmically, > `looking-at` and `string-match/re-search-forward` are quite different. > > Notice that we don't really have the equivalent of `looking-at` > on strings currently :-( > >>> Those two have traditionally be named `re_match` and `re_search` >>> respectively in C libraries (as can be seen in `src/regexp-emacs.c`). >> Yes, ok. But we also need names to distinguish that things happen in >> a buffer. So far we've used 'search' for those. >> Using the term 'search' for matching in strings might be a significant >> change, given existing expectations. > > Yes, it's unfortunate. Maybe we could/should merge them to clarify: > > (re-match REGEXP &optional STRING LIMIT START) > (re-search REGEXP &optional STRING LIMIT START) > > would be like `looking-at` but would operate on STRING instead of > `current-buffer` if STRING is non-nil. START defaults to point for > current-buffer and 0 for a string. re-search-forward also moves point, whereas string-match returns the index of the match start. I think it would be kinda ugly to choose among these behaviors based on the second argument. And if it returns point instead, without moving, we get an entirely different function now. I'm not sure it's worth the changeover and retraining everybody if the main benefit is being more aware of the underlying algorithm complexity. After all, it's not too hard to make an educated guess that looking-at is (or at least can be) faster than re-search-forward. >>> PS: BTW, `looking-back` doesn't do a "match" of the "longest match that >>> ends at point" but a "search" for the "rightmost longest match that ends >>> at point" since it uses `re-search-backward` internally. >> It's a weird function, I agree. Though it's proved to be a handy one. > > Yes. The functionality it offers is important, but in reality one would > want a "real" `looking-back` which uses a backward match, rather than > the current "backward search for a forward match" hack. It would be > both more efficient and provide a cleaner behavior. It suppose so. Yet, in all cases I had to rewrite looking-back calls to add the now-mandatory argument, the resulting time it took was fast enough to get lost in the measurement noise. Maybe an optimized version could enable some new use cases, though.