From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: Should project delegate project-find-regexp?
Date: Mon, 18 Apr 2022 06:01:37 +0300
To: Joel Reicher
Cc: emacs-devel@gnu.org

On 08.04.2022 11:40, Joel Reicher wrote:
> Dmitry Gutov writes:
>
>> On 07.04.2022 14:48, Joel Reicher wrote:
>>> It seems to me that, at least in the case of git, 'git grep' offers a
>>> superior implementation to anything offered by the generic
>>> implementation of project-find-regexp.
>>
>> Last I checked, there was no way to make 'git grep' search in
>> untracked files.
>
> There's a --untracked option, at least now.

Thanks, that works. And we could try to support it. Honoring the project's "ignore patterns" would require some code duplication, but that's doable. (Not "error patterns"; sorry, that was a typo.)

But I've benchmarked searching through a large project (200,000 files), and the results seem mixed. Adding --untracked does slow it down noticeably.
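Supporting --untracked together with the project's own ignore patterns might look roughly like the sketch below, written in Python only for illustration (project.el itself is Emacs Lisp). The helper name `git_grep_argv` and its parameters are hypothetical. The flags (`-z`, `--untracked`, `-e`) and the `:(exclude)` pathspec magic are real git features, but pathspec matching only approximates .gitignore-style matching, which is where the duplicated code would come in.

```python
import shlex
from typing import Iterable, List


def git_grep_argv(regexp: str, ignores: Iterable[str] = (), untracked: bool = True) -> List[str]:
    """Build a 'git grep' command line for a project-wide search.

    Hypothetical helper. 'ignores' stands for the project's extra ignore
    patterns; they are passed as ':(exclude)' pathspecs, whose matching
    rules only approximate .gitignore semantics.
    """
    argv = ["git", "grep", "-z"]
    if untracked:
        # Also search untracked files; ignored files stay excluded by default.
        argv.append("--untracked")
    argv += ["-e", regexp]
    excludes = [":(exclude)" + pattern for pattern in ignores]
    if excludes:
        # "." keeps the rest of the tree in scope alongside the exclusions.
        argv += ["--", "."] + excludes
    return argv


if __name__ == "__main__":
    # prints: git grep -z --untracked -e symlinks -- . ':(exclude)*.min.js'
    print(shlex.join(git_grep_argv("symlinks", ignores=["*.min.js"])))
```

Returning an argument list rather than a shell string keeps quoting out of the picture; `shlex.join` is used above only to display it.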
Examples:

$ time git grep -z -e symlinks >/dev/null

________________________________________________________
Executed in    1,11 secs   fish           external
   usr time    2,16 secs  720,00 micros    2,16 secs
   sys time    3,65 secs  192,00 micros    3,65 secs

$ time git grep -z --untracked -e symlinks >/dev/null

________________________________________________________
Executed in    1,81 secs   fish           external
   usr time    2,42 secs    0,00 micros    2,42 secs
   sys time    4,00 secs  938,00 micros    4,00 secs

At the same time, if I pipe the results of 'git ls-files' to ripgrep:

$ time git ls-files -z -c -o --exclude-standard | xargs -0 rg --null --no-messages -g '!*/' -nH -e symlinks >/dev/null

________________________________________________________
Executed in    2,50 secs   fish           external
   usr time    2,91 secs    1,40 millis    2,90 secs
   sys time    3,02 secs    0,37 millis    3,02 secs

...it looks a little worse. But what if I add some forced parallelism?

$ time git ls-files -z -c -o --exclude-standard | xargs -0 -P8 rg --null --no-messages -g '!*/' -nH -e symlinks >/dev/null

________________________________________________________
Executed in    1,08 secs   fish           external
   usr time    4,03 secs    1,50 millis    4,03 secs
   sys time    3,60 secs    0,42 millis    3,60 secs

...it shows better performance.

Unfortunately, using the -P argument of xargs for grepping is unreliable because the parallel processes' output isn't synchronized, but I've written about this on ripgrep's issue tracker (https://github.com/BurntSushi/ripgrep/issues/273#issuecomment-1100792783), and we might get such a feature there natively someday.

YMMV, but on this machine at least, this seems to demonstrate that 'git grep' isn't always faster. And its '--threads' argument doesn't seem to make any difference.

Now, the default searcher (grep) is a little slower than ripgrep, but at least we have a faster option available.

Also, when it comes to Emacs, we lose a fair amount of time parsing the list of files internally (the output of 'git ls-files') before sending it to 'xargs rg' or 'xargs grep'.
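To make that parsing cost concrete, here is a rough Python model of the eager path, for illustration only. The function names are hypothetical, and joining each name with the project root to make it absolute is a simplifying assumption about what the Emacs side does. The point is that the list gets decoded and split into separate strings only to be joined back into NUL-separated input for xargs.

```python
import posixpath
from typing import List


def parse_ls_files_output(raw: bytes, root: str) -> List[str]:
    """Eagerly turn 'git ls-files -z' output into a list of absolute names."""
    names = raw.decode("utf-8", errors="surrogateescape").split("\0")
    # Each entry is NUL-terminated, so the last split element is empty.
    return [posixpath.join(root, name) for name in names if name]


def to_xargs_input(files: List[str]) -> bytes:
    """Re-serialize the list as NUL-separated input for 'xargs -0'."""
    return "".join(f + "\0" for f in files).encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    import time

    # Synthetic stand-in for a large project's 'git ls-files -z' output.
    raw = b"".join(b"src/dir%d/file%d.c\0" % (i % 100, i) for i in range(200_000))
    start = time.perf_counter()
    data = to_xargs_input(parse_ls_files_output(raw, "/home/user/project"))
    print(f"round trip: {time.perf_counter() - start:.3f}s, {len(data)} bytes")
```

The bytes that come out differ from git's output only by the root prefix, which is why handing the raw output straight to the searcher is tempting.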
There are a few approaches to dealing with this:

Maybe we'd have a generic function which constructs the shell command that lists the files (which we'd simply concatenate with the search command when building the full pipeline).

Or we'd have 'project-files' return some opaque value with a bunch of accessors, which would allow parsing the list of files lazily, and simply reuse the output buffer as input without parsing it (this would save ~500ms in my measurements in this scenario).

Or we'd cache the list of files, and cut the whole 1s that way.

We've discussed some of this before (like the caching), but so far it's all up in the air.

But given that one can choose a faster search program, I'm not sure about making the search a project method (which would lock such projects into one search implementation). I'd rather work on the other inefficiencies first.

Do try installing ripgrep, though. The search program is configured through the xref-search-program defcustom.
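The "opaque value" and caching ideas from the list of approaches above could be sketched like this, again in Python purely for illustration. `LazyFileList` and `cached_file_list` are hypothetical names, not part of project.el. The sketch assumes the producer's output is NUL-terminated, as with `git ls-files -z`, and leaves the choice of cache-invalidation "stamp" to the caller.

```python
from typing import Callable, Dict, Iterator, Tuple


class LazyFileList:
    """Hypothetical opaque value wrapping raw NUL-terminated file-list output.

    Names are decoded only when someone iterates; a consumer that just feeds
    the list to 'xargs -0' can take the raw bytes unchanged.
    """

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def xargs_input(self) -> bytes:
        # No parsing at all: reuse the producer's output verbatim.
        return self._raw

    def __len__(self) -> int:
        # Every entry is NUL-terminated, so counting NULs counts files.
        return self._raw.count(b"\0")

    def __iter__(self) -> Iterator[str]:
        # Decode one entry at a time instead of building a full list.
        start = 0
        while True:
            end = self._raw.find(b"\0", start)
            if end == -1:
                return
            yield self._raw[start:end].decode("utf-8", errors="surrogateescape")
            start = end + 1


# Hypothetical cache: project root -> (stamp, file list). A new stamp
# replaces the old entry for that root.
_cache: Dict[str, Tuple[object, LazyFileList]] = {}


def cached_file_list(root: str, stamp: object, produce: Callable[[], bytes]) -> LazyFileList:
    """Return the cached list for ROOT, calling PRODUCE only when STAMP changed."""
    entry = _cache.get(root)
    if entry is not None and entry[0] == stamp:
        return entry[1]
    files = LazyFileList(produce())
    _cache[root] = (stamp, files)
    return files
```

Choosing the stamp is the hard part: for instance, the modification time of .git/index doesn't change just because a new untracked file appears, so it can't be the only signal when untracked files are included.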