From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#44983: Truncate long lines of grep output
Date: Wed, 9 Dec 2020 22:06:01 +0200
To: Juri Linkov
Cc: 44983@debbugs.gnu.org
In-Reply-To: <87lfe6x1uf.fsf@mail.linkov.net>

On 09.12.2020 21:17, Juri Linkov wrote:
>>> I think until a long
>>> string is inserted to the buffer, truncating the
>>> string in the variable in xref--collect-matches-1 should be much faster.
>>
>> It would surely be faster, but how would that overhead compare to the
>> whole operation?
>>
>> Could be negligible, except in the most extreme cases. After all, the main
>> slowdown factor with long strings is the display engine, and it won't be in
>> play there.
>>
>> The upside is we'd be able to support column limiting with Grep too. Which
>> is the default configuration. And we'd extract the cutoff column into
>> a more visible user option.
>
> This is exactly what we need. After that this bug report/feature request
> can be closed.

Perhaps you would like to come up with the name for the new user option?

The changes to xref--collect-matches-1 should be straightforward (it will
include a choice, though: whether to cut off matches when they don't fit).

Since you're the one who has experienced poor performance because of
this, though, you can do the benchmarking. Basically, what we need to
know is whether the new option indeed makes performance acceptable.

> BTW, for sorting currently xref-search-program-alist uses:
>
> "| sort -t: -k1,1 -k2n,2"
>
> but fortunately ripgrep has a special option to do the same with:
>
> "--sort path"

Somehow, that option came out to be consistently slower in my
benchmarking. Even when the results are only a few lines (that's
actually when the difference should be most apparent, because with many
lines Elisp takes up most of the CPU time).

You can try it yourself:

(benchmark 10 '(project-find-regexp ":package-version '(xref"))

0.86 with '| sort'
1.33 with '--sort path'

$ rg --version
ripgrep 12.1.1 (rev 7cb211378a)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

We can also document it in the docstring, though. For those who don't
have 'sort' installed.

>>> They should be merged into one regexp indeed.
>>> Because after customizing it
>>> to the rg regexp, grep output doesn't highlight matches anymore (I use both
>>> grep and rg interchangeably by different commands).
>>> Currently their separate regexps are:
>>> grep:
>>> "\033\\[0?1;31m
>>> \\(.*?\\)
>>> \033\\[[0-9]*m"
>>> rg:
>>> "\033\\[[0-9]*m
>>> \033\\[[0-9]*1m
>>> \033\\[[0-9]*1m
>>> \\(.*?\\)
>>> \033\\[[0-9]*0m"
>>> That could be combined into one regexp:
>>> "\033\\[[0-9?;]*m
>>> \\(?:\033\\[[0-9]*1m\\)\\{0,2\\}
>>> \\(.*?\\)
>>> \033\\[[0-9]*0?m"
>>
>> Makes sense. Is the parsing performance the same?
>
> Performance is not a problem. The problem is that more lax regexp
> causes more false positives. So the above regexp highlighted even
> the separator colons (':') between file names and column numbers.
>
> BTW, it's possible to see all highlighted parts of the output
> by changing the argument 'MODE' of 'compilation-start' in 'grep'
> from #'grep-mode to t (so it uses comint-mode in grep buffers).

Because ansi-color-process-output is in comint-output-filter-functions?

> Anyway, I found the shortest change needed to support ripgrep,
> and pushed to master.

Excellent.

>> Also, with the increased complexity, I'd rather we added a couple of tests,
>> or a comment with output examples. Or maybe both.
>
> Fortunately, we have all possible cases listed in etc/grep.txt,
> so it was easy to check if everything is highlighted correctly now.
> Also I added ripgrep samples to etc/grep.txt.

Thanks!
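
For reference, a minimal sketch of what the sort invocation quoted above does, run on three made-up lines in FILE:LINE:TEXT form. The file names, the sample text, the sort_matches helper name and the LC_ALL=C setting are assumptions for illustration, not taken from xref. -t: splits fields on colons, -k1,1 orders by file name, and -k2n,2 then orders numerically by line number, so line 2 comes before line 10 within the same file:

```shell
# Hypothetical helper wrapping the sort arguments quoted in the thread.
# LC_ALL=C is an added assumption so file names collate the same everywhere.
sort_matches() {
  LC_ALL=C sort -t: -k1,1 -k2n,2
}

# Deliberately unsorted sample input (made up for this example).
printf '%s\n' \
  'b.el:10:(xref-b)' \
  'a.el:10:(xref-a2)' \
  'a.el:2:(xref-a1)' | sort_matches
```

A plain textual sort on the second field would put line 10 before line 2, which is why the numeric modifier is there. ripgrep's "--sort path" is the built-in alternative mentioned in the thread, though in the timings quoted there it came out slower.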