From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
Date: Mon, 17 May 2021 02:22:52 +0300
To: Philipp
Cc: 47799@debbugs.gnu.org
In-Reply-To: <429484E1-DDFA-4050-B5BF-E43477441C84@gmail.com>

On 16.05.2021 16:37, Philipp wrote:

> One thing that came to my mind is: in general, in Elisp (not just XRef), we spend lots of time parsing filenames to support remote and quoted filenames. Other languages probably solve this by introducing proper types for filenames (e.g.
> the Java Path class), which can then hold preprocessed information about the underlying filesystem (or special file name handler, in the case of Elisp). How about doing similar for Elisp? For example, introduce a `parsed-file-name' class or structure holding the remote/quoting state, or attach it to string properties? I haven't tried out that idea, but I think it could significantly speed up the parsing (since we'd only have to do it once and don't have to search for filename handlers all the time), as well as remain backward-compatible to "plain" unparsed filenames by allowing both strings and this new object type. WDYT?

That sounds like an interesting idea to explore. We create/concatenate those file names inside project-files, and then "parse" them again to convert to local names inside xref-matches-in-files. Creating such structures might indeed save us on some parsing and garbage generation. Experiments and patches welcome.

What I was also thinking of previously is some "fileset" data structure which could contain a list of local file names and their connection in a separate slot. Maybe even separating the parent/root directory into a separate slot when feasible, to minimize GC further, though that might complicate applications.

A more structured "file" value format might make this stuff easier to use indeed, and perhaps the performance difference will be negligible.

The difficulty is having a method like project-files return one format for some users, and another for users who want to take advantage of this performance improvement. Or we break the compatibility and/or introduce a new method with this new behavior. There is one in the works already in the 'scratch/etags-regen' branch after all.

Or another, more simplistic approach would be to have the method project-files-filtered return file names relative to the root (always, or when called with a certain argument). And then pass the root (and the connection/host) in the default-directory var.
Then change xref-matches-in-files to use default-directory if the values in FILES are not absolute.

The last approach would only work if we decide that a search across multiple roots (e.g. project roots together with external roots) can be done efficiently enough through multiple calls to xref-matches-in-files (and thus using multiple consecutive process calls). Someone should benchmark this in a real-world scenario; it might or might not show worse performance: OT1H, the potential for parallelism is more limited, and there is more overhead on process calls; OTOH, the practical parallelism is not infinite anyway, and the process soon bottlenecks on CPU and/or disk access throughput.
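
To make the "fileset" and relative-names ideas a bit more concrete, here is a rough sketch only, not a proposal for the actual API. Everything named my-fileset-* is hypothetical and exists in neither project.el nor xref.el; file-remote-p, file-local-name and file-name-as-directory are the existing functions. It assumes all files live under a single root, so that the remote/local split needs to be computed only once, for the root.

```elisp
;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical structure; nothing like it exists in project.el or xref.el.
(cl-defstruct (my-fileset (:constructor my-fileset--create))
  connection  ; remote prefix such as "/ssh:host:", or nil for local files
  root        ; local (connection-less) name of the root directory
  files)      ; file names relative to ROOT

(defun my-fileset-from-absolute (root files)
  "Build a `my-fileset' from absolute FILES that all live under ROOT.
The remote/local split is computed once, for ROOT only."
  (let ((root (file-name-as-directory root)))
    (my-fileset--create
     :connection (file-remote-p root)
     :root (file-local-name root)
     :files (mapcar (lambda (file)
                      (unless (string-prefix-p root file)
                        (error "%s is not inside %s" file root))
                      (substring file (length root)))
                    files))))

(defun my-fileset-directory (fileset)
  "Return the full root directory of FILESET, including any remote prefix."
  (concat (my-fileset-connection fileset) (my-fileset-root fileset)))

(defun my-fileset-call (fileset function)
  "Call FUNCTION with FILESET's relative names, in FILESET's root directory.
FUNCTION stands in for a search function that resolves relative
names against `default-directory'."
  (let ((default-directory (my-fileset-directory fileset)))
    (funcall function (my-fileset-files fileset))))
```

A search across several roots would then be one my-fileset-call per root, which is exactly the multiple-process-calls trade-off that needs benchmarking.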