unofficial mirror of bug-gnu-emacs@gnu.org 
From: Dmitry Gutov <dgutov@yandex.ru>
To: Philipp <p.stephani2@gmail.com>
Cc: 47799@debbugs.gnu.org
Subject: bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
Date: Mon, 17 May 2021 02:22:52 +0300
Message-ID: <d55cf5b0-a1d0-c9ab-30ad-7a8e1c0889c2@yandex.ru>
In-Reply-To: <429484E1-DDFA-4050-B5BF-E43477441C84@gmail.com>

On 16.05.2021 16:37, Philipp wrote:

> One thing that came to my mind is: in general, in Elisp (not just XRef), we
> spend lots of time parsing filenames to support remote and quoted filenames.
> Other languages probably solve this by introducing proper types for filenames
> (e.g. the Java Path class), which can then hold preprocessed information about
> the underlying filesystem (or special file name handler, in the case of Elisp).
> How about doing something similar for Elisp?  For example, introduce a
> `parsed-file-name' class or structure holding the remote/quoting state, or
> attach it to string properties?  I haven't tried out that idea, but I think it
> could significantly speed up the parsing (since we'd only have to do it once
> and wouldn't have to search for filename handlers all the time), as well as
> remain backward-compatible with "plain" unparsed filenames by allowing both
> strings and this new object type.  WDYT?

That sounds like an interesting idea to explore.

We create/concatenate those file names inside project-files, and then
"parse" them again to convert them to local names inside
xref-matches-in-files. Creating such structures might indeed save us
some parsing and garbage generation.
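To make the parse-once idea concrete, here is a minimal sketch. It is written in Python only as neutral pseudocode; a real version would presumably be an Elisp `cl-defstruct' or similar. It assumes a heavily simplified file-name grammar: an optional "/METHOD:HOST:" remote prefix, then an optional "/:" quoting prefix. Real TRAMP syntax (users, ports, multi-hop) is richer, and `ParsedFileName' is a hypothetical type, not an existing Emacs API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified model of Emacs file-name syntax (an assumption, not
# TRAMP's real grammar): an optional remote prefix "/METHOD:HOST:" followed by
# an optional "/:" quoting prefix and then the local name.
_REMOTE_RE = re.compile(r"^(/[^/:]+:[^/:]*:)(.*)$")
_QUOTE_PREFIX = "/:"


@dataclass(frozen=True)
class ParsedFileName:
    remote: Optional[str]  # e.g. "/ssh:host:", or None for a local file
    quoted: bool           # True if the name carried the "/:" quoting prefix
    local: str             # the name with both prefixes stripped

    @classmethod
    def parse(cls, name: str) -> "ParsedFileName":
        """Split NAME once; callers then reuse the fields instead of re-parsing."""
        match = _REMOTE_RE.match(name)
        remote, rest = (match.group(1), match.group(2)) if match else (None, name)
        quoted = rest.startswith(_QUOTE_PREFIX)
        local = rest[len(_QUOTE_PREFIX):] if quoted else rest
        return cls(remote, quoted, local)

    def __str__(self) -> str:
        """Rebuild the plain string form, for backward-compatible callers."""
        prefix = self.remote or ""
        quote = _QUOTE_PREFIX if self.quoted else ""
        return prefix + quote + self.local
```

For example, ParsedFileName.parse("/ssh:host:/:/src/a:b.el") yields remote "/ssh:host:", quoted True and local "/src/a:b.el". The string form is rebuilt only when a caller asks for it, which is roughly how both strings and the new object type could coexist.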

Experiments and patches welcome.

What I was also thinking about previously is some "fileset" data structure
which could contain a list of local file names, with their connection in a
separate slot. Maybe even the parent/root directory could go into a
separate slot when feasible, to reduce GC further, though that might
complicate applications.
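Such a fileset could look roughly like the following sketch, again in Python as pseudocode. `FileSet' and its slot names are hypothetical. The connection and root are stored once instead of being repeated in every file name:

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileSet:
    """Hypothetical "fileset": one shared connection and root, many short names."""
    connection: Optional[str]  # remote prefix such as "/ssh:host:", or None if local
    root: str                  # common parent directory, as a local name
    relative_names: List[str] = field(default_factory=list)

    def local_names(self) -> List[str]:
        """Local absolute names, suitable as arguments to a process on the host."""
        return [posixpath.join(self.root, n) for n in self.relative_names]

    def full_names(self) -> List[str]:
        """Plain Emacs-style names, rebuilt only for callers that need strings."""
        prefix = self.connection or ""
        return [prefix + n for n in self.local_names()]
```

The complication for applications shows up in `full_names': any caller that wants plain strings has to rebuild them, which gives back part of the savings.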

A more structured "file" value format might indeed make this stuff easier
to use, and perhaps the performance difference would be negligible.

The difficulty is having a method like project-files return one format
for some users, and another for users who want to take advantage of this
performance improvement. Otherwise, we break compatibility and/or
introduce a new method with the new behavior.

There is one in the works already, in the 'scratch/etags-regen' branch,
after all.

Another, simpler approach would be to have the method
project-files-filtered return file names relative to the root (always,
or when called with a certain argument), and then pass the root (and the
connection/host) in the default-directory variable. xref-matches-in-files
would then be changed to use default-directory whenever the values in
FILES are not absolute.
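Here is a rough sketch of what xref-matches-in-files might do under that scheme, in Python as pseudocode. It assumes a simplified file-name syntax: an optional "/METHOD:HOST:" remote prefix followed by an optional "/:" quoting prefix. `search_invocation' and `split_name' are hypothetical helpers, not existing functions. Relative names are handed to the search process as-is, so only absolute names still need parsing:

```python
import re
from typing import List, Optional, Tuple

# Simplified syntax (an assumption, not TRAMP's full grammar):
# optional "/METHOD:HOST:" remote prefix, then optional "/:" quoting prefix.
_REMOTE_RE = re.compile(r"^(/[^/:]+:[^/:]*:)(.*)$")
_QUOTE_PREFIX = "/:"


def split_name(name: str) -> Tuple[Optional[str], str]:
    """Return (remote prefix or None, unquoted local part) of NAME."""
    match = _REMOTE_RE.match(name)
    remote, rest = (match.group(1), match.group(2)) if match else (None, name)
    if rest.startswith(_QUOTE_PREFIX):
        rest = rest[len(_QUOTE_PREFIX):]  # the process must see the real name
    return remote, rest


def search_invocation(files: List[str], default_directory: str):
    """Return (connection, working directory, arguments) for one search process.

    Relative FILES are resolved by the process against its working directory,
    so only absolute names need to be parsed at all."""
    connection, cwd = split_name(default_directory)
    args = []
    for name in files:
        if name.startswith("/"):
            # Absolute name: strip remote and quoting prefixes. It is assumed
            # to live on CONNECTION; a real version would check that.
            args.append(split_name(name)[1])
        else:
            # Relative name: passed through untouched.
            args.append(name)
    return connection, cwd, args
```

For example, search_invocation(["src/a.el"], "/ssh:host:/:/proj/") gives ("/ssh:host:", "/proj/", ["src/a.el"]). The quoted root is parsed once, and the file names themselves are not parsed at all.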

The last approach would only work if we decide that a search across
multiple roots (e.g. project roots together with external roots) can be
done efficiently enough through multiple calls to xref-matches-in-files
(and thus multiple consecutive process calls). Someone should benchmark
this in a real-world scenario; it may or may not show worse performance.
On the one hand, the potential for parallelism is more limited, and there
is more overhead from process calls. On the other hand, practical
parallelism is not infinite anyway, and the process soon bottlenecks on
CPU and/or disk throughput.
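A starting point for such a benchmark might be a harness like the following Python sketch. The names `time_per_root_calls' and `time_single_call' are illustrative only. The demo runs a no-op command on empty temporary directories, so it measures only process-spawn overhead. A real benchmark would substitute the actual search program and real project and external roots. It also says nothing about the parallelism question, and it leaves its temporary directories behind.

```python
import os
import subprocess
import sys
import tempfile
import time
from typing import Dict, List


def time_per_root_calls(files_by_root: Dict[str, List[str]], command: List[str]) -> float:
    """Multiple consecutive calls: one process per root, relative file names."""
    start = time.perf_counter()
    for root, names in files_by_root.items():
        subprocess.run(command + names, cwd=root, stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def time_single_call(files_by_root: Dict[str, List[str]], command: List[str]) -> float:
    """One call over every root: absolute file names in a single argument list."""
    names = [os.path.join(root, n) for root, rel in files_by_root.items() for n in rel]
    start = time.perf_counter()
    subprocess.run(command + names, stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Demo only: empty temporary roots and a no-op command, so the numbers
    # show pure process-spawn overhead, not real search throughput.
    roots = [tempfile.mkdtemp() for _ in range(5)]
    files = {root: ["a.el", "b.el"] for root in roots}
    noop = [sys.executable, "-c", "pass"]
    print("per root:", time_per_root_calls(files, noop))
    print("single:  ", time_single_call(files, noop))
```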
