unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
@ 2021-04-15 13:43 Philipp Stephani
  2021-04-15 16:15 ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Stephani @ 2021-04-15 13:43 UTC (permalink / raw)
  To: 47799


emacs -Q -batch -l project -eval '(print (project-files (quote (transient . "/:/"))))'

("find: ‘/:/’: No such file or directory
")

Note that the error message is listed as a file.

1. `project-files' should unquote local filenames before passing them to
   `find'.

2. `project-files' should check for errors returned from `find'.


In GNU Emacs 28.0.50 (build 74, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2021-04-15
Repository revision: 31f8ae53beb9bada58750160c1bf7f867ecd442e
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12010000
System Description: Debian GNU/Linux rodete

Configured using:
 'configure --enable-gcc-warnings=warn-only
 --enable-gtk-deprecation-warnings --without-pop --with-mailutils
 --enable-checking=all --enable-check-lisp-object-type --with-modules
 'CFLAGS=-O0 -ggdb3''

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBSYSTEMD MODULES NOTIFY INOTIFY PDUMPER PNG SECCOMP SOUND
THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XPM GTK3 ZLIB

Important settings:
  value of $LC_TIME: en_DK.utf8
  value of $LANG: en_US.utf8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc dired dired-loaddefs rfc822
mml mml-sec epa epg epg-config gnus-util rmail rmail-loaddefs time-date
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils phst skeleton derived edmacro kmacro pcase ffap thingatpt url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util url-parse auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache json map url-vars mailcap rx
gnutls puny dbus xml subr-x seq byte-opt gv bytecomp byte-compile cconv
compile text-property-search comint ansi-color ring cl-loaddefs cl-lib
iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 68419 8560)
 (symbols 48 8313 1)
 (strings 32 24255 1847)
 (string-bytes 1 785382)
 (vectors 16 14967)
 (vector-slots 8 194817 4757)
 (floats 8 26 32)
 (intervals 56 223 0)
 (buffers 992 11))

-- 
Google Germany GmbH
Erika-Mann-Straße 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich.  Falls Sie diese fälschlicherweise erhalten haben
sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie
alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail
an die falsche Person gesendet wurde.

This e-mail is confidential.  If you received this communication by mistake,
please don’t forward it to anyone else, please erase all copies and
attachments, and please let me know that it has gone to the wrong person.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-15 13:43 bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames Philipp Stephani
@ 2021-04-15 16:15 ` Dmitry Gutov
  2021-04-15 16:26   ` Philipp Stephani
  2021-04-15 16:44   ` Philipp Stephani
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Gutov @ 2021-04-15 16:15 UTC (permalink / raw)
  To: Philipp Stephani, 47799

Hi Philipp,

On 15.04.2021 16:43, Philipp Stephani wrote:
> emacs -Q -batch -l project -eval '(print (project-files (quote (transient . "/:/"))))'
> 
> ("find: ‘/:/’: No such file or directory
> ")
> 
> Note that the error message is listed as a file.
> 
> 1. `project-files' should unquote local filenames before passing them to
>     `find'.
> 
> 2. `project-files' should check for errors returned from `find'.

Would you like to propose a patch?

I don't really understand the file quoting feature.

Is project--files-in-directory supposed to unquote? Should 
project--vc-list-files do that as well?

Does read-directory-name return quoted names when needed? Can 
locate-dominating-file return one?





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-15 16:15 ` Dmitry Gutov
@ 2021-04-15 16:26   ` Philipp Stephani
  2021-04-15 16:44   ` Philipp Stephani
  1 sibling, 0 replies; 17+ messages in thread
From: Philipp Stephani @ 2021-04-15 16:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799

Am Do., 15. Apr. 2021 um 18:15 Uhr schrieb Dmitry Gutov <dgutov@yandex.ru>:
>
> Hi Philipp,
>
> On 15.04.2021 16:43, Philipp Stephani wrote:
> > emacs -Q -batch -l project -eval '(print (project-files (quote (transient . "/:/"))))'
> >
> > ("find: ‘/:/’: No such file or directory
> > ")
> >
> > Note that the error message is listed as a file.
> >
> > 1. `project-files' should unquote local filenames before passing them to
> >     `find'.
> >
> > 2. `project-files' should check for errors returned from `find'.
>
> Would you like to propose a patch?
>
> I don't really understand the file quoting feature.
>
> Is project--files-in-directory supposed to unquote? Should
> project--vc-list-files do that as well?

I think only functions that pass filenames to external programs (that
don't know about Emacs filename handlers) should unquote. Unquoting
can change the meaning of a filename.
From what I can see, project--vs-list-files shouldn't unquote, because
it doesn't pass filenames to external programs.

>
> Does read-directory-name return quoted names when needed? Can
> locate-dominating-file return one?

Yes, both of these can return quoted names.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-15 16:15 ` Dmitry Gutov
  2021-04-15 16:26   ` Philipp Stephani
@ 2021-04-15 16:44   ` Philipp Stephani
  2021-04-16  1:08     ` Dmitry Gutov
  1 sibling, 1 reply; 17+ messages in thread
From: Philipp Stephani @ 2021-04-15 16:44 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799

Am Do., 15. Apr. 2021 um 18:15 Uhr schrieb Dmitry Gutov <dgutov@yandex.ru>:
>
> Hi Philipp,
>
> On 15.04.2021 16:43, Philipp Stephani wrote:
> > emacs -Q -batch -l project -eval '(print (project-files (quote (transient . "/:/"))))'
> >
> > ("find: ‘/:/’: No such file or directory
> > ")
> >
> > Note that the error message is listed as a file.
> >
> > 1. `project-files' should unquote local filenames before passing them to
> >     `find'.
> >
> > 2. `project-files' should check for errors returned from `find'.
>
> Would you like to propose a patch?

I've now pushed a minimal fix to this specific problem (commit
157bfc1812c51a0a48162c71eadf7959f7de9ac6), but there are probably more
places that should get fixed, e.g. xref--find-ignores-arguments.
project--files-in-directory also still ignores any errors from the
find binary.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-15 16:44   ` Philipp Stephani
@ 2021-04-16  1:08     ` Dmitry Gutov
  2021-04-18 20:06       ` Philipp Stephani
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2021-04-16  1:08 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: 47799

On 15.04.2021 19:44, Philipp Stephani wrote:
> Am Do., 15. Apr. 2021 um 18:15 Uhr schrieb Dmitry Gutov <dgutov@yandex.ru>:
>>
>> Hi Philipp,
>>
>> On 15.04.2021 16:43, Philipp Stephani wrote:
>>> emacs -Q -batch -l project -eval '(print (project-files (quote (transient . "/:/"))))'
>>>
>>> ("find: ‘/:/’: No such file or directory
>>> ")
>>>
>>> Note that the error message is listed as a file.
>>>
>>> 1. `project-files' should unquote local filenames before passing them to
>>>      `find'.
>>>
>>> 2. `project-files' should check for errors returned from `find'.
>>
>> Would you like to propose a patch?
> 
> I've now pushed a minimal fix to this specific problem (commit
> 157bfc1812c51a0a48162c71eadf7959f7de9ac6), but there are probably more
> places that should get fixed, e.g. xref--find-ignores-arguments.
> project--files-in-directory also still ignores any errors from the
> find binary.

Thank you.

I've added error handling to project--files-in-directory in a follow-up 
commit.

Regarding xref--find-ignores-arguments, it seems it would be economical 
to do the quoting on the value passed to it. See f955df1.

Regarding your change, though, have you tried project-find-regexp in a 
"transient" project with a quoted root directory name?

You've made project--files-in-directory quote the returned file names, 
but that list gets passed to xref-matches-in-files, which pipes them to 
find-grep in the end. I suppose xref-matches-in-files could use a step 
similar to (when remote-id ...) that is already there.

A bit unfortunate for the users of large projects with quoted names, but 
not sure what else we could do.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-16  1:08     ` Dmitry Gutov
@ 2021-04-18 20:06       ` Philipp Stephani
  2021-04-18 20:21         ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Stephani @ 2021-04-18 20:06 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799

Am Fr., 16. Apr. 2021 um 03:08 Uhr schrieb Dmitry Gutov <dgutov@yandex.ru>:
>
> On 15.04.2021 19:44, Philipp Stephani wrote:
> > Am Do., 15. Apr. 2021 um 18:15 Uhr schrieb Dmitry Gutov <dgutov@yandex.ru>:
> >>
> >> Hi Philipp,
> >>
> >> On 15.04.2021 16:43, Philipp Stephani wrote:
> >>> emacs -Q -batch -l project -eval '(print (project-files (quote (transient . "/:/"))))'
> >>>
> >>> ("find: ‘/:/’: No such file or directory
> >>> ")
> >>>
> >>> Note that the error message is listed as a file.
> >>>
> >>> 1. `project-files' should unquote local filenames before passing them to
> >>>      `find'.
> >>>
> >>> 2. `project-files' should check for errors returned from `find'.
> >>
> >> Would you like to propose a patch?
> >
> > I've now pushed a minimal fix to this specific problem (commit
> > 157bfc1812c51a0a48162c71eadf7959f7de9ac6), but there are probably more
> > places that should get fixed, e.g. xref--find-ignores-arguments.
> > project--files-in-directory also still ignores any errors from the
> > find binary.
>
> Thank you.
>
> I've added error handling to project--files-in-directory in a follow-up
> commit.

Thanks.

>
> Regarding xref--find-ignores-arguments, it seems it would be economical
> to do the quoting on the value passed to it. See f955df1.

Since it's an internal function, this is mostly a matter of style. I
prefer using Emacs filenames as function arguments as much as
possible; it's the expected behavior for functions dealing with files,
and Emacs doesn't have a strong enough type system to distinguish
"Emacs filename" (which can be quoted or remote) from "filename for
external programs" (which must be unquoted and local).

>
> Regarding your change, though, have you tried project-find-regexp in a
> "transient" project with a quoted root directory name?
>
> You've made project--files-in-directory quote the returned file names,
> but that list gets passed to xref-matches-in-files, which pipes them to
> find-grep in the end. I suppose xref-matches-in-files could use a step
> similar to (when remote-id ...) that is already there.

Good point, I've pushed 6ebc6e12cf to fix. But xref.el should also be
fixed. Maybe I'll find some time to take a look at it.

>
> A bit unfortunate for the users of large projects with quoted names, but
> not sure what else we could do.

Quoting is a purely lexical operation which should be reasonably fast
even for a large list of files.
For truly enormous projects, something like "list of all project
files" is infeasible anyway.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-18 20:06       ` Philipp Stephani
@ 2021-04-18 20:21         ` Dmitry Gutov
  2021-04-19 14:48           ` Philipp Stephani
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2021-04-18 20:21 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: 47799

On 18.04.2021 23:06, Philipp Stephani wrote:

>> Regarding xref--find-ignores-arguments, it seems it would be economical
>> to do the quoting on the value passed to it. See f955df1.
> 
> Since it's an internal function, this is mostly a matter of style. I
> prefer using Emacs filenames as function arguments as much as
> possible; it's the expected behavior for functions dealing with files,
> and Emacs doesn't have a strong enough type system to distinguish
> "Emacs filename" (which can be quoted or remote) from "filename for
> external programs" (which must be unquoted and local).

Matter of style, yes, but the way I fixed it requires one unquoting 
instead of two, which seems like a win.

Your point is also valid, of course.

>> Regarding your change, though, have you tried project-find-regexp in a
>> "transient" project with a quoted root directory name?
>>
>> You've made project--files-in-directory quote the returned file names,
>> but that list gets passed to xref-matches-in-files, which pipes them to
>> find-grep in the end. I suppose xref-matches-in-files could use a step
>> similar to (when remote-id ...) that is already there.
> 
> Good point, I've pushed 6ebc6e12cf to fix. But xref.el should also be
> fixed. Maybe I'll find some time to take a look at it.

Yes, I think the fix needs to be in xref-matches-in-files, and it also 
needs to use file-name-quoted-p similarly to (file-remote-p 
default-directory, to avoid the mapping overhead when the list of file 
names is large.

It's a realistic use case, and the impact is significant (e.g. 1s to 
list files, 2s to unquote them all), so please look into this soon. The 
fix in xref should be simple enough, I'd just like for someone to 
realistically test it before installing (I can send a patch, if you want).

>> A bit unfortunate for the users of large projects with quoted names, but
>> not sure what else we could do.
> 
> Quoting is a purely lexical operation which should be reasonably fast
> even for a large list of files.
> For truly enormous projects, something like "list of all project
> files" is infeasible anyway.

File listing in projects backed by Git (at least) is quite optimized. 
For example, in a gecko-dev checkout:

(benchmark 1 '(project-files (project-current)))
=> 0.97s

(benchmark 1 '(mapcar #'file-name-unquote (project-files 
(project-current))))
=> 2.97s





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-18 20:21         ` Dmitry Gutov
@ 2021-04-19 14:48           ` Philipp Stephani
  2021-04-19 20:48             ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Stephani @ 2021-04-19 14:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799

Am So., 18. Apr. 2021 um 22:21 Uhr schrieb Dmitry Gutov <dgutov@yandex.ru>:

> >> Regarding your change, though, have you tried project-find-regexp in a
> >> "transient" project with a quoted root directory name?
> >>
> >> You've made project--files-in-directory quote the returned file names,
> >> but that list gets passed to xref-matches-in-files, which pipes them to
> >> find-grep in the end. I suppose xref-matches-in-files could use a step
> >> similar to (when remote-id ...) that is already there.
> >
> > Good point, I've pushed 6ebc6e12cf to fix. But xref.el should also be
> > fixed. Maybe I'll find some time to take a look at it.
>
> Yes, I think the fix needs to be in xref-matches-in-files, and it also
> needs to use file-name-quoted-p similarly to (file-remote-p
> default-directory, to avoid the mapping overhead when the list of file
> names is large.
>
> It's a realistic use case, and the impact is significant (e.g. 1s to
> list files, 2s to unquote them all), so please look into this soon. The
> fix in xref should be simple enough, I'd just like for someone to
> realistically test it before installing (I can send a patch, if you want).

Hah, I wasn't aware that quoting/unquoting is so slow.
Rather than making assumptions in xref-matches-in-files, maybe we
could work more with relative filenames. For example:
1. Add another project method "project-relative-files" that returns
filenames relative to the root. By default, this would call
project-files and make the filenames relative, but project
implementations can provide an optimized implementation.
2. Give xref-matches-in-files an optional root directory argument and
allow users to pass names relative to that root.
Then I think both project and xref could leave these relative
filenames alone. WDYT?





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-19 14:48           ` Philipp Stephani
@ 2021-04-19 20:48             ` Dmitry Gutov
  2021-04-22  0:46               ` Dmitry Gutov
  2021-05-16 13:37               ` Philipp
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Gutov @ 2021-04-19 20:48 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: 47799

On 19.04.2021 17:48, Philipp Stephani wrote:

>> It's a realistic use case, and the impact is significant (e.g. 1s to
>> list files, 2s to unquote them all), so please look into this soon. The
>> fix in xref should be simple enough, I'd just like for someone to
>> realistically test it before installing (I can send a patch, if you want).
> 
> Hah, I wasn't aware that quoting/unquoting is so slow.

It's file listing that is fast, rather. ;-)

The project in question has 200'000 files.

> Rather than making assumptions in xref-matches-in-files, maybe we
> could work more with relative filenames. For example:
> 1. Add another project method "project-relative-files" that returns
> filenames relative to the root. By default, this would call
> project-files and make the filenames relative, but project
> implementations can provide an optimized implementation.
> 2. Give xref-matches-in-files an optional root directory argument and
> allow users to pass names relative to that root.
> Then I think both project and xref could leave these relative
> filenames alone. WDYT?

We've discussed this before, but it's a change in the API, a +1 method 
for a very minor feature.

And how will we explain anyway that xref-matches-in-files, when called 
without the new ROOT argument, doesn't handle remote or quoted file names?

So if you can fix this to avoid performance loss in the general case, 
that would be a good improvement for now.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-19 20:48             ` Dmitry Gutov
@ 2021-04-22  0:46               ` Dmitry Gutov
  2021-05-16 13:37               ` Philipp
  1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2021-04-22  0:46 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: 47799

On 19.04.2021 23:48, Dmitry Gutov wrote:
> And how will we explain anyway that xref-matches-in-files, when called 
> without the new ROOT argument, doesn't handle remote or quoted file names?

The above is probably the only real blocker I see. Else we would need to 
document it to explicitly only take relative file names, I think. Do we 
have a precedent in the core library for this?

Other than that, an exploration into an API working with relative files 
names sounds good, actually. It could further speed up file listing, 
eliminating some concatenations in project--vc-list-files (in the big 
project I referred to, that takes it from 1s down to 0.75s).

If you have the time, patches welcome, even rough ones.

> So if you can fix this to avoid performance loss in the general case, 
> that would be a good improvement for now.

In the meantime, I've changed the fix to use the plan explained previously.

Downsides: either all files should be quoted, or none (is that a 
reasonable assumption?), and, of course, users which which do have 
directories making use of quoting still pay the performance overhead.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-04-19 20:48             ` Dmitry Gutov
  2021-04-22  0:46               ` Dmitry Gutov
@ 2021-05-16 13:37               ` Philipp
  2021-05-16 23:22                 ` Dmitry Gutov
  1 sibling, 1 reply; 17+ messages in thread
From: Philipp @ 2021-05-16 13:37 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799



> Am 19.04.2021 um 22:48 schrieb Dmitry Gutov <dgutov@yandex.ru>:
> 
> 
>> Rather than making assumptions in xref-matches-in-files, maybe we
>> could work more with relative filenames. For example:
>> 1. Add another project method "project-relative-files" that returns
>> filenames relative to the root. By default, this would call
>> project-files and make the filenames relative, but project
>> implementations can provide an optimized implementation.
>> 2. Give xref-matches-in-files an optional root directory argument and
>> allow users to pass names relative to that root.
>> Then I think both project and xref could leave these relative
>> filenames alone. WDYT?
> 
> We've discussed this before, but it's a change in the API, a +1 method for a very minor feature.
> 
> And how will we explain anyway that xref-matches-in-files, when called without the new ROOT argument, doesn't handle remote or quoted file names?
> 
> So if you can fix this to avoid performance loss in the general case, that would be a good improvement for now.

Yeah, I think you're right, we shouldn't complicate the API unnecessarily for optimization purposes.

One thing that came to my mind is: in general, in Elisp (not just XRef), we spend lots of time parsing filenames to support remote and quoted filenames.  Other languages probably solve this by introducing proper types for filenames (e.g. the Java Path class), which can then hold preprocessed information about the underlying filesystem (or special file name handler, in the case of Elisp).  How about doing similar for Elisp?  For example, introduce a `parsed-file-name' class or structure holding the remote/quoting state, or attach it to string properties?  I haven't tried out that idea, but I think it could significantly speed up the parsing (since we'd only have to do it once and don't have to search for filename handlers all the time), as well as remain backward-compatible to "plain" unparsed filenames by allowing both strings and this new object type.  WDYT?




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-05-16 13:37               ` Philipp
@ 2021-05-16 23:22                 ` Dmitry Gutov
  2021-05-16 23:31                   ` Dmitry Gutov
  2021-07-05 19:05                   ` Philipp
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Gutov @ 2021-05-16 23:22 UTC (permalink / raw)
  To: Philipp; +Cc: 47799

On 16.05.2021 16:37, Philipp wrote:

> One thing that came to my mind is: in general, in Elisp (not just XRef), we spend lots of time parsing filenames to support remote and quoted filenames.  Other languages probably solve this by introducing proper types for filenames (e.g. the Java Path class), which can then hold preprocessed information about the underlying filesystem (or special file name handler, in the case of Elisp).  How about doing similar for Elisp?  For example, introduce a `parsed-file-name' class or structure holding the remote/quoting state, or attach it to string properties?  I haven't tried out that idea, but I think it could significantly speed up the parsing (since we'd only have to do it once and don't have to search for filename handlers all the time), as well as remain backward-compatible to "plain" unp
 arsed filenames by allowing both strings and this new object type.  WDYT?

That sounds like an interesting idea to explore.

We create/concatenate those file names inside project-files, and then 
"parse" them again to convert to local names inside 
xref-matches-in-files. Creating such structures might indeed save us on 
some parsing and garbage generation.

Experiments and patches welcome.

What I was also thinking of previously, is some "fileset" data structure 
which could contain a list of local file names and their connection in a 
separate slot. Maybe even separating the parent/root directory into a 
separate slot when feasible, to minimize GC further, though that might 
complicate applications.

A more structured "file" value format might make this stuff easier to 
use indeed, and perhaps the performance difference will be negligible.

The difficulty is having a method like project-files return one format 
for some users, and another for users who want to take advantage of this 
performance improvement. Or we break the compatibility and/or introduce 
a new method with this new behavior.

There is a one in the works already in the 'scratch/etags-regen' branch 
after all.

Or another, more simplistic approach would be to have the method 
project-files-filtered return file names relative to the root (always, 
or when called with a certain argument). And then pass the root (and the 
connection/host) in the default-directory var. Then change 
xref-matches-in-files to use default-directory if the values in FILES 
are not absolute.

The last approach would only work if we decide that a search across 
multiple roots (e.g. project roots together with external roots) can be 
done efficiently enough through multiple calls to xref-matches-in-files 
(and thus using multiple consecutive process calls). Someone should 
benchmark this in a real-world scenario; it might or might not show 
worse performance: OT1H, the potential for parallelism is more limited, 
and there is more overhead on process calls, OTOH, the practical 
parallelism is not infinite anyway, and the process soon bottlenecks on 
CPU and/or disk access throughput.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-05-16 23:22                 ` Dmitry Gutov
@ 2021-05-16 23:31                   ` Dmitry Gutov
  2021-07-05 19:05                   ` Philipp
  1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2021-05-16 23:31 UTC (permalink / raw)
  To: Philipp; +Cc: 47799

On 17.05.2021 02:22, Dmitry Gutov wrote:
> Or another, more simplistic approach would be to have the method 
> project-files-filtered return file names relative to the root (always, 
> or when called with a certain argument). And then pass the root (and the 
> connection/host) in the default-directory var. Then change 
> xref-matches-in-files to use default-directory if the values in FILES 
> are not absolute.

Looking at the previous discussion, this actually seems very close to 
what you, Philipp, suggested on 19.04.2021.

So above are my subsequent thoughts on how it can be implemented without 
requiring much change to the API (though complicating the 
implementations a bit).





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-05-16 23:22                 ` Dmitry Gutov
  2021-05-16 23:31                   ` Dmitry Gutov
@ 2021-07-05 19:05                   ` Philipp
  2021-07-18  0:53                     ` Dmitry Gutov
  1 sibling, 1 reply; 17+ messages in thread
From: Philipp @ 2021-07-05 19:05 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799



> Am 17.05.2021 um 01:22 schrieb Dmitry Gutov <dgutov@yandex.ru>:
> 
> On 16.05.2021 16:37, Philipp wrote:
> 
>> One thing that came to my mind is: in general, in Elisp (not just XRef), we spend lots of time parsing filenames to support remote and quoted filenames.  Other languages probably solve this by introducing proper types for filenames (e.g. the Java Path class), which can then hold preprocessed information about the underlying filesystem (or special file name handler, in the case of Elisp).  How about doing similar for Elisp?  For example, introduce a `parsed-file-name' class or structure holding the remote/quoting state, or attach it to string properties?  I haven't tried out that idea, but I think it could significantly speed up the parsing (since we'd only have to do it once and don't have to search for filename handlers all the time), as well as remain backward-compatible to "plain" unparsed filenames by allowing both strings and this new object type.  WDYT?
> 
> That sounds like an interesting idea to explore.
> 
> We create/concatenate those file names inside project-files, and then "parse" them again to convert to local names inside xref-matches-in-files. Creating such structures might indeed save us on some parsing and garbage generation.
> 
> Experiments and patches welcome.
> 
> What I was also thinking of previously, is some "fileset" data structure which could contain a list of local file names and their connection in a separate slot. Maybe even separating the parent/root directory into a separate slot when feasible, to minimize GC further, though that might complicate applications.
> 
> A more structured "file" value format might make this stuff easier to use indeed, and perhaps the performance difference will be negligible.

I think those are very good ideas.  The "fileset" structure sounds like a pretty good abstraction.

> 
> The difficulty is having a method like project-files return one format for some users, and another for users who want to take advantage of this performance improvement. Or we break the compatibility and/or introduce a new method with this new behavior.

A general design approach in OOP is to not treat abstract virtual functions (generic functions in ELisp terminology) as part of the public interface of a type; i.e., abstract functions can be implemented, but shouldn't be called outside of the module that defines them (project.el in this case).  That allows for changes like this: implementers could freely return the new fileset structure because only project.el would call project-files.  Not sure how much ELisp code adheres to this principle, though.  If there's too much code (outside of project.el) that relies on project-files returning a list, we need to indeed fall back to some of the other options.






^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-07-05 19:05                   ` Philipp
@ 2021-07-18  0:53                     ` Dmitry Gutov
  2021-09-05 17:14                       ` Philipp
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2021-07-18  0:53 UTC (permalink / raw)
  To: Philipp; +Cc: 47799

On 05.07.2021 22:05, Philipp wrote:

>> What I was also thinking of previously, is some "fileset" data structure which could contain a list of local file names and their connection in a separate slot. Maybe even separating the parent/root directory into a separate slot when feasible, to minimize GC further, though that might complicate applications.
>>
>> A more structured "file" value format might make this stuff easier to use indeed, and perhaps the performance difference will be negligible.
> 
> I think those are very good ideas.  The "fileset" structure sounds like a pretty good abstraction.

I've yet to express that in terms of code, though.

>> The difficulty is having a method like project-files return one format for some users, and another for users who want to take advantage of this performance improvement. Or we break the compatibility and/or introduce a new method with this new behavior.
> 
> A general design approach in OOP is to not treat abstract virtual functions (generic functions in ELisp terminology) as part of the public interface of a type; i.e., abstract functions can be implemented, but shouldn't be called outside of the module that defines them (project.el in this case).  That allows for changes like this: implementers could freely return the new fileset structure because only project.el would call project-files.  Not sure how much ELisp code adheres to this principle, though.  

When you say "abstract virtual functions", do you mean OOP as in C++ 
OOP? I'm not sure about standard practices there, but this sounds more 
like C++ and less like OOP in general.

I'm looking as generic functions here as part of an interface signature 
(like Java or Go interface). They are programmed against (which is the 
case with project.el) and are supposed to be stable.

 > If there's too much code (outside of project.el) that relies on 
project-files returning a list, we need to indeed fall back to some of 
the other options.

A new method seems to be the way forward. Or, say, an ad-hoc argument 
which determines whether file names should be relative.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-07-18  0:53                     ` Dmitry Gutov
@ 2021-09-05 17:14                       ` Philipp
  2021-09-20 16:05                         ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp @ 2021-09-05 17:14 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 47799



> Am 18.07.2021 um 02:53 schrieb Dmitry Gutov <dgutov@yandex.ru>:
> 
> On 05.07.2021 22:05, Philipp wrote:
> 
>>> The difficulty is having a method like project-files return one format for some users, and another for users who want to take advantage of this performance improvement. Or we break the compatibility and/or introduce a new method with this new behavior.
>> A general design approach in OOP is to not treat abstract virtual functions (generic functions in ELisp terminology) as part of the public interface of a type; i.e., abstract functions can be implemented, but shouldn't be called outside of the module that defines them (project.el in this case).  That allows for changes like this: implementers could freely return the new fileset structure because only project.el would call project-files.  Not sure how much ELisp code adheres to this principle, though.  
> 
> When you say "abstract virtual functions", do you mean OOP as in C++ OOP? I'm not sure about standard practices there, but this sounds more like C++ and less like OOP in general.
> 
> I'm looking as generic functions here as part of an interface signature (like Java or Go interface). They are programmed against (which is the case with project.el) and are supposed to be stable.

I think the idea is applicable to most programming languages that have some form of subtype polymorphism.  Basically, for a normal (monomorphic) function, you can make the parameter types more general or the return type over time more specific over time without breaking compatibility.  For a polymorphic function that's only specialized but not called outside the defining entity, e.g. a private virtual function in C++ or a method marked as @ForOverride (https://github.com/google/error-prone/blob/master/annotations/src/main/java/com/google/errorprone/annotations/ForOverride.java) in Java, it's the other way round: you can make the parameter types more specific and the return type more generic over time.  That implies that for a polymorphic function that's also called outside the defining entity, you can't change any of the types without breaking compatibility.  Thus the suggestion to separate the interface for callers from the interface for subclasses/specializers.

> 
> > If there's too much code (outside of project.el) that relies on project-files returning a list, we need to indeed fall back to some of the other options.
> 
> A new method seems to be the way forward. Or, say, an ad-hoc argument which determines whether file names should be relative.

I guess you also can't introduce new parameters without breaking compatibility either.  That would only leave the new method possibility.  We could then say that nothing outside project.el should call it to avoid the above problem.  Ideally, the byte compiler would support a declaration form similar to @ForOverride to warn about such invocations.






^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames
  2021-09-05 17:14                       ` Philipp
@ 2021-09-20 16:05                         ` Dmitry Gutov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2021-09-20 16:05 UTC (permalink / raw)
  To: Philipp; +Cc: 47799

Hi Philipp,

Sorry for the long pause.

On 05.09.2021 20:14, Philipp wrote:
> 
> 
>> Am 18.07.2021 um 02:53 schrieb Dmitry Gutov <dgutov@yandex.ru>:
>>
>> On 05.07.2021 22:05, Philipp wrote:
>>
>>>> The difficulty is having a method like project-files return one format for some users, and another for users who want to take advantage of this performance improvement. Or we break the compatibility and/or introduce a new method with this new behavior.
>>> A general design approach in OOP is to not treat abstract virtual functions (generic functions in ELisp terminology) as part of the public interface of a type; i.e., abstract functions can be implemented, but shouldn't be called outside of the module that defines them (project.el in this case).  That allows for changes like this: implementers could freely return the new fileset structure because only project.el would call project-files.  Not sure how much ELisp code adheres to this principle, though.
>>
>> When you say "abstract virtual functions", do you mean OOP as in C++ OOP? I'm not sure about standard practices there, but this sounds more like C++ and less like OOP in general.
>>
>> I'm looking as generic functions here as part of an interface signature (like Java or Go interface). They are programmed against (which is the case with project.el) and are supposed to be stable.
> 
> I think the idea is applicable to most programming languages that have some form of subtype polymorphism.  Basically, for a normal (monomorphic) function, you can make the parameter types more general or the return type over time more specific over time without breaking compatibility.  For a polymorphic function that's only specialized but not called outside the defining entity, e.g. a private virtual function in C++ or a method marked as @ForOverride (https://github.com/google/error-prone/blob/master/annotations/src/main/java/com/google/errorprone/annotations/ForOverride.java) in Java, it's the other way round: you can make the parameter types more specific and the return type more generic over time.  That implies that for a polymorphic function that's also called outside the defining e
 ntity, you can't change any of the types without breaking compatibility.  Thus the suggestion to separate the interface for callers from the interface for subclasses/specializers.

I'm not quite familiar with the practice, and since we don't do type 
parameterization here, the justification probably doesn't apply.

>>> If there's too much code (outside of project.el) that relies on project-files returning a list, we need to indeed fall back to some of the other options.
>>
>> A new method seems to be the way forward. Or, say, an ad-hoc argument which determines whether file names should be relative.
> 
> I guess you also can't introduce new parameters without breaking compatibility either.

That is also true.

So we'll probably go this way: I'm already experimenting with a method 
called project-files-filtered in the scratch/etags-regen branch.

> That would only leave the new method possibility.  We could then say that nothing outside project.el should call it to avoid the above problem.  Ideally, the byte compiler would support a declaration form similar to @ForOverride to warn about such invocations.

I get the idea of having a Template Method pattern where the public 
method is something else (this approach seems to be described in 
ForOverride.java), but I'm not sure how to apply it to our case. I.e., 
what kind of data the "public" function would accept and return.

If you mean it would return a list of absolute file names, that would 
actually limit its usefulness.

And/or perhaps you imagined that our projects could have a "private" 
method which would not be "exported" to outside code, but which 
project-find-regexp would be able to use? I don't really like that 
approach, outside code should be able to reimplement the same features.





^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-09-20 16:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 13:43 bug#47799: 28.0.50; Default `project-files' implementation doesn't work with quoted filenames Philipp Stephani
2021-04-15 16:15 ` Dmitry Gutov
2021-04-15 16:26   ` Philipp Stephani
2021-04-15 16:44   ` Philipp Stephani
2021-04-16  1:08     ` Dmitry Gutov
2021-04-18 20:06       ` Philipp Stephani
2021-04-18 20:21         ` Dmitry Gutov
2021-04-19 14:48           ` Philipp Stephani
2021-04-19 20:48             ` Dmitry Gutov
2021-04-22  0:46               ` Dmitry Gutov
2021-05-16 13:37               ` Philipp
2021-05-16 23:22                 ` Dmitry Gutov
2021-05-16 23:31                   ` Dmitry Gutov
2021-07-05 19:05                   ` Philipp
2021-07-18  0:53                     ` Dmitry Gutov
2021-09-05 17:14                       ` Philipp
2021-09-20 16:05                         ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).