* TRAMP problem with large repositories
@ 2019-12-11 19:46 Philippe Vaucher
2019-12-12 13:35 ` Michael Albinus
0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-11 19:46 UTC (permalink / raw)
To: Emacs developers; +Cc: Michael Albinus
[-- Attachment #1: Type: text/plain, Size: 2224 bytes --]
Hello,
Sorry if this is not the right place to post, feel free to redirect me as
needed.
While helping someone for a projectile issue (
https://github.com/bbatsov/projectile/issues/1480), it seems that when
`shell-command-to-string` tries to execute `git ls-files -zco
--exclude-standard` over TRAMP on a repository that has 85K files it takes
forever to complete.
Here's a stacktrace:
https://user-images.githubusercontent.com/81829/70549675-72b07f00-1b29-11ea-90f6-91fe0c36b0f4.png
We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp` which
calls `tramp-check-for-regexp`, and when looking at the source:
(defun tramp-wait-for-output (proc &optional timeout)
"Wait for output from remote command."
(unless (buffer-live-p (process-buffer proc))
(delete-process proc)
(tramp-error proc 'file-error "Process `%s' not available, try again" proc))
(with-current-buffer (process-buffer proc)
(let* (;; Initially, `tramp-end-of-output' is "#$ ". There might
;; be leading escape sequences, which must be ignored.
;; Busyboxes built with the EDITING_ASK_TERMINAL config
;; option send also escape sequences, which must be
;; ignored.
(regexp (format "[^#$\n]*%s\\(%s\\)?\r?$"
(regexp-quote tramp-end-of-output)
tramp-device-escape-sequence-regexp))
;; Sometimes, the commands do not return a newline but a
;; null byte before the shell prompt, for example "git
;; ls-files -c -z ...".
(regexp1 (format "\\(^\\|\000\\)%s" regexp))
(found (tramp-wait-for-regexp proc timeout regexp1)))
.... snip ...
My understanding is that it does a loop that reads a bit of what the
commands outputs then tries to parse end of lines (or '\0') and repeats
until the process died or that it found one. Because the command returns a
huge string (85K files), this process of read-regexp-repeat takes all the
CPU (compared to reading the whole chunk in one go and then trying to check
for the regexp).
My questions are the following:
1. Did I understand the problem right? Is this something known?
2. Is there something to be done about this? Or maybe it would it
require too much refactoring / faster implementation?
Kind regards,
Philippe
[-- Attachment #2: Type: text/html, Size: 7583 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher
@ 2019-12-12 13:35 ` Michael Albinus
2019-12-13 11:39 ` Philippe Vaucher
0 siblings, 1 reply; 8+ messages in thread
From: Michael Albinus @ 2019-12-12 13:35 UTC (permalink / raw)
To: Philippe Vaucher; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]
Philippe Vaucher <philippe.vaucher@gmail.com> writes:
> Hello,
Hi Philippe,
> While helping someone for a projectile issue
> (https://github.com/bbatsov/projectile/issues/1480), it seems that
> when `shell-command-to-string` tries to execute `git ls-files -zco -
> -exclude-standard` over TRAMP on a repository that has 85K files it
> takes forever to complete.
>
> We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp`
> which calls `tramp-check-for-regexp`, and when looking at the source:
>
> My understanding is that it does a loop that reads a bit of what the
> commands outputs then tries to parse end of lines (or '\0') and
> repeats until the process died or that it found one. Because the
> command returns a huge string (85K files), this process of
> read-regexp-repeat takes all the CPU (compared to reading the whole
> chunk in one go and then trying to check for the regexp).
>
> My questions are the following:
>
> 1 Did I understand the problem right? Is this something known?
Yes, your analysis is right. And no, I haven't seen related reports yet.
> 2 Is there something to be done about this? Or maybe it would it
> require too much refactoring / faster implementation?
I have appended a patch which should fix the problem. Could you, please,
(let) test?
Btw, the latest Tramp release is always available via GNU ELPA.
> Kind regards,
> Philippe
Best regards, Michael.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 3505 bytes --]
diff --git a/lisp/tramp-sh.el b/lisp/tramp-sh.el
index 8de88d35..506c33df 100644
--- a/lisp/tramp-sh.el
+++ b/lisp/tramp-sh.el
@@ -5102,9 +5102,8 @@ function waits for output unless NOOUTPUT is set."
(forward-line 1)
(delete-region (point-min) (point)))
;; Delete the prompt.
- (goto-char (point-max))
- (re-search-backward regexp nil t)
- (delete-region (point) (point-max)))
+ (when (tramp-search-regexp regexp)
+ (delete-region (point) (point-max))))
(if timeout
(tramp-error
proc 'file-error
@@ -5134,8 +5133,7 @@ DONT-SUPPRESS-ERR is non-nil, stderr won't be sent to /dev/null."
"echo tramp_exit_status $?"
(if subshell " )" "")))
(with-current-buffer (tramp-get-connection-buffer vec)
- (goto-char (point-max))
- (unless (re-search-backward "tramp_exit_status [0-9]+" nil t)
+ (unless (tramp-search-regexp "tramp_exit_status [0-9]+")
(tramp-error
vec 'file-error "Couldn't find exit status of `%s'" command))
(skip-chars-forward "^ ")
diff --git a/lisp/tramp.el b/lisp/tramp.el
index 03e04568..e05e0965 100644
--- a/lisp/tramp.el
+++ b/lisp/tramp.el
@@ -4196,19 +4196,35 @@ for process communication also."
(buffer-string))
result)))
+(defun tramp-search-regexp (regexp)
+ "Search for REGEXP backwards, starting at point-max.
+If found, set point to the end of the occurrence found, and return point.
+Otherwise, return nil."
+ (goto-char (point-max))
+ ;; We restrict ourselves to the last 256 characters. There were
+ ;; reports of 85kB output, which has blocked Tramp forever.
+ (re-search-backward regexp (max (point-min) (- (point) 256)) 'noerror))
+
(defun tramp-check-for-regexp (proc regexp)
"Check, whether REGEXP is contained in process buffer of PROC.
Erase echoed commands if exists."
(with-current-buffer (process-buffer proc)
(goto-char (point-min))
- ;; Check whether we need to remove echo output.
+ ;; Check whether we need to remove echo output. The max length of
+ ;; the echo mark regexp is taken for search. We restrict the
+ ;; search for the second echo mark to PIPE_BUF characters.
(when (and (tramp-get-connection-property proc "check-remote-echo" nil)
- (re-search-forward tramp-echoed-echo-mark-regexp nil t))
+ (re-search-forward
+ tramp-echoed-echo-mark-regexp
+ (+ (point) (* 5 tramp-echo-mark-marker-length)) t))
(let ((begin (match-beginning 0)))
- (when (re-search-forward tramp-echoed-echo-mark-regexp nil t)
+ (when
+ (re-search-forward
+ tramp-echoed-echo-mark-regexp
+ (+ (point) (tramp-get-connection-property proc "pipe-buf" 4096)) t)
;; Discard echo from remote output.
- (tramp-set-connection-property proc "check-remote-echo" nil)
+ (tramp-flush-connection-property proc "check-remote-echo")
(tramp-message proc 5 "echo-mark found")
(forward-line 1)
(delete-region begin (point))
@@ -4229,8 +4245,7 @@ Erase echoed commands if exists."
;; overflow in regexp matcher". For example, //DIRED// lines of
;; directory listings with some thousand files. Therefore, we
;; look from the end.
- (goto-char (point-max))
- (ignore-errors (re-search-backward regexp nil t)))))
+ (tramp-search-regexp regexp))))
(defun tramp-wait-for-regexp (proc timeout regexp)
"Wait for a REGEXP to appear from process PROC within TIMEOUT seconds.
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-12 13:35 ` Michael Albinus
@ 2019-12-13 11:39 ` Philippe Vaucher
2019-12-13 11:56 ` Michael Albinus
0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-13 11:39 UTC (permalink / raw)
To: Michael Albinus; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 397 bytes --]
>
> > 2 Is there something to be done about this? Or maybe it would it
>
> require too much refactoring / faster implementation?
>
> I have appended a patch which should fix the problem. Could you, please,
> (let) test?
>
Great, I'll make them test & report (or test myself if needed).
> Btw, the latest Tramp release is always available via GNU ELPA.
>
Ah, good to know!
Thanks,
Philippe
[-- Attachment #2: Type: text/html, Size: 985 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-13 11:39 ` Philippe Vaucher
@ 2019-12-13 11:56 ` Michael Albinus
2019-12-13 17:38 ` Philippe Vaucher
0 siblings, 1 reply; 8+ messages in thread
From: Michael Albinus @ 2019-12-13 11:56 UTC (permalink / raw)
To: Philippe Vaucher; +Cc: Emacs developers
Philippe Vaucher <philippe.vaucher@gmail.com> writes:
> I have appended a patch which should fix the problem. Could you,
> please, (let) test?
>
> Great, I'll make them test & report (or test myself if needed).
FTR, this morning I've made a test with a cloned Linux kernel git repo
(66473 files). I've accessed the directory via "/ssh::".
(shell-command-to-string "git ls-files -zco --exclude-standard")
returned in a few seconds, with a line of 2479486 bytes.
> Btw, the latest Tramp release is always available via GNU ELPA.
>
> Ah, good to know!
I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of
the year, will contain it.
In case there's a problem, you still have some days to report :-)
> Thanks,
> Philippe
Best regards, Michael.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-13 11:56 ` Michael Albinus
@ 2019-12-13 17:38 ` Philippe Vaucher
2019-12-13 18:31 ` Michael Albinus
0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-13 17:38 UTC (permalink / raw)
To: Michael Albinus; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 740 bytes --]
>
> > I have appended a patch which should fix the problem. Could you,
> > please, (let) test?
> >
> > Great, I'll make them test & report (or test myself if needed).
>
> FTR, this morning I've made a test with a cloned Linux kernel git repo
> (66473 files). I've accessed the directory via "/ssh::".
> (shell-command-to-string "git ls-files -zco --exclude-standard")
> returned in a few seconds, with a line of 2479486 bytes.
>
Is that with or without the patch? Could you reproduce the problem without
the patch?
> I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of
> the year, will contain it.
>
> In case there's a problem, you still have some days to report :-)
>
Great, thanks!
Kind regards,
Philippe
[-- Attachment #2: Type: text/html, Size: 1264 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-13 17:38 ` Philippe Vaucher
@ 2019-12-13 18:31 ` Michael Albinus
2019-12-14 11:48 ` Philippe Vaucher
0 siblings, 1 reply; 8+ messages in thread
From: Michael Albinus @ 2019-12-13 18:31 UTC (permalink / raw)
To: Philippe Vaucher; +Cc: Emacs developers
Philippe Vaucher <philippe.vaucher@gmail.com> writes:
> FTR, this morning I've made a test with a cloned Linux kernel git
> repo (66473 files). I've accessed the directory via "/ssh::".
> (shell-command-to-string "git ls-files -zco --exclude-standard")
> returned in a few seconds, with a line of 2479486 bytes.
>
> Is that with or without the patch? Could you reproduce the problem
> without the patch?
That was with the patch. I reran the test again, this time over a slow
connection (using a multi-hop to savannah and back). With the patch, the
command returned after <10 secs. W/o the patch, I had to cancel the
command after a while.
> Kind regards,
> Philippe
Best regards, Michael.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-13 18:31 ` Michael Albinus
@ 2019-12-14 11:48 ` Philippe Vaucher
2019-12-15 9:06 ` Philippe Vaucher
0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-14 11:48 UTC (permalink / raw)
To: Michael Albinus; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 563 bytes --]
On Fri, Dec 13, 2019 at 7:31 PM Michael Albinus <michael.albinus@gmx.de>
wrote:
> > Is that with or without the patch? Could you reproduce the problem
> > without the patch?
>
> That was with the patch. I reran the test again, this time over a slow
> connection (using a multi-hop to savannah and back). With the patch, the
> command returned after <10 secs. W/o the patch, I had to cancel the
> command after a while.
A great, nice that you were able to reproduce.
Thank you very much, I'll report when I have news on the other side.
Kind regards,
Philippe
[-- Attachment #2: Type: text/html, Size: 984 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories
2019-12-14 11:48 ` Philippe Vaucher
@ 2019-12-15 9:06 ` Philippe Vaucher
0 siblings, 0 replies; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-15 9:06 UTC (permalink / raw)
To: Michael Albinus; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 232 bytes --]
>
> Thank you very much, I'll report when I have news on the other side.
>
"It worked! Projectile was able to fetch the files within about 10 seconds,
and now the files appear to be cached."
All good then! Thanks a lot.
Philippe
[-- Attachment #2: Type: text/html, Size: 564 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-12-15 9:06 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher
2019-12-12 13:35 ` Michael Albinus
2019-12-13 11:39 ` Philippe Vaucher
2019-12-13 11:56 ` Michael Albinus
2019-12-13 17:38 ` Philippe Vaucher
2019-12-13 18:31 ` Michael Albinus
2019-12-14 11:48 ` Philippe Vaucher
2019-12-15 9:06 ` Philippe Vaucher
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.