unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* TRAMP problem with large repositories
@ 2019-12-11 19:46 Philippe Vaucher
  2019-12-12 13:35 ` Michael Albinus
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-11 19:46 UTC (permalink / raw)
  To: Emacs developers; +Cc: Michael Albinus

[-- Attachment #1: Type: text/plain, Size: 2224 bytes --]

Hello,

Sorry if this is not the right place to post, feel free to redirect me as
needed.

While helping someone for a projectile issue (
https://github.com/bbatsov/projectile/issues/1480), it seems that when
`shell-command-to-string` tries to execute `git ls-files -zco
--exclude-standard` over TRAMP on a repository that has 85K files it takes
forever to complete.

Here's a stacktrace:

https://user-images.githubusercontent.com/81829/70549675-72b07f00-1b29-11ea-90f6-91fe0c36b0f4.png

We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp` which
calls `tramp-check-for-regexp`, and when looking at the source:

(defun tramp-wait-for-output (proc &optional timeout)
  "Wait for output from remote command."
  (unless (buffer-live-p (process-buffer proc))
    (delete-process proc)
    (tramp-error proc 'file-error "Process `%s' not available, try again" proc))
  (with-current-buffer (process-buffer proc)
    (let* (;; Initially, `tramp-end-of-output' is "#$ ".  There might
	   ;; be leading escape sequences, which must be ignored.
	   ;; Busyboxes built with the EDITING_ASK_TERMINAL config
	   ;; option send also escape sequences, which must be
	   ;; ignored.
	   (regexp (format "[^#$\n]*%s\\(%s\\)?\r?$"
			   (regexp-quote tramp-end-of-output)
			   tramp-device-escape-sequence-regexp))
	   ;; Sometimes, the commands do not return a newline but a
	   ;; null byte before the shell prompt, for example "git
	   ;; ls-files -c -z ...".
	   (regexp1 (format "\\(^\\|\000\\)%s" regexp))
	   (found (tramp-wait-for-regexp proc timeout regexp1)))
      .... snip ...


My understanding is that it does a loop that reads a bit of what the
commands outputs then tries to parse end of lines (or '\0') and repeats
until the process died or that it found one. Because the command returns a
huge string (85K files), this process of read-regexp-repeat takes all the
CPU (compared to reading the whole chunk in one go and then trying to check
for the regexp).

My questions are the following:

   1. Did I understand the problem right? Is this something known?
   2. Is there something to be done about this? Or maybe it would it
   require too much refactoring / faster implementation?

Kind regards,
Philippe

[-- Attachment #2: Type: text/html, Size: 7583 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher
@ 2019-12-12 13:35 ` Michael Albinus
  2019-12-13 11:39   ` Philippe Vaucher
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Albinus @ 2019-12-12 13:35 UTC (permalink / raw)
  To: Philippe Vaucher; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

Philippe Vaucher <philippe.vaucher@gmail.com> writes:

> Hello,

Hi Philippe,

> While helping someone for a projectile issue
> (https://github.com/bbatsov/projectile/issues/1480), it seems that
> when `shell-command-to-string` tries to execute `git ls-files -zco -
> -exclude-standard` over TRAMP on a repository that has 85K files it
> takes forever to complete.
>
> We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp`
> which calls `tramp-check-for-regexp`, and when looking at the source:
>
> My understanding is that it does a loop that reads a bit of what the
> commands outputs then tries to parse end of lines (or '\0') and
> repeats until the process died or that it found one. Because the
> command returns a huge string (85K files), this process of
> read-regexp-repeat takes all the CPU (compared to reading the whole
> chunk in one go and then trying to check for the regexp).
>
> My questions are the following:
>
> 1 Did I understand the problem right? Is this something known?

Yes, your analysis is right. And no, I haven't seen related reports yet.

> 2 Is there something to be done about this? Or maybe it would it
>   require too much refactoring / faster implementation?

I have appended a patch which should fix the problem. Could you, please,
(let) test?

Btw, the latest Tramp release is always available via GNU ELPA.

> Kind regards,
> Philippe

Best regards, Michael.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 3505 bytes --]

diff --git a/lisp/tramp-sh.el b/lisp/tramp-sh.el
index 8de88d35..506c33df 100644
--- a/lisp/tramp-sh.el
+++ b/lisp/tramp-sh.el
@@ -5102,9 +5102,8 @@ function waits for output unless NOOUTPUT is set."
 	      (forward-line 1)
 	      (delete-region (point-min) (point)))
 	    ;; Delete the prompt.
-	    (goto-char (point-max))
-	    (re-search-backward regexp nil t)
-	    (delete-region (point) (point-max)))
+	    (when (tramp-search-regexp regexp)
+	      (delete-region (point) (point-max))))
 	(if timeout
 	    (tramp-error
 	     proc 'file-error
@@ -5134,8 +5133,7 @@ DONT-SUPPRESS-ERR is non-nil, stderr won't be sent to /dev/null."
 	   "echo tramp_exit_status $?"
 	   (if subshell " )" "")))
   (with-current-buffer (tramp-get-connection-buffer vec)
-    (goto-char (point-max))
-    (unless (re-search-backward "tramp_exit_status [0-9]+" nil t)
+    (unless (tramp-search-regexp "tramp_exit_status [0-9]+")
       (tramp-error
        vec 'file-error "Couldn't find exit status of `%s'" command))
     (skip-chars-forward "^ ")
diff --git a/lisp/tramp.el b/lisp/tramp.el
index 03e04568..e05e0965 100644
--- a/lisp/tramp.el
+++ b/lisp/tramp.el
@@ -4196,19 +4196,35 @@ for process communication also."
        (buffer-string))
       result)))

+(defun tramp-search-regexp (regexp)
+  "Search for REGEXP backwards, starting at point-max.
+If found, set point to the end of the occurrence found, and return point.
+Otherwise, return nil."
+  (goto-char (point-max))
+  ;; We restrict ourselves to the last 256 characters.  There were
+  ;; reports of 85kB output, which has blocked Tramp forever.
+  (re-search-backward regexp (max (point-min) (- (point) 256)) 'noerror))
+
 (defun tramp-check-for-regexp (proc regexp)
   "Check, whether REGEXP is contained in process buffer of PROC.
 Erase echoed commands if exists."
   (with-current-buffer (process-buffer proc)
     (goto-char (point-min))

-    ;; Check whether we need to remove echo output.
+    ;; Check whether we need to remove echo output.  The max length of
+    ;; the echo mark regexp is taken for search.  We restrict the
+    ;; search for the second echo mark to PIPE_BUF characters.
     (when (and (tramp-get-connection-property proc "check-remote-echo" nil)
-	       (re-search-forward tramp-echoed-echo-mark-regexp nil t))
+	       (re-search-forward
+		tramp-echoed-echo-mark-regexp
+		(+ (point) (* 5 tramp-echo-mark-marker-length)) t))
       (let ((begin (match-beginning 0)))
-	(when (re-search-forward tramp-echoed-echo-mark-regexp nil t)
+	(when
+	    (re-search-forward
+	     tramp-echoed-echo-mark-regexp
+	     (+ (point) (tramp-get-connection-property proc "pipe-buf" 4096)) t)
 	  ;; Discard echo from remote output.
-	  (tramp-set-connection-property proc "check-remote-echo" nil)
+	  (tramp-flush-connection-property proc "check-remote-echo")
 	  (tramp-message proc 5 "echo-mark found")
 	  (forward-line 1)
 	  (delete-region begin (point))
@@ -4229,8 +4245,7 @@ Erase echoed commands if exists."
       ;; overflow in regexp matcher".  For example, //DIRED// lines of
       ;; directory listings with some thousand files.  Therefore, we
       ;; look from the end.
-      (goto-char (point-max))
-      (ignore-errors (re-search-backward regexp nil t)))))
+      (tramp-search-regexp regexp))))

 (defun tramp-wait-for-regexp (proc timeout regexp)
   "Wait for a REGEXP to appear from process PROC within TIMEOUT seconds.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-12 13:35 ` Michael Albinus
@ 2019-12-13 11:39   ` Philippe Vaucher
  2019-12-13 11:56     ` Michael Albinus
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-13 11:39 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 397 bytes --]

>
> > 2 Is there something to be done about this? Or maybe it would it
>
>   require too much refactoring / faster implementation?
>
> I have appended a patch which should fix the problem. Could you, please,
> (let) test?
>

Great, I'll make them test & report (or test myself if needed).



> Btw, the latest Tramp release is always available via GNU ELPA.
>

Ah, good to know!

Thanks,
Philippe

[-- Attachment #2: Type: text/html, Size: 985 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-13 11:39   ` Philippe Vaucher
@ 2019-12-13 11:56     ` Michael Albinus
  2019-12-13 17:38       ` Philippe Vaucher
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Albinus @ 2019-12-13 11:56 UTC (permalink / raw)
  To: Philippe Vaucher; +Cc: Emacs developers

Philippe Vaucher <philippe.vaucher@gmail.com> writes:

>     I have appended a patch which should fix the problem. Could you,
>     please, (let) test?
>
> Great, I'll make them test & report (or test myself if needed).

FTR, this morning I've made a test with a cloned Linux kernel git repo
(66473 files). I've accessed the directory via "/ssh::".
(shell-command-to-string "git ls-files -zco --exclude-standard")
returned in a few seconds, with a line of 2479486 bytes.

>     Btw, the latest Tramp release is always available via GNU ELPA.
>
> Ah, good to know!

I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of
the year, will contain it.

In case there's a problem, you still have some days to report :-)

> Thanks,
> Philippe

Best regards, Michael.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-13 11:56     ` Michael Albinus
@ 2019-12-13 17:38       ` Philippe Vaucher
  2019-12-13 18:31         ` Michael Albinus
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-13 17:38 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 740 bytes --]

>
> >     I have appended a patch which should fix the problem. Could you,
> >     please, (let) test?
> >
> > Great, I'll make them test & report (or test myself if needed).
>
> FTR, this morning I've made a test with a cloned Linux kernel git repo
> (66473 files). I've accessed the directory via "/ssh::".
> (shell-command-to-string "git ls-files -zco --exclude-standard")
> returned in a few seconds, with a line of 2479486 bytes.
>

Is that with or without the patch? Could you reproduce the problem without
the patch?



> I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of
> the year, will contain it.
>
> In case there's a problem, you still have some days to report :-)
>

Great, thanks!

Kind regards,
Philippe

[-- Attachment #2: Type: text/html, Size: 1264 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-13 17:38       ` Philippe Vaucher
@ 2019-12-13 18:31         ` Michael Albinus
  2019-12-14 11:48           ` Philippe Vaucher
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Albinus @ 2019-12-13 18:31 UTC (permalink / raw)
  To: Philippe Vaucher; +Cc: Emacs developers

Philippe Vaucher <philippe.vaucher@gmail.com> writes:

>     FTR, this morning I've made a test with a cloned Linux kernel git
>     repo (66473 files). I've accessed the directory via "/ssh::".
>     (shell-command-to-string "git ls-files -zco --exclude-standard")
>     returned in a few seconds, with a line of 2479486 bytes.
>
> Is that with or without the patch? Could you reproduce the problem
> without the patch?

That was with the patch. I reran the test again, this time over a slow
connection (using a multi-hop to savannah and back). With the patch, the
command returned after <10 secs. W/o the patch, I had to cancel the
command after a while.

> Kind regards,
> Philippe

Best regards, Michael.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-13 18:31         ` Michael Albinus
@ 2019-12-14 11:48           ` Philippe Vaucher
  2019-12-15  9:06             ` Philippe Vaucher
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-14 11:48 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 563 bytes --]

On Fri, Dec 13, 2019 at 7:31 PM Michael Albinus <michael.albinus@gmx.de>
wrote:

> > Is that with or without the patch? Could you reproduce the problem
> > without the patch?
>
> That was with the patch. I reran the test again, this time over a slow
> connection (using a multi-hop to savannah and back). With the patch, the
> command returned after <10 secs. W/o the patch, I had to cancel the
> command after a while.


A great, nice that you were able to reproduce.

Thank you very much, I'll report when I have news on the other side.

Kind regards,
Philippe

[-- Attachment #2: Type: text/html, Size: 984 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: TRAMP problem with large repositories
  2019-12-14 11:48           ` Philippe Vaucher
@ 2019-12-15  9:06             ` Philippe Vaucher
  0 siblings, 0 replies; 8+ messages in thread
From: Philippe Vaucher @ 2019-12-15  9:06 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 232 bytes --]

>
> Thank you very much, I'll report when I have news on the other side.
>

"It worked! Projectile was able to fetch the files within about 10 seconds,
and now the files appear to be cached."

All good then! Thanks a lot.

Philippe

[-- Attachment #2: Type: text/html, Size: 564 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-12-15  9:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher
2019-12-12 13:35 ` Michael Albinus
2019-12-13 11:39   ` Philippe Vaucher
2019-12-13 11:56     ` Michael Albinus
2019-12-13 17:38       ` Philippe Vaucher
2019-12-13 18:31         ` Michael Albinus
2019-12-14 11:48           ` Philippe Vaucher
2019-12-15  9:06             ` Philippe Vaucher

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).