* TRAMP problem with large repositories @ 2019-12-11 19:46 Philippe Vaucher 2019-12-12 13:35 ` Michael Albinus 0 siblings, 1 reply; 8+ messages in thread From: Philippe Vaucher @ 2019-12-11 19:46 UTC (permalink / raw) To: Emacs developers; +Cc: Michael Albinus [-- Attachment #1: Type: text/plain, Size: 2224 bytes --] Hello, Sorry if this is not the right place to post, feel free to redirect me as needed. While helping someone for a projectile issue ( https://github.com/bbatsov/projectile/issues/1480), it seems that when `shell-command-to-string` tries to execute `git ls-files -zco --exclude-standard` over TRAMP on a repository that has 85K files it takes forever to complete. Here's a stacktrace: https://user-images.githubusercontent.com/81829/70549675-72b07f00-1b29-11ea-90f6-91fe0c36b0f4.png We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp` which calls `tramp-check-for-regexp`, and when looking at the source: (defun tramp-wait-for-output (proc &optional timeout) "Wait for output from remote command." (unless (buffer-live-p (process-buffer proc)) (delete-process proc) (tramp-error proc 'file-error "Process `%s' not available, try again" proc)) (with-current-buffer (process-buffer proc) (let* (;; Initially, `tramp-end-of-output' is "#$ ". There might ;; be leading escape sequences, which must be ignored. ;; Busyboxes built with the EDITING_ASK_TERMINAL config ;; option send also escape sequences, which must be ;; ignored. (regexp (format "[^#$\n]*%s\\(%s\\)?\r?$" (regexp-quote tramp-end-of-output) tramp-device-escape-sequence-regexp)) ;; Sometimes, the commands do not return a newline but a ;; null byte before the shell prompt, for example "git ;; ls-files -c -z ...". (regexp1 (format "\\(^\\|\000\\)%s" regexp)) (found (tramp-wait-for-regexp proc timeout regexp1))) .... snip ... My understanding is that it does a loop that reads a bit of what the commands outputs then tries to parse end of lines (or '\0') and repeats until the process died or that it found one. Because the command returns a huge string (85K files), this process of read-regexp-repeat takes all the CPU (compared to reading the whole chunk in one go and then trying to check for the regexp). My questions are the following: 1. Did I understand the problem right? Is this something known? 2. Is there something to be done about this? Or maybe it would it require too much refactoring / faster implementation? Kind regards, Philippe [-- Attachment #2: Type: text/html, Size: 7583 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher @ 2019-12-12 13:35 ` Michael Albinus 2019-12-13 11:39 ` Philippe Vaucher 0 siblings, 1 reply; 8+ messages in thread From: Michael Albinus @ 2019-12-12 13:35 UTC (permalink / raw) To: Philippe Vaucher; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 1411 bytes --] Philippe Vaucher <philippe.vaucher@gmail.com> writes: > Hello, Hi Philippe, > While helping someone for a projectile issue > (https://github.com/bbatsov/projectile/issues/1480), it seems that > when `shell-command-to-string` tries to execute `git ls-files -zco - > -exclude-standard` over TRAMP on a repository that has 85K files it > takes forever to complete. > > We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp` > which calls `tramp-check-for-regexp`, and when looking at the source: > > My understanding is that it does a loop that reads a bit of what the > commands outputs then tries to parse end of lines (or '\0') and > repeats until the process died or that it found one. Because the > command returns a huge string (85K files), this process of > read-regexp-repeat takes all the CPU (compared to reading the whole > chunk in one go and then trying to check for the regexp). > > My questions are the following: > > 1 Did I understand the problem right? Is this something known? Yes, your analysis is right. And no, I haven't seen related reports yet. > 2 Is there something to be done about this? Or maybe it would it > require too much refactoring / faster implementation? I have appended a patch which should fix the problem. Could you, please, (let) test? Btw, the latest Tramp release is always available via GNU ELPA. > Kind regards, > Philippe Best regards, Michael. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Type: text/x-patch, Size: 3505 bytes --] diff --git a/lisp/tramp-sh.el b/lisp/tramp-sh.el index 8de88d35..506c33df 100644 --- a/lisp/tramp-sh.el +++ b/lisp/tramp-sh.el @@ -5102,9 +5102,8 @@ function waits for output unless NOOUTPUT is set." (forward-line 1) (delete-region (point-min) (point))) ;; Delete the prompt. - (goto-char (point-max)) - (re-search-backward regexp nil t) - (delete-region (point) (point-max))) + (when (tramp-search-regexp regexp) + (delete-region (point) (point-max)))) (if timeout (tramp-error proc 'file-error @@ -5134,8 +5133,7 @@ DONT-SUPPRESS-ERR is non-nil, stderr won't be sent to /dev/null." "echo tramp_exit_status $?" (if subshell " )" ""))) (with-current-buffer (tramp-get-connection-buffer vec) - (goto-char (point-max)) - (unless (re-search-backward "tramp_exit_status [0-9]+" nil t) + (unless (tramp-search-regexp "tramp_exit_status [0-9]+") (tramp-error vec 'file-error "Couldn't find exit status of `%s'" command)) (skip-chars-forward "^ ") diff --git a/lisp/tramp.el b/lisp/tramp.el index 03e04568..e05e0965 100644 --- a/lisp/tramp.el +++ b/lisp/tramp.el @@ -4196,19 +4196,35 @@ for process communication also." (buffer-string)) result))) +(defun tramp-search-regexp (regexp) + "Search for REGEXP backwards, starting at point-max. +If found, set point to the end of the occurrence found, and return point. +Otherwise, return nil." + (goto-char (point-max)) + ;; We restrict ourselves to the last 256 characters. There were + ;; reports of 85kB output, which has blocked Tramp forever. + (re-search-backward regexp (max (point-min) (- (point) 256)) 'noerror)) + (defun tramp-check-for-regexp (proc regexp) "Check, whether REGEXP is contained in process buffer of PROC. Erase echoed commands if exists." (with-current-buffer (process-buffer proc) (goto-char (point-min)) - ;; Check whether we need to remove echo output. + ;; Check whether we need to remove echo output. The max length of + ;; the echo mark regexp is taken for search. We restrict the + ;; search for the second echo mark to PIPE_BUF characters. (when (and (tramp-get-connection-property proc "check-remote-echo" nil) - (re-search-forward tramp-echoed-echo-mark-regexp nil t)) + (re-search-forward + tramp-echoed-echo-mark-regexp + (+ (point) (* 5 tramp-echo-mark-marker-length)) t)) (let ((begin (match-beginning 0))) - (when (re-search-forward tramp-echoed-echo-mark-regexp nil t) + (when + (re-search-forward + tramp-echoed-echo-mark-regexp + (+ (point) (tramp-get-connection-property proc "pipe-buf" 4096)) t) ;; Discard echo from remote output. - (tramp-set-connection-property proc "check-remote-echo" nil) + (tramp-flush-connection-property proc "check-remote-echo") (tramp-message proc 5 "echo-mark found") (forward-line 1) (delete-region begin (point)) @@ -4229,8 +4245,7 @@ Erase echoed commands if exists." ;; overflow in regexp matcher". For example, //DIRED// lines of ;; directory listings with some thousand files. Therefore, we ;; look from the end. - (goto-char (point-max)) - (ignore-errors (re-search-backward regexp nil t))))) + (tramp-search-regexp regexp)))) (defun tramp-wait-for-regexp (proc timeout regexp) "Wait for a REGEXP to appear from process PROC within TIMEOUT seconds. ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-12 13:35 ` Michael Albinus @ 2019-12-13 11:39 ` Philippe Vaucher 2019-12-13 11:56 ` Michael Albinus 0 siblings, 1 reply; 8+ messages in thread From: Philippe Vaucher @ 2019-12-13 11:39 UTC (permalink / raw) To: Michael Albinus; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 397 bytes --] > > > 2 Is there something to be done about this? Or maybe it would it > > require too much refactoring / faster implementation? > > I have appended a patch which should fix the problem. Could you, please, > (let) test? > Great, I'll make them test & report (or test myself if needed). > Btw, the latest Tramp release is always available via GNU ELPA. > Ah, good to know! Thanks, Philippe [-- Attachment #2: Type: text/html, Size: 985 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-13 11:39 ` Philippe Vaucher @ 2019-12-13 11:56 ` Michael Albinus 2019-12-13 17:38 ` Philippe Vaucher 0 siblings, 1 reply; 8+ messages in thread From: Michael Albinus @ 2019-12-13 11:56 UTC (permalink / raw) To: Philippe Vaucher; +Cc: Emacs developers Philippe Vaucher <philippe.vaucher@gmail.com> writes: > I have appended a patch which should fix the problem. Could you, > please, (let) test? > > Great, I'll make them test & report (or test myself if needed). FTR, this morning I've made a test with a cloned Linux kernel git repo (66473 files). I've accessed the directory via "/ssh::". (shell-command-to-string "git ls-files -zco --exclude-standard") returned in a few seconds, with a line of 2479486 bytes. > Btw, the latest Tramp release is always available via GNU ELPA. > > Ah, good to know! I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of the year, will contain it. In case there's a problem, you still have some days to report :-) > Thanks, > Philippe Best regards, Michael. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-13 11:56 ` Michael Albinus @ 2019-12-13 17:38 ` Philippe Vaucher 2019-12-13 18:31 ` Michael Albinus 0 siblings, 1 reply; 8+ messages in thread From: Philippe Vaucher @ 2019-12-13 17:38 UTC (permalink / raw) To: Michael Albinus; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 740 bytes --] > > > I have appended a patch which should fix the problem. Could you, > > please, (let) test? > > > > Great, I'll make them test & report (or test myself if needed). > > FTR, this morning I've made a test with a cloned Linux kernel git repo > (66473 files). I've accessed the directory via "/ssh::". > (shell-command-to-string "git ls-files -zco --exclude-standard") > returned in a few seconds, with a line of 2479486 bytes. > Is that with or without the patch? Could you reproduce the problem without the patch? > I've committed the patch meanwhile. Tramp 2.4.3, scheduled for end of > the year, will contain it. > > In case there's a problem, you still have some days to report :-) > Great, thanks! Kind regards, Philippe [-- Attachment #2: Type: text/html, Size: 1264 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-13 17:38 ` Philippe Vaucher @ 2019-12-13 18:31 ` Michael Albinus 2019-12-14 11:48 ` Philippe Vaucher 0 siblings, 1 reply; 8+ messages in thread From: Michael Albinus @ 2019-12-13 18:31 UTC (permalink / raw) To: Philippe Vaucher; +Cc: Emacs developers Philippe Vaucher <philippe.vaucher@gmail.com> writes: > FTR, this morning I've made a test with a cloned Linux kernel git > repo (66473 files). I've accessed the directory via "/ssh::". > (shell-command-to-string "git ls-files -zco --exclude-standard") > returned in a few seconds, with a line of 2479486 bytes. > > Is that with or without the patch? Could you reproduce the problem > without the patch? That was with the patch. I reran the test again, this time over a slow connection (using a multi-hop to savannah and back). With the patch, the command returned after <10 secs. W/o the patch, I had to cancel the command after a while. > Kind regards, > Philippe Best regards, Michael. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-13 18:31 ` Michael Albinus @ 2019-12-14 11:48 ` Philippe Vaucher 2019-12-15 9:06 ` Philippe Vaucher 0 siblings, 1 reply; 8+ messages in thread From: Philippe Vaucher @ 2019-12-14 11:48 UTC (permalink / raw) To: Michael Albinus; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 563 bytes --] On Fri, Dec 13, 2019 at 7:31 PM Michael Albinus <michael.albinus@gmx.de> wrote: > > Is that with or without the patch? Could you reproduce the problem > > without the patch? > > That was with the patch. I reran the test again, this time over a slow > connection (using a multi-hop to savannah and back). With the patch, the > command returned after <10 secs. W/o the patch, I had to cancel the > command after a while. A great, nice that you were able to reproduce. Thank you very much, I'll report when I have news on the other side. Kind regards, Philippe [-- Attachment #2: Type: text/html, Size: 984 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: TRAMP problem with large repositories 2019-12-14 11:48 ` Philippe Vaucher @ 2019-12-15 9:06 ` Philippe Vaucher 0 siblings, 0 replies; 8+ messages in thread From: Philippe Vaucher @ 2019-12-15 9:06 UTC (permalink / raw) To: Michael Albinus; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 232 bytes --] > > Thank you very much, I'll report when I have news on the other side. > "It worked! Projectile was able to fetch the files within about 10 seconds, and now the files appear to be cached." All good then! Thanks a lot. Philippe [-- Attachment #2: Type: text/html, Size: 564 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-12-15 9:06 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher 2019-12-12 13:35 ` Michael Albinus 2019-12-13 11:39 ` Philippe Vaucher 2019-12-13 11:56 ` Michael Albinus 2019-12-13 17:38 ` Philippe Vaucher 2019-12-13 18:31 ` Michael Albinus 2019-12-14 11:48 ` Philippe Vaucher 2019-12-15 9:06 ` Philippe Vaucher
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).