unofficial mirror of emacs-devel@gnu.org 
From: Michael Albinus <michael.albinus@gmx.de>
To: Philippe Vaucher <philippe.vaucher@gmail.com>
Cc: Emacs developers <emacs-devel@gnu.org>
Subject: Re: TRAMP problem with large repositories
Date: Thu, 12 Dec 2019 14:35:50 +0100
Message-ID: <877e313bux.fsf@gmx.de>
In-Reply-To: CAGK7Mr6ghyJ_MpOOg+bLB-P4zcm-i21SReNaW_u41nhR=o-etg@mail.gmail.com

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

Philippe Vaucher <philippe.vaucher@gmail.com> writes:

> Hello,

Hi Philippe,

> While helping someone for a projectile issue
> (https://github.com/bbatsov/projectile/issues/1480), it seems that
> when `shell-command-to-string` executes
> `git ls-files -zco --exclude-standard` over TRAMP on a repository
> that has 85K files, it takes forever to complete.
>
> We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp`,
> which calls `tramp-check-for-regexp`, and when looking at the source:
>
> My understanding is that it runs a loop: it reads a bit of the
> command's output, then tries to find an end of line (or '\0'), and
> repeats until the process dies or a match is found. Because the
> command returns a huge string (85K files), this read-search-repeat
> cycle takes all the CPU (compared to reading the whole output in one
> go and then checking for the regexp once).
>
> My questions are the following:
>
> 1. Did I understand the problem correctly? Is this a known issue?

Yes, your analysis is right. And no, I haven't seen related reports yet.
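
To put a rough number on it, here is a back-of-the-envelope model in
Python (a sketch, not Tramp code, and not a measurement). It counts the
characters examined when a search runs after every read, either over the
whole buffer, as the unbounded `re-search-backward` did before the
appended patch whenever the prompt had not arrived yet, or over only the
last 256 characters. The function name `chars_scanned`, the 4096-byte
read size and the ~40 bytes per path are assumptions for illustration.

```python
from typing import Optional


def chars_scanned(total_len: int, chunk: int,
                  window: Optional[int] = None) -> int:
    """Total characters examined by a search that runs after every read.

    The buffer grows by `chunk` characters per read until it holds
    `total_len` characters.  With window=None each search looks at the
    whole buffer; otherwise it looks at most at the last `window`
    characters.  (Hypothetical cost model, not Tramp code.)
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    scanned = 0
    buffered = 0
    while buffered < total_len:
        buffered = min(total_len, buffered + chunk)
        scanned += buffered if window is None else min(buffered, window)
    return scanned


if __name__ == "__main__":
    # Assumptions: 85K paths of ~40 bytes each, arriving in 4096-byte reads.
    total, chunk = 85_000 * 40, 4096
    print("whole buffer:  ", chars_scanned(total, chunk))
    print("last 256 chars:", chars_scanned(total, chunk, window=256))
```

For a fixed read size the whole-buffer total grows roughly with the
square of the output size, while the bounded total grows only with the
number of reads. Real timings also depend on the regexp and on how the
output is actually chunked.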

> 2. Is there something to be done about this? Or would it require
>    too much refactoring / a faster implementation?

I have appended a patch which should fix the problem. Could you please
test it, or have it tested?

Btw, the latest Tramp release is always available via GNU ELPA.

> Kind regards,
> Philippe

Best regards, Michael.


[-- Attachment #2: Type: text/x-patch, Size: 3505 bytes --]

diff --git a/lisp/tramp-sh.el b/lisp/tramp-sh.el
index 8de88d35..506c33df 100644
--- a/lisp/tramp-sh.el
+++ b/lisp/tramp-sh.el
@@ -5102,9 +5102,8 @@ function waits for output unless NOOUTPUT is set."
 	      (forward-line 1)
 	      (delete-region (point-min) (point)))
 	    ;; Delete the prompt.
-	    (goto-char (point-max))
-	    (re-search-backward regexp nil t)
-	    (delete-region (point) (point-max)))
+	    (when (tramp-search-regexp regexp)
+	      (delete-region (point) (point-max))))
 	(if timeout
 	    (tramp-error
 	     proc 'file-error
@@ -5134,8 +5133,7 @@ DONT-SUPPRESS-ERR is non-nil, stderr won't be sent to /dev/null."
 	   "echo tramp_exit_status $?"
 	   (if subshell " )" "")))
   (with-current-buffer (tramp-get-connection-buffer vec)
-    (goto-char (point-max))
-    (unless (re-search-backward "tramp_exit_status [0-9]+" nil t)
+    (unless (tramp-search-regexp "tramp_exit_status [0-9]+")
       (tramp-error
        vec 'file-error "Couldn't find exit status of `%s'" command))
     (skip-chars-forward "^ ")
diff --git a/lisp/tramp.el b/lisp/tramp.el
index 03e04568..e05e0965 100644
--- a/lisp/tramp.el
+++ b/lisp/tramp.el
@@ -4196,19 +4196,35 @@ for process communication also."
        (buffer-string))
       result)))

+(defun tramp-search-regexp (regexp)
+  "Search for REGEXP backwards, starting at point-max.
+If found, set point to the beginning of the occurrence found, and return point.
+Otherwise, return nil."
+  (goto-char (point-max))
+  ;; We restrict ourselves to the last 256 characters.  There were
+  ;; reports of output listing 85K files, which blocked Tramp forever.
+  (re-search-backward regexp (max (point-min) (- (point) 256)) 'noerror))
+
 (defun tramp-check-for-regexp (proc regexp)
   "Check, whether REGEXP is contained in process buffer of PROC.
 Erase echoed commands if exists."
   (with-current-buffer (process-buffer proc)
     (goto-char (point-min))

-    ;; Check whether we need to remove echo output.
+    ;; Check whether we need to remove echo output.  The max length of
+    ;; the echo mark regexp is taken for search.  We restrict the
+    ;; search for the second echo mark to PIPE_BUF characters.
     (when (and (tramp-get-connection-property proc "check-remote-echo" nil)
-	       (re-search-forward tramp-echoed-echo-mark-regexp nil t))
+	       (re-search-forward
+		tramp-echoed-echo-mark-regexp
+		(+ (point) (* 5 tramp-echo-mark-marker-length)) t))
       (let ((begin (match-beginning 0)))
-	(when (re-search-forward tramp-echoed-echo-mark-regexp nil t)
+	(when
+	    (re-search-forward
+	     tramp-echoed-echo-mark-regexp
+	     (+ (point) (tramp-get-connection-property proc "pipe-buf" 4096)) t)
 	  ;; Discard echo from remote output.
-	  (tramp-set-connection-property proc "check-remote-echo" nil)
+	  (tramp-flush-connection-property proc "check-remote-echo")
 	  (tramp-message proc 5 "echo-mark found")
 	  (forward-line 1)
 	  (delete-region begin (point))
@@ -4229,8 +4245,7 @@ Erase echoed commands if exists."
       ;; overflow in regexp matcher".  For example, //DIRED// lines of
       ;; directory listings with some thousand files.  Therefore, we
       ;; look from the end.
-      (goto-char (point-max))
-      (ignore-errors (re-search-backward regexp nil t)))))
+      (tramp-search-regexp regexp))))

 (defun tramp-wait-for-regexp (proc timeout regexp)
   "Wait for a REGEXP to appear from process PROC within TIMEOUT seconds.

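For readers less familiar with Emacs Lisp, the Python sketch below
approximates what the new `tramp-search-regexp` does and how the
exit-status check in tramp-sh.el uses it. It is an illustration under
assumptions, not Tramp code: `search_regexp_backward`, `exit_status` and
`SEARCH_WINDOW` are hypothetical names, and the demo buffer (including
the `///prompt$ ` prompt) is made up.

```python
import re
from typing import Optional

# Hypothetical constant mirroring the 256-character limit in the patch.
SEARCH_WINDOW = 256


def search_regexp_backward(buffer: str, regexp: str,
                           window: int = SEARCH_WINDOW) -> Optional[int]:
    """Return the start of the match that starts last within the final
    `window` characters of `buffer`, or None if there is none.

    Like Emacs's re-search-backward with a BOUND, candidate start
    positions are tried from the end of the buffer back to the bound,
    and the first position where the regexp matches wins.
    """
    pattern = re.compile(regexp)
    bound = max(0, len(buffer) - window)
    for pos in range(len(buffer), bound - 1, -1):
        if pattern.match(buffer, pos):
            return pos  # beginning of the match, like point after the search
    return None


def exit_status(buffer: str) -> int:
    """Hypothetical analogue of the exit-status check in tramp-sh.el."""
    start = search_regexp_backward(buffer, r"tramp_exit_status [0-9]+")
    if start is None:
        raise RuntimeError("Couldn't find exit status")
    # From the beginning of the match, read the number after the space.
    match = re.compile(r"tramp_exit_status ([0-9]+)").match(buffer, start)
    return int(match.group(1))


if __name__ == "__main__":
    # Made-up output: 85K NUL-separated names, then a status line and a prompt.
    listing = "".join(f"file{i}.txt\0" for i in range(85_000))
    output = listing + "\ntramp_exit_status 0\n///prompt$ "
    print(exit_status(output))
    # A status line further back than the window is no longer found.
    print(search_regexp_backward(output + "x" * 300, "tramp_exit_status"))
```

As with `re-search-backward`, the returned position is the beginning of
the match, which is why the exit-status code can skip to the space and
read the number from there. The cost of the search no longer depends on
the size of the listing; the trade-off is that it relies on the prompt
and the status line being the last things the remote shell prints.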

Thread overview: 8+ messages
2019-12-11 19:46 TRAMP problem with large repositories Philippe Vaucher
2019-12-12 13:35 ` Michael Albinus [this message]
2019-12-13 11:39   ` Philippe Vaucher
2019-12-13 11:56     ` Michael Albinus
2019-12-13 17:38       ` Philippe Vaucher
2019-12-13 18:31         ` Michael Albinus
2019-12-14 11:48           ` Philippe Vaucher
2019-12-15  9:06             ` Philippe Vaucher
