From: Karthik Chikmagalur
Newsgroups: gmane.emacs.devel
Subject: Re: Understanding filter function calls
Date: Fri, 28 Jul 2023 14:42:47 -0700
Message-ID: <87cz0b1zmw.fsf@gmail.com>
References: <87y1j5vp3y.fsf@gmail.com> <96ad3a89-a0bd-6f8a-6251-d3f2f201e4f7@vodafonemail.de> <87pm4d12rc.fsf@gmail.com> <835y64lhd8.fsf@gnu.org>
Cc: jschmidt4gnu@vodafonemail.de, emacs-devel@gnu.org
To: Eli Zaretskii
In-Reply-To: <835y64lhd8.fsf@gnu.org>

Thank you for the suggestions -- this problem is resolved!  (See
below.)  But I still have a few questions about Emacs' async
subprocess API, included at the end.

> I think it isn't that dvisvgm is waiting for Emacs, it's that the
> whole process of reading output by Emacs and processing that output
> takes longer.

This is what I assumed at first too, but I made the filter function a
noop and still observed this behavior.

>> Using a larger stream buffer (if possible) should fix this issue.
>
> You already tried that, AFAIU, and it didn't help.

OK, I didn't realize this is what `read-process-output-max' sets.  It
makes sense now.

>> 2. Enlarge the buffer or "pipe" connecting dvisvgm to Emacs. This
>> stream buffer appears to be set to 4KB. Since dvisvgm produces far more
>> output (to stdout) than this between two successive instances of Emacs
>> accepting process output, widening the pipe should relieve this
>> pressure. I tried tweaking `read-process-output-max' but this doesn't
>> help.
>
> Which probably means that each time we get to check for subprocess
> output, there's less than 4KB of stuff in the pipe ready to be read?
> Did you look at the amount of bytes we read each time? How many bytes
> do we read, and does this number change if you change the value of
> read-process-output-max?

In the following description:

- `process-adaptive-read-buffering' is set to t
- `read-process-output-max' is set to 65,536

1. I logged the length of the string passed to the filter function on
   each invocation.  On the first 3-4 calls, the length varies from 1
   to 2000.  In the remaining ~30 calls, the length is almost always
   4095.

2. There are rare exceptions to this, when the length jumps up to
   20,000.  This happens for about one filter call out of ~35.

3. None of this behavior changes when I change
   `read-process-output-max', although I didn't set it below 4096.

> Also, did you try setting process-adaptive-read-buffering to nil?

Setting it to nil essentially fixes the problem!  The filter function
is now called 80+ times instead of ~35 times, and the string it
receives each time has variable length, generally under 500
characters, but overall the process is much, much faster.
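For reference, the chunk lengths above were collected with a filter
along these lines.  This is a minimal sketch, not the actual Org
preview code; the process name, the variable and function names, and
the bare dvisvgm invocation are placeholders:

    ;; Sketch: record the size of every chunk handed to a near-noop
    ;; process filter.  Names and the command line are placeholders.
    (defvar my/filter-chunk-sizes nil
      "Lengths of the strings passed to the process filter, newest first.")

    (defun my/logging-filter (_proc string)
      "Do nothing with STRING except record its length."
      (push (length string) my/filter-chunk-sizes))

    (make-process :name "dvisvgm-test"
                  :command '("dvisvgm" "input.dvi")
                  :filter #'my/logging-filter)

Inspecting `my/filter-chunk-sizes' after the run gives the per-call
lengths reported in the list above.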
Total preview time:

| process-adaptive-read-buffering | t         | nil       |
|---------------------------------+-----------+-----------|
| TeXLive 2022                    | 2.65 secs | 1.88 secs |
| TeXLive 2023                    | 4.03 secs | 1.80 secs |

We get a huge speed-up, up to ~120% in the TeXLive 2023 case.

>> 3. Get dvisvgm to generate less verbose output. Unfortunately this is
>> not configurable at the level of granularity that we need. We can't
>> turn it off completely either since we rely on certain strings in the
>> process stdout to update the LaTeX preview state in Org buffers.
>>
>> Any ideas on how to avoid this throttling would be appreciated.
>
> So once again a problem generated by an external program, which cannot
> be configured by that program, is blamed on Emacs? Is this fair?

Eli, I apologize for giving the impression that I'm blaming this on
Emacs.  Your phrasing ("once again") suggests that there's more to
this than I'm aware of.  My only goal here is to provide the most
responsive user experience possible when writing LaTeX math in Org
mode, under the constraint that this needs to be done by gluing
together two (actually three) processes.  I care about where the
bottleneck is -- or was -- only so that I may address it.

> My take of this is not that Emacs is "throttling" dvisvgm, but that
> dvisvgm is "flooding" Emacs with useless data, and cannot be told to
> shut up.

This is a fair take, but as it turns out from the new benchmarks, the
total preview time is the same for the two dvisvgm versions once
`process-adaptive-read-buffering' is set to nil.

> AFAICT, the data it outputs is completely useless in
> production use cases, and (with the possible exception of the last
> line, which shows the output file name) is basically debug-level info.

The filter function needs to read the output file name to update the
LaTeX previews in the buffer.  The rest is superfluous to us -- but
it's possible that the sizing information is useful to other
applications.

> Did you try to take this problem up with the dvisvgm developers?

I was planning to, but it looks like TeXLive 2023 is actually slightly
faster now, so the extra stdout makes no difference.

> And in any case, we are talking about 0.4 sec delay wrt the older
> version of TeXLive, right? Is this really large enough to worry
> about?

The delay was about 40%, which could be 1+ second with lots of LaTeX
fragments, as in my original benchmark.  Moreover, even 200ms makes a
noticeable difference when

1. opening Org files that have the latexpreview startup option set,

2. live-previewing equations with a quick feedback loop (see
   https://tinyurl.com/ms2ksthc), and

3. tweaking equation numbering etc., which can cause all subsequent
   LaTeX fragments in the buffer to have their previews regenerated.

>> 1. Reduce the duration between successive calls of the filter function.
>> Is this configurable in Emacs? I don't see anything relevant in the
>> manual sections on accepting output from processes or filter functions.
>
> This is not configurable for the simple reason that Emacs checks for
> subprocess output every time its main loop gets to that, so basically
> Emacs does that as fast as possible, assuming it is idle, i.e. no
> other command or timer runs.

Because of `process-adaptive-read-buffering', I'm not sure this is all
there is to it: otherwise I wouldn't expect the total preview time to
change at all.  Am I correct?  The documentation for this variable
states: "If non-nil, improve receive buffering by delaying after short
reads."
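For concreteness, the two settings used in the "nil" column of the
benchmark amount to something like the following sketch (plain setq
here; whether these should be set globally or only scoped around
preview generation is a separate question):

    ;; Sketch of the two knobs discussed above; the values are the
    ;; ones used in the benchmark, not general recommendations.
    (setq process-adaptive-read-buffering nil) ; don't delay delivery after short reads
    (setq read-process-output-max (* 64 1024)) ; read up to 64KB per chunk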
----

This brings me to the question of what is going on.  My initial
assessment -- that dvisvgm is waiting on Emacs to clear the output
stream buffer -- is wrong, since changing `read-process-output-max'
changes neither how long the run takes nor how much data the filter
function is called with each time.

1. So why is the run faster with `process-adaptive-read-buffering' set
   to nil?  I understand that this reduces read latency, but not why
   the process is so much faster overall.

2. What is the performance implication of setting
   `process-adaptive-read-buffering' to nil?  The documentation for
   this variable mentions:

> On some systems, when Emacs reads the output from a subprocess, the
> output data is read in very small blocks, potentially resulting in very
> poor performance.

But I was able to use Emacs normally (typing, calling M-x etc.) while
the previews were being updated, which is great.  So I'm not sure what
the performance implications are here.

Karthik