From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 23:46:13 +0300
To: Eli Zaretskii
Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca
On 14.08.2022 20:59, Eli Zaretskii wrote:
>> Date: Sun, 14 Aug 2022 20:47:40 +0300
>> Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca
>> From: Dmitry Gutov
>>
>>> The better way is to acknowledge that some inaccuracies are acceptable
>>> in those cases. With that in mind, one can design a syntax analyzer
>>> that looks back only a short ways, until it finds some place that
>>> could reasonably serve as an anchor point for heuristic decisions
>>> about whether we are inside or outside a string or comment, and then
>>> verifying that guess with some telltale syntactic elements that follow
>>> (like semi-colons or comment-end delimiters in C). While this kind of
>>> heuristics can sometimes fail, if they only fail rarely, the result is
>>> a huge win.
>>
>> You cannot design a language-agnostic syntax analyzer like that.
>
> _I_ cannot, but hopefully someone else will.

That seems unlikely. Nothing's impossible, of course, but I wouldn't want to wait for such an invention before we decide how to proceed now.

What _can_ be done is to make syntax-ppss's cache invalidation more local by introducing a "repair" step. At most, that would speed up certain operations, and the initial wait near EOB can't be avoided this way.

>>> In any case, the way to speed up these cases is to look at the profile
>>> and identify the code that is slowing us down; then attempt to make it
>>> faster. (20 sec is actually long enough for us to interrupt Emacs
>>> under a debugger and look at the backtrace to find the culprit.)
>>
>> I've profiled and benchmarked this scenario already: all of the delay
>> (17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.
>
> Before we get to 1GB files, there are 20MB files and 250MB files. I
> found quite a few low-hanging fruit in those that are worth plucking,
> while we wait for parse-partial-sexp to get its act together.

Definitely. But when the profiler output for a 1 GB file comes down to syntax-ppss alone, the low-hanging fruit has already been picked.

>>> If that solves the problems in a reasonable way for very long lines,
>>> maybe we will eventually have such an option.
>>
>> Can I merge the branch, then?
>
> Please wait until I have time to review it.
>
>> I was hoping for a stylistic review, perhaps. Like, whether you like the
>> name of the variable, and whether it should be split in two.
>>
>> A change of the default value(s) is on the table too.
>
> Will definitely do, I'm just busy with "other things" right now, most
> of them related to other aspects of long lines.

Roger that.

>>> One such major mode and one such file was presented long ago: a
>>> single-line XML file.
>>
>> XML is indeed slower. It takes almost 3 seconds for me to scroll to the
>> end of a 20 MB XML file.
>>
>> Most of it comes from sgml--syntax-propertize-ppss, which is probably
>> justified: XML is a more complex language.
>
> Did you wait till nxml-mode did its initial scan and displayed "Valid"
> in the mode line? The performance is quite different before and after
> that.

It takes a while to switch from "Validated: 0" to "Valid", but the performance seems about the same in both states. Maybe some other example file would show different behavior, IDK.

>> But other than the initial delay, scrolling, isearch, and local
>> editing all work fast, unlike the original situation with JSON.
>
> With which branch?

scratch/font_lock_large_files, with 'emacs -Q'.

I've also run this test on master now, and M-> is not instant there either. Apparently, a fair amount of time is also spent in nxml-extend-region (which calls sgml-syntax-propertize and syntax-ppss). I'm not sure why it would spend any significant time in either, though, if they're called inside a narrowing.
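A rough sketch of the bounded-lookback heuristic Eli describes in the quoted text, written in Python rather than Emacs Lisp. It only handles C-style `/* */` comments (it ignores string literals entirely), which also illustrates why such an analyzer ends up language-specific. The function name `in_block_comment` and the `lookback`/`lookahead` window sizes are hypothetical. It anchors just after the nearest `*/` in a short window, scans forward from there, and then "verifies" the guess with a telltale that follows: a `*/` appearing before any `/*`.

```python
def in_block_comment(text, pos, lookback=200, lookahead=200):
    """Guess whether POS lies inside a /* ... */ comment in C-like code.

    Hypothetical heuristic: it never scans more than LOOKBACK chars back
    or LOOKAHEAD chars ahead, so it can be wrong (e.g. it ignores strings).
    """
    lo = max(0, pos - lookback)
    # 1. Anchor: just after the nearest "*/" inside the window, where we are
    #    probably outside a comment; otherwise the window start (a guess).
    close = text.rfind("*/", lo, pos)
    i = close + 2 if close != -1 else lo
    # 2. Scan forward from the anchor to POS, tracking comment delimiters.
    inside = False
    while i < pos:
        if not inside and text.startswith("/*", i, pos):
            inside, i = True, i + 2
        elif inside and text.startswith("*/", i, pos):
            inside, i = False, i + 2
        else:
            i += 1
    # 3. Verify with what follows: a "*/" ahead of any "/*" means the
    #    guess "outside" was wrong and we are really inside a comment.
    if not inside:
        next_close = text.find("*/", pos, pos + lookahead)
        next_open = text.find("/*", pos, pos + lookahead)
        if next_close != -1 and (next_open == -1 or next_close < next_open):
            inside = True
    return inside
```

With 200-character windows, a point more than 200 characters away from both delimiters of a long comment is misjudged as being outside it. That is the kind of occasional failure the quoted proposal accepts in exchange for never scanning far from point.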
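One possible shape for the "repair" step mentioned in the reply about syntax-ppss's cache, as a Python toy rather than a patch to Emacs. `PpssCache`, `step`, `parse`, `INITIAL`, and the checkpoint `interval` are all hypothetical names, and the toy grammar only knows `"` strings and `#` line comments. After an edit, the sketch re-parses from the last checkpoint before the change up to the first checkpoint after it. If the state there matches the cached one, the remaining checkpoints are shifted and kept instead of being flushed. This relies on the toy state holding no buffer positions; the real parse-partial-sexp state does contain positions, which would need shifting too.

```python
from bisect import bisect_left, bisect_right

# Toy parse state: (in_string, in_comment). Assumption: it holds no buffer
# positions, so a cached state stays valid when its checkpoint is shifted.
INITIAL = (False, False)


def step(state, ch):
    """Advance the toy parser by one character."""
    in_string, in_comment = state
    if in_comment:
        return (False, ch != "\n")   # '#' comments end at a newline
    if ch == '"':
        return (not in_string, False)
    if ch == "#" and not in_string:
        return (False, True)
    return state


def parse(text, start, end, state):
    """Return the state after scanning text[start:end] from STATE."""
    for ch in text[start:end]:
        state = step(state, ch)
    return state


class PpssCache:
    """Checkpointed parse states, loosely modeled on a syntax-ppss cache."""

    def __init__(self, text, interval=8):
        self.text = text
        self.interval = interval
        self.positions = [0]       # checkpoint positions, ascending
        self.states = [INITIAL]    # parse state at each checkpoint
        self._extend()

    def _extend(self):
        """Add checkpoints every INTERVAL chars after the last one."""
        pos, state = self.positions[-1], self.states[-1]
        while pos + self.interval <= len(self.text):
            state = parse(self.text, pos, pos + self.interval, state)
            pos += self.interval
            self.positions.append(pos)
            self.states.append(state)

    def state_at(self, pos):
        """Parse state at POS, scanning forward from the nearest checkpoint."""
        i = bisect_right(self.positions, pos) - 1
        return parse(self.text, self.positions[i], pos, self.states[i])

    def replace(self, start, end, new):
        """Replace text[start:end] with NEW, then repair the cache."""
        delta = len(new) - (end - start)
        self.text = self.text[:start] + new + self.text[end:]
        # Checkpoints <= START: the text before them is unchanged.
        keep = bisect_right(self.positions, start)
        # Checkpoints >= END: the text after them is unchanged.
        tail = max(keep, bisect_left(self.positions, end))
        old_pos, old_states = self.positions[tail:], self.states[tail:]
        del self.positions[keep:]
        del self.states[keep:]
        if old_pos:
            # Re-parse only up to the first checkpoint past the edit.
            target = old_pos[0] + delta
            state = parse(self.text, self.positions[-1], target, self.states[-1])
            if state == old_states[0]:
                # Same state followed by the same text: later states still hold.
                self.positions.extend(p + delta for p in old_pos)
                self.states.extend(old_states)
                return
        # Otherwise rebuild every checkpoint after the edit, as a flush would.
        self._extend()
```

An edit that leaves the state unchanged (say, retyping a character inside a string) costs only a short rescan around the edit. Inserting a lone `"` changes every later state, so the sketch falls back to a full rescan, in line with the caveat that a repair step would only speed up certain operations.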