From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 20:47:40 +0300
To: Eli Zaretskii
Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca
In-Reply-To: <83v8qvj79c.fsf@gnu.org>

On 14.08.2022 16:15, Eli Zaretskii wrote:

>> There's no point in doing that. Either we narrow to some area around
>> point (might even use a larger radius like 1 MB), or we only
>> fontify up to some position. The former easily creates bad fontification.
>>
>> The alternative, of course, is to pay the price of syntax-ppss on larger
>> spans and wait the corresponding amount of time the first time the user
>> scrolls to EOB. That's what the current default on the branch does.
>
> You are still thinking in terms of the original design of syntactical
> analysis which strives to produce 100% accurate results. That design
> principle doesn't work with very long lines, so sticking to it would
> indeed lead us to give up on solving the problem.

s/very long lines/very large files

In any case, the "original design" is not going anywhere (as the only
way to achieve correctness), and I'm talking in terms of balance between
accuracy and performance.

To use Gregory's narrowing approach in font-lock, check out the branch
under discussion (scratch/font_lock_large_files) and evaluate

   (setq font-lock-large-files '(narrow . 5000))

You'll see the same behavior as on master now (except narrowing isn't
"hard"), with the same performance characteristics.

> The better way is to acknowledge that some inaccuracies are acceptable
> in those cases. With that in mind, one can design a syntax analyzer
> that looks back only a short ways, until it finds some place that
> could reasonably serve as an anchor point for heuristic decisions
> about whether we are inside or outside a string or comment, and then
> verifying that guess with some telltale syntactic elements that follow
> (like semi-colons or comment-end delimiters in C). While this kind of
> heuristics can sometimes fail, if they only fail rarely, the result is
> a huge win.
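
For one simple syntax, JSON, the "look back a short way for an anchor"
idea quoted above might be sketched roughly as follows. This is only an
illustration, not code from the branch: the name my-json-find-anchor and
the 5000-character default are made up here. It relies on the fact that
a raw newline cannot occur inside a JSON string, and otherwise guesses
that a quote followed by "," or ":" and another quote is a closing quote,
which can fail (for example on a string that consists of a single comma).

```elisp
(defun my-json-find-anchor (&optional max-lookback)
  "Return a position before point that is probably outside a JSON string.
Look back at most MAX-LOOKBACK characters (default 5000, an arbitrary
value chosen for this sketch).  Return nil if no anchor is found."
  (let* ((lookback (or max-lookback 5000))
         (limit (max (point-min) (- (point) lookback))))
    (save-excursion
      (cond
       ;; A raw newline is never part of a JSON string, so the
       ;; position right after it is a safe anchor.
       ((search-backward "\n" limit t)
        (match-end 0))
       ;; Heuristic: QUOTE [,:] QUOTE is most likely "closing quote,
       ;; separator, opening quote".  Anchor just after the first quote.
       ((re-search-backward "\"[ \t]*[,:][ \t]*\"" limit t)
        (1+ (match-beginning 0)))
       ;; No anchor within the lookback window.
       (t nil)))))
```

A caller could start parse-partial-sexp from the returned position
instead of from the beginning of the buffer, and fall back to the full
scan when the function returns nil.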

You cannot design a language-agnostic syntax analyzer like that. It's
something every major mode would have to consider how to implement. It's
relatively easy to design for JSON (again) because the syntax is so
simple, but for others -- not so much.

So we need to settle on the basic design first. The code on the branch
includes the narrowing approach, which is trivially extended to use the
"find safe place" hook when it's available. But it won't always be
available.

>> But as Gregory shows, when you get to _really_ large files (like 1 GB
>> JSON file in his example), pressing M-> will still make you wait (I have
>> to wait around 20 seconds).
>
> Try with the latest master, it might have improved (fingers crossed).

All improvements are welcome, but that's unlikely:

> In any case, the way to speed up these cases is to look at the profile
> and identify the code that is slowing us down; then attempt to make it
> faster. (20 sec is actually long enough for us to interrupt Emacs
> under a debugger and look at the backtrace to find the culprit.)

I've profiled and benchmarked this scenario already: all of the delay
(17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.

>>>> So the "don't fontify past X" strategy is simply based on the idea
>>>> that no fontification is probably better than unreliable and
>>>> obviously incorrect one.
>>>
>>> I disagree with that idea, but if someone agrees with you, they can
>>> simply turn off font-lock. As was already mentioned many times in
>>> this endless discussion.
>>
>> If someone agrees with me, they will simply be able to customize
>> font-lock-large-files to choose this strategy.
>
> If that solves the problems in a reasonable way for very long lines,
> maybe we will eventually have such an option.

Can I merge the branch, then?

I was hoping for a stylistic review, perhaps. Like, whether you like the
name of the variable, and whether it should be split in two. A change of
the default value(s) is on the table too.
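
To make the two strategies under discussion concrete ("narrow to some
area around point" versus "only fontify up to some position"), here is a
minimal sketch of how one setting could select between them. It is not
the implementation on the branch. The function name my-fontify-bounds is
hypothetical, the (narrow . N) form merely mirrors the value shown
earlier in this message, and the (limit . N) form is invented purely for
illustration.

```elisp
(defun my-fontify-bounds (strategy pos)
  "Return (BEG . END) of the text that may be parsed to fontify POS.
STRATEGY is (narrow . RADIUS), the hypothetical (limit . MAX-POS), or
anything else for \"no restriction\".  Return nil when POS should not
be fontified at all."
  (pcase strategy
    ;; Narrowing: only look at RADIUS characters on either side of POS.
    ;; Fast, but the parser may start inside a string or comment.
    (`(narrow . ,radius)
     (cons (max (point-min) (- pos radius))
           (min (point-max) (+ pos radius))))
    ;; Limit: parse accurately from the start, but give up past MAX-POS.
    (`(limit . ,max-pos)
     (and (<= pos max-pos)
          (cons (point-min) (min (point-max) max-pos))))
    ;; Default: accurate parsing of the whole buffer, whatever it costs.
    (_ (cons (point-min) (point-max)))))
```

The trade-off is visible in the return values: the narrowing branch
always returns a small region (fast, possibly wrong), while the limit
branch returns either an accurate region or nothing.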

>> I'm still waiting for people to come forward with other major modes
>> which have the same kind of problems. Preferably ones that are likely to
>> be used with large files.
>
> One such major mode and one such file was presented long ago: a
> single-line XML file.

XML is indeed slower. It takes almost 3 seconds for me to scroll to the
end of a 20 MB XML file. Most of it comes from
sgml--syntax-propertize-ppss, which is probably justified: XML is a more
complex language.

But other than the initial delay, scrolling, isearch, and local editing
all work fast, unlike the original situation with JSON.
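
For anyone who wants to get a rough number for their own file, a
measurement along these lines should do. This is a sketch rather than
the exact benchmark behind the timings above: the function name
my-time-parse-to-eob is made up, and it times only the raw
parse-partial-sexp scan with the buffer's current syntax table, so it
leaves out syntax-propertize work such as the sgml-mode function
mentioned above.

```elisp
(require 'benchmark)

(defun my-time-parse-to-eob ()
  "Return the seconds `parse-partial-sexp' needs to scan the buffer.
The scan runs once, from the beginning of the accessible portion of
the current buffer to its end."
  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
  (car (benchmark-run 1
         (parse-partial-sexp (point-min) (point-max)))))
```

Visit the large file and evaluate (my-time-parse-to-eob) with M-: to see
the elapsed time in seconds.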