From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 16:15:43 +0300
Message-ID: <83v8qvj79c.fsf@gnu.org>
In-Reply-To: <036414cc-c711-efaf-ed5b-f8ccfaca0604@yandex.ru> (message from Dmitry Gutov on Sun, 14 Aug 2022 13:29:42 +0300)
To: Dmitry Gutov
Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca

> Date: Sun, 14 Aug 2022 13:29:42 +0300
> Cc: 56682@debbugs.gnu.org, gregory@heytings.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov
>
> > I know, but we are not doing only what is easy to do, do we?  We do
> > (or should do) what the users expect.  In this case, if we want to
> > fontify some relatively small portion of the document, it should be
> > the portion around where the file is first displayed.
>
> There's no point in doing that. Either we narrow to some area around
> point (maybe even using a larger radius like 1 MB), or we only
> fontify up to some position. The former easily creates bad fontification.
>
> The alternative, of course, is to pay the price of syntax-ppss on larger
> spans and wait the corresponding amount of time the first time the user
> scrolls to EOB. That's what the current default on the branch does.

You are still thinking in terms of the original design of syntactical
analysis, which strives to produce 100% accurate results.  That design
principle doesn't work with very long lines, so sticking to it would
indeed lead us to give up on solving the problem.

The better way is to acknowledge that some inaccuracies are acceptable
in those cases.  With that in mind, one can design a syntax analyzer
that looks back only a short way, until it finds some place that could
reasonably serve as an anchor point for a heuristic decision about
whether we are inside or outside a string or comment, and then verifies
that guess with some telltale syntactic elements that follow (like
semicolons or comment-end delimiters in C).  While heuristics of this
kind can sometimes fail, if they fail only rarely, the result is a huge
win.

> But as Gregory shows, when you get to _really_ large files (like 1 GB
> JSON file in his example), pressing M-> will still make you wait (I have
> to wait around 20 seconds).

Try with the latest master, it might have improved (fingers crossed).

In any case, the way to speed up these cases is to look at the profile
and identify the code that is slowing us down, then attempt to make it
faster.  (20 sec is actually long enough for us to interrupt Emacs
under a debugger and look at the backtrace to find the culprit.)

> >> So the "don't fontify past X" strategy is simply based on the idea
> >> that no fontification is probably better than unreliable and
> >> obviously incorrect one.
> >
> > I disagree with that idea, but if someone agrees with you, they can
> > simply turn off font-lock.  As was already mentioned many times in
> > this endless discussion.
>
> If someone agrees with me, they will simply be able to customize
> font-lock-large-files to choose this strategy.

If that solves the problems in a reasonable way for very long lines,
maybe we will eventually have such an option.

> I'm still waiting for people to come forward with other major modes
> which have the same kind of problems. Preferably ones that are likely to
> be used with large files.

One such major mode and one such file were presented long ago: a
single-line XML file.
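
As an illustration of the bounded look-back idea discussed above (find a
nearby anchor, guess the string state from there, then check the guess
against what follows), here is a minimal Python sketch.  It is not Emacs
code and says nothing about how syntax-ppss is implemented; the anchor
pattern, the function names, and the 200-character window are all
assumptions picked for compact JSON-like text on one long line.

```python
import re

# Hypothetical anchor: a quote immediately followed by ':' or ','.
# In compact JSON this is almost always a *closing* quote, so the
# position right after it is very likely outside any string.
ANCHOR = re.compile(r'"\s*[:,]')


def scan_state(text, start, end, in_string=False):
    """Track string state from start to end, honoring backslash escapes."""
    i = start
    while i < end:
        ch = text[i]
        if in_string and ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == '"':
            in_string = not in_string
        i += 1
    return in_string


def guess_in_string(text, pos, lookback=200):
    """Guess whether pos is inside a string, reading at most lookback chars back."""
    window_start = max(0, pos - lookback)
    anchor_end = None
    for m in ANCHOR.finditer(text, window_start, pos):
        anchor_end = m.start() + 1  # just after the (presumed) closing quote
    if anchor_end is None:
        anchor_end = window_start  # no anchor found: assume "outside" here
    return scan_state(text, anchor_end, pos)


def verify(text, pos, in_string, lookahead=200):
    """Check the guess against the next quote after pos, if one is nearby."""
    end = min(len(text), pos + lookahead)
    i = pos
    while i < end:
        if text[i] == "\\":
            i += 2
            continue
        if text[i] == '"':
            # A closing quote is normally followed by ':', ',', '}' or ']'.
            closes = text[i + 1:i + 2] in ("", ":", ",", "}", "]")
            return closes == in_string
        i += 1
    return True  # nothing nearby contradicts the guess


text = '{"name":"long value, with comma","n":1}'
pos = text.index("value")
guess = guess_in_string(text, pos)
print(guess, verify(text, pos, guess))
```

The cost of a guess is bounded by the window size rather than by the
distance from the beginning of the buffer, which is the whole point.
The price is that the guess is wrong when no anchor falls inside the
window, or when a string value happens to begin with ':' or ','.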