From: Hugo Thunnissen
Newsgroups: gmane.emacs.devel
Subject: Fontification using a syntax tree
Date: Sat, 18 Sep 2021 15:37:21 +0000
Message-ID: <87bl4pc2tq.fsf@hugot.nl>
To: emacs-devel@gnu.org

Hi all,

In the past weeks I've been improving phpinspect.el (my PHP
parser/completion package) to the point where completion is functional
in most general cases for OOP code. At the moment the parsing is fairly
"dumb", in the sense that the entire buffer is parsed up to the current
point every time an eldoc string or a completion needs to be provided.
This is not a problem in 100-1000 line files, but once you're editing
the last function in a 2000 line PHP class, you're bound to get a
little annoyed by the hiccups. For reference, parsing a 2000 line class
takes about 0.3s on my Ryzen 5 3600, while a 400 line class takes only
0.06s.

To optimize this process, I am going to store my syntax tree, together
with the start and end positions of its tokens, in between parser
invocations. That will allow me to invalidate the syntax tree starting
from the token that encloses the start of the edited region, and
"refresh" the invalidated part of the tree by parsing from that point
onwards.

Now, since I am going to store the start and end positions of tokens
anyway, I was thinking that from a performance standpoint it might be
beneficial to also use this information to provide fontification.

My question to you is: how should I go about doing this? From what I
understand, font-lock works with syntax tables, but if I use a syntax
table for font-lock, I'm letting font-lock take care of the parsing
while I have a perfectly fine syntax tree ready to use, right?

Theoretically, with my stored tokens, fontification would be as simple
as:

(pseudocode)

  ;; buffer-local alist with token objects as key.
  (setq phpinspect--token-positions `((,token . (start . end)) ...))

  (dolist (token-cons phpinspect--token-positions)
    (let ((token (car token-cons))
          (point-start (cadr token-cons))
          (point-end (cddr token-cons)))
      (when (is-an-eligible-token-for-fontification-p token)
        (put-text-property point-start point-end
                           '(whatever property for this token)))))

Before I look deeper into this: is there a way to make this work with
font-lock, or would I have to implement my own fontification mode? And
if so, is that even something I should want to be doing? Is it a bad
idea / bad practice to not use font-lock?

Any advice or thoughts are most welcome.

-Hugo
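
P.S. To make the pseudocode above a bit more concrete, here is a minimal
runnable sketch of the same idea. It is only an illustration under
assumptions: tokens are reduced to plain type symbols, and every `my-'
name (the alist, the face lookup, the fontify function) is hypothetical
and not part of phpinspect.el. It also assumes nothing else in the
buffer is managing the `face' property at the same time.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: the token representation and all `my-' names are
;; assumptions for illustration, not phpinspect.el's real data model.

(defvar-local my-token-positions nil
  "Alist of (TOKEN-TYPE . (START . END)) standing in for the stored tree.
END is exclusive, like the END argument of `put-text-property'.")

(defun my-token-face (token-type)
  "Return the face for TOKEN-TYPE, or nil if it should not be fontified."
  (pcase token-type
    ('keyword 'font-lock-keyword-face)
    ('string 'font-lock-string-face)
    ('comment 'font-lock-comment-face)
    ('variable 'font-lock-variable-name-face)))

(defun my-fontify-tokens (beg end)
  "Put a `face' property on each stored token overlapping BEG..END."
  ;; Text property changes count as buffer modifications; this macro
  ;; keeps them from marking the buffer modified or filling the undo list.
  (with-silent-modifications
    (dolist (token-cons my-token-positions)
      (let ((face (my-token-face (car token-cons)))
            (point-start (cadr token-cons))
            (point-end (cddr token-cons)))
        ;; Skip ineligible tokens and tokens outside the region.
        (when (and face (< point-start end) (> point-end beg))
          ;; Clamp to the requested region so that a partial refresh
          ;; never touches text outside BEG..END.
          (put-text-property (max point-start beg) (min point-end end)
                             'face face))))))
```

Calling (my-fontify-tokens (point-min) (point-max)) after a parse would
apply the faces for the whole buffer; after an edit, passing only the
reparsed region would limit the work to the refreshed tokens.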