From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Lynn Winebarger <owinebar@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Why tree-sitter instead of Semantic? (was Re: CC Mode with
 font-lock-maximum-decoration 2)
Date: Thu, 18 Aug 2022 08:34:53 -0400
Message-ID: <CAM=F=bB6sWUXvwDi5iigBN-n5JKZdyhUWO8TXU5FEPHOROWMHw@mail.gmail.com>
References: <YvFYvQt+xVJHmona@ACM> <83o7wuva9o.fsf@gnu.org>
 <YvFffq5Z/jTaAMra@ACM> <83mtceupbx.fsf@gnu.org> <YvIUEEAi00E5Ooa6@ACM>
 <83lerxvfnu.fsf@gnu.org> <YvJD5JMvJd+BIqEt@ACM> <838rnxvdcq.fsf@gnu.org>
 <YvKM9NF2e9v9yUmO@ACM> <83r11ptksn.fsf@gnu.org> <YvKcw0IurgWRsL9G@ACM>
 <83a68dti6w.fsf@gnu.org>
 <CAM=F=bB4GjNCaGGDCAGT2q98bky-AhruinemoMkMf3COeT3KjQ@mail.gmail.com>
 <c706a0f2-1e97-c9af-2fca-17d74dea3518@secure.kjonigsen.net>
 <87a687sjnv.fsf@yahoo.com>
 <CAM=F=bDSYwzpukCgwVcSMqb_5ejQkb78+dh=Dur88H47tGo2SQ@mail.gmail.com>
 <83zgg4fm9p.fsf@gnu.org>
 <CAM=F=bC-CU5qa6D+4PeDHf+axKewkd11OwAKtEUP1QBU-2wUoQ@mail.gmail.com>
 <jwvmtc4f6tl.fsf-monnier+emacs@gnu.org>
 <CAM=F=bCHrXTeT6XrVNk=vO+wbB4aD4rFLb_C-wLwKk3yqO1JKQ@mail.gmail.com>
 <e6d58072-6eb0-9557-209e-02d0d0c02a94@siege-engine.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="15258"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii <eliz@gnu.org>,
 luangruo@yahoo.com, 
 jostein@secure.kjonigsen.net, jostein@kjonigsen.net, acm@muc.de, 
 emacs-devel@gnu.org, casouri@gmail.com
To: Eric Ludlam <ericludlam@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Aug 18 14:36:43 2022
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1oOelP-0003pR-GE
	for ged-emacs-devel@m.gmane-mx.org; Thu, 18 Aug 2022 14:36:43 +0200
Original-Received: from localhost ([::1]:45474 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1oOelO-0003ij-60
	for ged-emacs-devel@m.gmane-mx.org; Thu, 18 Aug 2022 08:36:42 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:44522)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <owinebar@gmail.com>)
 id 1oOeju-0002tI-D2
 for emacs-devel@gnu.org; Thu, 18 Aug 2022 08:35:10 -0400
Original-Received: from mail-pj1-x1036.google.com ([2607:f8b0:4864:20::1036]:44668)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <owinebar@gmail.com>)
 id 1oOejs-0005i9-O4; Thu, 18 Aug 2022 08:35:10 -0400
Original-Received: by mail-pj1-x1036.google.com with SMTP id
 r15-20020a17090a1bcf00b001fabf42a11cso1694875pjr.3; 
 Thu, 18 Aug 2022 05:35:08 -0700 (PDT)
X-Received: by 2002:a17:90a:4493:b0:1fa:d6f9:a22e with SMTP id
 t19-20020a17090a449300b001fad6f9a22emr972182pjg.203.1660826106774; Thu, 18
 Aug 2022 05:35:06 -0700 (PDT)
In-Reply-To: <e6d58072-6eb0-9557-209e-02d0d0c02a94@siege-engine.com>
Received-SPF: pass client-ip=2607:f8b0:4864:20::1036;
 envelope-from=owinebar@gmail.com; helo=mail-pj1-x1036.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:293602
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/293602>

On Tue, Aug 16, 2022 at 9:41 PM Eric Ludlam <ericludlam@gmail.com> wrote:
>
> On 8/16/22 1:40 PM, Lynn Winebarger wrote:
> > On Tue, Aug 16, 2022 at 1:19 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >>
> >>> I'm only saying there's a disconnect between Jostein's report and Po's
> >>> response.  It's probably a UI issue.  There's a checkbox in a dropdown
> >>> menu that says "Source Code Parsers (Semantic)".
> >>
> >> FWIW, I've used (semantic-mode 1) to enable CEDET in Emacs's C source
> >> files and that was all that was needed to get TAB completion of struct
> >> field names working.
> >> I haven't used it for much more than that, admittedly.
> >
> > It also works for me, but I also have been mostly looking at Emacs
> > source with it, and Semantic knows how to use the TAGS file for
> > context-sensitive completion in C.  And something is working
> > gangbusters in Elisp, but unfortunately I can't really identify which
> > package is doing the work.
> >
> >>> *  "${" and "{" could both open a block closed by "}"
> >>
> >> Why do you think it's a problem?
> > If you want the lexer to tokenize the ${ as a symbol while still
> > recognizing the text in between as delimited, it seems like a problem.
> > I mean, I already deal with that in ordinary font-lock; I was hoping
> > the parser/lexer generation would address the issue independently of
> > syntax tables.
>
> Lexers are built per-language from a set of analyzers.  Thus, you call
> (define-lex ...) and list a bunch of analyzers, which are created with
> `define-lex-analyzer' or one of the variants.
>
> The analyzers mostly use regular expressions and, when possible,
> expressions that use the syntax table, because those are quite fast.  If
> you restrict yourself to the built-in named lexer analyzers, like
> `semantic-lex-whitespace', then that is what they are, but you can use
> `define-lex-analyzer' or `define-lex-regex-analyzer' and write any code
> you want to do a match, push a token, and find the end point.  The C
> lexer/parser does this a lot.
>
> For a very simple case like matching ${:
> (define-lex-simple-regex-analyzer my-dollar-curly
>   "doc string"
>   ;; A bare { is literal in Emacs regexps; \{ would start an interval.
>   "\\${" 'dollar-curly)
>
> and then put this in front of the { } block analyzer when you build up
> your lexer.

Thanks for the details.  I'm not sure what you mean by "put this in
front of the ... block analyzer" though.  I just don't understand how
the different token types interact with each other and/or the "block"
(or other) construct well enough to confidently use the built-in
types.
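
Here is one reading of "in front of", assuming (as the `define-lex'
documentation suggests) that a lexer tries its analyzers in the order
they are listed and the first one that matches wins.  The sketch below
is a hypothetical toy, not Semantic code: every `my-toy-' name is made
up, and each "analyzer" is just a token class paired with a regexp.  It
also assumes a catch-all punctuation analyzer, since that is what would
otherwise claim the `$' on its own and leave the `{' for the brace
analyzer.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical toy, NOT Semantic's API: each "analyzer" is just
;; (TOKEN-CLASS . REGEXP), tried in list order; the first match wins.
(require 'cl-lib)

(defconst my-toy-dollar-curly '(dollar-curly . "\\${")) ; bare { is literal
(defconst my-toy-open-brace   '(open-brace . "{"))
(defconst my-toy-close-brace  '(close-brace . "}"))
(defconst my-toy-symbol       '(symbol . "[[:alnum:]_]+"))
;; Catch-all for any other single non-space character, such as `$'.
(defconst my-toy-punctuation  '(punctuation . "[^[:alnum:]_[:space:]{}]"))

(defun my-toy-lex (string analyzers)
  "Split STRING into tokens using ANALYZERS in order, skipping whitespace.
Return a list of (CLASS . TEXT) pairs."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (tokens)
      (while (progn (skip-chars-forward " \t\n") (not (eobp)))
        (let ((hit (cl-find-if (lambda (a) (looking-at (cdr a))) analyzers)))
          (unless hit
            (error "No analyzer matches at position %d" (point)))
          ;; Re-run the winning regexp so the match data is its own.
          (looking-at (cdr hit))
          (push (cons (car hit) (match-string-no-properties 0)) tokens)
          (goto-char (match-end 0))))
      (nreverse tokens))))

;; `${' analyzer listed first: `${' becomes a single token.
(my-toy-lex "${ name }"
            (list my-toy-dollar-curly my-toy-open-brace my-toy-close-brace
                  my-toy-symbol my-toy-punctuation))
;; => ((dollar-curly . "${") (symbol . "name") (close-brace . "}"))

;; `${' analyzer listed last: punctuation grabs `$' first, and the `{'
;; is then seen as an ordinary open brace.
(my-toy-lex "${ name }"
            (list my-toy-open-brace my-toy-close-brace my-toy-symbol
                  my-toy-punctuation my-toy-dollar-curly))
;; => ((punctuation . "$") (open-brace . "{") (symbol . "name") (close-brace . "}"))
```

In a real `define-lex' form the same idea would presumably mean listing
`my-dollar-curly' ahead of the paren/brace analyzer (for example
`semantic-lex-paren-or-list') and ahead of any analyzer that could match
`$' by itself.  Note that in the toy the closing `}' still comes through
as a lone close-brace token, so pairing it with the `${' is left to
whatever consumes the token stream.
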
What I will take away here is that I can closely review the C
lexer/parser to see how someone who does understand the interaction of
those types uses them effectively, before investing a lot of time
studying the construction of the built-in types for the purpose of
extending them.  I'm not sure I would do that for the problem I'm
currently dealing with, in any case.
Am I right that the "block" classification is used to allow Semantic
to localize the impact of unparseable text?  It sounds like the system
will still function without explicitly declaring block constructs, but
some useful features might be effectively disabled.
Lynn