From: ambulajan@gmail.com
Newsgroups: gmane.emacs.devel
Subject: Using the wisent parser-generator, as it creates faster parsers
Date: Mon, 26 Dec 2022 06:02:34 +0200
Message-ID: <8075de284038bb4970568bb856656cbc88ded050.camel@gmail.com>
To: ericludlam@gmail.com
Cc: emacs-devel@gnu.org
A follow-up note after reading the earlier discussion with the subject "Re: Why tree-sitter instead of Semantic?".

> If you want to build a parser that sits on the lexer, there is more
> to it, as I recommend using the wisent parser-generator, as it
> creates faster parsers. In the wisent .wy files, you define %tokens
> using a bison-like syntax, and that in turns builds analyzers that
> you include in your lexer.

I doubt that the problem with Semantic parsers was ever Elisp being too slow for that purpose. For me, the problem was writing an LALR parser. Everything else was logical: lexers, SemanticDB, etc. But a grammar under development that stalls at every step with shift/reduce and reduce/reduce conflicts is like pushing against a wall. The LALR algorithm was never meant to be an interface for a developer; rather, it was a workaround for the slow CPUs and small memories of 1980s systems.

I've written an Earley parser, and so far it appears to be in the same performance category as LALR (wisent) written in Elisp. An Earley parser works with any grammar you throw at it. No conflicts. Each token gets the full context of the rules that are in effect at that point. It seems there's no need to build parse trees: a list of states and tokens can be thought of as a flattened parse tree. Still, there's a lot of testing of this concept ahead.
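To make the Earley claims concrete, here is a minimal recognizer sketch in Python. It is not the Elisp implementation mentioned above (that code isn't shown here); the names `earley`, `recognizes`, `show`, and `EXPR` and the grammar encoding are hypothetical. It assumes a grammar without empty (epsilon) productions, which keeps the completer simple. The example grammar is ambiguous, so a bison-style LALR generator reports shift/reduce conflicts for it unless precedence is declared; the Earley chart simply keeps every applicable item. Each state set `chart[i]` is one reading of "the full context of rules in effect" at token `i`.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    """An Earley item: rule lhs -> rhs, a dot position, and the origin set."""
    lhs: str
    rhs: tuple
    dot: int
    origin: int

    def next_symbol(self) -> Optional[str]:
        # Symbol right after the dot, or None if the item is complete.
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def earley(grammar, start, tokens):
    """Build the chart: one state set per input position (len(tokens) + 1).

    `grammar` maps each nonterminal to a list of alternatives (tuples of
    symbols); any symbol that is not a key is a terminal token type.
    Assumes no empty (epsilon) productions.
    """
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]   # ordered work lists
    seen = [set() for _ in range(n + 1)]  # fast duplicate checks

    def add(i, item):
        if item not in seen[i]:
            seen[i].add(item)
            chart[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, tuple(rhs), 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] grows while we process it
            item = chart[i][j]
            j += 1
            sym = item.next_symbol()
            if sym is None:
                # Completer: advance items in the origin set waiting for lhs.
                for parent in list(chart[item.origin]):
                    if parent.next_symbol() == item.lhs:
                        add(i, parent._replace(dot=parent.dot + 1))
            elif sym in grammar:
                # Predictor: start every alternative of the nonterminal.
                for rhs in grammar[sym]:
                    add(i, Item(sym, tuple(rhs), 0, i))
            elif i < n and tokens[i] == sym:
                # Scanner: the next token matches the expected terminal.
                add(i + 1, item._replace(dot=item.dot + 1))
    return chart


def recognizes(grammar, start, tokens):
    """True if some complete start-rule item spans the whole input."""
    final = earley(grammar, start, tokens)[-1]
    return any(it.lhs == start and it.next_symbol() is None and it.origin == 0
               for it in final)


def show(item):
    """Render an item as 'lhs -> a . b (origin)'."""
    rhs = list(item.rhs)
    rhs.insert(item.dot, ".")
    return f"{item.lhs} -> {' '.join(rhs)} ({item.origin})"


# Ambiguous expression grammar: no precedence, yet no "conflicts" here.
EXPR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("num",)]}

if __name__ == "__main__":
    toks = ["num", "+", "num", "*", "num"]
    print(recognizes(EXPR, "E", toks))  # True
    for i, tok in enumerate(toks):
        # The state set at position i: every rule in play before token i.
        print(tok, [show(it) for it in earley(EXPR, "E", toks)[i]])
```

For the input above, the final set contains both `E -> E + E . (0)` and `E -> E * E . (0)`, so both readings of the ambiguous input survive. Choosing one, or using the sets directly as a flattened structure, is left to a later pass.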
Semantic is the only such system that was conceived to be approachable (in the Emacs way). Everything else is some combination of "black boxes" connected with wires.

Lexers can be created with "block" tokens, where a function's body is consumed as a single token. This could allow invoking a parser with different lexer variants: one mode with block tokens for exploring a project's structure, and another mode for indentation and error checking.
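A hypothetical Python sketch of the "block" token idea: one lexer with two modes, so the same parser could be fed either a coarse stream (each balanced `{...}` body as one token) or a fine-grained one. The `lex` function, the `block_mode` flag, and the token kinds are illustrative assumptions, not Semantic's lexer API; the brace matching also ignores braces inside strings and comments.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str   # "ident", "num", "punct", or "block"
    text: str
    start: int  # character offset in the source


def lex(src, block_mode=False):
    """Tokenize a C-like source string.

    With block_mode=True, each balanced {...} region becomes a single
    "block" token. Simplification: braces in strings/comments are not handled.
    """
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif block_mode and ch == "{":
            # Find the matching close brace by tracking nesting depth.
            depth, j = 0, i
            while j < len(src):
                if src[j] == "{":
                    depth += 1
                elif src[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if depth != 0:
                raise SyntaxError(f"unbalanced '{{' at offset {i}")
            tokens.append(Token("block", src[i:j + 1], i))
            i = j + 1
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(Token("ident", src[i:j], i))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(Token("num", src[i:j], i))
            i = j
        else:
            tokens.append(Token("punct", ch, i))
            i += 1
    return tokens


SRC = "int f(int x) { if (x) { return 1; } return 0; }"

if __name__ == "__main__":
    # Coarse stream for project-structure exploration.
    print([t.kind for t in lex(SRC, block_mode=True)])
    # Fine stream for indentation and error checking.
    print([t.text for t in lex(SRC)])
```

The coarse mode yields seven tokens for `SRC`, with the whole function body as the last one, while the fine mode yields twenty. A parser for declarations only needs to match `block` where a body is expected.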