From: Vitalie Spinu
Newsgroups: gmane.emacs.devel
Subject: Re: Syntax tables for multiple modes [was: bug#22983: syntax-ppss returns wrong result.]
Date: Mon, 21 Mar 2016 15:13:22 +0100
Message-ID: <877fgvgbr1.fsf@gmail.com>
References: <20160311151512.GD2888@acm.fritz.box> <20160311212410.GG2888@acm.fritz.box>
 <73903215-f94b-e194-7bfe-0d6350c95769@yandex.ru> <20160311221540.GH2888@acm.fritz.box>
 <2c301ec9-041d-9172-d628-479062314b23@yandex.ru> <20160314151621.GF1894@acm.fritz.box>
 <874mc2dqtk.fsf@gmail.com> <87egb5cpmg.fsf@gmail.com> <87a8lsd4j3.fsf@gmail.com>
 <87twk0beuh.fsf@gmail.com>
In-Reply-To: (Stefan Monnier's message of "Mon, 21 Mar 2016 08:26:25 -0400")
To: Stefan Monnier
Cc: Alan Mackenzie, Dmitry Gutov, emacs-devel

>> On Mon, Mar 21 2016 08:26, Stefan Monnier wrote:

>> parse-partial-sexp should work between hard limits (at least the lower
>> bound). It should operate as if hard-narrowed buffer is the real buffer.

> You mean it should ignore the current (user)narrowing? Why? I'd think that if
> something needs to ignore the (user)narrowing it'd be parse-partial-sexp's
> *caller* but not parse-partial-sexp itself.

Currently it just throws out-of-range errors. So in that sense it does ignore
user narrowing, in a very inconvenient way.

parse-partial-sexp is called exclusively from code, and it just happens that in
multi-modes it is called outside of the narrowed region quite often. That's a
major inconvenience. Why on earth would one need to take user narrowing into
account for syntax parsing?

If parse-partial-sexp could be made to always widen to the hard limits, it would
automatically solve a bunch of problems: bug#22983 being one of them, the
condition-case awkwardness in syntax-ppss being another, and the ubiquitous
out-of-range errors in font-lock in multi-modes being the most important one.

>> So ideally it should take (max FROM (car hard-widen-limits)) as the starting
>> position.

> You mean: as opposed to (max FROM (point-min))?

Yes.

> I disagree. Functions should usually not accept to talk about positions
> outside of the point-min/max range.

Depends on the function. point-max/min is mostly user level. Why would syntax
parsing need to respect that?
Bug#22983 illustrates that clearly. If the user narrows in the middle of a
string, it creates huge problems. Note that with Dmitry's new
syntax-ppss-dont-widen proposal syntax-ppss widens first.

Can I ask you the reverse? What do you gain by respecting user narrowing in
syntax parsing?

> Notice how syntax-ppss is different in this regard: since it doesn't
> receive FROM, that same rule doesn't prevent syntax-ppss from widening
> to (car hard-widen-limits).

Well, not quite different. It has POS, which might be outside of the
user-narrowed range.

>> This will give the desired consistency between parse-partial-sexp and
>> syntax-ppss with the price of slightly modifying the semantics of
>> parse-partial-sexp in a backward compatible way.

> I'd be curious to know in which circumstances (i.e. specific code in specific
> packages) this would make a difference. As mentioned above, I think these
> cases would be better fixed by changing the calling code to perform widening
> before calling parse-partial-sexp.

I think bug#22983 is illustrative enough. Multi-mode code is a nightmare because
of out-of-range errors in parsing. `syntax-ppss` is protected, but that
condition-case is triggered 99.99% of the time in multi-modes.

In multi-modes you really want to keep narrowing, because most of the major-mode
functionality works well on narrowed code. Pretty much all of it except
syntactic parsing and font-locking. Occasional property lookup outside of the
narrowed region could be dealt with on a case-by-case basis or, hopefully, with
the new hard-narrowed-limits at the core of it.

>>>> A patch that would require hunting every single mode out there and
>>>> implementing multi-modes locally should have been more carefully
>>>> considered IMO.

>> - Major mode authors won't need to know about multi-modes. That
>> means not dealing with chunks/spans/headers etc. These concepts are
>> not even uniformly defined between existing multi-mode engines.

> I understand that's your claim, but I don't understand why/how this is
> different between the two proposals.

A major mode author has to deal with the span explicitly, as defined by
previous-chunk in prog-indentation-context. Cognitively this is a more demanding
task. Ask a new person to go and read the doc of prog-indentation-context and
ask how much he or she understands of it. I read it and I think I understand
most of it, but looking at all the usages of prog-widen and prog-first-column in
python.el my brain gives up. Previous-chunk is not even used in python.el!

The prog-calculate-indent-function is more general. You can call it on any
buffer position (it need not be the last point in the previous span). It can be
called with whatever STRING-BEFORE and STRING-AFTER (these can, but need not be,
actual strings in the buffer). The current prog-indentation-context allows for
the possibility of a string being inserted before the beginning of the current
chunk. STRING-BEFORE is more general than that because of the arbitrary POS that
it can be applied to.

My claim is that we can achieve much higher generality without bothering mode
authors with all those concepts like current/previous span/chunk, starting/end
position etc. Only the multi-mode engine can take proper care of those anyway.

Here is a simple example where the inner mode cannot decide on the indentation
by itself. Assume for concreteness a noweb header with some code immediately
following the header:

<>=
  some_call(blabla)
  some_other_call(blabla) ## indented by offset 2 with respect to header or prev_chunk

How do you indent the some_call(blabla) after the header? The most meaningful
way is to keep it untouched, just as the user defined it. If the inner mode were
to indent it by itself, it would give an offset of 4.

This is a simple example of header dependence. You can easily imagine more
complex cases where not only one previous span needs to be considered but a
range of previous spans of the same inner mode.
Moreover, there might be nested inner chunks. Which chunk/span will you include
in prog-indentation-context? The entire previous code chunk, or only the last
homogeneous span after the most recent inner-inner chunk?

Indentation of a span is commonly dependent on the header of the chunk (note the
terminology distinction). You can imagine having a parameter in the header that
would determine the indentation of the chunk's body. Header dependence is a
simple and common case of inter-span dependence. It's not hard to imagine
complex cases where indentation in the current span will depend not only on the
previous span of the same mode but on other spans of the host mode or even other
inner (nested or not) modes.

IMO the best way is to leave all these complexities to multi-mode authors to
deal with on a case-by-case basis. You never know what sort of complexities and
chunk dependencies new multi-modes will impose. Better keep things generic.

prog-calculate-indent-function seems like a multi-mode agnostic solution. I am
not sure if it will solve all problems, but it surely solves more than
prog-indentation-context does, and in a cleaner way.

Note on terminology. I put quite some effort into sorting things out in
polymode. The glossary of terms is here:

  https://github.com/vspinu/polymode/tree/master/modes#glossary-of-terms

For many reasons it's important to distinguish between portions of code that
include headers/tails and homogeneous portions of the same mode. The former
portions I call `chunks`, and those can include other chunks of different
sub-modes. The latter, homogeneous portions, I call `spans`.

The fact that core Emacs is now starting to build pieces of multi-mode
functionality here and there, and thus entrenching a somewhat naive
interpretation of a "chunk", doesn't make me happy. Not a big deal though.

> My hunch is that if STRING-BEFORE/AFTER don't matter,

It will actually matter for quite some modes in continuation chunks. I was too
optimistic.

  Vitalie