From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
Date: Fri, 18 Mar 2016 18:25:47 +0000
Message-ID: <20160318182547.GB9433@acm.fritz.box>
References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box>
To: Stefan Monnier
Cc: 23019@debbugs.gnu.org

Hello, Stefan.

On Fri, Mar 18, 2016 at 12:23:02PM -0400, Stefan Monnier wrote:
> >> - change element 10 so it's nil if the last char was an "end of
> >> something". Another way to look at it, is that the element 10 should
> >> only be non-nil if the "next lexeme" might start on that
> >> previous character.

> > I've tried this, and it's somewhat ugly. Setting the "previous_syntax"
> > to nil is also needed for the asterisk in "/*". The nil would appear to
> > mean "the syntactic value of the last character has already been used
> > up". So the "previous_syntax" is nil in the most interesting cases. It
> > also feels somewhat ad-hoc.

> > How about this idea: element 10 will record the syntax of the previous
> > character ONLY when it is potentially the first character of a two
> > character comment delimiter, otherwise it'll be nil. At least that's
> > being honest about what the thing's being used for.

> IIUC the only difference between what I (think I) suggested and what
> you're proposing is that you want to return nil for the "prev is
> backslash" whereas I was suggesting to return non-nil in that case.
> [ AFAIK the only two-char elements we handle so far as the comment
> delimiters and the backslash escapes. ]

We also have Scharquote, which scan_sexps_forward handles identically to
Sescape.

> Do I understand this right?

Yes, but I've no strong feelings on the matter.

> > It would appear to be, yes. We really can't get rid of element 5,
> > though, because there will surely be code out there that uses it. But
> > if I change element 10 as outlined above, element 5 will no longer be
> > redundant.

> I'd even be tempted to re-use element 5, although it might
> conceivably break some code out there.

I have bad feelings about that. Is it really worth the risk, just to
save one cons cell on a list that not that many instances of exist at
any time?

> But even if we don't re-use element 5, I would actually much prefer to
> render element 5 redundant.

OK. Here's an updated patch which does just that. Comments would be
welcome.

> Stefan


Amend parse-partial-sexp correctly to handle two character comment delimiters

Do this by adding a new field to the parser state: the syntax of the last
character scanned, should that be the first char of a (potential) two char
construct, nil otherwise.
This should make the parser state complete.
Also document element 9 of the parser state. Also refactor the code a bit.

* src/syntax.c (struct lisp_parse_state): Add a new field.
(SYNTAX_FLAGS_COMSTARTEND_FIRST): New function.
(internalize_parse_state): New function, extracted from scan_sexps_forward.
(back_comment): Call internalize_parse_state.
(forw_comment): Return the syntax of the last character scanned to the caller.
(Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
(scan_sexps_forward): Remove a redundant state parameter. Access all `state'
information via the address parameter `state'. Remove the code which converts
from external to internal form of `state'. Access buffer contents only from
`from' onwards. Reformulate code at the top of the main loop correctly to
recognize comment openers when starting in the middle of one. Call
forw_comment with extra argument (for return of final syntax value).
(Fparse_partial_sexp): Document elements 9, 10 of the parser state in the
doc string. Clarify the doc string in general. Call
internalize_parse_state. Take account of the new elements when consing up
the output parser state.

* doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
element 10. Minor wording corrections (remove reference to "trivial
cases").
(Low Level Parsing): Minor corrections.

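The case the new element is meant to cover can be shown with a small,
self-contained sketch in C. This is a toy model and not the code in
src/syntax.c: the names toy_state, toy_scan and toy_in_comment_after
are hypothetical, only "/* ... */" delimiters are modelled, and the
`pending' field merely stands in for the idea behind prev_syntax. The
point it tries to make is that a scan stopped between "/" and "*" can
only be resumed correctly if the state remembers the pending "/".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy parser state; NOT Emacs's struct lisp_parse_state.  */
struct toy_state
{
  bool in_comment;  /* Are we inside a comment?  */
  char pending;     /* First char of a potential two-char delimiter
                       seen at the previous position, or 0.  */
};

/* Scan TEXT from FROM up to (not including) END, updating *ST.
   If CARRY is false, the pending character is dropped on entry;
   this models a parser state that lacks the extra element.  */
static void
toy_scan (struct toy_state *st, const char *text,
          size_t from, size_t end, bool carry)
{
  if (!carry)
    st->pending = 0;

  for (size_t i = from; i < end; i++)
    {
      char c = text[i];

      if (!st->in_comment && st->pending == '/' && c == '*')
        {
          st->in_comment = true;
          st->pending = 0;   /* The '/' has been "used up".  */
        }
      else if (st->in_comment && st->pending == '*' && c == '/')
        {
          st->in_comment = false;
          st->pending = 0;   /* A following '*' must not open a
                                spurious new comment.  */
        }
      else if (st->in_comment)
        st->pending = (c == '*') ? c : 0;
      else
        st->pending = (c == '/') ? c : 0;
    }
}

/* Scan TEXT in two steps, stopping at SPLIT and resuming from there.
   Return whether the scan ends inside a comment.  */
static bool
toy_in_comment_after (const char *text, size_t split, bool carry)
{
  struct toy_state st = { false, 0 };
  size_t len = strlen (text);

  assert (split <= len);
  toy_scan (&st, text, 0, split, true);
  toy_scan (&st, text, split, len, carry);
  return st.in_comment;
}
```

With the text "x /* y", calling toy_in_comment_after with split 3
(between the "/" and the "*") reports being inside a comment only
when carry is true; with carry false the comment opener is missed,
which is the kind of incompleteness this bug report is about.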
diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index d5a7eba..f81c164 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -791,10 +791,10 @@ Parser State
 @subsection Parser State
 @cindex parser state
 
-  A @dfn{parser state} is a list of ten elements describing the state
-of the syntactic parser, after it parses the text between a specified
-starting point and a specified end point in the buffer.  Parsing
-functions such as @code{syntax-ppss}
+  A @dfn{parser state} is a list of (currently) eleven elements
+describing the state of the syntactic parser, after it parses the text
+between a specified starting point and a specified end point in the
+buffer.  Parsing functions such as @code{syntax-ppss}
 @ifnottex
 (@pxref{Position Parse})
 @end ifnottex
@@ -851,15 +851,20 @@ Parser State
 this element is @code{nil}.
 
 @item
-Internal data for continuing the parsing.  The meaning of this
-data is subject to change; it is used if you pass this list
-as the @var{state} argument to another call.
+The list of the positions of the currently open parentheses, starting
+with the outermost.
+
+@item
+When the last buffer position scanned was the (potential) first
+character of a two character construct (comment delimiter or
+escaped/char-quoted character pair), the @var{syntax-code}
+(@pxref{Syntax Table Internals}) of that position.  Otherwise
+@code{nil}.
 @end enumerate
 
   Elements 1, 2, and 6 are ignored in a state which you pass as an
-argument to continue parsing, and elements 8 and 9 are used only in
-trivial cases.  Those elements are mainly used internally by the
-parser code.
+argument to continue parsing.  Elements 9 and 10 are mainly used
+internally by the parser code.
 
   One additional piece of useful information is available from a
 parser state using this function:
@@ -898,11 +903,11 @@ Low-Level Parsing
 
 If the fourth argument @var{stop-before} is non-@code{nil}, parsing
 stops when it comes to any character that starts a sexp.  If
-@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
-start of an unnested comment.  If @var{stop-comment} is the symbol
+@var{stop-comment} is non-@code{nil}, parsing stops after the start of
+an unnested comment.  If @var{stop-comment} is the symbol
 @code{syntax-table}, parsing stops after the start of an unnested
-comment or a string, or the end of an unnested comment or a string,
-whichever comes first.
+comment or a string, or after the end of an unnested comment or a
+string, whichever comes first.
 
 If @var{state} is @code{nil}, @var{start} is assumed to be at the top
 level of parenthesis structure, such as the beginning of a function
diff --git a/src/syntax.c b/src/syntax.c
index 249d0d5..e6a1942 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -81,6 +81,11 @@ SYNTAX_FLAGS_COMEND_SECOND (int flags)
   return (flags >> 19) & 1;
 }
 static bool
+SYNTAX_FLAGS_COMSTARTEND_FIRST (int flags)
+{
+  return (flags & 0x50000) != 0;
+}
+static bool
 SYNTAX_FLAGS_PREFIX (int flags)
 {
   return (flags >> 20) & 1;
@@ -153,6 +158,10 @@ struct lisp_parse_state
     ptrdiff_t comstr_start;  /* Position of last comment/string starter.  */
     Lisp_Object levelstarts; /* Char numbers of starts-of-expression
                                 of levels (starting from outermost).  */
+    int prev_syntax; /* Syntax of previous position scanned, when
+                        that position (potentially) holds the first char
+                        of a 2-char construct, i.e. comment delimiter
+                        or Sescape, etc.  Smax otherwise. */
   };
 
 /* These variables are a cache for finding the start of a defun.
@@ -176,7 +185,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object); static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool); static void scan_sexps_forward (struct lisp_parse_state *, ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT, - bool, Lisp_Object, int); + bool, int); +static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *); static bool in_classes (int, Lisp_Object); static void parse_sexp_propertize (ptrdiff_t charpos); @@ -911,10 +921,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, } do { + internalize_parse_state (Qnil, &state); scan_sexps_forward (&state, defun_start, defun_start_byte, comment_end, TYPE_MINIMUM (EMACS_INT), - 0, Qnil, 0); + 0, 0); defun_start = comment_end; if (!adjusted) { @@ -2314,7 +2325,9 @@ in_classes (int c, Lisp_Object iso_classes) into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR. Else, return false and store the charpos STOP into *CHARPOS_PTR, the corresponding bytepos into *BYTEPOS_PTR and the current nesting - (as defined for state.incomment) in *INCOMMENT_PTR. + (as defined for state->incomment) in *INCOMMENT_PTR. The + SYNTAX_WITH_FLAGS of the last character scanned in the comment is + stored into *last_syntax_ptr. The comment end is the last character of the comment rather than the character just after the comment. 
@@ -2326,7 +2339,7 @@ static bool forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, EMACS_INT nesting, int style, int prev_syntax, ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr, - EMACS_INT *incomment_ptr) + EMACS_INT *incomment_ptr, int *last_syntax_ptr) { register int c, c1; register enum syntaxcode code; @@ -2346,6 +2359,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, *incomment_ptr = nesting; *charpos_ptr = from; *bytepos_ptr = from_byte; + *last_syntax_ptr = syntax; return 0; } c = FETCH_CHAR_AS_MULTIBYTE (from_byte); @@ -2415,6 +2429,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, } *charpos_ptr = from; *bytepos_ptr = from_byte; + *last_syntax_ptr = syntax; return 1; } @@ -2436,6 +2451,7 @@ between them, return t; otherwise return nil. */) EMACS_INT count1; ptrdiff_t out_charpos, out_bytepos; EMACS_INT dummy; + int dummy2; CHECK_NUMBER (count); count1 = XINT (count); @@ -2499,7 +2515,7 @@ between them, return t; otherwise return nil. */) } /* We're at the start of a comment. */ found = forw_comment (from, from_byte, stop, comnested, comstyle, 0, - &out_charpos, &out_bytepos, &dummy); + &out_charpos, &out_bytepos, &dummy, &dummy2); from = out_charpos; from_byte = out_bytepos; if (!found) { @@ -2659,6 +2675,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag) ptrdiff_t from_byte; ptrdiff_t out_bytepos, out_charpos; EMACS_INT dummy; + int dummy2; bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol; if (depth > 0) min_depth = 0; @@ -2755,7 +2772,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag) UPDATE_SYNTAX_TABLE_FORWARD (from); found = forw_comment (from, from_byte, stop, comnested, comstyle, 0, - &out_charpos, &out_bytepos, &dummy); + &out_charpos, &out_bytepos, &dummy, + &dummy2); from = out_charpos, from_byte = out_bytepos; if (!found) { @@ -3119,7 +3137,7 @@ the prefix syntax flag (p). 
*/) } /* Parse forward from FROM / FROM_BYTE to END, - assuming that FROM has state OLDSTATE (nil means FROM is start of function), + assuming that FROM has state STATE, and return a description of the state of the parse at END. If STOPBEFORE, stop at the start of an atom. If COMMENTSTOP is 1, stop at the start of a comment. @@ -3127,12 +3145,11 @@ the prefix syntax flag (p). */) after the beginning of a string, or after the end of a string. */ static void -scan_sexps_forward (struct lisp_parse_state *stateptr, +scan_sexps_forward (struct lisp_parse_state *state, ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end, EMACS_INT targetdepth, bool stopbefore, - Lisp_Object oldstate, int commentstop) + int commentstop) { - struct lisp_parse_state state; enum syntaxcode code; int c1; bool comnested; @@ -3148,7 +3165,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr, Lisp_Object tem; ptrdiff_t prev_from; /* Keep one character before FROM. */ ptrdiff_t prev_from_byte; - int prev_from_syntax; + int prev_from_syntax, prev_prev_from_syntax; bool boundary_stop = commentstop == -1; bool nofence; bool found; @@ -3165,6 +3182,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr, do { prev_from = from; \ prev_from_byte = from_byte; \ temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte); \ + prev_prev_from_syntax = prev_from_syntax; \ prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \ INC_BOTH (from, from_byte); \ if (from < end) \ @@ -3174,88 +3192,38 @@ do { prev_from = from; \ immediate_quit = 1; QUIT; - if (NILP (oldstate)) - { - depth = 0; - state.instring = -1; - state.incomment = 0; - state.comstyle = 0; /* comment style a by default. */ - state.comstr_start = -1; /* no comment/string seen. 
*/ - } - else - { - tem = Fcar (oldstate); - if (!NILP (tem)) - depth = XINT (tem); - else - depth = 0; - - oldstate = Fcdr (oldstate); - oldstate = Fcdr (oldstate); - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - /* Check whether we are inside string_fence-style string: */ - state.instring = (!NILP (tem) - ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE) - : -1); + depth = state->depth; + start_quoted = state->quoted; + prev_prev_from_syntax = Smax; + prev_from_syntax = state->prev_syntax; - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - state.incomment = (!NILP (tem) - ? (INTEGERP (tem) ? XINT (tem) : -1) - : 0); - - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - start_quoted = !NILP (tem); - - /* if the eighth element of the list is nil, we are in comment - style a. If it is non-nil, we are in comment style b */ - oldstate = Fcdr (oldstate); - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - state.comstyle = (NILP (tem) - ? 0 - : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE) - ? XINT (tem) - : ST_COMMENT_STYLE)); - - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - state.comstr_start = - RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1; - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - while (!NILP (tem)) /* >= second enclosing sexps. */ - { - Lisp_Object temhd = Fcar (tem); - if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX)) - curlevel->last = XINT (temhd); - if (++curlevel == endlevel) - curlevel--; /* error ("Nesting too deep for parser"); */ - curlevel->prev = -1; - curlevel->last = -1; - tem = Fcdr (tem); - } + tem = state->levelstarts; + while (!NILP (tem)) /* >= second enclosing sexps. 
*/ + { + Lisp_Object temhd = Fcar (tem); + if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX)) + curlevel->last = XINT (temhd); + if (++curlevel == endlevel) + curlevel--; /* error ("Nesting too deep for parser"); */ + curlevel->prev = -1; + curlevel->last = -1; + tem = Fcdr (tem); } - state.quoted = 0; - mindepth = depth; - curlevel->prev = -1; curlevel->last = -1; - SETUP_SYNTAX_TABLE (prev_from, 1); - temp = FETCH_CHAR (prev_from_byte); - prev_from_syntax = SYNTAX_WITH_FLAGS (temp); - UPDATE_SYNTAX_TABLE_FORWARD (from); + state->quoted = 0; + mindepth = depth; + + SETUP_SYNTAX_TABLE (from, 1); /* Enter the loop at a place appropriate for initial state. */ - if (state.incomment) + if (state->incomment) goto startincomment; - if (state.instring >= 0) + if (state->instring >= 0) { - nofence = state.instring != ST_STRING_STYLE; + nofence = state->instring != ST_STRING_STYLE; if (start_quoted) goto startquotedinstring; goto startinstring; @@ -3266,11 +3234,8 @@ do { prev_from = from; \ while (from < end) { int syntax; - INC_FROM; - code = prev_from_syntax & 0xff; - if (from < end - && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax) + if (SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax) && (c1 = FETCH_CHAR (from_byte), syntax = SYNTAX_WITH_FLAGS (c1), SYNTAX_FLAGS_COMSTART_SECOND (syntax))) @@ -3280,32 +3245,39 @@ do { prev_from = from; \ /* Record the comment style we have entered so that only the comment-end sequence of the same style actually terminates the comment section. */ - state.comstyle + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax); comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) | SYNTAX_FLAGS_COMMENT_NESTED (syntax)); - state.incomment = comnested ? 1 : -1; - state.comstr_start = prev_from; + state->incomment = comnested ? 1 : -1; + state->comstr_start = prev_from; INC_FROM; + prev_from_syntax = Smax; /* the syntax has already been + "used up". 
*/ code = Scomment; } - else if (code == Scomment_fence) - { - /* Record the comment style we have entered so that only - the comment-end sequence of the same style actually - terminates the comment section. */ - state.comstyle = ST_COMMENT_STYLE; - state.incomment = -1; - state.comstr_start = prev_from; - code = Scomment; - } - else if (code == Scomment) - { - state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0); - state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ? - 1 : -1); - state.comstr_start = prev_from; - } + else + { + INC_FROM; + code = prev_from_syntax & 0xff; + if (code == Scomment_fence) + { + /* Record the comment style we have entered so that only + the comment-end sequence of the same style actually + terminates the comment section. */ + state->comstyle = ST_COMMENT_STYLE; + state->incomment = -1; + state->comstr_start = prev_from; + code = Scomment; + } + else if (code == Scomment) + { + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0); + state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ? + 1 : -1); + state->comstr_start = prev_from; + } + } if (SYNTAX_FLAGS_PREFIX (prev_from_syntax)) continue; @@ -3350,25 +3322,28 @@ do { prev_from = from; \ case Scomment_fence: /* Can't happen because it's handled above. */ case Scomment: - if (commentstop || boundary_stop) goto done; + if (commentstop || boundary_stop) goto done; startincomment: /* The (from == BEGV) test was to enter the loop in the middle so that we find a 2-char comment ender even if we start in the middle of it. We don't want to do that if we're just at the beginning of the comment (think of (*) ... (*)). */ found = forw_comment (from, from_byte, end, - state.incomment, state.comstyle, - (from == BEGV || from < state.comstr_start + 3) - ? 0 : prev_from_syntax, - &out_charpos, &out_bytepos, &state.incomment); + state->incomment, state->comstyle, + from == BEGV ? 
0 : prev_from_syntax, + &out_charpos, &out_bytepos, &state->incomment, + &prev_from_syntax); from = out_charpos; from_byte = out_bytepos; - /* Beware! prev_from and friends are invalid now. - Luckily, the `done' doesn't use them and the INC_FROM - sets them to a sane value without looking at them. */ + /* Beware! prev_from and friends (except prev_from_syntax) + are invalid now. Luckily, the `done' doesn't use them + and the INC_FROM sets them to a sane value without + looking at them. */ if (!found) goto done; INC_FROM; - state.incomment = 0; - state.comstyle = 0; /* reset the comment style */ + state->incomment = 0; + state->comstyle = 0; /* reset the comment style */ + prev_from_syntax = Smax; /* Ensure "*|*" can't open a spurious new + comment. */ if (boundary_stop) goto done; break; @@ -3396,16 +3371,16 @@ do { prev_from = from; \ case Sstring: case Sstring_fence: - state.comstr_start = from - 1; + state->comstr_start = from - 1; if (stopbefore) goto stop; /* this arg means stop at sexp start */ curlevel->last = prev_from; - state.instring = (code == Sstring + state->instring = (code == Sstring ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte)) : ST_STRING_STYLE); if (boundary_stop) goto done; startinstring: { - nofence = state.instring != ST_STRING_STYLE; + nofence = state->instring != ST_STRING_STYLE; while (1) { @@ -3419,7 +3394,7 @@ do { prev_from = from; \ /* Check C_CODE here so that if the char has a syntax-table property which says it is NOT a string character, it does not end the string. */ - if (nofence && c == state.instring && c_code == Sstring) + if (nofence && c == state->instring && c_code == Sstring) break; switch (c_code) @@ -3442,7 +3417,7 @@ do { prev_from = from; \ } } string_end: - state.instring = -1; + state->instring = -1; curlevel->prev = curlevel->last; INC_FROM; if (boundary_stop) goto done; @@ -3461,25 +3436,96 @@ do { prev_from = from; \ stop: /* Here if stopping before start of sexp. 
*/ from = prev_from; /* We have just fetched the char that starts it; */ from_byte = prev_from_byte; + prev_from_syntax = prev_prev_from_syntax; goto done; /* but return the position before it. */ endquoted: - state.quoted = 1; + state->quoted = 1; done: - state.depth = depth; - state.mindepth = mindepth; - state.thislevelstart = curlevel->prev; - state.prevlevelstart + state->depth = depth; + state->mindepth = mindepth; + state->thislevelstart = curlevel->prev; + state->prevlevelstart = (curlevel == levelstart) ? -1 : (curlevel - 1)->last; - state.location = from; - state.location_byte = from_byte; - state.levelstarts = Qnil; + state->location = from; + state->location_byte = from_byte; + state->levelstarts = Qnil; while (curlevel > levelstart) - state.levelstarts = Fcons (make_number ((--curlevel)->last), - state.levelstarts); + state->levelstarts = Fcons (make_number ((--curlevel)->last), + state->levelstarts); + state->prev_syntax = (SYNTAX_FLAGS_COMSTARTEND_FIRST (prev_from_syntax) + || state->quoted) ? prev_from_syntax : Smax; immediate_quit = 0; +} + +/* Convert a (lisp) parse state to the internal form used in + scan_sexps_forward. */ +static void +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state) +{ + Lisp_Object tem; + + if (NILP (external)) + { + state->depth = 0; + state->instring = -1; + state->incomment = 0; + state->quoted = 0; + state->comstyle = 0; /* comment style a by default. */ + state->comstr_start = -1; /* no comment/string seen. */ + state->levelstarts = Qnil; + state->prev_syntax = Smax; + } + else + { + tem = Fcar (external); + if (!NILP (tem)) + state->depth = XINT (tem); + else + state->depth = 0; + + external = Fcdr (external); + external = Fcdr (external); + external = Fcdr (external); + tem = Fcar (external); + /* Check whether we are inside string_fence-style string: */ + state->instring = (!NILP (tem) + ? (CHARACTERP (tem) ? 
XFASTINT (tem) : ST_STRING_STYLE) + : -1); + + external = Fcdr (external); + tem = Fcar (external); + state->incomment = (!NILP (tem) + ? (INTEGERP (tem) ? XINT (tem) : -1) + : 0); + + external = Fcdr (external); + tem = Fcar (external); + state->quoted = !NILP (tem); - *stateptr = state; + /* if the eighth element of the list is nil, we are in comment + style a. If it is non-nil, we are in comment style b */ + external = Fcdr (external); + external = Fcdr (external); + tem = Fcar (external); + state->comstyle = (NILP (tem) + ? 0 + : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE) + ? XINT (tem) + : ST_COMMENT_STYLE)); + + external = Fcdr (external); + tem = Fcar (external); + state->comstr_start = + RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1; + external = Fcdr (external); + tem = Fcar (external); + state->levelstarts = tem; + + external = Fcdr (external); + tem = Fcar (external); + state->prev_syntax = NILP (tem) ? Smax : XINT (tem); + } } DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0, @@ -3488,6 +3534,7 @@ Parsing stops at TO or when certain criteria are met; point is set to where parsing stops. If fifth arg OLDSTATE is omitted or nil, parsing assumes that FROM is the beginning of a function. + Value is a list of elements describing final state of parsing: 0. depth in parens. 1. character address of start of innermost containing list; nil if none. @@ -3501,16 +3548,22 @@ Value is a list of elements describing final state of parsing: 6. the minimum paren-depth encountered during this scan. 7. style of comment, if any. 8. character address of start of comment or string; nil if not in one. - 9. Intermediate data for continuation of parsing (subject to change). + 9. List of positions of currently open parens, outermost first. +10. When the last position scanned holds the first character of a + (potential) two character construct, the syntax of that position, + otherwise nil. 
That construct can be a two character comment + delimiter or an Escaped or Char-quoted character. +11..... Possible further internal information used by `parse-partial-sexp'. + If third arg TARGETDEPTH is non-nil, parsing stops if the depth in parentheses becomes equal to TARGETDEPTH. -Fourth arg STOPBEFORE non-nil means stop when come to +Fourth arg STOPBEFORE non-nil means stop when we come to any character that starts a sexp. Fifth arg OLDSTATE is a list like what this function returns. It is used to initialize the state of the parse. Elements number 1, 2, 6 are ignored. -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. - If it is symbol `syntax-table', stop after the start of a comment or a +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment. + If it is the symbol `syntax-table', stop after the start of a comment or a string, or after end of a comment or a string. */) (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth, Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop) @@ -3527,15 +3580,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. target = TYPE_MINIMUM (EMACS_INT); /* We won't reach this depth. */ validate_region (&from, &to); + internalize_parse_state (oldstate, &state); scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)), XINT (to), - target, !NILP (stopbefore), oldstate, + target, !NILP (stopbefore), (NILP (commentstop) ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1))); SET_PT_BOTH (state.location, state.location_byte); - return Fcons (make_number (state.depth), + return + Fcons (make_number (state.depth), Fcons (state.prevlevelstart < 0 ? Qnil : make_number (state.prevlevelstart), Fcons (state.thislevelstart < 0 @@ -3553,11 +3608,15 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. ? Qsyntax_table : make_number (state.comstyle)) : Qnil), - Fcons (((state.incomment - || (state.instring >= 0)) - ? 
make_number (state.comstr_start) - : Qnil), - Fcons (state.levelstarts, Qnil)))))))))); + Fcons (((state.incomment + || (state.instring >= 0)) + ? make_number (state.comstr_start) + : Qnil), + Fcons (state.levelstarts, + Fcons (state.prev_syntax == Smax + ? Qnil + : make_number (state.prev_syntax), + Qnil))))))))))); } void -- Alan Mackenzie (Nuremberg, Germany).