From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
Date: Sun, 20 Mar 2016 09:47:58 -0400
References: <20160320132125.5713.70103@vcs.savannah.gnu.org>
To: emacs-devel@gnu.org
Cc: Alan Mackenzie
In-Reply-To: (Alan Mackenzie's message of "Sun, 20 Mar 2016 13:21:26 +0000")

What was John's opinion on reusing nth 5?

        Stefan

>>>>> "Alan" == Alan Mackenzie writes:

> branch: master
> commit 9dcf5998935c8aaa846d7585b81f0dcfe1935b3d
> Author: Alan Mackenzie
> Commit: Alan Mackenzie
> Amend parse-partial-sexp correctly to handle two character comment delimiters
> Do this by adding a new field to the parser state: the syntax of the last
> character scanned, should that be the first char of a (potential) two char
> construct, nil otherwise.
> This should make the parser state complete.
> Also document element 9 of the parser state. Also refactor the code a bit.
> * src/syntax.c (struct lisp_parse_state): Add a new field.
> (SYNTAX_FLAGS_COMSTARTEND_FIRST): New function.
> (internalize_parse_state): New function, extracted from scan_sexps_forward.
> (back_comment): Call internalize_parse_state.
> (forw_comment): Return the syntax of the last character scanned to the caller
> when that character might be the first of a two character construct.
> (Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
> (scan_sexps_forward): Remove a redundant state parameter. Access all `state'
> information via the address parameter `state'. Remove the code which converts
> from external to internal form of `state'. Access buffer contents only from
> `from' onwards. Reformulate code at the top of the main loop correctly to
> recognize comment openers when starting in the middle of one. Call
> forw_comment with extra argument (for return of syntax value of possible first
> char of a two char construct).
> (Fparse_partial_sexp): Document elements 9, 10 of the parser state in the
> doc string. Clarify the doc string in general. Call
> internalize_parse_state. Take account of the new elements when consing up the
> output parser state.
> * doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
> element 10. Minor wording corrections (remove reference to "trivial
> cases").
> (Low Level Parsing): Minor corrections.
> * etc/NEWS: Note new element 10, and documentation of element 9 of parser
> state.
> ---
>  doc/lispref/syntax.texi |  33 +++--
>  etc/NEWS                |  12 ++
>  src/syntax.c            | 372 ++++++++++++++++++++++++++++-------------------
>  3 files changed, 252 insertions(+), 165 deletions(-)
> diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
> index d5a7eba..f81c164 100644
> --- a/doc/lispref/syntax.texi
> +++ b/doc/lispref/syntax.texi
> @@ -791,10 +791,10 @@ Hooks}).
>  @subsection Parser State
>  @cindex parser state
> -  A @dfn{parser state} is a list of ten elements describing the state
> -of the syntactic parser, after it parses the text between a specified
> -starting point and a specified end point in the buffer. Parsing
> -functions such as @code{syntax-ppss}
> +  A @dfn{parser state} is a list of (currently) eleven elements
> +describing the state of the syntactic parser, after it parses the text
> +between a specified starting point and a specified end point in the
> +buffer. Parsing functions such as @code{syntax-ppss}
>  @ifnottex
>  (@pxref{Position Parse})
>  @end ifnottex
> @@ -851,15 +851,20 @@ position where the string began. When outside of strings and comments,
>  this element is @code{nil}.
>  @item
> -Internal data for continuing the parsing. The meaning of this
> -data is subject to change; it is used if you pass this list
> -as the @var{state} argument to another call.
> +The list of the positions of the currently open parentheses, starting
> +with the outermost.
> +
> +@item
> +When the last buffer position scanned was the (potential) first
> +character of a two character construct (comment delimiter or
> +escaped/char-quoted character pair), the @var{syntax-code}
> +(@pxref{Syntax Table Internals}) of that position. Otherwise
> +@code{nil}.
>  @end enumerate
>  Elements 1, 2, and 6 are ignored in a state which you pass as an
> -argument to continue parsing, and elements 8 and 9 are used only in
> -trivial cases. Those elements are mainly used internally by the
> -parser code.
> +argument to continue parsing. Elements 9 and 10 are mainly used
> +internally by the parser code.
>  One additional piece of useful information is available from a
>  parser state using this function:
> @@ -898,11 +903,11 @@ The depth starts at 0, or at whatever is given in @var{state}.
>  If the fourth argument @var{stop-before} is non-@code{nil}, parsing
>  stops when it comes to any character that starts a sexp. If
> -@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
> -start of an unnested comment. If @var{stop-comment} is the symbol
> +@var{stop-comment} is non-@code{nil}, parsing stops after the start of
> +an unnested comment. If @var{stop-comment} is the symbol
>  @code{syntax-table}, parsing stops after the start of an unnested
> -comment or a string, or the end of an unnested comment or a string,
> -whichever comes first.
> +comment or a string, or after the end of an unnested comment or a
> +string, whichever comes first.
>  If @var{state} is @code{nil}, @var{start} is assumed to be at the top
>  level of parenthesis structure, such as the beginning of a function
> diff --git a/etc/NEWS b/etc/NEWS
> index d963dee..ea32153 100644
> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -175,6 +175,18 @@ a new window when opening man pages when there's already one, use
>                    (inhibit-same-window . nil)
>                    (mode . Man-mode))))
> ++++
> +** `parse-partial-sexp' state has a new element. Element 10 is
> +non-nil when the last character scanned might be the first character
> +of a two character construct, i.e. a comment delimiter or escaped
> +character. Its value is the syntax of that last character.
> +
> ++++
> +** `parse-partial-sexp''s state, element 9, has now been confirmed as
> +permanent and documented, and may be used by Lisp programs. Its value
> +is a list of currently open parenthesis positions, starting with the
> +outermost parenthesis.
> +
>
>  * Changes in Emacs 25.2 on Non-Free Operating Systems
> diff --git a/src/syntax.c b/src/syntax.c
> index fdcfdfc..ffe0ea5 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -81,6 +81,11 @@ SYNTAX_FLAGS_COMEND_SECOND (int flags)
>    return (flags >> 19) & 1;
>  }
>  static bool
> +SYNTAX_FLAGS_COMSTARTEND_FIRST (int flags)
> +{
> +  return (flags & 0x50000) != 0;
> +}
> +static bool
>  SYNTAX_FLAGS_PREFIX (int flags)
>  {
>    return (flags >> 20) & 1;
> @@ -153,6 +158,10 @@ struct lisp_parse_state
>      ptrdiff_t comstr_start;  /* Position of last comment/string starter. */
>      Lisp_Object levelstarts; /* Char numbers of starts-of-expression
>                                  of levels (starting from outermost). */
> +    int prev_syntax; /* Syntax of previous position scanned, when
> +                        that position (potentially) holds the first char
> +                        of a 2-char construct, i.e. comment delimiter
> +                        or Sescape, etc. Smax otherwise. */
>    };
>
>  /* These variables are a cache for finding the start of a defun.
> @@ -176,7 +185,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object);
>  static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool);
>  static void scan_sexps_forward (struct lisp_parse_state *,
>                                  ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT,
> -                                bool, Lisp_Object, int);
> +                                bool, int);
> +static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *);
>  static bool in_classes (int, Lisp_Object);
>  static void parse_sexp_propertize (ptrdiff_t charpos);
> @@ -911,10 +921,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>          }
>        do
>          {
> +          internalize_parse_state (Qnil, &state);
>            scan_sexps_forward (&state,
>                                defun_start, defun_start_byte,
>                                comment_end, TYPE_MINIMUM (EMACS_INT),
> -                              0, Qnil, 0);
> +                              0, 0);
>            defun_start = comment_end;
>            if (!adjusted)
>              {
> @@ -2310,11 +2321,15 @@ in_classes (int c, Lisp_Object iso_classes)
>     PREV_SYNTAX is the SYNTAX_WITH_FLAGS of the previous character
>     (or 0 If the search cannot start in the middle of a two-character).
> -   If successful, return true and store the charpos of the comment's end
> -   into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR.
> -   Else, return false and store the charpos STOP into *CHARPOS_PTR, the
> -   corresponding bytepos into *BYTEPOS_PTR and the current nesting
> -   (as defined for state.incomment) in *INCOMMENT_PTR.
> +   If successful, return true and store the charpos of the comment's
> +   end into *CHARPOS_PTR and the corresponding bytepos into
> +   *BYTEPOS_PTR. Else, return false and store the charpos STOP into
> +   *CHARPOS_PTR, the corresponding bytepos into *BYTEPOS_PTR and the
> +   current nesting (as defined for state->incomment) in
> +   *INCOMMENT_PTR. Should the last character scanned in an incomplete
> +   comment be a possible first character of a two character construct,
> +   we store its SYNTAX_WITH_FLAGS into *last_syntax_ptr. Otherwise,
> +   we store Smax into *last_syntax_ptr.
>     The comment end is the last character of the comment rather than the
>     character just after the comment.
> @@ -2326,7 +2341,7 @@ static bool
>  forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>                EMACS_INT nesting, int style, int prev_syntax,
>                ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr,
> -              EMACS_INT *incomment_ptr)
> +              EMACS_INT *incomment_ptr, int *last_syntax_ptr)
>  {
>    register int c, c1;
>    register enum syntaxcode code;
> @@ -2337,7 +2352,8 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>    /* Enter the loop in the middle so that we find
>       a 2-char comment ender if we start in the middle of it. */
>    syntax = prev_syntax;
> -  if (syntax != 0) goto forw_incomment;
> +  code = syntax & 0xff;
> +  if (syntax != 0 && from < stop) goto forw_incomment;
>    while (1)
>      {
> @@ -2346,6 +2362,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>            *incomment_ptr = nesting;
>            *charpos_ptr = from;
>            *bytepos_ptr = from_byte;
> +          *last_syntax_ptr =
> +            (code == Sescape || code == Scharquote
> +             || SYNTAX_FLAGS_COMEND_FIRST (syntax)
> +             || (nesting > 0
> +                 && SYNTAX_FLAGS_COMSTART_FIRST (syntax)))
> +            ? syntax : Smax ;
>            return 0;
>          }
>        c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
> @@ -2386,7 +2408,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>                 SYNTAX_FLAGS_COMMENT_NESTED (other_syntax))
>                ? nesting > 0 : nesting < 0))
>          {
> -          if (--nesting <= 0)
> +          syntax = Smax; /* So that "|#" (lisp) can not return
> +                            the syntax of "#" in *last_syntax_ptr. */
> +          if (--nesting <= 0)
>              /* We have encountered a comment end of the same style
>                 as the comment sequence which began this comment section. */
>              break;
> @@ -2408,6 +2432,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>          /* We have encountered a nested comment of the same style
>             as the comment sequence which began this comment section. */
>          {
> +          syntax = Smax; /* So that "#|#" isn't also a comment ender. */
>            INC_BOTH (from, from_byte);
>            UPDATE_SYNTAX_TABLE_FORWARD (from);
>            nesting++;
> @@ -2415,6 +2440,8 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>      }
>    *charpos_ptr = from;
>    *bytepos_ptr = from_byte;
> +  *last_syntax_ptr = Smax; /* Any syntactic power the last byte had is
> +                              used up. */
>    return 1;
>  }
> @@ -2436,6 +2463,7 @@ between them, return t; otherwise return nil. */)
>    EMACS_INT count1;
>    ptrdiff_t out_charpos, out_bytepos;
>    EMACS_INT dummy;
> +  int dummy2;
>    CHECK_NUMBER (count);
>    count1 = XINT (count);
> @@ -2499,7 +2527,7 @@ between them, return t; otherwise return nil. */)
>              }
>            /* We're at the start of a comment. */
>            found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
> -                                &out_charpos, &out_bytepos, &dummy);
> +                                &out_charpos, &out_bytepos, &dummy, &dummy2);
>            from = out_charpos; from_byte = out_bytepos;
>            if (!found)
>              {
> @@ -2659,6 +2687,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
>    ptrdiff_t from_byte;
>    ptrdiff_t out_bytepos, out_charpos;
>    EMACS_INT dummy;
> +  int dummy2;
>    bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol;
>    if (depth > 0) min_depth = 0;
> @@ -2755,7 +2784,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
>                UPDATE_SYNTAX_TABLE_FORWARD (from);
>                found = forw_comment (from, from_byte, stop,
>                                      comnested, comstyle, 0,
> -                                    &out_charpos, &out_bytepos, &dummy);
> +                                    &out_charpos, &out_bytepos, &dummy,
> +                                    &dummy2);
>                from = out_charpos, from_byte = out_bytepos;
>                if (!found)
>                  {
> @@ -3119,7 +3149,7 @@ the prefix syntax flag (p). */)
>  }
>
>  /* Parse forward from FROM / FROM_BYTE to END,
> -   assuming that FROM has state OLDSTATE (nil means FROM is start of function),
> +   assuming that FROM has state STATE,
>     and return a description of the state of the parse at END.
>     If STOPBEFORE, stop at the start of an atom.
>     If COMMENTSTOP is 1, stop at the start of a comment.
> @@ -3127,12 +3157,11 @@ the prefix syntax flag (p). */)
>     after the beginning of a string, or after the end of a string. */
>  static void
> -scan_sexps_forward (struct lisp_parse_state *stateptr,
> +scan_sexps_forward (struct lisp_parse_state *state,
>                      ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end,
>                      EMACS_INT targetdepth, bool stopbefore,
> -                    Lisp_Object oldstate, int commentstop)
> +                    int commentstop)
>  {
> -  struct lisp_parse_state state;
>    enum syntaxcode code;
>    int c1;
>    bool comnested;
> @@ -3148,7 +3177,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
>    Lisp_Object tem;
>    ptrdiff_t prev_from; /* Keep one character before FROM. */
>    ptrdiff_t prev_from_byte;
> -  int prev_from_syntax;
> +  int prev_from_syntax, prev_prev_from_syntax;
>    bool boundary_stop = commentstop == -1;
>    bool nofence;
>    bool found;
> @@ -3165,6 +3194,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
>  do { prev_from = from; \
>       prev_from_byte = from_byte; \
>       temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte); \
> +     prev_prev_from_syntax = prev_from_syntax; \
>       prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \
>       INC_BOTH (from, from_byte); \
>       if (from < end) \
> @@ -3174,88 +3204,38 @@ do { prev_from = from; \
>    immediate_quit = 1;
>    QUIT;
> -  if (NILP (oldstate))
> -    {
> -      depth = 0;
> -      state.instring = -1;
> -      state.incomment = 0;
> -      state.comstyle = 0; /* comment style a by default. */
> -      state.comstr_start = -1; /* no comment/string seen. */
> -    }
> -  else
> -    {
> -      tem = Fcar (oldstate);
> -      if (!NILP (tem))
> -        depth = XINT (tem);
> -      else
> -        depth = 0;
> -
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      /* Check whether we are inside string_fence-style string: */
> -      state.instring = (!NILP (tem)
> -                        ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> -                        : -1);
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.incomment = (!NILP (tem)
> -                         ? (INTEGERP (tem) ? XINT (tem) : -1)
> -                         : 0);
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      start_quoted = !NILP (tem);
> +  depth = state->depth;
> +  start_quoted = state->quoted;
> +  prev_prev_from_syntax = Smax;
> +  prev_from_syntax = state->prev_syntax;
> -      /* if the eighth element of the list is nil, we are in comment
> -         style a. If it is non-nil, we are in comment style b */
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.comstyle = (NILP (tem)
> -                        ? 0
> -                        : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> -                           ? XINT (tem)
> -                           : ST_COMMENT_STYLE));
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.comstr_start =
> -        RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      while (!NILP (tem)) /* >= second enclosing sexps. */
> -        {
> -          Lisp_Object temhd = Fcar (tem);
> -          if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> -            curlevel->last = XINT (temhd);
> -          if (++curlevel == endlevel)
> -            curlevel--; /* error ("Nesting too deep for parser"); */
> -          curlevel->prev = -1;
> -          curlevel->last = -1;
> -          tem = Fcdr (tem);
> -        }
> +  tem = state->levelstarts;
> +  while (!NILP (tem)) /* >= second enclosing sexps. */
> +    {
> +      Lisp_Object temhd = Fcar (tem);
> +      if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> +        curlevel->last = XINT (temhd);
> +      if (++curlevel == endlevel)
> +        curlevel--; /* error ("Nesting too deep for parser"); */
> +      curlevel->prev = -1;
> +      curlevel->last = -1;
> +      tem = Fcdr (tem);
>      }
> -  state.quoted = 0;
> -  mindepth = depth;
> -
>    curlevel->prev = -1;
>    curlevel->last = -1;
> -  SETUP_SYNTAX_TABLE (prev_from, 1);
> -  temp = FETCH_CHAR (prev_from_byte);
> -  prev_from_syntax = SYNTAX_WITH_FLAGS (temp);
> -  UPDATE_SYNTAX_TABLE_FORWARD (from);
> +  state->quoted = 0;
> +  mindepth = depth;
> +
> +  SETUP_SYNTAX_TABLE (from, 1);
>    /* Enter the loop at a place appropriate for initial state. */
> -  if (state.incomment)
> +  if (state->incomment)
>      goto startincomment;
> -  if (state.instring >= 0)
> +  if (state->instring >= 0)
>      {
> -      nofence = state.instring != ST_STRING_STYLE;
> +      nofence = state->instring != ST_STRING_STYLE;
>        if (start_quoted)
>          goto startquotedinstring;
>        goto startinstring;
> @@ -3266,11 +3246,8 @@ do { prev_from = from; \
>    while (from < end)
>      {
>        int syntax;
> -      INC_FROM;
> -      code = prev_from_syntax & 0xff;
> -      if (from < end
> -          && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
> +      if (SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
>            && (c1 = FETCH_CHAR (from_byte),
>                syntax = SYNTAX_WITH_FLAGS (c1),
>                SYNTAX_FLAGS_COMSTART_SECOND (syntax)))
> @@ -3280,32 +3257,39 @@ do { prev_from = from; \
>            /* Record the comment style we have entered so that only
>               the comment-end sequence of the same style actually
>               terminates the comment section. */
> -          state.comstyle
> +          state->comstyle
>              = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax);
>            comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax)
>                         | SYNTAX_FLAGS_COMMENT_NESTED (syntax));
> -          state.incomment = comnested ? 1 : -1;
> -          state.comstr_start = prev_from;
> +          state->incomment = comnested ? 1 : -1;
> +          state->comstr_start = prev_from;
>            INC_FROM;
> +          prev_from_syntax = Smax; /* the syntax has already been
> +                                      "used up". */
>            code = Scomment;
>          }
> -      else if (code == Scomment_fence)
> -        {
> -          /* Record the comment style we have entered so that only
> -             the comment-end sequence of the same style actually
> -             terminates the comment section. */
> -          state.comstyle = ST_COMMENT_STYLE;
> -          state.incomment = -1;
> -          state.comstr_start = prev_from;
> -          code = Scomment;
> -        }
> -      else if (code == Scomment)
> -        {
> -          state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> -          state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> -                             1 : -1);
> -          state.comstr_start = prev_from;
> -        }
> +      else
> +        {
> +          INC_FROM;
> +          code = prev_from_syntax & 0xff;
> +          if (code == Scomment_fence)
> +            {
> +              /* Record the comment style we have entered so that only
> +                 the comment-end sequence of the same style actually
> +                 terminates the comment section. */
> +              state->comstyle = ST_COMMENT_STYLE;
> +              state->incomment = -1;
> +              state->comstr_start = prev_from;
> +              code = Scomment;
> +            }
> +          else if (code == Scomment)
> +            {
> +              state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> +              state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> +                                  1 : -1);
> +              state->comstr_start = prev_from;
> +            }
> +        }
>        if (SYNTAX_FLAGS_PREFIX (prev_from_syntax))
>          continue;
> @@ -3350,26 +3334,28 @@ do { prev_from = from; \
>          case Scomment_fence: /* Can't happen because it's handled above. */
>          case Scomment:
> -          if (commentstop || boundary_stop) goto done;
> +          if (commentstop || boundary_stop) goto done;
>          startincomment:
>            /* The (from == BEGV) test was to enter the loop in the middle so
>               that we find a 2-char comment ender even if we start in the
>               middle of it. We don't want to do that if we're just at the
>               beginning of the comment (think of (*) ... (*)). */
>            found = forw_comment (from, from_byte, end,
> -                                state.incomment, state.comstyle,
> -                                (from == BEGV || from < state.comstr_start + 3)
> -                                ? 0 : prev_from_syntax,
> -                                &out_charpos, &out_bytepos, &state.incomment);
> +                                state->incomment, state->comstyle,
> +                                from == BEGV ? 0 : prev_from_syntax,
> +                                &out_charpos, &out_bytepos, &state->incomment,
> +                                &prev_from_syntax);
>            from = out_charpos; from_byte = out_bytepos;
> -          /* Beware! prev_from and friends are invalid now.
> -             Luckily, the `done' doesn't use them and the INC_FROM
> -             sets them to a sane value without looking at them. */
> +          /* Beware! prev_from and friends (except prev_from_syntax)
> +             are invalid now. Luckily, the `done' doesn't use them
> +             and the INC_FROM sets them to a sane value without
> +             looking at them. */
>            if (!found) goto done;
>            INC_FROM;
> -          state.incomment = 0;
> -          state.comstyle = 0; /* reset the comment style */
> -          if (boundary_stop) goto done;
> +          state->incomment = 0;
> +          state->comstyle = 0; /* reset the comment style */
> +          prev_from_syntax = Smax; /* For the comment closer */
> +          if (boundary_stop) goto done;
>            break;
>          case Sopen:
> @@ -3396,16 +3382,16 @@ do { prev_from = from; \
>          case Sstring:
>          case Sstring_fence:
> -          state.comstr_start = from - 1;
> +          state->comstr_start = from - 1;
>            if (stopbefore) goto stop; /* this arg means stop at sexp start */
>            curlevel->last = prev_from;
> -          state.instring = (code == Sstring
> +          state->instring = (code == Sstring
>                              ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte))
>                              : ST_STRING_STYLE);
>            if (boundary_stop) goto done;
>          startinstring:
>            {
> -            nofence = state.instring != ST_STRING_STYLE;
> +            nofence = state->instring != ST_STRING_STYLE;
>              while (1)
>                {
> @@ -3419,7 +3405,7 @@ do { prev_from = from; \
>                  /* Check C_CODE here so that if the char has
>                     a syntax-table property which says it is NOT
>                     a string character, it does not end the string. */
> -                if (nofence && c == state.instring && c_code == Sstring)
> +                if (nofence && c == state->instring && c_code == Sstring)
>                    break;
>                  switch (c_code)
> @@ -3442,7 +3428,7 @@ do { prev_from = from; \
>                }
>            }
>          string_end:
> -          state.instring = -1;
> +          state->instring = -1;
>            curlevel->prev = curlevel->last;
>            INC_FROM;
>            if (boundary_stop) goto done;
> @@ -3461,25 +3447,96 @@ do { prev_from = from; \
>   stop: /* Here if stopping before start of sexp. */
>    from = prev_from; /* We have just fetched the char that starts it; */
>    from_byte = prev_from_byte;
> +  prev_from_syntax = prev_prev_from_syntax;
>    goto done; /* but return the position before it. */
>   endquoted:
> -  state.quoted = 1;
> +  state->quoted = 1;
>   done:
> -  state.depth = depth;
> -  state.mindepth = mindepth;
> -  state.thislevelstart = curlevel->prev;
> -  state.prevlevelstart
> +  state->depth = depth;
> +  state->mindepth = mindepth;
> +  state->thislevelstart = curlevel->prev;
> +  state->prevlevelstart
>      = (curlevel == levelstart) ? -1 : (curlevel - 1)->last;
> -  state.location = from;
> -  state.location_byte = from_byte;
> -  state.levelstarts = Qnil;
> +  state->location = from;
> +  state->location_byte = from_byte;
> +  state->levelstarts = Qnil;
>    while (curlevel > levelstart)
> -    state.levelstarts = Fcons (make_number ((--curlevel)->last),
> -                               state.levelstarts);
> +    state->levelstarts = Fcons (make_number ((--curlevel)->last),
> +                                state->levelstarts);
> +  state->prev_syntax = (SYNTAX_FLAGS_COMSTARTEND_FIRST (prev_from_syntax)
> +                        || state->quoted) ? prev_from_syntax : Smax;
>    immediate_quit = 0;
> +}
> +
> +/* Convert a (lisp) parse state to the internal form used in
> +   scan_sexps_forward. */
> +static void
> +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state)
> +{
> +  Lisp_Object tem;
> +
> +  if (NILP (external))
> +    {
> +      state->depth = 0;
> +      state->instring = -1;
> +      state->incomment = 0;
> +      state->quoted = 0;
> +      state->comstyle = 0; /* comment style a by default. */
> +      state->comstr_start = -1; /* no comment/string seen. */
> +      state->levelstarts = Qnil;
> +      state->prev_syntax = Smax;
> +    }
> +  else
> +    {
> +      tem = Fcar (external);
> +      if (!NILP (tem))
> +        state->depth = XINT (tem);
> +      else
> +        state->depth = 0;
> +
> +      external = Fcdr (external);
> +      external = Fcdr (external);
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      /* Check whether we are inside string_fence-style string: */
> +      state->instring = (!NILP (tem)
> +                         ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> +                         : -1);
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->incomment = (!NILP (tem)
> +                          ? (INTEGERP (tem) ? XINT (tem) : -1)
> +                          : 0);
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->quoted = !NILP (tem);
> -  *stateptr = state;
> +      /* if the eighth element of the list is nil, we are in comment
> +         style a. If it is non-nil, we are in comment style b */
> +      external = Fcdr (external);
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->comstyle = (NILP (tem)
> +                         ? 0
> +                         : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> +                            ? XINT (tem)
> +                            : ST_COMMENT_STYLE));
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->comstr_start =
> +        RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->levelstarts = tem;
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->prev_syntax = NILP (tem) ? Smax : XINT (tem);
> +    }
>  }
>  DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0,
> @@ -3488,6 +3545,7 @@ Parsing stops at TO or when certain criteria are met;
>   point is set to where parsing stops.
>  If fifth arg OLDSTATE is omitted or nil,
>   parsing assumes that FROM is the beginning of a function.
> +
>  Value is a list of elements describing final state of parsing:
>   0. depth in parens.
>   1. character address of start of innermost containing list; nil if none.
> @@ -3501,16 +3559,22 @@ Value is a list of elements describing final state of parsing:
>   6. the minimum paren-depth encountered during this scan.
>   7. style of comment, if any.
>   8. character address of start of comment or string; nil if not in one.
> - 9. Intermediate data for continuation of parsing (subject to change).
> + 9. List of positions of currently open parens, outermost first.
> +10. When the last position scanned holds the first character of a
> +    (potential) two character construct, the syntax of that position,
> +    otherwise nil. That construct can be a two character comment
> +    delimiter or an Escaped or Char-quoted character.
> +11..... Possible further internal information used by `parse-partial-sexp'.
> +
>  If third arg TARGETDEPTH is non-nil, parsing stops if the depth
>  in parentheses becomes equal to TARGETDEPTH.
> -Fourth arg STOPBEFORE non-nil means stop when come to
> +Fourth arg STOPBEFORE non-nil means stop when we come to
>   any character that starts a sexp.
>  Fifth arg OLDSTATE is a list like what this function returns.
>   It is used to initialize the state of the parse. Elements number 1, 2, 6
>   are ignored.
> -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
> - If it is symbol `syntax-table', stop after the start of a comment or a
> +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment.
> + If it is the symbol `syntax-table', stop after the start of a comment or a
>   string, or after end of a comment or a string. */)
>    (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth,
>     Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop)
> @@ -3527,15 +3591,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
>      target = TYPE_MINIMUM (EMACS_INT); /* We won't reach this depth. */
>    validate_region (&from, &to);
> +  internalize_parse_state (oldstate, &state);
>    scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)),
>                        XINT (to),
> -                      target, !NILP (stopbefore), oldstate,
> +                      target, !NILP (stopbefore),
>                        (NILP (commentstop)
>                         ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1)));
>    SET_PT_BOTH (state.location, state.location_byte);
> -  return Fcons (make_number (state.depth),
> +  return
> +    Fcons (make_number (state.depth),
>             Fcons (state.prevlevelstart < 0
>                    ? Qnil : make_number (state.prevlevelstart),
>               Fcons (state.thislevelstart < 0
> @@ -3553,11 +3619,15 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
>                                   ? Qsyntax_table
>                                   : make_number (state.comstyle))
>                                  : Qnil),
> -                         Fcons (((state.incomment
> -                                  || (state.instring >= 0))
> -                                 ? make_number (state.comstr_start)
> -                                 : Qnil),
> -                                Fcons (state.levelstarts, Qnil))))))))));
> +                         Fcons (((state.incomment
> +                                  || (state.instring >= 0))
> +                                 ? make_number (state.comstr_start)
> +                                 : Qnil),
> +                           Fcons (state.levelstarts,
> +                             Fcons (state.prev_syntax == Smax
> +                                    ? Qnil
> +                                    : make_number (state.prev_syntax),
> +                                    Qnil)))))))))));
>  }
>
>  void
> _______________________________________________
> Emacs-diffs mailing list
> Emacs-diffs@gnu.org
> https://lists.gnu.org/mailman/listinfo/emacs-diffs
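
For anyone reading the thread without syntax.c at hand, here is a rough, stand-alone sketch of the idea behind the new element 10. It is not the Emacs code. It assumes a toy language whose only comments are slash-star ... star-slash, and every name in it (toy_state, toy_scan, toy_split_matches, the pending field) is made up for illustration. The point is only that remembering "the last char scanned might be the first half of a two-char delimiter" lets a scan stop between the two characters and resume without losing the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy parser state (hypothetical names, not the Emacs struct).  */
struct toy_state
{
  bool in_comment;      /* Inside a slash-star comment.  */
  char pending;         /* '/' or '*' when the last char scanned might be
                           the first half of a delimiter, else 0.  */
  int comments_closed;  /* Number of complete comments seen so far.  */
};

/* Scan TEXT from FROM (inclusive) to TO (exclusive), updating *ST.  */
static void
toy_scan (struct toy_state *st, const char *text, size_t from, size_t to)
{
  assert (from <= to && to <= strlen (text));
  for (size_t i = from; i < to; i++)
    {
      char c = text[i];
      if (!st->in_comment && st->pending == '/' && c == '*')
        {
          st->in_comment = true;
          st->pending = 0;      /* The opener is used up.  */
        }
      else if (st->in_comment && st->pending == '*' && c == '/')
        {
          st->in_comment = false;
          st->comments_closed++;
          st->pending = 0;      /* The closer is used up.  */
        }
      else if (st->in_comment)
        st->pending = (c == '*') ? c : 0;
      else
        st->pending = (c == '/') ? c : 0;
    }
}

/* Return true if scanning TEXT in two pieces split at K ends in the
   same state as scanning it in one go.  */
static bool
toy_split_matches (const char *text, size_t k)
{
  size_t len = strlen (text);
  struct toy_state whole = { false, 0, 0 };
  struct toy_state split = { false, 0, 0 };

  toy_scan (&whole, text, 0, len);
  toy_scan (&split, text, 0, k);
  toy_scan (&split, text, k, len);
  return (whole.in_comment == split.in_comment
          && whole.pending == split.pending
          && whole.comments_closed == split.comments_closed);
}
```

Resetting pending to 0 once a delimiter has been consumed plays the same role as the "used up" assignments of Smax in the patch: in this toy, the three characters slash, star, slash open a comment but do not close it.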