* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
@ 2016-03-20 13:47 ` Stefan Monnier
2016-03-20 14:22 ` Alan Mackenzie
2016-03-20 14:40 ` Alan Mackenzie
0 siblings, 2 replies; 4+ messages in thread
From: Stefan Monnier @ 2016-03-20 13:47 UTC
To: emacs-devel; +Cc: Alan Mackenzie
What was John's opinion on reusing nth 5?
Stefan
>>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:
> branch: master
> commit 9dcf5998935c8aaa846d7585b81f0dcfe1935b3d
> Author: Alan Mackenzie <acm@muc.de>
> Commit: Alan Mackenzie <acm@muc.de>
> Amend parse-partial-sexp correctly to handle two character comment delimiters
> Do this by adding a new field to the parser state: the syntax of the last
> character scanned, should that be the first char of a (potential) two char
> construct, nil otherwise.
> This should make the parser state complete.
> Also document element 9 of the parser state. Also refactor the code a bit.
> * src/syntax.c (struct lisp_parse_state): Add a new field.
> (SYNTAX_FLAGS_COMSTARTEND_FIRST): New function.
> (internalize_parse_state): New function, extracted from scan_sexps_forward.
> (back_comment): Call internalize_parse_state.
> (forw_comment): Return the syntax of the last character scanned to the caller
> when that character might be the first of a two character construct.
> (Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
> (scan_sexps_forward): Remove a redundant state parameter. Access all `state'
> information via the address parameter `state'. Remove the code which converts
> from external to internal form of `state'. Access buffer contents only from
> `from' onwards. Reformulate code at the top of the main loop correctly to
> recognize comment openers when starting in the middle of one. Call
> forw_comment with extra argument (for return of syntax value of possible first
> char of a two char construct).
> (Fparse_partial_sexp): Document elements 9, 10 of the parser state in the
> doc string. Clarify the doc string in general. Call
> internalize_parse_state. Take account of the new elements when consing up the
> output parser state.
> * doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
> element 10. Minor wording corrections (remove reference to "trivial
> cases").
> (Low Level Parsing): Minor corrections.
> * etc/NEWS: Note new element 10, and documentation of element 9 of parser
> state.
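A note on the flag arithmetic in the ChangeLog above: the new `SYNTAX_FLAGS_COMSTARTEND_FIRST` tests the mask `0x50000`, which is bit 16 | bit 18 of a SYNTAX_WITH_FLAGS value. Assuming the bit layout `src/syntax.c` uses (comment-start-first at bit 16, comment-start-second at 17, comment-end-first at 18, comment-end-second at 19), the mask means "comment-start-first or comment-end-first". The standalone sketch below re-declares those accessors under hypothetical lowercase names (the real ones are `static` in `syntax.c`, so this is a copy for illustration, not an Emacs API). It also mirrors the rule `forw_comment` uses for `*last_syntax_ptr` when a comment is left unfinished. Class numbers follow the Elisp manual's syntax-code table; `Smax = 16` is an assumption about `syntax.h`.

```c
#include <assert.h>
#include <stdbool.h>

/* Syntax classes numbered as codes 0..15 in the Elisp manual's
   "Syntax Table Internals" node.  Smax = 16 is assumed to be the
   sentinel that follows them in src/syntax.h.  */
enum syntaxcode {
  Swhitespace, Spunct, Sword, Ssymbol, Sopen, Sclose, Squote, Sstring,
  Smath, Sescape, Scharquote, Scomment, Sendcomment, Sinherit,
  Scomment_fence, Sstring_fence, Smax
};

/* Hypothetical stand-alone copies of the static flag accessors.  */
static bool comstart_first (int flags) { return (flags >> 16) & 1; }
static bool comend_first (int flags) { return (flags >> 18) & 1; }

/* The predicate added by the patch: 0x50000 == (1 << 16) | (1 << 18).  */
static bool comstartend_first (int flags) { return (flags & 0x50000) != 0; }

/* True if the mask agrees with the two single-bit tests for every
   combination of the four comment-delimiter bits 16..19.  */
static bool
mask_matches_single_bit_tests (void)
{
  for (int bits = 0; bits < 16; bits++)
    {
      int flags = bits << 16;
      if (comstartend_first (flags)
          != (comstart_first (flags) || comend_first (flags)))
        return false;
    }
  return true;
}

/* Mirror of what forw_comment stores in *last_syntax_ptr when it runs
   out of text inside a comment; NESTING is as in state->incomment.  */
static int
last_syntax_for_unfinished_comment (int syntax, long nesting)
{
  int code = syntax & 0xff;
  assert (code < Smax);          /* low byte always holds a real class */
  return (code == Sescape || code == Scharquote
          || comend_first (syntax)
          || (nesting > 0 && comstart_first (syntax)))
         ? syntax : Smax;
}
```

With C comment syntax, `*` is punctuation with flags 2 and 3, so an unfinished comment ending in `*` hands its syntax back (it may start `*/`), while `/` (flags 1 and 4) only counts inside a nested comment, where it may start a nested `/*`.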
> ---
> doc/lispref/syntax.texi | 33 +++--
> etc/NEWS | 12 ++
> src/syntax.c | 372 ++++++++++++++++++++++++++++-------------------
> 3 files changed, 252 insertions(+), 165 deletions(-)
> diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
> index d5a7eba..f81c164 100644
> --- a/doc/lispref/syntax.texi
> +++ b/doc/lispref/syntax.texi
> @@ -791,10 +791,10 @@ Hooks}).
> @subsection Parser State
> @cindex parser state
> - A @dfn{parser state} is a list of ten elements describing the state
> -of the syntactic parser, after it parses the text between a specified
> -starting point and a specified end point in the buffer. Parsing
> -functions such as @code{syntax-ppss}
> + A @dfn{parser state} is a list of (currently) eleven elements
> +describing the state of the syntactic parser, after it parses the text
> +between a specified starting point and a specified end point in the
> +buffer. Parsing functions such as @code{syntax-ppss}
> @ifnottex
> (@pxref{Position Parse})
> @end ifnottex
> @@ -851,15 +851,20 @@ position where the string began. When outside of strings and comments,
> this element is @code{nil}.
> @item
> -Internal data for continuing the parsing. The meaning of this
> -data is subject to change; it is used if you pass this list
> -as the @var{state} argument to another call.
> +The list of the positions of the currently open parentheses, starting
> +with the outermost.
> +
> +@item
> +When the last buffer position scanned was the (potential) first
> +character of a two character construct (comment delimiter or
> +escaped/char-quoted character pair), the @var{syntax-code}
> +(@pxref{Syntax Table Internals}) of that position. Otherwise
> +@code{nil}.
> @end enumerate
> Elements 1, 2, and 6 are ignored in a state which you pass as an
> -argument to continue parsing, and elements 8 and 9 are used only in
> -trivial cases. Those elements are mainly used internally by the
> -parser code.
> +argument to continue parsing. Elements 9 and 10 are mainly used
> +internally by the parser code.
> One additional piece of useful information is available from a
> parser state using this function:
> @@ -898,11 +903,11 @@ The depth starts at 0, or at whatever is given in @var{state}.
> If the fourth argument @var{stop-before} is non-@code{nil}, parsing
> stops when it comes to any character that starts a sexp. If
> -@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
> -start of an unnested comment. If @var{stop-comment} is the symbol
> +@var{stop-comment} is non-@code{nil}, parsing stops after the start of
> +an unnested comment. If @var{stop-comment} is the symbol
> @code{syntax-table}, parsing stops after the start of an unnested
> -comment or a string, or the end of an unnested comment or a string,
> -whichever comes first.
> +comment or a string, or after the end of an unnested comment or a
> +string, whichever comes first.
> If @var{state} is @code{nil}, @var{start} is assumed to be at the top
> level of parenthesis structure, such as the beginning of a function
> diff --git a/etc/NEWS b/etc/NEWS
> index d963dee..ea32153 100644
> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -175,6 +175,18 @@ a new window when opening man pages when there's already one, use
> (inhibit-same-window . nil)
> (mode . Man-mode))))
> ++++
> +** `parse-partial-sexp' state has a new element. Element 10 is
> +non-nil when the last character scanned might be the first character
> +of a two character construct, i.e. a comment delimiter or escaped
> +character. Its value is the syntax of that last character.
> +
> ++++
> +** `parse-partial-sexp''s state, element 9, has now been confirmed as
> +permanent and documented, and may be used by Lisp programs. Its value
> +is a list of currently open parenthesis positions, starting with the
> +outermost parenthesis.
> +
> \f
> * Changes in Emacs 25.2 on Non-Free Operating Systems
> diff --git a/src/syntax.c b/src/syntax.c
> index fdcfdfc..ffe0ea5 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -81,6 +81,11 @@ SYNTAX_FLAGS_COMEND_SECOND (int flags)
> return (flags >> 19) & 1;
> }
> static bool
> +SYNTAX_FLAGS_COMSTARTEND_FIRST (int flags)
> +{
> + return (flags & 0x50000) != 0;
> +}
> +static bool
> SYNTAX_FLAGS_PREFIX (int flags)
> {
> return (flags >> 20) & 1;
> @@ -153,6 +158,10 @@ struct lisp_parse_state
> ptrdiff_t comstr_start; /* Position of last comment/string starter. */
> Lisp_Object levelstarts; /* Char numbers of starts-of-expression
> of levels (starting from outermost). */
> + int prev_syntax; /* Syntax of previous position scanned, when
> + that position (potentially) holds the first char
> + of a 2-char construct, i.e. comment delimiter
> + or Sescape, etc. Smax otherwise. */
> };
> \f
> /* These variables are a cache for finding the start of a defun.
> @@ -176,7 +185,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object);
> static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool);
> static void scan_sexps_forward (struct lisp_parse_state *,
> ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT,
> - bool, Lisp_Object, int);
> + bool, int);
> +static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *);
> static bool in_classes (int, Lisp_Object);
> static void parse_sexp_propertize (ptrdiff_t charpos);
> @@ -911,10 +921,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> }
> do
> {
> + internalize_parse_state (Qnil, &state);
> scan_sexps_forward (&state,
> defun_start, defun_start_byte,
> comment_end, TYPE_MINIMUM (EMACS_INT),
> - 0, Qnil, 0);
> + 0, 0);
> defun_start = comment_end;
> if (!adjusted)
> {
> @@ -2310,11 +2321,15 @@ in_classes (int c, Lisp_Object iso_classes)
> PREV_SYNTAX is the SYNTAX_WITH_FLAGS of the previous character
> (or 0 If the search cannot start in the middle of a two-character).
> - If successful, return true and store the charpos of the comment's end
> - into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR.
> - Else, return false and store the charpos STOP into *CHARPOS_PTR, the
> - corresponding bytepos into *BYTEPOS_PTR and the current nesting
> - (as defined for state.incomment) in *INCOMMENT_PTR.
> + If successful, return true and store the charpos of the comment's
> + end into *CHARPOS_PTR and the corresponding bytepos into
> + *BYTEPOS_PTR. Else, return false and store the charpos STOP into
> + *CHARPOS_PTR, the corresponding bytepos into *BYTEPOS_PTR and the
> + current nesting (as defined for state->incomment) in
> + *INCOMMENT_PTR. Should the last character scanned in an incomplete
> + comment be a possible first character of a two character construct,
> + we store its SYNTAX_WITH_FLAGS into *last_syntax_ptr. Otherwise,
> + we store Smax into *last_syntax_ptr.
> The comment end is the last character of the comment rather than the
> character just after the comment.
> @@ -2326,7 +2341,7 @@ static bool
> forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> EMACS_INT nesting, int style, int prev_syntax,
> ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr,
> - EMACS_INT *incomment_ptr)
> + EMACS_INT *incomment_ptr, int *last_syntax_ptr)
> {
> register int c, c1;
> register enum syntaxcode code;
> @@ -2337,7 +2352,8 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> /* Enter the loop in the middle so that we find
> a 2-char comment ender if we start in the middle of it. */
> syntax = prev_syntax;
> - if (syntax != 0) goto forw_incomment;
> + code = syntax & 0xff;
> + if (syntax != 0 && from < stop) goto forw_incomment;
> while (1)
> {
> @@ -2346,6 +2362,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> *incomment_ptr = nesting;
> *charpos_ptr = from;
> *bytepos_ptr = from_byte;
> + *last_syntax_ptr =
> + (code == Sescape || code == Scharquote
> + || SYNTAX_FLAGS_COMEND_FIRST (syntax)
> + || (nesting > 0
> + && SYNTAX_FLAGS_COMSTART_FIRST (syntax)))
> + ? syntax : Smax ;
> return 0;
> }
> c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
> @@ -2386,7 +2408,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> SYNTAX_FLAGS_COMMENT_NESTED (other_syntax))
> ? nesting > 0 : nesting < 0))
> {
> - if (--nesting <= 0)
> + syntax = Smax; /* So that "|#" (lisp) can not return
> + the syntax of "#" in *last_syntax_ptr. */
> + if (--nesting <= 0)
> /* We have encountered a comment end of the same style
> as the comment sequence which began this comment section. */
> break;
> @@ -2408,6 +2432,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> /* We have encountered a nested comment of the same style
> as the comment sequence which began this comment section. */
> {
> + syntax = Smax; /* So that "#|#" isn't also a comment ender. */
> INC_BOTH (from, from_byte);
> UPDATE_SYNTAX_TABLE_FORWARD (from);
> nesting++;
> @@ -2415,6 +2440,8 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> }
> *charpos_ptr = from;
> *bytepos_ptr = from_byte;
> + *last_syntax_ptr = Smax; /* Any syntactic power the last byte had is
> + used up. */
> return 1;
> }
> @@ -2436,6 +2463,7 @@ between them, return t; otherwise return nil. */)
> EMACS_INT count1;
> ptrdiff_t out_charpos, out_bytepos;
> EMACS_INT dummy;
> + int dummy2;
> CHECK_NUMBER (count);
> count1 = XINT (count);
> @@ -2499,7 +2527,7 @@ between them, return t; otherwise return nil. */)
> }
> /* We're at the start of a comment. */
> found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
> - &out_charpos, &out_bytepos, &dummy);
> + &out_charpos, &out_bytepos, &dummy, &dummy2);
> from = out_charpos; from_byte = out_bytepos;
> if (!found)
> {
> @@ -2659,6 +2687,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
> ptrdiff_t from_byte;
> ptrdiff_t out_bytepos, out_charpos;
> EMACS_INT dummy;
> + int dummy2;
> bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol;
> if (depth > 0) min_depth = 0;
> @@ -2755,7 +2784,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
> UPDATE_SYNTAX_TABLE_FORWARD (from);
> found = forw_comment (from, from_byte, stop,
> comnested, comstyle, 0,
> - &out_charpos, &out_bytepos, &dummy);
> + &out_charpos, &out_bytepos, &dummy,
> + &dummy2);
> from = out_charpos, from_byte = out_bytepos;
> if (!found)
> {
> @@ -3119,7 +3149,7 @@ the prefix syntax flag (p). */)
> }
> \f
> /* Parse forward from FROM / FROM_BYTE to END,
> - assuming that FROM has state OLDSTATE (nil means FROM is start of function),
> + assuming that FROM has state STATE,
> and return a description of the state of the parse at END.
> If STOPBEFORE, stop at the start of an atom.
> If COMMENTSTOP is 1, stop at the start of a comment.
> @@ -3127,12 +3157,11 @@ the prefix syntax flag (p). */)
> after the beginning of a string, or after the end of a string. */
> static void
> -scan_sexps_forward (struct lisp_parse_state *stateptr,
> +scan_sexps_forward (struct lisp_parse_state *state,
> ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end,
> EMACS_INT targetdepth, bool stopbefore,
> - Lisp_Object oldstate, int commentstop)
> + int commentstop)
> {
> - struct lisp_parse_state state;
> enum syntaxcode code;
> int c1;
> bool comnested;
> @@ -3148,7 +3177,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
> Lisp_Object tem;
> ptrdiff_t prev_from; /* Keep one character before FROM. */
> ptrdiff_t prev_from_byte;
> - int prev_from_syntax;
> + int prev_from_syntax, prev_prev_from_syntax;
> bool boundary_stop = commentstop == -1;
> bool nofence;
> bool found;
> @@ -3165,6 +3194,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
> do { prev_from = from; \
> prev_from_byte = from_byte; \
> temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte); \
> + prev_prev_from_syntax = prev_from_syntax; \
> prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \
> INC_BOTH (from, from_byte); \
> if (from < end) \
> @@ -3174,88 +3204,38 @@ do { prev_from = from; \
> immediate_quit = 1;
> QUIT;
> - if (NILP (oldstate))
> - {
> - depth = 0;
> - state.instring = -1;
> - state.incomment = 0;
> - state.comstyle = 0; /* comment style a by default. */
> - state.comstr_start = -1; /* no comment/string seen. */
> - }
> - else
> - {
> - tem = Fcar (oldstate);
> - if (!NILP (tem))
> - depth = XINT (tem);
> - else
> - depth = 0;
> -
> - oldstate = Fcdr (oldstate);
> - oldstate = Fcdr (oldstate);
> - oldstate = Fcdr (oldstate);
> - tem = Fcar (oldstate);
> - /* Check whether we are inside string_fence-style string: */
> - state.instring = (!NILP (tem)
> - ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> - : -1);
> -
> - oldstate = Fcdr (oldstate);
> - tem = Fcar (oldstate);
> - state.incomment = (!NILP (tem)
> - ? (INTEGERP (tem) ? XINT (tem) : -1)
> - : 0);
> -
> - oldstate = Fcdr (oldstate);
> - tem = Fcar (oldstate);
> - start_quoted = !NILP (tem);
> + depth = state->depth;
> + start_quoted = state->quoted;
> + prev_prev_from_syntax = Smax;
> + prev_from_syntax = state->prev_syntax;
> - /* if the eighth element of the list is nil, we are in comment
> - style a. If it is non-nil, we are in comment style b */
> - oldstate = Fcdr (oldstate);
> - oldstate = Fcdr (oldstate);
> - tem = Fcar (oldstate);
> - state.comstyle = (NILP (tem)
> - ? 0
> - : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> - ? XINT (tem)
> - : ST_COMMENT_STYLE));
> -
> - oldstate = Fcdr (oldstate);
> - tem = Fcar (oldstate);
> - state.comstr_start =
> - RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> - oldstate = Fcdr (oldstate);
> - tem = Fcar (oldstate);
> - while (!NILP (tem)) /* >= second enclosing sexps. */
> - {
> - Lisp_Object temhd = Fcar (tem);
> - if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> - curlevel->last = XINT (temhd);
> - if (++curlevel == endlevel)
> - curlevel--; /* error ("Nesting too deep for parser"); */
> - curlevel->prev = -1;
> - curlevel->last = -1;
> - tem = Fcdr (tem);
> - }
> + tem = state->levelstarts;
> + while (!NILP (tem)) /* >= second enclosing sexps. */
> + {
> + Lisp_Object temhd = Fcar (tem);
> + if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> + curlevel->last = XINT (temhd);
> + if (++curlevel == endlevel)
> + curlevel--; /* error ("Nesting too deep for parser"); */
> + curlevel->prev = -1;
> + curlevel->last = -1;
> + tem = Fcdr (tem);
> }
> - state.quoted = 0;
> - mindepth = depth;
> -
> curlevel->prev = -1;
> curlevel->last = -1;
> - SETUP_SYNTAX_TABLE (prev_from, 1);
> - temp = FETCH_CHAR (prev_from_byte);
> - prev_from_syntax = SYNTAX_WITH_FLAGS (temp);
> - UPDATE_SYNTAX_TABLE_FORWARD (from);
> + state->quoted = 0;
> + mindepth = depth;
> +
> + SETUP_SYNTAX_TABLE (from, 1);
> /* Enter the loop at a place appropriate for initial state. */
> - if (state.incomment)
> + if (state->incomment)
> goto startincomment;
> - if (state.instring >= 0)
> + if (state->instring >= 0)
> {
> - nofence = state.instring != ST_STRING_STYLE;
> + nofence = state->instring != ST_STRING_STYLE;
> if (start_quoted)
> goto startquotedinstring;
> goto startinstring;
> @@ -3266,11 +3246,8 @@ do { prev_from = from; \
> while (from < end)
> {
> int syntax;
> - INC_FROM;
> - code = prev_from_syntax & 0xff;
> - if (from < end
> - && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
> + if (SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
> && (c1 = FETCH_CHAR (from_byte),
> syntax = SYNTAX_WITH_FLAGS (c1),
> SYNTAX_FLAGS_COMSTART_SECOND (syntax)))
> @@ -3280,32 +3257,39 @@ do { prev_from = from; \
> /* Record the comment style we have entered so that only
> the comment-end sequence of the same style actually
> terminates the comment section. */
> - state.comstyle
> + state->comstyle
> = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax);
> comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax)
> | SYNTAX_FLAGS_COMMENT_NESTED (syntax));
> - state.incomment = comnested ? 1 : -1;
> - state.comstr_start = prev_from;
> + state->incomment = comnested ? 1 : -1;
> + state->comstr_start = prev_from;
> INC_FROM;
> + prev_from_syntax = Smax; /* the syntax has already been
> + "used up". */
> code = Scomment;
> }
> - else if (code == Scomment_fence)
> - {
> - /* Record the comment style we have entered so that only
> - the comment-end sequence of the same style actually
> - terminates the comment section. */
> - state.comstyle = ST_COMMENT_STYLE;
> - state.incomment = -1;
> - state.comstr_start = prev_from;
> - code = Scomment;
> - }
> - else if (code == Scomment)
> - {
> - state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> - state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> - 1 : -1);
> - state.comstr_start = prev_from;
> - }
> + else
> + {
> + INC_FROM;
> + code = prev_from_syntax & 0xff;
> + if (code == Scomment_fence)
> + {
> + /* Record the comment style we have entered so that only
> + the comment-end sequence of the same style actually
> + terminates the comment section. */
> + state->comstyle = ST_COMMENT_STYLE;
> + state->incomment = -1;
> + state->comstr_start = prev_from;
> + code = Scomment;
> + }
> + else if (code == Scomment)
> + {
> + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> + state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> + 1 : -1);
> + state->comstr_start = prev_from;
> + }
> + }
> if (SYNTAX_FLAGS_PREFIX (prev_from_syntax))
> continue;
> @@ -3350,26 +3334,28 @@ do { prev_from = from; \
> case Scomment_fence: /* Can't happen because it's handled above. */
> case Scomment:
> - if (commentstop || boundary_stop) goto done;
> + if (commentstop || boundary_stop) goto done;
> startincomment:
> /* The (from == BEGV) test was to enter the loop in the middle so
> that we find a 2-char comment ender even if we start in the
> middle of it. We don't want to do that if we're just at the
> beginning of the comment (think of (*) ... (*)). */
> found = forw_comment (from, from_byte, end,
> - state.incomment, state.comstyle,
> - (from == BEGV || from < state.comstr_start + 3)
> - ? 0 : prev_from_syntax,
> - &out_charpos, &out_bytepos, &state.incomment);
> + state->incomment, state->comstyle,
> + from == BEGV ? 0 : prev_from_syntax,
> + &out_charpos, &out_bytepos, &state->incomment,
> + &prev_from_syntax);
> from = out_charpos; from_byte = out_bytepos;
> - /* Beware! prev_from and friends are invalid now.
> - Luckily, the `done' doesn't use them and the INC_FROM
> - sets them to a sane value without looking at them. */
> + /* Beware! prev_from and friends (except prev_from_syntax)
> + are invalid now. Luckily, the `done' doesn't use them
> + and the INC_FROM sets them to a sane value without
> + looking at them. */
> if (!found) goto done;
> INC_FROM;
> - state.incomment = 0;
> - state.comstyle = 0; /* reset the comment style */
> - if (boundary_stop) goto done;
> + state->incomment = 0;
> + state->comstyle = 0; /* reset the comment style */
> + prev_from_syntax = Smax; /* For the comment closer */
> + if (boundary_stop) goto done;
> break;
> case Sopen:
> @@ -3396,16 +3382,16 @@ do { prev_from = from; \
> case Sstring:
> case Sstring_fence:
> - state.comstr_start = from - 1;
> + state->comstr_start = from - 1;
> if (stopbefore) goto stop; /* this arg means stop at sexp start */
> curlevel->last = prev_from;
> - state.instring = (code == Sstring
> + state->instring = (code == Sstring
> ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte))
> : ST_STRING_STYLE);
> if (boundary_stop) goto done;
> startinstring:
> {
> - nofence = state.instring != ST_STRING_STYLE;
> + nofence = state->instring != ST_STRING_STYLE;
> while (1)
> {
> @@ -3419,7 +3405,7 @@ do { prev_from = from; \
> /* Check C_CODE here so that if the char has
> a syntax-table property which says it is NOT
> a string character, it does not end the string. */
> - if (nofence && c == state.instring && c_code == Sstring)
> + if (nofence && c == state->instring && c_code == Sstring)
> break;
> switch (c_code)
> @@ -3442,7 +3428,7 @@ do { prev_from = from; \
> }
> }
> string_end:
> - state.instring = -1;
> + state->instring = -1;
> curlevel->prev = curlevel->last;
> INC_FROM;
> if (boundary_stop) goto done;
> @@ -3461,25 +3447,96 @@ do { prev_from = from; \
> stop: /* Here if stopping before start of sexp. */
> from = prev_from; /* We have just fetched the char that starts it; */
> from_byte = prev_from_byte;
> + prev_from_syntax = prev_prev_from_syntax;
> goto done; /* but return the position before it. */
> endquoted:
> - state.quoted = 1;
> + state->quoted = 1;
> done:
> - state.depth = depth;
> - state.mindepth = mindepth;
> - state.thislevelstart = curlevel->prev;
> - state.prevlevelstart
> + state->depth = depth;
> + state->mindepth = mindepth;
> + state->thislevelstart = curlevel->prev;
> + state->prevlevelstart
> = (curlevel == levelstart) ? -1 : (curlevel - 1)->last;
> - state.location = from;
> - state.location_byte = from_byte;
> - state.levelstarts = Qnil;
> + state->location = from;
> + state->location_byte = from_byte;
> + state->levelstarts = Qnil;
> while (curlevel > levelstart)
> - state.levelstarts = Fcons (make_number ((--curlevel)->last),
> - state.levelstarts);
> + state->levelstarts = Fcons (make_number ((--curlevel)->last),
> + state->levelstarts);
> + state->prev_syntax = (SYNTAX_FLAGS_COMSTARTEND_FIRST (prev_from_syntax)
> + || state->quoted) ? prev_from_syntax : Smax;
> immediate_quit = 0;
> +}
> +
> +/* Convert a (lisp) parse state to the internal form used in
> + scan_sexps_forward. */
> +static void
> +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state)
> +{
> + Lisp_Object tem;
> +
> + if (NILP (external))
> + {
> + state->depth = 0;
> + state->instring = -1;
> + state->incomment = 0;
> + state->quoted = 0;
> + state->comstyle = 0; /* comment style a by default. */
> + state->comstr_start = -1; /* no comment/string seen. */
> + state->levelstarts = Qnil;
> + state->prev_syntax = Smax;
> + }
> + else
> + {
> + tem = Fcar (external);
> + if (!NILP (tem))
> + state->depth = XINT (tem);
> + else
> + state->depth = 0;
> +
> + external = Fcdr (external);
> + external = Fcdr (external);
> + external = Fcdr (external);
> + tem = Fcar (external);
> + /* Check whether we are inside string_fence-style string: */
> + state->instring = (!NILP (tem)
> + ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> + : -1);
> +
> + external = Fcdr (external);
> + tem = Fcar (external);
> + state->incomment = (!NILP (tem)
> + ? (INTEGERP (tem) ? XINT (tem) : -1)
> + : 0);
> +
> + external = Fcdr (external);
> + tem = Fcar (external);
> + state->quoted = !NILP (tem);
> - *stateptr = state;
> + /* if the eighth element of the list is nil, we are in comment
> + style a. If it is non-nil, we are in comment style b */
> + external = Fcdr (external);
> + external = Fcdr (external);
> + tem = Fcar (external);
> + state->comstyle = (NILP (tem)
> + ? 0
> + : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> + ? XINT (tem)
> + : ST_COMMENT_STYLE));
> +
> + external = Fcdr (external);
> + tem = Fcar (external);
> + state->comstr_start =
> + RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> + external = Fcdr (external);
> + tem = Fcar (external);
> + state->levelstarts = tem;
> +
> + external = Fcdr (external);
> + tem = Fcar (external);
> + state->prev_syntax = NILP (tem) ? Smax : XINT (tem);
> + }
> }
> DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0,
> @@ -3488,6 +3545,7 @@ Parsing stops at TO or when certain criteria are met;
> point is set to where parsing stops.
> If fifth arg OLDSTATE is omitted or nil,
> parsing assumes that FROM is the beginning of a function.
> +
> Value is a list of elements describing final state of parsing:
> 0. depth in parens.
> 1. character address of start of innermost containing list; nil if none.
> @@ -3501,16 +3559,22 @@ Value is a list of elements describing final state of parsing:
> 6. the minimum paren-depth encountered during this scan.
> 7. style of comment, if any.
> 8. character address of start of comment or string; nil if not in one.
> - 9. Intermediate data for continuation of parsing (subject to change).
> + 9. List of positions of currently open parens, outermost first.
> +10. When the last position scanned holds the first character of a
> + (potential) two character construct, the syntax of that position,
> + otherwise nil. That construct can be a two character comment
> + delimiter or an Escaped or Char-quoted character.
> +11..... Possible further internal information used by `parse-partial-sexp'.
> +
> If third arg TARGETDEPTH is non-nil, parsing stops if the depth
> in parentheses becomes equal to TARGETDEPTH.
> -Fourth arg STOPBEFORE non-nil means stop when come to
> +Fourth arg STOPBEFORE non-nil means stop when we come to
> any character that starts a sexp.
> Fifth arg OLDSTATE is a list like what this function returns.
> It is used to initialize the state of the parse. Elements number 1, 2, 6
> are ignored.
> -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
> - If it is symbol `syntax-table', stop after the start of a comment or a
> +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment.
> + If it is the symbol `syntax-table', stop after the start of a comment or a
> string, or after end of a comment or a string. */)
> (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth,
> Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop)
> @@ -3527,15 +3591,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
> target = TYPE_MINIMUM (EMACS_INT); /* We won't reach this depth. */
> validate_region (&from, &to);
> + internalize_parse_state (oldstate, &state);
> scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)),
> XINT (to),
> - target, !NILP (stopbefore), oldstate,
> + target, !NILP (stopbefore),
> (NILP (commentstop)
> ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1)));
> SET_PT_BOTH (state.location, state.location_byte);
> - return Fcons (make_number (state.depth),
> + return
> + Fcons (make_number (state.depth),
> Fcons (state.prevlevelstart < 0
> ? Qnil : make_number (state.prevlevelstart),
> Fcons (state.thislevelstart < 0
> @@ -3553,11 +3619,15 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
> ? Qsyntax_table
> : make_number (state.comstyle))
> : Qnil),
> - Fcons (((state.incomment
> - || (state.instring >= 0))
> - ? make_number (state.comstr_start)
> - : Qnil),
> - Fcons (state.levelstarts, Qnil))))))))));
> + Fcons (((state.incomment
> + || (state.instring >= 0))
> + ? make_number (state.comstr_start)
> + : Qnil),
> + Fcons (state.levelstarts,
> + Fcons (state.prev_syntax == Smax
> + ? Qnil
> + : make_number (state.prev_syntax),
> + Qnil)))))))))));
> }
> \f
> void
> _______________________________________________
> Emacs-diffs mailing list
> Emacs-diffs@gnu.org
> https://lists.gnu.org/mailman/listinfo/emacs-diffs
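The last hunks of the patch fix how element 10 travels between C and Lisp. At `done:`, `scan_sexps_forward` keeps `prev_from_syntax` only if that character has a comment-start-first or comment-end-first flag, or left the parse quoted. `Fparse_partial_sexp` conses `Smax` as nil and anything else as an integer, and `internalize_parse_state` maps nil back to `Smax`. The sketch below mirrors those three steps. `struct elt10` is a hypothetical stand-in for the Lisp value, and `Smax = 16` is an assumed value for the sentinel in `src/syntax.h`.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of the Smax sentinel in src/syntax.h.  */
enum { Smax = 16 };

/* Hypothetical stand-in for the Lisp value of element 10:
   either nil or an integer.  */
struct elt10 {
  bool is_nil;
  int value;
};

/* Mirror of the `done:' label: keep the last syntax only if that
   character could start a two-char construct (mask 0x50000) or if it
   left the parse quoted.  */
static int
final_prev_syntax (int prev_from_syntax, bool quoted)
{
  return ((prev_from_syntax & 0x50000) != 0 || quoted)
         ? prev_from_syntax : Smax;
}

/* Mirror of the consing in Fparse_partial_sexp: Smax becomes nil.  */
static struct elt10
externalize_prev_syntax (int prev_syntax)
{
  struct elt10 e;
  e.is_nil = (prev_syntax == Smax);
  e.value = e.is_nil ? 0 : prev_syntax;
  return e;
}

/* Mirror of the tail of internalize_parse_state: nil becomes Smax.  */
static int
internalize_prev_syntax (struct elt10 e)
{
  /* A non-nil element 10 never encodes the sentinel itself.  */
  assert (e.is_nil || e.value != Smax);
  return e.is_nil ? Smax : e.value;
}
```

Because nil and `Smax` map onto each other in both directions, a state returned by `parse-partial-sexp` can be passed back as OLDSTATE without losing the pending first character.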
* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
2016-03-20 13:47 ` [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters Stefan Monnier
@ 2016-03-20 14:22 ` Alan Mackenzie
2016-03-20 14:40 ` Alan Mackenzie
1 sibling, 0 replies; 4+ messages in thread
From: Alan Mackenzie @ 2016-03-20 14:22 UTC
To: Stefan Monnier; +Cc: emacs-devel
Hello, Stefan.
On Sun, Mar 20, 2016 at 09:47:58AM -0400, Stefan Monnier wrote:
> What was John's opinion on reusing nth 5?
I haven't asked him, yet. I didn't understand that your last post was a
suggestion that I should put that question to him. I'll do that now.
I am not in favour of reusing element 5, but not at all strongly.
> Stefan
--
Alan Mackenzie (Nuremberg, Germany).
* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
2016-03-20 13:47 ` [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters Stefan Monnier
2016-03-20 14:22 ` Alan Mackenzie
@ 2016-03-20 14:40 ` Alan Mackenzie
2016-03-20 15:30 ` Stefan Monnier
1 sibling, 1 reply; 4+ messages in thread
From: Alan Mackenzie @ 2016-03-20 14:40 UTC
To: John Wiegley; +Cc: Stefan Monnier, emacs-devel
Hello, John.
On Sun, Mar 20, 2016 at 09:47:58AM -0400, Stefan Monnier wrote:
> What was John's opinion on reusing nth 5?
> Stefan
Yes John, what is your opinion on reusing element 5 of the parser state?
Background:
The previous edition of `parse-partial-sexp' returned a ten-element list
which could be used for continuing that parse. Unfortunately, this list
was incomplete, leading to errors when the parse had stopped in the
middle of a two-character comment delimiter. This is bug #23019.
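Why a stop in the middle of a two-character delimiter needs an extra state slot can be shown with a toy resumable scanner. It is a sketch under stated assumptions: it knows only non-nesting `/* ... */` comments, and it does not reproduce the previous Emacs code, which looked back one character instead. The hypothetical field `pending` plays the role of the new element 10. It holds the last character if it may start a delimiter, and is cleared once a delimiter "uses it up", much as the patch sets `prev_from_syntax = Smax`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resumable state for a toy scanner of C comments.
   PENDING is the last character scanned if it may be the first of a
   two-character delimiter ('/' outside a comment, '*' inside one),
   otherwise '\0'.  */
struct toy_state {
  bool in_comment;
  char pending;
};

/* Scan TEXT[FROM..TO), updating *ST.  */
static void
toy_scan (const char *text, size_t from, size_t to, struct toy_state *st)
{
  for (size_t i = from; i < to; i++)
    {
      char c = text[i];
      if (!st->in_comment && st->pending == '/' && c == '*')
        {
          st->in_comment = true;
          st->pending = '\0';   /* the '*' is used up by the opener */
        }
      else if (st->in_comment && st->pending == '*' && c == '/')
        {
          st->in_comment = false;
          st->pending = '\0';   /* the '/' is used up by the closer */
        }
      else
        st->pending = (c == (st->in_comment ? '*' : '/')) ? c : '\0';
    }
}

/* Parse TEXT[0..LEN) in two calls split at SPLIT.  If KEEP_PENDING is
   false, the pending character is dropped between the calls, as if
   the state had no slot for it.  Returns true if the parse ends
   inside a comment.  */
static bool
toy_scan_split (const char *text, size_t len, size_t split,
                bool keep_pending)
{
  assert (split <= len);
  struct toy_state st = { false, '\0' };
  toy_scan (text, 0, split, &st);
  if (!keep_pending)
    st.pending = '\0';
  toy_scan (text, split, len, &st);
  return st.in_comment;
}
```

Splitting `"x /* y"` between `/` and `*` loses the comment opener unless `pending` survives the split; splitting `"a /* b */ c"` between `*` and `/` likewise loses the closer. With `pending` kept, every split point gives the same answer as a single pass.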
My patch, created with Stefan's help, has solved this bug, partly by
adding another element onto the parser state. But this new element
leaves element 5 ("t when just after an escape character") redundant.
Stefan is in favour of reusing position 5 for this new element, rather
than adding it at the end of the state (which has just been done). I am
against this, though not strongly. It might have some effects on
existing code, though this is not particularly likely.
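The redundancy of element 5 can be made concrete. Assuming element 10 holds a SYNTAX_WITH_FLAGS value whose low byte is the syntax class (the patch itself extracts it with `syntax & 0xff`), element 5 is recoverable. It is non-nil exactly when element 10 is non-nil and its class is escape (code 9) or character quote (code 10), using the codes from the Elisp manual. The function below is a hypothetical helper, not part of Emacs. Element 10 is modelled as a pointer that is NULL for nil. Edge cases, such as an escape-class character that also carries comment flags, are not considered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Syntax-class codes from the Elisp manual ("Syntax Table Internals").  */
enum { SYNTAX_ESCAPE = 9, SYNTAX_CHARQUOTE = 10 };

/* Recompute element 5 ("just after an escape character") from element
   10.  ELT10 is NULL when element 10 is nil; otherwise it points to the
   stored syntax value, whose class is in the low byte.  */
static bool
quoted_from_elt10 (const int *elt10)
{
  if (elt10 == NULL)
    return false;
  int code = *elt10 & 0xff;
  assert (code <= 15);           /* only real classes are ever stored */
  return code == SYNTAX_ESCAPE || code == SYNTAX_CHARQUOTE;
}
```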
What do you say?
--
Alan Mackenzie (Nuremberg, Germany).
> >>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:
> > branch: master
> > commit 9dcf5998935c8aaa846d7585b81f0dcfe1935b3d
> > Author: Alan Mackenzie <acm@muc.de>
> > Commit: Alan Mackenzie <acm@muc.de>
> > Amend parse-partial-sexp correctly to handle two character comment delimiters
> > Do this by adding a new field to the parser state: the syntax of the last
> > character scanned, should that be the first char of a (potential) two char
> > construct, nil otherwise.
> > This should make the parser state complete.
> > Also document element 9 of the parser state. Also refactor the code a bit.
> > * src/syntax.c (struct lisp_parse_state): Add a new field.
> > (SYNTAX_FLAGS_COMSTARTEND_FIRST): New function.
> > (internalize_parse_state): New function, extracted from scan_sexps_forward.
> > (back_comment): Call internalize_parse_state.
> > (forw_comment): Return the syntax of the last character scanned to the caller
> > when that character might be the first of a two character construct.
> > (Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
> > (scan_sexps_forward): Remove a redundant state parameter. Access all `state'
> > information via the address parameter `state'. Remove the code which converts
> > from external to internal form of `state'. Access buffer contents only from
> > `from' onwards. Reformulate code at the top of the main loop correctly to
> > recognize comment openers when starting in the middle of one. Call
> > forw_comment with extra argument (for return of syntax value of possible first
> > char of a two char construct).
> > (Fparse_partial_sexp): Document elements 9, 10 of the parser state in the
> > doc string. Clarify the doc string in general. Call
> > internalize_parse_state. Take account of the new elements when consing up the
> > output parser state.
> > * doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
> > element 10. Minor wording corrections (remove reference to "trivial
> > cases").
> > (Low Level Parsing): Minor corrections.
> > * etc/NEWS: Note new element 10, and documentation of element 9 of parser
> > state.
* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
From: Stefan Monnier @ 2016-03-20 15:30 UTC (permalink / raw)
To: John Wiegley, emacs-devel; +Cc: Alan Mackenzie
What he says,
Stefan
>>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:
> Hello, John.
> On Sun, Mar 20, 2016 at 09:47:58AM -0400, Stefan Monnier wrote:
>> What was John's opinion on reusing nth 5?
>> Stefan
> Yes, John, what is your opinion on reusing element 5 of the parser state?
> Background:
> The previous version of `parse-partial-sexp' returned a ten-element list
> which could be used to continue that parse. Unfortunately, this list
> was incomplete, leading to errors when the parse had stopped in the
> middle of a two-character comment delimiter. This is bug #23019.
> My patch, created with Stefan's help, has solved this bug, partly by
> adding another element to the parser state. But this new element
> makes element 5 ("t when just after an escape character") redundant.
> Stefan is in favour of reusing position 5 for this new element, rather
> than adding it at the end of the state (which is what has just been
> done). I am against this, though not strongly. Reusing it could break
> existing code that reads element 5, though this is not particularly likely.
> What do you say?
> --
> Alan Mackenzie (Nuremberg, Germany).