all messages for Emacs-related lists mirrored at yhetil.org
* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
       [not found] ` <E1ahdIL-0001Ul-Uz@vcs.savannah.gnu.org>
@ 2016-03-20 13:47   ` Stefan Monnier
  2016-03-20 14:22     ` Alan Mackenzie
  2016-03-20 14:40     ` Alan Mackenzie
  0 siblings, 2 replies; 4+ messages in thread
From: Stefan Monnier @ 2016-03-20 13:47 UTC
  To: emacs-devel; +Cc: Alan Mackenzie

What was John's opinion on reusing nth 5?


        Stefan


>>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:

> branch: master
> commit 9dcf5998935c8aaa846d7585b81f0dcfe1935b3d
> Author: Alan Mackenzie <acm@muc.de>
> Commit: Alan Mackenzie <acm@muc.de>

>     Amend parse-partial-sexp correctly to handle two character comment delimiters
    
>     Do this by adding a new field to the parser state: the syntax of the last
>     character scanned, should that be the first char of a (potential) two char
>     construct, nil otherwise.
>     This should make the parser state complete.
>     Also document element 9 of the parser state.  Also refactor the code a bit.
    
>     * src/syntax.c (struct lisp_parse_state): Add a new field.
>     (SYNTAX_FLAGS_COMSTARTEND_FIRST): New function.
>     (internalize_parse_state): New function, extracted from scan_sexps_forward.
>     (back_comment): Call internalize_parse_state.
>     (forw_comment): Return the syntax of the last character scanned to the caller
>     when that character might be the first of a two character construct.
>     (Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
>     (scan_sexps_forward): Remove a redundant state parameter.  Access all `state'
>     information via the address parameter `state'.  Remove the code which converts
>     from external to internal form of `state'.  Access buffer contents only from
>     `from' onwards.  Reformulate code at the top of the main loop correctly to
>     recognize comment openers when starting in the middle of one.  Call
>     forw_comment with extra argument (for return of syntax value of possible first
>     char of a two char construct).
>     (Fparse_partial_sexp): Document elements 9, 10 of the parser state in the
>     doc string.  Clarify the doc string in general.  Call
>     internalize_parse_state.  Take account of the new elements when consing up the
>     output parser state.
    
>     * doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
>     element 10.  Minor wording corrections (remove reference to "trivial
>     cases").
>     (Low Level Parsing): Minor corrections.
    
>     * etc/NEWS: Note new element 10, and documentation of element 9 of parser
>     state.
> ---
>  doc/lispref/syntax.texi |   33 +++--
>  etc/NEWS                |   12 ++
>  src/syntax.c            |  372 ++++++++++++++++++++++++++++-------------------
>  3 files changed, 252 insertions(+), 165 deletions(-)

> diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
> index d5a7eba..f81c164 100644
> --- a/doc/lispref/syntax.texi
> +++ b/doc/lispref/syntax.texi
> @@ -791,10 +791,10 @@ Hooks}).
>  @subsection Parser State
>  @cindex parser state
 
> -  A @dfn{parser state} is a list of ten elements describing the state
> -of the syntactic parser, after it parses the text between a specified
> -starting point and a specified end point in the buffer.  Parsing
> -functions such as @code{syntax-ppss}
> +  A @dfn{parser state} is a list of (currently) eleven elements
> +describing the state of the syntactic parser, after it parses the text
> +between a specified starting point and a specified end point in the
> +buffer.  Parsing functions such as @code{syntax-ppss}
>  @ifnottex
>  (@pxref{Position Parse})
>  @end ifnottex
> @@ -851,15 +851,20 @@ position where the string began.  When outside of strings and comments,
>  this element is @code{nil}.
 
>  @item
> -Internal data for continuing the parsing.  The meaning of this
> -data is subject to change; it is used if you pass this list
> -as the @var{state} argument to another call.
> +The list of the positions of the currently open parentheses, starting
> +with the outermost.
> +
> +@item
> +When the last buffer position scanned was the (potential) first
> +character of a two character construct (comment delimiter or
> +escaped/char-quoted character pair), the @var{syntax-code}
> +(@pxref{Syntax Table Internals}) of that position.  Otherwise
> +@code{nil}.
>  @end enumerate
 
>    Elements 1, 2, and 6 are ignored in a state which you pass as an
> -argument to continue parsing, and elements 8 and 9 are used only in
> -trivial cases.  Those elements are mainly used internally by the
> -parser code.
> +argument to continue parsing.  Elements 9 and 10 are mainly used
> +internally by the parser code.
 
>    One additional piece of useful information is available from a
>  parser state using this function:
> @@ -898,11 +903,11 @@ The depth starts at 0, or at whatever is given in @var{state}.
 
>  If the fourth argument @var{stop-before} is non-@code{nil}, parsing
>  stops when it comes to any character that starts a sexp.  If
> -@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
> -start of an unnested comment.  If @var{stop-comment} is the symbol
> +@var{stop-comment} is non-@code{nil}, parsing stops after the start of
> +an unnested comment.  If @var{stop-comment} is the symbol
>  @code{syntax-table}, parsing stops after the start of an unnested
> -comment or a string, or the end of an unnested comment or a string,
> -whichever comes first.
> +comment or a string, or after the end of an unnested comment or a
> +string, whichever comes first.
 
>  If @var{state} is @code{nil}, @var{start} is assumed to be at the top
>  level of parenthesis structure, such as the beginning of a function
> diff --git a/etc/NEWS b/etc/NEWS
> index d963dee..ea32153 100644
> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -175,6 +175,18 @@ a new window when opening man pages when there's already one, use
>          (inhibit-same-window . nil)
>          (mode . Man-mode))))
 
> ++++
> +** `parse-partial-sexp' state has a new element.  Element 10 is
> +non-nil when the last character scanned might be the first character
> +of a two character construct, i.e. a comment delimiter or escaped
> +character.  Its value is the syntax of that last character.
> +
> ++++
> +** `parse-partial-sexp''s state, element 9, has now been confirmed as
> +permanent and documented, and may be used by Lisp programs.  Its value
> +is a list of currently open parenthesis positions, starting with the
> +outermost parenthesis.
> +
>  \f
>  * Changes in Emacs 25.2 on Non-Free Operating Systems
 
> diff --git a/src/syntax.c b/src/syntax.c
> index fdcfdfc..ffe0ea5 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -81,6 +81,11 @@ SYNTAX_FLAGS_COMEND_SECOND (int flags)
>    return (flags >> 19) & 1;
>  }
>  static bool
> +SYNTAX_FLAGS_COMSTARTEND_FIRST (int flags)
> +{
> +  return (flags & 0x50000) != 0;
> +}
> +static bool
>  SYNTAX_FLAGS_PREFIX (int flags)
>  {
>    return (flags >> 20) & 1;
> @@ -153,6 +158,10 @@ struct lisp_parse_state
>      ptrdiff_t comstr_start;  /* Position of last comment/string starter.  */
>      Lisp_Object levelstarts; /* Char numbers of starts-of-expression
>  				of levels (starting from outermost).  */
> +    int prev_syntax; /* Syntax of previous position scanned, when
> +                        that position (potentially) holds the first char
> +                        of a 2-char construct, i.e. comment delimiter
> +                        or Sescape, etc.  Smax otherwise. */
>    };
>  \f
>  /* These variables are a cache for finding the start of a defun.
> @@ -176,7 +185,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object);
>  static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool);
>  static void scan_sexps_forward (struct lisp_parse_state *,
>                                  ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT,
> -                                bool, Lisp_Object, int);
> +                                bool, int);
> +static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *);
>  static bool in_classes (int, Lisp_Object);
>  static void parse_sexp_propertize (ptrdiff_t charpos);
 
> @@ -911,10 +921,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	}
>        do
>  	{
> +          internalize_parse_state (Qnil, &state);
>  	  scan_sexps_forward (&state,
>  			      defun_start, defun_start_byte,
>  			      comment_end, TYPE_MINIMUM (EMACS_INT),
> -			      0, Qnil, 0);
> +			      0, 0);
>  	  defun_start = comment_end;
>  	  if (!adjusted)
>  	    {
> @@ -2310,11 +2321,15 @@ in_classes (int c, Lisp_Object iso_classes)
>     PREV_SYNTAX is the SYNTAX_WITH_FLAGS of the previous character
>       (or 0 If the search cannot start in the middle of a two-character).
 
> -   If successful, return true and store the charpos of the comment's end
> -   into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR.
> -   Else, return false and store the charpos STOP into *CHARPOS_PTR, the
> -   corresponding bytepos into *BYTEPOS_PTR and the current nesting
> -   (as defined for state.incomment) in *INCOMMENT_PTR.
> +   If successful, return true and store the charpos of the comment's
> +   end into *CHARPOS_PTR and the corresponding bytepos into
> +   *BYTEPOS_PTR.  Else, return false and store the charpos STOP into
> +   *CHARPOS_PTR, the corresponding bytepos into *BYTEPOS_PTR and the
> +   current nesting (as defined for state->incomment) in
> +   *INCOMMENT_PTR.  Should the last character scanned in an incomplete
> +   comment be a possible first character of a two character construct,
> +   we store its SYNTAX_WITH_FLAGS into *last_syntax_ptr.  Otherwise,
> +   we store Smax into *last_syntax_ptr.
 
>     The comment end is the last character of the comment rather than the
>     character just after the comment.
> @@ -2326,7 +2341,7 @@ static bool
>  forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	      EMACS_INT nesting, int style, int prev_syntax,
>  	      ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr,
> -	      EMACS_INT *incomment_ptr)
> +	      EMACS_INT *incomment_ptr, int *last_syntax_ptr)
>  {
>    register int c, c1;
>    register enum syntaxcode code;
> @@ -2337,7 +2352,8 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>    /* Enter the loop in the middle so that we find
>       a 2-char comment ender if we start in the middle of it.  */
>    syntax = prev_syntax;
> -  if (syntax != 0) goto forw_incomment;
> +  code = syntax & 0xff;
> +  if (syntax != 0 && from < stop) goto forw_incomment;
 
>    while (1)
>      {
> @@ -2346,6 +2362,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	  *incomment_ptr = nesting;
>  	  *charpos_ptr = from;
>  	  *bytepos_ptr = from_byte;
> +          *last_syntax_ptr =
> +            (code == Sescape || code == Scharquote
> +             || SYNTAX_FLAGS_COMEND_FIRST (syntax)
> +             || (nesting > 0
> +                 && SYNTAX_FLAGS_COMSTART_FIRST (syntax)))
> +            ? syntax : Smax ;
>  	  return 0;
>  	}
>        c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
> @@ -2386,7 +2408,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	       SYNTAX_FLAGS_COMMENT_NESTED (other_syntax))
>  	      ? nesting > 0 : nesting < 0))
>  	{
> -	  if (--nesting <= 0)
> +	  syntax = Smax;        /* So that "|#" (lisp) can not return
> +                                   the syntax of "#" in *last_syntax_ptr. */
> +          if (--nesting <= 0)
>  	    /* We have encountered a comment end of the same style
>  	       as the comment sequence which began this comment section.  */
>  	    break;
> @@ -2408,6 +2432,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	/* We have encountered a nested comment of the same style
>  	   as the comment sequence which began this comment section.  */
>  	{
> +          syntax = Smax; /* So that "#|#" isn't also a comment ender. */
>  	  INC_BOTH (from, from_byte);
>  	  UPDATE_SYNTAX_TABLE_FORWARD (from);
>  	  nesting++;
> @@ -2415,6 +2440,8 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>      }
>    *charpos_ptr = from;
>    *bytepos_ptr = from_byte;
> +  *last_syntax_ptr = Smax; /* Any syntactic power the last byte had is
> +                              used up. */
>    return 1;
>  }
 
> @@ -2436,6 +2463,7 @@ between them, return t; otherwise return nil.  */)
>    EMACS_INT count1;
>    ptrdiff_t out_charpos, out_bytepos;
>    EMACS_INT dummy;
> +  int dummy2;
 
>    CHECK_NUMBER (count);
>    count1 = XINT (count);
> @@ -2499,7 +2527,7 @@ between them, return t; otherwise return nil.  */)
>  	}
>        /* We're at the start of a comment.  */
>        found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
> -			    &out_charpos, &out_bytepos, &dummy);
> +			    &out_charpos, &out_bytepos, &dummy, &dummy2);
>        from = out_charpos; from_byte = out_bytepos;
>        if (!found)
>  	{
> @@ -2659,6 +2687,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
>    ptrdiff_t from_byte;
>    ptrdiff_t out_bytepos, out_charpos;
>    EMACS_INT dummy;
> +  int dummy2;
>    bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol;
 
>    if (depth > 0) min_depth = 0;
> @@ -2755,7 +2784,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
>  	      UPDATE_SYNTAX_TABLE_FORWARD (from);
>  	      found = forw_comment (from, from_byte, stop,
>  				    comnested, comstyle, 0,
> -				    &out_charpos, &out_bytepos, &dummy);
> +				    &out_charpos, &out_bytepos, &dummy,
> +                                    &dummy2);
>  	      from = out_charpos, from_byte = out_bytepos;
>  	      if (!found)
>  		{
> @@ -3119,7 +3149,7 @@ the prefix syntax flag (p).  */)
>  }
>  \f
>  /* Parse forward from FROM / FROM_BYTE to END,
> -   assuming that FROM has state OLDSTATE (nil means FROM is start of function),
> +   assuming that FROM has state STATE,
>     and return a description of the state of the parse at END.
>     If STOPBEFORE, stop at the start of an atom.
>     If COMMENTSTOP is 1, stop at the start of a comment.
> @@ -3127,12 +3157,11 @@ the prefix syntax flag (p).  */)
>     after the beginning of a string, or after the end of a string.  */
 
>  static void
> -scan_sexps_forward (struct lisp_parse_state *stateptr,
> +scan_sexps_forward (struct lisp_parse_state *state,
>  		    ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end,
>  		    EMACS_INT targetdepth, bool stopbefore,
> -		    Lisp_Object oldstate, int commentstop)
> +		    int commentstop)
>  {
> -  struct lisp_parse_state state;
>    enum syntaxcode code;
>    int c1;
>    bool comnested;
> @@ -3148,7 +3177,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
>    Lisp_Object tem;
>    ptrdiff_t prev_from;		/* Keep one character before FROM.  */
>    ptrdiff_t prev_from_byte;
> -  int prev_from_syntax;
> +  int prev_from_syntax, prev_prev_from_syntax;
>    bool boundary_stop = commentstop == -1;
>    bool nofence;
>    bool found;
> @@ -3165,6 +3194,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
>  do { prev_from = from;				\
>       prev_from_byte = from_byte; 		\
>       temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte);	\
> +     prev_prev_from_syntax = prev_from_syntax;  \
>       prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \
>       INC_BOTH (from, from_byte);		\
>       if (from < end)				\
> @@ -3174,88 +3204,38 @@ do { prev_from = from;				\
>    immediate_quit = 1;
>    QUIT;
 
> -  if (NILP (oldstate))
> -    {
> -      depth = 0;
> -      state.instring = -1;
> -      state.incomment = 0;
> -      state.comstyle = 0;	/* comment style a by default.  */
> -      state.comstr_start = -1;	/* no comment/string seen.  */
> -    }
> -  else
> -    {
> -      tem = Fcar (oldstate);
> -      if (!NILP (tem))
> -	depth = XINT (tem);
> -      else
> -	depth = 0;
> -
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      /* Check whether we are inside string_fence-style string: */
> -      state.instring = (!NILP (tem)
> -			? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> -			: -1);
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.incomment = (!NILP (tem)
> -			 ? (INTEGERP (tem) ? XINT (tem) : -1)
> -			 : 0);
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      start_quoted = !NILP (tem);
> +  depth = state->depth;
> +  start_quoted = state->quoted;
> +  prev_prev_from_syntax = Smax;
> +  prev_from_syntax = state->prev_syntax;
 
> -      /* if the eighth element of the list is nil, we are in comment
> -	 style a.  If it is non-nil, we are in comment style b */
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.comstyle = (NILP (tem)
> -			? 0
> -			: (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> -			   ? XINT (tem)
> -			   : ST_COMMENT_STYLE));
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.comstr_start =
> -	RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      while (!NILP (tem))		/* >= second enclosing sexps.  */
> -	{
> -	  Lisp_Object temhd = Fcar (tem);
> -	  if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> -	    curlevel->last = XINT (temhd);
> -	  if (++curlevel == endlevel)
> -	    curlevel--; /* error ("Nesting too deep for parser"); */
> -	  curlevel->prev = -1;
> -	  curlevel->last = -1;
> -	  tem = Fcdr (tem);
> -	}
> +  tem = state->levelstarts;
> +  while (!NILP (tem))		/* >= second enclosing sexps.  */
> +    {
> +      Lisp_Object temhd = Fcar (tem);
> +      if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> +        curlevel->last = XINT (temhd);
> +      if (++curlevel == endlevel)
> +        curlevel--; /* error ("Nesting too deep for parser"); */
> +      curlevel->prev = -1;
> +      curlevel->last = -1;
> +      tem = Fcdr (tem);
>      }
> -  state.quoted = 0;
> -  mindepth = depth;
> -
>    curlevel->prev = -1;
>    curlevel->last = -1;
 
> -  SETUP_SYNTAX_TABLE (prev_from, 1);
> -  temp = FETCH_CHAR (prev_from_byte);
> -  prev_from_syntax = SYNTAX_WITH_FLAGS (temp);
> -  UPDATE_SYNTAX_TABLE_FORWARD (from);
> +  state->quoted = 0;
> +  mindepth = depth;
> +
> +  SETUP_SYNTAX_TABLE (from, 1);
 
>    /* Enter the loop at a place appropriate for initial state.  */
 
> -  if (state.incomment)
> +  if (state->incomment)
>      goto startincomment;
> -  if (state.instring >= 0)
> +  if (state->instring >= 0)
>      {
> -      nofence = state.instring != ST_STRING_STYLE;
> +      nofence = state->instring != ST_STRING_STYLE;
>        if (start_quoted)
>  	goto startquotedinstring;
>        goto startinstring;
> @@ -3266,11 +3246,8 @@ do { prev_from = from;				\
>    while (from < end)
>      {
>        int syntax;
> -      INC_FROM;
> -      code = prev_from_syntax & 0xff;
 
> -      if (from < end
> -	  && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
> +      if (SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
>  	  && (c1 = FETCH_CHAR (from_byte),
>  	      syntax = SYNTAX_WITH_FLAGS (c1),
>  	      SYNTAX_FLAGS_COMSTART_SECOND (syntax)))
> @@ -3280,32 +3257,39 @@ do { prev_from = from;				\
>  	  /* Record the comment style we have entered so that only
>  	     the comment-end sequence of the same style actually
>  	     terminates the comment section.  */
> -	  state.comstyle
> +	  state->comstyle
>  	    = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax);
>  	  comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax)
>  		       | SYNTAX_FLAGS_COMMENT_NESTED (syntax));
> -	  state.incomment = comnested ? 1 : -1;
> -	  state.comstr_start = prev_from;
> +	  state->incomment = comnested ? 1 : -1;
> +	  state->comstr_start = prev_from;
>  	  INC_FROM;
> +          prev_from_syntax = Smax; /* the syntax has already been
> +                                      "used up". */
>  	  code = Scomment;
>  	}
> -      else if (code == Scomment_fence)
> -	{
> -	  /* Record the comment style we have entered so that only
> -	     the comment-end sequence of the same style actually
> -	     terminates the comment section.  */
> -	  state.comstyle = ST_COMMENT_STYLE;
> -	  state.incomment = -1;
> -	  state.comstr_start = prev_from;
> -	  code = Scomment;
> -	}
> -      else if (code == Scomment)
> -	{
> -	  state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> -	  state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> -			     1 : -1);
> -	  state.comstr_start = prev_from;
> -	}
> +      else
> +        {
> +          INC_FROM;
> +          code = prev_from_syntax & 0xff;
> +          if (code == Scomment_fence)
> +            {
> +              /* Record the comment style we have entered so that only
> +                 the comment-end sequence of the same style actually
> +                 terminates the comment section.  */
> +              state->comstyle = ST_COMMENT_STYLE;
> +              state->incomment = -1;
> +              state->comstr_start = prev_from;
> +              code = Scomment;
> +            }
> +          else if (code == Scomment)
> +            {
> +              state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> +              state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> +                                 1 : -1);
> +              state->comstr_start = prev_from;
> +            }
> +        }
 
>        if (SYNTAX_FLAGS_PREFIX (prev_from_syntax))
>  	continue;
> @@ -3350,26 +3334,28 @@ do { prev_from = from;				\
 
>  	case Scomment_fence: /* Can't happen because it's handled above.  */
>  	case Scomment:
> -	  if (commentstop || boundary_stop) goto done;
> +          if (commentstop || boundary_stop) goto done;
>  	startincomment:
>  	  /* The (from == BEGV) test was to enter the loop in the middle so
>  	     that we find a 2-char comment ender even if we start in the
>  	     middle of it.  We don't want to do that if we're just at the
>  	     beginning of the comment (think of (*) ... (*)).  */
>  	  found = forw_comment (from, from_byte, end,
> -				state.incomment, state.comstyle,
> -				(from == BEGV || from < state.comstr_start + 3)
> -				? 0 : prev_from_syntax,
> -				&out_charpos, &out_bytepos, &state.incomment);
> +				state->incomment, state->comstyle,
> +				from == BEGV ? 0 : prev_from_syntax,
> +				&out_charpos, &out_bytepos, &state->incomment,
> +                                &prev_from_syntax);
>  	  from = out_charpos; from_byte = out_bytepos;
> -	  /* Beware!  prev_from and friends are invalid now.
> -	     Luckily, the `done' doesn't use them and the INC_FROM
> -	     sets them to a sane value without looking at them. */
> +	  /* Beware!  prev_from and friends (except prev_from_syntax)
> +	     are invalid now.  Luckily, the `done' doesn't use them
> +	     and the INC_FROM sets them to a sane value without
> +	     looking at them. */
>  	  if (!found) goto done;
>  	  INC_FROM;
> -	  state.incomment = 0;
> -	  state.comstyle = 0;	/* reset the comment style */
> -	  if (boundary_stop) goto done;
> +	  state->incomment = 0;
> +	  state->comstyle = 0;	/* reset the comment style */
> +	  prev_from_syntax = Smax; /* For the comment closer */
> +          if (boundary_stop) goto done;
>  	  break;
 
>  	case Sopen:
> @@ -3396,16 +3382,16 @@ do { prev_from = from;				\
 
>  	case Sstring:
>  	case Sstring_fence:
> -	  state.comstr_start = from - 1;
> +	  state->comstr_start = from - 1;
>  	  if (stopbefore) goto stop;  /* this arg means stop at sexp start */
>  	  curlevel->last = prev_from;
> -	  state.instring = (code == Sstring
> +	  state->instring = (code == Sstring
>  			    ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte))
>  			    : ST_STRING_STYLE);
>  	  if (boundary_stop) goto done;
>  	startinstring:
>  	  {
> -	    nofence = state.instring != ST_STRING_STYLE;
> +	    nofence = state->instring != ST_STRING_STYLE;
 
>  	    while (1)
>  	      {
> @@ -3419,7 +3405,7 @@ do { prev_from = from;				\
>  		/* Check C_CODE here so that if the char has
>  		   a syntax-table property which says it is NOT
>  		   a string character, it does not end the string.  */
> -		if (nofence && c == state.instring && c_code == Sstring)
> +		if (nofence && c == state->instring && c_code == Sstring)
>  		  break;
 
>  		switch (c_code)
> @@ -3442,7 +3428,7 @@ do { prev_from = from;				\
>  	      }
>  	  }
>  	string_end:
> -	  state.instring = -1;
> +	  state->instring = -1;
>  	  curlevel->prev = curlevel->last;
>  	  INC_FROM;
>  	  if (boundary_stop) goto done;
> @@ -3461,25 +3447,96 @@ do { prev_from = from;				\
>   stop:   /* Here if stopping before start of sexp. */
>    from = prev_from;    /* We have just fetched the char that starts it; */
>    from_byte = prev_from_byte;
> +  prev_from_syntax = prev_prev_from_syntax;
>    goto done; /* but return the position before it. */
 
>   endquoted:
> -  state.quoted = 1;
> +  state->quoted = 1;
>   done:
> -  state.depth = depth;
> -  state.mindepth = mindepth;
> -  state.thislevelstart = curlevel->prev;
> -  state.prevlevelstart
> +  state->depth = depth;
> +  state->mindepth = mindepth;
> +  state->thislevelstart = curlevel->prev;
> +  state->prevlevelstart
>      = (curlevel == levelstart) ? -1 : (curlevel - 1)->last;
> -  state.location = from;
> -  state.location_byte = from_byte;
> -  state.levelstarts = Qnil;
> +  state->location = from;
> +  state->location_byte = from_byte;
> +  state->levelstarts = Qnil;
>    while (curlevel > levelstart)
> -    state.levelstarts = Fcons (make_number ((--curlevel)->last),
> -			       state.levelstarts);
> +    state->levelstarts = Fcons (make_number ((--curlevel)->last),
> +                                state->levelstarts);
> +  state->prev_syntax = (SYNTAX_FLAGS_COMSTARTEND_FIRST (prev_from_syntax)
> +                        || state->quoted) ? prev_from_syntax : Smax;
>    immediate_quit = 0;
> +}
> +
> +/* Convert a (lisp) parse state to the internal form used in
> +   scan_sexps_forward.  */
> +static void
> +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state)
> +{
> +  Lisp_Object tem;
> +
> +  if (NILP (external))
> +    {
> +      state->depth = 0;
> +      state->instring = -1;
> +      state->incomment = 0;
> +      state->quoted = 0;
> +      state->comstyle = 0;	/* comment style a by default.  */
> +      state->comstr_start = -1;	/* no comment/string seen.  */
> +      state->levelstarts = Qnil;
> +      state->prev_syntax = Smax;
> +    }
> +  else
> +    {
> +      tem = Fcar (external);
> +      if (!NILP (tem))
> +	state->depth = XINT (tem);
> +      else
> +	state->depth = 0;
> +
> +      external = Fcdr (external);
> +      external = Fcdr (external);
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      /* Check whether we are inside string_fence-style string: */
> +      state->instring = (!NILP (tem)
> +                         ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> +                         : -1);
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->incomment = (!NILP (tem)
> +                          ? (INTEGERP (tem) ? XINT (tem) : -1)
> +                          : 0);
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->quoted = !NILP (tem);
 
> -  *stateptr = state;
> +      /* if the eighth element of the list is nil, we are in comment
> +	 style a.  If it is non-nil, we are in comment style b */
> +      external = Fcdr (external);
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->comstyle = (NILP (tem)
> +                         ? 0
> +                         : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> +                            ? XINT (tem)
> +                            : ST_COMMENT_STYLE));
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->comstr_start =
> +	RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->levelstarts = tem;
> +
> +      external = Fcdr (external);
> +      tem = Fcar (external);
> +      state->prev_syntax = NILP (tem) ? Smax : XINT (tem);
> +    }
>  }
 
>  DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0,
> @@ -3488,6 +3545,7 @@ Parsing stops at TO or when certain criteria are met;
>   point is set to where parsing stops.
>  If fifth arg OLDSTATE is omitted or nil,
>   parsing assumes that FROM is the beginning of a function.
> +
>  Value is a list of elements describing final state of parsing:
>   0. depth in parens.
>   1. character address of start of innermost containing list; nil if none.
> @@ -3501,16 +3559,22 @@ Value is a list of elements describing final state of parsing:
>   6. the minimum paren-depth encountered during this scan.
>   7. style of comment, if any.
>   8. character address of start of comment or string; nil if not in one.
> - 9. Intermediate data for continuation of parsing (subject to change).
> + 9. List of positions of currently open parens, outermost first.
> +10. When the last position scanned holds the first character of a
> +    (potential) two character construct, the syntax of that position,
> +    otherwise nil.  That construct can be a two character comment
> +    delimiter or an Escaped or Char-quoted character.
> +11..... Possible further internal information used by `parse-partial-sexp'.
> +
>  If third arg TARGETDEPTH is non-nil, parsing stops if the depth
>  in parentheses becomes equal to TARGETDEPTH.
> -Fourth arg STOPBEFORE non-nil means stop when come to
> +Fourth arg STOPBEFORE non-nil means stop when we come to
>   any character that starts a sexp.
>  Fifth arg OLDSTATE is a list like what this function returns.
>   It is used to initialize the state of the parse.  Elements number 1, 2, 6
>   are ignored.
> -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
> - If it is symbol `syntax-table', stop after the start of a comment or a
> +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment.
> + If it is the symbol `syntax-table', stop after the start of a comment or a
>   string, or after end of a comment or a string.  */)
>    (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth,
>     Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop)
> @@ -3527,15 +3591,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
>      target = TYPE_MINIMUM (EMACS_INT);	/* We won't reach this depth.  */
 
>    validate_region (&from, &to);
> +  internalize_parse_state (oldstate, &state);
>    scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)),
>  		      XINT (to),
> -		      target, !NILP (stopbefore), oldstate,
> +		      target, !NILP (stopbefore),
>  		      (NILP (commentstop)
>  		       ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1)));
 
>    SET_PT_BOTH (state.location, state.location_byte);
 
> -  return Fcons (make_number (state.depth),
> +  return
> +    Fcons (make_number (state.depth),
>  	   Fcons (state.prevlevelstart < 0
>  		  ? Qnil : make_number (state.prevlevelstart),
>  	     Fcons (state.thislevelstart < 0
> @@ -3553,11 +3619,15 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
>  				  ? Qsyntax_table
>  				  : make_number (state.comstyle))
>  			       : Qnil),
> -			      Fcons (((state.incomment
> -				       || (state.instring >= 0))
> -				      ? make_number (state.comstr_start)
> -				      : Qnil),
> -				     Fcons (state.levelstarts, Qnil))))))))));
> +		         Fcons (((state.incomment
> +                                  || (state.instring >= 0))
> +                                 ? make_number (state.comstr_start)
> +                                 : Qnil),
> +			   Fcons (state.levelstarts,
> +                             Fcons (state.prev_syntax == Smax
> +                                    ? Qnil
> +                                    : make_number (state.prev_syntax),
> +                                Qnil)))))))))));
>  }
>  \f
>  void

> _______________________________________________
> Emacs-diffs mailing list
> Emacs-diffs@gnu.org
> https://lists.gnu.org/mailman/listinfo/emacs-diffs




* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
From: Alan Mackenzie @ 2016-03-20 14:22 UTC (permalink / raw
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Sun, Mar 20, 2016 at 09:47:58AM -0400, Stefan Monnier wrote:
> What was John's opinion on reusing nth 5?

I haven't asked him yet.  I hadn't understood your last post as a
suggestion that I should put that question to him.  I'll do that now.

I am not in favour of reusing element 5, though not strongly.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
From: Alan Mackenzie @ 2016-03-20 14:40 UTC (permalink / raw
  To: John Wiegley; +Cc: Stefan Monnier, emacs-devel

Hello, John.

On Sun, Mar 20, 2016 at 09:47:58AM -0400, Stefan Monnier wrote:
> What was John's opinion on reusing nth 5?


>         Stefan

Yes, John: what is your opinion on reusing element 5 of the parser state?

Background:

The previous version of `parse-partial-sexp' returned a ten-element list
that could be passed back to it to continue the parse.  Unfortunately,
this list was incomplete, leading to errors when the parse had stopped in
the middle of a two-character comment delimiter.  This is bug #23019.

My patch, created with Stefan's help, fixes this bug, partly by adding
another element to the parser state.  This new element, however, makes
element 5 ("t when just after an escape character") redundant.
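To make the failure concrete, here is a minimal sketch in C, assuming a toy language whose only comments are `/* ... */`.  The names `struct scan_state`, `scan` and `in_comment_at_end` are hypothetical, and this is not the code in src/syntax.c.  The `in_comment` flag stands in for element 4 of the parser state, and `prev_first` stands in for the role of the new element 10: it records that the scan stopped just after a possible first character of a two-character delimiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical toy scanner state; NOT Emacs's struct lisp_parse_state.
struct scan_state {
    bool in_comment;  // stands in for element 4 (inside a comment?)
    bool prev_first;  // last char may start a two-char delimiter:
                      // '/' outside a comment, '*' inside one
                      // (stands in for the new element 10)
};

// Scan text[from, to), updating *st so that the scan can be resumed later.
static void scan(const char *text, size_t from, size_t to,
                 struct scan_state *st)
{
    assert(from <= to);
    for (size_t i = from; i < to; i++) {
        char c = text[i];
        if (!st->in_comment) {
            if (st->prev_first && c == '*') {
                st->in_comment = true;   // second char of the opener
                st->prev_first = false;
            } else {
                st->prev_first = (c == '/');
            }
        } else if (st->prev_first && c == '/') {
            st->in_comment = false;      // second char of the closer
            st->prev_first = false;
        } else {
            st->prev_first = (c == '*');
        }
    }
}

// Scan TEXT in two pieces, stopping at SPLIT and resuming from the saved
// state.  If KEEP_PREV is false, prev_first is discarded before resuming,
// mimicking a saved state that lacks the extra element.  Returns whether
// the scan ends inside a comment.
bool in_comment_at_end(const char *text, size_t split, bool keep_prev)
{
    size_t len = strlen(text);
    assert(split <= len);
    struct scan_state st = { false, false };
    scan(text, 0, split, &st);
    if (!keep_prev)
        st.prev_first = false;
    scan(text, split, len, &st);
    return st.in_comment;
}
```

For example, `in_comment_at_end("x /* y", 3, true)` stops between `/` and `*`, resumes, and correctly ends inside the comment, whereas passing `false` misses the opener.  Splitting `"x /* y */ z"` at index 8 with `false` misses the closer instead and wrongly ends inside a comment.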

Stefan favours reusing position 5 for this new element, rather than
appending it at the end of the state (which is what the commit does).  I
am against this, though not strongly.  Reusing position 5 might affect
existing code that reads element 5, though this is not particularly
likely.

What do you say?

-- 
Alan Mackenzie (Nuremberg, Germany).







* Re: [Emacs-diffs] master 9dcf599: Amend parse-partial-sexp correctly to handle two character comment delimiters
From: Stefan Monnier @ 2016-03-20 15:30 UTC (permalink / raw
  To: John Wiegley, emacs-devel; +Cc: Alan Mackenzie

What he says,


        Stefan


>>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:

> Hello, John.
> On Sun, Mar 20, 2016 at 09:47:58AM -0400, Stefan Monnier wrote:
>> What was John's opinion on reusing nth 5?


>> Stefan

> Yes, John: what is your opinion on reusing element 5 of the parser state?

> Background:

> The previous version of `parse-partial-sexp' returned a ten-element list
> that could be passed back to it to continue the parse.  Unfortunately,
> this list was incomplete, leading to errors when the parse had stopped in
> the middle of a two-character comment delimiter.  This is bug #23019.

> My patch, created with Stefan's help, fixes this bug, partly by adding
> another element to the parser state.  This new element, however, makes
> element 5 ("t when just after an escape character") redundant.

> Stefan favours reusing position 5 for this new element, rather than
> appending it at the end of the state (which is what the commit does).  I
> am against this, though not strongly.  Reusing position 5 might affect
> existing code that reads element 5, though this is not particularly
> likely.

> What do you say?

> -- 
> Alan Mackenzie (Nuremberg, Germany).





