bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.

From: Alan Mackenzie <acm@muc.de>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 23019@debbugs.gnu.org
Subject: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
Date: Fri, 18 Mar 2016 18:25:47 +0000
Message-ID: <20160318182547.GB9433@acm.fritz.box>
In-Reply-To: <jwv8u1f930c.fsf-monnier+emacsbugs@gnu.org>

Hello, Stefan.

On Fri, Mar 18, 2016 at 12:23:02PM -0400, Stefan Monnier wrote:

> >> - change element 10 so it's nil if the last char was an "end of
> >> something".  Another way to look at it, is that the element 10 should
> >> only be non-nil if the "next lexeme" might start on that
> >> previous character.

> > I've tried this, and it's somewhat ugly.  Setting the "previous_syntax"
> > to nil is also needed for the asterisk in "/*".  The nil would appear to
> > mean "the syntactic value of the last character has already been used
> > up".  So the "previous_syntax" is nil in the most interesting cases.  It
> > also feels somewhat ad-hoc.

> > How about this idea: element 10 will record the syntax of the previous
> > character ONLY when it is potentially the first character of a two
> > character comment delimiter, otherwise it'll be nil.  At least that's
> > being honest about what the thing's being used for.

> IIUC the only difference between what I (think I) suggested and what
> you're proposing is that you want to return nil for the "prev is
> backslash" whereas I was suggesting to return non-nil in that case.
> [ AFAIK the only two-char elements we handle so far are the comment
> delimiters and the backslash escapes.  ]

We also have Scharquote, which scan_sexps_forward handles identically to
Sescape.
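
To make that rule concrete, here is a rough, self-contained C sketch of
the decision about what to carry forward as the "previous syntax".  It
is not the code in src/syntax.c: the toy_ names and the enum values are
made up for illustration, and only the 0x50000 mask and the use of Smax
as the "nothing to carry" sentinel are taken from the patch below.
Sescape and Scharquote are both covered by the `quoted' flag, which is
why neither needs a test of its own.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the syntax codes; the numeric values are
   assumptions made for this sketch only.  */
enum toy_syntaxcode
{
  TOY_SPUNCT = 1,
  TOY_SWORD = 2,
  TOY_SESCAPE = 9,
  TOY_SCHARQUOTE = 10,
  TOY_SMAX = 16          /* sentinel: no previous syntax to carry */
};

/* Flag bits assumed to match the 0x50000 mask used in the patch:
   first char of a comment starter, first char of a comment ender.  */
#define TOY_COMSTART_FIRST (1 << 16)
#define TOY_COMEND_FIRST   (1 << 18)

/* True if FLAGS marks the first char of a two-char comment delimiter.  */
static bool
toy_comstartend_first (int flags)
{
  return (flags & 0x50000) != 0;
}

/* Return the syntax value worth remembering for the next call, or
   TOY_SMAX when the previous character cannot begin a two-char
   construct.  QUOTED is true when the scan stopped just after an
   escape or char-quote character.  */
static int
toy_carried_syntax (int prev_syntax, bool quoted)
{
  return (toy_comstartend_first (prev_syntax) || quoted)
         ? prev_syntax : TOY_SMAX;
}
```

With these definitions a plain word character yields TOY_SMAX, whereas
a "/" flagged as TOY_COMSTART_FIRST, or a backslash with `quoted' set,
is passed through unchanged.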

> Do I understand this right?

Yes, but I've no strong feelings on the matter.

> > It would appear to be, yes.  We really can't get rid of element 5,
> > though, because there will surely be code out there that uses it.  But
> > if I change element 10 as outlined above, element 5 will no longer be
> > redundant.

> I'd even be tempted to re-use element 5, although it might
> conceivably break some code out there.

I have bad feelings about that.  Is it really worth the risk, just to
save one cons cell on a list of which not many instances exist at any
one time?

> But even if we don't re-use element 5, I would actually much prefer to
> render element 5 redundant.

OK.  Here's an updated patch which does just that.  Comments would be
welcome.

>         Stefan


Amend parse-partial-sexp correctly to handle two character comment delimiters

Do this by adding a new field to the parser state: the syntax of the last
character scanned, should that be the first char of a (potential) two char
construct, nil otherwise.
This should make the parser state complete.
Also document element 9 of the parser state.  Also refactor the code a bit.
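
As an illustration of why the extra field is needed, the toy scanner
below handles only C-style "/* ... */" comments and can be resumed from
a saved state.  It is a sketch under stated assumptions, not the way
scan_sexps_forward is written: struct toy_state, toy_scan and the
`pending' field are hypothetical names, and real syntax tables are
replaced by literal character tests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state, far smaller than struct lisp_parse_state.  */
struct toy_state
{
  bool in_comment;  /* true while inside a comment */
  char pending;     /* '/' or '*' if the last char scanned might be the
                       first char of a two-char delimiter, else 0 */
};

/* Scan LEN chars of TEXT, updating *ST.  Calling this repeatedly on
   consecutive chunks gives the same result as one call on the whole
   text, provided *ST is passed along unchanged.  */
static void
toy_scan (struct toy_state *st, const char *text, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      char c = text[i];
      if (!st->in_comment)
        {
          if (st->pending == '/' && c == '*')
            {
              st->in_comment = true;
              st->pending = 0;  /* the '*' has been "used up" */
            }
          else
            st->pending = (c == '/') ? '/' : 0;
        }
      else
        {
          if (st->pending == '*' && c == '/')
            {
              st->in_comment = false;
              st->pending = 0;  /* the '/' cannot open a new comment */
            }
          else
            st->pending = (c == '*') ? '*' : 0;
        }
    }
}
```

Scanning "x /" and then "* y" with the same toy_state ends inside a
comment.  Clearing `pending' between the two calls loses the comment
opener, which is the kind of failure this bug report is about.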

* src/syntax.c (struct lisp_parse_state): Add a new field.
(SYNTAX_FLAGS_COMSTARTEND_FIRST): New function.
(internalize_parse_state): New function, extracted from scan_sexps_forward.
(back_comment): Call internalize_parse_state.
(forw_comment): Return the syntax of the last character scanned to the caller.
(Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
(scan_sexps_forward): Remove a redundant state parameter.  Access all `state'
information via the address parameter `state'.  Remove the code which converts
from external to internal form of `state'.  Access buffer contents only from
`from' onwards.  Reformulate code at the top of the main loop correctly to
recognize comment openers when starting in the middle of one.  Call
forw_comment with extra argument (for return of final syntax value).
(Fparse_partial_sexp): Document elements 9, 10 of the parser state in the
doc string.  Clarify the doc string in general.  Call
internalize_parse_state.  Take account of the new elements when consing up the
output parser state.

* doc/lispref/syntax.texi (Parser State): Document element 9 and the new
element 10.  Minor wording corrections (remove reference to "trivial cases").
(Low Level Parsing): Minor corrections.




diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index d5a7eba..f81c164 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -791,10 +791,10 @@ Parser State
 @subsection Parser State
 @cindex parser state
 
-  A @dfn{parser state} is a list of ten elements describing the state
-of the syntactic parser, after it parses the text between a specified
-starting point and a specified end point in the buffer.  Parsing
-functions such as @code{syntax-ppss}
+  A @dfn{parser state} is a list of (currently) eleven elements
+describing the state of the syntactic parser, after it parses the text
+between a specified starting point and a specified end point in the
+buffer.  Parsing functions such as @code{syntax-ppss}
 @ifnottex
 (@pxref{Position Parse})
 @end ifnottex
@@ -851,15 +851,20 @@ Parser State
 this element is @code{nil}.
 
 @item
-Internal data for continuing the parsing.  The meaning of this
-data is subject to change; it is used if you pass this list
-as the @var{state} argument to another call.
+The list of the positions of the currently open parentheses, starting
+with the outermost.
+
+@item
+When the last buffer position scanned was the (potential) first
+character of a two character construct (comment delimiter or
+escaped/char-quoted character pair), the @var{syntax-code}
+(@pxref{Syntax Table Internals}) of that position.  Otherwise
+@code{nil}.
 @end enumerate
 
   Elements 1, 2, and 6 are ignored in a state which you pass as an
-argument to continue parsing, and elements 8 and 9 are used only in
-trivial cases.  Those elements are mainly used internally by the
-parser code.
+argument to continue parsing.  Elements 9 and 10 are mainly used
+internally by the parser code.
 
   One additional piece of useful information is available from a
 parser state using this function:
@@ -898,11 +903,11 @@ Low-Level Parsing
 
 If the fourth argument @var{stop-before} is non-@code{nil}, parsing
 stops when it comes to any character that starts a sexp.  If
-@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
-start of an unnested comment.  If @var{stop-comment} is the symbol
+@var{stop-comment} is non-@code{nil}, parsing stops after the start of
+an unnested comment.  If @var{stop-comment} is the symbol
 @code{syntax-table}, parsing stops after the start of an unnested
-comment or a string, or the end of an unnested comment or a string,
-whichever comes first.
+comment or a string, or after the end of an unnested comment or a
+string, whichever comes first.
 
 If @var{state} is @code{nil}, @var{start} is assumed to be at the top
 level of parenthesis structure, such as the beginning of a function
diff --git a/src/syntax.c b/src/syntax.c
index 249d0d5..e6a1942 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -81,6 +81,11 @@ SYNTAX_FLAGS_COMEND_SECOND (int flags)
   return (flags >> 19) & 1;
 }
 static bool
+SYNTAX_FLAGS_COMSTARTEND_FIRST (int flags)
+{
+  return (flags & 0x50000) != 0;
+}
+static bool
 SYNTAX_FLAGS_PREFIX (int flags)
 {
   return (flags >> 20) & 1;
@@ -153,6 +158,10 @@ struct lisp_parse_state
     ptrdiff_t comstr_start;  /* Position of last comment/string starter.  */
     Lisp_Object levelstarts; /* Char numbers of starts-of-expression
 				of levels (starting from outermost).  */
+    int prev_syntax; /* Syntax of previous position scanned, when
+                        that position (potentially) holds the first char
+                        of a 2-char construct, i.e. comment delimiter
+                        or Sescape, etc.  Smax otherwise. */
   };
 \f
 /* These variables are a cache for finding the start of a defun.
@@ -176,7 +185,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object);
 static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool);
 static void scan_sexps_forward (struct lisp_parse_state *,
                                 ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT,
-                                bool, Lisp_Object, int);
+                                bool, int);
+static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *);
 static bool in_classes (int, Lisp_Object);
 static void parse_sexp_propertize (ptrdiff_t charpos);
 
@@ -911,10 +921,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	}
       do
 	{
+          internalize_parse_state (Qnil, &state);
 	  scan_sexps_forward (&state,
 			      defun_start, defun_start_byte,
 			      comment_end, TYPE_MINIMUM (EMACS_INT),
-			      0, Qnil, 0);
+			      0, 0);
 	  defun_start = comment_end;
 	  if (!adjusted)
 	    {
@@ -2314,7 +2325,9 @@ in_classes (int c, Lisp_Object iso_classes)
    into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR.
    Else, return false and store the charpos STOP into *CHARPOS_PTR, the
    corresponding bytepos into *BYTEPOS_PTR and the current nesting
-   (as defined for state.incomment) in *INCOMMENT_PTR.
+   (as defined for state->incomment) in *INCOMMENT_PTR.  The
+   SYNTAX_WITH_FLAGS of the last character scanned in the comment is
+   stored into *last_syntax_ptr.
 
    The comment end is the last character of the comment rather than the
    character just after the comment.
@@ -2326,7 +2339,7 @@ static bool
 forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	      EMACS_INT nesting, int style, int prev_syntax,
 	      ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr,
-	      EMACS_INT *incomment_ptr)
+	      EMACS_INT *incomment_ptr, int *last_syntax_ptr)
 {
   register int c, c1;
   register enum syntaxcode code;
@@ -2346,6 +2359,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  *incomment_ptr = nesting;
 	  *charpos_ptr = from;
 	  *bytepos_ptr = from_byte;
+          *last_syntax_ptr = syntax;
 	  return 0;
 	}
       c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
@@ -2415,6 +2429,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
     }
   *charpos_ptr = from;
   *bytepos_ptr = from_byte;
+  *last_syntax_ptr = syntax;
   return 1;
 }
 
@@ -2436,6 +2451,7 @@ between them, return t; otherwise return nil.  */)
   EMACS_INT count1;
   ptrdiff_t out_charpos, out_bytepos;
   EMACS_INT dummy;
+  int dummy2;
 
   CHECK_NUMBER (count);
   count1 = XINT (count);
@@ -2499,7 +2515,7 @@ between them, return t; otherwise return nil.  */)
 	}
       /* We're at the start of a comment.  */
       found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
-			    &out_charpos, &out_bytepos, &dummy);
+			    &out_charpos, &out_bytepos, &dummy, &dummy2);
       from = out_charpos; from_byte = out_bytepos;
       if (!found)
 	{
@@ -2659,6 +2675,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
   ptrdiff_t from_byte;
   ptrdiff_t out_bytepos, out_charpos;
   EMACS_INT dummy;
+  int dummy2;
   bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol;
 
   if (depth > 0) min_depth = 0;
@@ -2755,7 +2772,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
 	      UPDATE_SYNTAX_TABLE_FORWARD (from);
 	      found = forw_comment (from, from_byte, stop,
 				    comnested, comstyle, 0,
-				    &out_charpos, &out_bytepos, &dummy);
+				    &out_charpos, &out_bytepos, &dummy,
+                                    &dummy2);
 	      from = out_charpos, from_byte = out_bytepos;
 	      if (!found)
 		{
@@ -3119,7 +3137,7 @@ the prefix syntax flag (p).  */)
 }
 \f
 /* Parse forward from FROM / FROM_BYTE to END,
-   assuming that FROM has state OLDSTATE (nil means FROM is start of function),
+   assuming that FROM has state STATE,
    and return a description of the state of the parse at END.
    If STOPBEFORE, stop at the start of an atom.
    If COMMENTSTOP is 1, stop at the start of a comment.
@@ -3127,12 +3145,11 @@ the prefix syntax flag (p).  */)
    after the beginning of a string, or after the end of a string.  */
 
 static void
-scan_sexps_forward (struct lisp_parse_state *stateptr,
+scan_sexps_forward (struct lisp_parse_state *state,
 		    ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end,
 		    EMACS_INT targetdepth, bool stopbefore,
-		    Lisp_Object oldstate, int commentstop)
+		    int commentstop)
 {
-  struct lisp_parse_state state;
   enum syntaxcode code;
   int c1;
   bool comnested;
@@ -3148,7 +3165,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
   Lisp_Object tem;
   ptrdiff_t prev_from;		/* Keep one character before FROM.  */
   ptrdiff_t prev_from_byte;
-  int prev_from_syntax;
+  int prev_from_syntax, prev_prev_from_syntax;
   bool boundary_stop = commentstop == -1;
   bool nofence;
   bool found;
@@ -3165,6 +3182,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
 do { prev_from = from;				\
      prev_from_byte = from_byte; 		\
      temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte);	\
+     prev_prev_from_syntax = prev_from_syntax;  \
      prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \
      INC_BOTH (from, from_byte);		\
      if (from < end)				\
@@ -3174,88 +3192,38 @@ do { prev_from = from;				\
   immediate_quit = 1;
   QUIT;
 
-  if (NILP (oldstate))
-    {
-      depth = 0;
-      state.instring = -1;
-      state.incomment = 0;
-      state.comstyle = 0;	/* comment style a by default.  */
-      state.comstr_start = -1;	/* no comment/string seen.  */
-    }
-  else
-    {
-      tem = Fcar (oldstate);
-      if (!NILP (tem))
-	depth = XINT (tem);
-      else
-	depth = 0;
-
-      oldstate = Fcdr (oldstate);
-      oldstate = Fcdr (oldstate);
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      /* Check whether we are inside string_fence-style string: */
-      state.instring = (!NILP (tem)
-			? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
-			: -1);
+  depth = state->depth;
+  start_quoted = state->quoted;
+  prev_prev_from_syntax = Smax;
+  prev_from_syntax = state->prev_syntax;
 
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      state.incomment = (!NILP (tem)
-			 ? (INTEGERP (tem) ? XINT (tem) : -1)
-			 : 0);
-
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      start_quoted = !NILP (tem);
-
-      /* if the eighth element of the list is nil, we are in comment
-	 style a.  If it is non-nil, we are in comment style b */
-      oldstate = Fcdr (oldstate);
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      state.comstyle = (NILP (tem)
-			? 0
-			: (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
-			   ? XINT (tem)
-			   : ST_COMMENT_STYLE));
-
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      state.comstr_start =
-	RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      while (!NILP (tem))		/* >= second enclosing sexps.  */
-	{
-	  Lisp_Object temhd = Fcar (tem);
-	  if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
-	    curlevel->last = XINT (temhd);
-	  if (++curlevel == endlevel)
-	    curlevel--; /* error ("Nesting too deep for parser"); */
-	  curlevel->prev = -1;
-	  curlevel->last = -1;
-	  tem = Fcdr (tem);
-	}
+  tem = state->levelstarts;
+  while (!NILP (tem))		/* >= second enclosing sexps.  */
+    {
+      Lisp_Object temhd = Fcar (tem);
+      if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
+        curlevel->last = XINT (temhd);
+      if (++curlevel == endlevel)
+        curlevel--; /* error ("Nesting too deep for parser"); */
+      curlevel->prev = -1;
+      curlevel->last = -1;
+      tem = Fcdr (tem);
     }
-  state.quoted = 0;
-  mindepth = depth;
-
   curlevel->prev = -1;
   curlevel->last = -1;
 
-  SETUP_SYNTAX_TABLE (prev_from, 1);
-  temp = FETCH_CHAR (prev_from_byte);
-  prev_from_syntax = SYNTAX_WITH_FLAGS (temp);
-  UPDATE_SYNTAX_TABLE_FORWARD (from);
+  state->quoted = 0;
+  mindepth = depth;
+
+  SETUP_SYNTAX_TABLE (from, 1);
 
   /* Enter the loop at a place appropriate for initial state.  */
 
-  if (state.incomment)
+  if (state->incomment)
     goto startincomment;
-  if (state.instring >= 0)
+  if (state->instring >= 0)
     {
-      nofence = state.instring != ST_STRING_STYLE;
+      nofence = state->instring != ST_STRING_STYLE;
       if (start_quoted)
 	goto startquotedinstring;
       goto startinstring;
@@ -3266,11 +3234,8 @@ do { prev_from = from;				\
   while (from < end)
     {
       int syntax;
-      INC_FROM;
-      code = prev_from_syntax & 0xff;
 
-      if (from < end
-	  && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
+      if (SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
 	  && (c1 = FETCH_CHAR (from_byte),
 	      syntax = SYNTAX_WITH_FLAGS (c1),
 	      SYNTAX_FLAGS_COMSTART_SECOND (syntax)))
@@ -3280,32 +3245,39 @@ do { prev_from = from;				\
 	  /* Record the comment style we have entered so that only
 	     the comment-end sequence of the same style actually
 	     terminates the comment section.  */
-	  state.comstyle
+	  state->comstyle
 	    = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax);
 	  comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax)
 		       | SYNTAX_FLAGS_COMMENT_NESTED (syntax));
-	  state.incomment = comnested ? 1 : -1;
-	  state.comstr_start = prev_from;
+	  state->incomment = comnested ? 1 : -1;
+	  state->comstr_start = prev_from;
 	  INC_FROM;
+          prev_from_syntax = Smax; /* the syntax has already been
+                                      "used up". */
 	  code = Scomment;
 	}
-      else if (code == Scomment_fence)
-	{
-	  /* Record the comment style we have entered so that only
-	     the comment-end sequence of the same style actually
-	     terminates the comment section.  */
-	  state.comstyle = ST_COMMENT_STYLE;
-	  state.incomment = -1;
-	  state.comstr_start = prev_from;
-	  code = Scomment;
-	}
-      else if (code == Scomment)
-	{
-	  state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
-	  state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
-			     1 : -1);
-	  state.comstr_start = prev_from;
-	}
+      else
+        {
+          INC_FROM;
+          code = prev_from_syntax & 0xff;
+          if (code == Scomment_fence)
+            {
+              /* Record the comment style we have entered so that only
+                 the comment-end sequence of the same style actually
+                 terminates the comment section.  */
+              state->comstyle = ST_COMMENT_STYLE;
+              state->incomment = -1;
+              state->comstr_start = prev_from;
+              code = Scomment;
+            }
+          else if (code == Scomment)
+            {
+              state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
+              state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
+                                 1 : -1);
+              state->comstr_start = prev_from;
+            }
+        }
 
       if (SYNTAX_FLAGS_PREFIX (prev_from_syntax))
 	continue;
@@ -3350,25 +3322,28 @@ do { prev_from = from;				\
 
 	case Scomment_fence: /* Can't happen because it's handled above.  */
 	case Scomment:
-	  if (commentstop || boundary_stop) goto done;
+          if (commentstop || boundary_stop) goto done;
 	startincomment:
 	  /* The (from == BEGV) test was to enter the loop in the middle so
 	     that we find a 2-char comment ender even if we start in the
 	     middle of it.  We don't want to do that if we're just at the
 	     beginning of the comment (think of (*) ... (*)).  */
 	  found = forw_comment (from, from_byte, end,
-				state.incomment, state.comstyle,
-				(from == BEGV || from < state.comstr_start + 3)
-				? 0 : prev_from_syntax,
-				&out_charpos, &out_bytepos, &state.incomment);
+				state->incomment, state->comstyle,
+				from == BEGV ? 0 : prev_from_syntax,
+				&out_charpos, &out_bytepos, &state->incomment,
+                                &prev_from_syntax);
 	  from = out_charpos; from_byte = out_bytepos;
-	  /* Beware!  prev_from and friends are invalid now.
-	     Luckily, the `done' doesn't use them and the INC_FROM
-	     sets them to a sane value without looking at them. */
+	  /* Beware!  prev_from and friends (except prev_from_syntax)
+	     are invalid now.  Luckily, the `done' doesn't use them
+	     and the INC_FROM sets them to a sane value without
+	     looking at them. */
 	  if (!found) goto done;
 	  INC_FROM;
-	  state.incomment = 0;
-	  state.comstyle = 0;	/* reset the comment style */
+	  state->incomment = 0;
+	  state->comstyle = 0;	/* reset the comment style */
+          prev_from_syntax = Smax; /* Ensure "*|*" can't open a spurious new
+                                      comment. */
 	  if (boundary_stop) goto done;
 	  break;
 
@@ -3396,16 +3371,16 @@ do { prev_from = from;				\
 
 	case Sstring:
 	case Sstring_fence:
-	  state.comstr_start = from - 1;
+	  state->comstr_start = from - 1;
 	  if (stopbefore) goto stop;  /* this arg means stop at sexp start */
 	  curlevel->last = prev_from;
-	  state.instring = (code == Sstring
+	  state->instring = (code == Sstring
 			    ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte))
 			    : ST_STRING_STYLE);
 	  if (boundary_stop) goto done;
 	startinstring:
 	  {
-	    nofence = state.instring != ST_STRING_STYLE;
+	    nofence = state->instring != ST_STRING_STYLE;
 
 	    while (1)
 	      {
@@ -3419,7 +3394,7 @@ do { prev_from = from;				\
 		/* Check C_CODE here so that if the char has
 		   a syntax-table property which says it is NOT
 		   a string character, it does not end the string.  */
-		if (nofence && c == state.instring && c_code == Sstring)
+		if (nofence && c == state->instring && c_code == Sstring)
 		  break;
 
 		switch (c_code)
@@ -3442,7 +3417,7 @@ do { prev_from = from;				\
 	      }
 	  }
 	string_end:
-	  state.instring = -1;
+	  state->instring = -1;
 	  curlevel->prev = curlevel->last;
 	  INC_FROM;
 	  if (boundary_stop) goto done;
@@ -3461,25 +3436,96 @@ do { prev_from = from;				\
  stop:   /* Here if stopping before start of sexp. */
   from = prev_from;    /* We have just fetched the char that starts it; */
   from_byte = prev_from_byte;
+  prev_from_syntax = prev_prev_from_syntax;
   goto done; /* but return the position before it. */
 
  endquoted:
-  state.quoted = 1;
+  state->quoted = 1;
  done:
-  state.depth = depth;
-  state.mindepth = mindepth;
-  state.thislevelstart = curlevel->prev;
-  state.prevlevelstart
+  state->depth = depth;
+  state->mindepth = mindepth;
+  state->thislevelstart = curlevel->prev;
+  state->prevlevelstart
     = (curlevel == levelstart) ? -1 : (curlevel - 1)->last;
-  state.location = from;
-  state.location_byte = from_byte;
-  state.levelstarts = Qnil;
+  state->location = from;
+  state->location_byte = from_byte;
+  state->levelstarts = Qnil;
   while (curlevel > levelstart)
-    state.levelstarts = Fcons (make_number ((--curlevel)->last),
-			       state.levelstarts);
+    state->levelstarts = Fcons (make_number ((--curlevel)->last),
+                                state->levelstarts);
+  state->prev_syntax = (SYNTAX_FLAGS_COMSTARTEND_FIRST (prev_from_syntax)
+                        || state->quoted) ? prev_from_syntax : Smax;
   immediate_quit = 0;
+}
+
+/* Convert a (lisp) parse state to the internal form used in
+   scan_sexps_forward.  */
+static void
+internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state)
+{
+  Lisp_Object tem;
+
+  if (NILP (external))
+    {
+      state->depth = 0;
+      state->instring = -1;
+      state->incomment = 0;
+      state->quoted = 0;
+      state->comstyle = 0;	/* comment style a by default.  */
+      state->comstr_start = -1;	/* no comment/string seen.  */
+      state->levelstarts = Qnil;
+      state->prev_syntax = Smax;
+    }
+  else
+    {
+      tem = Fcar (external);
+      if (!NILP (tem))
+	state->depth = XINT (tem);
+      else
+	state->depth = 0;
+
+      external = Fcdr (external);
+      external = Fcdr (external);
+      external = Fcdr (external);
+      tem = Fcar (external);
+      /* Check whether we are inside string_fence-style string: */
+      state->instring = (!NILP (tem)
+                         ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
+                         : -1);
+
+      external = Fcdr (external);
+      tem = Fcar (external);
+      state->incomment = (!NILP (tem)
+                          ? (INTEGERP (tem) ? XINT (tem) : -1)
+                          : 0);
+
+      external = Fcdr (external);
+      tem = Fcar (external);
+      state->quoted = !NILP (tem);
 
-  *stateptr = state;
+      /* if the eighth element of the list is nil, we are in comment
+	 style a.  If it is non-nil, we are in comment style b */
+      external = Fcdr (external);
+      external = Fcdr (external);
+      tem = Fcar (external);
+      state->comstyle = (NILP (tem)
+                         ? 0
+                         : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
+                            ? XINT (tem)
+                            : ST_COMMENT_STYLE));
+
+      external = Fcdr (external);
+      tem = Fcar (external);
+      state->comstr_start =
+	RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
+      external = Fcdr (external);
+      tem = Fcar (external);
+      state->levelstarts = tem;
+
+      external = Fcdr (external);
+      tem = Fcar (external);
+      state->prev_syntax = NILP (tem) ? Smax : XINT (tem);
+    }
 }
 
 DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0,
@@ -3488,6 +3534,7 @@ Parsing stops at TO or when certain criteria are met;
  point is set to where parsing stops.
 If fifth arg OLDSTATE is omitted or nil,
  parsing assumes that FROM is the beginning of a function.
+
 Value is a list of elements describing final state of parsing:
  0. depth in parens.
  1. character address of start of innermost containing list; nil if none.
@@ -3501,16 +3548,22 @@ Value is a list of elements describing final state of parsing:
  6. the minimum paren-depth encountered during this scan.
  7. style of comment, if any.
  8. character address of start of comment or string; nil if not in one.
- 9. Intermediate data for continuation of parsing (subject to change).
+ 9. List of positions of currently open parens, outermost first.
+10. When the last position scanned holds the first character of a
+    (potential) two character construct, the syntax of that position,
+    otherwise nil.  That construct can be a two character comment
+    delimiter or an Escaped or Char-quoted character.
+11..... Possible further internal information used by `parse-partial-sexp'.
+
 If third arg TARGETDEPTH is non-nil, parsing stops if the depth
 in parentheses becomes equal to TARGETDEPTH.
-Fourth arg STOPBEFORE non-nil means stop when come to
+Fourth arg STOPBEFORE non-nil means stop when we come to
  any character that starts a sexp.
 Fifth arg OLDSTATE is a list like what this function returns.
  It is used to initialize the state of the parse.  Elements number 1, 2, 6
  are ignored.
-Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
- If it is symbol `syntax-table', stop after the start of a comment or a
+Sixth arg COMMENTSTOP non-nil means stop after the start of a comment.
+ If it is the symbol `syntax-table', stop after the start of a comment or a
  string, or after end of a comment or a string.  */)
   (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth,
    Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop)
@@ -3527,15 +3580,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
     target = TYPE_MINIMUM (EMACS_INT);	/* We won't reach this depth.  */
 
   validate_region (&from, &to);
+  internalize_parse_state (oldstate, &state);
   scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)),
 		      XINT (to),
-		      target, !NILP (stopbefore), oldstate,
+		      target, !NILP (stopbefore),
 		      (NILP (commentstop)
 		       ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1)));
 
   SET_PT_BOTH (state.location, state.location_byte);
 
-  return Fcons (make_number (state.depth),
+  return
+    Fcons (make_number (state.depth),
 	   Fcons (state.prevlevelstart < 0
 		  ? Qnil : make_number (state.prevlevelstart),
 	     Fcons (state.thislevelstart < 0
@@ -3553,11 +3608,15 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment.
 				  ? Qsyntax_table
 				  : make_number (state.comstyle))
 			       : Qnil),
-			      Fcons (((state.incomment
-				       || (state.instring >= 0))
-				      ? make_number (state.comstr_start)
-				      : Qnil),
-				     Fcons (state.levelstarts, Qnil))))))))));
+		         Fcons (((state.incomment
+                                  || (state.instring >= 0))
+                                 ? make_number (state.comstr_start)
+                                 : Qnil),
+			   Fcons (state.levelstarts,
+                             Fcons (state.prev_syntax == Smax
+                                    ? Qnil
+                                    : make_number (state.prev_syntax),
+                                Qnil)))))))))));
 }
 \f
 void



-- 
Alan Mackenzie (Nuremberg, Germany).

