Re: newline cache - Eli Zaretskii

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Eli Zaretskii <eliz@gnu.org>
To: rms@gnu.org
Cc: emacs-devel@gnu.org
Subject: Re: newline cache
Date: Tue, 22 Apr 2014 20:44:25 +0300	[thread overview]
Message-ID: <838uqxb6o6.fsf@gnu.org> (raw)
In-Reply-To: <E1WcTOw-0004HM-Fa@fencepost.gnu.org>

> Date: Tue, 22 Apr 2014 01:37:50 -0400
> From: Richard Stallman <rms@gnu.org>
> CC: emacs-devel@gnu.org
> 
>     I've seen a couple of problems in the current pretest which
>     disappeared once I turned off the cache in the Rmail buffer.  I've
>     tried to see what does Rmail do that triggers this, but came up
>     empty-handed.
> 
> I investigated  what was happening when the bug occurred.
> (forward-line 1) moved across an extra newline, down to point-max.

When forward-line errs, it's too late, because the cache is already
corrupted.  We need to catch the primitive that causes the corruption.

> But I don't know how the newline cache works, so I did not try to
> study that code.

The cache code is in region-cache.c.

> One possible approach is to write a builtin function to check the newline cache
> contents for valid correspondence to the buffer contents.
> I would not mind making Rmail run that in a post-command-hook.
> Then I could find out which command causes the cache to become incorrect.
> 
> But someone else will have to write that builtin function.

I did it, and installed that on the emacs-24 branch (will be merged to
trunk soon).  If you are tracking the trunk, and don't want to wait
for the merge, you can install the patch below in your sources and
rebuild.

The function I added returns an array with 2 sub-arrays, one with
newline positions according to the cache, the other with newline
positions according to the actual buffer contents.  Check these two
sub-arrays to be 'equal', and if they aren't, the cache was corrupted.

Here's the patch that I installed:

--- src/search.c	2014-03-16 16:28:34 +0000
+++ src/search.c	2014-04-22 17:37:35 +0000
@@ -3108,6 +3108,170 @@ DEFUN ("regexp-quote", Fregexp_quote, Sr
 				out - temp,
 				STRING_MULTIBYTE (string));
 }
+
+/* Like find_newline, but doesn't use the cache, and only searches forward.  */
+static ptrdiff_t
+find_newline1 (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
+	       ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *shortage,
+	       ptrdiff_t *bytepos, bool allow_quit)
+{
+  if (count > 0)
+    {
+      if (!end)
+	end = ZV, end_byte = ZV_BYTE;
+    }
+  else
+    {
+      if (!end)
+	end = BEGV, end_byte = BEGV_BYTE;
+    }
+  if (end_byte == -1)
+    end_byte = CHAR_TO_BYTE (end);
+
+  if (shortage != 0)
+    *shortage = 0;
+
+  immediate_quit = allow_quit;
+
+  if (count > 0)
+    while (start != end)
+      {
+        /* Our innermost scanning loop is very simple; it doesn't know
+           about gaps, buffer ends, or the newline cache.  ceiling is
+           the position of the last character before the next such
+           obstacle --- the last character the dumb search loop should
+           examine.  */
+	ptrdiff_t tem, ceiling_byte = end_byte - 1;
+
+	if (start_byte == -1)
+	  start_byte = CHAR_TO_BYTE (start);
+
+        /* The dumb loop can only scan text stored in contiguous
+           bytes. BUFFER_CEILING_OF returns the last character
+           position that is contiguous, so the ceiling is the
+           position after that.  */
+	tem = BUFFER_CEILING_OF (start_byte);
+	ceiling_byte = min (tem, ceiling_byte);
+
+        {
+          /* The termination address of the dumb loop.  */
+	  unsigned char *lim_addr = BYTE_POS_ADDR (ceiling_byte) + 1;
+	  ptrdiff_t lim_byte = ceiling_byte + 1;
+
+	  /* Nonpositive offsets (relative to LIM_ADDR and LIM_BYTE)
+	     of the base, the cursor, and the next line.  */
+	  ptrdiff_t base = start_byte - lim_byte;
+	  ptrdiff_t cursor, next;
+
+	  for (cursor = base; cursor < 0; cursor = next)
+	    {
+              /* The dumb loop.  */
+	      unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor);
+	      next = nl ? nl - lim_addr : 0;
+
+              if (! nl)
+		break;
+	      next++;
+
+	      if (--count == 0)
+		{
+		  immediate_quit = 0;
+		  if (bytepos)
+		    *bytepos = lim_byte + next;
+		  return BYTE_TO_CHAR (lim_byte + next);
+		}
+            }
+
+	  start_byte = lim_byte;
+	  start = BYTE_TO_CHAR (start_byte);
+        }
+      }
+
+  immediate_quit = 0;
+  if (shortage)
+    *shortage = count;
+  if (bytepos)
+    {
+      *bytepos = start_byte == -1 ? CHAR_TO_BYTE (start) : start_byte;
+      eassert (*bytepos == CHAR_TO_BYTE (start));
+    }
+  return start;
+}
+
+DEFUN ("newline-cache-check", Fnewline_cache_check, Snewline_cache_check,
+       0, 1, 0,
+       doc: /* Check the newline cache of BUFFER against buffer contents.
+
+BUFFER defaults to the current buffer.
+
+Value is an array of 2 sub-arrays of buffer positions for newlines,
+the first based on the cache, the second based on actually scanning
+the buffer.  If the buffer doesn't have a cache, the value is nil.  */)
+  (Lisp_Object buffer)
+{
+  struct buffer *buf;
+  struct region_cache *nlcache;
+  ptrdiff_t shortage, nl_count_cache, nl_count_buf;
+  Lisp_Object cache_newlines, buf_newlines, val;
+  ptrdiff_t from, from_byte, found, i;
+
+  if (NILP (buffer))
+    buf = current_buffer;
+  else
+    {
+      CHECK_BUFFER (buffer);
+      buf = XBUFFER (buffer);
+    }
+  if (buf->base_buffer)
+    buf = buf->base_buffer;
+
+  /* If the buffer doesn't have a newline cache, return nil.  */
+  if (NILP (BVAR (buf, cache_long_scans))
+      || buf->newline_cache == NULL)
+    return Qnil;
+
+  /* How many newlines are there according to the cache?  */
+  find_newline (BUF_BEG (buf), BUF_BEG_BYTE (buf),
+		BUF_Z (buf), BUF_Z_BYTE (buf),
+		TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_cache = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+
+  /* Create vector and populate it.  */
+  cache_newlines = make_uninit_vector (nl_count_cache);
+  for (from = BUF_BEG( buf), found = from, i = 0;
+       from < BUF_Z (buf);
+       from = found, i++)
+    {
+      ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+      found = find_newline (from, from_byte, 0, -1, 1, &shortage, NULL, true);
+      if (shortage == 0)
+	ASET (cache_newlines, i, make_number (found - 1));
+    }
+
+  /* Now do the same, but without using the cache.  */
+  find_newline1 (BUF_BEG (buf), BUF_BEG_BYTE (buf),
+		 BUF_Z (buf), BUF_Z_BYTE (buf),
+		 TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_buf = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+  buf_newlines = make_uninit_vector (nl_count_buf);
+  for (from = BUF_BEG( buf), found = from, i = 0;
+       from < BUF_Z (buf);
+       from = found, i++)
+    {
+      ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+      found = find_newline1 (from, from_byte, 0, -1, 1, &shortage, NULL, true);
+      if (shortage == 0)
+	ASET (buf_newlines, i, make_number (found - 1));
+    }
+
+  /* Construct the value and return it.  */
+  val = make_uninit_vector (2);
+  ASET (val, 0, cache_newlines);
+  ASET (val, 1, buf_newlines);
+  return val;
+}
 \f
 void
 syms_of_search (void)
@@ -3180,4 +3344,5 @@ is to bind it with `let' around a small 
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
   defsubr (&Sregexp_quote);
+  defsubr (&Snewline_cache_check);
 }

next prev parent reply	other threads:[~2014-04-22 17:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-21 16:28 newline cache Richard Stallman
2014-04-21 16:55 ` Eli Zaretskii
2014-04-22  5:37   ` Richard Stallman
2014-04-22 14:28     ` Stefan Monnier
2014-04-22 17:46       ` Eli Zaretskii
2014-04-22 17:44     ` Eli Zaretskii [this message]
2014-04-23  5:31       ` Richard Stallman
2014-04-23 15:14         ` Eli Zaretskii
2014-04-23  5:31       ` Richard Stallman
2014-04-23 15:23         ` Eli Zaretskii
2014-04-24  0:33           ` Richard Stallman
2014-04-25  9:20             ` Eli Zaretskii
2014-04-26 19:58               ` Eli Zaretskii
2014-04-27  2:42                 ` Eli Zaretskii
2014-04-29  8:41                   ` Jarek Czekalski
2014-04-29 14:25                     ` Eli Zaretskii
2014-05-21  8:41                   ` Damien Wyart
2014-05-21 13:09                     ` Stefan Monnier
2014-05-21 15:30                       ` Eli Zaretskii
2014-05-21 15:22                     ` Eli Zaretskii
2014-05-21 15:22                     ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=838uqxb6o6.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=rms@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.