From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: newline cache
Date: Tue, 22 Apr 2014 20:44:25 +0300
Message-ID: <838uqxb6o6.fsf@gnu.org>
References: <E1WcH4d-0004ok-KH@fencepost.gnu.org> <837g6id3mi.fsf@gnu.org>
	<E1WcTOw-0004HM-Fa@fencepost.gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
X-Trace: ger.gmane.org 1398188699 4484 80.91.229.3 (22 Apr 2014 17:44:59 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 22 Apr 2014 17:44:59 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: rms@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Apr 22 19:44:52 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WcekT-0004HL-KO
	for ged-emacs-devel@m.gmane.org; Tue, 22 Apr 2014 19:44:49 +0200
Original-Received: from localhost ([::1]:56471 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WcekT-0003mp-AQ
	for ged-emacs-devel@m.gmane.org; Tue, 22 Apr 2014 13:44:49 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33947)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1WcekL-0003mi-3q
	for emacs-devel@gnu.org; Tue, 22 Apr 2014 13:44:47 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1WcekD-0006SQ-JZ
	for emacs-devel@gnu.org; Tue, 22 Apr 2014 13:44:41 -0400
Original-Received: from mtaout24.012.net.il ([80.179.55.180]:34643)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1WcekD-0006SK-4T; Tue, 22 Apr 2014 13:44:33 -0400
Original-Received: from conversion-daemon.mtaout24.012.net.il by mtaout24.012.net.il
	(HyperSendmail v2007.08) id <0N4G00400123ZZ00@mtaout24.012.net.il>;
	Tue, 22 Apr 2014 20:42:45 +0300 (IDT)
Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by mtaout24.012.net.il
	(HyperSendmail v2007.08) with ESMTPA id
	<0N4G005DY179A400@mtaout24.012.net.il>;
	Tue, 22 Apr 2014 20:42:45 +0300 (IDT)
In-reply-to: <E1WcTOw-0004HM-Fa@fencepost.gnu.org>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 80.179.55.180
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171583
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171583>

> Date: Tue, 22 Apr 2014 01:37:50 -0400
> From: Richard Stallman <rms@gnu.org>
> CC: emacs-devel@gnu.org
> 
>     I've seen a couple of problems in the current pretest which
>     disappeared once I turned off the cache in the Rmail buffer.  I've
>     tried to see what does Rmail do that triggers this, but came up
>     empty-handed.
> 
> I investigated  what was happening when the bug occurred.
> (forward-line 1) moved across an extra newline, down to point-max.

When forward-line errs, it's too late, because the cache is already
corrupted.  We need to catch the primitive that causes the corruption.

> But I don't know how the newline cache works, so I did not try to
> study that code.

The cache code is in region-cache.c.

> One possible approach is to write a builtin function to check the newline cache
> contents for valid correspondence to the buffer contents.
> I would not mind making Rmail run that in a post-command-hook.
> Then I could find out which command causes the cache to become incorrect.
> 
> But someone else will have to write that builtin function.

I did it, and installed that on the emacs-24 branch (will be merged to
trunk soon).  If you are tracking the trunk, and don't want to wait
for the merge, you can install the patch below in your sources and
rebuild.

The function I added returns an array with 2 sub-arrays, one with
newline positions according to the cache, the other with newline
positions according to the actual buffer contents.  Check these two
sub-arrays to be 'equal', and if they aren't, the cache was corrupted.

Here's the patch that I installed:

--- src/search.c	2014-03-16 16:28:34 +0000
+++ src/search.c	2014-04-22 17:37:35 +0000
@@ -3108,6 +3108,170 @@ DEFUN ("regexp-quote", Fregexp_quote, Sr
 				out - temp,
 				STRING_MULTIBYTE (string));
 }
+
+/* Like find_newline, but doesn't use the cache, and only searches forward.  */
+static ptrdiff_t
+find_newline1 (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
+	       ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *shortage,
+	       ptrdiff_t *bytepos, bool allow_quit)
+{
+  if (count > 0)
+    {
+      if (!end)
+	end = ZV, end_byte = ZV_BYTE;
+    }
+  else
+    {
+      if (!end)
+	end = BEGV, end_byte = BEGV_BYTE;
+    }
+  if (end_byte == -1)
+    end_byte = CHAR_TO_BYTE (end);
+
+  if (shortage != 0)
+    *shortage = 0;
+
+  immediate_quit = allow_quit;
+
+  if (count > 0)
+    while (start != end)
+      {
+        /* Our innermost scanning loop is very simple; it doesn't know
+           about gaps, buffer ends, or the newline cache.  ceiling is
+           the position of the last character before the next such
+           obstacle --- the last character the dumb search loop should
+           examine.  */
+	ptrdiff_t tem, ceiling_byte = end_byte - 1;
+
+	if (start_byte == -1)
+	  start_byte = CHAR_TO_BYTE (start);
+
+        /* The dumb loop can only scan text stored in contiguous
+           bytes. BUFFER_CEILING_OF returns the last character
+           position that is contiguous, so the ceiling is the
+           position after that.  */
+	tem = BUFFER_CEILING_OF (start_byte);
+	ceiling_byte = min (tem, ceiling_byte);
+
+        {
+          /* The termination address of the dumb loop.  */
+	  unsigned char *lim_addr = BYTE_POS_ADDR (ceiling_byte) + 1;
+	  ptrdiff_t lim_byte = ceiling_byte + 1;
+
+	  /* Nonpositive offsets (relative to LIM_ADDR and LIM_BYTE)
+	     of the base, the cursor, and the next line.  */
+	  ptrdiff_t base = start_byte - lim_byte;
+	  ptrdiff_t cursor, next;
+
+	  for (cursor = base; cursor < 0; cursor = next)
+	    {
+              /* The dumb loop.  */
+	      unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor);
+	      next = nl ? nl - lim_addr : 0;
+
+              if (! nl)
+		break;
+	      next++;
+
+	      if (--count == 0)
+		{
+		  immediate_quit = 0;
+		  if (bytepos)
+		    *bytepos = lim_byte + next;
+		  return BYTE_TO_CHAR (lim_byte + next);
+		}
+            }
+
+	  start_byte = lim_byte;
+	  start = BYTE_TO_CHAR (start_byte);
+        }
+      }
+
+  immediate_quit = 0;
+  if (shortage)
+    *shortage = count;
+  if (bytepos)
+    {
+      *bytepos = start_byte == -1 ? CHAR_TO_BYTE (start) : start_byte;
+      eassert (*bytepos == CHAR_TO_BYTE (start));
+    }
+  return start;
+}
+
+DEFUN ("newline-cache-check", Fnewline_cache_check, Snewline_cache_check,
+       0, 1, 0,
+       doc: /* Check the newline cache of BUFFER against buffer contents.
+
+BUFFER defaults to the current buffer.
+
+Value is an array of 2 sub-arrays of buffer positions for newlines,
+the first based on the cache, the second based on actually scanning
+the buffer.  If the buffer doesn't have a cache, the value is nil.  */)
+  (Lisp_Object buffer)
+{
+  struct buffer *buf;
+  struct region_cache *nlcache;
+  ptrdiff_t shortage, nl_count_cache, nl_count_buf;
+  Lisp_Object cache_newlines, buf_newlines, val;
+  ptrdiff_t from, from_byte, found, i;
+
+  if (NILP (buffer))
+    buf = current_buffer;
+  else
+    {
+      CHECK_BUFFER (buffer);
+      buf = XBUFFER (buffer);
+    }
+  if (buf->base_buffer)
+    buf = buf->base_buffer;
+
+  /* If the buffer doesn't have a newline cache, return nil.  */
+  if (NILP (BVAR (buf, cache_long_scans))
+      || buf->newline_cache == NULL)
+    return Qnil;
+
+  /* How many newlines are there according to the cache?  */
+  find_newline (BUF_BEG (buf), BUF_BEG_BYTE (buf),
+		BUF_Z (buf), BUF_Z_BYTE (buf),
+		TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_cache = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+
+  /* Create vector and populate it.  */
+  cache_newlines = make_uninit_vector (nl_count_cache);
+  for (from = BUF_BEG( buf), found = from, i = 0;
+       from < BUF_Z (buf);
+       from = found, i++)
+    {
+      ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+      found = find_newline (from, from_byte, 0, -1, 1, &shortage, NULL, true);
+      if (shortage == 0)
+	ASET (cache_newlines, i, make_number (found - 1));
+    }
+
+  /* Now do the same, but without using the cache.  */
+  find_newline1 (BUF_BEG (buf), BUF_BEG_BYTE (buf),
+		 BUF_Z (buf), BUF_Z_BYTE (buf),
+		 TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_buf = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+  buf_newlines = make_uninit_vector (nl_count_buf);
+  for (from = BUF_BEG( buf), found = from, i = 0;
+       from < BUF_Z (buf);
+       from = found, i++)
+    {
+      ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+      found = find_newline1 (from, from_byte, 0, -1, 1, &shortage, NULL, true);
+      if (shortage == 0)
+	ASET (buf_newlines, i, make_number (found - 1));
+    }
+
+  /* Construct the value and return it.  */
+  val = make_uninit_vector (2);
+  ASET (val, 0, cache_newlines);
+  ASET (val, 1, buf_newlines);
+  return val;
+}
 
 void
 syms_of_search (void)
@@ -3180,4 +3344,5 @@ is to bind it with `let' around a small 
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
   defsubr (&Sregexp_quote);
+  defsubr (&Snewline_cache_check);
 }