Re: newline cache - Eli Zaretskii

From: Eli Zaretskii <eliz@gnu.org>
To: rms@gnu.org
Cc: emacs-devel@gnu.org
Subject: Re: newline cache
Date: Wed, 23 Apr 2014 18:23:42 +0300
Message-ID: <837g6g9iip.fsf@gnu.org>
In-Reply-To: <E1WcpmH-0000VT-KK@fencepost.gnu.org>

> Date: Wed, 23 Apr 2014 01:31:25 -0400
> From: Richard Stallman <rms@gnu.org>
> CC: emacs-devel@gnu.org
> 
>     The function I added returns an array with 2 sub-arrays, one with
>     newline positions according to the cache, the other with newline
>     positions according to the actual buffer contents.
> 
> That may take a very long time for my RMAIL buffer, which is 4 meg.  I
> don't think I could tolerate having that run after each Rmail command.

I suggest trying it; you might be surprised.  I tried it on a 7MB mbox
file and didn't see any significant slowdown.  The reason is simple:
the mbox buffer is almost always narrowed, and find_newline, which is
the workhorse of the function I wrote and also the main suspect, only
looks within the restriction.
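
To illustrate why narrowing bounds the cost, here is a minimal
standalone sketch (not Emacs code) of a memchr-based scan whose work
depends only on the region it is given.  The name
count_newlines_in_region and the plain char array standing in for
buffer text are assumptions made for the illustration; the real
find_newline also has to deal with the gap and the cache.

```c
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes in TEXT[BEG, END).  TEXT is a hypothetical flat
   array standing in for buffer text; BEG and END play the role of
   the narrowed region's limits.  The caller must ensure BEG <= END.  */
static size_t
count_newlines_in_region (const char *text, size_t beg, size_t end)
{
  size_t count = 0;
  const char *p = text + beg;
  const char *lim = text + end;

  while (p < lim)
    {
      /* memchr never looks past LIM, so text outside the region
         costs nothing.  */
      const char *nl = memchr (p, '\n', (size_t) (lim - p));
      if (!nl)
        break;
      count++;
      p = nl + 1;
    }
  return count;
}
```

With a 7MB array and a region covering a single message, only the
bytes of that message are examined.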

In any case, if you decide to use the debugging code, please use the
patch below, which fixes a stupid thinko in the previous version.

> To make this fast enough for me to use it to localize the bug, it
> needs to operate only on a specified part of the buffer.

Narrowing already does that.  Anyway, it is impossible to predict in
advance in which portions of the buffer the corruption will happen, at
least not with the level of understanding of this bug that I have now.
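
Once the two sub-arrays are available, though, the first place where
they disagree can be located after the fact.  A minimal standalone
sketch, assuming the positions have been copied into plain C arrays;
first_mismatch is a hypothetical helper and is not part of the patch:

```c
#include <stddef.h>

/* Return the index of the first slot where the two position arrays
   disagree, or -1 if they are identical.  A length mismatch counts
   as a disagreement at the index where the shorter array ends.  */
static ptrdiff_t
first_mismatch (const ptrdiff_t *cache_pos, size_t n_cache,
                const ptrdiff_t *buf_pos, size_t n_buf)
{
  size_t n = n_cache < n_buf ? n_cache : n_buf;

  for (size_t i = 0; i < n; i++)
    if (cache_pos[i] != buf_pos[i])
      return (ptrdiff_t) i;

  return n_cache == n_buf ? -1 : (ptrdiff_t) n;
}
```

The position stored at the returned index would then point at the
first newline the cache got wrong, which is where to start looking.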

FWIW, I played with this post-command-hook in a large mbox buffer, and
couldn't reproduce any problems yet.  So something else might be at
work here.

Here's an updated patch:

=== modified file 'src/search.c'
--- src/search.c	2014-03-16 16:28:34 +0000
+++ src/search.c	2014-04-23 15:21:25 +0000
@@ -3108,6 +3108,187 @@ DEFUN ("regexp-quote", Fregexp_quote, Sr
 				out - temp,
 				STRING_MULTIBYTE (string));
 }
+
+/* Like find_newline, but doesn't use the cache, and only searches forward.  */
+static ptrdiff_t
+find_newline1 (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
+	       ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *shortage,
+	       ptrdiff_t *bytepos, bool allow_quit)
+{
+  if (count > 0)
+    {
+      if (!end)
+	end = ZV, end_byte = ZV_BYTE;
+    }
+  else
+    {
+      if (!end)
+	end = BEGV, end_byte = BEGV_BYTE;
+    }
+  if (end_byte == -1)
+    end_byte = CHAR_TO_BYTE (end);
+
+  if (shortage != 0)
+    *shortage = 0;
+
+  immediate_quit = allow_quit;
+
+  if (count > 0)
+    while (start != end)
+      {
+        /* Our innermost scanning loop is very simple; it doesn't know
+           about gaps, buffer ends, or the newline cache.  ceiling is
+           the position of the last character before the next such
+           obstacle --- the last character the dumb search loop should
+           examine.  */
+	ptrdiff_t tem, ceiling_byte = end_byte - 1;
+
+	if (start_byte == -1)
+	  start_byte = CHAR_TO_BYTE (start);
+
+        /* The dumb loop can only scan text stored in contiguous
+           bytes.  BUFFER_CEILING_OF returns the last character
+           position that is contiguous, so the ceiling is the
+           position after that.  */
+	tem = BUFFER_CEILING_OF (start_byte);
+	ceiling_byte = min (tem, ceiling_byte);
+
+        {
+          /* The termination address of the dumb loop.  */
+	  unsigned char *lim_addr = BYTE_POS_ADDR (ceiling_byte) + 1;
+	  ptrdiff_t lim_byte = ceiling_byte + 1;
+
+	  /* Nonpositive offsets (relative to LIM_ADDR and LIM_BYTE)
+	     of the base, the cursor, and the next line.  */
+	  ptrdiff_t base = start_byte - lim_byte;
+	  ptrdiff_t cursor, next;
+
+	  for (cursor = base; cursor < 0; cursor = next)
+	    {
+              /* The dumb loop.  */
+	      unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor);
+	      next = nl ? nl - lim_addr : 0;
+
+              if (! nl)
+		break;
+	      next++;
+
+	      if (--count == 0)
+		{
+		  immediate_quit = 0;
+		  if (bytepos)
+		    *bytepos = lim_byte + next;
+		  return BYTE_TO_CHAR (lim_byte + next);
+		}
+            }
+
+	  start_byte = lim_byte;
+	  start = BYTE_TO_CHAR (start_byte);
+        }
+      }
+
+  immediate_quit = 0;
+  if (shortage)
+    *shortage = count;
+  if (bytepos)
+    {
+      *bytepos = start_byte == -1 ? CHAR_TO_BYTE (start) : start_byte;
+      eassert (*bytepos == CHAR_TO_BYTE (start));
+    }
+  return start;
+}
+
+DEFUN ("newline-cache-check", Fnewline_cache_check, Snewline_cache_check,
+       0, 1, 0,
+       doc: /* Check the newline cache of BUFFER against buffer contents.
+
+BUFFER defaults to the current buffer.
+
+Value is an array of 2 sub-arrays of buffer positions for newlines,
+the first based on the cache, the second based on actually scanning
+the buffer.  If the buffer doesn't have a cache, the value is nil.  */)
+  (Lisp_Object buffer)
+{
+  struct buffer *buf, *old = NULL;
+  ptrdiff_t shortage, nl_count_cache, nl_count_buf;
+  Lisp_Object cache_newlines, buf_newlines, val;
+  ptrdiff_t from, found, i;
+
+  if (NILP (buffer))
+    buf = current_buffer;
+  else
+    {
+      CHECK_BUFFER (buffer);
+      buf = XBUFFER (buffer);
+      old = current_buffer;
+    }
+  if (buf->base_buffer)
+    buf = buf->base_buffer;
+
+  /* If the buffer doesn't have a newline cache, return nil.  */
+  if (NILP (BVAR (buf, cache_long_scans))
+      || buf->newline_cache == NULL)
+    return Qnil;
+
+  /* find_newline can only work on the current buffer.  */
+  if (old != NULL)
+    set_buffer_internal_1 (buf);
+
+  /* How many newlines are there according to the cache?  */
+  find_newline (BEGV, BEGV_BYTE, ZV, ZV_BYTE,
+		TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_cache = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+
+  /* Create vector and populate it.  */
+  cache_newlines = make_uninit_vector (nl_count_cache);
+
+  if (nl_count_cache)
+    {
+      for (from = BEGV, found = from, i = 0; from < ZV; from = found, i++)
+	{
+	  ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+	  found = find_newline (from, from_byte, 0, -1, 1, &shortage,
+				NULL, true);
+	  if (shortage != 0 || i >= nl_count_cache)
+	    break;
+	  ASET (cache_newlines, i, make_number (found - 1));
+	}
+      /* Fill the rest of the slots with an invalid position.  */
+      for ( ; i < nl_count_cache; i++)
+	ASET (cache_newlines, i, make_number (-1));
+    }
+
+  /* Now do the same, but without using the cache.  */
+  find_newline1 (BEGV, BEGV_BYTE, ZV, ZV_BYTE,
+		 TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_buf = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+  buf_newlines = make_uninit_vector (nl_count_buf);
+  if (nl_count_buf)
+    {
+      for (from = BEGV, found = from, i = 0; from < ZV; from = found, i++)
+	{
+	  ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+	  found = find_newline1 (from, from_byte, 0, -1, 1, &shortage,
+				 NULL, true);
+	  if (shortage != 0 || i >= nl_count_buf)
+	    break;
+	  ASET (buf_newlines, i, make_number (found - 1));
+	}
+      for ( ; i < nl_count_buf; i++)
+	ASET (buf_newlines, i, make_number (-1));
+    }
+
+  /* Construct the value and return it.  */
+  val = make_uninit_vector (2);
+  ASET (val, 0, cache_newlines);
+  ASET (val, 1, buf_newlines);
+
+  if (old != NULL)
+    set_buffer_internal_1 (old);
+  return val;
+}
 \f
 void
 syms_of_search (void)
@@ -3180,4 +3361,5 @@ is to bind it with `let' around a small 
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
   defsubr (&Sregexp_quote);
+  defsubr (&Snewline_cache_check);
 }
