all messages for Emacs-related lists mirrored at yhetil.org
* immediate strings #2
@ 2011-11-28  9:11 Dmitry Antipov
  2011-11-28 17:33 ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Antipov @ 2011-11-28  9:11 UTC
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 7098 bytes --]

Here is the next version of the immediate strings patch, with further
improvements suggested by Paul. As mentioned before, strings of up to 21 bytes
on 64-bit and up to 9 bytes on 32-bit can be immediate (the trailing '\0' is
not counted). Note that this code assumes sizeof (EMACS_INT) is equal to
sizeof (void *), so it's not compatible with WIDE_EMACS_INT.
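To make the limits above concrete: the patch defines STRING_IMM_MAX as
3 * sizeof (EMACS_INT) - 2, a count that includes the terminating '\0'. The two
subtracted bytes appear to leave room for the two 7-bit size fields, so the
immediate variant occupies the same three words as size/size_byte/data. A
minimal sketch of the arithmetic, assuming word sizes of 8 and 4 bytes; the
helper names are hypothetical, not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

/* The patch's STRING_IMM_MAX, 3 * sizeof (EMACS_INT) - 2, parameterized
   on the word size (8 on 64-bit, 4 on 32-bit).  The result counts the
   terminating '\0'.  */
static size_t
string_imm_max (size_t word_size)
{
  assert (word_size == 4 || word_size == 8);
  return 3 * word_size - 2;
}

/* Longest payload that can be immediate: one byte goes to the '\0'.  */
static size_t
max_immediate_bytes (size_t word_size)
{
  return string_imm_max (word_size) - 1;
}

/* make_uninit_multibyte_string in the patch picks the immediate form
   when nbytes < STRING_IMM_MAX.  */
static int
fits_immediate (size_t nbytes, size_t word_size)
{
  return nbytes < string_imm_max (word_size);
}
```

This yields 22 (21 usable bytes) on 64-bit and 10 (9 usable bytes) on 32-bit,
matching the numbers above.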

Since there were reasonable doubts about whether this stuff is practically
useful, I ran two benchmarks. The first one is a simple string allocation
benchmark, attached as stringbench.el. The second one is a compilation of
everything in the lisp subdirectory with byte-force-recompile. Everything was
tested with 64-bit executables and the '-Q -batch' command line options.

Configuration: ./configure --prefix=/not/exists --without-sound --without-pop \
                --with-x-toolkit=lucid --without-dbus --without-libotf \
                --without-selinux --without-xft --without-gsettings \
                --without-gnutls --without-rsvg --without-xml2
Compiler: gcc 4.6.1, optimization flags -O3

Old executable size is 12855360 bytes, new executable size is 12904512 bytes
(0.38% larger code size).

* Benchmark 1, 8 runs for each executable:

-- Old --

33.24user 0.23system 0:33.72elapsed 99%CPU (0avgtext+0avgdata 368268maxresident)k
0inputs+0outputs (0major+112338minor)pagefaults 0swaps
32.29user 0.25system 0:32.77elapsed 99%CPU (0avgtext+0avgdata 338012maxresident)k
0inputs+0outputs (0major+124684minor)pagefaults 0swaps
33.31user 0.24system 0:33.80elapsed 99%CPU (0avgtext+0avgdata 330612maxresident)k
0inputs+0outputs (0major+120164minor)pagefaults 0swaps
33.91user 0.24system 0:34.41elapsed 99%CPU (0avgtext+0avgdata 351588maxresident)k
0inputs+0outputs (0major+125401minor)pagefaults 0swaps
33.17user 0.27system 0:33.69elapsed 99%CPU (0avgtext+0avgdata 331480maxresident)k
0inputs+0outputs (0major+120374minor)pagefaults 0swaps
33.26user 0.31system 0:33.83elapsed 99%CPU (0avgtext+0avgdata 332956maxresident)k
0inputs+0outputs (0major+148027minor)pagefaults 0swaps
33.38user 0.28system 0:33.90elapsed 99%CPU (0avgtext+0avgdata 334400maxresident)k
0inputs+0outputs (0major+133420minor)pagefaults 0swaps
33.13user 0.23system 0:33.61elapsed 99%CPU (0avgtext+0avgdata 331132maxresident)k
0inputs+0outputs (0major+120341minor)pagefaults 0swaps

-- New --

32.59user 0.35system 0:33.18elapsed 99%CPU (0avgtext+0avgdata 332528maxresident)k
0inputs+0outputs (0major+149273minor)pagefaults 0swaps
32.62user 0.31system 0:33.17elapsed 99%CPU (0avgtext+0avgdata 332532maxresident)k
0inputs+0outputs (0major+149274minor)pagefaults 0swaps
32.44user 0.30system 0:32.98elapsed 99%CPU (0avgtext+0avgdata 333696maxresident)k
0inputs+0outputs (0major+145349minor)pagefaults 0swaps
29.29user 0.30system 0:29.80elapsed 99%CPU (0avgtext+0avgdata 366444maxresident)k
0inputs+0outputs (0major+136105minor)pagefaults 0swaps
31.90user 0.33system 0:32.47elapsed 99%CPU (0avgtext+0avgdata 362092maxresident)k
0inputs+0outputs (0major+161330minor)pagefaults 0swaps
34.29user 0.34system 0:34.88elapsed 99%CPU (0avgtext+0avgdata 375636maxresident)k
0inputs+0outputs (0major+160050minor)pagefaults 0swaps
32.64user 0.31system 0:33.20elapsed 99%CPU (0avgtext+0avgdata 336572maxresident)k
0inputs+0outputs (0major+150284minor)pagefaults 0swaps
33.17user 0.27system 0:33.69elapsed 99%CPU (0avgtext+0avgdata 360560maxresident)k
0inputs+0outputs (0major+126406minor)pagefaults 0swaps

-- Results --

The new code is 2.5% faster but uses ~3.1% more heap. One would expect heap
usage to be smaller, so why isn't it? The old code increments consing_since_gc
by the number of bytes allocated for each new string's data, but the new code
does so only for non-immediate strings; so the old code triggers GC earlier
than the new code, which gives a smaller peak heap usage.
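The accounting difference described above can be modeled with a toy trigger
counter. This is a sketch under simplifying assumptions, not the alloc.c logic:
HEADER_BYTES, the threshold, and charging nbytes + 1 per string are
illustrative stand-ins (the patch charges SDATA_SIZE (nbytes) and compares the
counter against a threshold derived from gc-cons-threshold), and count_gcs is
a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants, not the values used by alloc.c.  */
enum { HEADER_BYTES = 32, IMM_MAX = 22 };

/* Return how many collections a consing counter would trigger while
   allocating NSTRINGS strings of NBYTES bytes each.  If
   CHARGE_IMMEDIATE_DATA is nonzero, every string's data is charged (old
   behavior); otherwise data is charged only when the string is too long
   to be immediate (new behavior).  The counter is reset after each GC.  */
static int
count_gcs (size_t nstrings, size_t nbytes, size_t threshold,
           int charge_immediate_data)
{
  size_t consing = 0;
  int gcs = 0;

  assert (threshold > 0);
  for (size_t i = 0; i < nstrings; i++)
    {
      consing += HEADER_BYTES;          /* the string header itself */
      if (charge_immediate_data || nbytes >= IMM_MAX)
        consing += nbytes + 1;          /* separate data plus '\0' */
      if (consing >= threshold)
        {
          gcs++;
          consing = 0;
        }
    }
  return gcs;
}
```

With 1000 strings of 10 bytes and a threshold of 4096, the "old" policy fires
10 collections and the "new" one 7; for strings too long to be immediate both
policies behave identically. Fewer collections means more garbage survives
between them, hence the higher peak.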

* Benchmark 2, 8 runs for each executable:

-- Old --

91.86user 0.49system 2:27.21elapsed 62%CPU (0avgtext+0avgdata 74736maxresident)k
0inputs+77864outputs (0major+39292minor)pagefaults 0swaps
91.57user 0.54system 2:27.30elapsed 62%CPU (0avgtext+0avgdata 74648maxresident)k
0inputs+78536outputs (0major+38641minor)pagefaults 0swaps
89.58user 0.52system 2:21.93elapsed 63%CPU (0avgtext+0avgdata 74684maxresident)k
0inputs+78536outputs (0major+38903minor)pagefaults 0swaps
91.53user 0.53system 2:25.14elapsed 63%CPU (0avgtext+0avgdata 74612maxresident)k
0inputs+78536outputs (0major+38538minor)pagefaults 0swaps
91.49user 0.56system 2:24.56elapsed 63%CPU (0avgtext+0avgdata 74708maxresident)k
0inputs+78528outputs (0major+38716minor)pagefaults 0swaps
91.77user 0.53system 2:24.01elapsed 64%CPU (0avgtext+0avgdata 74660maxresident)k
0inputs+78536outputs (0major+39164minor)pagefaults 0swaps
91.44user 0.54system 2:27.12elapsed 62%CPU (0avgtext+0avgdata 74728maxresident)k
0inputs+78536outputs (0major+39173minor)pagefaults 0swaps
91.72user 0.50system 2:24.25elapsed 63%CPU (0avgtext+0avgdata 74680maxresident)k
0inputs+78528outputs (0major+39538minor)pagefaults 0swaps

-- New --

89.98user 0.53system 2:22.79elapsed 63%CPU (0avgtext+0avgdata 73440maxresident)k
0inputs+78536outputs (0major+36362minor)pagefaults 0swaps
89.91user 0.51system 2:24.10elapsed 62%CPU (0avgtext+0avgdata 73528maxresident)k
0inputs+78528outputs (0major+36753minor)pagefaults 0swaps
89.85user 0.48system 2:24.74elapsed 62%CPU (0avgtext+0avgdata 73392maxresident)k
0inputs+78536outputs (0major+36745minor)pagefaults 0swaps
90.12user 0.54system 2:22.56elapsed 63%CPU (0avgtext+0avgdata 73440maxresident)k
0inputs+78536outputs (0major+37347minor)pagefaults 0swaps
89.95user 0.53system 2:23.74elapsed 62%CPU (0avgtext+0avgdata 73416maxresident)k
0inputs+78536outputs (0major+37292minor)pagefaults 0swaps
91.26user 0.53system 2:25.64elapsed 63%CPU (0avgtext+0avgdata 73440maxresident)k
0inputs+78536outputs (0major+36782minor)pagefaults 0swaps
90.03user 0.56system 2:25.01elapsed 62%CPU (0avgtext+0avgdata 73376maxresident)k
0inputs+78536outputs (0major+37418minor)pagefaults 0swaps
90.15user 0.54system 2:25.73elapsed 62%CPU (0avgtext+0avgdata 73448maxresident)k
0inputs+78536outputs (0major+37279minor)pagefaults 0swaps

-- Results --

The new code is ~1.3% faster and uses ~1.7% less heap. Since this benchmark
does a lot of things besides string allocation, the 'later GC' effect is
negligible here.
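The percentages can be recomputed from the per-run figures. A small sketch,
assuming they compare the mean "user" time and the mean maxresident size over
the eight runs; mean and mean_percent_change are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>

enum { NRUNS = 8 };

/* Benchmark 2 figures copied from the runs above: "user" seconds and
   maxresident kilobytes for each of the eight runs.  */
static const double old_user[NRUNS] =
  { 91.86, 91.57, 89.58, 91.53, 91.49, 91.77, 91.44, 91.72 };
static const double new_user[NRUNS] =
  { 89.98, 89.91, 89.85, 90.12, 89.95, 91.26, 90.03, 90.15 };
static const double old_rss[NRUNS] =
  { 74736, 74648, 74684, 74612, 74708, 74660, 74728, 74680 };
static const double new_rss[NRUNS] =
  { 73440, 73528, 73392, 73440, 73416, 73440, 73376, 73448 };

/* Arithmetic mean of the N values in V.  */
static double
mean (const double *v, size_t n)
{
  double sum = 0.0;

  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum / n;
}

/* Percent change of the mean from OLD to NEW_RUNS; negative means the
   new executable used less time or memory.  */
static double
mean_percent_change (const double *old, const double *new_runs, size_t n)
{
  double m_old = mean (old, n);

  return (mean (new_runs, n) - m_old) / m_old * 100.0;
}
```

This gives about -1.33% for user time and -1.67% for maxresident, consistent
with the rounded figures above; the same helpers apply unchanged to the
Benchmark 1 runs.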

Obviously, the new string code is more complex and, at first glance, should be
slower, because every access to string data involves evaluating a conditional
expression, which puts more pressure on the instruction cache and the branch
prediction logic. But the overall improvement may be explained by better
spatial locality and thus better data cache utilization: a normal string and
its data may be allocated far away from each other, so when a cache line is
filled by accessing a member of Lisp_String, it is very unlikely that the same
line also holds the string data. For an immediate string the data lives inside
the Lisp_String itself, so such a miss should be quite rare. This may be
checked, for example, with valgrind's cachegrind tool (but I haven't tried
this yet).
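Both sides of that trade-off can be seen in a reduced model of the new layout.
This is a sketch, assuming (as the patch does) that pointers and EMACS_INT have
the same size. struct small_string, ss_init, ss_data, ss_size and ss_free are
hypothetical names, and intervals, multibyte handling and the GC mark bit are
left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same formula as the patch's STRING_IMM_MAX, with sizeof (void *)
   standing in for sizeof (EMACS_INT).  */
#define IMM_MAX (3 * sizeof (void *) - 2)

struct small_string
{
  unsigned immbit : 1;          /* 1: bytes stored inline */
  union
  {
    struct
    {
      unsigned char size;       /* the patch uses a 7-bit bit-field */
      unsigned char data[IMM_MAX];
    } imm;
    struct
    {
      size_t size;
      unsigned char *data;      /* separately allocated bytes */
    } dat;
  } u;
};

/* Like the patch's SDATA: every access tests IMMBIT first (the extra
   branch), but inline bytes share cache lines with the header.  */
static unsigned char *
ss_data (struct small_string *s)
{
  return s->immbit ? s->u.imm.data : s->u.dat.data;
}

static size_t
ss_size (const struct small_string *s)
{
  return s->immbit ? s->u.imm.size : s->u.dat.size;
}

/* Copy NBYTES bytes of SRC into S, using the inline buffer when the
   bytes plus a terminating '\0' fit.  Return 0 if malloc fails.  */
static int
ss_init (struct small_string *s, const char *src, size_t nbytes)
{
  unsigned char *dst;

  assert (src != NULL || nbytes == 0);
  if (nbytes < IMM_MAX)
    {
      s->immbit = 1;
      s->u.imm.size = (unsigned char) nbytes;
      dst = s->u.imm.data;
    }
  else
    {
      dst = malloc (nbytes + 1);
      if (dst == NULL)
        return 0;
      s->immbit = 0;
      s->u.dat.size = nbytes;
      s->u.dat.data = dst;
    }
  memcpy (dst, src, nbytes);
  dst[nbytes] = '\0';
  return 1;
}

/* Release separately allocated bytes, if any.  */
static void
ss_free (struct small_string *s)
{
  if (!s->immbit)
    free (s->u.dat.data);
}
```

A cachegrind run over code like this, comparing short and long strings, would
be one way to test the locality explanation in isolation.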

Dmitry

[-- Attachment #2: stringbench.el --]
[-- Type: text/plain, Size: 546 bytes --]

(defun lot-of-strings (nstrings maxlength)
  (let ((count 0) (strings nil))
    (while (<= count nstrings)
      (let* ((length (1+ (% count maxlength)))
	     (string (make-string length 65)))
	(and (zerop (logand length 1))
	     (setq strings (cons string strings)))
	(setq count (1+ count))))))

(defun runtest ()
  (let ((nstrings 16))
    (while (<= nstrings 1048576)
      (let ((maxlength 2))
	(while (<= maxlength 1024)
	  (lot-of-strings nstrings maxlength)
	  (setq maxlength (* 2 maxlength))))
      (setq nstrings (* 2 nstrings)))))
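One way to gauge how much Benchmark 1 exercises the new path is to count how
many of the strings runtest allocates are short enough to be immediate. The
sketch below replays the same loops in C without allocating anything. It
assumes the 64-bit limit of 21 bytes and that each string's byte length equals
its length argument (make-string with character 65 produces ASCII);
count_immediate is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Replays the loops of `runtest' and `lot-of-strings' from stringbench.el.
   Stores the total number of strings created in *TOTAL and returns how
   many of them have at most MAX_IMM_BYTES bytes.  */
static unsigned long long
count_immediate (unsigned long long max_imm_bytes,
                 unsigned long long *total)
{
  unsigned long long imm = 0, all = 0;

  for (unsigned long long nstrings = 16; nstrings <= 1048576; nstrings *= 2)
    for (unsigned long long maxlength = 2; maxlength <= 1024; maxlength *= 2)
      /* `lot-of-strings' loops while count <= nstrings, so it creates
         nstrings + 1 strings.  */
      for (unsigned long long count = 0; count <= nstrings; count++)
        {
          unsigned long long length = 1 + count % maxlength;

          all++;
          if (length <= max_imm_bytes)
            imm++;
        }
  *total = all;
  return imm;
}
```

With a limit of 21 bytes this counts 11,098,488 immediate strings out of
20,971,530 allocated, roughly 53%. Only the even-length ones are consed onto the
local list, but all of them are allocated.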

[-- Attachment #3: immstr2.patch --]
[-- Type: text/plain, Size: 18180 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2011-11-20 03:07:02 +0000
+++ src/alloc.c	2011-11-28 05:32:19 +0000
@@ -136,20 +136,14 @@
 /* Mark, unmark, query mark bit of a Lisp string.  S must be a pointer
    to a struct Lisp_String.  */
 
-#define MARK_STRING(S)		((S)->size |= ARRAY_MARK_FLAG)
-#define UNMARK_STRING(S)	((S)->size &= ~ARRAY_MARK_FLAG)
-#define STRING_MARKED_P(S)	(((S)->size & ARRAY_MARK_FLAG) != 0)
+#define MARK_STRING(S)		((S)->gcmarkbit = 1)
+#define UNMARK_STRING(S)	((S)->gcmarkbit = 0)
+#define STRING_MARKED_P(S)	((S)->gcmarkbit)
 
 #define VECTOR_MARK(V)		((V)->header.size |= ARRAY_MARK_FLAG)
 #define VECTOR_UNMARK(V)	((V)->header.size &= ~ARRAY_MARK_FLAG)
 #define VECTOR_MARKED_P(V)	(((V)->header.size & ARRAY_MARK_FLAG) != 0)
 
-/* Value is the number of bytes of S, a pointer to a struct Lisp_String.
-   Be careful during GC, because S->size contains the mark bit for
-   strings.  */
-
-#define GC_STRING_BYTES(S)	(STRING_BYTES (S))
-
 /* Global variables.  */
 struct emacs_globals globals;
 
@@ -383,6 +377,7 @@
 static void mark_stack (void);
 static int live_vector_p (struct mem_node *, void *);
 static int live_buffer_p (struct mem_node *, void *);
+static int live_string_data_p (struct Lisp_String *);
 static int live_string_p (struct mem_node *, void *);
 static int live_cons_p (struct mem_node *, void *);
 static int live_symbol_p (struct mem_node *, void *);
@@ -1733,7 +1728,8 @@
    a pointer to the `u.data' member of its sdata structure; the
    structure starts at a constant offset in front of that.  */
 
-#define SDATA_OF_STRING(S) ((struct sdata *) ((S)->data - SDATA_DATA_OFFSET))
+#define SDATA_OF_STRING(S) ((S)->immbit ? (abort (), (struct sdata *) NULL) \
+  : ((struct sdata *) ((S)->u.dat.data - SDATA_DATA_OFFSET)))
 
 
 #ifdef GC_CHECK_STRING_OVERRUN
@@ -1815,21 +1811,34 @@
 
 static int check_string_bytes_count;
 
-#define CHECK_STRING_BYTES(S)	STRING_BYTES (S)
-
-
-/* Like GC_STRING_BYTES, but with debugging check.  */
+#define CHECK_STRING_BYTES(S)	string_bytes (S)
 
 EMACS_INT
 string_bytes (struct Lisp_String *s)
 {
-  EMACS_INT nbytes =
-    (s->size_byte < 0 ? s->size & ~ARRAY_MARK_FLAG : s->size_byte);
+  EMACS_INT nbytes;
 
-  if (!PURE_POINTER_P (s)
-      && s->data
-      && nbytes != SDATA_NBYTES (SDATA_OF_STRING (s)))
-    abort ();
+  if (s->immbit)
+    {
+      nbytes = s->u.imm.size_byte < 0 ?
+	s->u.imm.size : s->u.imm.size_byte;
+      if (nbytes >= STRING_IMM_MAX)
+	/* Impossible immediate string.  */
+	abort ();
+    }
+  else
+    {
+      nbytes = s->u.dat.size_byte < 0 ?
+	s->u.dat.size : s->u.dat.size_byte;
+      if (nbytes < STRING_IMM_MAX)
+	/* Impossible normal string.  */
+	abort ();
+      if (!PURE_POINTER_P (s) &&
+	  s->u.dat.data &&
+	  nbytes != SDATA_NBYTES (SDATA_OF_STRING (s)))
+	/* Normal non-pure string with size mismatch.  */
+	abort ();
+    }
   return nbytes;
 }
 
@@ -1854,7 +1863,7 @@
 	CHECK_STRING_BYTES (from->string);
 
       if (from->string)
-	nbytes = GC_STRING_BYTES (from->string);
+	nbytes = string_bytes (from->string);
       else
 	nbytes = SDATA_NBYTES (from);
 
@@ -2000,8 +2009,8 @@
   /* Determine the number of bytes needed to store NBYTES bytes
      of string data.  */
   needed = SDATA_SIZE (nbytes);
-  old_data = s->data ? SDATA_OF_STRING (s) : NULL;
-  old_nbytes = GC_STRING_BYTES (s);
+  old_data = s->u.dat.data ? SDATA_OF_STRING (s) : NULL;
+  old_nbytes = string_bytes (s);
 
   MALLOC_BLOCK_INPUT;
 
@@ -2060,13 +2069,11 @@
   MALLOC_UNBLOCK_INPUT;
 
   data->string = s;
-  s->data = SDATA_DATA (data);
+  s->u.dat.data = SDATA_DATA (data);
 #ifdef GC_CHECK_STRING_BYTES
   SDATA_NBYTES (data) = nbytes;
 #endif
-  s->size = nchars;
-  s->size_byte = nbytes;
-  s->data[nbytes] = '\0';
+  s->u.dat.data[nbytes] = '\0';
 #ifdef GC_CHECK_STRING_OVERRUN
   memcpy ((char *) data + needed, string_overrun_cookie,
 	  GC_STRING_OVERRUN_COOKIE_SIZE);
@@ -2084,6 +2091,12 @@
   consing_since_gc += needed;
 }
 
+#ifdef GC_STRING_STATS
+
+static EMACS_INT total_imm_strings, total_dat_strings;
+static EMACS_INT total_imm_bytes, total_dat_bytes;
+
+#endif
 
 /* Sweep and compact strings.  */
 
@@ -2097,6 +2110,11 @@
   total_strings = total_free_strings = 0;
   total_string_size = 0;
 
+#ifdef GC_STRING_STATS
+  total_imm_strings = total_dat_strings = 0;
+  total_imm_bytes = total_dat_bytes = 0;
+#endif
+
   /* Scan strings_blocks, free Lisp_Strings that aren't marked.  */
   for (b = string_blocks; b; b = next)
     {
@@ -2109,49 +2127,60 @@
 	{
 	  struct Lisp_String *s = b->strings + i;
 
-	  if (s->data)
+	  if (STRING_MARKED_P (s))
+	    {	      
+	      /* String is live; unmark it and its intervals.  */
+	      UNMARK_STRING (s);
+
+	      if (!NULL_INTERVAL_P (s->intervals))
+		UNMARK_BALANCE_INTERVALS (s->intervals);
+
+	      ++total_strings;
+	      total_string_size += string_bytes (s);
+#ifdef GC_STRING_STATS
+	      if (s->immbit)
+		{
+		  total_imm_strings++;
+		  total_imm_bytes += string_bytes (s);
+		}
+	      else
+		{
+		  total_dat_strings++;
+		  total_dat_bytes += string_bytes (s);
+		}
+#endif /* GC_STRING_STATS */
+	    }
+	  else
 	    {
-	      /* String was not on free-list before.  */
-	      if (STRING_MARKED_P (s))
-		{
-		  /* String is live; unmark it and its intervals.  */
-		  UNMARK_STRING (s);
-
-		  if (!NULL_INTERVAL_P (s->intervals))
-		    UNMARK_BALANCE_INTERVALS (s->intervals);
-
-		  ++total_strings;
-		  total_string_size += STRING_BYTES (s);
-		}
+	      if (s->immbit)
+		/* Fill data with special pattern. Used by
+		   GC to find dead immediate strings.  */
+		memset (s->u.imm.data, 0xff, STRING_IMM_MAX);
 	      else
 		{
-		  /* String is dead.  Put it on the free-list.  */
-		  struct sdata *data = SDATA_OF_STRING (s);
+		  if (s->u.dat.data)
+		    {
+		      /* String is dead.  Put it on the free-list.  */
+		      struct sdata *data = SDATA_OF_STRING (s);
 
-		  /* Save the size of S in its sdata so that we know
-		     how large that is.  Reset the sdata's string
-		     back-pointer so that we know it's free.  */
+		      /* Save the size of S in its sdata so that we know
+			 how large that is.  Reset the sdata's string
+			 back-pointer so that we know it's free.  */
 #ifdef GC_CHECK_STRING_BYTES
-		  if (GC_STRING_BYTES (s) != SDATA_NBYTES (data))
-		    abort ();
+		      if (string_bytes (s) != SDATA_NBYTES (data))
+			abort ();
 #else
-		  data->u.nbytes = GC_STRING_BYTES (s);
+		      data->u.nbytes = string_bytes (s);
 #endif
-		  data->string = NULL;
-
-		  /* Reset the strings's `data' member so that we
-		     know it's free.  */
-		  s->data = NULL;
-
-		  /* Put the string on the free-list.  */
-		  NEXT_FREE_LISP_STRING (s) = string_free_list;
-		  string_free_list = s;
-		  ++nfree;
+		      data->string = NULL;
+
+		      /* Reset the strings's `data' member
+			 so that we know it's free.  */
+		      s->u.dat.data = NULL;
+		    }
 		}
-	    }
-	  else
-	    {
-	      /* S was on the free-list before.  Put it there again.  */
+
+	      /* Put the string on the free-list.  */
 	      NEXT_FREE_LISP_STRING (s) = string_free_list;
 	      string_free_list = s;
 	      ++nfree;
@@ -2243,12 +2272,12 @@
 	  /* Check that the string size recorded in the string is the
 	     same as the one recorded in the sdata structure. */
 	  if (from->string
-	      && GC_STRING_BYTES (from->string) != SDATA_NBYTES (from))
+	      && string_bytes (from->string) != SDATA_NBYTES (from))
 	    abort ();
 #endif /* GC_CHECK_STRING_BYTES */
 
 	  if (from->string)
-	    nbytes = GC_STRING_BYTES (from->string);
+	    nbytes = string_bytes (from->string);
 	  else
 	    nbytes = SDATA_NBYTES (from);
 
@@ -2284,7 +2313,7 @@
 		{
 		  xassert (tb != b || to < from);
 		  memmove (to, from, nbytes + GC_STRING_EXTRA);
-		  to->string->data = SDATA_DATA (to);
+		  to->string->u.dat.data = SDATA_DATA (to);
 		}
 
 	      /* Advance past the sdata we copied to.  */
@@ -2533,7 +2562,19 @@
     return empty_multibyte_string;
 
   s = allocate_string ();
-  allocate_string_data (s, nchars, nbytes);
+  if (nbytes < STRING_IMM_MAX)
+    {
+      s->immbit = 1;
+      s->u.imm.size = nchars;
+      s->u.imm.size_byte = nbytes;
+    }
+  else
+    {
+      s->immbit = 0;
+      s->u.dat.size = nchars;
+      s->u.dat.size_byte = nbytes;
+      allocate_string_data (s, nchars, nbytes);
+    }
   XSETSTRING (string, s);
   string_chars_consed += nbytes;
   return string;
@@ -3884,6 +3925,22 @@
   x->color = MEM_BLACK;
 }
 
+/* Non-zero if data of S is valid.  */
+
+static inline int
+live_string_data_p (struct Lisp_String *s)
+{
+  if (s->immbit)
+    {
+      unsigned char *p;
+
+      for (p = s->u.imm.data; p < s->u.imm.data + STRING_IMM_MAX; p++)
+	if (*p != 0xff)
+	  return 1;
+      return 0;
+    }
+  return s->u.dat.data != NULL;
+}
 
 /* Value is non-zero if P is a pointer to a live Lisp string on
    the heap.  M is a pointer to the mem_block for P.  */
@@ -3901,7 +3958,7 @@
       return (offset >= 0
 	      && offset % sizeof b->strings[0] == 0
 	      && offset < (STRING_BLOCK_SIZE * sizeof b->strings[0])
-	      && ((struct Lisp_String *) p)->data != NULL);
+	      && live_string_data_p ((struct Lisp_String *) p));
     }
   else
     return 0;
@@ -4801,15 +4858,29 @@
   struct Lisp_String *s;
 
   s = (struct Lisp_String *) pure_alloc (sizeof *s, Lisp_String);
-  s->data = (unsigned char *) find_string_data_in_pure (data, nbytes);
-  if (s->data == NULL)
-    {
-      s->data = (unsigned char *) pure_alloc (nbytes + 1, -1);
-      memcpy (s->data, data, nbytes);
-      s->data[nbytes] = '\0';
-    }
-  s->size = nchars;
-  s->size_byte = multibyte ? nbytes : -1;
+
+  if (nbytes < STRING_IMM_MAX)
+    {
+      memcpy (s->u.imm.data, data, nbytes);
+      s->u.imm.data[nbytes] = '\0';
+      s->immbit = 1;
+      s->u.imm.size = nchars;
+      s->u.imm.size_byte = multibyte ? nbytes : -1;
+    }
+  else
+    {
+      s->u.dat.data = (unsigned char *) find_string_data_in_pure (data, nbytes);
+      if (s->u.dat.data == NULL)
+	{
+	  s->u.dat.data = (unsigned char *) pure_alloc (nbytes + 1, -1);
+	  memcpy (s->u.dat.data, data, nbytes);
+	  s->u.dat.data[nbytes] = '\0';
+	}
+      s->immbit = 0;
+      s->u.dat.size = nchars;
+      s->u.dat.size_byte = multibyte ? nbytes : -1;
+    }
+
   s->intervals = NULL_INTERVAL;
   XSETSTRING (string, s);
   return string;
@@ -4826,9 +4897,23 @@
   EMACS_INT nchars = strlen (data);
 
   s = (struct Lisp_String *) pure_alloc (sizeof *s, Lisp_String);
-  s->size = nchars;
-  s->size_byte = -1;
-  s->data = (unsigned char *) data;
+
+  if (nchars < STRING_IMM_MAX)
+    {
+      memcpy (s->u.imm.data, data, nchars);
+      s->u.imm.data[nchars] = '\0';
+      s->immbit = 1;
+      s->u.imm.size = nchars;
+      s->u.imm.size_byte = -1;
+    }
+  else
+    {
+      s->u.dat.data = (unsigned char *) data;
+      s->immbit = 0;
+      s->u.dat.size = nchars;
+      s->u.dat.size_byte = -1;
+    }
+
   s->intervals = NULL_INTERVAL;
   XSETSTRING (string, s);
   return string;
@@ -6250,6 +6335,31 @@
   return Flist (8, consed);
 }
 
+#ifdef GC_STRING_STATS
+
+DEFUN ("string-stats", Fstring_stats, Sstring_stats, 0, 0, 0,
+       doc: /* Return a list of counters that measure how many
+strings of each internal structure were alive after the last
+garbage collection, and how many bytes are in them.
+The elements of the value are as follows:
+  (IMM-STRINGS IMM-BYTES DAT-STRINGS DAT-BYTES)
+where IMM-STRINGS is the number of immediate strings, IMM-BYTES
+is the total number of bytes in them, DAT-STRINGS is the number of
+normal strings and DAT-BYTES is the total number of bytes in them.  */)
+  (void)
+{
+  Lisp_Object data[4];
+
+  data[0] = make_number (min (MOST_POSITIVE_FIXNUM, total_imm_strings));
+  data[1] = make_number (min (MOST_POSITIVE_FIXNUM, total_imm_bytes));
+  data[2] = make_number (min (MOST_POSITIVE_FIXNUM, total_dat_strings));
+  data[3] = make_number (min (MOST_POSITIVE_FIXNUM, total_dat_bytes));
+
+  return Flist (4, data);
+}
+
+#endif /* GC_STRING_STATS */
+
 /* Find at most FIND_MAX symbols which have OBJ as their value or
    function.  This is used in gdbinit's `xwhichsymbols' command.  */
 
@@ -6475,7 +6585,9 @@
   defsubr (&Sgarbage_collect);
   defsubr (&Smemory_limit);
   defsubr (&Smemory_use_counts);
-
+#ifdef GC_STRING_STATS
+  defsubr (&Sstring_stats);
+#endif
 #if GC_MARK_STACK == GC_USE_GCPROS_CHECK_ZOMBIES
   defsubr (&Sgc_status);
 #endif

=== modified file 'src/fns.c'
--- src/fns.c	2011-11-19 09:18:31 +0000
+++ src/fns.c	2011-11-28 05:21:59 +0000
@@ -2176,8 +2176,8 @@
 	  int len = CHAR_STRING (charval, str);
 	  EMACS_INT size_byte = SBYTES (array);
 
-	  if (INT_MULTIPLY_OVERFLOW (SCHARS (array), len)
-	      || SCHARS (array) * len != size_byte)
+	  if (INT_MULTIPLY_OVERFLOW (size, len)
+	      || size * len != size_byte)
 	    error ("Attempt to change byte length of a string");
 	  for (idx = 0; idx < size_byte; idx++)
 	    *p++ = str[idx % len];

=== modified file 'src/lisp.h'
--- src/lisp.h	2011-11-27 18:52:53 +0000
+++ src/lisp.h	2011-11-28 06:01:45 +0000
@@ -696,17 +696,23 @@
 
 /* Convenience macros for dealing with Lisp strings.  */
 
-#define SDATA(string)		(XSTRING (string)->data + 0)
+#define SDATA(string)		(XSTRING (string)->immbit ? \
+				 (XSTRING (string)->u.imm.data + 0) : \
+				 (XSTRING (string)->u.dat.data + 0))
 #define SREF(string, index)	(SDATA (string)[index] + 0)
 #define SSET(string, index, new) (SDATA (string)[index] = (new))
-#define SCHARS(string)		(XSTRING (string)->size + 0)
-#define SBYTES(string)		(STRING_BYTES (XSTRING (string)) + 0)
+#define SCHARS(string)		(XSTRING (string)->immbit ? \
+				 (XSTRING (string)->u.imm.size + 0) : \
+				 (XSTRING (string)->u.dat.size + 0))
+#define SBYTES(string)		(string_bytes (XSTRING (string)) + 0)
 
 /* Avoid "differ in sign" warnings.  */
 #define SSDATA(x)  ((char *) SDATA (x))
 
 #define STRING_SET_CHARS(string, newsize) \
-    (XSTRING (string)->size = (newsize))
+  (XSTRING (string)->immbit ? \
+   (XSTRING (string)->u.imm.size = (newsize)) : \
+   (XSTRING (string)->u.dat.size = (newsize)))
 
 #define STRING_COPYIN(string, index, new, count) \
     memcpy (SDATA (string) + index, new, count)
@@ -796,24 +802,12 @@
 #define CDR_SAFE(c)				\
   (CONSP ((c)) ? XCDR ((c)) : Qnil)
 
+#define STRING_SIZE_BYTE(string) (XSTRING (string)->immbit ? \
+				  XSTRING (string)->u.imm.size_byte : \
+				  XSTRING (string)->u.dat.size_byte)
+
 /* Nonzero if STR is a multibyte string.  */
-#define STRING_MULTIBYTE(STR)  \
-  (XSTRING (STR)->size_byte >= 0)
-
-/* Return the length in bytes of STR.  */
-
-#ifdef GC_CHECK_STRING_BYTES
-
-struct Lisp_String;
-extern EMACS_INT string_bytes (struct Lisp_String *);
-#define STRING_BYTES(S) string_bytes ((S))
-
-#else /* not GC_CHECK_STRING_BYTES */
-
-#define STRING_BYTES(STR)  \
-  ((STR)->size_byte < 0 ? (STR)->size : (STR)->size_byte)
-
-#endif /* not GC_CHECK_STRING_BYTES */
+#define STRING_MULTIBYTE(string) (STRING_SIZE_BYTE (string) >= 0)
 
 /* An upper bound on the number of bytes in a Lisp string, not
    counting the terminating null.  This a tight enough bound to
@@ -829,18 +823,28 @@
 #define STRING_BYTES_BOUND  \
   min (MOST_POSITIVE_FIXNUM, (ptrdiff_t) min (SIZE_MAX, PTRDIFF_MAX) - 1)
 
+/* Maximum number of bytes, including '\0', in an immediate string.
+   This assumes that sizeof (EMACS_INT) is equal to sizeof (void *).  */
+#define STRING_IMM_MAX (3 * sizeof (EMACS_INT) - 2)
+
 /* Mark STR as a unibyte string.  */
 #define STRING_SET_UNIBYTE(STR)  \
   do { if (EQ (STR, empty_multibyte_string))  \
       (STR) = empty_unibyte_string;  \
-    else XSTRING (STR)->size_byte = -1; } while (0)
+    else if (XSTRING (STR)->immbit) \
+      XSTRING (STR)->u.imm.size_byte = -1; \
+    else \
+      XSTRING (STR)->u.dat.size_byte = -1; } while (0)
 
 /* Mark STR as a multibyte string.  Assure that STR contains only
    ASCII characters in advance.  */
 #define STRING_SET_MULTIBYTE(STR)  \
   do { if (EQ (STR, empty_unibyte_string))  \
       (STR) = empty_multibyte_string;  \
-    else XSTRING (STR)->size_byte = XSTRING (STR)->size; } while (0)
+    else if (XSTRING (STR)->immbit) \
+      XSTRING (STR)->u.imm.size_byte = XSTRING (STR)->u.imm.size; \
+    else \
+      XSTRING (STR)->u.dat.size_byte = XSTRING (STR)->u.dat.size; } while (0)
 
 /* Get text properties.  */
 #define STRING_INTERVALS(STR)  (XSTRING (STR)->intervals + 0)
@@ -848,16 +852,59 @@
 /* Set text properties.  */
 #define STRING_SET_INTERVALS(STR, INT) (XSTRING (STR)->intervals = (INT))
 
-/* In a string or vector, the sign bit of the `size' is the gc mark bit */
-
 struct Lisp_String
   {
-    EMACS_INT size;
-    EMACS_INT size_byte;
-    INTERVAL intervals;		/* text properties in this string */
-    unsigned char *data;
+    /* Text properties in this string.  */
+    INTERVAL intervals;
+
+    /* Mark bit used for GC.  */
+    unsigned gcmarkbit : 1;
+
+    /* String subtype.  */
+    unsigned immbit : 1;
+
+    union {
+
+      /* When IMMBIT is 1. */
+      struct {
+	EMACS_INT size : 7;
+	EMACS_INT size_byte : 7;
+	unsigned char data[STRING_IMM_MAX];
+      } imm;
+      
+      /* When IMMBIT is 0.  */
+      struct {
+	EMACS_INT size : BITS_PER_EMACS_INT - 1;
+	EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
+	unsigned char *data;
+      } dat;
+    } u;
   };
 
+/* Return the length in bytes of STR.  */
+
+#ifdef GC_CHECK_STRING_BYTES
+
+/* Slower version with debugging check.  */
+
+extern EMACS_INT string_bytes (struct Lisp_String *);
+
+#else /* not GC_CHECK_STRING_BYTES */
+
+static inline
+EMACS_INT string_bytes (struct Lisp_String *s)
+{
+  EMACS_INT size, size_byte;
+
+  if (s->immbit)
+    size = s->u.imm.size, size_byte = s->u.imm.size_byte;
+  else
+    size = s->u.dat.size, size_byte = s->u.dat.size_byte;
+  return size_byte < 0 ? size : size_byte;
+}
+
+#endif /* GC_CHECK_STRING_BYTES */
+
 /* Header of vector-like objects.  This documents the layout constraints on
    vectors and pseudovectors other than struct Lisp_Subr.  It also prevents
    compilers from being fooled by Emacs's type punning: the XSETPSEUDOVECTOR
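
The liveness test added in live_string_data_p relies on an invariant worth
spelling out. A dead immediate string has no data pointer that the sweep can
reset to NULL, so the patch fills its inline buffer with 0xff and treats an
all-0xff buffer as dead. That is safe as long as every live immediate string
keeps its terminating '\0' at data[nbytes], which lies inside the buffer
because nbytes < STRING_IMM_MAX. A minimal model of that invariant, assuming a
64-bit STRING_IMM_MAX of 22 (struct imm_buf and the function names are
hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IMM_MAX 22   /* STRING_IMM_MAX for a 64-bit build */

/* Hypothetical stand-in for the inline bytes of an immediate string.  */
struct imm_buf
{
  unsigned char data[IMM_MAX];
};

/* Fill B as a live string of NBYTES copies of BYTE.  The rest of the
   buffer is zeroed, so data[NBYTES] holds the terminating '\0'.  */
static void
make_live (struct imm_buf *b, unsigned char byte, size_t nbytes)
{
  assert (nbytes < IMM_MAX);
  memset (b->data, 0, IMM_MAX);
  memset (b->data, byte, nbytes);
}

/* What the sweep does to a dead immediate string.  */
static void
mark_dead (struct imm_buf *b)
{
  memset (b->data, 0xff, IMM_MAX);
}

/* Same test as live_string_data_p for immediate strings: any byte other
   than 0xff means the string is live.  */
static int
looks_live (const struct imm_buf *b)
{
  for (size_t i = 0; i < IMM_MAX; i++)
    if (b->data[i] != 0xff)
      return 1;
  return 0;
}
```

Even a maximal 21-byte string made entirely of 0xff bytes still reads as live,
because its terminator breaks the pattern.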



end of thread, other threads:[~2011-11-30 21:44 UTC | newest]

Thread overview: 21+ messages
2011-11-28  9:11 immediate strings #2 Dmitry Antipov
2011-11-28 17:33 ` Stefan Monnier
2011-11-28 19:48   ` Ken Raeburn
2011-11-28 20:10   ` Andreas Schwab
2011-11-28 21:54     ` Stefan Monnier
2011-11-28 22:25       ` Andreas Schwab
2011-11-29  0:57         ` Ken Raeburn
2011-11-29  8:44           ` Andreas Schwab
2011-11-29 15:48             ` Ken Raeburn
2011-11-29 16:08               ` Andreas Schwab
2011-11-30 16:43                 ` Ken Raeburn
2011-11-28 22:18   ` Paul Eggert
2011-11-29  2:07     ` Stefan Monnier
2011-11-29  3:37       ` Dmitry Antipov
2011-11-29  8:50       ` Paul Eggert
2011-11-30  5:37         ` Dmitry Antipov
2011-11-30  9:35           ` Paul Eggert
2011-11-30 16:43             ` Ken Raeburn
2011-11-30 21:44               ` Paul Eggert
2011-11-29  3:17     ` Dmitry Antipov
2011-11-29  5:29   ` Dmitry Antipov

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
