all messages for Emacs-related lists mirrored at yhetil.org
* immediate strings #2
@ 2011-11-28  9:11 Dmitry Antipov
  2011-11-28 17:33 ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Antipov @ 2011-11-28  9:11 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 7098 bytes --]

Here is the next version of the immediate strings patch, with further improvements
suggested by Paul. As stated before, strings of up to 21 bytes on 64-bit and up to
9 bytes on 32-bit can be immediate (the trailing '\0' is not counted). Note that this
code assumes sizeof (EMACS_INT) is equal to sizeof (void *), so it's not
compatible with WIDE_EMACS_INT.
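
To illustrate where the 21- and 9-byte limits come from, here is a minimal sketch that recomputes the patch's STRING_IMM_MAX formula, 3 * sizeof (EMACS_INT) - 2, for a given word size. The helper names imm_capacity and imm_max_payload are hypothetical, and the word size is passed as a parameter instead of being taken from EMACS_INT, so both cases can be checked on any host.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes available for immediate data, including the trailing '\0',
   for a given word size.  Mirrors STRING_IMM_MAX from the patch:
   3 * sizeof (EMACS_INT) - 2.  WORD_SIZE stands in for
   sizeof (EMACS_INT); passing it explicitly is an assumption made
   for illustration only.  */
size_t
imm_capacity (size_t word_size)
{
  assert (word_size > 0);
  return 3 * word_size - 2;
}

/* Longest string, in bytes, that can be stored immediately; the
   trailing '\0' is not counted.  */
size_t
imm_max_payload (size_t word_size)
{
  return imm_capacity (word_size) - 1;
}
```

With an 8-byte word this gives a 22-byte buffer and a 21-byte payload; with a 4-byte word, 10 and 9.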

Since there were reasonable doubts about whether this stuff is practically useful,
I ran two benchmarks. The first one was a simple string allocation benchmark,
attached as stringbench.el. The second one was just a compilation of everything
in the lisp subdirectory with byte-force-recompile. Everything was tested with
64-bit executables and the '-Q -batch' command line options.

Configuration: ./configure --prefix=/not/exists --without-sound --without-pop \
                --with-x-toolkit=lucid --without-dbus --without-libotf \
                --without-selinux --without-xft --without-gsettings \
                --without-gnutls --without-rsvg --without-xml2
Compiler: gcc 4.6.1, optimization flags -O3

Old executable size 12855360 bytes, new executable size 12904512 bytes (0.38%
larger code size).

* Benchmark 1, 8 runs for each executable:

-- Old --

33.24user 0.23system 0:33.72elapsed 99%CPU (0avgtext+0avgdata 368268maxresident)k
0inputs+0outputs (0major+112338minor)pagefaults 0swaps
32.29user 0.25system 0:32.77elapsed 99%CPU (0avgtext+0avgdata 338012maxresident)k
0inputs+0outputs (0major+124684minor)pagefaults 0swaps
33.31user 0.24system 0:33.80elapsed 99%CPU (0avgtext+0avgdata 330612maxresident)k
0inputs+0outputs (0major+120164minor)pagefaults 0swaps
33.91user 0.24system 0:34.41elapsed 99%CPU (0avgtext+0avgdata 351588maxresident)k
0inputs+0outputs (0major+125401minor)pagefaults 0swaps
33.17user 0.27system 0:33.69elapsed 99%CPU (0avgtext+0avgdata 331480maxresident)k
0inputs+0outputs (0major+120374minor)pagefaults 0swaps
33.26user 0.31system 0:33.83elapsed 99%CPU (0avgtext+0avgdata 332956maxresident)k
0inputs+0outputs (0major+148027minor)pagefaults 0swaps
33.38user 0.28system 0:33.90elapsed 99%CPU (0avgtext+0avgdata 334400maxresident)k
0inputs+0outputs (0major+133420minor)pagefaults 0swaps
33.13user 0.23system 0:33.61elapsed 99%CPU (0avgtext+0avgdata 331132maxresident)k
0inputs+0outputs (0major+120341minor)pagefaults 0swaps

-- New --

32.59user 0.35system 0:33.18elapsed 99%CPU (0avgtext+0avgdata 332528maxresident)k
0inputs+0outputs (0major+149273minor)pagefaults 0swaps
32.62user 0.31system 0:33.17elapsed 99%CPU (0avgtext+0avgdata 332532maxresident)k
0inputs+0outputs (0major+149274minor)pagefaults 0swaps
32.44user 0.30system 0:32.98elapsed 99%CPU (0avgtext+0avgdata 333696maxresident)k
0inputs+0outputs (0major+145349minor)pagefaults 0swaps
29.29user 0.30system 0:29.80elapsed 99%CPU (0avgtext+0avgdata 366444maxresident)k
0inputs+0outputs (0major+136105minor)pagefaults 0swaps
31.90user 0.33system 0:32.47elapsed 99%CPU (0avgtext+0avgdata 362092maxresident)k
0inputs+0outputs (0major+161330minor)pagefaults 0swaps
34.29user 0.34system 0:34.88elapsed 99%CPU (0avgtext+0avgdata 375636maxresident)k
0inputs+0outputs (0major+160050minor)pagefaults 0swaps
32.64user 0.31system 0:33.20elapsed 99%CPU (0avgtext+0avgdata 336572maxresident)k
0inputs+0outputs (0major+150284minor)pagefaults 0swaps
33.17user 0.27system 0:33.69elapsed 99%CPU (0avgtext+0avgdata 360560maxresident)k
0inputs+0outputs (0major+126406minor)pagefaults 0swaps

-- Results --

Got 2.5% better speed, but ~3.1% larger heap usage. Heap usage was expected to be
smaller, so why isn't it? The old code increments consing_since_gc by
the number of bytes allocated for each new string's data, but the new code does so
only for non-immediate strings; so the old code calls GC earlier than the new code,
thus giving smaller peak heap usage.
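
The accounting effect described above can be sketched with a toy model. This is not the allocator from alloc.c: the function count_gc_triggers and its parameters are hypothetical, the counter only tracks string data bytes, and the threshold is a plain number instead of the real GC trigger logic.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: count how many times a collector would be triggered
   while allocating NSTRINGS strings of NBYTES bytes each.  The counter
   is bumped by the data size of each string, except that when
   SKIP_IMMEDIATE is nonzero, strings shorter than IMM_MAX contribute
   nothing, as in the patched allocator.  */
size_t
count_gc_triggers (size_t nstrings, size_t nbytes, size_t imm_max,
                   size_t threshold, int skip_immediate)
{
  size_t consing = 0, triggers = 0;

  assert (threshold > 0);
  for (size_t i = 0; i < nstrings; i++)
    {
      if (!(skip_immediate && nbytes < imm_max))
        consing += nbytes;
      if (consing >= threshold)
        {
          /* The collector runs and the counter starts over.  */
          triggers++;
          consing = 0;
        }
    }
  return triggers;
}
```

In this model, short strings never advance the counter under the new policy, so collections happen later (or not at all) and garbage accumulates for longer between them.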

* Benchmark 2, 8 runs for each executable:

-- Old --

91.86user 0.49system 2:27.21elapsed 62%CPU (0avgtext+0avgdata 74736maxresident)k
0inputs+77864outputs (0major+39292minor)pagefaults 0swaps
91.57user 0.54system 2:27.30elapsed 62%CPU (0avgtext+0avgdata 74648maxresident)k
0inputs+78536outputs (0major+38641minor)pagefaults 0swaps
89.58user 0.52system 2:21.93elapsed 63%CPU (0avgtext+0avgdata 74684maxresident)k
0inputs+78536outputs (0major+38903minor)pagefaults 0swaps
91.53user 0.53system 2:25.14elapsed 63%CPU (0avgtext+0avgdata 74612maxresident)k
0inputs+78536outputs (0major+38538minor)pagefaults 0swaps
91.49user 0.56system 2:24.56elapsed 63%CPU (0avgtext+0avgdata 74708maxresident)k
0inputs+78528outputs (0major+38716minor)pagefaults 0swaps
91.77user 0.53system 2:24.01elapsed 64%CPU (0avgtext+0avgdata 74660maxresident)k
0inputs+78536outputs (0major+39164minor)pagefaults 0swaps
91.44user 0.54system 2:27.12elapsed 62%CPU (0avgtext+0avgdata 74728maxresident)k
0inputs+78536outputs (0major+39173minor)pagefaults 0swaps
91.72user 0.50system 2:24.25elapsed 63%CPU (0avgtext+0avgdata 74680maxresident)k
0inputs+78528outputs (0major+39538minor)pagefaults 0swaps

-- New --

89.98user 0.53system 2:22.79elapsed 63%CPU (0avgtext+0avgdata 73440maxresident)k
0inputs+78536outputs (0major+36362minor)pagefaults 0swaps
89.91user 0.51system 2:24.10elapsed 62%CPU (0avgtext+0avgdata 73528maxresident)k
0inputs+78528outputs (0major+36753minor)pagefaults 0swaps
89.85user 0.48system 2:24.74elapsed 62%CPU (0avgtext+0avgdata 73392maxresident)k
0inputs+78536outputs (0major+36745minor)pagefaults 0swaps
90.12user 0.54system 2:22.56elapsed 63%CPU (0avgtext+0avgdata 73440maxresident)k
0inputs+78536outputs (0major+37347minor)pagefaults 0swaps
89.95user 0.53system 2:23.74elapsed 62%CPU (0avgtext+0avgdata 73416maxresident)k
0inputs+78536outputs (0major+37292minor)pagefaults 0swaps
91.26user 0.53system 2:25.64elapsed 63%CPU (0avgtext+0avgdata 73440maxresident)k
0inputs+78536outputs (0major+36782minor)pagefaults 0swaps
90.03user 0.56system 2:25.01elapsed 62%CPU (0avgtext+0avgdata 73376maxresident)k
0inputs+78536outputs (0major+37418minor)pagefaults 0swaps
90.15user 0.54system 2:25.73elapsed 62%CPU (0avgtext+0avgdata 73448maxresident)k
0inputs+78536outputs (0major+37279minor)pagefaults 0swaps

-- Results --

Got ~1.3% better speed, ~1.7% smaller heap usage. Since this benchmark does a lot
of things besides string allocation, 'later GC' effect is negligible here.

Obviously, the new string code is more complex and, as it seems at first, should be
slower because any access to string data involves evaluating a conditional
expression, which puts more pressure on the instruction cache and branch prediction
logic. But the overall improvement may be explained by better spatial locality and
thus better data cache utilization (a normal string and its data may be allocated
far away from each other, so when a cache line is filled by accessing a member of
Lisp_String, it's very unlikely that the same cache line also holds the string data;
for an immediate string, the data sits next to the other members, so such a miss
should be quite rare). This may be checked, for example, with valgrind's
cachegrind tool (but I haven't tried this yet).
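
A small sketch of the locality argument, assuming a simple model in which two addresses share a cache line when they fall in the same aligned block. The function same_cache_line is hypothetical, and the 64-byte line size used in the note below is a common value, not something measured here.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if byte addresses A and B fall in the same cache line of
   LINE_SIZE bytes.  Lines are assumed to be aligned blocks, which is
   the usual arrangement but still an assumption of this sketch.  */
int
same_cache_line (uintptr_t a, uintptr_t b, uintptr_t line_size)
{
  assert (line_size > 0);
  return a / line_size == b / line_size;
}
```

For a header at 0x1000 with immediate data starting 16 bytes later, same_cache_line (0x1000, 0x1010, 64) is nonzero; for separately allocated data at 0x5000 it is zero, so reading the data costs a second line fill.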

Dmitry

[-- Attachment #2: stringbench.el --]
[-- Type: text/plain, Size: 546 bytes --]

(defun lot-of-strings (nstrings maxlength)
  (let ((count 0) (strings nil))
    (while (<= count nstrings)
      (let* ((length (1+ (% count maxlength)))
	     (string (make-string length 65)))
	(and (zerop (logand length 1))
	     (setq strings (cons string strings)))
	(setq count (1+ count))))))

(defun runtest ()
  (let ((nstrings 16))
    (while (<= nstrings 1048576)
      (let ((maxlength 2))
	(while (<= maxlength 1024)
	  (lot-of-strings nstrings maxlength)
	  (setq maxlength (* 2 maxlength))))
      (setq nstrings (* 2 nstrings)))))

[-- Attachment #3: immstr2.patch --]
[-- Type: text/plain, Size: 18180 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2011-11-20 03:07:02 +0000
+++ src/alloc.c	2011-11-28 05:32:19 +0000
@@ -136,20 +136,14 @@
 /* Mark, unmark, query mark bit of a Lisp string.  S must be a pointer
    to a struct Lisp_String.  */
 
-#define MARK_STRING(S)		((S)->size |= ARRAY_MARK_FLAG)
-#define UNMARK_STRING(S)	((S)->size &= ~ARRAY_MARK_FLAG)
-#define STRING_MARKED_P(S)	(((S)->size & ARRAY_MARK_FLAG) != 0)
+#define MARK_STRING(S)		((S)->gcmarkbit = 1)
+#define UNMARK_STRING(S)	((S)->gcmarkbit = 0)
+#define STRING_MARKED_P(S)	((S)->gcmarkbit)
 
 #define VECTOR_MARK(V)		((V)->header.size |= ARRAY_MARK_FLAG)
 #define VECTOR_UNMARK(V)	((V)->header.size &= ~ARRAY_MARK_FLAG)
 #define VECTOR_MARKED_P(V)	(((V)->header.size & ARRAY_MARK_FLAG) != 0)
 
-/* Value is the number of bytes of S, a pointer to a struct Lisp_String.
-   Be careful during GC, because S->size contains the mark bit for
-   strings.  */
-
-#define GC_STRING_BYTES(S)	(STRING_BYTES (S))
-
 /* Global variables.  */
 struct emacs_globals globals;
 
@@ -383,6 +377,7 @@
 static void mark_stack (void);
 static int live_vector_p (struct mem_node *, void *);
 static int live_buffer_p (struct mem_node *, void *);
+static int live_string_data_p (struct Lisp_String *);
 static int live_string_p (struct mem_node *, void *);
 static int live_cons_p (struct mem_node *, void *);
 static int live_symbol_p (struct mem_node *, void *);
@@ -1733,7 +1728,8 @@
    a pointer to the `u.data' member of its sdata structure; the
    structure starts at a constant offset in front of that.  */
 
-#define SDATA_OF_STRING(S) ((struct sdata *) ((S)->data - SDATA_DATA_OFFSET))
+#define SDATA_OF_STRING(S) ((S)->immbit ? (abort (), (struct sdata *) NULL) \
+  : ((struct sdata *) ((S)->u.dat.data - SDATA_DATA_OFFSET)))
 
 
 #ifdef GC_CHECK_STRING_OVERRUN
@@ -1815,21 +1811,34 @@
 
 static int check_string_bytes_count;
 
-#define CHECK_STRING_BYTES(S)	STRING_BYTES (S)
-
-
-/* Like GC_STRING_BYTES, but with debugging check.  */
+#define CHECK_STRING_BYTES(S)	string_bytes (S)
 
 EMACS_INT
 string_bytes (struct Lisp_String *s)
 {
-  EMACS_INT nbytes =
-    (s->size_byte < 0 ? s->size & ~ARRAY_MARK_FLAG : s->size_byte);
+  EMACS_INT nbytes;
 
-  if (!PURE_POINTER_P (s)
-      && s->data
-      && nbytes != SDATA_NBYTES (SDATA_OF_STRING (s)))
-    abort ();
+  if (s->immbit)
+    {
+      nbytes = s->u.imm.size_byte < 0 ?
+	s->u.imm.size : s->u.imm.size_byte;
+      if (nbytes >= STRING_IMM_MAX)
+	/* Impossible immediate string.  */
+	abort ();
+    }
+  else
+    {
+      nbytes = s->u.dat.size_byte < 0 ?
+	s->u.dat.size : s->u.dat.size_byte;
+      if (nbytes < STRING_IMM_MAX)
+	/* Impossible normal string.  */
+	abort ();
+      if (!PURE_POINTER_P (s) &&
+	  s->u.dat.data &&
+	  nbytes != SDATA_NBYTES (SDATA_OF_STRING (s)))
+	/* Normal non-pure string with size mismatch.  */
+	abort ();
+    }
   return nbytes;
 }
 
@@ -1854,7 +1863,7 @@
 	CHECK_STRING_BYTES (from->string);
 
       if (from->string)
-	nbytes = GC_STRING_BYTES (from->string);
+	nbytes = string_bytes (from->string);
       else
 	nbytes = SDATA_NBYTES (from);
 
@@ -2000,8 +2009,8 @@
   /* Determine the number of bytes needed to store NBYTES bytes
      of string data.  */
   needed = SDATA_SIZE (nbytes);
-  old_data = s->data ? SDATA_OF_STRING (s) : NULL;
-  old_nbytes = GC_STRING_BYTES (s);
+  old_data = s->u.dat.data ? SDATA_OF_STRING (s) : NULL;
+  old_nbytes = string_bytes (s);
 
   MALLOC_BLOCK_INPUT;
 
@@ -2060,13 +2069,11 @@
   MALLOC_UNBLOCK_INPUT;
 
   data->string = s;
-  s->data = SDATA_DATA (data);
+  s->u.dat.data = SDATA_DATA (data);
 #ifdef GC_CHECK_STRING_BYTES
   SDATA_NBYTES (data) = nbytes;
 #endif
-  s->size = nchars;
-  s->size_byte = nbytes;
-  s->data[nbytes] = '\0';
+  s->u.dat.data[nbytes] = '\0';
 #ifdef GC_CHECK_STRING_OVERRUN
   memcpy ((char *) data + needed, string_overrun_cookie,
 	  GC_STRING_OVERRUN_COOKIE_SIZE);
@@ -2084,6 +2091,12 @@
   consing_since_gc += needed;
 }
 
+#ifdef GC_STRING_STATS
+
+static EMACS_INT total_imm_strings, total_dat_strings;
+static EMACS_INT total_imm_bytes, total_dat_bytes;
+
+#endif
 
 /* Sweep and compact strings.  */
 
@@ -2097,6 +2110,11 @@
   total_strings = total_free_strings = 0;
   total_string_size = 0;
 
+#ifdef GC_STRING_STATS
+  total_imm_strings = total_dat_strings = 0;
+  total_imm_bytes = total_dat_bytes = 0;
+#endif
+
   /* Scan strings_blocks, free Lisp_Strings that aren't marked.  */
   for (b = string_blocks; b; b = next)
     {
@@ -2109,49 +2127,60 @@
 	{
 	  struct Lisp_String *s = b->strings + i;
 
-	  if (s->data)
+	  if (STRING_MARKED_P (s))
+	    {	      
+	      /* String is live; unmark it and its intervals.  */
+	      UNMARK_STRING (s);
+
+	      if (!NULL_INTERVAL_P (s->intervals))
+		UNMARK_BALANCE_INTERVALS (s->intervals);
+
+	      ++total_strings;
+	      total_string_size += string_bytes (s);
+#ifdef GC_STRING_STATS
+	      if (s->immbit)
+		{
+		  total_imm_strings++;
+		  total_imm_bytes += string_bytes (s);
+		}
+	      else
+		{
+		  total_dat_strings++;
+		  total_dat_bytes += string_bytes (s);
+		}
+#endif /* GC_STRING_STATS */
+	    }
+	  else
 	    {
-	      /* String was not on free-list before.  */
-	      if (STRING_MARKED_P (s))
-		{
-		  /* String is live; unmark it and its intervals.  */
-		  UNMARK_STRING (s);
-
-		  if (!NULL_INTERVAL_P (s->intervals))
-		    UNMARK_BALANCE_INTERVALS (s->intervals);
-
-		  ++total_strings;
-		  total_string_size += STRING_BYTES (s);
-		}
+	      if (s->immbit)
+		/* Fill data with special pattern. Used by
+		   GC to find dead immediate strings.  */
+		memset (s->u.imm.data, 0xff, STRING_IMM_MAX);
 	      else
 		{
-		  /* String is dead.  Put it on the free-list.  */
-		  struct sdata *data = SDATA_OF_STRING (s);
+		  if (s->u.dat.data)
+		    {
+		      /* String is dead.  Put it on the free-list.  */
+		      struct sdata *data = SDATA_OF_STRING (s);
 
-		  /* Save the size of S in its sdata so that we know
-		     how large that is.  Reset the sdata's string
-		     back-pointer so that we know it's free.  */
+		      /* Save the size of S in its sdata so that we know
+			 how large that is.  Reset the sdata's string
+			 back-pointer so that we know it's free.  */
 #ifdef GC_CHECK_STRING_BYTES
-		  if (GC_STRING_BYTES (s) != SDATA_NBYTES (data))
-		    abort ();
+		      if (string_bytes (s) != SDATA_NBYTES (data))
+			abort ();
 #else
-		  data->u.nbytes = GC_STRING_BYTES (s);
+		      data->u.nbytes = string_bytes (s);
 #endif
-		  data->string = NULL;
-
-		  /* Reset the strings's `data' member so that we
-		     know it's free.  */
-		  s->data = NULL;
-
-		  /* Put the string on the free-list.  */
-		  NEXT_FREE_LISP_STRING (s) = string_free_list;
-		  string_free_list = s;
-		  ++nfree;
+		      data->string = NULL;
+
+		      /* Reset the strings's `data' member
+			 so that we know it's free.  */
+		      s->u.dat.data = NULL;
+		    }
 		}
-	    }
-	  else
-	    {
-	      /* S was on the free-list before.  Put it there again.  */
+
+	      /* Put the string on the free-list.  */
 	      NEXT_FREE_LISP_STRING (s) = string_free_list;
 	      string_free_list = s;
 	      ++nfree;
@@ -2243,12 +2272,12 @@
 	  /* Check that the string size recorded in the string is the
 	     same as the one recorded in the sdata structure. */
 	  if (from->string
-	      && GC_STRING_BYTES (from->string) != SDATA_NBYTES (from))
+	      && string_bytes (from->string) != SDATA_NBYTES (from))
 	    abort ();
 #endif /* GC_CHECK_STRING_BYTES */
 
 	  if (from->string)
-	    nbytes = GC_STRING_BYTES (from->string);
+	    nbytes = string_bytes (from->string);
 	  else
 	    nbytes = SDATA_NBYTES (from);
 
@@ -2284,7 +2313,7 @@
 		{
 		  xassert (tb != b || to < from);
 		  memmove (to, from, nbytes + GC_STRING_EXTRA);
-		  to->string->data = SDATA_DATA (to);
+		  to->string->u.dat.data = SDATA_DATA (to);
 		}
 
 	      /* Advance past the sdata we copied to.  */
@@ -2533,7 +2562,19 @@
     return empty_multibyte_string;
 
   s = allocate_string ();
-  allocate_string_data (s, nchars, nbytes);
+  if (nbytes < STRING_IMM_MAX)
+    {
+      s->immbit = 1;
+      s->u.imm.size = nchars;
+      s->u.imm.size_byte = nbytes;
+    }
+  else
+    {
+      s->immbit = 0;
+      s->u.dat.size = nchars;
+      s->u.dat.size_byte = nbytes;
+      allocate_string_data (s, nchars, nbytes);
+    }
   XSETSTRING (string, s);
   string_chars_consed += nbytes;
   return string;
@@ -3884,6 +3925,22 @@
   x->color = MEM_BLACK;
 }
 
+/* Non-zero if data of S is valid.  */
+
+static inline int
+live_string_data_p (struct Lisp_String *s)
+{
+  if (s->immbit)
+    {
+      unsigned char *p;
+
+      for (p = s->u.imm.data; p < s->u.imm.data + STRING_IMM_MAX; p++)
+	if (*p != 0xff)
+	  return 1;
+      return 0;
+    }
+  return s->u.dat.data != NULL;
+}
 
 /* Value is non-zero if P is a pointer to a live Lisp string on
    the heap.  M is a pointer to the mem_block for P.  */
@@ -3901,7 +3958,7 @@
       return (offset >= 0
 	      && offset % sizeof b->strings[0] == 0
 	      && offset < (STRING_BLOCK_SIZE * sizeof b->strings[0])
-	      && ((struct Lisp_String *) p)->data != NULL);
+	      && live_string_data_p ((struct Lisp_String *) p));
     }
   else
     return 0;
@@ -4801,15 +4858,29 @@
   struct Lisp_String *s;
 
   s = (struct Lisp_String *) pure_alloc (sizeof *s, Lisp_String);
-  s->data = (unsigned char *) find_string_data_in_pure (data, nbytes);
-  if (s->data == NULL)
-    {
-      s->data = (unsigned char *) pure_alloc (nbytes + 1, -1);
-      memcpy (s->data, data, nbytes);
-      s->data[nbytes] = '\0';
-    }
-  s->size = nchars;
-  s->size_byte = multibyte ? nbytes : -1;
+
+  if (nbytes < STRING_IMM_MAX)
+    {
+      memcpy (s->u.imm.data, data, nbytes);
+      s->u.imm.data[nbytes] = '\0';
+      s->immbit = 1;
+      s->u.imm.size = nchars;
+      s->u.imm.size_byte = multibyte ? nbytes : -1;
+    }
+  else
+    {
+      s->u.dat.data = (unsigned char *) find_string_data_in_pure (data, nbytes);
+      if (s->u.dat.data == NULL)
+	{
+	  s->u.dat.data = (unsigned char *) pure_alloc (nbytes + 1, -1);
+	  memcpy (s->u.dat.data, data, nbytes);
+	  s->u.dat.data[nbytes] = '\0';
+	}
+      s->immbit = 0;
+      s->u.dat.size = nchars;
+      s->u.dat.size_byte = multibyte ? nbytes : -1;
+    }
+
   s->intervals = NULL_INTERVAL;
   XSETSTRING (string, s);
   return string;
@@ -4826,9 +4897,23 @@
   EMACS_INT nchars = strlen (data);
 
   s = (struct Lisp_String *) pure_alloc (sizeof *s, Lisp_String);
-  s->size = nchars;
-  s->size_byte = -1;
-  s->data = (unsigned char *) data;
+
+  if (nchars < STRING_IMM_MAX)
+    {
+      memcpy (s->u.imm.data, data, nchars);
+      s->u.imm.data[nchars] = '\0';
+      s->immbit = 1;
+      s->u.imm.size = nchars;
+      s->u.imm.size_byte = -1;
+    }
+  else
+    {
+      s->u.dat.data = (unsigned char *) data;
+      s->immbit = 0;
+      s->u.dat.size = nchars;
+      s->u.dat.size_byte = -1;
+    }
+
   s->intervals = NULL_INTERVAL;
   XSETSTRING (string, s);
   return string;
@@ -6250,6 +6335,31 @@
   return Flist (8, consed);
 }
 
+#ifdef GC_STRING_STATS
+
+DEFUN ("string-stats", Fstring_stats, Sstring_stats, 0, 0, 0,
+       doc: /* Return a list of counters that measure how many
+strings of a particular internal structure are alive after the last
+garbage collection, and how many bytes are in them.
+The elements of the value are as follows:
+  (IMM-STRINGS IMM-BYTES DAT-STRINGS DAT-BYTES)
+where IMM-STRINGS is the number of immediate strings, IMM-BYTES
+is the total number of bytes in them, DAT-STRINGS is the number of
+normal strings and DAT-BYTES is the total number of bytes in them.  */)
+  (void)
+{
+  Lisp_Object data[4];
+
+  data[0] = make_number (min (MOST_POSITIVE_FIXNUM, total_imm_strings));
+  data[1] = make_number (min (MOST_POSITIVE_FIXNUM, total_imm_bytes));
+  data[2] = make_number (min (MOST_POSITIVE_FIXNUM, total_dat_strings));
+  data[3] = make_number (min (MOST_POSITIVE_FIXNUM, total_dat_bytes));
+
+  return Flist (4, data);
+}
+
+#endif /* GC_STRING_STATS */
+
 /* Find at most FIND_MAX symbols which have OBJ as their value or
    function.  This is used in gdbinit's `xwhichsymbols' command.  */
 
@@ -6475,7 +6585,9 @@
   defsubr (&Sgarbage_collect);
   defsubr (&Smemory_limit);
   defsubr (&Smemory_use_counts);
-
+#ifdef GC_STRING_STATS
+  defsubr (&Sstring_stats);
+#endif
 #if GC_MARK_STACK == GC_USE_GCPROS_CHECK_ZOMBIES
   defsubr (&Sgc_status);
 #endif

=== modified file 'src/fns.c'
--- src/fns.c	2011-11-19 09:18:31 +0000
+++ src/fns.c	2011-11-28 05:21:59 +0000
@@ -2176,8 +2176,8 @@
 	  int len = CHAR_STRING (charval, str);
 	  EMACS_INT size_byte = SBYTES (array);
 
-	  if (INT_MULTIPLY_OVERFLOW (SCHARS (array), len)
-	      || SCHARS (array) * len != size_byte)
+	  if (INT_MULTIPLY_OVERFLOW (size, len)
+	      || size * len != size_byte)
 	    error ("Attempt to change byte length of a string");
 	  for (idx = 0; idx < size_byte; idx++)
 	    *p++ = str[idx % len];

=== modified file 'src/lisp.h'
--- src/lisp.h	2011-11-27 18:52:53 +0000
+++ src/lisp.h	2011-11-28 06:01:45 +0000
@@ -696,17 +696,23 @@
 
 /* Convenience macros for dealing with Lisp strings.  */
 
-#define SDATA(string)		(XSTRING (string)->data + 0)
+#define SDATA(string)		(XSTRING (string)->immbit ? \
+				 (XSTRING (string)->u.imm.data + 0) : \
+				 (XSTRING (string)->u.dat.data + 0))
 #define SREF(string, index)	(SDATA (string)[index] + 0)
 #define SSET(string, index, new) (SDATA (string)[index] = (new))
-#define SCHARS(string)		(XSTRING (string)->size + 0)
-#define SBYTES(string)		(STRING_BYTES (XSTRING (string)) + 0)
+#define SCHARS(string)		(XSTRING (string)->immbit ? \
+				 (XSTRING (string)->u.imm.size + 0) : \
+				 (XSTRING (string)->u.dat.size + 0))
+#define SBYTES(string)		(string_bytes (XSTRING (string)) + 0)
 
 /* Avoid "differ in sign" warnings.  */
 #define SSDATA(x)  ((char *) SDATA (x))
 
 #define STRING_SET_CHARS(string, newsize) \
-    (XSTRING (string)->size = (newsize))
+  (XSTRING (string)->immbit ? \
+   (XSTRING (string)->u.imm.size = (newsize)) : \
+   (XSTRING (string)->u.dat.size = (newsize)))
 
 #define STRING_COPYIN(string, index, new, count) \
     memcpy (SDATA (string) + index, new, count)
@@ -796,24 +802,12 @@
 #define CDR_SAFE(c)				\
   (CONSP ((c)) ? XCDR ((c)) : Qnil)
 
+#define STRING_SIZE_BYTE(string) (XSTRING (string)->immbit ? \
+				  XSTRING (string)->u.imm.size_byte : \
+				  XSTRING (string)->u.dat.size_byte)
+
 /* Nonzero if STR is a multibyte string.  */
-#define STRING_MULTIBYTE(STR)  \
-  (XSTRING (STR)->size_byte >= 0)
-
-/* Return the length in bytes of STR.  */
-
-#ifdef GC_CHECK_STRING_BYTES
-
-struct Lisp_String;
-extern EMACS_INT string_bytes (struct Lisp_String *);
-#define STRING_BYTES(S) string_bytes ((S))
-
-#else /* not GC_CHECK_STRING_BYTES */
-
-#define STRING_BYTES(STR)  \
-  ((STR)->size_byte < 0 ? (STR)->size : (STR)->size_byte)
-
-#endif /* not GC_CHECK_STRING_BYTES */
+#define STRING_MULTIBYTE(string) (STRING_SIZE_BYTE (string) >= 0)
 
 /* An upper bound on the number of bytes in a Lisp string, not
    counting the terminating null.  This a tight enough bound to
@@ -829,18 +823,28 @@
 #define STRING_BYTES_BOUND  \
   min (MOST_POSITIVE_FIXNUM, (ptrdiff_t) min (SIZE_MAX, PTRDIFF_MAX) - 1)
 
+/* Maximum amount of bytes, including '\0', in an immediate string.
+   This assumes that sizeof (EMACS_INT) is equal to sizeof (void * ).  */
+#define STRING_IMM_MAX (3 * sizeof (EMACS_INT) - 2)
+
 /* Mark STR as a unibyte string.  */
 #define STRING_SET_UNIBYTE(STR)  \
   do { if (EQ (STR, empty_multibyte_string))  \
       (STR) = empty_unibyte_string;  \
-    else XSTRING (STR)->size_byte = -1; } while (0)
+    else if (XSTRING (STR)->immbit) \
+      XSTRING (STR)->u.imm.size_byte = -1; \
+    else \
+      XSTRING (STR)->u.dat.size_byte = -1; } while (0)
 
 /* Mark STR as a multibyte string.  Assure that STR contains only
    ASCII characters in advance.  */
 #define STRING_SET_MULTIBYTE(STR)  \
   do { if (EQ (STR, empty_unibyte_string))  \
       (STR) = empty_multibyte_string;  \
-    else XSTRING (STR)->size_byte = XSTRING (STR)->size; } while (0)
+    else if (XSTRING (STR)->immbit) \
+      XSTRING (STR)->u.imm.size_byte = XSTRING (STR)->u.imm.size; \
+    else \
+      XSTRING (STR)->u.dat.size_byte = XSTRING (STR)->u.dat.size; } while (0)
 
 /* Get text properties.  */
 #define STRING_INTERVALS(STR)  (XSTRING (STR)->intervals + 0)
@@ -848,16 +852,59 @@
 /* Set text properties.  */
 #define STRING_SET_INTERVALS(STR, INT) (XSTRING (STR)->intervals = (INT))
 
-/* In a string or vector, the sign bit of the `size' is the gc mark bit */
-
 struct Lisp_String
   {
-    EMACS_INT size;
-    EMACS_INT size_byte;
-    INTERVAL intervals;		/* text properties in this string */
-    unsigned char *data;
+    /* Text properties in this string.  */
+    INTERVAL intervals;
+
+    /* Mark bit used for GC.  */
+    unsigned gcmarkbit : 1;
+
+    /* String subtype.  */
+    unsigned immbit : 1;
+
+    union {
+
+      /* When IMMBIT is 1. */
+      struct {
+	EMACS_INT size : 7;
+	EMACS_INT size_byte : 7;
+	unsigned char data[STRING_IMM_MAX];
+      } imm;
+      
+      /* When IMMBIT is 0.  */
+      struct {
+	EMACS_INT size : BITS_PER_EMACS_INT - 1;
+	EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
+	unsigned char *data;
+      } dat;
+    } u;
   };
 
+/* Return the length in bytes of STR.  */
+
+#ifdef GC_CHECK_STRING_BYTES
+
+/* Slower version with debugging check.  */
+
+extern EMACS_INT string_bytes (struct Lisp_String *);
+
+#else /* not GC_CHECK_STRING_BYTES */
+
+static inline
+EMACS_INT string_bytes (struct Lisp_String *s)
+{
+  EMACS_INT size, size_byte;
+
+  if (s->immbit)
+    size = s->u.imm.size, size_byte = s->u.imm.size_byte;
+  else
+    size = s->u.dat.size, size_byte = s->u.dat.size_byte;
+  return size_byte < 0 ? size : size_byte;
+}
+
+#endif /* GC_CHECK_STRING_BYTES */
+
 /* Header of vector-like objects.  This documents the layout constraints on
    vectors and pseudovectors other than struct Lisp_Subr.  It also prevents
    compilers from being fooled by Emacs's type punning: the XSETPSEUDOVECTOR


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-28  9:11 immediate strings #2 Dmitry Antipov
@ 2011-11-28 17:33 ` Stefan Monnier
  2011-11-28 19:48   ` Ken Raeburn
                     ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Stefan Monnier @ 2011-11-28 17:33 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>  struct Lisp_String
>    {
> +    /* Text properties in this string.  */
> +    INTERVAL intervals;
> +
> +    /* Mark bit used for GC.  */
> +    unsigned gcmarkbit : 1;
> +
> +    /* String subtype.  */
> +    unsigned immbit : 1;
> +
> +    union {
> +
> +      /* When IMMBIT is 1. */
> +      struct {
> +	EMACS_INT size : 7;
> +	EMACS_INT size_byte : 7;
> +	unsigned char data[STRING_IMM_MAX];
> +      } imm;
> +      
> +      /* When IMMBIT is 0.  */
> +      struct {
> +	EMACS_INT size : BITS_PER_EMACS_INT - 1;
> +	EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
> +	unsigned char *data;
> +      } dat;
> +    } u;
>    };

I don't know any C compiler able to allocate unions at the bit level, so
the above struct will have the following layout:

   INTERVAL:  32
   gcmarkbit: 1
   immbit:    1
   <padding>: 30
   union:     96

I'm not sure about the layout of dat.size_byte, but I could even imagine
it straddling two words.  You need to move the immbit and gcmarkbit
into the union :-(
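
The padding can be made visible with a small stand-alone sketch. It is only an approximation of the patched struct: intptr_t stands in for EMACS_INT and void * for INTERVAL (both assumptions), bit-fields of type intptr_t rely on the compiler accepting them (GCC does), and the exact numbers depend on the ABI. The helper flag_overhead is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the patched struct Lisp_String, with the two flag
   bits placed outside the union.  */
struct bits_outside
{
  void *intervals;
  unsigned gcmarkbit : 1;
  unsigned immbit : 1;
  union
  {
    struct
    {
      intptr_t size : 7;
      intptr_t size_byte : 7;
      unsigned char data[3 * sizeof (intptr_t) - 2];
    } imm;
    struct
    {
      intptr_t size : sizeof (intptr_t) * CHAR_BIT - 1;
      intptr_t size_byte : sizeof (intptr_t) * CHAR_BIT - 1;
      unsigned char *data;
    } dat;
  } u;
};

/* Bytes occupied by the two flag bits plus the padding needed to
   align the union that follows them.  */
size_t
flag_overhead (void)
{
  return offsetof (struct bits_outside, u) - sizeof (void *);
}
```

Because the union has to start at a byte offset that satisfies its alignment, the two flag bits cost at least a byte and typically a whole word, so the struct ends up larger than the original four-word Lisp_String.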

It's great to see that it can speed up compilation, tho (although
the 1.3% difference could just as well be due to noise).  You might want
to check what proportion of those strings have a NULL `intervals' field.
   

        Stefan




* Re: immediate strings #2
  2011-11-28 17:33 ` Stefan Monnier
@ 2011-11-28 19:48   ` Ken Raeburn
  2011-11-28 20:10   ` Andreas Schwab
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Ken Raeburn @ 2011-11-28 19:48 UTC (permalink / raw)
  To: Emacs Dev

On Nov 28, 2011, at 12:33, Stefan Monnier wrote:
> I don't know any C compiler able to allocate unions at the bit level, so
> the above struct will have the following layout:

The union still must be addressable (even without a tag or typedefname, the address could be converted to void* and used for something, in theory), so non-byte-aligned addressing would be broken anyways (assuming a lack of bit-level addressing in pointers, which exists but isn't common).

Ken



* Re: immediate strings #2
  2011-11-28 17:33 ` Stefan Monnier
  2011-11-28 19:48   ` Ken Raeburn
@ 2011-11-28 20:10   ` Andreas Schwab
  2011-11-28 21:54     ` Stefan Monnier
  2011-11-28 22:18   ` Paul Eggert
  2011-11-29  5:29   ` Dmitry Antipov
  3 siblings, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2011-11-28 20:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Antipov, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> I don't know any C compiler able to allocate unions at the bit level,

It can't since it's not a bitfield (which isn't possible anyway).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: immediate strings #2
  2011-11-28 20:10   ` Andreas Schwab
@ 2011-11-28 21:54     ` Stefan Monnier
  2011-11-28 22:25       ` Andreas Schwab
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Monnier @ 2011-11-28 21:54 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Dmitry Antipov, emacs-devel

>> I don't know any C compiler able to allocate unions at the bit level,
> It can't since it's not a bitfield (which isn't possible anyway).

Not that it's relevant to Emacs, but I don't know which part of the
C standard would force a C compiler to lay out all unions at an
"addressable" offset.  I mean, wouldn't it be valid for a compiler to
analyze the whole program and decide "oh, the code uses the union in
such a way that I can bit-align it to save some padding space and no one
will notice"?


        Stefan




* Re: immediate strings #2
  2011-11-28 17:33 ` Stefan Monnier
  2011-11-28 19:48   ` Ken Raeburn
  2011-11-28 20:10   ` Andreas Schwab
@ 2011-11-28 22:18   ` Paul Eggert
  2011-11-29  2:07     ` Stefan Monnier
  2011-11-29  3:17     ` Dmitry Antipov
  2011-11-29  5:29   ` Dmitry Antipov
  3 siblings, 2 replies; 21+ messages in thread
From: Paul Eggert @ 2011-11-28 22:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Antipov, emacs-devel

On 11/28/11 09:33, Stefan Monnier wrote:
> You need to move the immbit and gcmarkbit
> into the union :-(

Yes, something like this perhaps?  It also adds a bit of checking for
underlying assumptions about word size.

#include <verify.h>

#define IMMEDIATE_STRING_LENGTH_BITS 7

struct Data_Lisp_String
  {
    unsigned int immediate_bit : 1;
    signed int : IMMEDIATE_STRING_LENGTH_BITS; /* padding for immediate size */
    unsigned int gcmarkbit : 1;
    signed int : IMMEDIATE_STRING_LENGTH_BITS; /* and for immediate size_byte */
    INTERVAL intervals;
    ptrdiff_t size;
    ptrdiff_t size_byte;
    unsigned char *data;
  };

#define IMMEDIATE_STRING_SIZE \
  (sizeof (struct Data_Lisp_String) - offsetof (struct Data_Lisp_String, size))
verify (IMMEDIATE_STRING_SIZE <= 1 << (IMMEDIATE_STRING_LENGTH_BITS - 1));

struct Immediate_Lisp_String
  {
    unsigned int immediate_bit : 1;
    signed int size : IMMEDIATE_STRING_LENGTH_BITS;
    unsigned int gcmarkbit : 1;
    signed int size_byte : IMMEDIATE_STRING_LENGTH_BITS;
    INTERVAL intervals;
    unsigned char data[IMMEDIATE_STRING_SIZE];
  };

union Lisp_String
  {
    /* When IMMEDIATE.IMMEDIATE_BIT is 0.  */
    struct Data_Lisp_String data;

    /* When IMMEDIATE.IMMEDIATE_BIT is 1. */
    struct Immediate_Lisp_String immediate;
  };



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-28 21:54     ` Stefan Monnier
@ 2011-11-28 22:25       ` Andreas Schwab
  2011-11-29  0:57         ` Ken Raeburn
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2011-11-28 22:25 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Antipov, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> Not that it's relevant to Emacs, but I don't know which part of the
> C standard would force a C compiler to lay out all unions at an
> "addressable" offset.

6.7.2.1#13
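
A minimal sketch of what "addressable" means in practice, assuming the
clause cited is the one on struct and union member layout (paragraph
numbers vary between editions of the standard).  The type and function
names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types, for illustration only.  */
union payload
  {
    long n;
    unsigned char bytes[sizeof (long)];
  };

struct holder
  {
    unsigned int flag : 1;   /* bit-field: neither & nor offsetof may be applied to it */
    union payload u;         /* ordinary member: has a whole-byte offset */
  };

/* Byte offset of the union inside the struct.  */
static size_t
payload_offset (void)
{
  return offsetof (struct holder, u);
}

/* A pointer to a union, suitably converted, points to each member.  */
static bool
members_share_address (const struct holder *h)
{
  return ((const void *) &h->u == (const void *) &h->u.n
          && (const void *) &h->u == (const void *) h->u.bytes);
}
```

As long as a program can observe such an offset or address, the
compiler has to keep the union byte-aligned.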

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-28 22:25       ` Andreas Schwab
@ 2011-11-29  0:57         ` Ken Raeburn
  2011-11-29  8:44           ` Andreas Schwab
  0 siblings, 1 reply; 21+ messages in thread
From: Ken Raeburn @ 2011-11-29  0:57 UTC (permalink / raw)
  To: Emacs Dev

On Nov 28, 2011, at 17:25, Andreas Schwab wrote:
> Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> 
>> Not that it's relevant to Emacs, but I don't know which part of the
>> C standard would force a C compiler to lay out all unions at an
>> "addressable" offset.
> 
> 6.7.2.1#13

Well, I think Stefan's technically right... the "as-if" rule lets the compiler get away with a lot, if it can analyze enough of the program to figure out that it wouldn't make a difference to the semantics ("no one will notice", as Stefan put it).  For example, gcc can make some variables that have their addresses taken still live in registers anyways.  But few or none of the compilers we care about right now will do that when dealing with structure layouts and multiple source files as in Emacs; they'll implement something close enough to the abstract machine description in the standard that the union would have to be addressable.

Ken


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-28 22:18   ` Paul Eggert
@ 2011-11-29  2:07     ` Stefan Monnier
  2011-11-29  3:37       ` Dmitry Antipov
  2011-11-29  8:50       ` Paul Eggert
  2011-11-29  3:17     ` Dmitry Antipov
  1 sibling, 2 replies; 21+ messages in thread
From: Stefan Monnier @ 2011-11-29  2:07 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Dmitry Antipov, emacs-devel

> struct Data_Lisp_String
>   {
>     unsigned int immediate_bit : 1;
>     signed int : IMMEDIATE_STRING_LENGTH_BITS; /* padding for immediate size */
>     unsigned int gcmarkbit : 1;
>     signed int : IMMEDIATE_STRING_LENGTH_BITS; /* and for immediate size_byte */
>     INTERVAL intervals;
>     ptrdiff_t size;
>     ptrdiff_t size_byte;
>     unsigned char *data;
>   };

Why?  IIUC that sums up to 5x32bit, which will break the "multiple of
8 alignment" rule and hence will need to be rounded up to 6x32bit,
for an overall increase of 50% in the size of struct Lisp_String.
I.e. a non-starter.  There are bits available in size and size_byte, we
have to use those (like we currently do with gcmarkbit in `size').
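
The arithmetic can be checked with a small sketch.  It assumes a 32-bit
host, emulated here by making every pointer-sized field a uint32_t; the
struct names and the round_up helper are hypothetical, and bit-field
layout is implementation-defined, so the sizes are what typical ABIs
give rather than guarantees:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the current layout on a 32-bit host:
   four words, with gcmarkbit living inside `size'.  */
struct old_string32
  {
    uint32_t size;
    uint32_t size_byte;
    uint32_t intervals;   /* stands in for the INTERVAL pointer */
    uint32_t data;        /* stands in for unsigned char * */
  };

/* Hypothetical model of the proposed Data_Lisp_String: the two flag
   bits and their padding need a fifth word.  */
struct new_string32
  {
    unsigned int immediate_bit : 1;
    signed int : 7;
    unsigned int gcmarkbit : 1;
    signed int : 7;
    uint32_t intervals;
    uint32_t size;
    uint32_t size_byte;
    uint32_t data;
  };

/* Round N up to the next multiple of ALIGN (ALIGN > 0).  */
static size_t
round_up (size_t n, size_t align)
{
  return (n + align - 1) / align * align;
}

/* Allocation sizes once the multiple-of-8 rule is applied.  */
static size_t
old_alloc_size (void)
{
  return round_up (sizeof (struct old_string32), 8);
}

static size_t
new_alloc_size (void)
{
  return round_up (sizeof (struct new_string32), 8);
}
```

On common ABIs old_alloc_size () should be 16 and new_alloc_size ()
should be 24, which is the 50% growth described above.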


        Stefan



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-28 22:18   ` Paul Eggert
  2011-11-29  2:07     ` Stefan Monnier
@ 2011-11-29  3:17     ` Dmitry Antipov
  1 sibling, 0 replies; 21+ messages in thread
From: Dmitry Antipov @ 2011-11-29  3:17 UTC (permalink / raw)
  To: emacs-devel; +Cc: Paul Eggert, Stefan Monnier

On 11/29/2011 02:18 AM, Paul Eggert wrote:

> union Lisp_String
>    {
>      /* When IMMEDIATE.IMMEDIATE_BIT is 0.  */
>      struct Data_Lisp_String data;
>
>      /* When IMMEDIATE.IMMEDIATE_BIT is 1. */
>      struct Immediate_Lisp_String immediate;
>    };

This is possible, of course, but personally I don't like
it for aesthetic reasons: Lisp_String is a basic type,
and it should stay basic, without too much data type
bloat like Lisp_Misc.

Dmitry



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29  2:07     ` Stefan Monnier
@ 2011-11-29  3:37       ` Dmitry Antipov
  2011-11-29  8:50       ` Paul Eggert
  1 sibling, 0 replies; 21+ messages in thread
From: Dmitry Antipov @ 2011-11-29  3:37 UTC (permalink / raw)
  To: emacs-devel; +Cc: Paul Eggert, Stefan Monnier

Hm, this is still 40 bytes on 64-bit (and so 20 on 32-bit):

struct Lisp_String
   {
     INTERVAL intervals;

     union {

       struct {
         unsigned gcmarkbit : 1;
         unsigned immbit : 1;
         EMACS_INT size : 7;
         EMACS_INT size_byte : 7;
         unsigned char data[STRING_IMM_MAX];
       } imm;

       struct {
         unsigned unused : 2;
         EMACS_INT size : BITS_PER_EMACS_INT - 1;
         EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
         unsigned char *data;
       } dat;

     } u;
   };

The only way I found to fit it within 32 (or 16, respectively) bytes is:

struct Lisp_String
   {
     INTERVAL intervals;

     union {

       struct {
         unsigned gcmarkbit : 1;
         unsigned immbit : 1;
         EMACS_INT size : 7;
         EMACS_INT size_byte : 7;
         unsigned char data[STRING_IMM_MAX];
       } imm;

       struct {
         unsigned unused : 2;
         EMACS_INT size : BITS_PER_EMACS_INT - 1;
         EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
         unsigned char *data;
       } __attribute__ ((packed)) dat;

     } u;
   };

It's worth mentioning that only DAT should be packed, not IMM.
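
A quick way to compare the two layouts is to compile both and look at
sizeof.  The sketch below assumes GCC or Clang (for
__attribute__ ((packed)) and for bit-fields wider than int), takes
EMACS_INT to be intptr_t and INTERVAL to be a plain pointer, and derives
STRING_IMM_MAX from the 21- and 9-byte limits quoted at the start of
the thread.  All of these are stand-ins, not the definitions from the
patch:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t EMACS_INT;   /* assumption: pointer-sized integer */
#define BITS_PER_EMACS_INT ((int) (sizeof (EMACS_INT) * CHAR_BIT))
/* Assumption: 22 on 64-bit, 10 on 32-bit (payload plus trailing NUL).  */
#define STRING_IMM_MAX (3 * sizeof (void *) - 2)

/* First variant: neither inner struct is packed.  */
struct string_plain
  {
    void *intervals;
    union {
      struct {
        unsigned gcmarkbit : 1;
        unsigned immbit : 1;
        EMACS_INT size : 7;
        EMACS_INT size_byte : 7;
        unsigned char data[STRING_IMM_MAX];
      } imm;
      struct {
        unsigned unused : 2;
        EMACS_INT size : BITS_PER_EMACS_INT - 1;
        EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
        unsigned char *data;
      } dat;
    } u;
  };

/* Second variant: only DAT is packed.  */
struct string_packed
  {
    void *intervals;
    union {
      struct {
        unsigned gcmarkbit : 1;
        unsigned immbit : 1;
        EMACS_INT size : 7;
        EMACS_INT size_byte : 7;
        unsigned char data[STRING_IMM_MAX];
      } imm;
      struct {
        unsigned unused : 2;
        EMACS_INT size : BITS_PER_EMACS_INT - 1;
        EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
        unsigned char *data;
      } __attribute__ ((packed)) dat;
    } u;
  };

static size_t
plain_size (void)
{
  return sizeof (struct string_plain);
}

static size_t
packed_size (void)
{
  return sizeof (struct string_packed);
}
```

With GCC on x86-64 these should come out as 40 and 32, matching the
figures above.  Without packing, the 63-bit `size' field cannot share
a word with the two flag bits, so it starts in the next word.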

Dmitry




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-28 17:33 ` Stefan Monnier
                     ` (2 preceding siblings ...)
  2011-11-28 22:18   ` Paul Eggert
@ 2011-11-29  5:29   ` Dmitry Antipov
  3 siblings, 0 replies; 21+ messages in thread
From: Dmitry Antipov @ 2011-11-29  5:29 UTC (permalink / raw)
  To: emacs-devel; +Cc: Stefan Monnier

On 11/28/2011 09:33 PM, Stefan Monnier wrote:

> It's great to see that it can speed up compilation, tho (although
> the 1.3% difference could just as well be due to noise).

This "noise" is quite reproducible, and it should be even more reproducible
and visible after fitting Lisp_String within 32 (or 16, on 32-bit) bytes.

> You might want to check what proportion of those strings have a
> NULL `intervals' field.

I believe it's typical to have 20-50 intervals for 10000 strings, so
it's worth trying to store string intervals separately (in a kind
of hash table, for example). On the other hand, there is a reason to
keep an extra sizeof (void *) bytes at the beginning of Lisp_String:
they are used by NEXT_FREE_LISP_STRING.

Dmitry



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29  0:57         ` Ken Raeburn
@ 2011-11-29  8:44           ` Andreas Schwab
  2011-11-29 15:48             ` Ken Raeburn
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2011-11-29  8:44 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Emacs Dev

Ken Raeburn <raeburn@raeburn.org> writes:

> Well, I think Stefan's technically right... the "as-if" rule lets the
> compiler get away with a lot, if it can analyze enough of the program
> to figure out that it wouldn't make a difference to the semantics

Only if the address is never taken, or sizeof is never applied.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29  2:07     ` Stefan Monnier
  2011-11-29  3:37       ` Dmitry Antipov
@ 2011-11-29  8:50       ` Paul Eggert
  2011-11-30  5:37         ` Dmitry Antipov
  1 sibling, 1 reply; 21+ messages in thread
From: Paul Eggert @ 2011-11-29  8:50 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Antipov, emacs-devel

On 11/28/11 18:07, Stefan Monnier wrote:
> There are bits available in size and size_byte, we
> have to use those (like we currently do with gcmarkbit in `size')

There is a bit available in size (since it's always nonnegative)
but not in size_byte (since it ranges from -1 .. PTRDIFF_MAX
and is a ptrdiff_t, assuming a 32-bit host configured --with-wide-int
and assuming the memory-saving patch of Bug#9874).

This is in contrast with our current uses of mark bits (e.g.,
ARRAY_MARK_FLAG), which use bits that are otherwise unused, even if
a vector has its maximal size.

We can fairly easily get that bit back from size_byte by restricting
its range to (say) 0 .. PTRDIFF_MAX.

So this is a fairly minor glitch that can be fixed but is not yet
fixed in the current proposal.
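
To illustrate the trick of borrowing a bit from a field that is never
negative, here is a minimal sketch.  The names are hypothetical, and it
keeps the word in a uintptr_t so that the bit operations are well
defined; it is not the code Emacs uses:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The top bit of the word.  A nonnegative ptrdiff_t never sets it
   (assuming ptrdiff_t is no wider than uintptr_t).  */
#define FLAG_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Store SIZE (which the caller guarantees is >= 0) together with
   FLAG in a single word.  */
static uintptr_t
pack_size (ptrdiff_t size, bool flag)
{
  return (uintptr_t) size | (flag ? FLAG_BIT : 0);
}

/* Recover the size by masking the flag off.  */
static ptrdiff_t
unpack_size (uintptr_t word)
{
  return (ptrdiff_t) (word & ~FLAG_BIT);
}

/* Recover the flag.  */
static bool
unpack_flag (uintptr_t word)
{
  return (word & FLAG_BIT) != 0;
}
```

A field that may hold -1 has no such spare bit on two's-complement
hosts, because -1 sets every bit, including the top one.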



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29  8:44           ` Andreas Schwab
@ 2011-11-29 15:48             ` Ken Raeburn
  2011-11-29 16:08               ` Andreas Schwab
  0 siblings, 1 reply; 21+ messages in thread
From: Ken Raeburn @ 2011-11-29 15:48 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emacs Dev

On Nov 29, 2011, at 03:44, Andreas Schwab wrote:
> Ken Raeburn <raeburn@raeburn.org> writes:
> 
>> Well, I think Stefan's technically right... the "as-if" rule lets the
>> compiler get away with a lot, if it can analyze enough of the program
>> to figure out that it wouldn't make a difference to the semantics
> 
> Only if the address is never taken, or sizeof is never applied.

If the compiler really wants to play some games in the name of space optimization, it could pack the type as densely as possible (rearrange or eliminate fields, limit integer or pointer fields to the number of bits that will actually get used, etc) for actual storage, and still print sizes and offsets consistent with the official alignment rules for the platform, if it can keep the two cases straight.  (Allocating the actually-used smaller size instead of the larger "normal" size would still be consistent as long as the difference isn't visible by certain criteria; I'm pretty sure process size under "ps" isn't one of those criteria.)  It'd be a lot of effort and probably not worthwhile, but not outside of the rules.

Ken




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29 15:48             ` Ken Raeburn
@ 2011-11-29 16:08               ` Andreas Schwab
  2011-11-30 16:43                 ` Ken Raeburn
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2011-11-29 16:08 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Emacs Dev

Ken Raeburn <raeburn@raeburn.org> writes:

> It'd be a lot of effort and probably not worthwhile, but not outside
> of the rules.

It would also be completely pointless.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29  8:50       ` Paul Eggert
@ 2011-11-30  5:37         ` Dmitry Antipov
  2011-11-30  9:35           ` Paul Eggert
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Antipov @ 2011-11-30  5:37 UTC (permalink / raw)
  To: emacs-devel; +Cc: Paul Eggert, Stefan Monnier

On 11/29/2011 12:50 PM, Paul Eggert wrote:

> There is a bit available in size (since it's always nonnegative)
> but not in size_byte (since it ranges from -1 .. PTRDIFF_MAX
> and is a ptrdiff_t, assuming a 32-bit host configured --with-wide-int
> and assuming the memory-saving patch of Bug#9874).
>
> This is in contrast with our current uses of mark bits (e.g.,
> ARRAY_MARK_FLAG), which use bits that are otherwise unused, even if
> a vector has its maximal size.
>
> We can fairly easily get that bit back from size_byte by restricting
> its range to (say) 0 .. PTRDIFF_MAX.
>
> So this is a fairly minor glitch that can be fixed but is not yet
> fixed in the current proposal.

It looks like I'm missing the point. For immediate strings, we need
two extra bits - 1 for GC, 1 for string subtype, so the question is
simple: where to get them if both size and size_byte are ptrdiff_t?

Dmitry



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-30  5:37         ` Dmitry Antipov
@ 2011-11-30  9:35           ` Paul Eggert
  2011-11-30 16:43             ` Ken Raeburn
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Eggert @ 2011-11-30  9:35 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

On 11/29/11 21:37, Dmitry Antipov wrote:
> It looks like I'm missing the point. For immediate strings, we need
> two extra bits - 1 for GC, 1 for string subtype, so the question is
> simple: where to get them if both size and size_byte are ptrdiff_t?

You can get 1 bit from 'size' since it's always nonnegative.

You can get the other bit from 'size_byte' if you change it
so that it's always nonnegative.  This can be done by using
0 rather than -1 as its special value indicating that it
is unibyte and the actual byte size is given in 'size'.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-29 16:08               ` Andreas Schwab
@ 2011-11-30 16:43                 ` Ken Raeburn
  0 siblings, 0 replies; 21+ messages in thread
From: Ken Raeburn @ 2011-11-30 16:43 UTC (permalink / raw)
  To: Emacs Dev

On Nov 29, 2011, at 11:08, Andreas Schwab wrote:
> Ken Raeburn <raeburn@raeburn.org> writes:
> 
>> It'd be a lot of effort and probably not worthwhile, but not outside
>> of the rules.
> 
> It would also be completely pointless.

No argument there.  The standard permits a lot of behavior that would be pointless, or at least seem so but perhaps have some point in some bizarre use case....

Ken


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-30  9:35           ` Paul Eggert
@ 2011-11-30 16:43             ` Ken Raeburn
  2011-11-30 21:44               ` Paul Eggert
  0 siblings, 1 reply; 21+ messages in thread
From: Ken Raeburn @ 2011-11-30 16:43 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Dmitry Antipov, emacs-devel

On Nov 30, 2011, at 04:35, Paul Eggert wrote:
> You can get the other bit from 'size_byte' if you change it
> so that it's always nonnegative.  This can be done by using
> 0 rather than -1 as its special value indicating that it
> is unibyte and the actual byte size is given in 'size'.

Currently it appears that we can have both unibyte and multibyte zero-length strings, and the multibyteness is preserved if you "concat" with a simple ASCII string.  I'm not sure any existing code in the wild would care, but it sounds like what you're suggesting would be a functional change that makes that no longer work.

Ken


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: immediate strings #2
  2011-11-30 16:43             ` Ken Raeburn
@ 2011-11-30 21:44               ` Paul Eggert
  0 siblings, 0 replies; 21+ messages in thread
From: Paul Eggert @ 2011-11-30 21:44 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Dmitry Antipov, emacs-devel

On 11/30/11 08:43, Ken Raeburn wrote:
> Currently it appears that we can have both unibyte and multibyte zero-length strings

Thanks, good point.  There are other possibilities.  We can use
PTRDIFF_MAX, not -1, as the special value indicating that
the string is unibyte.  Or we could have size_byte count
the trailing null, and use 0 as the special value; that
might be more efficient.  Or we could use a special
marker in the immediate data (after the trailing null), used
only with empty strings, to specify whether the empty string
is multibyte.  I'm sure there are other ways to do it --
the point is that we need not arbitrarily restrict strings
to half their size merely to get one special size value.
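
One of these options, having size_byte count the trailing null with 0
as the unibyte marker, might look like the sketch below.  The struct
and function names are made up for illustration, and the real string
header has more fields:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair of size fields.
   size:      length in characters.
   size_byte: 0 for a unibyte string, otherwise the byte length
              plus 1 for the trailing null.  */
struct sizes
  {
    ptrdiff_t size;
    ptrdiff_t size_byte;
  };

static struct sizes
make_unibyte (ptrdiff_t nbytes)
{
  struct sizes s = { nbytes, 0 };
  return s;
}

static struct sizes
make_multibyte (ptrdiff_t nchars, ptrdiff_t nbytes)
{
  struct sizes s = { nchars, nbytes + 1 };
  return s;
}

static bool
is_multibyte (struct sizes s)
{
  return s.size_byte != 0;
}

/* Byte length without the trailing null.  */
static ptrdiff_t
byte_length (struct sizes s)
{
  return is_multibyte (s) ? s.size_byte - 1 : s.size;
}
```

Both fields stay nonnegative, so each keeps a spare sign bit, and an
empty multibyte string (size_byte 1) remains distinguishable from an
empty unibyte one (size_byte 0).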



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-11-30 21:44 UTC | newest]

Thread overview: 21+ messages
2011-11-28  9:11 immediate strings #2 Dmitry Antipov
2011-11-28 17:33 ` Stefan Monnier
2011-11-28 19:48   ` Ken Raeburn
2011-11-28 20:10   ` Andreas Schwab
2011-11-28 21:54     ` Stefan Monnier
2011-11-28 22:25       ` Andreas Schwab
2011-11-29  0:57         ` Ken Raeburn
2011-11-29  8:44           ` Andreas Schwab
2011-11-29 15:48             ` Ken Raeburn
2011-11-29 16:08               ` Andreas Schwab
2011-11-30 16:43                 ` Ken Raeburn
2011-11-28 22:18   ` Paul Eggert
2011-11-29  2:07     ` Stefan Monnier
2011-11-29  3:37       ` Dmitry Antipov
2011-11-29  8:50       ` Paul Eggert
2011-11-30  5:37         ` Dmitry Antipov
2011-11-30  9:35           ` Paul Eggert
2011-11-30 16:43             ` Ken Raeburn
2011-11-30 21:44               ` Paul Eggert
2011-11-29  3:17     ` Dmitry Antipov
2011-11-29  5:29   ` Dmitry Antipov
