unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Geza Herman <geza.herman@gmail.com>
Cc: "Gerd Möllmann" <gerd.moellmann@gmail.com>,
	74547@debbugs.gnu.org, "Óscar Fuentes" <oscarfv@telefonica.net>
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Sun, 01 Dec 2024 21:15:41 +0000	[thread overview]
Message-ID: <87cyibksuu.fsf@protonmail.com> (raw)
In-Reply-To: <11705193-9151-43fb-9109-8a579e77d32b@gmail.com>

"Geza Herman" <geza.herman@gmail.com> writes:
>    On 12/1/24 16:48, Pip Cet wrote:
> Gerd M¶llmann <gerd.moellmann@gmail.com> writes:
>
>    Back then, the future of the new GC was a question, so Gerd said
>    (https://lists.gnu.org/archive/html/emacs-devel/2024-03/msg00544.html)
>    that
>    "Please don't take my GC efforts into consideration. That may succeed
>    or not. But this is also a matter of good design, using the stack,
>    (which BTW pdumper does, too), vs. bad design." That's why we went with
>    the fastest implementation that doesn't use lisp vectors for storage.
>    But we suspected that this JSON parser design will likely cause a
>    problem with the new GC. So I think even if it turned out that the
>    current problem was not caused by the parser, I still think that there
>    should be something done about this JSON parser design to eliminate
>    this potential problem. The lisp vector based approach was reverted
>    because it added an extra pressure to the GC. For large JSON messages,
>    it doesn't matter too much, but when the JSON is small, the extra GC
>    time made the parser measurably slower. But, as far as I remember, that
>    version hadn't have the small internal storage optimization yet. If we
>    convert back to the vector based approach, the extra GC pressure will
>    be smaller (compared to the original vector based approach without the
>    internal storage), as for smaller sizes the vector won't be actually
>    used.
>    G©za

Thank you for the summary, that makes sense. Is there a standard corpus
of JSON documents that you use to benchmark the code? That would be very
helpful, I think, since Eli correctly points out JSON parsing
performance is critical.

My gut feeling is that we should get rid of the object_workspace
entirely, instead modifying the general Lisp code to avoid performance
issues (and sacrifice some memory in the process, on most systems).

When building a hash table, we can do so directly and fine-tune the hash
table code not to reallocate too often. I'd imagine such fine tuning
would be generally useful (JSON-derived hash tables are unlikely to
exceed the initial 65 elements, but when they do, growing by a factor of
four might be appropriate).

When building a vector, we might want to introduce a `vtruncate'
function which destructively reduces a vector's size without
reallocating it. Again, I think that function would be useful in a few
other places.

Then we could start by allocating a 64-element vector, fill in the array
elements, grow it in the rare case that we exceed 64 elements, and
truncate it in the common case that fewer than 64 slots are used.  No
need to copy anything from the stack to the heap, the elements would
just appear where they are needed.

However, some memory would be wasted, and on low-memory systems, we
might want to define `vtruncate' and `hash-table-truncate' to reduce
memory usage.

Here's some code which might illustrate this idea, but WON'T WORK
yet, since the old GC requires vtruncate to do more work.

From 9c84c08fd9d4bb9c419025787663b7f7f2611952 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@protonmail.com>
Date: Sun, 1 Dec 2024 21:00:46 +0000
Subject: [PATCH] Optimize json.c performance

* src/alloc.c (Fvtruncate): New function.
(syms_of_alloc): Register it.
* src/fns.c (Fhash_table_truncate): New function.
(syms_of_fns): Register it.
* src/json.c (JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE): Rename to ...
(JSON_PARSER_INITIAL_VECTOR_SIZE): ... this.
(struct json_parser): Remove object workspace.
(json_parser_init): Adjust to removed object workspace.
(json_make_object_workspace_for_slow_path,
json_make_object_workspace_for): Remove.
(json_parse_array): Parse directly to a heap vector, truncate it when
done.
(json_parse_object): Parse directly to a hash table, truncate it when
done.
---
 src/alloc.c |  25 +++++++++
 src/fns.c   |  13 +++++
 src/json.c  | 143 +++++++++++++---------------------------------------
 3 files changed, 74 insertions(+), 107 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 55126b1d551..b78c27ccf35 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3935,6 +3935,30 @@ DEFUN ("vector", Fvector, Svector, 0, MANY, 0,
   return val;
 }
 
+DEFUN ("vtruncate", Fvtruncate, Svtruncate, 2, 2, 0,
+       doc: /* Truncate VECTOR so only the first LENGTH elements remain.
+
+Return a vector to be used instead of VECTOR, which may also be modified
+destructively.  This function is a performance optimization.  */)
+  (Lisp_Object length, Lisp_Object vector)
+{
+  CHECK_TYPE (FIXNATP (length), Qwholenump, length);
+  CHECK_TYPE (VECTORP (vector) &&
+	      PSEUDOVECTOR_TYPE (XVECTOR (vector)) == PVEC_NORMAL_VECTOR,
+	      Qvectorp, vector);
+  if (XFIXNAT (length) > ASIZE (vector))
+    signal_error ("vector too short", vector);
+  if (XFIXNAT (length) == 0)
+    return zero_vector;
+
+#ifdef HAVE_MPS
+  XVECTOR (vector)->header.size = XFIXNAT (length);
+  return vector;
+#else
+  return Fvector (XFIXNAT (length), XVECTOR (vector)->contents);
+#endif
+}
+
 DEFUN ("make-byte-code", Fmake_byte_code, Smake_byte_code, 4, MANY, 0,
        doc: /* Create a byte-code object with specified arguments as elements.
 The arguments should be the ARGLIST, bytecode-string BYTE-CODE, constant
@@ -8600,6 +8624,7 @@ syms_of_alloc (void)
   defsubr (&Sgarbage_collect_maybe);
   defsubr (&Smemory_info);
   defsubr (&Smemory_use_counts);
+  defsubr (&Svtruncate);
 #if defined GNU_LINUX && defined __GLIBC__ && \
   (__GLIBC__ > 2 || __GLIBC_MINOR__ >= 10)
 
diff --git a/src/fns.c b/src/fns.c
index f8191d72443..51bc7488457 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -6651,6 +6651,18 @@ DEFUN ("define-hash-table-test", Fdefine_hash_table_test,
   return Fput (name, Qhash_table_test, list2 (test, hash));
 }
 
+DEFUN ("hash-table-truncate", Fhash_table_truncate,
+       Shash_table_truncate, 1, 1, 0,
+       doc: /* Indicate that TABLE is unlikely to grow further.
+
+Returns TABLE, with its internal structures potentially reduced to fit
+the current number of elements precisely.  This function is a
+performance optimization.  */)
+  (Lisp_Object table)
+{
+  return table;
+}
+
 DEFUN ("internal--hash-table-histogram",
        Finternal__hash_table_histogram,
        Sinternal__hash_table_histogram,
@@ -7408,6 +7420,7 @@ syms_of_fns (void)
   defsubr (&Shash_table_rehash_threshold);
   defsubr (&Shash_table_size);
   defsubr (&Shash_table_test);
+  defsubr (&Shash_table_truncate);
   defsubr (&Shash_table_weakness);
   defsubr (&Shash_table_p);
   defsubr (&Sclrhash);
diff --git a/src/json.c b/src/json.c
index 0e17b893087..dd8e45003c4 100644
--- a/src/json.c
+++ b/src/json.c
@@ -535,7 +535,7 @@ DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, MANY,
 }
 
 
-#define JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE 64
+#define JSON_PARSER_INITIAL_VECTOR_SIZE 64
 #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512
 
 struct json_parser
@@ -565,22 +565,8 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512
 
   size_t additional_bytes_count;
 
-  /* Lisp_Objects are collected in this area during object/array
-     parsing.  To avoid allocations, initially
-     internal_object_workspace is used.  If it runs out of space then
-     we switch to allocated space.  Important note: with this design,
-     GC must not run during JSON parsing, otherwise Lisp_Objects in
-     the workspace may get incorrectly collected. */
-  Lisp_Object internal_object_workspace
-  [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE];
-  Lisp_Object *object_workspace;
-  Lisp_Object object_workspace_vector;
-  size_t object_workspace_size;
-  size_t object_workspace_current;
-
-  /* String and number parsing uses this workspace.  The idea behind
-     internal_byte_workspace is the same as the idea behind
-     internal_object_workspace */
+  /* String and number parsing uses this workspace.  It starts out on
+     the stack, but moves to the heap if more space is needed.  */
   unsigned char
   internal_byte_workspace[JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE];
   unsigned char *byte_workspace;
@@ -638,12 +624,6 @@ json_parser_init (struct json_parser *parser,
 
   parser->additional_bytes_count = 0;
 
-  parser->object_workspace = parser->internal_object_workspace;
-  parser->object_workspace_size
-    = JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE;
-  parser->object_workspace_current = 0;
-  parser->object_workspace_vector = Qnil;
-
   parser->byte_workspace = parser->internal_byte_workspace;
   parser->byte_workspace_end = (parser->byte_workspace
 				+ JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE);
@@ -657,50 +637,6 @@ json_parser_done (void *parser)
     xfree (p->byte_workspace);
 }
 
-/* Makes sure that the object_workspace has 'size' available space for
-   Lisp_Objects */
-NO_INLINE static void
-json_make_object_workspace_for_slow_path (struct json_parser *parser,
-					  size_t size)
-{
-  if (NILP (parser->object_workspace_vector))
-    {
-      parser->object_workspace_vector =
-	Fvector(parser->object_workspace_current, parser->object_workspace);
-    }
-  size_t needed_workspace_size
-    = (parser->object_workspace_current + size);
-  size_t new_workspace_size = parser->object_workspace_size;
-  while (new_workspace_size < needed_workspace_size)
-    {
-      if (ckd_mul (&new_workspace_size, new_workspace_size, 2))
-	{
-	  json_signal_error (parser, Qjson_out_of_memory);
-	}
-    }
-
-  Lisp_Object new_workspace_vector =
-    larger_vector (parser->object_workspace_vector,
-		   new_workspace_size - parser->object_workspace_size, -1);
-
-  Lisp_Object *new_workspace_ptr = XVECTOR (new_workspace_vector)->contents;
-
-  parser->object_workspace_vector = new_workspace_vector;
-  parser->object_workspace = new_workspace_ptr;
-  parser->object_workspace_size = new_workspace_size;
-}
-
-INLINE void
-json_make_object_workspace_for (struct json_parser *parser,
-				size_t size)
-{
-  if (parser->object_workspace_size - parser->object_workspace_current
-      < size)
-    {
-      json_make_object_workspace_for_slow_path (parser, size);
-    }
-}
-
 static void
 json_byte_workspace_reset (struct json_parser *parser)
 {
@@ -1258,10 +1194,18 @@ json_parse_number (struct json_parser *parser, int c)
 json_parse_array (struct json_parser *parser)
 {
   int c = json_skip_whitespace (parser);
-
-  const size_t first = parser->object_workspace_current;
   Lisp_Object result = Qnil;
+  ptrdiff_t index = 0;
 
+  switch (parser->conf.array_type)
+    {
+    case json_array_array:
+      result = make_vector (0, Qnil);
+      break;
+
+    default:
+      break;
+    }
   if (c != ']')
     {
       parser->available_depth--;
@@ -1277,10 +1221,15 @@ json_parse_array (struct json_parser *parser)
 	  switch (parser->conf.array_type)
 	    {
 	    case json_array_array:
-	      json_make_object_workspace_for (parser, 1);
-	      parser->object_workspace[parser->object_workspace_current]
-		= element;
-	      parser->object_workspace_current++;
+	      if (index == ASIZE (result))
+		{
+		  ptrdiff_t incr = ASIZE (result);
+		  if (incr == 0)
+		    incr = JSON_PARSER_INITIAL_VECTOR_SIZE;
+		  result = larger_vector (result, incr, -1);
+		}
+	      ASET (result, index, element);
+	      index++;
 	      break;
 	    case json_array_list:
 	      {
@@ -1310,18 +1259,8 @@ json_parse_array (struct json_parser *parser)
   switch (parser->conf.array_type)
     {
     case json_array_array:
-      {
-	size_t number_of_elements
-	  = parser->object_workspace_current - first;
-	result = make_vector (number_of_elements, Qnil);
-	for (size_t i = 0; i < number_of_elements; i++)
-	  {
-	    rarely_quit (~i);
-	    ASET (result, i, parser->object_workspace[first + i]);
-	  }
-	parser->object_workspace_current = first;
-	break;
-      }
+      result = Fvtruncate (result, make_fixnum (index));
+      break;
     case json_array_list:
       break;
     default:
@@ -1349,10 +1288,18 @@ json_parse_object_member_value (struct json_parser *parser)
 json_parse_object (struct json_parser *parser)
 {
   int c = json_skip_whitespace (parser);
-
-  const size_t first = parser->object_workspace_current;
   Lisp_Object result = Qnil;
 
+  switch (parser->conf.object_type)
+    {
+    case json_object_hashtable:
+      result = CALLN (Fmake_hash_table, QCtest, Qequal);
+      break;
+
+    default:
+      break;
+    }
+
   if (c != '}')
     {
       parser->available_depth--;
@@ -1374,11 +1321,7 @@ json_parse_object (struct json_parser *parser)
 	      {
 		Lisp_Object key = json_parse_string (parser, false, false);
 		Lisp_Object value = json_parse_object_member_value (parser);
-		json_make_object_workspace_for (parser, 2);
-		parser->object_workspace[parser->object_workspace_current] = key;
-		parser->object_workspace_current++;
-		parser->object_workspace[parser->object_workspace_current] = value;
-		parser->object_workspace_current++;
+		Fputhash (key, value, result);
 		break;
 	      }
 	    case json_object_alist:
@@ -1426,21 +1369,7 @@ json_parse_object (struct json_parser *parser)
     {
     case json_object_hashtable:
       {
-	EMACS_INT value = (parser->object_workspace_current - first) / 2;
-	result = make_hash_table (&hashtest_equal, value, Weak_None, false);
-	struct Lisp_Hash_Table *h = XHASH_TABLE (result);
-	for (size_t i = first; i < parser->object_workspace_current; i += 2)
-	  {
-	    hash_hash_t hash;
-	    Lisp_Object key = parser->object_workspace[i];
-	    Lisp_Object value = parser->object_workspace[i + 1];
-	    ptrdiff_t i = hash_lookup_get_hash (h, key, &hash);
-	    if (i < 0)
-	      hash_put (h, key, value, hash);
-	    else
-	      set_hash_value_slot (h, i, value);
-	  }
-	parser->object_workspace_current = first;
+	result = Fhash_table_truncate (result);
 	break;
       }
     case json_object_alist:
-- 
2.47.0






  parent reply	other threads:[~2024-12-01 21:15 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-26 18:35 bug#74547: 31.0.50; igc: assertion failed in buffer.c Óscar Fuentes
2024-11-27  6:54 ` Gerd Möllmann
2024-12-01 10:49   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:05     ` Gerd Möllmann
2024-12-01 12:17       ` Gerd Möllmann
2024-12-01 12:30         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:39           ` Gerd Möllmann
2024-12-01 12:57             ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 13:30               ` Gerd Möllmann
2024-12-01 14:58                 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 15:18                   ` Gerd Möllmann
2024-12-01 15:48                     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:32                       ` Geza Herman
2024-12-01 19:41                         ` Gerd Möllmann
2024-12-01 21:15                         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2024-12-01 15:55                     ` Eli Zaretskii
2024-12-01 15:23                   ` Eli Zaretskii
2024-12-01 15:30                   ` Óscar Fuentes
2024-12-01 15:48                     ` Gerd Möllmann
2024-12-01 15:58                     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:24                       ` Óscar Fuentes
2024-12-01 13:18         ` Óscar Fuentes
2024-12-01 13:44           ` Gerd Möllmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cyibksuu.fsf@protonmail.com \
    --to=bug-gnu-emacs@gnu.org \
    --cc=74547@debbugs.gnu.org \
    --cc=gerd.moellmann@gmail.com \
    --cc=geza.herman@gmail.com \
    --cc=oscarfv@telefonica.net \
    --cc=pipcet@protonmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).