unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: "Gerd Möllmann" <gerd.moellmann@gmail.com>
Cc: 74547@debbugs.gnu.org, "Óscar Fuentes" <oscarfv@telefonica.net>,
	geza.herman@gmail.com
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Sun, 01 Dec 2024 14:58:10 +0000	[thread overview]
Message-ID: <87wmgjlabu.fsf@protonmail.com> (raw)
In-Reply-To: <m2h67nv8ci.fsf@gmail.com>

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:

>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>> Pip Cet <pipcet@protonmail.com> writes:
>>> Yes, exactly, json.c. First thing I saw when searching for xfree
>>>
>>> static void
>>> json_parser_done (void *parser)
>>> {
>>>   struct json_parser *p = (struct json_parser *) parser;
>>>   if (p->object_workspace != p->internal_object_workspace)
>>>     xfree (p->object_workspace);
>>>
>>> That at least needs an explanation. I would have expected it to be
>>> allocated as root.
>>
>> Well, the explanation is this comment:
>>
>>   /* Lisp_Objects are collected in this area during object/array
>>      parsing.  To avoid allocations, initially
>>      internal_object_workspace is used.  If it runs out of space then
>>      we switch to allocated space.  Important note: with this design,
>>      GC must not run during JSON parsing, otherwise Lisp_Objects in
>>      the workspace may get incorrectly collected. */
>
> That explains it, indeed :-(.

Just to be clear, I think the mixed heap/stack allocation is the right
thing to do here, but we need to let both garbage collectors know about
the Lisp_Objects we allocated.

I think the best way to do that is to use a Lisp_Vector when we run out
of stack space. That way, we don't have to worry about forgetting to GC
it, and we can use standard functions rather than rolling our own.

>> Obviously, we cannot make any such guarantees when MPS is in use. (I
>> don't think we can make the guarantee when MPS is not in use, but I'm
>> not totally certain; we certainly allocate strings while parsing JSON,
>> which is sufficient to trigger GC in the MPS case).
>
> If json.c calls something like maybe_quit, which I's expect it must,
> then GC can indeed happen. See bug#56108 for an example in the regexp
> code found with ASAN. It's not as risky in the old code as with
> concurrent GC, but anyway.

There's a rarely_quit in json_parse_array, which, AFAICS, always
triggers in the first loop iteration (when i == 0), but probably never
reaches 65536 for the second trigger.

My proposal is to modify json.c so it uses a lisp vector if more than 64
objects are needed, and to remove the home-grown symset hash set,
replacing it by a standard hash table.

Note that the symset is only used to detect duplicate JSON keys. When
such duplication is detected, we simply ignore the second plist entry.
(I think it would be better to throw an error, but the documentation
disagrees.)

So here's the patch with the old behavior, where

  (json-serialize '(a "test" a "ignored"))

doesn't throw an error and simply returns

"{\"a\":\"test\"}"

commit 85fbd342d3b4a8afabe8078e19be9b45fe3e20d2
Author: Pip Cet <pipcet@protonmail.com>
Date:   Sun Dec 1 12:46:08 2024 +0000

    Use standard Lisp objects in json.c (bug#74547)
    
    * src/json.c (json_out_t): Make the symset table a Lisp_Object.
    (symset_t):
    (pop_symset):
    (cleanup_symset_tables):
    (symset_hash):
    (symset_expand):
    (symset_size): Remove.
    (make_symset_table): Use an ordinary hash table for the symset.
    (push_symset): Don't return a value.
    (symset_add): Use ordinary hash table accessors.
    (cleanup_json_out): Remove.
    (json_out_object_cons): Use ordinary hash table for symsets.
    (json_serialize):
    (json_parser_init):
    (json_parser_done): Adjust to use ordinary hash table code.
    (json_make_object_workspace_for_slow_path): Use an ordinary vector for
    the workspace.
    (json_parse_array): Avoid calling rarely_quit(0)
    (json_parser_done): Remove manual memory management.

diff --git a/src/json.c b/src/json.c
index eb446f5c221..0e17b893087 100644
--- a/src/json.c
+++ b/src/json.c
@@ -28,7 +28,6 @@ Copyright (C) 2017-2024 Free Software Foundation, Inc.
 #include "lisp.h"
 #include "buffer.h"
 #include "coding.h"
-#include "igc.h"
 
 enum json_object_type
   {
@@ -111,161 +110,9 @@ json_parse_args (ptrdiff_t nargs, Lisp_Object *args,
   ptrdiff_t chars_delta;      /* size - {number of characters in buf} */
 
   int maxdepth;
-  struct symset_tbl *ss_table;	/* table used by containing object */
   struct json_configuration conf;
 } json_out_t;
 
-/* Set of symbols.  */
-typedef struct
-{
-  ptrdiff_t count;		/* symbols in table */
-  int bits;			/* log2(table size) */
-  struct symset_tbl *table;	/* heap-allocated table */
-} symset_t;
-
-struct symset_tbl
-{
-  /* Table used by the containing object if any, so that we can free all
-     tables if an error occurs.  */
-  struct symset_tbl *up;
-  /* Table of symbols (2**bits elements), Qunbound where unused.  */
-  Lisp_Object entries[];
-};
-
-static inline ptrdiff_t
-symset_size (int bits)
-{
-  return (ptrdiff_t) 1 << bits;
-}
-
-static struct symset_tbl *
-make_symset_table (int bits, struct symset_tbl *up)
-{
-  int maxbits = min (SIZE_WIDTH - 2 - (word_size < 8 ? 2 : 3), 32);
-  if (bits > maxbits)
-    memory_full (PTRDIFF_MAX);	/* Will never happen in practice.  */
-#ifdef HAVE_MPS
-  struct symset_tbl *st = igc_xzalloc_ambig (sizeof *st + (sizeof *st->entries << bits));
-#else
-  struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << bits));
-#endif
-  st->up = up;
-  ptrdiff_t size = symset_size (bits);
-  for (ptrdiff_t i = 0; i < size; i++)
-    st->entries[i] = Qunbound;
-  return st;
-}
-
-/* Create a new symset to use for a new object.  */
-static symset_t
-push_symset (json_out_t *jo)
-{
-  int bits = 4;
-  struct symset_tbl *tbl = make_symset_table (bits, jo->ss_table);
-  jo->ss_table = tbl;
-  return (symset_t){ .count = 0, .bits = bits, .table = tbl };
-}
-
-/* Destroy the current symset.  */
-static void
-pop_symset (json_out_t *jo, symset_t *ss)
-{
-  jo->ss_table = ss->table->up;
-#ifdef HAVE_MPS
-  igc_xfree (ss->table);
-#else
-  xfree (ss->table);
-#endif
-}
-
-/* Remove all heap-allocated symset tables, in case an error occurred.  */
-static void
-cleanup_symset_tables (struct symset_tbl *st)
-{
-  while (st)
-    {
-      struct symset_tbl *up = st->up;
-#ifdef HAVE_MPS
-      igc_xfree (st);
-#else
-      xfree (st);
-#endif
-      st = up;
-    }
-}
-
-static inline uint32_t
-symset_hash (Lisp_Object sym, int bits)
-{
-  EMACS_UINT hash;
-#ifdef HAVE_MPS
-  hash = igc_hash (sym);
-#else
-  hash = XHASH (sym);
-#endif
-  return knuth_hash (reduce_emacs_uint_to_hash_hash (hash), bits);
-}
-
-/* Enlarge the table used by a symset.  */
-static NO_INLINE void
-symset_expand (symset_t *ss)
-{
-  struct symset_tbl *old_table = ss->table;
-  int oldbits = ss->bits;
-  ptrdiff_t oldsize = symset_size (oldbits);
-  int bits = oldbits + 1;
-  ss->bits = bits;
-  ss->table = make_symset_table (bits, old_table->up);
-  /* Move all entries from the old table to the new one.  */
-  ptrdiff_t mask = symset_size (bits) - 1;
-  struct symset_tbl *tbl = ss->table;
-  for (ptrdiff_t i = 0; i < oldsize; i++)
-    {
-      Lisp_Object sym = old_table->entries[i];
-      if (!BASE_EQ (sym, Qunbound))
-	{
-	  ptrdiff_t j = symset_hash (sym, bits);
-	  while (!BASE_EQ (tbl->entries[j], Qunbound))
-	    j = (j + 1) & mask;
-	  tbl->entries[j] = sym;
-	}
-    }
-#ifdef HAVE_MPS
-  igc_xfree (old_table);
-#else
-  xfree (old_table);
-#endif
-}
-
-/* If sym is in ss, return false; otherwise add it and return true.
-   Comparison is done by strict identity.  */
-static inline bool
-symset_add (json_out_t *jo, symset_t *ss, Lisp_Object sym)
-{
-  /* Make sure we don't fill more than half of the table.  */
-  if (ss->count >= (symset_size (ss->bits) >> 1))
-    {
-      symset_expand (ss);
-      jo->ss_table = ss->table;
-    }
-
-  struct symset_tbl *tbl = ss->table;
-  ptrdiff_t mask = symset_size (ss->bits) - 1;
-  for (ptrdiff_t i = symset_hash (sym, ss->bits); ; i = (i + 1) & mask)
-    {
-      Lisp_Object s = tbl->entries[i];
-      if (BASE_EQ (s, sym))
-	return false;		/* Previous occurrence found.  */
-      if (BASE_EQ (s, Qunbound))
-	{
-	  /* Not in set, add it.  */
-	  tbl->entries[i] = sym;
-	  ss->count++;
-	  return true;
-	}
-    }
-}
-
 static NO_INLINE void
 json_out_grow_buf (json_out_t *jo, ptrdiff_t bytes)
 {
@@ -283,7 +130,6 @@ cleanup_json_out (void *arg)
   json_out_t *jo = arg;
   xfree (jo->buf);
   jo->buf = NULL;
-  cleanup_symset_tables (jo->ss_table);
 }
 
 /* Make room for `bytes` more bytes in buffer.  */
@@ -442,8 +288,8 @@ json_out_unnest (json_out_t *jo)
 static void
 json_out_object_cons (json_out_t *jo, Lisp_Object obj)
 {
+  Lisp_Object symset = CALLN (Fmake_hash_table, QCtest, Qeq);
   json_out_nest (jo);
-  symset_t ss = push_symset (jo);
   json_out_byte (jo, '{');
   bool is_alist = CONSP (XCAR (obj));
   bool first = true;
@@ -469,8 +315,9 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
       key = maybe_remove_pos_from_symbol (key);
       CHECK_TYPE (BARE_SYMBOL_P (key), Qsymbolp, key);
 
-      if (symset_add (jo, &ss, key))
+      if (NILP (Fgethash (key, symset, Qnil)))
 	{
+	  Fputhash (key, Qt, symset);
 	  if (!first)
 	    json_out_byte (jo, ',');
 	  first = false;
@@ -486,7 +333,6 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
     }
   CHECK_LIST_END (tail, obj);
   json_out_byte (jo, '}');
-  pop_symset (jo, &ss);
   json_out_unnest (jo);
 }
 
@@ -591,7 +437,6 @@ json_serialize (json_out_t *jo, Lisp_Object object,
   jo->capacity = 0;
   jo->chars_delta = 0;
   jo->buf = NULL;
-  jo->ss_table = NULL;
   jo->conf.object_type = json_object_hashtable;
   jo->conf.array_type = json_array_array;
   jo->conf.null_object = QCnull;
@@ -729,6 +574,7 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512
   Lisp_Object internal_object_workspace
   [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE];
   Lisp_Object *object_workspace;
+  Lisp_Object object_workspace_vector;
   size_t object_workspace_size;
   size_t object_workspace_current;
 
@@ -796,6 +642,7 @@ json_parser_init (struct json_parser *parser,
   parser->object_workspace_size
     = JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE;
   parser->object_workspace_current = 0;
+  parser->object_workspace_vector = Qnil;
 
   parser->byte_workspace = parser->internal_byte_workspace;
   parser->byte_workspace_end = (parser->byte_workspace
@@ -806,8 +653,6 @@ json_parser_init (struct json_parser *parser,
 json_parser_done (void *parser)
 {
   struct json_parser *p = (struct json_parser *) parser;
-  if (p->object_workspace != p->internal_object_workspace)
-    xfree (p->object_workspace);
   if (p->byte_workspace != p->internal_byte_workspace)
     xfree (p->byte_workspace);
 }
@@ -818,6 +663,11 @@ json_parser_done (void *parser)
 json_make_object_workspace_for_slow_path (struct json_parser *parser,
 					  size_t size)
 {
+  if (NILP (parser->object_workspace_vector))
+    {
+      parser->object_workspace_vector =
+	Fvector(parser->object_workspace_current, parser->object_workspace);
+    }
   size_t needed_workspace_size
     = (parser->object_workspace_current + size);
   size_t new_workspace_size = parser->object_workspace_size;
@@ -829,23 +679,13 @@ json_make_object_workspace_for_slow_path (struct json_parser *parser,
 	}
     }
 
-  Lisp_Object *new_workspace_ptr;
-  if (parser->object_workspace_size
-      == JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE)
-    {
-      new_workspace_ptr
-	= xnmalloc (new_workspace_size, sizeof (Lisp_Object));
-      memcpy (new_workspace_ptr, parser->object_workspace,
-	      (sizeof (Lisp_Object)
-	       * parser->object_workspace_current));
-    }
-  else
-    {
-      new_workspace_ptr
-	= xnrealloc (parser->object_workspace, new_workspace_size,
-		     sizeof (Lisp_Object));
-    }
+  Lisp_Object new_workspace_vector =
+    larger_vector (parser->object_workspace_vector,
+		   new_workspace_size - parser->object_workspace_size, -1);
+
+  Lisp_Object *new_workspace_ptr = XVECTOR (new_workspace_vector)->contents;
 
+  parser->object_workspace_vector = new_workspace_vector;
   parser->object_workspace = new_workspace_ptr;
   parser->object_workspace_size = new_workspace_size;
 }
@@ -1476,7 +1316,7 @@ json_parse_array (struct json_parser *parser)
 	result = make_vector (number_of_elements, Qnil);
 	for (size_t i = 0; i < number_of_elements; i++)
 	  {
-	    rarely_quit (i);
+	    rarely_quit (~i);
 	    ASET (result, i, parser->object_workspace[first + i]);
 	  }
 	parser->object_workspace_current = first;






  reply	other threads:[~2024-12-01 14:58 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-26 18:35 bug#74547: 31.0.50; igc: assertion failed in buffer.c Óscar Fuentes
2024-11-27  6:54 ` Gerd Möllmann
2024-12-01 10:49   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:05     ` Gerd Möllmann
2024-12-01 12:17       ` Gerd Möllmann
2024-12-01 12:30         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:39           ` Gerd Möllmann
2024-12-01 12:57             ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 13:30               ` Gerd Möllmann
2024-12-01 14:58                 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2024-12-01 15:18                   ` Gerd Möllmann
2024-12-01 15:48                     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:32                       ` Geza Herman
2024-12-01 19:41                         ` Gerd Möllmann
2024-12-01 21:15                         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 15:55                     ` Eli Zaretskii
2024-12-01 15:23                   ` Eli Zaretskii
2024-12-01 15:30                   ` Óscar Fuentes
2024-12-01 15:48                     ` Gerd Möllmann
2024-12-01 15:58                     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:24                       ` Óscar Fuentes
2024-12-01 13:18         ` Óscar Fuentes
2024-12-01 13:44           ` Gerd Möllmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wmgjlabu.fsf@protonmail.com \
    --to=bug-gnu-emacs@gnu.org \
    --cc=74547@debbugs.gnu.org \
    --cc=gerd.moellmann@gmail.com \
    --cc=geza.herman@gmail.com \
    --cc=oscarfv@telefonica.net \
    --cc=pipcet@protonmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).