unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: "Pip Cet" <pipcet@protonmail.com>,
	"Mattias Engdegård" <mattiase@acm.org>,
	"Géza Herman" <geza.herman@gmail.com>
Cc: gerd.moellmann@gmail.com, 74547@debbugs.gnu.org,
	oscarfv@telefonica.net, geza.herman@gmail.com
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Sun, 01 Dec 2024 17:23:45 +0200	[thread overview]
Message-ID: <86wmgj4ebi.fsf@gnu.org> (raw)
In-Reply-To: <87wmgjlabu.fsf@protonmail.com> (bug-gnu-emacs@gnu.org)

> Cc: 74547@debbugs.gnu.org,
>  Óscar Fuentes <oscarfv@telefonica.net>, geza.herman@gmail.com
> Date: Sun, 01 Dec 2024 14:58:10 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > Pip Cet <pipcet@protonmail.com> writes:
> 
> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >>> Pip Cet <pipcet@protonmail.com> writes:
> >>> Yes, exactly, json.c. First thing I saw when searching for xfree
> >>>
> >>> static void
> >>> json_parser_done (void *parser)
> >>> {
> >>>   struct json_parser *p = (struct json_parser *) parser;
> >>>   if (p->object_workspace != p->internal_object_workspace)
> >>>     xfree (p->object_workspace);
> >>>
> >>> That at least needs an explanation. I would have expected it to be
> >>> allocated as root.
> >>
> >> Well, the explanation is this comment:
> >>
> >>   /* Lisp_Objects are collected in this area during object/array
> >>      parsing.  To avoid allocations, initially
> >>      internal_object_workspace is used.  If it runs out of space then
> >>      we switch to allocated space.  Important note: with this design,
> >>      GC must not run during JSON parsing, otherwise Lisp_Objects in
> >>      the workspace may get incorrectly collected. */
> >
> > That explains it, indeed :-(.
> 
> Just to be clear, I think the mixed heap/stack allocation is the right
> thing to do here, but we need to let both garbage collectors know about
> the Lisp_Objects we allocated.
> 
> I think the best way to do that is to use a Lisp_Vector when we run out
> of stack space. That way, we don't have to worry about forgetting to GC
> it, and we can use standard functions rather than rolling our own.
> 
> >> Obviously, we cannot make any such guarantees when MPS is in use. (I
> >> don't think we can make the guarantee when MPS is not in use, but I'm
> >> not totally certain; we certainly allocate strings while parsing JSON,
> >> which is sufficient to trigger GC in the MPS case).
> >
> > If json.c calls something like maybe_quit, which I's expect it must,
> > then GC can indeed happen. See bug#56108 for an example in the regexp
> > code found with ASAN. It's not as risky in the old code as with
> > concurrent GC, but anyway.
> 
> There's a rarely_quit in json_parse_array, which, AFAICS, always
> triggers in the first loop iteration (when i == 0), but probably never
> reaches 65536 for the second trigger.
> 
> My proposal is to modify json.c so it uses a lisp vector if more than 64
> objects are needed, and to remove the home-grown symset hash set,
> replacing it by a standard hash table.
> 
> Note that the symset is only used to detect duplicate JSON keys. When
> such duplication is detected, we simply ignore the second plist entry.
> (I think it would be better to throw an error, but the documentation
> disagrees.)
> 
> So here's the patch with the old behavior, where
> 
>   (json-serialize '(a "test" a "ignored"))
> 
> doesn't throw an error and simply returns
> 
> "{\"a\":\"test\"}"

Adding Mattias and Géza, who were involved in implementing json.c.

> commit 85fbd342d3b4a8afabe8078e19be9b45fe3e20d2
> Author: Pip Cet <pipcet@protonmail.com>
> Date:   Sun Dec 1 12:46:08 2024 +0000
> 
>     Use standard Lisp objects in json.c (bug#74547)
>     
>     * src/json.c (json_out_t): Make the symset table a Lisp_Object.
>     (symset_t):
>     (pop_symset):
>     (cleanup_symset_tables):
>     (symset_hash):
>     (symset_expand):
>     (symset_size): Remove.
>     (make_symset_table): Use an ordinary hash table for the symset.
>     (push_symset): Don't return a value.
>     (symset_add): Use ordinary hash table accessors.
>     (cleanup_json_out): Remove.
>     (json_out_object_cons): Use ordinary hash table for symsets.
>     (json_serialize):
>     (json_parser_init):
>     (json_parser_done): Adjust to use ordinary hash table code.
>     (json_make_object_workspace_for_slow_path): Use an ordinary vector for
>     the workspace.
>     (json_parse_array): Avoid calling rarely_quit(0)
>     (json_parser_done): Remove manual memory management.
> 
> diff --git a/src/json.c b/src/json.c
> index eb446f5c221..0e17b893087 100644
> --- a/src/json.c
> +++ b/src/json.c
> @@ -28,7 +28,6 @@ Copyright (C) 2017-2024 Free Software Foundation, Inc.
>  #include "lisp.h"
>  #include "buffer.h"
>  #include "coding.h"
> -#include "igc.h"
>  
>  enum json_object_type
>    {
> @@ -111,161 +110,9 @@ json_parse_args (ptrdiff_t nargs, Lisp_Object *args,
>    ptrdiff_t chars_delta;      /* size - {number of characters in buf} */
>  
>    int maxdepth;
> -  struct symset_tbl *ss_table;	/* table used by containing object */
>    struct json_configuration conf;
>  } json_out_t;
>  
> -/* Set of symbols.  */
> -typedef struct
> -{
> -  ptrdiff_t count;		/* symbols in table */
> -  int bits;			/* log2(table size) */
> -  struct symset_tbl *table;	/* heap-allocated table */
> -} symset_t;
> -
> -struct symset_tbl
> -{
> -  /* Table used by the containing object if any, so that we can free all
> -     tables if an error occurs.  */
> -  struct symset_tbl *up;
> -  /* Table of symbols (2**bits elements), Qunbound where unused.  */
> -  Lisp_Object entries[];
> -};
> -
> -static inline ptrdiff_t
> -symset_size (int bits)
> -{
> -  return (ptrdiff_t) 1 << bits;
> -}
> -
> -static struct symset_tbl *
> -make_symset_table (int bits, struct symset_tbl *up)
> -{
> -  int maxbits = min (SIZE_WIDTH - 2 - (word_size < 8 ? 2 : 3), 32);
> -  if (bits > maxbits)
> -    memory_full (PTRDIFF_MAX);	/* Will never happen in practice.  */
> -#ifdef HAVE_MPS
> -  struct symset_tbl *st = igc_xzalloc_ambig (sizeof *st + (sizeof *st->entries << bits));
> -#else
> -  struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << bits));
> -#endif
> -  st->up = up;
> -  ptrdiff_t size = symset_size (bits);
> -  for (ptrdiff_t i = 0; i < size; i++)
> -    st->entries[i] = Qunbound;
> -  return st;
> -}
> -
> -/* Create a new symset to use for a new object.  */
> -static symset_t
> -push_symset (json_out_t *jo)
> -{
> -  int bits = 4;
> -  struct symset_tbl *tbl = make_symset_table (bits, jo->ss_table);
> -  jo->ss_table = tbl;
> -  return (symset_t){ .count = 0, .bits = bits, .table = tbl };
> -}
> -
> -/* Destroy the current symset.  */
> -static void
> -pop_symset (json_out_t *jo, symset_t *ss)
> -{
> -  jo->ss_table = ss->table->up;
> -#ifdef HAVE_MPS
> -  igc_xfree (ss->table);
> -#else
> -  xfree (ss->table);
> -#endif
> -}
> -
> -/* Remove all heap-allocated symset tables, in case an error occurred.  */
> -static void
> -cleanup_symset_tables (struct symset_tbl *st)
> -{
> -  while (st)
> -    {
> -      struct symset_tbl *up = st->up;
> -#ifdef HAVE_MPS
> -      igc_xfree (st);
> -#else
> -      xfree (st);
> -#endif
> -      st = up;
> -    }
> -}
> -
> -static inline uint32_t
> -symset_hash (Lisp_Object sym, int bits)
> -{
> -  EMACS_UINT hash;
> -#ifdef HAVE_MPS
> -  hash = igc_hash (sym);
> -#else
> -  hash = XHASH (sym);
> -#endif
> -  return knuth_hash (reduce_emacs_uint_to_hash_hash (hash), bits);
> -}
> -
> -/* Enlarge the table used by a symset.  */
> -static NO_INLINE void
> -symset_expand (symset_t *ss)
> -{
> -  struct symset_tbl *old_table = ss->table;
> -  int oldbits = ss->bits;
> -  ptrdiff_t oldsize = symset_size (oldbits);
> -  int bits = oldbits + 1;
> -  ss->bits = bits;
> -  ss->table = make_symset_table (bits, old_table->up);
> -  /* Move all entries from the old table to the new one.  */
> -  ptrdiff_t mask = symset_size (bits) - 1;
> -  struct symset_tbl *tbl = ss->table;
> -  for (ptrdiff_t i = 0; i < oldsize; i++)
> -    {
> -      Lisp_Object sym = old_table->entries[i];
> -      if (!BASE_EQ (sym, Qunbound))
> -	{
> -	  ptrdiff_t j = symset_hash (sym, bits);
> -	  while (!BASE_EQ (tbl->entries[j], Qunbound))
> -	    j = (j + 1) & mask;
> -	  tbl->entries[j] = sym;
> -	}
> -    }
> -#ifdef HAVE_MPS
> -  igc_xfree (old_table);
> -#else
> -  xfree (old_table);
> -#endif
> -}
> -
> -/* If sym is in ss, return false; otherwise add it and return true.
> -   Comparison is done by strict identity.  */
> -static inline bool
> -symset_add (json_out_t *jo, symset_t *ss, Lisp_Object sym)
> -{
> -  /* Make sure we don't fill more than half of the table.  */
> -  if (ss->count >= (symset_size (ss->bits) >> 1))
> -    {
> -      symset_expand (ss);
> -      jo->ss_table = ss->table;
> -    }
> -
> -  struct symset_tbl *tbl = ss->table;
> -  ptrdiff_t mask = symset_size (ss->bits) - 1;
> -  for (ptrdiff_t i = symset_hash (sym, ss->bits); ; i = (i + 1) & mask)
> -    {
> -      Lisp_Object s = tbl->entries[i];
> -      if (BASE_EQ (s, sym))
> -	return false;		/* Previous occurrence found.  */
> -      if (BASE_EQ (s, Qunbound))
> -	{
> -	  /* Not in set, add it.  */
> -	  tbl->entries[i] = sym;
> -	  ss->count++;
> -	  return true;
> -	}
> -    }
> -}
> -
>  static NO_INLINE void
>  json_out_grow_buf (json_out_t *jo, ptrdiff_t bytes)
>  {
> @@ -283,7 +130,6 @@ cleanup_json_out (void *arg)
>    json_out_t *jo = arg;
>    xfree (jo->buf);
>    jo->buf = NULL;
> -  cleanup_symset_tables (jo->ss_table);
>  }
>  
>  /* Make room for `bytes` more bytes in buffer.  */
> @@ -442,8 +288,8 @@ json_out_unnest (json_out_t *jo)
>  static void
>  json_out_object_cons (json_out_t *jo, Lisp_Object obj)
>  {
> +  Lisp_Object symset = CALLN (Fmake_hash_table, QCtest, Qeq);
>    json_out_nest (jo);
> -  symset_t ss = push_symset (jo);
>    json_out_byte (jo, '{');
>    bool is_alist = CONSP (XCAR (obj));
>    bool first = true;
> @@ -469,8 +315,9 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
>        key = maybe_remove_pos_from_symbol (key);
>        CHECK_TYPE (BARE_SYMBOL_P (key), Qsymbolp, key);
>  
> -      if (symset_add (jo, &ss, key))
> +      if (NILP (Fgethash (key, symset, Qnil)))
>  	{
> +	  Fputhash (key, Qt, symset);
>  	  if (!first)
>  	    json_out_byte (jo, ',');
>  	  first = false;
> @@ -486,7 +333,6 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
>      }
>    CHECK_LIST_END (tail, obj);
>    json_out_byte (jo, '}');
> -  pop_symset (jo, &ss);
>    json_out_unnest (jo);
>  }
>  
> @@ -591,7 +437,6 @@ json_serialize (json_out_t *jo, Lisp_Object object,
>    jo->capacity = 0;
>    jo->chars_delta = 0;
>    jo->buf = NULL;
> -  jo->ss_table = NULL;
>    jo->conf.object_type = json_object_hashtable;
>    jo->conf.array_type = json_array_array;
>    jo->conf.null_object = QCnull;
> @@ -729,6 +574,7 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512
>    Lisp_Object internal_object_workspace
>    [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE];
>    Lisp_Object *object_workspace;
> +  Lisp_Object object_workspace_vector;
>    size_t object_workspace_size;
>    size_t object_workspace_current;
>  
> @@ -796,6 +642,7 @@ json_parser_init (struct json_parser *parser,
>    parser->object_workspace_size
>      = JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE;
>    parser->object_workspace_current = 0;
> +  parser->object_workspace_vector = Qnil;
>  
>    parser->byte_workspace = parser->internal_byte_workspace;
>    parser->byte_workspace_end = (parser->byte_workspace
> @@ -806,8 +653,6 @@ json_parser_init (struct json_parser *parser,
>  json_parser_done (void *parser)
>  {
>    struct json_parser *p = (struct json_parser *) parser;
> -  if (p->object_workspace != p->internal_object_workspace)
> -    xfree (p->object_workspace);
>    if (p->byte_workspace != p->internal_byte_workspace)
>      xfree (p->byte_workspace);
>  }
> @@ -818,6 +663,11 @@ json_parser_done (void *parser)
>  json_make_object_workspace_for_slow_path (struct json_parser *parser,
>  					  size_t size)
>  {
> +  if (NILP (parser->object_workspace_vector))
> +    {
> +      parser->object_workspace_vector =
> +	Fvector(parser->object_workspace_current, parser->object_workspace);
> +    }
>    size_t needed_workspace_size
>      = (parser->object_workspace_current + size);
>    size_t new_workspace_size = parser->object_workspace_size;
> @@ -829,23 +679,13 @@ json_make_object_workspace_for_slow_path (struct json_parser *parser,
>  	}
>      }
>  
> -  Lisp_Object *new_workspace_ptr;
> -  if (parser->object_workspace_size
> -      == JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE)
> -    {
> -      new_workspace_ptr
> -	= xnmalloc (new_workspace_size, sizeof (Lisp_Object));
> -      memcpy (new_workspace_ptr, parser->object_workspace,
> -	      (sizeof (Lisp_Object)
> -	       * parser->object_workspace_current));
> -    }
> -  else
> -    {
> -      new_workspace_ptr
> -	= xnrealloc (parser->object_workspace, new_workspace_size,
> -		     sizeof (Lisp_Object));
> -    }
> +  Lisp_Object new_workspace_vector =
> +    larger_vector (parser->object_workspace_vector,
> +		   new_workspace_size - parser->object_workspace_size, -1);
> +
> +  Lisp_Object *new_workspace_ptr = XVECTOR (new_workspace_vector)->contents;
>  
> +  parser->object_workspace_vector = new_workspace_vector;
>    parser->object_workspace = new_workspace_ptr;
>    parser->object_workspace_size = new_workspace_size;
>  }
> @@ -1476,7 +1316,7 @@ json_parse_array (struct json_parser *parser)
>  	result = make_vector (number_of_elements, Qnil);
>  	for (size_t i = 0; i < number_of_elements; i++)
>  	  {
> -	    rarely_quit (i);
> +	    rarely_quit (~i);
>  	    ASET (result, i, parser->object_workspace[first + i]);
>  	  }
>  	parser->object_workspace_current = first;
> 
> 
> 
> 
> 





  parent reply	other threads:[~2024-12-01 15:23 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-26 18:35 bug#74547: 31.0.50; igc: assertion failed in buffer.c Óscar Fuentes
2024-11-27  6:54 ` Gerd Möllmann
2024-12-01 10:49   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:05     ` Gerd Möllmann
2024-12-01 12:17       ` Gerd Möllmann
2024-12-01 12:30         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:39           ` Gerd Möllmann
2024-12-01 12:57             ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 13:30               ` Gerd Möllmann
2024-12-01 14:58                 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 15:18                   ` Gerd Möllmann
2024-12-01 15:48                     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:32                       ` Geza Herman
2024-12-01 19:41                         ` Gerd Möllmann
2024-12-01 21:15                         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 15:55                     ` Eli Zaretskii
2024-12-01 15:23                   ` Eli Zaretskii [this message]
2024-12-01 15:30                   ` Óscar Fuentes
2024-12-01 15:48                     ` Gerd Möllmann
2024-12-01 15:58                     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:24                       ` Óscar Fuentes
2024-12-01 13:18         ` Óscar Fuentes
2024-12-01 13:44           ` Gerd Möllmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86wmgj4ebi.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=74547@debbugs.gnu.org \
    --cc=gerd.moellmann@gmail.com \
    --cc=geza.herman@gmail.com \
    --cc=mattiase@acm.org \
    --cc=oscarfv@telefonica.net \
    --cc=pipcet@protonmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).