From: Eli Zaretskii <eliz@gnu.org>
To: "Pip Cet" <pipcet@protonmail.com>,
"Mattias Engdegård" <mattiase@acm.org>,
"Géza Herman" <geza.herman@gmail.com>
Cc: gerd.moellmann@gmail.com, 74547@debbugs.gnu.org,
oscarfv@telefonica.net, geza.herman@gmail.com
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Sun, 01 Dec 2024 17:23:45 +0200 [thread overview]
Message-ID: <86wmgj4ebi.fsf@gnu.org> (raw)
In-Reply-To: <87wmgjlabu.fsf@protonmail.com> (bug-gnu-emacs@gnu.org)
> Cc: 74547@debbugs.gnu.org,
> Óscar Fuentes <oscarfv@telefonica.net>, geza.herman@gmail.com
> Date: Sun, 01 Dec 2024 14:58:10 +0000
> From: Pip Cet via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
> > Pip Cet <pipcet@protonmail.com> writes:
>
> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >>> Pip Cet <pipcet@protonmail.com> writes:
> >>> Yes, exactly, json.c. First thing I saw when searching for xfree
> >>>
> >>> static void
> >>> json_parser_done (void *parser)
> >>> {
> >>> struct json_parser *p = (struct json_parser *) parser;
> >>> if (p->object_workspace != p->internal_object_workspace)
> >>> xfree (p->object_workspace);
> >>>
> >>> That at least needs an explanation. I would have expected it to be
> >>> allocated as root.
> >>
> >> Well, the explanation is this comment:
> >>
> >> /* Lisp_Objects are collected in this area during object/array
> >> parsing. To avoid allocations, initially
> >> internal_object_workspace is used. If it runs out of space then
> >> we switch to allocated space. Important note: with this design,
> >> GC must not run during JSON parsing, otherwise Lisp_Objects in
> >> the workspace may get incorrectly collected. */
> >
> > That explains it, indeed :-(.
>
> Just to be clear, I think the mixed heap/stack allocation is the right
> thing to do here, but we need to let both garbage collectors know about
> the Lisp_Objects we allocated.
>
> I think the best way to do that is to use a Lisp_Vector when we run out
> of stack space. That way, we don't have to worry about forgetting to GC
> it, and we can use standard functions rather than rolling our own.
>
> >> Obviously, we cannot make any such guarantees when MPS is in use. (I
> >> don't think we can make the guarantee when MPS is not in use, but I'm
> >> not totally certain; we certainly allocate strings while parsing JSON,
> >> which is sufficient to trigger GC in the MPS case).
> >
> > If json.c calls something like maybe_quit, which I's expect it must,
> > then GC can indeed happen. See bug#56108 for an example in the regexp
> > code found with ASAN. It's not as risky in the old code as with
> > concurrent GC, but anyway.
>
> There's a rarely_quit in json_parse_array, which, AFAICS, always
> triggers in the first loop iteration (when i == 0), but probably never
> reaches 65536 for the second trigger.
>
> My proposal is to modify json.c so it uses a lisp vector if more than 64
> objects are needed, and to remove the home-grown symset hash set,
> replacing it by a standard hash table.
>
> Note that the symset is only used to detect duplicate JSON keys. When
> such duplication is detected, we simply ignore the second plist entry.
> (I think it would be better to throw an error, but the documentation
> disagrees.)
>
> So here's the patch with the old behavior, where
>
> (json-serialize '(a "test" a "ignored"))
>
> doesn't throw an error and simply returns
>
> "{\"a\":\"test\"}"
Adding Mattias and Géza, who were involved in implementing json.c.
> commit 85fbd342d3b4a8afabe8078e19be9b45fe3e20d2
> Author: Pip Cet <pipcet@protonmail.com>
> Date: Sun Dec 1 12:46:08 2024 +0000
>
> Use standard Lisp objects in json.c (bug#74547)
>
> * src/json.c (json_out_t): Make the symset table a Lisp_Object.
> (symset_t):
> (pop_symset):
> (cleanup_symset_tables):
> (symset_hash):
> (symset_expand):
> (symset_size): Remove.
> (make_symset_table): Use an ordinary hash table for the symset.
> (push_symset): Don't return a value.
> (symset_add): Use ordinary hash table accessors.
> (cleanup_json_out): Remove.
> (json_out_object_cons): Use ordinary hash table for symsets.
> (json_serialize):
> (json_parser_init):
> (json_parser_done): Adjust to use ordinary hash table code.
> (json_make_object_workspace_for_slow_path): Use an ordinary vector for
> the workspace.
> (json_parse_array): Avoid calling rarely_quit(0)
> (json_parser_done): Remove manual memory management.
>
> diff --git a/src/json.c b/src/json.c
> index eb446f5c221..0e17b893087 100644
> --- a/src/json.c
> +++ b/src/json.c
> @@ -28,7 +28,6 @@ Copyright (C) 2017-2024 Free Software Foundation, Inc.
> #include "lisp.h"
> #include "buffer.h"
> #include "coding.h"
> -#include "igc.h"
>
> enum json_object_type
> {
> @@ -111,161 +110,9 @@ json_parse_args (ptrdiff_t nargs, Lisp_Object *args,
> ptrdiff_t chars_delta; /* size - {number of characters in buf} */
>
> int maxdepth;
> - struct symset_tbl *ss_table; /* table used by containing object */
> struct json_configuration conf;
> } json_out_t;
>
> -/* Set of symbols. */
> -typedef struct
> -{
> - ptrdiff_t count; /* symbols in table */
> - int bits; /* log2(table size) */
> - struct symset_tbl *table; /* heap-allocated table */
> -} symset_t;
> -
> -struct symset_tbl
> -{
> - /* Table used by the containing object if any, so that we can free all
> - tables if an error occurs. */
> - struct symset_tbl *up;
> - /* Table of symbols (2**bits elements), Qunbound where unused. */
> - Lisp_Object entries[];
> -};
> -
> -static inline ptrdiff_t
> -symset_size (int bits)
> -{
> - return (ptrdiff_t) 1 << bits;
> -}
> -
> -static struct symset_tbl *
> -make_symset_table (int bits, struct symset_tbl *up)
> -{
> - int maxbits = min (SIZE_WIDTH - 2 - (word_size < 8 ? 2 : 3), 32);
> - if (bits > maxbits)
> - memory_full (PTRDIFF_MAX); /* Will never happen in practice. */
> -#ifdef HAVE_MPS
> - struct symset_tbl *st = igc_xzalloc_ambig (sizeof *st + (sizeof *st->entries << bits));
> -#else
> - struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << bits));
> -#endif
> - st->up = up;
> - ptrdiff_t size = symset_size (bits);
> - for (ptrdiff_t i = 0; i < size; i++)
> - st->entries[i] = Qunbound;
> - return st;
> -}
> -
> -/* Create a new symset to use for a new object. */
> -static symset_t
> -push_symset (json_out_t *jo)
> -{
> - int bits = 4;
> - struct symset_tbl *tbl = make_symset_table (bits, jo->ss_table);
> - jo->ss_table = tbl;
> - return (symset_t){ .count = 0, .bits = bits, .table = tbl };
> -}
> -
> -/* Destroy the current symset. */
> -static void
> -pop_symset (json_out_t *jo, symset_t *ss)
> -{
> - jo->ss_table = ss->table->up;
> -#ifdef HAVE_MPS
> - igc_xfree (ss->table);
> -#else
> - xfree (ss->table);
> -#endif
> -}
> -
> -/* Remove all heap-allocated symset tables, in case an error occurred. */
> -static void
> -cleanup_symset_tables (struct symset_tbl *st)
> -{
> - while (st)
> - {
> - struct symset_tbl *up = st->up;
> -#ifdef HAVE_MPS
> - igc_xfree (st);
> -#else
> - xfree (st);
> -#endif
> - st = up;
> - }
> -}
> -
> -static inline uint32_t
> -symset_hash (Lisp_Object sym, int bits)
> -{
> - EMACS_UINT hash;
> -#ifdef HAVE_MPS
> - hash = igc_hash (sym);
> -#else
> - hash = XHASH (sym);
> -#endif
> - return knuth_hash (reduce_emacs_uint_to_hash_hash (hash), bits);
> -}
> -
> -/* Enlarge the table used by a symset. */
> -static NO_INLINE void
> -symset_expand (symset_t *ss)
> -{
> - struct symset_tbl *old_table = ss->table;
> - int oldbits = ss->bits;
> - ptrdiff_t oldsize = symset_size (oldbits);
> - int bits = oldbits + 1;
> - ss->bits = bits;
> - ss->table = make_symset_table (bits, old_table->up);
> - /* Move all entries from the old table to the new one. */
> - ptrdiff_t mask = symset_size (bits) - 1;
> - struct symset_tbl *tbl = ss->table;
> - for (ptrdiff_t i = 0; i < oldsize; i++)
> - {
> - Lisp_Object sym = old_table->entries[i];
> - if (!BASE_EQ (sym, Qunbound))
> - {
> - ptrdiff_t j = symset_hash (sym, bits);
> - while (!BASE_EQ (tbl->entries[j], Qunbound))
> - j = (j + 1) & mask;
> - tbl->entries[j] = sym;
> - }
> - }
> -#ifdef HAVE_MPS
> - igc_xfree (old_table);
> -#else
> - xfree (old_table);
> -#endif
> -}
> -
> -/* If sym is in ss, return false; otherwise add it and return true.
> - Comparison is done by strict identity. */
> -static inline bool
> -symset_add (json_out_t *jo, symset_t *ss, Lisp_Object sym)
> -{
> - /* Make sure we don't fill more than half of the table. */
> - if (ss->count >= (symset_size (ss->bits) >> 1))
> - {
> - symset_expand (ss);
> - jo->ss_table = ss->table;
> - }
> -
> - struct symset_tbl *tbl = ss->table;
> - ptrdiff_t mask = symset_size (ss->bits) - 1;
> - for (ptrdiff_t i = symset_hash (sym, ss->bits); ; i = (i + 1) & mask)
> - {
> - Lisp_Object s = tbl->entries[i];
> - if (BASE_EQ (s, sym))
> - return false; /* Previous occurrence found. */
> - if (BASE_EQ (s, Qunbound))
> - {
> - /* Not in set, add it. */
> - tbl->entries[i] = sym;
> - ss->count++;
> - return true;
> - }
> - }
> -}
> -
> static NO_INLINE void
> json_out_grow_buf (json_out_t *jo, ptrdiff_t bytes)
> {
> @@ -283,7 +130,6 @@ cleanup_json_out (void *arg)
> json_out_t *jo = arg;
> xfree (jo->buf);
> jo->buf = NULL;
> - cleanup_symset_tables (jo->ss_table);
> }
>
> /* Make room for `bytes` more bytes in buffer. */
> @@ -442,8 +288,8 @@ json_out_unnest (json_out_t *jo)
> static void
> json_out_object_cons (json_out_t *jo, Lisp_Object obj)
> {
> + Lisp_Object symset = CALLN (Fmake_hash_table, QCtest, Qeq);
> json_out_nest (jo);
> - symset_t ss = push_symset (jo);
> json_out_byte (jo, '{');
> bool is_alist = CONSP (XCAR (obj));
> bool first = true;
> @@ -469,8 +315,9 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
> key = maybe_remove_pos_from_symbol (key);
> CHECK_TYPE (BARE_SYMBOL_P (key), Qsymbolp, key);
>
> - if (symset_add (jo, &ss, key))
> + if (NILP (Fgethash (key, symset, Qnil)))
> {
> + Fputhash (key, Qt, symset);
> if (!first)
> json_out_byte (jo, ',');
> first = false;
> @@ -486,7 +333,6 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
> }
> CHECK_LIST_END (tail, obj);
> json_out_byte (jo, '}');
> - pop_symset (jo, &ss);
> json_out_unnest (jo);
> }
>
> @@ -591,7 +437,6 @@ json_serialize (json_out_t *jo, Lisp_Object object,
> jo->capacity = 0;
> jo->chars_delta = 0;
> jo->buf = NULL;
> - jo->ss_table = NULL;
> jo->conf.object_type = json_object_hashtable;
> jo->conf.array_type = json_array_array;
> jo->conf.null_object = QCnull;
> @@ -729,6 +574,7 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512
> Lisp_Object internal_object_workspace
> [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE];
> Lisp_Object *object_workspace;
> + Lisp_Object object_workspace_vector;
> size_t object_workspace_size;
> size_t object_workspace_current;
>
> @@ -796,6 +642,7 @@ json_parser_init (struct json_parser *parser,
> parser->object_workspace_size
> = JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE;
> parser->object_workspace_current = 0;
> + parser->object_workspace_vector = Qnil;
>
> parser->byte_workspace = parser->internal_byte_workspace;
> parser->byte_workspace_end = (parser->byte_workspace
> @@ -806,8 +653,6 @@ json_parser_init (struct json_parser *parser,
> json_parser_done (void *parser)
> {
> struct json_parser *p = (struct json_parser *) parser;
> - if (p->object_workspace != p->internal_object_workspace)
> - xfree (p->object_workspace);
> if (p->byte_workspace != p->internal_byte_workspace)
> xfree (p->byte_workspace);
> }
> @@ -818,6 +663,11 @@ json_parser_done (void *parser)
> json_make_object_workspace_for_slow_path (struct json_parser *parser,
> size_t size)
> {
> + if (NILP (parser->object_workspace_vector))
> + {
> + parser->object_workspace_vector =
> + Fvector(parser->object_workspace_current, parser->object_workspace);
> + }
> size_t needed_workspace_size
> = (parser->object_workspace_current + size);
> size_t new_workspace_size = parser->object_workspace_size;
> @@ -829,23 +679,13 @@ json_make_object_workspace_for_slow_path (struct json_parser *parser,
> }
> }
>
> - Lisp_Object *new_workspace_ptr;
> - if (parser->object_workspace_size
> - == JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE)
> - {
> - new_workspace_ptr
> - = xnmalloc (new_workspace_size, sizeof (Lisp_Object));
> - memcpy (new_workspace_ptr, parser->object_workspace,
> - (sizeof (Lisp_Object)
> - * parser->object_workspace_current));
> - }
> - else
> - {
> - new_workspace_ptr
> - = xnrealloc (parser->object_workspace, new_workspace_size,
> - sizeof (Lisp_Object));
> - }
> + Lisp_Object new_workspace_vector =
> + larger_vector (parser->object_workspace_vector,
> + new_workspace_size - parser->object_workspace_size, -1);
> +
> + Lisp_Object *new_workspace_ptr = XVECTOR (new_workspace_vector)->contents;
>
> + parser->object_workspace_vector = new_workspace_vector;
> parser->object_workspace = new_workspace_ptr;
> parser->object_workspace_size = new_workspace_size;
> }
> @@ -1476,7 +1316,7 @@ json_parse_array (struct json_parser *parser)
> result = make_vector (number_of_elements, Qnil);
> for (size_t i = 0; i < number_of_elements; i++)
> {
> - rarely_quit (i);
> + rarely_quit (~i);
> ASET (result, i, parser->object_workspace[first + i]);
> }
> parser->object_workspace_current = first;
>
>
>
>
>
next prev parent reply other threads:[~2024-12-01 15:23 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-26 18:35 bug#74547: 31.0.50; igc: assertion failed in buffer.c Óscar Fuentes
2024-11-27 6:54 ` Gerd Möllmann
2024-12-01 10:49 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:05 ` Gerd Möllmann
2024-12-01 12:17 ` Gerd Möllmann
2024-12-01 12:30 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 12:39 ` Gerd Möllmann
2024-12-01 12:57 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 13:30 ` Gerd Möllmann
2024-12-01 14:58 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 15:18 ` Gerd Möllmann
2024-12-01 15:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:32 ` Geza Herman
2024-12-01 19:41 ` Gerd Möllmann
2024-12-01 21:15 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 15:55 ` Eli Zaretskii
2024-12-01 15:23 ` Eli Zaretskii [this message]
2024-12-01 15:30 ` Óscar Fuentes
2024-12-01 15:48 ` Gerd Möllmann
2024-12-01 15:58 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-01 16:24 ` Óscar Fuentes
2024-12-01 13:18 ` Óscar Fuentes
2024-12-01 13:44 ` Gerd Möllmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86wmgj4ebi.fsf@gnu.org \
--to=eliz@gnu.org \
--cc=74547@debbugs.gnu.org \
--cc=gerd.moellmann@gmail.com \
--cc=geza.herman@gmail.com \
--cc=mattiase@acm.org \
--cc=oscarfv@telefonica.net \
--cc=pipcet@protonmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).