From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c Date: Sun, 01 Dec 2024 14:58:10 +0000 Message-ID: <87wmgjlabu.fsf@protonmail.com> References: <87serdu9m3.fsf@telefonica.net> <87cyibn0dz.fsf@protonmail.com> <875xo3mvqh.fsf@protonmail.com> <871pyrmui4.fsf@protonmail.com> Reply-To: Pip Cet Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="30579"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 74547@debbugs.gnu.org, =?UTF-8?Q?=C3=93scar?= Fuentes , geza.herman@gmail.com To: Gerd =?UTF-8?Q?M=C3=B6llmann?= Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sun Dec 01 15:59:35 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tHlQ7-0007m6-6K for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 01 Dec 2024 15:59:35 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tHlPd-0000A6-BJ; Sun, 01 Dec 2024 09:59:05 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tHlPa-00009B-VH for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 09:59:03 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tHlPa-0001YB-Me for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 09:59:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=MIME-Version:References:In-Reply-To:From:Date:To:Subject; bh=EqYXTOrsJ3uxI3JRZ18CF8/j08kVI77dwfd54W2ccf4=; b=VdSoKbRigON/qFrZmlwfMz1MHx3V2NcTuDzYkh5zFaDKAnrqzlLNTsy6KoFND7qFpc8kyh0wGQyO+sBPCqxSSELiWSM7yYCE7oV4QN5vREo3Tc2b3m94j9K5XKOIXEVY/nU1I7AVqC7J3mlKF2yJqyj3YGqqzExmgvvrJhrWtGRssr3Rq9nZWgmd/3PHOxZd575z+PSCYEOyic2S6bvXaFJ8UB9/NldQd5nAcuBi1WaEEfJgnqthdrfbSOp+mQwOtGTszXQ8fN2iKxFL09W5doaOrzEHyMzanTmClb9cMc6tCBB5WYjUe67TpSlACneD6AFciv4lQlBRS/chPdw19A==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tHlPa-000686-Hm for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 09:59:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Pip Cet Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 01 Dec 2024 14:59:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 74547 X-GNU-PR-Package: emacs Original-Received: via spool by 74547-submit@debbugs.gnu.org id=B74547.173306510623498 (code B ref 74547); Sun, 01 Dec 2024 14:59:02 +0000 Original-Received: (at 74547) by debbugs.gnu.org; 1 Dec 2024 14:58:26 +0000 Original-Received: from localhost ([127.0.0.1]:52610 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tHlOz-00066v-0v for submit@debbugs.gnu.org; Sun, 01 Dec 2024 09:58:25 -0500 Original-Received: from mail-10631.protonmail.ch ([79.135.106.31]:55763) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tHlOv-00066W-HS for 74547@debbugs.gnu.org; Sun, 01 Dec 2024 09:58:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1733065094; x=1733324294; bh=EqYXTOrsJ3uxI3JRZ18CF8/j08kVI77dwfd54W2ccf4=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector:List-Unsubscribe:List-Unsubscribe-Post; b=XPcB9lwCGHMts0teIP2bvGuNMF/cQnhxjy8TpwuK+PE5NKHIxfc+tTIDX1Ov8a9NO Et3su0NQa+RYajH9Uygyu2KQLU0EC0MgI8vOfngSb/lFzlF16evFONMlPi136qKZFx TC55/pIWKTYOuMUcMy4WU/88ScznXr7wShbJFRLN8Vg4htYFoTesrw3iIPrpx4lH8q sXuiO/yC9qvFvJeCd3vmgWAR/HeZcgxgHk4slThwNRAdjbkPW++aTttnLps+nJfXx8 vr45tCr43TaQLHg7/R4KvHlqXVL4mDa40iEXKizebmY92Ut3IEKQCs8VyNfksLzNct HdHF9yddaZrMg== In-Reply-To: Feedback-ID: 112775352:user:proton X-Pm-Message-ID: a08e1bb9f9dcc154f958aee35f6dfb42ee723016 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:296255 Archived-At: Gerd M=C3=B6llmann writes: > Pip Cet writes: >> Gerd M=C3=B6llmann writes: >>> Pip Cet writes: >>> Yes, exactly, json.c. First thing I saw when searching for xfree >>> >>> static void >>> json_parser_done (void *parser) >>> { >>> struct json_parser *p =3D (struct json_parser *) parser; >>> if (p->object_workspace !=3D p->internal_object_workspace) >>> xfree (p->object_workspace); >>> >>> That at least needs an explanation. I would have expected it to be >>> allocated as root. >> >> Well, the explanation is this comment: >> >> /* Lisp_Objects are collected in this area during object/array >> parsing. To avoid allocations, initially >> internal_object_workspace is used. If it runs out of space then >> we switch to allocated space. Important note: with this design, >> GC must not run during JSON parsing, otherwise Lisp_Objects in >> the workspace may get incorrectly collected. */ > > That explains it, indeed :-(. Just to be clear, I think the mixed heap/stack allocation is the right thing to do here, but we need to let both garbage collectors know about the Lisp_Objects we allocated. I think the best way to do that is to use a Lisp_Vector when we run out of stack space. That way, we don't have to worry about forgetting to GC it, and we can use standard functions rather than rolling our own. >> Obviously, we cannot make any such guarantees when MPS is in use. (I >> don't think we can make the guarantee when MPS is not in use, but I'm >> not totally certain; we certainly allocate strings while parsing JSON, >> which is sufficient to trigger GC in the MPS case). > > If json.c calls something like maybe_quit, which I's expect it must, > then GC can indeed happen. See bug#56108 for an example in the regexp > code found with ASAN. It's not as risky in the old code as with > concurrent GC, but anyway. There's a rarely_quit in json_parse_array, which, AFAICS, always triggers in the first loop iteration (when i =3D=3D 0), but probably never reaches 65536 for the second trigger. My proposal is to modify json.c so it uses a lisp vector if more than 64 objects are needed, and to remove the home-grown symset hash set, replacing it by a standard hash table. Note that the symset is only used to detect duplicate JSON keys. When such duplication is detected, we simply ignore the second plist entry. (I think it would be better to throw an error, but the documentation disagrees.) So here's the patch with the old behavior, where (json-serialize '(a "test" a "ignored")) doesn't throw an error and simply returns "{\"a\":\"test\"}" commit 85fbd342d3b4a8afabe8078e19be9b45fe3e20d2 Author: Pip Cet Date: Sun Dec 1 12:46:08 2024 +0000 Use standard Lisp objects in json.c (bug#74547) =20 * src/json.c (json_out_t): Make the symset table a Lisp_Object. (symset_t): (pop_symset): (cleanup_symset_tables): (symset_hash): (symset_expand): (symset_size): Remove. (make_symset_table): Use an ordinary hash table for the symset. (push_symset): Don't return a value. (symset_add): Use ordinary hash table accessors. (cleanup_json_out): Remove. (json_out_object_cons): Use ordinary hash table for symsets. (json_serialize): (json_parser_init): (json_parser_done): Adjust to use ordinary hash table code. (json_make_object_workspace_for_slow_path): Use an ordinary vector for the workspace. (json_parse_array): Avoid calling rarely_quit(0) (json_parser_done): Remove manual memory management. diff --git a/src/json.c b/src/json.c index eb446f5c221..0e17b893087 100644 --- a/src/json.c +++ b/src/json.c @@ -28,7 +28,6 @@ Copyright (C) 2017-2024 Free Software Foundation, Inc. #include "lisp.h" #include "buffer.h" #include "coding.h" -#include "igc.h" =20 enum json_object_type { @@ -111,161 +110,9 @@ json_parse_args (ptrdiff_t nargs, Lisp_Object *args, ptrdiff_t chars_delta; /* size - {number of characters in buf} */ =20 int maxdepth; - struct symset_tbl *ss_table;=09/* table used by containing object */ struct json_configuration conf; } json_out_t; =20 -/* Set of symbols. */ -typedef struct -{ - ptrdiff_t count;=09=09/* symbols in table */ - int bits;=09=09=09/* log2(table size) */ - struct symset_tbl *table;=09/* heap-allocated table */ -} symset_t; - -struct symset_tbl -{ - /* Table used by the containing object if any, so that we can free all - tables if an error occurs. */ - struct symset_tbl *up; - /* Table of symbols (2**bits elements), Qunbound where unused. */ - Lisp_Object entries[]; -}; - -static inline ptrdiff_t -symset_size (int bits) -{ - return (ptrdiff_t) 1 << bits; -} - -static struct symset_tbl * -make_symset_table (int bits, struct symset_tbl *up) -{ - int maxbits =3D min (SIZE_WIDTH - 2 - (word_size < 8 ? 2 : 3), 32); - if (bits > maxbits) - memory_full (PTRDIFF_MAX);=09/* Will never happen in practice. */ -#ifdef HAVE_MPS - struct symset_tbl *st =3D igc_xzalloc_ambig (sizeof *st + (sizeof *st->e= ntries << bits)); -#else - struct symset_tbl *st =3D xmalloc (sizeof *st + (sizeof *st->entries << = bits)); -#endif - st->up =3D up; - ptrdiff_t size =3D symset_size (bits); - for (ptrdiff_t i =3D 0; i < size; i++) - st->entries[i] =3D Qunbound; - return st; -} - -/* Create a new symset to use for a new object. */ -static symset_t -push_symset (json_out_t *jo) -{ - int bits =3D 4; - struct symset_tbl *tbl =3D make_symset_table (bits, jo->ss_table); - jo->ss_table =3D tbl; - return (symset_t){ .count =3D 0, .bits =3D bits, .table =3D tbl }; -} - -/* Destroy the current symset. */ -static void -pop_symset (json_out_t *jo, symset_t *ss) -{ - jo->ss_table =3D ss->table->up; -#ifdef HAVE_MPS - igc_xfree (ss->table); -#else - xfree (ss->table); -#endif -} - -/* Remove all heap-allocated symset tables, in case an error occurred. */ -static void -cleanup_symset_tables (struct symset_tbl *st) -{ - while (st) - { - struct symset_tbl *up =3D st->up; -#ifdef HAVE_MPS - igc_xfree (st); -#else - xfree (st); -#endif - st =3D up; - } -} - -static inline uint32_t -symset_hash (Lisp_Object sym, int bits) -{ - EMACS_UINT hash; -#ifdef HAVE_MPS - hash =3D igc_hash (sym); -#else - hash =3D XHASH (sym); -#endif - return knuth_hash (reduce_emacs_uint_to_hash_hash (hash), bits); -} - -/* Enlarge the table used by a symset. */ -static NO_INLINE void -symset_expand (symset_t *ss) -{ - struct symset_tbl *old_table =3D ss->table; - int oldbits =3D ss->bits; - ptrdiff_t oldsize =3D symset_size (oldbits); - int bits =3D oldbits + 1; - ss->bits =3D bits; - ss->table =3D make_symset_table (bits, old_table->up); - /* Move all entries from the old table to the new one. */ - ptrdiff_t mask =3D symset_size (bits) - 1; - struct symset_tbl *tbl =3D ss->table; - for (ptrdiff_t i =3D 0; i < oldsize; i++) - { - Lisp_Object sym =3D old_table->entries[i]; - if (!BASE_EQ (sym, Qunbound)) -=09{ -=09 ptrdiff_t j =3D symset_hash (sym, bits); -=09 while (!BASE_EQ (tbl->entries[j], Qunbound)) -=09 j =3D (j + 1) & mask; -=09 tbl->entries[j] =3D sym; -=09} - } -#ifdef HAVE_MPS - igc_xfree (old_table); -#else - xfree (old_table); -#endif -} - -/* If sym is in ss, return false; otherwise add it and return true. - Comparison is done by strict identity. */ -static inline bool -symset_add (json_out_t *jo, symset_t *ss, Lisp_Object sym) -{ - /* Make sure we don't fill more than half of the table. */ - if (ss->count >=3D (symset_size (ss->bits) >> 1)) - { - symset_expand (ss); - jo->ss_table =3D ss->table; - } - - struct symset_tbl *tbl =3D ss->table; - ptrdiff_t mask =3D symset_size (ss->bits) - 1; - for (ptrdiff_t i =3D symset_hash (sym, ss->bits); ; i =3D (i + 1) & mask= ) - { - Lisp_Object s =3D tbl->entries[i]; - if (BASE_EQ (s, sym)) -=09return false;=09=09/* Previous occurrence found. */ - if (BASE_EQ (s, Qunbound)) -=09{ -=09 /* Not in set, add it. */ -=09 tbl->entries[i] =3D sym; -=09 ss->count++; -=09 return true; -=09} - } -} - static NO_INLINE void json_out_grow_buf (json_out_t *jo, ptrdiff_t bytes) { @@ -283,7 +130,6 @@ cleanup_json_out (void *arg) json_out_t *jo =3D arg; xfree (jo->buf); jo->buf =3D NULL; - cleanup_symset_tables (jo->ss_table); } =20 /* Make room for `bytes` more bytes in buffer. */ @@ -442,8 +288,8 @@ json_out_unnest (json_out_t *jo) static void json_out_object_cons (json_out_t *jo, Lisp_Object obj) { + Lisp_Object symset =3D CALLN (Fmake_hash_table, QCtest, Qeq); json_out_nest (jo); - symset_t ss =3D push_symset (jo); json_out_byte (jo, '{'); bool is_alist =3D CONSP (XCAR (obj)); bool first =3D true; @@ -469,8 +315,9 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj) key =3D maybe_remove_pos_from_symbol (key); CHECK_TYPE (BARE_SYMBOL_P (key), Qsymbolp, key); =20 - if (symset_add (jo, &ss, key)) + if (NILP (Fgethash (key, symset, Qnil))) =09{ +=09 Fputhash (key, Qt, symset); =09 if (!first) =09 json_out_byte (jo, ','); =09 first =3D false; @@ -486,7 +333,6 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj) } CHECK_LIST_END (tail, obj); json_out_byte (jo, '}'); - pop_symset (jo, &ss); json_out_unnest (jo); } =20 @@ -591,7 +437,6 @@ json_serialize (json_out_t *jo, Lisp_Object object, jo->capacity =3D 0; jo->chars_delta =3D 0; jo->buf =3D NULL; - jo->ss_table =3D NULL; jo->conf.object_type =3D json_object_hashtable; jo->conf.array_type =3D json_array_array; jo->conf.null_object =3D QCnull; @@ -729,6 +574,7 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512 Lisp_Object internal_object_workspace [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE]; Lisp_Object *object_workspace; + Lisp_Object object_workspace_vector; size_t object_workspace_size; size_t object_workspace_current; =20 @@ -796,6 +642,7 @@ json_parser_init (struct json_parser *parser, parser->object_workspace_size =3D JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE; parser->object_workspace_current =3D 0; + parser->object_workspace_vector =3D Qnil; =20 parser->byte_workspace =3D parser->internal_byte_workspace; parser->byte_workspace_end =3D (parser->byte_workspace @@ -806,8 +653,6 @@ json_parser_init (struct json_parser *parser, json_parser_done (void *parser) { struct json_parser *p =3D (struct json_parser *) parser; - if (p->object_workspace !=3D p->internal_object_workspace) - xfree (p->object_workspace); if (p->byte_workspace !=3D p->internal_byte_workspace) xfree (p->byte_workspace); } @@ -818,6 +663,11 @@ json_parser_done (void *parser) json_make_object_workspace_for_slow_path (struct json_parser *parser, =09=09=09=09=09 size_t size) { + if (NILP (parser->object_workspace_vector)) + { + parser->object_workspace_vector =3D +=09Fvector(parser->object_workspace_current, parser->object_workspace); + } size_t needed_workspace_size =3D (parser->object_workspace_current + size); size_t new_workspace_size =3D parser->object_workspace_size; @@ -829,23 +679,13 @@ json_make_object_workspace_for_slow_path (struct json= _parser *parser, =09} } =20 - Lisp_Object *new_workspace_ptr; - if (parser->object_workspace_size - =3D=3D JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE) - { - new_workspace_ptr -=09=3D xnmalloc (new_workspace_size, sizeof (Lisp_Object)); - memcpy (new_workspace_ptr, parser->object_workspace, -=09 (sizeof (Lisp_Object) -=09 * parser->object_workspace_current)); - } - else - { - new_workspace_ptr -=09=3D xnrealloc (parser->object_workspace, new_workspace_size, -=09=09 sizeof (Lisp_Object)); - } + Lisp_Object new_workspace_vector =3D + larger_vector (parser->object_workspace_vector, +=09=09 new_workspace_size - parser->object_workspace_size, -1); + + Lisp_Object *new_workspace_ptr =3D XVECTOR (new_workspace_vector)->conte= nts; =20 + parser->object_workspace_vector =3D new_workspace_vector; parser->object_workspace =3D new_workspace_ptr; parser->object_workspace_size =3D new_workspace_size; } @@ -1476,7 +1316,7 @@ json_parse_array (struct json_parser *parser) =09result =3D make_vector (number_of_elements, Qnil); =09for (size_t i =3D 0; i < number_of_elements; i++) =09 { -=09 rarely_quit (i); +=09 rarely_quit (~i); =09 ASET (result, i, parser->object_workspace[first + i]); =09 } =09parser->object_workspace_current =3D first;