From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c Date: Sun, 01 Dec 2024 17:23:45 +0200 Message-ID: <86wmgj4ebi.fsf@gnu.org> References: <87serdu9m3.fsf@telefonica.net> <87cyibn0dz.fsf@protonmail.com> <875xo3mvqh.fsf@protonmail.com> <871pyrmui4.fsf@protonmail.com> <87wmgjlabu.fsf@protonmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="908"; mail-complaints-to="usenet@ciao.gmane.io" Cc: gerd.moellmann@gmail.com, 74547@debbugs.gnu.org, oscarfv@telefonica.net, geza.herman@gmail.com To: Pip Cet , Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= , =?UTF-8?Q?G=C3=A9za?= Herman Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sun Dec 01 16:25:19 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tHlp0-00005k-LC for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 01 Dec 2024 16:25:18 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tHlon-0005Ja-OU; Sun, 01 Dec 2024 10:25:05 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tHlol-0005He-Dj for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 10:25:03 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tHlok-0002EN-TS for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 10:25:03 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=MIME-version:References:In-Reply-To:From:Date:To:Subject; bh=39XhSevY9aqzOUcNWestWOZBnyi0L75jbfKOGTPD6Xw=; b=nKXM7Fhf34gbxYYQjA6yp22k0Y7duWK6/nnQS9yihZECZ8YEJh3JTwlP4g4gVDdw4knHlbnO+w4C8Yz1trIel82B/V5kqbJ2T4cUXQVnENpf8GOA7UwXsWl9L4i3F4fpepx9nIKD1gavgmmP8qtrp3ScZItNE//CZVRTAai3Sv9Qq4ZXtP75yng0Lh1IwSJ48VMp1+vWyglN5VHZkaJCmGxrvb1toC1Ib7haH8Mq7doMMppWIfc7cPhnqVu/6OspjdlSLiMDQNL880hrQRcc8+DXjthlNm391GCZuzaJCktWFHppqL9Yr/aQJqZhQy15HAYY7bmIsi9dUYJIZwf3Ig==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tHloj-0007Wn-T2 for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 10:25:01 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 01 Dec 2024 15:25:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 74547 X-GNU-PR-Package: emacs Original-Received: via spool by 74547-submit@debbugs.gnu.org id=B74547.173306668528901 (code B ref 74547); Sun, 01 Dec 2024 15:25:01 +0000 Original-Received: (at 74547) by debbugs.gnu.org; 1 Dec 2024 15:24:45 +0000 Original-Received: from localhost ([127.0.0.1]:52651 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tHloS-0007W4-GK for submit@debbugs.gnu.org; Sun, 01 Dec 2024 10:24:45 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:56720) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tHloQ-0007Vj-1O for 74547@debbugs.gnu.org; Sun, 01 Dec 2024 10:24:43 -0500 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tHloJ-00025D-5Y; Sun, 01 Dec 2024 10:24:35 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=39XhSevY9aqzOUcNWestWOZBnyi0L75jbfKOGTPD6Xw=; b=hmCqfdYLj24jdz6xhGY0 7bzNqHeeXaP08Gx0KTD3dNu2Fd0ymk7shEMXFV5QoNZigR5w2A/tNsQzlx7E/Rbpyv29a+Ng9mQuY OlZmp9dKfE7eE765+3Q8/xGQTwME8TL8Io4yAr2GU772lyy4cdRxZeEvroqPAmqQBmAFa4ijPFWQr LP9LR85OSmqTrXXnKjRKXLtlgfsoNJ8vnvfrwir/HGkIgjqyE9xGvy9dSzI1sTgroLOdO/EAwiO5b h1qQKNW2tSOsfm6pTBoLGwppOGWPNxopgLonGv8hwD7XgOv75bZDj8Ro20K5i+2E05N5WEGTbJ0Pp GScWLzc1osD3GQ==; In-Reply-To: <87wmgjlabu.fsf@protonmail.com> (bug-gnu-emacs@gnu.org) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:296257 Archived-At: > Cc: 74547@debbugs.gnu.org, > Óscar Fuentes , geza.herman@gmail.com > Date: Sun, 01 Dec 2024 14:58:10 +0000 > From: Pip Cet via "Bug reports for GNU Emacs, > the Swiss army knife of text editors" > > Gerd Möllmann writes: > > > Pip Cet writes: > > >> Gerd Möllmann writes: > >>> Pip Cet writes: > >>> Yes, exactly, json.c. First thing I saw when searching for xfree > >>> > >>> static void > >>> json_parser_done (void *parser) > >>> { > >>> struct json_parser *p = (struct json_parser *) parser; > >>> if (p->object_workspace != p->internal_object_workspace) > >>> xfree (p->object_workspace); > >>> > >>> That at least needs an explanation. I would have expected it to be > >>> allocated as root. > >> > >> Well, the explanation is this comment: > >> > >> /* Lisp_Objects are collected in this area during object/array > >> parsing. To avoid allocations, initially > >> internal_object_workspace is used. If it runs out of space then > >> we switch to allocated space. Important note: with this design, > >> GC must not run during JSON parsing, otherwise Lisp_Objects in > >> the workspace may get incorrectly collected. */ > > > > That explains it, indeed :-(. > > Just to be clear, I think the mixed heap/stack allocation is the right > thing to do here, but we need to let both garbage collectors know about > the Lisp_Objects we allocated. > > I think the best way to do that is to use a Lisp_Vector when we run out > of stack space. That way, we don't have to worry about forgetting to GC > it, and we can use standard functions rather than rolling our own. > > >> Obviously, we cannot make any such guarantees when MPS is in use. (I > >> don't think we can make the guarantee when MPS is not in use, but I'm > >> not totally certain; we certainly allocate strings while parsing JSON, > >> which is sufficient to trigger GC in the MPS case). > > > > If json.c calls something like maybe_quit, which I's expect it must, > > then GC can indeed happen. See bug#56108 for an example in the regexp > > code found with ASAN. It's not as risky in the old code as with > > concurrent GC, but anyway. > > There's a rarely_quit in json_parse_array, which, AFAICS, always > triggers in the first loop iteration (when i == 0), but probably never > reaches 65536 for the second trigger. > > My proposal is to modify json.c so it uses a lisp vector if more than 64 > objects are needed, and to remove the home-grown symset hash set, > replacing it by a standard hash table. > > Note that the symset is only used to detect duplicate JSON keys. When > such duplication is detected, we simply ignore the second plist entry. > (I think it would be better to throw an error, but the documentation > disagrees.) > > So here's the patch with the old behavior, where > > (json-serialize '(a "test" a "ignored")) > > doesn't throw an error and simply returns > > "{\"a\":\"test\"}" Adding Mattias and Géza, who were involved in implementing json.c. > commit 85fbd342d3b4a8afabe8078e19be9b45fe3e20d2 > Author: Pip Cet > Date: Sun Dec 1 12:46:08 2024 +0000 > > Use standard Lisp objects in json.c (bug#74547) > > * src/json.c (json_out_t): Make the symset table a Lisp_Object. > (symset_t): > (pop_symset): > (cleanup_symset_tables): > (symset_hash): > (symset_expand): > (symset_size): Remove. > (make_symset_table): Use an ordinary hash table for the symset. > (push_symset): Don't return a value. > (symset_add): Use ordinary hash table accessors. > (cleanup_json_out): Remove. > (json_out_object_cons): Use ordinary hash table for symsets. > (json_serialize): > (json_parser_init): > (json_parser_done): Adjust to use ordinary hash table code. > (json_make_object_workspace_for_slow_path): Use an ordinary vector for > the workspace. > (json_parse_array): Avoid calling rarely_quit(0) > (json_parser_done): Remove manual memory management. > > diff --git a/src/json.c b/src/json.c > index eb446f5c221..0e17b893087 100644 > --- a/src/json.c > +++ b/src/json.c > @@ -28,7 +28,6 @@ Copyright (C) 2017-2024 Free Software Foundation, Inc. > #include "lisp.h" > #include "buffer.h" > #include "coding.h" > -#include "igc.h" > > enum json_object_type > { > @@ -111,161 +110,9 @@ json_parse_args (ptrdiff_t nargs, Lisp_Object *args, > ptrdiff_t chars_delta; /* size - {number of characters in buf} */ > > int maxdepth; > - struct symset_tbl *ss_table; /* table used by containing object */ > struct json_configuration conf; > } json_out_t; > > -/* Set of symbols. */ > -typedef struct > -{ > - ptrdiff_t count; /* symbols in table */ > - int bits; /* log2(table size) */ > - struct symset_tbl *table; /* heap-allocated table */ > -} symset_t; > - > -struct symset_tbl > -{ > - /* Table used by the containing object if any, so that we can free all > - tables if an error occurs. */ > - struct symset_tbl *up; > - /* Table of symbols (2**bits elements), Qunbound where unused. */ > - Lisp_Object entries[]; > -}; > - > -static inline ptrdiff_t > -symset_size (int bits) > -{ > - return (ptrdiff_t) 1 << bits; > -} > - > -static struct symset_tbl * > -make_symset_table (int bits, struct symset_tbl *up) > -{ > - int maxbits = min (SIZE_WIDTH - 2 - (word_size < 8 ? 2 : 3), 32); > - if (bits > maxbits) > - memory_full (PTRDIFF_MAX); /* Will never happen in practice. */ > -#ifdef HAVE_MPS > - struct symset_tbl *st = igc_xzalloc_ambig (sizeof *st + (sizeof *st->entries << bits)); > -#else > - struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << bits)); > -#endif > - st->up = up; > - ptrdiff_t size = symset_size (bits); > - for (ptrdiff_t i = 0; i < size; i++) > - st->entries[i] = Qunbound; > - return st; > -} > - > -/* Create a new symset to use for a new object. */ > -static symset_t > -push_symset (json_out_t *jo) > -{ > - int bits = 4; > - struct symset_tbl *tbl = make_symset_table (bits, jo->ss_table); > - jo->ss_table = tbl; > - return (symset_t){ .count = 0, .bits = bits, .table = tbl }; > -} > - > -/* Destroy the current symset. */ > -static void > -pop_symset (json_out_t *jo, symset_t *ss) > -{ > - jo->ss_table = ss->table->up; > -#ifdef HAVE_MPS > - igc_xfree (ss->table); > -#else > - xfree (ss->table); > -#endif > -} > - > -/* Remove all heap-allocated symset tables, in case an error occurred. */ > -static void > -cleanup_symset_tables (struct symset_tbl *st) > -{ > - while (st) > - { > - struct symset_tbl *up = st->up; > -#ifdef HAVE_MPS > - igc_xfree (st); > -#else > - xfree (st); > -#endif > - st = up; > - } > -} > - > -static inline uint32_t > -symset_hash (Lisp_Object sym, int bits) > -{ > - EMACS_UINT hash; > -#ifdef HAVE_MPS > - hash = igc_hash (sym); > -#else > - hash = XHASH (sym); > -#endif > - return knuth_hash (reduce_emacs_uint_to_hash_hash (hash), bits); > -} > - > -/* Enlarge the table used by a symset. */ > -static NO_INLINE void > -symset_expand (symset_t *ss) > -{ > - struct symset_tbl *old_table = ss->table; > - int oldbits = ss->bits; > - ptrdiff_t oldsize = symset_size (oldbits); > - int bits = oldbits + 1; > - ss->bits = bits; > - ss->table = make_symset_table (bits, old_table->up); > - /* Move all entries from the old table to the new one. */ > - ptrdiff_t mask = symset_size (bits) - 1; > - struct symset_tbl *tbl = ss->table; > - for (ptrdiff_t i = 0; i < oldsize; i++) > - { > - Lisp_Object sym = old_table->entries[i]; > - if (!BASE_EQ (sym, Qunbound)) > - { > - ptrdiff_t j = symset_hash (sym, bits); > - while (!BASE_EQ (tbl->entries[j], Qunbound)) > - j = (j + 1) & mask; > - tbl->entries[j] = sym; > - } > - } > -#ifdef HAVE_MPS > - igc_xfree (old_table); > -#else > - xfree (old_table); > -#endif > -} > - > -/* If sym is in ss, return false; otherwise add it and return true. > - Comparison is done by strict identity. */ > -static inline bool > -symset_add (json_out_t *jo, symset_t *ss, Lisp_Object sym) > -{ > - /* Make sure we don't fill more than half of the table. */ > - if (ss->count >= (symset_size (ss->bits) >> 1)) > - { > - symset_expand (ss); > - jo->ss_table = ss->table; > - } > - > - struct symset_tbl *tbl = ss->table; > - ptrdiff_t mask = symset_size (ss->bits) - 1; > - for (ptrdiff_t i = symset_hash (sym, ss->bits); ; i = (i + 1) & mask) > - { > - Lisp_Object s = tbl->entries[i]; > - if (BASE_EQ (s, sym)) > - return false; /* Previous occurrence found. */ > - if (BASE_EQ (s, Qunbound)) > - { > - /* Not in set, add it. */ > - tbl->entries[i] = sym; > - ss->count++; > - return true; > - } > - } > -} > - > static NO_INLINE void > json_out_grow_buf (json_out_t *jo, ptrdiff_t bytes) > { > @@ -283,7 +130,6 @@ cleanup_json_out (void *arg) > json_out_t *jo = arg; > xfree (jo->buf); > jo->buf = NULL; > - cleanup_symset_tables (jo->ss_table); > } > > /* Make room for `bytes` more bytes in buffer. */ > @@ -442,8 +288,8 @@ json_out_unnest (json_out_t *jo) > static void > json_out_object_cons (json_out_t *jo, Lisp_Object obj) > { > + Lisp_Object symset = CALLN (Fmake_hash_table, QCtest, Qeq); > json_out_nest (jo); > - symset_t ss = push_symset (jo); > json_out_byte (jo, '{'); > bool is_alist = CONSP (XCAR (obj)); > bool first = true; > @@ -469,8 +315,9 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj) > key = maybe_remove_pos_from_symbol (key); > CHECK_TYPE (BARE_SYMBOL_P (key), Qsymbolp, key); > > - if (symset_add (jo, &ss, key)) > + if (NILP (Fgethash (key, symset, Qnil))) > { > + Fputhash (key, Qt, symset); > if (!first) > json_out_byte (jo, ','); > first = false; > @@ -486,7 +333,6 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj) > } > CHECK_LIST_END (tail, obj); > json_out_byte (jo, '}'); > - pop_symset (jo, &ss); > json_out_unnest (jo); > } > > @@ -591,7 +437,6 @@ json_serialize (json_out_t *jo, Lisp_Object object, > jo->capacity = 0; > jo->chars_delta = 0; > jo->buf = NULL; > - jo->ss_table = NULL; > jo->conf.object_type = json_object_hashtable; > jo->conf.array_type = json_array_array; > jo->conf.null_object = QCnull; > @@ -729,6 +574,7 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512 > Lisp_Object internal_object_workspace > [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE]; > Lisp_Object *object_workspace; > + Lisp_Object object_workspace_vector; > size_t object_workspace_size; > size_t object_workspace_current; > > @@ -796,6 +642,7 @@ json_parser_init (struct json_parser *parser, > parser->object_workspace_size > = JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE; > parser->object_workspace_current = 0; > + parser->object_workspace_vector = Qnil; > > parser->byte_workspace = parser->internal_byte_workspace; > parser->byte_workspace_end = (parser->byte_workspace > @@ -806,8 +653,6 @@ json_parser_init (struct json_parser *parser, > json_parser_done (void *parser) > { > struct json_parser *p = (struct json_parser *) parser; > - if (p->object_workspace != p->internal_object_workspace) > - xfree (p->object_workspace); > if (p->byte_workspace != p->internal_byte_workspace) > xfree (p->byte_workspace); > } > @@ -818,6 +663,11 @@ json_parser_done (void *parser) > json_make_object_workspace_for_slow_path (struct json_parser *parser, > size_t size) > { > + if (NILP (parser->object_workspace_vector)) > + { > + parser->object_workspace_vector = > + Fvector(parser->object_workspace_current, parser->object_workspace); > + } > size_t needed_workspace_size > = (parser->object_workspace_current + size); > size_t new_workspace_size = parser->object_workspace_size; > @@ -829,23 +679,13 @@ json_make_object_workspace_for_slow_path (struct json_parser *parser, > } > } > > - Lisp_Object *new_workspace_ptr; > - if (parser->object_workspace_size > - == JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE) > - { > - new_workspace_ptr > - = xnmalloc (new_workspace_size, sizeof (Lisp_Object)); > - memcpy (new_workspace_ptr, parser->object_workspace, > - (sizeof (Lisp_Object) > - * parser->object_workspace_current)); > - } > - else > - { > - new_workspace_ptr > - = xnrealloc (parser->object_workspace, new_workspace_size, > - sizeof (Lisp_Object)); > - } > + Lisp_Object new_workspace_vector = > + larger_vector (parser->object_workspace_vector, > + new_workspace_size - parser->object_workspace_size, -1); > + > + Lisp_Object *new_workspace_ptr = XVECTOR (new_workspace_vector)->contents; > > + parser->object_workspace_vector = new_workspace_vector; > parser->object_workspace = new_workspace_ptr; > parser->object_workspace_size = new_workspace_size; > } > @@ -1476,7 +1316,7 @@ json_parse_array (struct json_parser *parser) > result = make_vector (number_of_elements, Qnil); > for (size_t i = 0; i < number_of_elements; i++) > { > - rarely_quit (i); > + rarely_quit (~i); > ASET (result, i, parser->object_workspace[first + i]); > } > parser->object_workspace_current = first; > > > > >