From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c Date: Sun, 01 Dec 2024 21:15:41 +0000 Message-ID: <87cyibksuu.fsf@protonmail.com> References: <87serdu9m3.fsf@telefonica.net> <875xo3mvqh.fsf@protonmail.com> <871pyrmui4.fsf@protonmail.com> <87wmgjlabu.fsf@protonmail.com> <87r06rl807.fsf@protonmail.com> <11705193-9151-43fb-9109-8a579e77d32b@gmail.com> Reply-To: Pip Cet Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="35530"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Gerd =?UTF-8?Q?M=C3=B6llmann?= , 74547@debbugs.gnu.org, =?UTF-8?Q?=C3=93scar?= Fuentes To: Geza Herman Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sun Dec 01 22:16:24 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tHrIm-000951-Es for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 01 Dec 2024 22:16:24 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tHrIS-0001LL-Tp; Sun, 01 Dec 2024 16:16:05 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tHrIR-0001L7-4B for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 16:16:03 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tHrIQ-0001fZ-R4 for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 16:16:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=MIME-Version:References:In-Reply-To:From:Date:To:Subject; bh=rgX0cB6f0d56HBHNdrB+rVRh4a+cW7EaQjGMwJ5EnHw=; b=em2OElaLL+QvOzeyk9kxPd4x9l0f0pqhhCYL5SrWzfRqIRodNFByZrWqQV37bTMo8fwGZhqlvKRi6EHLrYNx/rR0QotMs3OGLXYDVsIS5JCO4YYpIIL6IxP/Jadjfc60Z0K4ooSs37dsM5cmOA58noRnYIHSen4kTKWVZmS253+SPuqIjIfc88U+KiEcR4Z3zOIgtXJLq5n3xn38Limnf1d8yHPo+4RaQP1mU5VvbowZ7GuKGS8GacenNUE6dUHpInX09nbI1f2+f3A5EO71B/NmZCT2d92Y14WIYw56aHpdwvm6IAV/rv+h04vUCiDfaok07iqhF+MFIVIvpQ7eKA==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tHrIQ-0000kV-D5 for bug-gnu-emacs@gnu.org; Sun, 01 Dec 2024 16:16:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Pip Cet Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 01 Dec 2024 21:16:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 74547 X-GNU-PR-Package: emacs Original-Received: via spool by 74547-submit@debbugs.gnu.org id=B74547.17330877562863 (code B ref 74547); Sun, 01 Dec 2024 21:16:02 +0000 Original-Received: (at 74547) by debbugs.gnu.org; 1 Dec 2024 21:15:56 +0000 Original-Received: from localhost ([127.0.0.1]:53288 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tHrIJ-0000k7-0n for submit@debbugs.gnu.org; Sun, 01 Dec 2024 16:15:55 -0500 Original-Received: from mail-10629.protonmail.ch ([79.135.106.29]:10511) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tHrIF-0000jp-C1 for 74547@debbugs.gnu.org; Sun, 01 Dec 2024 16:15:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1733087743; x=1733346943; bh=rgX0cB6f0d56HBHNdrB+rVRh4a+cW7EaQjGMwJ5EnHw=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector:List-Unsubscribe:List-Unsubscribe-Post; b=rzOMXFmFIUw9ETY3k1+Z3UREtei+uuNkcrH8dCvWJQoY6jIf0G5QrgVcIa+atNVHJ UfX36/jy+eL7zkF3QqVEml9teGHEdOTeiGpk8IMOhAaOYrKB/0gFEAWoQSIevJE0eo hdpLaxfwMnd3oMWsAjL/go48nQuZMuMyJ7OJ22UZWn8FrVITJy7Q8HPT6D9MqvxtDT 00tI3JoWjREheq2yPUP51x0/YwwcbCbGHH8pkSDbpGd0iVR/Yvi4RjqoitJL1ARgWt Ffhl6YLwsL8kTYae6t7sdpP2yJRlfc7rYmqqu8HSbyD94mO0bAKCthWNJQwjHnXjT5 ySGwGkRmvwm6w== In-Reply-To: <11705193-9151-43fb-9109-8a579e77d32b@gmail.com> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: 3f8fadac9a75bf7da887997aee3303e7e1f03064 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:296280 Archived-At: "Geza Herman" writes: > On 12/1/24 16:48, Pip Cet wrote: > Gerd M=C2=B6llmann writes: > > Back then, the future of the new GC was a question, so Gerd said > (https://lists.gnu.org/archive/html/emacs-devel/2024-03/msg00544.html) > that > "Please don't take my GC efforts into consideration. That may succeed > or not. But this is also a matter of good design, using the stack, > (which BTW pdumper does, too), vs. bad design." That's why we went wit= h > the fastest implementation that doesn't use lisp vectors for storage. > But we suspected that this JSON parser design will likely cause a > problem with the new GC. So I think even if it turned out that the > current problem was not caused by the parser, I still think that there > should be something done about this JSON parser design to eliminate > this potential problem. The lisp vector based approach was reverted > because it added an extra pressure to the GC. For large JSON messages, > it doesn't matter too much, but when the JSON is small, the extra GC > time made the parser measurably slower. But, as far as I remember, tha= t > version hadn't have the small internal storage optimization yet. If we > convert back to the vector based approach, the extra GC pressure will > be smaller (compared to the original vector based approach without the > internal storage), as for smaller sizes the vector won't be actually > used. > G=C2=A9za Thank you for the summary, that makes sense. Is there a standard corpus of JSON documents that you use to benchmark the code? That would be very helpful, I think, since Eli correctly points out JSON parsing performance is critical. My gut feeling is that we should get rid of the object_workspace entirely, instead modifying the general Lisp code to avoid performance issues (and sacrifice some memory in the process, on most systems). When building a hash table, we can do so directly and fine-tune the hash table code not to reallocate too often. I'd imagine such fine tuning would be generally useful (JSON-derived hash tables are unlikely to exceed the initial 65 elements, but when they do, growing by a factor of four might be appropriate). When building a vector, we might want to introduce a `vtruncate' function which destructively reduces a vector's size without reallocating it. Again, I think that function would be useful in a few other places. Then we could start by allocating a 64-element vector, fill in the array elements, grow it in the rare case that we exceed 64 elements, and truncate it in the common case that fewer than 64 slots are used. No need to copy anything from the stack to the heap, the elements would just appear where they are needed. However, some memory would be wasted, and on low-memory systems, we might want to define `vtruncate' and `hash-table-truncate' to reduce memory usage. Here's some code which might illustrate this idea, but WON'T WORK yet, since the old GC requires vtruncate to do more work. >From 9c84c08fd9d4bb9c419025787663b7f7f2611952 Mon Sep 17 00:00:00 2001 From: Pip Cet Date: Sun, 1 Dec 2024 21:00:46 +0000 Subject: [PATCH] Optimize json.c performance * src/alloc.c (Fvtruncate): New function. (syms_of_alloc): Register it. * src/fns.c (Fhash_table_truncate): New function. (syms_of_fns): Register it. * src/json.c (JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE): Rename to ... (JSON_PARSER_INITIAL_VECTOR_SIZE): ... this. (struct json_parser): Remove object workspace. (json_parser_init): Adjust to removed object workspace. (json_make_object_workspace_for_slow_path, json_make_object_workspace_for): Remove. (json_parse_array): Parse directly to a heap vector, truncate it when done. (json_parse_object): Parse directly to a hash table, truncate it when done. --- src/alloc.c | 25 +++++++++ src/fns.c | 13 +++++ src/json.c | 143 +++++++++++++--------------------------------------- 3 files changed, 74 insertions(+), 107 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 55126b1d551..b78c27ccf35 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -3935,6 +3935,30 @@ DEFUN ("vector", Fvector, Svector, 0, MANY, 0, return val; } =20 +DEFUN ("vtruncate", Fvtruncate, Svtruncate, 2, 2, 0, + doc: /* Truncate VECTOR so only the first LENGTH elements remain. + +Return a vector to be used instead of VECTOR, which may also be modified +destructively. This function is a performance optimization. */) + (Lisp_Object length, Lisp_Object vector) +{ + CHECK_TYPE (FIXNATP (length), Qwholenump, length); + CHECK_TYPE (VECTORP (vector) && +=09 PSEUDOVECTOR_TYPE (XVECTOR (vector)) =3D=3D PVEC_NORMAL_VECTOR, +=09 Qvectorp, vector); + if (XFIXNAT (length) > ASIZE (vector)) + signal_error ("vector too short", vector); + if (XFIXNAT (length) =3D=3D 0) + return zero_vector; + +#ifdef HAVE_MPS + XVECTOR (vector)->header.size =3D XFIXNAT (length); + return vector; +#else + return Fvector (XFIXNAT (length), XVECTOR (vector)->contents); +#endif +} + DEFUN ("make-byte-code", Fmake_byte_code, Smake_byte_code, 4, MANY, 0, doc: /* Create a byte-code object with specified arguments as eleme= nts. The arguments should be the ARGLIST, bytecode-string BYTE-CODE, constant @@ -8600,6 +8624,7 @@ syms_of_alloc (void) defsubr (&Sgarbage_collect_maybe); defsubr (&Smemory_info); defsubr (&Smemory_use_counts); + defsubr (&Svtruncate); #if defined GNU_LINUX && defined __GLIBC__ && \ (__GLIBC__ > 2 || __GLIBC_MINOR__ >=3D 10) =20 diff --git a/src/fns.c b/src/fns.c index f8191d72443..51bc7488457 100644 --- a/src/fns.c +++ b/src/fns.c @@ -6651,6 +6651,18 @@ DEFUN ("define-hash-table-test", Fdefine_hash_table_= test, return Fput (name, Qhash_table_test, list2 (test, hash)); } =20 +DEFUN ("hash-table-truncate", Fhash_table_truncate, + Shash_table_truncate, 1, 1, 0, + doc: /* Indicate that TABLE is unlikely to grow further. + +Returns TABLE, with its internal structures potentially reduced to fit +the current number of elements precisely. This function is a +performance optimization. */) + (Lisp_Object table) +{ + return table; +} + DEFUN ("internal--hash-table-histogram", Finternal__hash_table_histogram, Sinternal__hash_table_histogram, @@ -7408,6 +7420,7 @@ syms_of_fns (void) defsubr (&Shash_table_rehash_threshold); defsubr (&Shash_table_size); defsubr (&Shash_table_test); + defsubr (&Shash_table_truncate); defsubr (&Shash_table_weakness); defsubr (&Shash_table_p); defsubr (&Sclrhash); diff --git a/src/json.c b/src/json.c index 0e17b893087..dd8e45003c4 100644 --- a/src/json.c +++ b/src/json.c @@ -535,7 +535,7 @@ DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, MA= NY, } =20 =20 -#define JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE 64 +#define JSON_PARSER_INITIAL_VECTOR_SIZE 64 #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512 =20 struct json_parser @@ -565,22 +565,8 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512 =20 size_t additional_bytes_count; =20 - /* Lisp_Objects are collected in this area during object/array - parsing. To avoid allocations, initially - internal_object_workspace is used. If it runs out of space then - we switch to allocated space. Important note: with this design, - GC must not run during JSON parsing, otherwise Lisp_Objects in - the workspace may get incorrectly collected. */ - Lisp_Object internal_object_workspace - [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE]; - Lisp_Object *object_workspace; - Lisp_Object object_workspace_vector; - size_t object_workspace_size; - size_t object_workspace_current; - - /* String and number parsing uses this workspace. The idea behind - internal_byte_workspace is the same as the idea behind - internal_object_workspace */ + /* String and number parsing uses this workspace. It starts out on + the stack, but moves to the heap if more space is needed. */ unsigned char internal_byte_workspace[JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE]; unsigned char *byte_workspace; @@ -638,12 +624,6 @@ json_parser_init (struct json_parser *parser, =20 parser->additional_bytes_count =3D 0; =20 - parser->object_workspace =3D parser->internal_object_workspace; - parser->object_workspace_size - =3D JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE; - parser->object_workspace_current =3D 0; - parser->object_workspace_vector =3D Qnil; - parser->byte_workspace =3D parser->internal_byte_workspace; parser->byte_workspace_end =3D (parser->byte_workspace =09=09=09=09+ JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE); @@ -657,50 +637,6 @@ json_parser_done (void *parser) xfree (p->byte_workspace); } =20 -/* Makes sure that the object_workspace has 'size' available space for - Lisp_Objects */ -NO_INLINE static void -json_make_object_workspace_for_slow_path (struct json_parser *parser, -=09=09=09=09=09 size_t size) -{ - if (NILP (parser->object_workspace_vector)) - { - parser->object_workspace_vector =3D -=09Fvector(parser->object_workspace_current, parser->object_workspace); - } - size_t needed_workspace_size - =3D (parser->object_workspace_current + size); - size_t new_workspace_size =3D parser->object_workspace_size; - while (new_workspace_size < needed_workspace_size) - { - if (ckd_mul (&new_workspace_size, new_workspace_size, 2)) -=09{ -=09 json_signal_error (parser, Qjson_out_of_memory); -=09} - } - - Lisp_Object new_workspace_vector =3D - larger_vector (parser->object_workspace_vector, -=09=09 new_workspace_size - parser->object_workspace_size, -1); - - Lisp_Object *new_workspace_ptr =3D XVECTOR (new_workspace_vector)->conte= nts; - - parser->object_workspace_vector =3D new_workspace_vector; - parser->object_workspace =3D new_workspace_ptr; - parser->object_workspace_size =3D new_workspace_size; -} - -INLINE void -json_make_object_workspace_for (struct json_parser *parser, -=09=09=09=09size_t size) -{ - if (parser->object_workspace_size - parser->object_workspace_current - < size) - { - json_make_object_workspace_for_slow_path (parser, size); - } -} - static void json_byte_workspace_reset (struct json_parser *parser) { @@ -1258,10 +1194,18 @@ json_parse_number (struct json_parser *parser, int = c) json_parse_array (struct json_parser *parser) { int c =3D json_skip_whitespace (parser); - - const size_t first =3D parser->object_workspace_current; Lisp_Object result =3D Qnil; + ptrdiff_t index =3D 0; =20 + switch (parser->conf.array_type) + { + case json_array_array: + result =3D make_vector (0, Qnil); + break; + + default: + break; + } if (c !=3D ']') { parser->available_depth--; @@ -1277,10 +1221,15 @@ json_parse_array (struct json_parser *parser) =09 switch (parser->conf.array_type) =09 { =09 case json_array_array: -=09 json_make_object_workspace_for (parser, 1); -=09 parser->object_workspace[parser->object_workspace_current] -=09=09=3D element; -=09 parser->object_workspace_current++; +=09 if (index =3D=3D ASIZE (result)) +=09=09{ +=09=09 ptrdiff_t incr =3D ASIZE (result); +=09=09 if (incr =3D=3D 0) +=09=09 incr =3D JSON_PARSER_INITIAL_VECTOR_SIZE; +=09=09 result =3D larger_vector (result, incr, -1); +=09=09} +=09 ASET (result, index, element); +=09 index++; =09 break; =09 case json_array_list: =09 { @@ -1310,18 +1259,8 @@ json_parse_array (struct json_parser *parser) switch (parser->conf.array_type) { case json_array_array: - { -=09size_t number_of_elements -=09 =3D parser->object_workspace_current - first; -=09result =3D make_vector (number_of_elements, Qnil); -=09for (size_t i =3D 0; i < number_of_elements; i++) -=09 { -=09 rarely_quit (~i); -=09 ASET (result, i, parser->object_workspace[first + i]); -=09 } -=09parser->object_workspace_current =3D first; -=09break; - } + result =3D Fvtruncate (result, make_fixnum (index)); + break; case json_array_list: break; default: @@ -1349,10 +1288,18 @@ json_parse_object_member_value (struct json_parser = *parser) json_parse_object (struct json_parser *parser) { int c =3D json_skip_whitespace (parser); - - const size_t first =3D parser->object_workspace_current; Lisp_Object result =3D Qnil; =20 + switch (parser->conf.object_type) + { + case json_object_hashtable: + result =3D CALLN (Fmake_hash_table, QCtest, Qequal); + break; + + default: + break; + } + if (c !=3D '}') { parser->available_depth--; @@ -1374,11 +1321,7 @@ json_parse_object (struct json_parser *parser) =09 { =09=09Lisp_Object key =3D json_parse_string (parser, false, false); =09=09Lisp_Object value =3D json_parse_object_member_value (parser); -=09=09json_make_object_workspace_for (parser, 2); -=09=09parser->object_workspace[parser->object_workspace_current] =3D key; -=09=09parser->object_workspace_current++; -=09=09parser->object_workspace[parser->object_workspace_current] =3D value= ; -=09=09parser->object_workspace_current++; +=09=09Fputhash (key, value, result); =09=09break; =09 } =09 case json_object_alist: @@ -1426,21 +1369,7 @@ json_parse_object (struct json_parser *parser) { case json_object_hashtable: { -=09EMACS_INT value =3D (parser->object_workspace_current - first) / 2; -=09result =3D make_hash_table (&hashtest_equal, value, Weak_None, false); -=09struct Lisp_Hash_Table *h =3D XHASH_TABLE (result); -=09for (size_t i =3D first; i < parser->object_workspace_current; i +=3D 2= ) -=09 { -=09 hash_hash_t hash; -=09 Lisp_Object key =3D parser->object_workspace[i]; -=09 Lisp_Object value =3D parser->object_workspace[i + 1]; -=09 ptrdiff_t i =3D hash_lookup_get_hash (h, key, &hash); -=09 if (i < 0) -=09 hash_put (h, key, value, hash); -=09 else -=09 set_hash_value_slot (h, i, value); -=09 } -=09parser->object_workspace_current =3D first; +=09result =3D Fhash_table_truncate (result); =09break; } case json_object_alist: --=20 2.47.0