From mboxrd@z Thu Jan 1 00:00:00 1970
From: Herman, Géza
Newsgroups: gmane.emacs.devel
Subject: Re: I created a faster JSON parser
Date: Mon, 11 Mar 2024 15:35:45 +0100
Message-ID: <87r0ggdcki.fsf@gmail.com>
References: <87a5n96mb5.fsf@gmail.com>
 <20240309203725.x456m7c6soxtgj6q@nullprogram.com>
 <86jzmawqbm.fsf@gnu.org> <87ttldydf2.fsf@posteo.net>
 <867ci8vqvl.fsf@gnu.org>
 <5396AC95-1D8F-4A89-B4A8-647B717A1E3C@gmail.com>
Cc: Eli Zaretskii, Philip Kaludercic, wellons@nullprogram.com,
 geza.herman@gmail.com, emacs-devel@gnu.org
To: Mattias Engdegård
In-reply-to: <5396AC95-1D8F-4A89-B4A8-647B717A1E3C@gmail.com>

Mattias Engdegård writes:

> On 11 March 2024 at 14:29, Eli Zaretskii wrote:
>
>> What you describe are possible fallbacks, but I would prefer not to
>> use any fallback at all, but instead have a full C implementation.
>
> Yes, I definitely think we should do that. I'm pretty sure that
> writing a JSON unparser is a lot easier than doing the parser, and the
> extra speed we stand to gain from not having the intermediate jansson
> step is not without interest.

FYI: I checked out a JSON benchmark, and it turned out that jansson is
not a fast parser; there are faster libraries. A library that has a SAX
interface could potentially be useful for Emacs.
According to https://github.com/miloyip/nativejson-benchmark, RapidJSON
is at least 10x faster than jansson. I'm only mentioning this because
Emacs doesn't have to stick with my parser: there are possible
alternatives, which have JSON serializers as well.

(But note: I am happy to get my parser into a mergeable state and, if it
eventually gets merged, to fix its bugs, but I'm not motivated to work
on integrating other JSON libraries.)

> Overall the proposed parser looks fine, nothing terribly wrong that
> can't be fixed later on. A few minor points:
>
> * The `is_single_uninteresting` array is hard to review and badly
> formatted. It appears to be 1 for all printable ASCII plus DEL except
> double-quote and backslash. (Why DEL?)

Yep, the formatting of that table got destroyed when I reformatted the
code into GNU style. I have now restored the table's formatting and
added comments for each row/col. Here's the latest version:
https://github.com/geza-herman/emacs/commit/4b5895636c1ec06e630baf47881b246c198af056.patch

I'm not sure about DEL: I haven't seen anything which says that it's an
invalid character in a string, so the parser currently allows it.

> * Do you really need to maintain line and column during the parse? If
> you want them for error reporting, you can materialise them from the
> offset that you already have.

Yeah, I thought of that, but it turned out that maintaining the
line/column doesn't have an impact on performance. It was easy to add,
though admittedly it's a little bit awkward to maintain these variables.
If Emacs has a way to tell the line/col position from the byte pointer
(both for strings and buffers), I am happy to use that instead. It would
be a better solution, because currently the parser always starts from
line 1, col 1, which means that if json-parse-buffer is used, these
numbers will be local to the current parse, not actual positions in the
whole buffer.
But since the jansson-based parser behaves the same way, I thought it
was OK.

> * Are you sure that GC can't run during parsing or that all your Lisp
> objects are reachable directly from the stack? (It's the
> `object_workspace` in particular that's worrying me a bit.)

That's a very good question. I suppose that object_workspace is
invisible to the Lisp VM, as it is just a malloc'd object. But I've
never seen a problem because of this. What triggers the GC? Is it
possible that GC never gets triggered for the whole duration of the
parse? Otherwise it should have collected the objects in
object_workspace, causing problems. (I tried this parser in a loop in
which GC was triggered hundreds of times. In the loop, I compared the
result to json-read, and everything was fine.)
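
On the line/column point above, here is a minimal sketch of what
materialising the position from the byte offset could look like, done
only when an error is reported. The names `json_position` and
`json_locate` are hypothetical and are not part of the patch or of
Emacs. The sketch assumes the input is one contiguous range of UTF-8
bytes (an Emacs buffer has a gap, so real code would have to account
for that), that lines and columns are 1-based, and that columns are
counted in characters rather than bytes.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of the patch or of Emacs.  */
struct json_position
{
  size_t line;    /* 1-based line number */
  size_t column;  /* 1-based column, counted in UTF-8 characters */
};

/* Compute the line/column of byte OFFSET in INPUT by scanning from the
   start.  Assumes INPUT is contiguous UTF-8 and OFFSET does not exceed
   its length.  The scan is O(OFFSET), but it only runs on the error
   path, so the parsing loop itself carries no line/column state.  */
static struct json_position
json_locate (const unsigned char *input, size_t offset)
{
  struct json_position pos = { 1, 1 };
  for (size_t i = 0; i < offset; i++)
    {
      if (input[i] == '\n')
        {
          pos.line++;
          pos.column = 1;
        }
      else if ((input[i] & 0xC0) != 0x80)
        /* Skip UTF-8 continuation bytes so that each character
           advances the column exactly once.  */
        pos.column++;
    }
  return pos;
}
```

Where the scan starts decides what the numbers are relative to: starting
at the beginning of the parsed text gives the same parse-local positions
as now, while starting at the beginning of the buffer would give
buffer-wide positions.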