From: "Herman, Géza"
Newsgroups: gmane.emacs.devel
Subject: Re: I created a faster JSON parser
Date: Fri, 08 Mar 2024 13:34:04 +0100
Message-ID: <87cys499v3.fsf@gmail.com>
References: <87a5n96mb5.fsf@gmail.com> <8734t154sr.fsf@posteo.net>
In-reply-to: <8734t154sr.fsf@posteo.net>
To: Philip Kaludercic
Cc: "Herman, Géza", "emacs-devel@gnu.org"

Philip Kaludercic writes:

> "Herman, Géza" writes:
>
>> It replaces json-parse-string and json-parse-buffer functions. The
>> behavior should be the same as before, with the only exception that
>> objects with duplicated keys are not detected if :object-type is not
>> 'hash-table.
>
> Is that a problem?

Not sure. I just mentioned it because it's a behavior change. But I
intentionally designed it this way, because it is faster. To me, it
makes some sense that if the user specifies 'alist or 'plist, then
they want to have all the object members, even if the keys are
duplicated. I didn't find a clear direction from JSON descriptions of
how duplicated keys should be handled.

>> This parser runs 8-9x faster than the jansson based parser on my
>> machine (tested on clangd language server messages).
>> An additional tiny benefit is that large integers are parsed,
>> instead of having an "out of range" error.
>
> That sounds interesting, but I am reminded of this article:
> https://seriot.ch/projects/parsing_json.html. There seem to be plenty
> of difficult edge-cases when dealing with JSON input, that should
> probably be tested if Emacs has its own custom parser built-in.

I've now run my parser on the tests in this repo; it passes all of
them.
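
As a rough illustration of the duplicate-key trade-off discussed above,
here is a small sketch in Python. It is not the parser's code and not
Emacs Lisp; the helper names (as_alist, as_table, has_duplicate_keys)
are made up for this example. It only shows why an alist-style result
can keep every member without any lookup, while a table-style result
has to check each key in order to notice a duplicate.

```python
import json


def as_alist(pairs):
    # Alist-like result: keep every (key, value) member in input order.
    # No membership check is needed, so duplicates pass through as-is.
    return list(pairs)


def as_table(pairs):
    # Table-like result: each insertion looks the key up first, which is
    # what makes duplicate detection possible (and costs extra work).
    table = {}
    for key, value in pairs:
        if key in table:
            raise ValueError(f"duplicate key: {key!r}")
        table[key] = value
    return table


def has_duplicate_keys(text):
    # Hypothetical helper: True if any object in TEXT repeats a key.
    try:
        json.loads(text, object_pairs_hook=as_table)
    except ValueError:
        return True
    return False


# Both members survive in the alist-style result.
members = json.loads('{"a": 1, "a": 2}', object_pairs_hook=as_alist)
```

With the alist-style hook, members holds both "a" entries; the
table-style hook is the only one of the two that can report the
repeated key.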