From: Mattias Engdegård
Newsgroups: gmane.emacs.devel
Subject: Re: I created a faster JSON parser
Date: Tue, 12 Mar 2024 10:26:36 +0100
Message-ID: <437D901F-CEC6-45E0-8ABE-B036A7B0AAF5@gmail.com>
In-Reply-To: <87r0ggdcki.fsf@gmail.com>
To: "Herman, Géza"
Cc: Eli Zaretskii, Philip Kaludercic, wellons@nullprogram.com, emacs-devel@gnu.org

On 11 March 2024 at 15:35, Herman, Géza wrote:

> According to https://github.com/miloyip/nativejson-benchmark,
> RapidJSON is at least 10x faster than jansson. I'm just saying this
> because Emacs doesn't have to stick with my parser, there are possible
> alternatives, which have JSON serializers as well.

Thanks for the benchmark page reference. Yes, if this turns out to
matter more we may consider a faster library. Right now I think your
efforts are good enough (at least if we finish the job with a JSON
serialiser).

> Yep, the formatting of that table got destroyed when I reformatted the
> code into GNU style.
> Now I formatted the table back, and added comments for each row/col.
> Here's the latest version:
> https://github.com/geza-herman/emacs/commit/4b5895636c1ec06e630baf47881b246c198af056.patch

Much better, thank you.

>> * Do you really need to maintain line and column during the parse? If
>>   you want them for error reporting, you can materialise them from the
>>   offset that you already have.
>
> Yeah, I thought of that, but it turned out that maintaining the
> line/column doesn't have an impact on performance.

That's just because your code isn't fast enough! We are very
disappointed. Very.

> I added that easily, though admittedly it's a little bit awkward to
> maintain these variables. If Emacs has a way to tell the line/col
> position from the byte pointer (both for strings and buffers), I am
> happy to use that instead.

Since error handling isn't performance-critical it doesn't matter if
it's a bit slow. (I'd just count newlines.)

>> * Are you sure that GC can't run during parsing or that all your Lisp
>>   objects are reachable directly from the stack? (It's the
>>   `object_workspace` in particular that's worrying me a bit.)
>
> That's a very good question. I suppose that object_workspace is
> invisible to the Lisp VM, as it is just a malloc'd object. But I've
> never seen a problem because of this. What triggers the GC? Is it
> possible that GC never gets triggered for the duration of the whole
> parse? Otherwise it should have GCd the objects in object_workspace,
> causing problems. (I tried this parser in a loop, where GC is triggered
> hundreds of times. In the loop, I compared the result to json-read;
> everything was fine.)

You can't test that code is GC-safe; you have to show that it's correct
by design.

Looking at the code it is quite possible that GC cannot take place. But
it can signal errors, and getting into the debugger should open GC
windows unless I'm mistaken.

There are some options.
`record_unwind_protect_ptr_mark` would be one, and it was made for
code like this, but Gerd has been grumbling about it lately. Perhaps
it's easier just to disable GC in the dynamic scope
(inhibit_garbage_collection).
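
For the line/column point above, here is a minimal sketch of the
"just count newlines" idea. It is not code from the patch: the names
`text_pos` and `pos_from_offset` are hypothetical, it assumes the input
is available as a plain byte array, and the column it reports is a byte
count rather than a character count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: 1-based line and column.  */
struct text_pos
{
  size_t line;
  size_t column;
};

/* Derive line and column from a byte offset by scanning the input from
   the start.  This is O(offset), which is acceptable when it only runs
   on the error path.  The caller must ensure that OFFSET does not
   exceed the length of INPUT.  The column counts bytes, so multibyte
   UTF-8 sequences advance it by more than one.  */
static struct text_pos
pos_from_offset (const char *input, size_t offset)
{
  assert (input != NULL);

  struct text_pos pos = { 1, 1 };
  for (size_t i = 0; i < offset; i++)
    {
      if (input[i] == '\n')
        {
          /* A newline starts the next line at column 1.  */
          pos.line++;
          pos.column = 1;
        }
      else
        pos.column++;
    }
  return pos;
}
```

With the input "[1,\n2,\n x]" and offset 8 (the `x`), this yields line 3,
column 2. The parser then only needs to carry the offset, and the scan
happens once, when an error is actually reported.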