From: Herman, Géza
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Implement fast verisons of json-parse functions
Date: Sat, 30 Mar 2024 19:36:57 +0100
Message-ID: <87wmpjmsie.fsf@gmail.com>
References: <87h6h2rsgn.fsf@gmail.com> <867chy3vpm.fsf@gnu.org>
 <87cyrqrqnb.fsf@gmail.com> <865xxi3tsu.fsf@gnu.org>
 <874jd2rnwj.fsf@gmail.com> <864jd14lqs.fsf@gnu.org>
 <87edc1rzig.fsf@gmail.com> <865xx4dv0g.fsf@gnu.org>
 <871q7snffr.fsf@gmail.com> <86plvbdgcx.fsf@gnu.org>
To: Eli Zaretskii
Cc: Géza Herman, Mattias Engdegård, emacs-devel@gnu.org
In-reply-to: <86plvbdgcx.fsf@gnu.org>

Eli Zaretskii writes:

>> From: Herman, Géza
>> 3 test failures:
>> 1. Handling of utf-8 decode errors: the new parser emits
>> json-utf8-decode-error instead of json-parse-error (this is what
>> the test expects). I can fix this by modifying the test
>
> OK, but we will need to mention this in NEWS as an incompatible
> change.

Yes. I'm just mentioning this as an alternative solution: originally
the parser emitted json-parse-error for this; it was changed during
the review. So if we prefer maintaining compatibility, it's easy to
revert this change.

>> 2. Handling of a single \0 byte
>
> Does JSON allow null bytes in its strings? If not, why
> wrong-type-argument is not TRT?

That's correct, null bytes are not allowed (anywhere, not just in
strings).
But my point is that the old parser made a special distinction here.
It is not just null bytes that are not allowed in JSON; \x01, for
example, isn't allowed either. Yet for null bytes the old parser
gives a different error message than for \x01 bytes, even though
from the JSON spec's perspective both \x00 and \x01 are forbidden in
the same way. I don't know why null bytes are handled specially in
this regard, so I didn't follow this behavior in my parser. Maybe
this special error case was added because libjansson couldn't parse
strings with null bytes back then (because the API only accepted
zero-terminated strings)?

To me, wrong-type-argument means that the type of the input argument
to the parser is incorrect: for example, it's an integer rather than
a string. But here the parser gets a string; it's just that the
string has null bytes in it somewhere. The type of the argument to
json-parse-* is fine; it's the value that has the problem. So the
parser should give some kind of json-error in my opinion, not
wrong-type-argument. But, of course, if we consider
strings-with-null and strings-without-null as two different types,
then the wrong-type-argument error makes sense (though I don't know
why we'd want to do this).

>> 3. Handling objects with duplicate keys.
>
> I think we should modify the expected results of the test to match
> the new behavior, and leave the order as it is now.

OK.

> But please also compare with what the Lisp implementation does in
> these cases, as that could give us further ideas or make us
> reconsider.

I checked json-read, and it seems to have exactly the same behavior
as my parser. I thought that json-read could only produce one
format, but it turns out it has the json-object-type and
json-array-type variables, so it can produce the same variety of
output that the C-based parsers can. I think that the doc of
json-read should mention this fact.
Anyway, the doc says:

(defvar json-object-type 'alist
  "Type to convert JSON objects to.
Must be one of `alist', `plist', or `hash-table'.  Consider let-binding
this around your call to `json-read' instead of `setq'ing it.  Ordering
is maintained for `alist' and `plist', but not for `hash-table'.")

I played with this a little bit, and it works as described (for hash
tables, it keeps the last key-value pair).

I think this behavior is important, because it is used when
pretty-formatting JSON. Pretty formatting shouldn't remove duplicate
entries, nor change the ordering of members. Because the new parser
also behaves like this, it can be used to speed up pretty formatting
as well (yeah, I know, only half of it, as there is no new to-JSON
serializer implemented yet).
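
The null-byte point in item 2 can be sketched in C as follows. This
is not code from the patch: string_body_is_clean and visible_length
are hypothetical names used only for illustration, and the sketch
assumes the input arrives with an explicit length. It relies only on
the JSON rule that characters U+0000 through U+001F must be escaped
inside strings, so a raw 0x00 and a raw 0x01 fail the same check. The
second function shows what a zero-terminated interface would see
instead, which may be why a separate up-front check for NUL was ever
needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: return true if the
   LEN bytes at BUF contain no raw control byte (0x00..0x1F).  JSON
   requires these to be escaped inside strings, so 0x00 and 0x01 are
   rejected by the very same comparison.  */
static bool
string_body_is_clean (const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (buf[i] < 0x20)
      return false;
  return true;
}

/* Hypothetical helper: the length a zero-terminated API would see.
   Everything after the first NUL is invisible to it, so a caller of
   such an API has to reject embedded NULs before handing the data
   over.  */
static size_t
visible_length (const char *buf)
{
  return strlen (buf);
}
```

For the three bytes 'a', 0x00, 'b', string_body_is_clean returns
false, just as it does for 'a', 0x01, 'b'. visible_length on the
first of these reports 1, because strlen stops at the NUL.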