From: Mattias Engdegård
To: Eli Zaretskii
Cc: Yuan Fu, 70007@debbugs.gnu.org
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 19:57:24 +0100
Message-ID: <291DD5F1-85B8-4647-A40A-EBBD4C51E253@gmail.com>
In-Reply-To: <864jcrindg.fsf@gnu.org>

Eli, thank you for your comments!

On 27 March 2024 at 18:40, Eli Zaretskii wrote:

>> +/* JSON encoding context */
>
> This is not our comment style.

I'll go through the code and clean up all comments.

>> +typedef struct {
>> +  char *buf;
>> +  ptrdiff_t size;         /* number of bytes in buf */
>> +  ptrdiff_t capacity;     /* allocated size of buf */
>> +  ptrdiff_t chars_delta;  /* size - {number of Unicode chars in buf} */
>
> When you say "Unicode chars", what do you mean? characters or bytes?
> If characters, then why do you need to qualify them with "Unicode"?

Characters. Will clarify.

>> +  Lisp_Object entries[];
>                        ^^
> Is this portable enough?

Something I'd like to know, too. We rely on C99 in many other aspects.
Are there still compilers that are important to us but don't get this
right?

10 years ago this was apparently an issue for IBM XL C 12.1, but modern
versions are based on Clang. We could take our chances here; obviously
we'll change it if someone complains but it seems unlikely. What do you
think?

> I'd add an assertion here that BITS is not large enough to produce zero.

I'll deal with that one way or another.

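For reference, the construct under discussion in the `entries[]` question above is the C99 flexible array member. Below is a minimal sketch of the idiom, not code from the patch: the names `entry_table`, `entry_table_alloc` and `entry_table_set` are hypothetical, and `intptr_t` merely stands in for `Lisp_Object`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: a fixed header followed by a C99 flexible
   array member.  intptr_t is a stand-in for Lisp_Object.  */
typedef struct
{
  ptrdiff_t count;      /* number of slots in entries */
  intptr_t entries[];   /* flexible array member; must be the last member */
} entry_table;

/* Allocate a table with room for COUNT zeroed entries.
   Return NULL if COUNT is negative, if the total size would
   overflow size_t, or if the allocation fails.  */
static entry_table *
entry_table_alloc (ptrdiff_t count)
{
  if (count < 0
      || (size_t) count > (SIZE_MAX - sizeof (entry_table)) / sizeof (intptr_t))
    return NULL;
  /* sizeof *t covers only the header; the entries are added explicitly.  */
  entry_table *t = calloc (1, sizeof *t + (size_t) count * sizeof t->entries[0]);
  if (t)
    t->count = count;
  return t;
}

/* Store VALUE in slot I, which must be within bounds.  */
static void
entry_table_set (entry_table *t, ptrdiff_t i, intptr_t value)
{
  assert (0 <= i && i < t->count);
  t->entries[i] = value;
}
```

A caller would obtain a table with `entry_table_alloc (n)`, fill it with `entry_table_set`, and release it with `free`. A compiler without C99 support would reject the empty brackets, which is the portability concern raised above.
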
> This rejects unibyte non-ASCII strings, AFAU, in which case I suggest
> to think whether we really want that.  E.g., why is it wrong to encode
> a string to UTF-8, and then send it to JSON?

The way I see it, that would break the JSON abstraction: it transports
strings of Unicode characters, not strings of bytes. A user who for some
reason has a string of bytes that encode Unicode characters can just
decode it in order to prove it to us. It's not the JSON encoder's job to
decode the user's strings.

(It would also be a pain to deal with and risks slowing down the string
serialiser even if it's a case that never happens.)

> Is JSON unable to handle Inf and NaN?

That's right.

>> +  /* FIXME: should this be a unibyte or multibyte string?
>> +     Right now we make a multibyte string for test compatibility,
>> +     but we are really encoding so unibyte would make more sense.  */
>
> I indeed think this should be a unibyte string, because otherwise
> writing it to a file or a process will/might encode it, which would be
> wrong.

I would prefer that, too, but used multibyte for compatibility with the
old code and so that its tests pass.

It should probably be a separate change if we decide that unibyte is
better here.

>> +    json_out_t jo = {
>> +      .maxdepth = 25,
>
> Is this arbitrary, or is it what JSON expects?  If arbitrary, should
> it be customizable? should it be documented?

It's semi-arbitrary but reasonable: the JSON_checker at json.org uses a
maximum depth of 20 by default, and many implementations use its test
suite. RFC 8259 states that the maximum depth is
implementation-dependent.

It's hardly worth making this into a parameter for the user to adjust
but I'll clarify the code.

>> +  /* FIXME: Do we really need to do all this work below to insert a string?
>> +     Is there no function already written?  I must be missing something.  */
>
> There is no function.  All the insert_from_* functions in insdel.c do
> something similar.

Thank you for confirming that. Looks like we could use some abstraction
then.
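
To illustrate the nesting limit discussed above (`.maxdepth = 25`), here is a rough sketch of how an encoder can refuse input that nests too deeply. It is not the patch's code: the `value` type and `depth_ok` are hypothetical, only arrays are modelled, and the exact off-by-one convention used in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_DEPTH = 25 };  /* same number as .maxdepth in the quoted hunk */

/* Hypothetical toy value: either a leaf or an array of child values.  */
typedef struct value
{
  bool is_array;
  const struct value *children;  /* used only when is_array is true */
  size_t nchildren;
} value;

/* Return true if V contains at most MAXDEPTH levels of nested arrays.
   DEPTH is the number of enclosing arrays already entered.  */
static bool
depth_ok (const value *v, int depth, int maxdepth)
{
  if (!v->is_array)
    return true;        /* leaves add no nesting */
  if (depth >= maxdepth)
    return false;       /* entering this array would exceed the limit */
  for (size_t i = 0; i < v->nchildren; i++)
    if (!depth_ok (&v->children[i], depth + 1, maxdepth))
      return false;
  return true;
}
```

A real encoder would more likely keep a depth counter in its output context and signal an error while serialising, rather than making a separate pass. The separate check here only keeps the sketch short.
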