From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#70007: [PATCH] native JSON encoder
Date: Thu, 28 Mar 2024 21:59:38 +0100
Message-ID: <2CF47DA5-A65B-47C4-A28A-6FEE1469BD13@gmail.com>
To: Eli Zaretskii
Cc: casouri@gmail.com, 70007@debbugs.gnu.org
In-Reply-To: <8634sbijfx.fsf@gnu.org>

On 27 March 2024 at 20:05, Eli Zaretskii wrote:

>>> This rejects unibyte non-ASCII strings, AFAIU, in which case I suggest
>>> to think whether we really want that. E.g., why is it wrong to encode
>>> a string to UTF-8, and then send it to JSON?
>>
>> The way I see it, that would break the JSON abstraction: it transports strings of Unicode characters, not strings of bytes.
>
> What's the difference? AFAIU, JSON expects UTF-8 encoded strings, and
> whether that is used as a sequence of bytes or a sequence of
> characters is in the eyes of the beholder: the bytestream is the same,
> only the interpretation changes.

Well no -- JSON transports Unicode strings: the JSON serialiser takes a Unicode string as input and outputs a byte sequence; the JSON parser takes a byte sequence and returns a Unicode string (assuming we are just interested in strings).

That the transport format uses UTF-8 is unrelated; if the user hands an encoded byte sequence to us then it seems more likely that it's a mistake. After all, it cannot have come from a received JSON message.

I think it was just another artefact of the old implementation.
That code incorrectly used encode_string_utf_8 even on non-ASCII unibyte strings and trusted Jansson to validate the result. That resulted in a lot of wasted work and some strange strings getting accepted.

While it's theoretically possible that there are users with code relying on this behaviour, I can't find any evidence for it in the packages that I've looked at.

> I didn't suggest to decode the input string, not at all. I suggested
> to allow unibyte strings, and process them just like you process
> pure-ASCII strings, leaving it to the caller to make sure the string
> has only valid UTF-8 sequences.

Users of this raw-bytes-input feature (if they exist at all) previously had their input validated by Jansson. While mistakes would probably be detected at the other end, I'm not sure it's a good idea.

> Forcing callers to decode such
> strings is IMO too harsh and largely unjustified.

We usually force them to do so in most other contexts. To take a random example, `princ` doesn't work with encoded strings. But it's rarely a problem.

Let's see how testing goes. We'll find a solution no matter what, pass-through or separate slow-path validation, if it turns out that we really need to after all.
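
To make the string-in, bytes-out view above concrete, here is a minimal sketch. It is written in Python rather than Emacs Lisp or C, purely for illustration, and it says nothing about how the patch itself is implemented. The helper names `serialise`, `parse` and `validate_utf8` are hypothetical; only the standard `json` module and the `str`/`bytes` methods are real.

```python
import json


def serialise(value) -> bytes:
    """Serialiser: Unicode strings in, UTF-8 byte sequence out."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def parse(data: bytes):
    """Parser: UTF-8 byte sequence in, Unicode strings out."""
    return json.loads(data.decode("utf-8"))


def validate_utf8(raw: bytes) -> str:
    """Hypothetical slow path: accept raw bytes only if they are valid UTF-8."""
    return raw.decode("utf-8", errors="strict")


text = "\u00e5"                 # one Unicode character (a with ring above)
wire = serialise(text)          # the bytes that go over the wire
roundtrip = parse(wire)         # back to the same Unicode string

# Handing already-encoded bytes to the serialiser is rejected outright.
pre_encoded = text.encode("utf-8")
try:
    serialise(pre_encoded)
    rejected = False
except TypeError:
    rejected = True

# If the bytes are instead treated as characters, the text is silently
# mangled: one character goes in, two different characters come out.
as_chars = pre_encoded.decode("latin-1")
mangled = parse(serialise(as_chars))

# A separate validation step catches malformed input such as a
# truncated two-byte sequence.
try:
    validate_utf8(b"\xc3")
    invalid_caught = False
except UnicodeDecodeError:
    invalid_caught = True
```

In this sketch the round trip preserves the original character, the pre-encoded input is refused, and the bytes-as-characters route yields a two-character string that no longer matches what the caller meant to send.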