From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 13:46:17 +0100
To: Eli Zaretskii
Cc: 70007@debbugs.gnu.org
In-Reply-To: <86wmpphrg7.fsf@gnu.org>

On 26 March 2024 at 17:46, Eli Zaretskii wrote:

>> - The old code incorrectly accepted strings with non-Unicode characters (raw bytes). There is no reason to do this; JSON is UTF-8 only.
>
> Would it complicate the code not to reject raw bytes? I'd like to
> avoid incompatibilities if it's practical. Also, Emacs traditionally
> doesn't reject raw bytes, leaving that to the application or the user.

Actually I may have misrepresented the behaviour of the old encoder. It doesn't accept arbitrary raw bytes, only sequences that happen to form valid UTF-8. It's quite strange, and I don't think this was ever intended; it is just a consequence of the implementation.

This means that it accepts an already encoded unibyte UTF-8 string:

  (json-serialize "\303\251") -> "\"é\""

which is doubly odd since it's supposed to be encoding, but it ends up decoding the characters instead. Even worse, it accepts mixtures of encoded and decoded chars:

  (json-serialize "é\303\251") -> "\"éé\""

which is just bonkers.

So while we could try to replicate this 'interesting' behaviour, it would definitely complicate the code and be of questionable use.

The JSON spec is quite clear that it's UTF-8 only. The only useful deviation that I can think of would be to allow unpaired surrogates (WTF-8) to pass through for transmission of Windows file names, but that would be an extension -- the old encoder doesn't permit those.
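
For illustration only, here is a minimal Emacs Lisp sketch of the strict reading described above; it is not code from the patch. The helper name `my-json-strict-string-p` is hypothetical, and rejecting unpaired surrogates is an assumption based on the remark that the old encoder doesn't permit them. The sketch relies on raw bytes in a multibyte string being represented as characters above #x10FFFF, and on every element above 127 in a unibyte string being a raw byte.

```elisp
;; Hypothetical helper for illustration; not part of the patch.
(defun my-json-strict-string-p (string)
  "Return non-nil if every character of STRING is a Unicode scalar value.
A unibyte STRING qualifies only when it is pure ASCII."
  (let ((ok t)
        (i 0)
        (n (length string))
        (multibyte (multibyte-string-p string)))
    (while (and ok (< i n))
      (let ((c (aref string i)))
        (setq ok (if multibyte
                     ;; Reject raw bytes (above #x10FFFF) and, by
                     ;; assumption, surrogate code points.
                     (and (<= c #x10FFFF)
                          (not (<= #xD800 c #xDFFF)))
                   ;; In a unibyte string, anything above 127 is a raw byte.
                   (< c 128))))
      (setq i (1+ i)))
    ok))

(my-json-strict-string-p "é")           ; multibyte, Unicode only
(my-json-strict-string-p "\303\251")    ; unibyte, already encoded UTF-8
(my-json-strict-string-p "é\303\251")   ; decoded char mixed with raw bytes
(my-json-strict-string-p (string #xD800)) ; unpaired surrogate
```

Under these assumptions only the first string should pass. A WTF-8 extension of the kind mentioned above would amount to dropping the surrogate test while still rejecting raw bytes.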