From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Philipp Stephani Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] Improve error reporting when serializing non-Unicode strings to JSON Date: Sat, 23 Dec 2017 12:33:46 +0000 Message-ID: References: <20171222210031.30811-1-phst@google.com> <83efnllufm.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="94eb2c062a5893de020561012541" X-Trace: blaine.gmane.org 1514032373 1221 195.159.176.226 (23 Dec 2017 12:32:53 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sat, 23 Dec 2017 12:32:53 +0000 (UTC) Cc: phst@google.com, emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Dec 23 13:32:48 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eSiys-0008E8-PZ for ged-emacs-devel@m.gmane.org; Sat, 23 Dec 2017 13:32:47 +0100 Original-Received: from localhost ([::1]:41673 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eSj0r-0007Eb-D7 for ged-emacs-devel@m.gmane.org; Sat, 23 Dec 2017 07:34:49 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:52206) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eSj05-0007EV-LN for emacs-devel@gnu.org; Sat, 23 Dec 2017 07:34:02 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eSj04-0007yj-MK for emacs-devel@gnu.org; Sat, 23 Dec 2017 07:34:01 -0500 Original-Received: from mail-qk0-x235.google.com ([2607:f8b0:400d:c09::235]:41708) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eSj01-0007xQ-Mb; Sat, 23 Dec 2017 07:33:57 -0500 Original-Received: by mail-qk0-x235.google.com with SMTP id a8so215921qkb.8; Sat, 23 Dec 2017 04:33:57 
-0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DHMk/Imyw+ws6mWqlsHHa5afKMSQcV3FWQehkvcXTr4=; b=aWtEIJ9VosJs0XCF3pmfobQszLq0Ffj+L7xrfdRXvHJwKkiIX0THffG+vPSRBHhWPP 77qMIOoTtiukJDTFQm64+HvvRS22RsnVZKPD2IIYl8QaocLm9Ah2I7TtwvaqnRWMEn1o AFLNMwzFJNzv5UugCWyy6AUM1og80+wUAYkCdKVJTKTnpAZcBc74+UfRr2oYFzjd9aF3 opKUmoIc5+ngk4vfDfn6ypdq2eFKAbcvbKBiyc0K67quBaydyO8BZrg21qdAAYI/qkAR NxKeCgkb1a2Otw8Z7MTVCi/t9Uw3NgaaxFwEI/Y3gOg/E5P3Qx0NVvjEpEs+klV4Qd4w Kt2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DHMk/Imyw+ws6mWqlsHHa5afKMSQcV3FWQehkvcXTr4=; b=CQxcQR0UJP42Eo8uZTQdB96bFrStfkCFsE47kyZILNUnuC1JZJwYu0XGCFElk3EYUp YjZWipVmuBfBWz/ec3O7BdAkcggC+Rf5c/J7uEOEXwIeqN5Y+ljoSsftAxgJWIuJNf7a 8BLfW8iXxpmE7aKCRcuHpCl/xbX34kxmt0iu/fvjZst//YfsWdlvmfOI1cuXB7meyxi2 tDNEm4kIJ0YCm29fM9dwBLtVJU9t1qbScAKplrFAXpoLLB0wtlONAaRdJHxBolEhApW5 sLPlRfDvimkPQIh0m9LDFwawyeSMfMe40jMQ+jdVRNXbCTUWjC6HXvGL82QZZjIhfu4/ bnmQ== X-Gm-Message-State: AKGB3mLIo/tUTfbjz2sPfDKcWReXoSvALKPp6DWjYZsA65KFAbM6FBHi wUH/r4SVHZZ8Fz4SHqanPvQr7kllk7sIG/Gc/RYYag== X-Google-Smtp-Source: ACJfBotU1nFjbhFxVM33toOz/hkhnXJgoHpoan4WabZMKkmY/XFMwvZLpMqBHJsCPGGVPIA/0zqCNHKE533d7HSs9Vg= X-Received: by 10.55.122.134 with SMTP id v128mr23387848qkc.354.1514032436860; Sat, 23 Dec 2017 04:33:56 -0800 (PST) In-Reply-To: <83efnllufm.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400d:c09::235 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." 
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel"
Xref: news.gmane.org gmane.emacs.devel:221370
Archived-At:

--94eb2c062a5893de020561012541
Content-Type: text/plain; charset="UTF-8"

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 23 Dec 2017 at 09:28:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Fri, 22 Dec 2017 22:00:31 +0100
> > Cc: Philipp Stephani <phst@google.com>
> >
> > * admin/merge-gnulib (GNULIB_MODULES): Add unistr modules.
> >
> > * lib/Makefile.in (.c.o): Fix output file for files in subdirectories.
> > (${DEPDIR}/unistr, unistr/u8-check.o): Create missing deps directory.
> >
> > * src/json.c (json_check_utf8): New helper function.
> > (lisp_to_json_toplevel_1, lisp_to_json): Use it.  To save a bit of
> > time, check for invalid UTF-8 strings only after encountering an
> > error, since Jansson already rejects them.
> >
> > * test/src/json-tests.el (json-serialize/invalid-unicode): Adapt
> > expected error symbol.
>
> I'm not sure I understand the need for this.  Emacs never does that
> anywhere else, it just converts the invalid bytes to a special
> character set.  An application that cares could then test for presence
> of those bytes to see if anything like that happened.  What am I
> missing?
>

This patch only improves the error symbol when trying to serialize a
non-Unicode string to JSON. Right now this raises "out of memory", which
is somewhat confusing. This patch improves this to raise
"wrong-type-argument".

--94eb2c062a5893de020561012541
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


Eli Zaretskii <eliz@gnu.org> wrote on Sat, 23 Dec 2017 at 09:28:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Fri, 22 Dec 2017 22:00:31 +0100
> > Cc: Philipp Stephani <phst@google.com>
> >
> > * admin/merge-gnulib (GNULIB_MODULES): Add unistr modules.
> >
> > * lib/Makefile.in (.c.o): Fix output file for files in subdirectories.
> > (${DEPDIR}/unistr, unistr/u8-check.o): Create missing deps directory.
> >
> > * src/json.c (json_check_utf8): New helper function.
> > (lisp_to_json_toplevel_1, lisp_to_json): Use it.  To save a bit of
> > time, check for invalid UTF-8 strings only after encountering an
> > error, since Jansson already rejects them.
> >
> > * test/src/json-tests.el (json-serialize/invalid-unicode): Adapt
> > expected error symbol.
>
> I'm not sure I understand the need for this.  Emacs never does that
> anywhere else, it just converts the invalid bytes to a special
> character set.  An application that cares could then test for presence
> of those bytes to see if anything like that happened.  What am I
> missing?

This patch only improves the error symbol when trying to serialize a
non-Unicode string to JSON. Right now this raises "out of memory", which
is somewhat confusing. This patch improves this to raise
"wrong-type-argument".
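
To illustrate the "check only after an error" idea from the ChangeLog
entry, here is a minimal standalone sketch. It is not the code from the
patch: the ChangeLog suggests the patch relies on gnulib's unistr
u8-check module, whereas this sketch swaps in a hand-written validator
so that it compiles without gnulib or Jansson. All names
(sketch_error, sketch_utf8_valid, sketch_classify_failure) are
hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error kinds standing in for the Lisp error symbols.  */
enum sketch_error
{
  SKETCH_WRONG_TYPE_ARGUMENT,   /* input was not valid UTF-8 */
  SKETCH_OUT_OF_MEMORY          /* input was fine; assume allocation failed */
};

/* Return true if the LEN bytes at S are well-formed UTF-8.  Rejects
   overlong forms, surrogates and code points above U+10FFFF.  */
static bool
sketch_utf8_valid (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;            /* number of continuation bytes expected */
      unsigned long cp;    /* code point decoded so far */
      unsigned long min;   /* smallest code point legal at this length */

      if (c < 0x80)
        {
          i++;             /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;      /* stray continuation or invalid lead byte */

      if (len - i <= n)
        return false;      /* sequence is truncated */

      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;  /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;      /* overlong, out of range, or surrogate */

      i += n + 1;
    }
  return true;
}

/* Call this only after the JSON library has already refused to create
   a string from S.  The UTF-8 scan therefore costs nothing on the
   success path; it is used purely to pick a more helpful error.  */
static enum sketch_error
sketch_classify_failure (const char *s, size_t len)
{
  return sketch_utf8_valid ((const unsigned char *) s, len)
         ? SKETCH_OUT_OF_MEMORY
         : SKETCH_WRONG_TYPE_ARGUMENT;
}
```

With this shape, a caller would only reach sketch_classify_failure when
string creation has failed, so valid strings are never scanned twice.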
--94eb2c062a5893de020561012541--