From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Improve error reporting when serializing non-Unicode strings to JSON
Date: Sat, 23 Dec 2017 16:20:04 +0000
References: <20171222210031.30811-1-phst@google.com> <83efnllufm.fsf@gnu.org> <83wp1dk18g.fsf@gnu.org> <83shc1jy3j.fsf@gnu.org> <83o9mpjw5n.fsf@gnu.org>
In-Reply-To: <83o9mpjw5n.fsf@gnu.org>
To: Eli Zaretskii
Cc: phst@google.com, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:221388

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 23 Dec 2017 at 16:34:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 23 Dec 2017 15:19:17 +0000
> > Cc: phst@google.com, emacs-devel@gnu.org
> >
> >  So let's use check_utf_8, as its downsides don't sound serious to me,
> >
> > Well it needs to be rewritten significantly to take a char*, length
> > argument instead of the coding_system struct.
>
> But creating a coding_system struct requires a single call to
> setup_coding_system, which doesn't sound too bad.

Unfortunately, it's not that easy. coding_system has many fields that
setup_coding_system doesn't initialize.
I've tried using check_utf_8, but it doesn't seem to work:

static void
json_check_utf8 (Lisp_Object string)
{
  eassert (!STRING_MULTIBYTE (string));
  struct coding_system coding;
  setup_coding_system (Qutf_8_unix, &coding);
  /* We initialize only the fields that check_utf_8 accesses.  */
  coding.src_pos = 0;
  coding.src_pos_byte = 0;
  coding.src_chars = SCHARS (string);
  coding.src_bytes = SBYTES (string);
  coding.src_object = string;
  coding.eol_seen = EOL_SEEN_NONE;
  CHECK_TYPE (check_utf_8 (&coding) >= 0, Qutf_8_string_p, string);
}

This apparently fails to detect the unibyte string "u\xFFv" as invalid
UTF-8.
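
For comparison, here is a minimal sketch of what a check taking a char*, length pair (the shape mentioned in the quoted text) could look like. It is not Emacs code and does not reproduce what check_utf_8 does internally: utf8_valid_p is a hypothetical name, and the rules it applies are those of strict UTF-8 as defined in RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). Whether those are exactly the rules the JSON serializer should enforce is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs.  Return true if the LEN
   bytes starting at S are well-formed UTF-8 in the strict RFC 3629
   sense: no overlong encodings, no surrogates, no code points above
   U+10FFFF.  */
static bool
utf8_valid_p (const char *s, size_t len)
{
  assert (s != NULL || len == 0);
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = p[i];
      size_t n;                              /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;    /* range of the first one */

      if (c < 0x80)
        n = 0;                               /* ASCII */
      else if (c >= 0xC2 && c <= 0xDF)
        n = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          n = 2;
          if (c == 0xE0)
            lo = 0xA0;                       /* reject overlong forms */
          else if (c == 0xED)
            hi = 0x9F;                       /* reject surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          n = 3;
          if (c == 0xF0)
            lo = 0x90;                       /* reject overlong forms */
          else if (c == 0xF4)
            hi = 0x8F;                       /* reject > U+10FFFF */
        }
      else
        return false;   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

      if (n > len - i - 1)
        return false;                        /* truncated sequence */

      for (size_t k = 1; k <= n; k++)
        {
          unsigned char b = p[i + k];
          if (b < lo || b > hi)
            return false;
          lo = 0x80;                         /* later bytes use the plain range */
          hi = 0xBF;
        }
      i += n + 1;
    }
  return true;
}
```

With this sketch, utf8_valid_p ("u\xFFv", 3) returns false, because 0xFF can never begin a UTF-8 sequence. Walking the raw bytes keeps the check independent of struct coding_system, at the cost of duplicating logic that the coding machinery already has.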