From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Improve error reporting when serializing non-Unicode strings to JSON
Date: Sat, 23 Dec 2017 16:58:30 +0000
To: Eli Zaretskii
Cc: phst@google.com, emacs-devel@gnu.org
In-Reply-To: <83lghtjt99.fsf@gnu.org>
References: <20171222210031.30811-1-phst@google.com> <83efnllufm.fsf@gnu.org> <83wp1dk18g.fsf@gnu.org> <83shc1jy3j.fsf@gnu.org> <83o9mpjw5n.fsf@gnu.org> <83lghtjt99.fsf@gnu.org>


Eli Zaretskii <eliz@gnu.org> wrote on Sat, 23 Dec 2017 at 17:37:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 23 Dec 2017 16:20:04 +0000
> > Cc: phst@google.com, emacs-devel@gnu.org
> >
> > static void
> > json_check_utf8 (Lisp_Object string)
> > {
> >   eassert (!STRING_MULTIBYTE (string));
> >   struct coding_system coding;
> >   setup_coding_system (Qutf_8_unix, &coding);
> >   /* We initialize only the fields that check_utf_8 accesses.  */
> >   coding.src_pos = 0;
> >   coding.src_pos_byte = 0;
> >   coding.src_chars = SCHARS (string);
> >   coding.src_bytes = SBYTES (string);
> >   coding.src_object = string;
> >   coding.eol_seen = EOL_SEEN_NONE;
> >   CHECK_TYPE (check_utf_8 (&coding) >= 0, Qutf_8_string_p, string);
> > }
> >
> > This apparently fails to detect the unibyte string "u\xFFv" as invalid UTF-8.
>
> How can that be?  0xFF shouldn't pass any of the
> UTF_8_n_OCTET_LEADING_P tests, so it should end up in 'else', where
> the function returns -1.  Right?
>
> You also need to initialize the head_ascii member, I think.  Maybe its
> being uninitialized explains the failure.


Yes, that's it, thanks.
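
For the archives, here is a toy model of why a stale head_ascii can hide the bad byte. It is only a sketch and not the code in coding.c: it assumes that check_utf_8 treats head_ascii as the number of leading bytes already known to be ASCII and starts scanning after them, with a negative value meaning "not yet known". The names mini_coding, trailing_bytes and mini_check_utf_8 are hypothetical, and the validator is deliberately simplified (it does not reject overlong three- and four-byte forms or surrogates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few fields that matter here;
   this is NOT the real struct coding_system.  */
struct mini_coding
{
  const unsigned char *src;	/* bytes to validate */
  size_t src_bytes;		/* number of bytes in SRC */
  ptrdiff_t head_ascii;		/* leading bytes assumed ASCII, or -1 */
};

/* Return how many continuation bytes must follow lead byte C,
   or -1 if C cannot start a UTF-8 sequence (0xFF ends up here).  */
static int
trailing_bytes (unsigned char c)
{
  if (c < 0x80)
    return 0;
  if (c >= 0xC2 && c <= 0xDF)
    return 1;
  if (c >= 0xE0 && c <= 0xEF)
    return 2;
  if (c >= 0xF0 && c <= 0xF4)
    return 3;
  return -1;
}

/* Simplified validity check.  Scanning starts after the first
   HEAD_ASCII bytes (assumption: this mirrors how the real function
   uses the member), so a garbage value skips part of the input.  */
static bool
mini_check_utf_8 (const struct mini_coding *coding)
{
  assert (coding != NULL && coding->src != NULL);
  size_t i = coding->head_ascii < 0 ? 0 : (size_t) coding->head_ascii;
  while (i < coding->src_bytes)
    {
      int n = trailing_bytes (coding->src[i++]);
      if (n < 0)
	return false;
      /* Each continuation byte must look like 10xxxxxx.  */
      for (; n > 0; n--)
	if (i >= coding->src_bytes || (coding->src[i++] & 0xC0) != 0x80)
	  return false;
    }
  return true;
}
```

With head_ascii set to -1, the three bytes 'u', 0xFF, 'v' are rejected in this model, because the scan starts at the first byte and 0xFF matches none of the lead-byte ranges. With head_ascii left at a stale value such as 3, the scan starts past the end of the input and the same bytes are reported as valid, which matches the symptom described above.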