From: Philipp Stephani
Newsgroups: gmane.emacs.bugs
Subject: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Sat, 31 Dec 2016 16:53:27 +0000
To: Dmitry Gutov, Helmut Eller, 24784@debbugs.gnu.org
In-Reply-To: <63b3b672-f91c-f1b9-46e5-8f6dd8636714@yandex.ru>


Dmitry Gutov <dgutov@yandex.ru> wrote on Tue, 25 Oct 2016 at 01:19:
> Philipp,
>
> Thanks. Some comments:
>
> On 24.10.2016 22:57, Philipp Stephani wrote:
>
> > +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.

Agreed; converted to defun. I had only used defsubst because some other
helper functions also use defsubst.
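
For reference, a minimal sketch of what such a helper computes, assuming the standard UTF-16 formula. The name `my-json--decode-surrogates` is hypothetical, and the body is an illustration rather than the code from the patch:

```elisp
;; Hypothetical helper, written as a plain defun.
;; Assumes HIGH is in #xD800..#xDBFF and LOW is in #xDC00..#xDFFF.
(defun my-json--decode-surrogates (high low)
  "Return the code point encoded by the UTF-16 surrogate pair HIGH and LOW."
  (+ #x10000
     ;; The high surrogate carries the upper 10 bits.
     (ash (- high #xD800) 10)
     ;; The low surrogate carries the lower 10 bits.
     (- low #xDC00)))

;; The pair from the RFC 7159 example, \uD834\uDD1E:
(my-json--decode-surrogates #xD834 #xDD1E) ; => 119070 (#x1D11E)
```

Whether defsubst would be measurably faster could be checked with `benchmark-run` on a loop over this call.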

> > +     ;; Special-case UTF-16 surrogate pairs,
> > +     ;; cf. https://tools.ietf.org/html/rfc7159#section-7
> > +     ((looking-at
> > +       (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
> > +           "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
> > +      (json-advance 10)
> > +      (json--decode-utf-16-surrogates
> > +       (string-to-number (match-string 1) 16)
> > +       (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?

No, the case below is more general and therefore has to come last.
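
To illustrate the ordering, here is a small sketch under the assumption that point sits just after a `\u`. The function `my-classify-escape` is hypothetical and exists only for this example:

```elisp
(require 'rx)

;; Hypothetical helper: classify TEXT, the characters that follow a "\u".
(defun my-classify-escape (text)
  "Return a symbol describing the escape sequence that TEXT starts with."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil))
      (cond
       ;; Specific clause: a high surrogate, a literal \u, a low surrogate.
       ((looking-at
         (rx (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f"))
             "\\u" (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f"))))
        'surrogate-pair)
       ;; General clause: any four hex digits.  It also matches the first
       ;; half of a surrogate pair, so it has to come after the clause above.
       ((looking-at (rx (= 4 (any "0-9A-Fa-f"))))
        'single-unit)
       (t 'invalid)))))

(my-classify-escape "D834\\uDD1E") ; => surrogate-pair
(my-classify-escape "00e9")        ; => single-unit
```

With the two clauses swapped, the first call would return `single-unit` and the pair would never be recognized.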

>=C2=A0 (ert-deftest test-json-encode-string ()
>=C2=A0 =C2=A0 (should (equal (json-encode-string "foo") "= ;\"foo\""))
>=C2=A0 =C2=A0 (should (equal (json-encode-string "a\n\fb") &q= uot;\"a\\n\\fb\""))
> -=C2=A0 (should (equal (json-encode-string "\nasd=C3=91=E2=80=9E= =C3=91=E2=80=B9=C3=90=C2=B2\u001f\u007ffgh\t")
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"\= "\\nasd=C3=91=E2=80=9E=C3=91=E2=80=B9=C3=90=C2=B2\\u001f\u007ffgh\\t\&= quot;")))
> +=C2=A0 (should (equal (json-encode-string "\nasd=C3=91=E2=80=9E= =C3=91=E2=80=B9=C3=90=C2=B2=C3=B0 =E2=80=9E=C5=BE\u001f\u007ffgh\t") > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"\= "\\nasd=C3=91=E2=80=9E=C3=91=E2=80=B9=C3=90=C2=B2=C3=B0 =E2=80=9E=C5= =BE\\u001f\u007ffgh\\t\"")))

Why are we testing string encoding here?

It's not 100% related to the patch, but I think it can be included for
symmetry (testing encoding as well as decoding).
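
A sketch of what testing both directions could look like, assuming a json.el that already decodes surrogate pairs (i.e. with this patch applied). The test name is hypothetical:

```elisp
(require 'ert)
(require 'json)

;; Hypothetical test; assumes json.el decodes UTF-16 surrogate pairs.
(ert-deftest my-json-surrogate-both-directions ()
  (let ((clef (string #x1D11E)))
    ;; Decoding: the escaped pair becomes a single character.
    (should (equal (json-read-from-string "\"\\uD834\\uDD1E\"") clef))
    ;; Encoding: the character is written out unescaped.
    (should (equal (json-encode-string clef)
                   (concat "\"" clef "\"")))))
```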