From: Helmut Eller
Newsgroups: gmane.emacs.bugs
Subject: bug#24784: 26.0.50; JSON strings with utf-16 escape codes
Date: Wed, 26 Oct 2016 18:39:57 +0200
To: Dmitry Gutov
Cc: Philipp Stephani, 24784@debbugs.gnu.org

On Tue, Oct 25 2016, Dmitry Gutov wrote:

> On 24.10.2016 22:57, Philipp Stephani wrote:
>
>> +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.

I guess it doesn't hurt, but I also doubt that it makes a measurable
difference, as UTF-16 surrogates are rarely needed.

>
>> +          ;; Special-case UTF-16 surrogate pairs,
>> +          ;; cf. https://tools.ietf.org/html/rfc7159#section-7
>> +          ((looking-at
>> +            (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
>> +                "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
>> +           (json-advance 10)
>> +           (json--decode-utf-16-surrogates
>> +            (string-to-number (match-string 1) 16)
>> +            (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?

There's also an opportunity to detect unpaired surrogates, e.g.:

(defun json-read-escaped-char ()
  "Read the JSON string escaped character at point."
  ;; Skip over the '\'
  (json-advance)
  (let* ((char (json-pop))
         (special (assq char json-special-chars)))
    (cond
     (special (cdr special))
     ((not (eq char ?u)) char)
     ((looking-at "[0-9A-Fa-f]\\{4\\}")
      (let* ((code (string-to-number (match-string 0) 16)))
        (json-advance 4)
        (cond ((<= #xD800 code #xDBFF) ; UTF-16 high surrogate
               (cond ((looking-at "\\\\u\\([Dd][C-Fc-f][0-9A-Fa-f]\\{2\\}\\)")
                      (let ((low (string-to-number (match-string 1) 16)))
                        (json-advance 6)
                        (json--decode-utf-16-surrogates code low)))
                     (t
                      ;; Expected low surrogate missing
                      (signal 'json-string-escape (list (point))))))
              ((<= #xDC00 code #xDFFF)
               ;; Unexpected low surrogate
               (signal 'json-string-escape (list (point))))
              (t code))))
     (t
      (signal 'json-string-escape (list (point)))))))
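For reference, here is a minimal sketch of the arithmetic the decoding step has to perform once a high/low pair has been matched. The body of json--decode-utf-16-surrogates isn't quoted in this thread, so the function below is a hypothetical stand-in (my-json-decode-surrogate-pair is not part of json.el). It applies the standard UTF-16 decoding rule from RFC 2781: subtract the surrogate bases #xD800 and #xDC00, shift the high part left by 10 bits, and add #x10000.

```elisp
;; Hypothetical helper, not the actual json.el implementation: it only
;; illustrates the standard UTF-16 surrogate-pair decoding rule.
(defun my-json-decode-surrogate-pair (high low)
  "Combine UTF-16 surrogates HIGH and LOW into one code point.
HIGH must be in #xD800..#xDBFF and LOW in #xDC00..#xDFFF."
  ;; Reject unpaired or swapped surrogates instead of producing garbage.
  (unless (and (<= #xD800 high #xDBFF) (<= #xDC00 low #xDFFF))
    (error "Not a valid surrogate pair: %X %X" high low))
  ;; Each surrogate carries 10 bits of the offset above #x10000.
  (+ #x10000
     (ash (- high #xD800) 10)
     (- low #xDC00)))
```

For example, the JSON escape \uD83D\uDE00 corresponds to (my-json-decode-surrogate-pair #xD83D #xDE00), which returns #x1F600 (U+1F600). The extremes #xD800/#xDC00 and #xDBFF/#xDFFF map to #x10000 and #x10FFFF, so every valid pair lands in the supplementary planes.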