From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.devel Subject: Re: JSON/YAML/TOML/etc. parsing performance Date: Tue, 3 Oct 2017 13:52:54 -0700 Organization: UCLA Computer Science Department Message-ID: <43520b71-9e25-926c-d744-78098dad6441@cs.ucla.edu> References: <87poaqhc63.fsf@lifelogs.com> <8360ceh5f1.fsf@gnu.org> <83h8vl5lf9.fsf@gnu.org> <83r2um3fqi.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable X-Trace: blaine.gmane.org 1507063993 11322 195.159.176.226 (3 Oct 2017 20:53:13 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 3 Oct 2017 20:53:13 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 Cc: emacs-devel@gnu.org To: Philipp Stephani , Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Oct 03 22:53:07 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dzUBe-00022L-Lu for ged-emacs-devel@m.gmane.org; Tue, 03 Oct 2017 22:53:06 +0200 Original-Received: from localhost ([::1]:60256 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzUBl-0003YJ-Va for ged-emacs-devel@m.gmane.org; Tue, 03 Oct 2017 16:53:13 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:59186) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzUBb-0003YC-Di for emacs-devel@gnu.org; Tue, 03 Oct 2017 16:53:04 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzUBa-0002NJ-2c for emacs-devel@gnu.org; Tue, 03 Oct 2017 16:53:03 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46042) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzUBW-0002C6-9k; Tue, 03 Oct 2017 16:52:58 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 2A608160D3D; Tue, 3 Oct 2017 13:52:56 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 6spC7M0Nq4QZ; Tue, 3 Oct 2017 13:52:55 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3C2E9160D98; Tue, 3 Oct 2017 13:52:55 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id F6SAABbih5qP; Tue, 3 Oct 2017 13:52:55 -0700 (PDT) Original-Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 20174160D3D; Tue, 3 Oct 2017 13:52:55 -0700 (PDT) In-Reply-To: Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 131.179.128.68 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:219055 Archived-At: On 10/03/2017 05:26 AM, Philipp Stephani wrote: > > +#if __has_attribute (warn_unused_result) > > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__=20 > ((__warn_unused_result__)) > > +#else > > +# define ATTRIBUTE_WARN_UNUSED_RESULT > > +#endif > > Hmm... why do we need this attribute?=C2=A0 You use it with 2 static > functions, so this sounds like a left-over from the development stage? > > It's not strictly needed (and if you don't like it, I can remove it),=20 > but it helps catching memory leaks. I've found __warn_unused_result__ to be more trouble than it's worth in=20 library functions. Emacs has lib/ignore-value.h in order to work around=20 __warn_unused_result__ brain damage, for example. For static functions=20 the problem is less, but still, I mildly favor leaving it out. > > And btw, how can size be greater than SIZE_MAX in this case? This i= s > a valid Lisp object, isn't it?=C2=A0 (There are more such tests in = the > patch, e.g. in lisp_to_json, and I think they, too, are redundant.) > > > Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C=20 > standard guarantees PTRDIFF_MAX <=3D SIZE_MAX. If we want to guarantee=20 > that, we should probably add at least a static assertion. There should be no need for that. No Lisp object can exceed min=20 (PTRDIFF_MAX, SIZE_MAX) bytes; alloc.c guarantees this, so that Emacs=20 should work even on oddball platforms where SIZE_MAX < PTRDIFF_MAX, and=20 there is no need for a runtime check here. The main practical problem here, by the way, is objects whose sizes=20 exceed PTRDIFF_MAX on mainstream 32-bit platforms. This Does Not Work=20 because you cannot subtract pointers within such objects reliably,=20 sometimes even when the true difference is representable as ptrdiff_t!=20 This is the main reason alloc.c prohibits creating such large objects,=20 and that Emacs should reject any attempt to support such objects (even=20 non-Lisp objects). > > +=C2=A0 if (buffer_and_size->size > PTRDIFF_MAX) > > +=C2=A0 =C2=A0 xsignal1 (Qoverflow_error, build_string ("buffer too l= arge")); > > +=C2=A0 insert (buffer_and_size->buffer, buffer_and_size->size); > > I don't think we need this test here, as 'insert' already has the > equivalent test in one of its subroutines. > > It can't, because it takes the byte length as ptrdiff_t. We need to=20 > check before whether the size is actually in the valid range of ptrdiff= _t. A PTRDIFF_MAX test is needed if the JSON library can return strings=20 longer than PTRDIFF_MAX. Please just use buffer_overflow () to signal=20 the error, though. > > +=C2=A0 =C2=A0 case JSON_INTEGER: > > +=C2=A0 =C2=A0 =C2=A0 { > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 json_int_t value =3D json_integer_value = (json); > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (FIXNUM_OVERFLOW_P (value)) > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 xsignal1 (Qoverflow_error, > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= build_string ("JSON integer is too large")); > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 return make_number (value); > > This overflow test is also redundant, as make_number already does it. > > It can't, because json_int_t can be larger than EMACS_INT. Also,=20 > make_number doesn't contain any checks. You're right that a test is needed. However, elsewhere we just use=20 xsignal0 (Qoverflow_error) for this sort of thing, and I suggest being=20 consistent and doing that here as well. Similarly for other calls to=20 xsignal1 with Qoverflow_error. > > +=C2=A0 =C2=A0 case JSON_STRING: > > +=C2=A0 =C2=A0 =C2=A0 { > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 size_t size =3D json_string_length (json= ); > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (FIXNUM_OVERFLOW_P (size)) > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 xsignal1 (Qoverflow_error, build_= string ("JSON string is=20 > too long")); > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 return json_make_string (json_string_val= ue (json), size); > > Once again, the overflow test is redundant, as make_specified_string > (called by json_make_string) already includes an equivalent test. > > And once again, we need to check at least whether the size fits into=20 > ptrdiff_t. You're right, a test is needed. However, I suggest using string_overflow=20 () to signal string overflows.