From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: JSON/YAML/TOML/etc. parsing performance Date: Fri, 06 Oct 2017 10:40:58 +0300 Message-ID: <83376wwwol.fsf@gnu.org> References: <87poaqhc63.fsf@lifelogs.com> <8360ceh5f1.fsf@gnu.org> <83h8vl5lf9.fsf@gnu.org> <83r2um3fqi.fsf@gnu.org> <43520b71-9e25-926c-d744-78098dad6441@cs.ucla.edu> <83o9pnzddc.fsf@gnu.org> <472176ce-846b-1f24-716b-98eb95ceaa47@cs.ucla.edu> <83d163z6dy.fsf@gnu.org> <73477c99-1600-a53d-d84f-737837d0f91f@cs.ucla.edu> <83poa2ya8j.fsf@gnu.org> <21b0ba97-ed49-43ae-e86f-63fba762353a@cs.ucla.edu> <83lgkqxe3l.fsf@gnu.org> Reply-To: Eli Zaretskii NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1507275708 931 195.159.176.226 (6 Oct 2017 07:41:48 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 6 Oct 2017 07:41:48 +0000 (UTC) Cc: p.stephani2@gmail.com, emacs-devel@gnu.org To: Paul Eggert Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Oct 06 09:41:44 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1e0NGP-0007np-JK for ged-emacs-devel@m.gmane.org; Fri, 06 Oct 2017 09:41:41 +0200 Original-Received: from localhost ([::1]:43411 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e0NGW-0004wJ-UP for ged-emacs-devel@m.gmane.org; Fri, 06 Oct 2017 03:41:48 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:52441) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e0NFu-0004vt-RP for emacs-devel@gnu.org; Fri, 06 Oct 2017 03:41:17 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1e0NFr-0005wo-Np for emacs-devel@gnu.org; Fri, 06 Oct 2017 03:41:10 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:37231) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e0NFr-0005wk-JZ; Fri, 06 Oct 2017 03:41:07 -0400 Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:4465 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1e0NFq-0007GF-Vy; Fri, 06 Oct 2017 03:41:07 -0400 In-reply-to: (message from Paul Eggert on Thu, 5 Oct 2017 18:58:34 -0700) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:219166 Archived-At: > Cc: p.stephani2@gmail.com, emacs-devel@gnu.org > From: Paul Eggert > Date: Thu, 5 Oct 2017 18:58:34 -0700 > > > AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks > > whether a size_t value is too large > > In the most recent patch I proposed, there were only two such checks; > there were two other checks for too-large size_t that were fixnum > checks, not ptrdiff_t checks. > > However, I can see that you really don't like those checks. So I tweaked > the proposed patch to remove them all from json.c. Please see the > attached 3 patches (the first is just Philipp's patch rebased to > master). The basic idea here is that json.c should be using xmalloc for > allocation anyway, for reasons other than size overflow checking. And > once it uses the Emacs malloc we can make sure that it never allocates > objects that are too large for ptrdiff_t. Thanks, that's better, although it just pushes the extra checks down to our memory allocation routines. It is better because now these checks are applied to all the places where Lisp objects are created, instead of requiring higher-level code to include these checks. > > I'm not arguing for general replacement of ptrdiff_t with size_t, only > > for doing that in those primitives where negative values are a clear > > mistake/bug. > > This is exactly where we should be cautious about using unsigned types. > The longstanding Emacs style is to prefer signed integers, to avoid the > common typos that occur with unsigned. If we start changing the style, > these primitives (or people debugging these primitives) often won't be > able to distinguish buggy from valid-but-enormous cases. Those valid-but-enormous values are either invalid (if they are larger than PTRDIFF_MAX), or easily uncovered by looking at higher call-stack frames in the debugger, where the actual size of the object being created is visible. Yes, it requires some diligence, but then debugging Emacs is already more complicated than that of C programs which use only C objects, so I don't see that as a significant disadvantage. > And when we compile such primitives (or their callers) with > -fsanitize=undefined, we won't be able to catch integer-overflow > bugs automatically at runtime, since unsigned integer arithmetic > silently wraps around even when -fsanitize=undefined is used. I don't envision many primitives to need this kind of change, so again, the disadvantage doesn't sound too significant to me. The advantage is IMO significant, as doing so will remove the need for checks that a size_t value doesn't overflow a ptrdiff_t value, so we will have an overall speedup. Emacs is accused of being slow, so I think we should avoid overhead that is only needed in a tiny fraction of valid use cases. > I help maintain several GNU programs that use unsigned types for sizes, > and find that style to be trickier than the style Emacs uses, with > respect to integer-overflow bugs. I've been gradually changing some of > the non-Emacs GNU code to use signed types, and the results have been > encouraging: the code is more readable and more obviously correct. I'm not sure that experience is 100% applicable to Emacs, because Emacs has special needs due to the fact that our integers are narrower than the corresponding C integral types. And once again, I'm not arguing for a wholesale switch to size_t, only for its judicious use in a few primitives that create Lisp objects.