unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Fri, 06 Oct 2017 10:40:58 +0300	[thread overview]
Message-ID: <83376wwwol.fsf@gnu.org> (raw)
In-Reply-To: <b864ee4b-d691-821e-2e7c-32bdd03840df@cs.ucla.edu> (message from Paul Eggert on Thu, 5 Oct 2017 18:58:34 -0700)

> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 5 Oct 2017 18:58:34 -0700
> 
> > AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
> > whether a size_t value is too large
> 
> In the most recent patch I proposed, there were only two such checks; 
> there were two other checks for too-large size_t that were fixnum 
> checks, not ptrdiff_t checks.
> 
> However, I can see that you really don't like those checks. So I tweaked 
> the proposed patch to remove them all from json.c. Please see the 
> attached 3 patches (the first is just Philipp's patch rebased to 
> master). The basic idea here is that json.c should be using xmalloc for 
> allocation anyway, for reasons other than size overflow checking. And 
> once it uses the Emacs malloc we can make sure that it never allocates 
> objects that are too large for ptrdiff_t.

Thanks, that's better, although it just pushes the extra checks down
to our memory allocation routines.  It is better because now these
checks are applied to all the places where Lisp objects are created,
instead of requiring higher-level code to include these checks.

> > I'm not arguing for general replacement of ptrdiff_t with size_t, only
> > for doing that in those primitives where negative values are a clear
> > mistake/bug.
> 
> This is exactly where we should be cautious about using unsigned types. 
> The longstanding Emacs style is to prefer signed integers, to avoid the 
> common typos that occur with unsigned. If we start changing the style, 
> these primitives (or people debugging these primitives) often won't be 
> able to distinguish buggy from valid-but-enormous cases.

Those valid-but-enormous values are either invalid (if they are larger
than PTRDIFF_MAX), or easily uncovered by looking at higher call-stack
frames in the debugger, where the actual size of the object being
created is visible.  Yes, it requires some diligence, but then
debugging Emacs is already more complicated than that of C programs
which use only C objects, so I don't see that as a significant
disadvantage.

> And when we compile such primitives (or their callers) with
> -fsanitize=undefined, we won't be able to catch integer-overflow
> bugs automatically at runtime, since unsigned integer arithmetic
> silently wraps around even when -fsanitize=undefined is used.

I don't envision many primitives to need this kind of change, so
again, the disadvantage doesn't sound too significant to me.  The
advantage is IMO significant, as doing so will remove the need for
checks that a size_t value doesn't overflow a ptrdiff_t value, so we
will have an overall speedup.  Emacs is accused of being slow, so I
think we should avoid overhead that is only needed in a tiny fraction
of valid use cases.

> I help maintain several GNU programs that use unsigned types for sizes, 
> and find that style to be trickier than the style Emacs uses, with 
> respect to integer-overflow bugs. I've been gradually changing some of 
> the non-Emacs GNU code to use signed types, and the results have been 
> encouraging: the code is more readable and more obviously correct.

I'm not sure that experience is 100% applicable to Emacs, because
Emacs has special needs due to the fact that our integers are narrower
than the corresponding C integral types.

And once again, I'm not arguing for a wholesale switch to size_t, only
for its judicious use in a few primitives that create Lisp objects.



  reply	other threads:[~2017-10-06  7:40 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17  0:02   ` Richard Stallman
2017-09-17  3:13     ` Mark Oteiza
2017-09-18  0:00       ` Richard Stallman
2017-09-17  0:02 ` Richard Stallman
2017-09-18 13:46   ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05   ` Eli Zaretskii
2017-09-17 20:27     ` Philipp Stephani
2017-09-17 22:41       ` Mark Oteiza
2017-09-18 13:53       ` Ted Zlatanov
2017-09-17 21:17   ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26   ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58     ` Mark Oteiza
2017-09-18 14:14       ` Philipp Stephani
2017-09-18 14:28         ` Mark Oteiza
2017-09-18 14:36           ` Philipp Stephani
2017-09-18 15:02             ` Eli Zaretskii
2017-09-18 16:14               ` Philipp Stephani
2017-09-18 17:33                 ` Eli Zaretskii
2017-09-18 19:57                 ` Thien-Thi Nguyen
2017-09-18 14:57     ` Eli Zaretskii
2017-09-18 15:07       ` Mark Oteiza
2017-09-18 15:51         ` Eli Zaretskii
2017-09-18 16:22           ` Philipp Stephani
2017-09-18 18:08             ` Eli Zaretskii
2017-09-19 19:32               ` Richard Stallman
2017-09-18 17:26           ` Glenn Morris
2017-09-18 18:16             ` Eli Zaretskii
2017-09-18 16:08       ` Philipp Stephani
2017-09-19  8:18     ` Philipp Stephani
2017-09-19 19:09       ` Eli Zaretskii
2017-09-28 21:19         ` Philipp Stephani
2017-09-28 21:27           ` Stefan Monnier
2017-09-29 19:55           ` Eli Zaretskii
2017-09-30 22:02             ` Philipp Stephani
2017-10-01 18:06               ` Eli Zaretskii
2017-10-03 12:26                 ` Philipp Stephani
2017-10-03 15:31                   ` Eli Zaretskii
2017-10-03 15:52                     ` Philipp Stephani
2017-10-03 16:26                       ` Eli Zaretskii
2017-10-03 17:10                         ` Eli Zaretskii
2017-10-03 18:37                           ` Philipp Stephani
2017-10-03 20:52                   ` Paul Eggert
2017-10-04  5:33                     ` Eli Zaretskii
2017-10-04  6:41                       ` Paul Eggert
2017-10-04  8:03                         ` Eli Zaretskii
2017-10-04 17:51                           ` Paul Eggert
2017-10-04 19:38                             ` Eli Zaretskii
2017-10-04 21:24                               ` Paul Eggert
2017-10-05  1:48                                 ` Paul Eggert
2017-10-05  7:14                                   ` Eli Zaretskii
2017-10-08 22:52                                   ` Philipp Stephani
2017-10-09  5:54                                     ` Paul Eggert
2017-10-29 20:48                                       ` Philipp Stephani
2017-10-09  6:38                                     ` Eli Zaretskii
2017-10-05  7:12                                 ` Eli Zaretskii
2017-10-06  1:58                                   ` Paul Eggert
2017-10-06  7:40                                     ` Eli Zaretskii [this message]
2017-10-06 19:36                                       ` Paul Eggert
2017-10-06 21:03                                         ` Eli Zaretskii
2017-10-08 23:09                                     ` Philipp Stephani
2017-10-09  6:19                                       ` Paul Eggert
2017-10-29 20:48                                         ` Philipp Stephani
2017-10-29 22:49                                           ` Philipp Stephani
2017-12-09 23:05                                             ` Philipp Stephani
2017-12-10  7:08                                               ` Eli Zaretskii
2017-12-10 13:26                                                 ` Philipp Stephani
2017-12-10 13:32                                                   ` Ted Zlatanov
2017-10-08 23:04                                   ` Philipp Stephani
2017-10-09  6:47                                     ` Eli Zaretskii
2017-10-08 17:58                     ` Philipp Stephani
2017-10-08 18:42                       ` Eli Zaretskii
2017-10-08 23:14                         ` Philipp Stephani
2017-10-09  6:53                           ` Eli Zaretskii
2017-10-29 20:41                             ` Philipp Stephani
2017-10-09  6:22                       ` Paul Eggert
2017-10-01 18:38               ` Eli Zaretskii
2017-10-03 12:12                 ` Philipp Stephani
2017-10-03 14:54                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83376wwwol.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=emacs-devel@gnu.org \
    --cc=p.stephani2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).