unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Thu, 05 Oct 2017 10:12:30 +0300	[thread overview]
Message-ID: <83lgkqxe3l.fsf@gnu.org> (raw)
In-Reply-To: <21b0ba97-ed49-43ae-e86f-63fba762353a@cs.ucla.edu> (message from Paul Eggert on Wed, 4 Oct 2017 14:24:59 -0700)

> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 4 Oct 2017 14:24:59 -0700
> 
> On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> > if we did use size_t for the arguments which can clearly only be
> > non-negative, the problems which we are discussing would not have
> > happened
> Sure, but we would also have worse problems, as size_t is inherently 
> more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs 
> is compiled with suitable GCC compiler options. size_t overflows cannot 
> be diagnosed, are all too common, and can cause serious trouble.

If ptrdiff_t overflows are reliably diagnosed, then why do we have to
test for them explicitly in our code, as in the proposed json.c?
AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
whether a size_t value is too large, because similar checks for
ptrdiff_t values are already in the low-level subroutines involved in
creating Lisp objects.  So why couldn't those checks be avoided by
simply assigning to a ptrdiff_t variables?

> The Emacs internals occasionally use size_t because underlying 
> primitives like 'malloc' do, so we do make some exceptions. Perhaps 
> there should be an exception here, for convenience with the JSON 
> library. The code snippets I've seen so far in this thread are not 
> enough context to judge whether an exception would be helpful in this 
> case. Generally speaking, though, unsigned types should be avoided 
> because they are more error-prone. This has long been the style in Emacs 
> internals, and it's served us well.

I'm not arguing for general replacement of ptrdiff_t with size_t, only
for doing that in those primitives where negative values are a clear
mistake/bug.

For example, let's take this case from your proposed changes:

   static Lisp_Object
  -json_make_string (const char *data, ptrdiff_t size)
  +json_make_string (const char *data, size_t size)
   {
  +  if (PTRDIFF_MAX < size)
  +    string_overflow ();
     return make_specified_string (data, -1, size, true);
   }

If we were to change make_specified_string (and its subroutines, like
make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
argument, the need for the above check against PTRDIFF_MAX would
disappear.

Another such case is 'insert', which is also used in json.c, and
requires a similar check:

  void
  insert (const char *string, ptrdiff_t nbytes)
  {
    if (nbytes > 0)
      {
	ptrdiff_t len = chars_in_text ((unsigned char *) string, nbytes), opoint;
	insert_1_both (string, len, nbytes, 0, 1, 0);
	opoint = PT - len;
	signal_after_change (opoint, 0, len);
	update_compositions (opoint, PT, CHECK_BORDER);
      }
  }

It clearly ignores negative values of nbytes, as expected.  So why not
make nbytes a size_t argument?  (We will probably need some low-level
changes inside the subroutines of insert_1_both, like move_gap, to
reject too large size_t values before we convert them to signed
values, but that's hardly rocket science.)

I envision that all the Fmake_SOMETHING primitives could use similar
changes to have the size specified as size_t, because it can never be
negative.  E.g., Fmake_vector is used by json.c and currently requires
a similar check because its size argument is a signed type.

IOW, I'm saying that using size_t judiciously, in a small number of
places, would make a lot of sense and allow us to simplify
higher-level code, and make it faster by avoiding duplicate checks of
the same values.  It would also make the higher-level code more
reliable, because application-level programmers will not need to
understand all the non-trivial intricacies of this stuff.  As Emacs
starts using more and more external libraries, whether built-in or via
modules, the issue of size_t vs ptrdiff_t will become more and more
important, and a source for more and more error-prone code.  Why not
fix that in advance in our primitives?

> (Ironically, just last week I was telling beginning students to beware 
> unsigned types, with (0u < -1) as an example....)

Well, "kids, don't do that at home -- we are trained professionals"
seems to apply here ;-)



  parent reply	other threads:[~2017-10-05  7:12 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17  0:02   ` Richard Stallman
2017-09-17  3:13     ` Mark Oteiza
2017-09-18  0:00       ` Richard Stallman
2017-09-17  0:02 ` Richard Stallman
2017-09-18 13:46   ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05   ` Eli Zaretskii
2017-09-17 20:27     ` Philipp Stephani
2017-09-17 22:41       ` Mark Oteiza
2017-09-18 13:53       ` Ted Zlatanov
2017-09-17 21:17   ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26   ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58     ` Mark Oteiza
2017-09-18 14:14       ` Philipp Stephani
2017-09-18 14:28         ` Mark Oteiza
2017-09-18 14:36           ` Philipp Stephani
2017-09-18 15:02             ` Eli Zaretskii
2017-09-18 16:14               ` Philipp Stephani
2017-09-18 17:33                 ` Eli Zaretskii
2017-09-18 19:57                 ` Thien-Thi Nguyen
2017-09-18 14:57     ` Eli Zaretskii
2017-09-18 15:07       ` Mark Oteiza
2017-09-18 15:51         ` Eli Zaretskii
2017-09-18 16:22           ` Philipp Stephani
2017-09-18 18:08             ` Eli Zaretskii
2017-09-19 19:32               ` Richard Stallman
2017-09-18 17:26           ` Glenn Morris
2017-09-18 18:16             ` Eli Zaretskii
2017-09-18 16:08       ` Philipp Stephani
2017-09-19  8:18     ` Philipp Stephani
2017-09-19 19:09       ` Eli Zaretskii
2017-09-28 21:19         ` Philipp Stephani
2017-09-28 21:27           ` Stefan Monnier
2017-09-29 19:55           ` Eli Zaretskii
2017-09-30 22:02             ` Philipp Stephani
2017-10-01 18:06               ` Eli Zaretskii
2017-10-03 12:26                 ` Philipp Stephani
2017-10-03 15:31                   ` Eli Zaretskii
2017-10-03 15:52                     ` Philipp Stephani
2017-10-03 16:26                       ` Eli Zaretskii
2017-10-03 17:10                         ` Eli Zaretskii
2017-10-03 18:37                           ` Philipp Stephani
2017-10-03 20:52                   ` Paul Eggert
2017-10-04  5:33                     ` Eli Zaretskii
2017-10-04  6:41                       ` Paul Eggert
2017-10-04  8:03                         ` Eli Zaretskii
2017-10-04 17:51                           ` Paul Eggert
2017-10-04 19:38                             ` Eli Zaretskii
2017-10-04 21:24                               ` Paul Eggert
2017-10-05  1:48                                 ` Paul Eggert
2017-10-05  7:14                                   ` Eli Zaretskii
2017-10-08 22:52                                   ` Philipp Stephani
2017-10-09  5:54                                     ` Paul Eggert
2017-10-29 20:48                                       ` Philipp Stephani
2017-10-09  6:38                                     ` Eli Zaretskii
2017-10-05  7:12                                 ` Eli Zaretskii [this message]
2017-10-06  1:58                                   ` Paul Eggert
2017-10-06  7:40                                     ` Eli Zaretskii
2017-10-06 19:36                                       ` Paul Eggert
2017-10-06 21:03                                         ` Eli Zaretskii
2017-10-08 23:09                                     ` Philipp Stephani
2017-10-09  6:19                                       ` Paul Eggert
2017-10-29 20:48                                         ` Philipp Stephani
2017-10-29 22:49                                           ` Philipp Stephani
2017-12-09 23:05                                             ` Philipp Stephani
2017-12-10  7:08                                               ` Eli Zaretskii
2017-12-10 13:26                                                 ` Philipp Stephani
2017-12-10 13:32                                                   ` Ted Zlatanov
2017-10-08 23:04                                   ` Philipp Stephani
2017-10-09  6:47                                     ` Eli Zaretskii
2017-10-08 17:58                     ` Philipp Stephani
2017-10-08 18:42                       ` Eli Zaretskii
2017-10-08 23:14                         ` Philipp Stephani
2017-10-09  6:53                           ` Eli Zaretskii
2017-10-29 20:41                             ` Philipp Stephani
2017-10-09  6:22                       ` Paul Eggert
2017-10-01 18:38               ` Eli Zaretskii
2017-10-03 12:12                 ` Philipp Stephani
2017-10-03 14:54                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83lgkqxe3l.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=emacs-devel@gnu.org \
    --cc=p.stephani2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).