unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Philipp Stephani <p.stephani2@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>, Paul Eggert <eggert@cs.ucla.edu>
Cc: emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Sun, 08 Oct 2017 23:04:10 +0000	[thread overview]
Message-ID: <CAArVCkQddXEqLYwPU7xXUND0jpQ2dixWpcnDf+08Lc9OO5BuwA@mail.gmail.com> (raw)
In-Reply-To: <83lgkqxe3l.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 4019 bytes --]

Eli Zaretskii <eliz@gnu.org> schrieb am Do., 5. Okt. 2017 um 09:12 Uhr:

> > Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> > From: Paul Eggert <eggert@cs.ucla.edu>
> > Date: Wed, 4 Oct 2017 14:24:59 -0700
> >
> > On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> > > if we did use size_t for the arguments which can clearly only be
> > > non-negative, the problems which we are discussing would not have
> > > happened
> > Sure, but we would also have worse problems, as size_t is inherently
> > more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs
> > is compiled with suitable GCC compiler options. size_t overflows cannot
> > be diagnosed, are all too common, and can cause serious trouble.
>
> If ptrdiff_t overflows are reliably diagnosed, then why do we have to
> test for them explicitly in our code, as in the proposed json.c?
> AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
> whether a size_t value is too large, because similar checks for
> ptrdiff_t values are already in the low-level subroutines involved in
> creating Lisp objects.  So why couldn't those checks be avoided by
> simply assigning to a ptrdiff_t variables?


Signed integer overflow is only diagnosed when compiling with -ftrapv or
-fsanitize=undefined, which normally doesn't happen in production builds.
IIUC lossy numeric conversions are never diagnosed because they are not
undefined behavior, so we always need to check for them explicitly.
We also need to strictly distinguish between the case where an overflow is
a bug (incorrect assumptions in the Emacs source code itself) and the case
where it is a legitimate outcome due to user input (in this case, return
values from Jansson that overflow ptrdiff_t). The two cases need to be
treated differently: In the first case we can say it's UB and rely on
(hopefully regular and comprehensive) UBSan runs, in the second case we
need explicit checks and normal signalling as opposed to assertions.


>
> > The Emacs internals occasionally use size_t because underlying
> > primitives like 'malloc' do, so we do make some exceptions. Perhaps
> > there should be an exception here, for convenience with the JSON
> > library. The code snippets I've seen so far in this thread are not
> > enough context to judge whether an exception would be helpful in this
> > case. Generally speaking, though, unsigned types should be avoided
> > because they are more error-prone. This has long been the style in Emacs
> > internals, and it's served us well.
>
> I'm not arguing for general replacement of ptrdiff_t with size_t, only
> for doing that in those primitives where negative values are a clear
> mistake/bug.
>
> For example, let's take this case from your proposed changes:
>
>    static Lisp_Object
>   -json_make_string (const char *data, ptrdiff_t size)
>   +json_make_string (const char *data, size_t size)
>    {
>   +  if (PTRDIFF_MAX < size)
>   +    string_overflow ();
>      return make_specified_string (data, -1, size, true);
>    }
>
> If we were to change make_specified_string (and its subroutines, like
> make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
> argument, the need for the above check against PTRDIFF_MAX would
> disappear.
>

It wouldn't disappear, it would merely be shifted around. Arguments could
be made for either choice, but somewhere the check needs to happen.

  It would also make the higher-level code more
> reliable, because application-level programmers will not need to
> understand all the non-trivial intricacies of this stuff.


Every C programmer needs to understand these issues, no matter what
codebase they are working with. Lossy integral conversions are fundamental
design choices of the C language that can't be avoided.

C is a nasty language full of traps. You can try to hide some of the traps,
but you can't remove them.
(Arguably I've yet to come across a language that doesn't have nasty
integer types. Python 3, with integers auto-upgrading to bignums, might be
closest.)

[-- Attachment #2: Type: text/html, Size: 5117 bytes --]

  parent reply	other threads:[~2017-10-08 23:04 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17  0:02   ` Richard Stallman
2017-09-17  3:13     ` Mark Oteiza
2017-09-18  0:00       ` Richard Stallman
2017-09-17  0:02 ` Richard Stallman
2017-09-18 13:46   ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05   ` Eli Zaretskii
2017-09-17 20:27     ` Philipp Stephani
2017-09-17 22:41       ` Mark Oteiza
2017-09-18 13:53       ` Ted Zlatanov
2017-09-17 21:17   ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26   ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58     ` Mark Oteiza
2017-09-18 14:14       ` Philipp Stephani
2017-09-18 14:28         ` Mark Oteiza
2017-09-18 14:36           ` Philipp Stephani
2017-09-18 15:02             ` Eli Zaretskii
2017-09-18 16:14               ` Philipp Stephani
2017-09-18 17:33                 ` Eli Zaretskii
2017-09-18 19:57                 ` Thien-Thi Nguyen
2017-09-18 14:57     ` Eli Zaretskii
2017-09-18 15:07       ` Mark Oteiza
2017-09-18 15:51         ` Eli Zaretskii
2017-09-18 16:22           ` Philipp Stephani
2017-09-18 18:08             ` Eli Zaretskii
2017-09-19 19:32               ` Richard Stallman
2017-09-18 17:26           ` Glenn Morris
2017-09-18 18:16             ` Eli Zaretskii
2017-09-18 16:08       ` Philipp Stephani
2017-09-19  8:18     ` Philipp Stephani
2017-09-19 19:09       ` Eli Zaretskii
2017-09-28 21:19         ` Philipp Stephani
2017-09-28 21:27           ` Stefan Monnier
2017-09-29 19:55           ` Eli Zaretskii
2017-09-30 22:02             ` Philipp Stephani
2017-10-01 18:06               ` Eli Zaretskii
2017-10-03 12:26                 ` Philipp Stephani
2017-10-03 15:31                   ` Eli Zaretskii
2017-10-03 15:52                     ` Philipp Stephani
2017-10-03 16:26                       ` Eli Zaretskii
2017-10-03 17:10                         ` Eli Zaretskii
2017-10-03 18:37                           ` Philipp Stephani
2017-10-03 20:52                   ` Paul Eggert
2017-10-04  5:33                     ` Eli Zaretskii
2017-10-04  6:41                       ` Paul Eggert
2017-10-04  8:03                         ` Eli Zaretskii
2017-10-04 17:51                           ` Paul Eggert
2017-10-04 19:38                             ` Eli Zaretskii
2017-10-04 21:24                               ` Paul Eggert
2017-10-05  1:48                                 ` Paul Eggert
2017-10-05  7:14                                   ` Eli Zaretskii
2017-10-08 22:52                                   ` Philipp Stephani
2017-10-09  5:54                                     ` Paul Eggert
2017-10-29 20:48                                       ` Philipp Stephani
2017-10-09  6:38                                     ` Eli Zaretskii
2017-10-05  7:12                                 ` Eli Zaretskii
2017-10-06  1:58                                   ` Paul Eggert
2017-10-06  7:40                                     ` Eli Zaretskii
2017-10-06 19:36                                       ` Paul Eggert
2017-10-06 21:03                                         ` Eli Zaretskii
2017-10-08 23:09                                     ` Philipp Stephani
2017-10-09  6:19                                       ` Paul Eggert
2017-10-29 20:48                                         ` Philipp Stephani
2017-10-29 22:49                                           ` Philipp Stephani
2017-12-09 23:05                                             ` Philipp Stephani
2017-12-10  7:08                                               ` Eli Zaretskii
2017-12-10 13:26                                                 ` Philipp Stephani
2017-12-10 13:32                                                   ` Ted Zlatanov
2017-10-08 23:04                                   ` Philipp Stephani [this message]
2017-10-09  6:47                                     ` Eli Zaretskii
2017-10-08 17:58                     ` Philipp Stephani
2017-10-08 18:42                       ` Eli Zaretskii
2017-10-08 23:14                         ` Philipp Stephani
2017-10-09  6:53                           ` Eli Zaretskii
2017-10-29 20:41                             ` Philipp Stephani
2017-10-09  6:22                       ` Paul Eggert
2017-10-01 18:38               ` Eli Zaretskii
2017-10-03 12:12                 ` Philipp Stephani
2017-10-03 14:54                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAArVCkQddXEqLYwPU7xXUND0jpQ2dixWpcnDf+08Lc9OO5BuwA@mail.gmail.com \
    --to=p.stephani2@gmail.com \
    --cc=eggert@cs.ucla.edu \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).