From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Sun, 29 Oct 2017 20:41:56 +0000
Message-ID:
References: <87poaqhc63.fsf@lifelogs.com> <8360ceh5f1.fsf@gnu.org>
 <83h8vl5lf9.fsf@gnu.org> <83r2um3fqi.fsf@gnu.org>
 <43520b71-9e25-926c-d744-78098dad6441@cs.ucla.edu>
 <83r2udscpy.fsf@gnu.org> <83infostfs.fsf@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a1147da4e1f7025055cb58ecb"
X-Trace: blaine.gmane.org 1509309773 14591 195.159.176.226 (29 Oct 2017 20:42:53 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sun, 29 Oct 2017 20:42:53 +0000 (UTC)
Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
To: Eli Zaretskii
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Oct 29 21:42:47 2017
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
 by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from )
 id 1e8uPo-0002Qf-4D for ged-emacs-devel@m.gmane.org;
 Sun, 29 Oct 2017 21:42:40 +0100
Original-Received: from localhost ([::1]:37682 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1e8uPv-0005Xs-B2 for ged-emacs-devel@m.gmane.org;
 Sun, 29 Oct 2017 16:42:47 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:59309)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1e8uPK-0005Xc-Mq for emacs-devel@gnu.org;
 Sun, 29 Oct 2017 16:42:11 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned
 (Exim 4.71) (envelope-from ) id 1e8uPJ-0007FA-Hj
 for emacs-devel@gnu.org; Sun, 29 Oct 2017 16:42:10 -0400
Original-Received: from mail-qk0-x22d.google.com ([2607:f8b0:400d:c09::22d]:44090)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
 (envelope-from ) id 1e8uPH-0007D3-QU; Sun, 29 Oct 2017 16:42:07 -0400
Original-Received: by mail-qk0-x22d.google.com with SMTP id r64so13917462qkc.1;
 Sun, 29 Oct 2017 13:42:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc;
 bh=SREEO3p+6J0I5UxVY+S6lJB37O1dyQCu9/NnafLKgVs=;
 b=iNRLi2uuIOhu7Qi/Ods42irDuPqVCZ8i3er5L5Kgnk/mtwsQifNsT7DkzoIZPesTGJ
 v13qCeMkeEKpj72FPbYDz6pkjpKWVr/sIF55+TiCSVSdwIDo3z7/LPMZiOyKn91CWFNF
 Q+BKZL1RF6WOWOaqMji8QqTHCCefxBHEffp8ScEKw6D09QwIFmgszDXdcF/r48Bm7BQn
 wh5yVmuiCao/B9hNveJu3dUHh1ZrlUdGKO+El0aYZXpbfVrvZ/Kagu7M+p+G++uLGc+5
 t9EBCA/Xwp+76aFiKcMeeRv9L5stseC/eVvViOk3KyXFBU4hGrS2Q5lktKdVpBice7dd
 5idA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=SREEO3p+6J0I5UxVY+S6lJB37O1dyQCu9/NnafLKgVs=;
 b=eR2qkjN+zK16HFL8imVH8cIzRCXSoT6nMEko13ykjk7ixeZqCJp0jb2xx5C6aSSzlD
 42LshoFLbAoZpja4to7UdrBdWOQPnkgMhufXng3FajJ+igX5B/mnYOe6klYU6JlDdgK8
 KDyiWnTf5c2EAignHaUD3raBqnKBT88fLR8uAmBBguhvqGPNPBsS3jAf5f088fFA/LsS
 hWutp4+NW5ffMCcHp1wjqMjzY+61aviPEmHb45jzpgzbD9Kf62r6WiCgA3LUfGo3/Gx1
 F82dV4QuGAHVsa85URnNWCqk7fT5N7VNnkqzo7dlmBA95WUHKJWGF54gSVawUqbk9qDy
 /l5A==
X-Gm-Message-State: AMCzsaWJE/8xFq/7A/5E7nPUUgptv31jgJ7PFBjBDlLXU3au8jU+OhOp
 N6HUz/O+b+LspYfri0Pt0tvRKInpjC1QMJOpPmcYkQ==
X-Google-Smtp-Source: ABhQp+SU20ZWSbm4kIDXP23LxEzwa8Uyy4ucKJ8QZsjIp3fCO3nn3CH22YKH5LEyAjkOXNce7IiUtuq4tx+Am9GkLyY=
X-Received: by 10.55.21.30 with SMTP id f30mr10892144qkh.335.1509309726811;
 Sun, 29 Oct 2017 13:42:06 -0700 (PDT)
In-Reply-To: <83infostfs.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized.
X-Received-From: 2607:f8b0:400d:c09::22d
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel"
Xref: news.gmane.org gmane.emacs.devel:219818
Archived-At:

--001a1147da4e1f7025055cb58ecb
Content-Type: text/plain; charset="UTF-8"

Eli Zaretskii wrote on Mon, 9 Oct 2017 at 08:54:

> > From: Philipp Stephani
> > Date: Sun, 08 Oct 2017 23:14:18 +0000
> > Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
> >
> > Jansson only accepts UTF-8 strings, and at least in our usage will
> > also only hand out UTF-8 strings.
>
> How can we be 100% sure of that?  We don't trust any other libraries
> with such high fidelity, we always decode any external data.

We also trust glibc's malloc to never return overlapping non-freed
blocks, right? This "trust" isn't different. Of course we can assume
that libraries behave according to their specification.

> > It's totally OK to rely on this assumption since all code that's
> > involved here is part of the Emacs core, so it can
> > rely on implementation details.
>
> That is in stark contrast with your usual coding style, which tends to
> place checks and assertions where they are not always needed.

I wouldn't mind placing an assertion here as well. An assertion
primarily documents the assumptions made in the code and, as a side
effect, is also tested in debug builds. It's generally a good idea to
add such documentation.

> Could
> it be that you underestimate the damage that a broken non-ASCII byte
> stream can cause Emacs if inserted directly into a buffer or a string?
> Doing so will usually cause Emacs to die a horrible death quite soon,
> because code that processes buffer or string text has no defenses
> against such calamities.

If and when such a bug happens, we can work around it (after filing a
bug against Jansson). But we can't work around potential bugs in
libraries, see above.
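To make the point about assertions concrete, here is a minimal sketch
in C of an assertion that documents the "the library only hands out
UTF-8" assumption. Everything in it is an assumption for illustration:
utf8_valid_p and accept_library_string are hypothetical names, not
Emacs or Jansson functions, and it uses the standard assert macro
rather than anything from the Emacs sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Emacs or Jansson function): return true
   if the LEN bytes at S are well-formed UTF-8, rejecting overlong
   forms, surrogates, and code points above U+10FFFF.  */
static bool
utf8_valid_p (const char *s, size_t len)
{
  /* Smallest code point that legitimately needs N continuation bytes.  */
  static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = p[i];
      size_t n;                 /* number of continuation bytes */
      unsigned long cp;         /* decoded code point */

      if (c < 0x80)
        {
          i++;                  /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
      else
        return false;           /* stray continuation or invalid lead byte */

      if (len - i <= n)
        return false;           /* sequence is truncated */

      for (size_t k = 1; k <= n; k++)
        {
          if ((p[i + k] & 0xC0) != 0x80)
            return false;       /* not a continuation byte */
          cp = (cp << 6) | (p[i + k] & 0x3F);
        }

      if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;           /* overlong, out of range, or surrogate */

      i += n + 1;
    }
  return true;
}

/* Hypothetical stand-in for code that takes a string from a library
   and uses it as text.  The assertion states the assumption that the
   library only hands out UTF-8; it is checked in debug builds and
   compiled out when NDEBUG is defined.  */
static size_t
accept_library_string (const char *s)
{
  size_t len = strlen (s);
  assert (utf8_valid_p (s, len));
  return len;
}
```

In a build with NDEBUG defined the check costs nothing, so the
assertion serves mainly as documentation of the assumption; in a debug
build a violation aborts right where the string enters, not later in
code that processes buffer or string text.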
--001a1147da4e1f7025055cb58ecb--