From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Philipp Stephani
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 23 Dec 2017 14:26:09 +0000
>
> I've benchmarked serialization and parsing of JSON with and without explicit encoding. I've found that leaving
> out the coding makes both operations significantly faster – from a speedup of a factor of 1.11 ± 0.06 for
> parsing canada.json to 1.57 ± 0.08 for serializing twitter.json. Other speedups are in between, but the
> speedup is always significant (to at least one standard deviation). All unit tests pass when leaving out the
> coding steps – which isn't surprising given that currently the coding operations are expensive no-ops.
The coding operations are "expensive no-ops" except when they aren't,
and that is exactly when we need their "expensive" parts.

In which case are they not no-ops? I've spot-checked some of the implementation details of coding.c, and I haven't found obvious cases where they are not no-ops. Emacs appears to use the obvious extension of UTF-8 for integers that are not Unicode scalar values, and that's even documented in character.h and the Elisp reference manual. Using utf-8-unix as encoding seems to keep the encoding intact.
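
To make "the obvious extension of UTF-8" concrete, here is a minimal sketch of what such an encoder could look like. It is not taken from coding.c or character.h: the function name `encode_extended_utf8`, the range limits, and the five-byte form are assumptions for illustration, and Emacs's special handling of raw eight-bit bytes is deliberately not modeled. The only point is that the usual UTF-8 bit layout is applied without rejecting surrogates or values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT Emacs's actual implementation: encode code
   point C into OUT (which must hold at least 5 bytes) using the UTF-8
   bit layout, extended in the obvious way.  Unlike a strict UTF-8
   encoder, it does not reject surrogates (U+D800..U+DFFF) or values
   above U+10FFFF.  Returns the number of bytes written, or 0 if C
   does not fit in 22 bits (an assumed upper limit).  */
size_t
encode_extended_utf8 (uint32_t c, unsigned char *out)
{
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      /* A strict encoder would reject surrogates here.  */
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c < 0x200000)
    {
      /* A strict encoder would stop at U+10FFFF.  */
      out[0] = (unsigned char) (0xF0 | (c >> 18));
      out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  if (c < 0x400000)
    {
      /* Assumed five-byte form: lead byte 0xF8, four continuation bytes.  */
      out[0] = 0xF8;
      out[1] = (unsigned char) (0x80 | ((c >> 18) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      out[3] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[4] = (unsigned char) (0x80 | (c & 0x3F));
      return 5;
    }
  return 0;
}
```

For every Unicode scalar value this produces exactly the bytes a strict UTF-8 encoder would, which is why an explicit encoding step over such a string would have nothing to change.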
> Therefore I'd suggest to document the internal string encoding in lisp.h or character.h and remove the explicit
> coding in json.c and emacs-module.c. It's very unlikely that the internal string encoding will change frequently,
> and if so, the unit tests should catch potential issues caused by that.
As I've already said, I don't think this particular case should be an
exception wrt to how Emacs behaves with external strings everywhere
else.  We suffer similar slow-downs in those other places as well, and
IMO this is a small penalty to pay for making sure our objects are
valid and won't crash Emacs.

I've spot-checked some other code where we interface with external libraries, namely dbusbind.c and gnutls.c. In no case have I found explicit coding operations (except for filenames, where the situation is different); these files always use SDATA directly. dbusbind.c even has the comment

    /* We need to send a valid UTF-8 string.  We could encode `object'
       but by not encoding it, we guarantee it's valid utf-8, even if
       it contains eight-bit-bytes.  Of course, you can still send
       manually-crafted junk by passing a unibyte string.  */

So not only do we not encode strings explicitly, we even *prefer* not encoding them, and we do rely on the internal string encoding being an extension of UTF-8. It's the *current* json.c (and emacs-module.c) that's inconsistent with the rest of the codebase.