From: Philipp Stephani
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Thu, 28 Sep 2017 21:19:00 +0000
> Cc: emacs-devel@gnu.org
>
> IIUC Jansson only accepts UTF-8 strings (i.e. it will generate an error if some input is not a UTF-8 string), and
> will only return UTF-8 strings as well. Therefore I think that direct conversion between Lisp strings and C
> strings (using SDATA etc.) is always correct because the internal Emacs encoding is a superset of UTF-8.
> Also build_string should always be correct because it will generate a correct multibyte string for a UTF-8
> string with non-ASCII characters, and a correct unibyte string for an ASCII string, right?
I don't think it's a good idea to write code which has such
assumptions embedded in it.  We don't do that in other cases, although
UTF-8 based systems are widespread nowadays.  Instead, we make sure
that encoding and decoding of UTF-8 byte streams is implemented
efficiently and, when possible, simply reuses the same string data.
Besides, these assumptions are not always true, for example:
  . The Emacs internal representation could include raw bytes, whose
    representations (both of them) are not valid UTF-8;
  . Strings we receive from the library could be invalid UTF-8, in
    which case putting them into a buffer or string without decoding
    will mean trouble for programs that will try to process them.
So I think decoding and encoding any string passed to/from Jansson is
better for stability and future maintenance.  If you worry about
performance, you shouldn't: we convert UTF-8 into our internal
representation as efficiently as possible.
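To make the raw-byte point concrete, here is a minimal strict UTF-8 check in C.  The helper `utf8_valid` is hypothetical (it is neither an Emacs nor a Jansson function).  It rejects overlong forms, surrogates and code points above U+10FFFF, and therefore also any sequence starting with 0xC0 or 0xC1.  As far as I understand, Emacs stores a raw byte in a multibyte string as a two-byte sequence whose lead byte is 0xC0 or 0xC1, i.e. exactly such an overlong form, so copying SDATA verbatim could hand the library bytes that a strict UTF-8 decoder refuses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if BUF[0..LEN) is well-formed UTF-8
   (RFC 3629): no overlong forms, no surrogates, nothing above U+10FFFF.  */
static bool
utf8_valid (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;            /* number of continuation bytes */
      unsigned long cp;    /* code point being decoded */

      if (c < 0x80)
        {
          i++;             /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        n = 1, cp = c & 0x1F;
      else if (c >= 0xE0 && c <= 0xEF)
        n = 2, cp = c & 0x0F;
      else if (c >= 0xF0 && c <= 0xF4)
        n = 3, cp = c & 0x07;
      else
        return false;      /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */

      if (len - i - 1 < n)
        return false;      /* truncated sequence */

      for (size_t k = 1; k <= n; k++)
        {
          unsigned char cc = buf[i + k];
          if ((cc & 0xC0) != 0x80)
            return false;  /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      /* Overlong 3- and 4-byte forms, UTF-16 surrogates, out of range.  */
      if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;

      i += n + 1;
    }
  return true;
}
```

For example, the bytes `'h' 0xC3 0xA9` ("hé") pass, while the two bytes `0xC1 0xBF` are rejected; decoding through the normal coding machinery is what turns such raw bytes into something well-defined.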
>  > + /* LISP now must be a vector or hashtable. */
>  > + if (++lisp_eval_depth > max_lisp_eval_depth)
>  > + xsignal0 (Qjson_object_too_deep);
>
>  This error could mislead: the problem could be in the nesting of
>  surrounding Lisp being too deep, and the JSON part could be just fine.
>
> Agreed, but I think it's better to use lisp_eval_depth here because it's the total nesting depth that could cause
> stack overflows.
Well, at least the error message should not point exclusively to a
JSON problem, it should mention the possibility of a Lisp eval depth
overflow as well.
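A sketch of that suggestion, under stated assumptions: `eval_depth` and `max_eval_depth` are hypothetical stand-ins for `lisp_eval_depth` and `max_lisp_eval_depth`, `struct value` is a hypothetical nested JSON array/object, and `convert` is a hypothetical recursive converter.  It shares one depth counter with the (simulated) evaluator, restores the counter on every exit path, and reports an error that names both possible causes instead of blaming the JSON alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for lisp_eval_depth and max_lisp_eval_depth.  */
static int eval_depth = 0;
static int max_eval_depth = 1600;

/* Hypothetical nested JSON value: an array or object with children.  */
struct value
{
  size_t nchildren;
  struct value **children;
};

/* Convert V recursively.  Returns 0 on success, -1 if the combined
   nesting depth (surrounding evaluation + JSON) is too large.  */
static int
convert (const struct value *v)
{
  if (++eval_depth > max_eval_depth)
    {
      eval_depth--;
      /* Name both possible culprits, not only the JSON input.  */
      fprintf (stderr, "Nesting too deep: the JSON value or the surrounding "
               "Lisp evaluation exceeds max-lisp-eval-depth\n");
      return -1;
    }
  for (size_t i = 0; i < v->nchildren; i++)
    if (convert (v->children[i]) < 0)
      {
        eval_depth--;      /* unwind our own level before propagating */
        return -1;
      }
  eval_depth--;
  return 0;
}
```

With `max_eval_depth` set to 2, a two-level value converts fine at top level but fails if the caller is already one level deep, which is exactly the case where a JSON-only error message would mislead.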
>  > + Lisp_Object string
>  > + = make_string (buffer_and_size->buffer, buffer_and_size->size);
>
>  This is arbitrary text, so I'm not sure make_string is appropriate.
>  Could the text be a byte stream, i.e. not human-readable text?  If so,
>  do we want to create a unibyte string or a multibyte string here?
>
> It should always be UTF-8.
How does JSON express byte streams, then?  Doesn't it support data (as
opposed to text)?
>  > + {
>  > + bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
>  > + eassert (!overflow);
>  > + }
>  > + size_t count;
>  > + {
>  > + bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
>  > + eassert (!overflow);
>  > + }
>
>  Why did you need these blocks in braces?
>
> To be able to reuse the "overflow" name.
Why can't you reuse it without the braces?
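One way to reuse the name without extra blocks is to declare `overflow` once and then assign to it.  The sketch below assumes GCC or Clang and uses their `__builtin_add_overflow` and `__builtin_sub_overflow` in place of Gnulib's `INT_ADD_WRAPV` and `INT_SUBTRACT_WRAPV` (which the patch uses) so that it compiles on its own; `compute_count` is a hypothetical function, and it returns `false` where the patch would `eassert`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: compute COUNT = (CEILING + 1) - POINT, checking each
   step for overflow.  Returns false if either step overflows.  */
static bool
compute_count (ptrdiff_t ceiling, ptrdiff_t point, size_t *count)
{
  ptrdiff_t end;
  /* Declared once...  */
  bool overflow = __builtin_add_overflow (ceiling, 1, &end);
  if (overflow)
    return false;
  /* ...and simply reassigned, so no extra braces are needed.
     A negative difference does not fit in size_t and counts as overflow.  */
  overflow = __builtin_sub_overflow (end, point, count);
  return !overflow;
}
```

For instance, `compute_count (10, 3, &c)` succeeds with `c == 8`, while a ceiling of `PTRDIFF_MAX` or a point beyond the ceiling is reported as overflow.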