From: "Herman, Géza" <geza.herman@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: "Herman Géza" <geza.herman@gmail.com>, emacs-devel@gnu.org
Subject: Re: I created a faster JSON parser
Date: Fri, 08 Mar 2024 21:22:13 +0100
Message-ID: <87a5n8to8m.fsf@gmail.com>
In-Reply-To: <86cys4zec7.fsf@gnu.org>
Eli Zaretskii <eliz@gnu.org> writes:
>> I want to use a datatype which is likely to have the same size as
>> a CPU register. On 32-bit machines, it should be 32-bit, on 64-bit
>> architectures it should be 64-bit.
>
> Is there a reason for you to want it to be 64-bit type on a 64-bit
> machine? If the only bother is efficiency, then you can use 'int'
> without fear. But if a 64-bit machine will need the range of values
> beyond INT_MAX (does it?), then I suggest to use ptrdiff_t.
The only reason is that if I use a 64-bit integer on a 64-bit
platform, the fast path will be chosen more often. So it makes
sense to use a register-sized integer here.
>> This is not a strong requirement, but it makes the parser faster.
>
> Are you sure? AFAIK, 32-bit arithmetics on a 64-bit machine is as
> fast as 64-bit arithmetics. So if efficiency is the only
> consideration, using 'int' is OK.
Yes, that's right, but my reasoning goes the other way around: I
shouldn't choose an integer larger than what the platform natively
supports. I could just use a 64-bit integer on all platforms, which
is what I originally had. But then on 32-bit platforms this part
of the parser would be slower, because it would needlessly use
two registers instead of one (in most cases, as numbers usually
fit in 32 bits).
This whole thing is not a big deal, as int is probably fine: it
covers most of the commonly used range. But if I can simply put a
more appropriate type here, then I think I should do it.
> If the value needs to fit in a fixnum, use EMACS_INT, which is a
> type that already takes the 32-bit vs 64-bit nature of the machine
> into consideration.
Yes, it seems that EMACS_UINT is good for my purpose, thanks for
the suggestion.
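A minimal sketch of the fast-path idea, not the code from the patch: the helper name parse_uint_fast is hypothetical, and uintptr_t is only a stand-in for EMACS_UINT so that the snippet compiles outside Emacs. Digits are accumulated in a register-sized unsigned integer, and the function gives up as soon as the value would overflow, so that the caller can fall back to a slower, general number parser.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for EMACS_UINT (an assumption for this sketch): an unsigned
   integer that is usually as wide as a CPU register.  */
typedef uintptr_t reg_uint;

/* Hypothetical helper: accumulate the decimal digits at the start of S
   into *OUT.  Return true if there was at least one digit and the value
   fit (fast path); return false otherwise, so the caller can use a
   slower general routine.  */
static bool
parse_uint_fast (const char *s, reg_uint *out)
{
  reg_uint value = 0;
  const char *p = s;

  for (; *p >= '0' && *p <= '9'; p++)
    {
      reg_uint digit = (reg_uint) (*p - '0');

      /* value * 10 + digit would exceed UINTPTR_MAX: leave the fast path.  */
      if (value > (UINTPTR_MAX - digit) / 10)
        return false;
      value = value * 10 + digit;
    }

  if (p == s)
    return false;		/* no digits at all */

  *out = value;
  return true;
}
```

With a wider reg_uint, more inputs pass the overflow check, which is the sense in which a 64-bit type on a 64-bit platform makes the fast path more likely to be taken.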
> We should begin by deciding whether we want to support characters
> outside of the Unicode range. The error message depends on that
> decision.
>
>> I didn't find any code which handles this case in the jansson
>> parser either.
>
> The jansson code required encoding/decoding strings to make sure we
> submit to jansson text that is always valid UTF-8.
I tried the jansson parser with the character 0x333333 (outside
the Unicode range) in a string, and it didn't work: it fails with a
(json-parse-error "unable to decode byte... message. Also, I see
that json-parse-string calls a UTF-8 encoding function before
parsing, but json-parse-buffer doesn't (and it doesn't do anything
encoding-related in the callback either, it just calls memcpy).
Based on this, is there any benefit to supporting these
characters? Out of curiosity, what are these extra characters
used for? And what is the purpose of the odd special 2-byte
encoding of 8-bit characters (I mean the one where the first byte
is C0/C1)? Why not just use the regular UTF-8 encoding for these
values?
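To make the two cases above concrete, here is a small sketch; the function names are hypothetical, and the decoding rule reflects my understanding of Emacs's internal representation rather than code copied from it. The first function checks whether a code point lies within the Unicode range (at most 0x10FFFF), which 0x333333 does not. The second recovers the raw byte from the special 2-byte form, whose lead byte C0 or C1 would be an invalid (overlong) lead byte in standard UTF-8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest code point defined by Unicode.  */
#define MAX_UNICODE_CODEPOINT 0x10FFFF

/* Hypothetical helper: is C within the Unicode range?  */
static bool
in_unicode_range (uint32_t c)
{
  return c <= MAX_UNICODE_CODEPOINT;
}

/* Hypothetical helper: decode the special 2-byte form (lead byte 0xC0
   or 0xC1, then a continuation byte 0x80..0xBF) into the raw 8-bit
   byte 0x80..0xFF it stands for.  Assumes the input is well formed.  */
static uint8_t
decode_raw_byte (uint8_t lead, uint8_t trail)
{
  return (uint8_t) (0x80 | ((lead & 0x01) << 6) | (trail & 0x3F));
}
```

Under this rule C0 80 stands for the raw byte 0x80 and C1 BF for 0xFF, so the 128 non-ASCII byte values map onto the 2 x 64 combinations that standard UTF-8 leaves unused.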