From: "Herman, Géza" <geza.herman@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: "Herman Géza" <geza.herman@gmail.com>, emacs-devel@gnu.org
Subject: Re: I created a faster JSON parser
Date: Fri, 08 Mar 2024 21:22:13 +0100
Message-ID: <87a5n8to8m.fsf@gmail.com>
In-Reply-To: <86cys4zec7.fsf@gnu.org>


Eli Zaretskii <eliz@gnu.org> writes:

>> I want to use a datatype which is likely to have the same size as
>> a CPU register. On 32-bit machines, it should be 32-bit, on 64-bit
>> architectures it should be 64-bit.
>
> Is there a reason for you to want it to be 64-bit type on a 64-bit
> machine?  If the only bother is efficiency, then you can use 'int'
> without fear.  But if a 64-bit machine will need the range of values
> beyond INT_MAX (does it?), then I suggest using ptrdiff_t.

The only reason is that if I use a 64-bit number on a 64-bit
platform, the fast path will be chosen more frequently. So it makes
sense to use a register-sized integer here.
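
To make the "fast path" argument concrete, here is a minimal sketch of
the idea, not the actual parser code. The names word_t and
parse_digits_fast are hypothetical, and uintptr_t is only a stand-in
for a register-sized unsigned type. Digits are accumulated into a
single machine word, and the slow path is needed only when the value
would overflow that word, so a wider word means fewer fallbacks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: uintptr_t stands in for a register-sized unsigned
   integer type.  It is 32 bits wide on typical 32-bit platforms and
   64 bits wide on typical 64-bit platforms.  */
typedef uintptr_t word_t;

/* Hypothetical helper: accumulate the run of decimal digits at S
   into *OUT.  Returns true if the whole run fits in word_t (the fast
   path).  Returns false if it would overflow, in which case the
   caller has to fall back to a slower, general number parser.  */
static bool
parse_digits_fast (const char *s, word_t *out)
{
  word_t value = 0;
  for (; *s >= '0' && *s <= '9'; s++)
    {
      word_t digit = (word_t) (*s - '0');
      /* Detect overflow before computing value * 10 + digit.  */
      if (value > (UINTPTR_MAX - digit) / 10)
        return false;
      value = value * 10 + digit;
    }
  *out = value;
  return true;
}
```

With this sketch, "12345" takes the fast path on any platform, while a
12-digit number takes it only where word_t is 64 bits wide.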

>> This is not a strong requirement, but it makes the parser faster.
>
> Are you sure?  AFAIK, 32-bit arithmetic on a 64-bit machine is as
> fast as 64-bit arithmetic.  So if efficiency is the only
> consideration, using 'int' is OK.

Yes, that's right, but my reasoning is the other way around: I
shouldn't choose a larger integer than the platform natively
supports. I could just use a 64-bit integer on all platforms, which
is what I originally had. But then on 32-bit platforms, this part
of the parser would be slower, because it would unnecessarily use
two registers instead of one (in most cases, as numbers usually
fit in 32 bits).

This whole thing is not a big deal, as int is probably fine: it
covers most of the commonly used range. But if I can just put a
more appropriate type here, then I think I should do it.

> If the value needs to fit in a fixnum, use EMACS_INT, which is a type
> that already takes the 32-bit vs 64-bit nature of the machine into
> consideration.

Yes, it seems that EMACS_UINT is good for my purpose, thanks for
the suggestion.

> We should begin by deciding whether we want to support characters
> outside of the Unicode range.  The error message depends on that
> decision.
>
>> I didn't find any code which handles this case in the jansson parser
>> either.
>
> The jansson code required encoding/decoding strings to make sure we
> submit to jansson text that is always valid UTF-8.

I tried to use the jansson parser with a character 0x333333 (which
is outside the Unicode range) in a string, and it didn't work: it
fails with a (json-parse-error "unable to decode byte... message.
Also, I see that json-parse-string calls some UTF-8 encoding-related
function before parsing, but json-parse-buffer doesn't (and it
doesn't do anything encoding-related in the callback, it just calls
memcpy).  So based on these, is there any benefit to supporting
these characters?  Out of curiosity, what are these extra
characters used for?  What is the purpose of the odd special
2-byte encoding of 8-bit characters (I mean where the 1st byte is
C0/C1)?  Why not just use the regular UTF-8 encoding for these
values?
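
To show what I mean by the two encodings, here is a small sketch. It
is my understanding of the byte layout, not code taken from Emacs, and
the function names utf8_encode and raw_byte_encode are hypothetical.
The first function is the regular UTF-8 encoding, which rejects
anything above U+10FFFF (so 0x333333 is rejected). The second is the
special 2-byte form for a raw 8-bit byte, assuming the lead byte is C0
or C1 and the low six bits go into the trailing byte.

```c
#include <stddef.h>

/* Regular UTF-8 encoding of code point C into BUF (at least 4
   bytes).  Returns the number of bytes written, or 0 if C is above
   U+10FFFF.  Surrogates are not checked in this sketch.  */
static size_t
utf8_encode (unsigned long c, unsigned char *buf)
{
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (c >> 18));
      buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;  /* Outside the Unicode range.  */
}

/* Assumed layout of the special 2-byte form for a raw byte B in the
   range 0x80..0xFF: the lead byte is C0 or C1, so the sequence is an
   overlong form that valid UTF-8 never produces.  */
static size_t
raw_byte_encode (unsigned char b, unsigned char *buf)
{
  buf[0] = (unsigned char) (0xC0 | ((b >> 6) & 1));
  buf[1] = (unsigned char) (0x80 | (b & 0x3F));
  return 2;
}
```

Under these assumptions, the value 0xFF comes out as C3 BF in regular
UTF-8 but as C1 BF in the special form, so the two cannot be confused
with each other.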


