From: Herman, Géza
Newsgroups: gmane.emacs.devel
Subject: Re: I created a faster JSON parser
Date: Fri, 08 Mar 2024 16:20:40 +0100
Message-ID: <875xxw3f3a.fsf@gmail.com>
References: <87a5n96mb5.fsf@gmail.com> <861q8l0w2c.fsf@gnu.org> <878r2s99j0.fsf@gmail.com> <86y1aszxom.fsf@gnu.org> <874jdg97xm.fsf@gmail.com> <86ttlgzuew.fsf@gnu.org>
In-reply-to: <86ttlgzuew.fsf@gnu.org>
To: Eli Zaretskii
Cc: Géza Herman, emacs-devel@gnu.org

Eli Zaretskii writes:

>> From: Herman, Géza
>> Cc: Géza Herman, emacs-devel@gnu.org
>> Date: Fri, 08 Mar 2024 14:12:19 +0100
>
> The following is based on an initial reading of the patch:
>
>  . Redundant braces (for blocks of a single code line) is one issue.
>  . The way you break a long line at the equals sign '=' is another (we
>    break after '=', not before).

I used clang-format to format my code (I use a completely different
coding style).  I see that clang-format is configured this way in
Emacs.  Shouldn't BreakBeforeBinaryOperators be set to None or
NonAssignment in .clang-format?

>  . The code which handles integers seems to assume that 'unsigned long'
>    is a 64-bit type?  if so, this is not true on Windows; please see how
>    we handle this elsewhere in Emacs, in particular in the
>    WIDE_EMACS_INT case.

That was a mistake on my part, though a different (but similar) one.
I originally used a 64-bit type, but then changed it to long because
of 32-bit architectures.  The idea is to use a type which likely has
the same size as a CPU register.  So I think long is OK; I just need
to change the thresholds to ULONG_MAX.  Or I think I'll use the ckd_*
functions, as Collin suggested.

> A more general comment is that you seem to be parsing buffer text
> assuming it's UTF-8?  If so, this is not accurate, as the internal
> representation is a superset of UTF-8, and can represent characters
> above 0x10FFFF.

When does a buffer have characters above 0x10ffff?  I assumed that a
JSON document shouldn't contain characters that are out of range.  But
if the solution is to just remove the upper-range comparison, I can do
that easily.

>> Can you please send me the necessary documents?
>
> Sent off-list.

Thanks!
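
For illustration, here is a minimal sketch of the ULONG_MAX threshold
idea discussed above.  It is not taken from the patch: the function
name parse_unsigned_digits and its signature are hypothetical, and it
only shows one way to accumulate decimal digits into an 'unsigned
long' without assuming that the type is 64 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: accumulate the decimal
   digits in S[0..LEN) into *RESULT.  Return false if a non-digit is
   seen or if the value would not fit in 'unsigned long', whatever its
   width on the current platform.  An empty range yields 0.  */
static bool
parse_unsigned_digits (const char *s, size_t len, unsigned long *result)
{
  assert (s != NULL && result != NULL);
  unsigned long value = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (s[i] < '0' || s[i] > '9')
        return false;
      unsigned long digit = (unsigned long) (s[i] - '0');
      /* value * 10 + digit exceeds ULONG_MAX exactly when
         value > (ULONG_MAX - digit) / 10, so test before computing.  */
      if (value > (ULONG_MAX - digit) / 10)
        return false;
      value = value * 10 + digit;
    }
  *result = value;
  return true;
}
```

With the ckd_* approach, the manual threshold test would presumably be
replaced by ckd_mul followed by ckd_add from <stdckdint.h>, each of
which reports overflow through its return value.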