From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Pip Cet Newsgroups: gmane.emacs.devel Subject: Re: MPS: marker-vector Date: Mon, 05 Aug 2024 21:14:13 +0000 Message-ID: <87bk26vfj1.fsf@protonmail.com> References: <87le2hp6ug.fsf@localhost> <87v81455iw.fsf@gmail.com> <87jzh8djdt.fsf@gmail.com> <87a5hqsq2l.fsf_-_@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="22883"; mail-complaints-to="usenet@ciao.gmane.io" Cc: =?utf-8?Q?Gerd_M=C3=B6llmann?= , Stefan Monnier , Ihor Radchenko , emacs-devel@gnu.org To: Helmut Eller Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Tue Aug 06 04:25:45 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sb9tQ-0005iI-Kv for ged-emacs-devel@m.gmane-mx.org; Tue, 06 Aug 2024 04:25:44 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sb9sk-0007F0-If; Mon, 05 Aug 2024 22:25:02 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sb527-0001xf-M9 for emacs-devel@gnu.org; Mon, 05 Aug 2024 17:14:23 -0400 Original-Received: from mail-4316.protonmail.ch ([185.70.43.16]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sb525-0000pR-4d; Mon, 05 Aug 2024 17:14:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1722892457; x=1723151657; bh=8m4pUwHO/RJTFx/B2YdKh2lM+qaNF1Xsw9763QwfnEk=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=X6I+wIW3emHxS/N3HdbIDemXW2LkVTtqqSQxe0EXFtz8F3GvuLaWzl4GQw+i/kiGz jRgYZx6MihStBqtNGnpzZ1LQ5pu9YC9ZDambhrj2vOI9VcUFnEj9B5b6pgSgruACwy 3/f3KCbdIu+2CNAmgm/ZY8XAXfQLzvTNXo4Zr2lGsJTFn0ffVlY208X82+q/5EHBOY KBTYSSnWG4D1BuKdtkjwyia7mYreUsoqFUQiCy6ds9PY7qcwBpJQSCMNseXhsgmint jsdz4NIN/CFQOUWN32xZTU4cryGnVUB/aj4dusBc4CXpfMDt1d3JBnFdEtkBR2i0mE yDJjNqoiScWDg== In-Reply-To: <87a5hqsq2l.fsf_-_@gmail.com> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: 750fdc0c402afd44be07922e81b0d5c184eed4dd Received-SPF: pass client-ip=185.70.43.16; envelope-from=pipcet@protonmail.com; helo=mail-4316.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Mon, 05 Aug 2024 22:25:00 -0400 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:322419 Archived-At: "Helmut Eller" writes: > What would you think about changing the marker-vector-with-free-list to > a mundane growable array like in the patch below? > The main motivation for this would that we could then iterate in > reverse, or more precisely, in an order that is more like the order used > for the linked-list-of-markers. This is relevant for the heuristic in > buf_charpos_to_bytepos that creates a temporary marker; it works better > if those temporary markers are visited first. Thanks for investigating this! I finally understand why the current MPS code is slower now, I think. I wonder whether we couldn't reuse more of the weak hash table code for this, though... > This replaces the vector-with-free-list by a growable vector, i.e. the > free entries are always kept at the end of the vector. I don't think that's entirely accurate; it's quite possible for an entry at the beginning of the array to be splatted and remain untouched until the next time a DO_MARKERS reaches it, which may take a long time. I see one somewhat theoretical problem with this patch: if a marker is simultaneously kept in a weak hash table, it's possible for it to be splatted from the marker vector while remaining in the weak hash table (there's no guarantee all references will be splatted at the same time); if it is then retrieved from the weak hash table and made to point nowhere, we will try to remove it from the marker vector, and hit the igc_assert. The rest of my comments are tiny nits, really: - capacity isn't redundant on 32-bit systems - I'd prefer the marker index to be signed; if it is unsigned, we don't nee= d to assert it's >=3D 0, and assigning -1 to it confused me... - you shouldn't compare Lisp_Objects with =3D=3D - I'd prefer checking for splatted elements before deciding to grow the vec= tor, if we can do so efficiently - I find XFIXNAT easier to read when the number is guaranteed to be nonnega= tive - using alloca is problematic for large vectors (which shouldn't be dumped,= thus a nit) > Using Stefan's elb-bytechar benchmark, I get for the > linked-list-of-markers: > > | test || tot avg (s) | tot avg err (s) | > |------------------------++-------------+-----------------| > | bytechar || 11.80 | 0.00 | > | bytechar-100k || 11.85 | 0.00 | > | bytechar-100k-nolookup || 9.19 | 0.00 | > | bytechar-100k-random || 16.73 | 0.02 | > | bytechar-100k-rev || 11.86 | 0.00 | > | bytechar-10k-random || 12.36 | 0.01 | > | bytechar-1k-random || 11.93 | 0.00 | > | bytechar-nolookup || 9.15 | 0.00 | > |------------------------++-------------+-----------------| > | total || 94.88 | 0.02 | > > for the vector-with-free-list: > > | test || tot avg (s) | tot avg err (s) | > |------------------------++-------------+-----------------| > | bytechar || 11.63 | 0.01 | > | bytechar-100k || 11.91 | 0.37 | > | bytechar-100k-nolookup || 8.80 | 0.01 | > | bytechar-100k-random || 248.07 | 3.84 | > | bytechar-100k-rev || 11.71 | 0.02 | > | bytechar-10k-random || 35.24 | 0.53 | > | bytechar-1k-random || 14.01 | 0.06 | > | bytechar-nolookup || 8.69 | 0.13 | > |------------------------++-------------+-----------------| > | total || 350.06 | 3.89 | > > and for the growable array: > > | test || tot avg (s) | tot avg err (s) | > |------------------------++-------------+-----------------| > | bytechar || 11.34 | 0.08 | > | bytechar-100k || 11.59 | 0.47 | > | bytechar-100k-nolookup || 8.78 | 0.12 | > | bytechar-100k-random || 16.17 | 0.33 | > | bytechar-100k-rev || 11.31 | 0.03 | > | bytechar-10k-random || 11.76 | 0.01 | > | bytechar-1k-random || 11.34 | 0.08 | > | bytechar-nolookup || 8.70 | 0.09 | > |------------------------++-------------+-----------------| > | total || 91.00 | 0.61 | Hard to argue with those numbers :-) I wonder whether it wouldn't be faster, upon encountering a marker that has been splatted, to fix the entire array all at once. That would ensure that creation order is respected, and splatting is relatively rare (and, when splatted, we can expect most of the array to have been splatted; indeed, I suspect it'd be best to give up on the marker vector and build a new one so the old one can be collected and we don't have to worry about never shrinking it). Pip