From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp11.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id EMXBEztjH2KDQwAAgWs5BA (envelope-from ) for ; Wed, 02 Mar 2022 13:29:47 +0100 Received: from aspmx1.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp11.migadu.com with LMTPS id EOsgETtjH2LrWgAA9RJhRA (envelope-from ) for ; Wed, 02 Mar 2022 13:29:47 +0100 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id C81D63FDC7 for ; Wed, 2 Mar 2022 13:29:46 +0100 (CET) Received: from localhost ([::1]:49230 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nPO6y-0005bF-MU for larch@yhetil.org; Wed, 02 Mar 2022 07:29:44 -0500 Received: from eggs.gnu.org ([209.51.188.92]:47176) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nPO0f-0002rn-1L for emacs-orgmode@gnu.org; Wed, 02 Mar 2022 07:23:13 -0500 Received: from ciao.gmane.io ([116.202.254.214]:55816) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nPO0d-0008Tb-DB for emacs-orgmode@gnu.org; Wed, 02 Mar 2022 07:23:12 -0500 Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1nPO0a-0006x6-BJ for emacs-orgmode@gnu.org; Wed, 02 Mar 2022 13:23:08 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: emacs-orgmode@gnu.org From: Max Nikulin Subject: Re: profiling latency in large org-mode buffers (under both main & org-fold feature) Date: Wed, 2 Mar 2022 19:23:02 +0700 Message-ID: References: <87fsobpism.fsf@localhost> <87r17to817.fsf@localhost> <87lexy2hrz.fsf@localhost> <87y21wkdwu.fsf@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0 Content-Language: en-US In-Reply-To: <87y21wkdwu.fsf@localhost> Received-SPF: pass client-ip=116.202.254.214; envelope-from=geo-emacs-orgmode@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: 28 X-Spam_score: 2.8 X-Spam_bar: ++ X-Spam_report: (2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_ADSP_CUSTOM_MED=0.001, FORGED_GMAIL_RCVD=1, FORGED_MUA_MOZILLA=2.309, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, NICE_REPLY_A=-0.001, NML_ADSP_CUSTOM_MED=0.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN X-Migadu-Country: US ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1646224187; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=fiRAGcsrp+yYEGFx4tPBEN3oZ4AvjzSDTxFzuf+9QYw=; b=tWXYcnCufK4OsbdGfcQ+m2NAHlFpoCG5MBV9AfvhwAhOjKl/1cRYEzOw+6O+N39z35qJx8 BrusGlZiS6cDM2pj7v5V/iyOW7FNye7iCzVgeJcq5E4CZ7pkdkc0mTA/nmsXe+btpgP6Dl lVyjBGh8MbpLNiPK/Tu1LBKcBCemI9f09BNNamTXpZL9aJGo6IwbrJot3gDyYqEDSzyUFi z2lUxBcyntN9cZzS1Qs+AwwZZMqKT15P5loSpmU33u+HvHmYuCVsjExO9jyYLM0jXo6jOs cKfECAB+alQmsGWbVOv1bUh9TPzGXhBEqjgJZcPWqpgoo36CjPVInMuhp/s0rA== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1646224187; a=rsa-sha256; cv=none; b=DhJJM4s6qCjdM5sQ9hUoOiQUFbu3IY0ZXIvr+epwSv7GG22CFe4FmFmmjsCxxkz9JcYtA4 XT8y4X+1B/vIQnaLbot17E6L7H6xCR/I/uIRw/MDPrNPuBUCYsieYSX0xeS9/C1uKHVeK3 TuL++RmIF5PaatobYvaZajxZKuchQSMjcFgMbpEgnLAjU9PDa4UiPaCSS4q7L44/9etYtK F++PwzpPvG0XRoiuYwQhAWbOu5dxDDTA5Ds7r+430jenfHflv9tSC3FVS5nMLv0c06jKJs M/mUPf3BTO+hwpy5RAbwB3Q63GflcDUTv1YHxy0XQAdRhNqSKpA1giekAeO8Kg== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of "emacs-orgmode-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="emacs-orgmode-bounces+larch=yhetil.org@gnu.org" X-Migadu-Spam-Score: 2.42 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of "emacs-orgmode-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="emacs-orgmode-bounces+larch=yhetil.org@gnu.org" X-Migadu-Queue-Id: C81D63FDC7 X-Spam-Score: 2.42 X-Migadu-Scanner: scn1.migadu.com X-TUID: QRymXsZX8IaX On 27/02/2022 13:43, Ihor Radchenko wrote: > > Now, I did an extended profiling of what is happening using perf: > > 6.20% [.] buf_bytepos_to_charpos Maybe I am interpreting such results wrongly, but it does not look like a bottleneck. Anyway thank you very much for such efforts, however it is unlikely that I will join to profiling in near future. > buf_bytepos_to_charpos contains the following loop: > > for (tail = BUF_MARKERS (b); tail; tail = tail->next) > { > CONSIDER (tail->bytepos, tail->charpos); > > /* If we are down to a range of 50 chars, > don't bother checking any other markers; > scan the intervening chars directly now. */ > if (best_above - bytepos < distance > || bytepos - best_below < distance) > break; > else > distance += BYTECHAR_DISTANCE_INCREMENT; > } > > I am not sure if I understand the code correctly, but that loop is > clearly scaling performance with the number of markers I may be terribly wrong, but it looks like an optimization attempt that may actually ruin performance. My guess is the following. Due to multibyte characters position in buffer counted in characters may significantly differ from index in byte sequence. Since markers have both values bytepos and charpos, they are used (when available) to narrow down initial estimation interval [0, buffer size) to nearest existing markers. The code below even creates temporary markers to make next call of the function faster. It seems, buffers do not have any additional structures that track size in bytes and in characters of spans (I would not expect that representation of whole buffer in memory is single contiguous byte array). When there are no markers at all, the function has to iterate over each character and count its length. The problem is that when the buffer has a lot of markers far aside from the position passed as argument, then iteration over markers just consumes CPU with no significant improvement of original estimation of boundaries. If markers were organized in a tree than search would be much faster (at least for buffers with a lot of markers. In some cases such function may take a hint: previous known bytepos+charpos pair. I hope I missed something, but what I can expect from the code of buf_bytepos_to_charpos is that it is necessary to iterate over all markers to update positions after each typed character. > Finally, FYI. I plan to work on an alternative mechanism to access Org > headings - generic Org query library. It will not use markers and > implement ideas from org-ql. org-refile will eventually use that generic > library instead of current mechanism. I suppose that markers might be implemented in an efficient way, and much better performance may be achieved when low-level data structures are accessible. I am in doubts concerning attempts to create something that resembles markers but based purely on high-level API.