From: Eli Zaretskii
To: Imran Khan
Cc: 48734@debbugs.gnu.org
Subject: bug#48734: 28.0.50; Performance regression in `string-width`?
Date: Sun, 30 May 2021 09:42:29 +0300
Message-ID: <83o8cs4t9m.fsf@gnu.org>
In-Reply-To: <87a6odmfp6.fsf@teknik.io> (message from Imran Khan on Sun, 30 May 2021 02:45:57 +0600)

> From: Imran Khan
> Date: Sun, 30 May 2021 02:45:57 +0600
>
> A package I use (deft-mode) has been hanging for minutes with high cpu
> use recently.
> Profiler says most time is spent in `string-width`, and
> upon looking it seems to happen in files that have multibyte characters
> in them.
>
> I reproduced the problem by creating a file that has both single and
> multi byte characters:
>
>     with open("/tmp/test", "w") as f:
>         for i in range(50_000):
>             print("1", file=f, end="")
>             print("α", file=f, end="")
>
> And now:
>
>     (benchmark-run 1
>       (let ((str))
>         (with-temp-buffer
>           (insert-file-contents-literally "/tmp/test")
>           (setq str (buffer-string)))
>         (string-width str)))
>
> This takes 20 seconds in my machine (if string is exclusively full of
> either single or multibyte characters, weirdly it seems to finish
> instantly).

Since you use insert-file-contents-literally, why don't you also make
the temporary buffer unibyte?  That is:

  (benchmark-run 1
    (let ((str))
      (with-temp-buffer
        (set-buffer-multibyte nil) ; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
        (insert-file-contents-literally "/tmp/test")
        (setq str (buffer-string)))
      (string-width str)))

Or maybe I don't understand your real-life use case?  Because if you
treat the file as a raw bytestream, why do you need to compute the
width of its text?
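
To make the "raw bytestream" point concrete, here is a minimal sketch of
what the test file contains at the byte level.  It assumes the file was
written as UTF-8 (the reproduction script relies on the locale default,
so this is an assumption), and it rebuilds the data in memory rather than
touching /tmp/test.  The variable names are illustrative only.

```python
# Rebuild the reporter's test data in memory (assumption: UTF-8 on disk).
text = "1α" * 50_000
raw = text.encode("utf-8")

n_chars = len(text)                    # characters after decoding as UTF-8
n_bytes = len(raw)                     # bytes seen by a literal, undecoded read
n_high = sum(b >= 0x80 for b in raw)   # bytes outside the ASCII range

print(n_chars, n_bytes, n_high)
```

Under that assumption each "α" occupies two bytes, so a literal read
yields more elements than the file has characters, and the non-ASCII
ones are bytes rather than letters.  A width computed over them is a
width of bytes, not of the text as it would be displayed.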