From: Eric Abrahamsen
Newsgroups: gmane.emacs.help
Subject: Re: Most used words in current buffer
Date: Sat, 21 Jul 2018 20:57:45 -0700
Message-ID: <87in57rkg6.fsf@ericabrahamsen.net>
To: help-gnu-emacs@gnu.org

Udyant Wig writes:

> On 07/21/2018 09:45 PM, Eric Abrahamsen wrote:
>> Interesting... In general I think Emacs is highly optimized to use the
>> buffer as its textual data structure, more so than a string.
>> Particularly when the code is compiled (many of the text-movement
>> commands have opcodes). I made the following two commands to collect
>> words from a novel in an Org file, and the one that uses
>> `forward-word' and `buffer-substring' runs around twice as fast as the
>> `split-string'.
>>
>> Of course, they don't collect the same list of words! But even if you
>> add more code for trimming, etc., it will still likely be faster than
>> operating on a string.
>> [snip code]
>
> I have acted upon the advice (yours and Stefan Monnier's) to operate on
> the buffer directly using BUFFER-SUBSTRING. Please see my follow up to
> Stefan's message.
>
> BUFFER-SUBSTRING did gain me (somewhat) better performance.

As Stefan said, going character by character is going to be slow... But
my example with `forward-word' collects a lot of cruft. So I would
suggest doing what `forward-word' does internally and move by syntax.
This also opens up the possibility of tweaking the behavior of your
function (i.e., what constitutes a word) by setting temporary syntax
tables.

Here's a word scanner that only picks up actual words (according to the
default syntax table):

(defun test-buffer (&optional f)
  (let ((file (or f "/home/eric/org/hollowmountain.org"))
        pnt lst)
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      ;; Skip any leading non-word characters.
      (skip-syntax-forward "^w")
      (setq pnt (point))
      ;; `skip-syntax-forward' returns the distance moved, which is
      ;; always non-nil, so the loop ends only at end of buffer.
      (while (and (null (eobp))
                  (skip-syntax-forward "w"))
        ;; PNT..point now spans exactly one word.
        (push (buffer-substring pnt (point)) lst)
        ;; Move to the start of the next word.
        (skip-syntax-forward "^w")
        (setq pnt (point))))
    (nreverse lst)))
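
To illustrate the point about temporary syntax tables, here is a minimal
sketch of the same scanner parameterized by a syntax table. The names
`my-scan-words' and `my-apostrophe-word-table' are hypothetical and not
from the message above, and the sketch scans a string rather than a file
only so that it is self-contained. It assumes the standard syntax table
treats the apostrophe as punctuation rather than as a word constituent.

```elisp
;; Hypothetical helpers for illustration; not part of the original post.

(defun my-scan-words (string &optional table)
  "Return the list of words in STRING, in order of appearance.
Word boundaries are decided by syntax TABLE, which defaults to
the standard syntax table."
  (let (pnt lst)
    (with-temp-buffer
      (insert string)
      (goto-char (point-min))
      ;; The table is only in effect inside this form.
      (with-syntax-table (or table (standard-syntax-table))
        (skip-syntax-forward "^w")
        (setq pnt (point))
        (while (not (eobp))
          (skip-syntax-forward "w")
          (push (buffer-substring pnt (point)) lst)
          (skip-syntax-forward "^w")
          (setq pnt (point)))))
    (nreverse lst)))

(defun my-apostrophe-word-table ()
  "Return a copy of the standard syntax table where ' is a word character."
  (let ((table (copy-syntax-table (standard-syntax-table))))
    (modify-syntax-entry ?' "w" table)
    table))
```

With the standard table, (my-scan-words "I don't know") should split the
contraction into "don" and "t"; passing (my-apostrophe-word-table) as the
second argument should keep "don't" as a single word. Because the table
is copied before being modified, the standard syntax table itself is
left untouched.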