From: Eric Abrahamsen
Newsgroups: gmane.emacs.help
Subject: Re: Most used words in current buffer
Date: Sat, 21 Jul 2018 21:00:36 -0700
Message-ID: <87bmazrkbf.fsf@ericabrahamsen.net>
References: <861sc1iu1m.fsf@zoho.com> <87pnzkcgna.fsf@bsb.me.uk> <20180719140935156302029@bob.proulx.com> <87in57rkg6.fsf@ericabrahamsen.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
To: help-gnu-emacs@gnu.org

Eric Abrahamsen writes:

> Udyant Wig writes:
>
>> On 07/21/2018 09:45 PM, Eric Abrahamsen wrote:
>>> Interesting... In general I think Emacs is highly optimized to use the
>>> buffer as its textual data structure, more so than a string.
>>> Particularly when the code is compiled (many of the text-movement
>>> commands have opcodes). I made the following two commands to collect
>>> words from a novel in an Org file, and the one that uses
>>> `forward-word' and `buffer-substring' runs around twice as fast as the
>>> `split-string'.
>>>
>>> Of course, they don't collect the same list of words! But even if you
>>> add more code for trimming, etc., it will still likely be faster than
>>> operating on a string.
>>> [snip code]
>>
>> I have acted upon the advice (yours and Stefan Monnier's) to operate on
>> the buffer directly using BUFFER-SUBSTRING. Please see my follow up to
>> Stefan's message.
>>
>> BUFFER-SUBSTRING did gain me (somewhat) better performance.
>
> As Stefan said, going character by character is going to be slow... But
> my example with `forward-word' collects a lot of cruft. So I would
> suggest doing what `forward-word' does internally and move by syntax.

Actually I think alternating `forward-word' with `forward-to-word' might do the exact same thing as alternating (skip-syntax-forward "w") with (skip-syntax-forward "^w"), and might get you some extra... stuff. Maybe worth benchmarking!
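
For reference, here is a minimal sketch of the `skip-syntax-forward' alternation, assuming a "word" is simply a maximal run of word-syntax characters under the buffer's current syntax table. The function name `my/collect-words-by-syntax' is made up for illustration; it is not the code that was snipped earlier in the thread.

```elisp
;; Hypothetical helper for illustration only, not code from this thread.
(defun my/collect-words-by-syntax ()
  "Return a list of the words in the current buffer, in buffer order.
A word here is a maximal run of characters with word syntax."
  (save-excursion
    (goto-char (point-min))
    (let (words)
      ;; Skip any leading non-word characters.
      (skip-syntax-forward "^w")
      (while (not (eobp))
        (let ((start (point)))
          ;; Move over one run of word-constituent characters...
          (skip-syntax-forward "w")
          (push (buffer-substring-no-properties start (point)) words))
        ;; ...then over the non-word characters that follow it.
        (skip-syntax-forward "^w"))
      (nreverse words))))

;; Usage in a scratch buffer (fundamental mode, standard syntax table):
(with-temp-buffer
  (insert "The quick, brown fox -- the end.")
  (my/collect-words-by-syntax))
;; => ("The" "quick" "brown" "fox" "the" "end")
```

To compare this against a `forward-word' / `forward-to-word' version, something like (benchmark-run 10 (my/collect-words-by-syntax)) in the novel's buffer should do; byte-compile both functions first so the comparison is fair. Results will depend on the major mode's syntax table, since that decides what counts as a word character.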