From: Xah Lee
Newsgroups: gmane.emacs.help
Subject: Re: processing a large buffer contents to a hash table
Date: Fri, 9 Jan 2009 09:47:06 -0800 (PST)
Message-ID: <84abced6-c9c2-4c8b-93f1-24174287060b@y1g2000pra.googlegroups.com>
To: help-gnu-emacs@gnu.org

On Jan 9, 8:56 am, Seweryn Kokot wrote:
> Hello,
>
> I'm trying to write a function in elisp which returns word frequency of
> a buffer content. It works but for a large file
> (around 250 000 words) it takes 15 seconds, while a similar function in python
> takes 4s.
>
> Here is the function which process a buffer word by word and write word
> frequency to a hash table.
>
> (defun word-frequency-process-buffer ()
>   (interactive)
>   (let ((buffer (current-buffer)) bounds beg end word)
>     (save-excursion
>       (goto-char (point-min))
>       (while (re-search-forward "\\<[[:word:]]+\\>" nil t)
>         ;; (while (forward-word 1)
>         (setq bounds (bounds-of-thing-at-point 'word))
>         (setq beg (car bounds))
>         (setq end (cdr bounds))
>         (setq word (downcase (buffer-substring-no-properties beg end)))
>         (word-frequency-incr word)
>         ))))
>
> The main function is word-frequency which operates on the current buffer
> and gives *frequencies* with word statistics.
>
> Any idea how to optimize `word-frequency-process-buffer' function?
>

You're doing many unnecessary things.

You don't need save-excursion, and you don't need to set all those
boundary variables. re-search-forward already records the match, so you
can just use match-string to capture the word and feed it to
word-frequency-incr.

...

  Xah
∑ http://xahlee.org/

☄
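
The advice above could be sketched roughly as follows. This is only an illustration, not code from the thread: the original word-frequency-incr and its hash table are not shown, so the word-frequency-table variable and the word-frequency-incr body here are hypothetical stand-ins. The sketch also uses match-string-no-properties, a variant of the match-string function named in the reply, so that text properties are not carried into the hash keys.

```elisp
;; Hypothetical stand-in for the poster's table; the original is not shown.
(defvar word-frequency-table (make-hash-table :test 'equal)
  "Maps each downcased word to the number of times it was seen.")

;; Assumed behavior of the poster's helper: add one to WORD's count.
(defun word-frequency-incr (word)
  "Increment the count stored for WORD in `word-frequency-table'."
  (puthash word
           (1+ (gethash word word-frequency-table 0))
           word-frequency-table))

(defun word-frequency-process-buffer ()
  "Count every word in the current buffer into `word-frequency-table'.
Leaves point at the end of the last match, since `save-excursion'
is dropped as the reply suggests."
  (interactive)
  (goto-char (point-min))
  ;; The search already records the match, so group 0 is the word itself;
  ;; no `bounds-of-thing-at-point' call is needed per iteration.
  (while (re-search-forward "\\<[[:word:]]+\\>" nil t)
    (word-frequency-incr (downcase (match-string-no-properties 0)))))
```

After running it in a buffer, (gethash "the" word-frequency-table) gives the count for "the". Counts accumulate across runs unless the table is emptied first with (clrhash word-frequency-table). If moving point matters to you, wrapping the body in save-excursion again costs little, because it runs once rather than once per word.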