From: Emanuel Berg via Users list for the GNU Emacs text editor
Newsgroups: gmane.emacs.help
Subject: Re: Any faster way to find frequency of words?
Date: Sun, 09 May 2021 20:00:30 +0200
Message-ID: <87v97rzt1d.fsf@zoho.eu>
References: <87mtt40x2n.fsf@zoho.eu>
To: help-gnu-emacs@gnu.org

Jean Louis wrote:

> I think that your (4) is not necessary, as counting is
> not necessary.

Some counting is necessary if you are to learn the frequency.

How about using `forward-word' to walk the whole buffer and
feeding every word to a data structure which keeps a record
per word with a counter, and increasing that counter by 1?

Then the challenge would be to pick a data structure where
searching is fast and, in particular, where search time
doesn't _grow_ fast as its overall size grows (size = the
number of unique words).

BTW the theoretical worst case would be a buffer where all
words are unique.

Buffer cost is almost 1 per word, ultimately n for the
whole buffer.

In the theoretical worst case the data structure cost would
be, if lookup is linear, like this, if we denote

  buffer cost : data structure cost

  1 : 0       <-- first word
  1 : 1
  1 : 2
  1 : 3
  ..
  1 : n - 1   <-- last word

Linear per lookup! (Summed over all n words, that is
quadratic.)

But probably the data structure cost is less than linear,
say logarithmic; then we would have

  linear(n) + n * logarithmic(n)

The second term grows only slightly faster than linear(n),
so the total stays close to linear in practice.

With any such data structure, it'll be fast enough!

-- 
underground experts united
https://dataswamp.org/~incal
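
The idea in this post (walk the text word by word and bump a
per-word counter in a lookup structure) can be sketched as below.
This is only an illustration under assumptions, not code from the
thread: it is written in Python rather than Emacs Lisp, the name
word_frequencies is made up for the example, a "word" is assumed
to be a run of \w characters, and words are lowercased before
counting. The Counter is hash-based, so each lookup is expected
constant time, which is even better than the logarithmic case
discussed in the post. In Emacs Lisp the same roles would
presumably be played by `forward-word' and a hash table.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in TEXT in a single pass.

    Assumption: a word is a run of \\w characters, compared
    case-insensitively.
    """
    counts = Counter()
    # One step per word, like moving through a buffer word by word.
    for match in re.finditer(r"\w+", text):
        # Hash-based lookup and increment for this word.
        counts[match.group().lower()] += 1
    return counts


sample = "the cat and the hat and the bat"
freq = word_frequencies(sample)
print(freq.most_common(2))  # [('the', 3), ('and', 2)]
```

In the worst case from the post, where every word is unique, the
table simply ends up with one entry per word, each with count 1.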