From: Jean Louis
Newsgroups: gmane.emacs.help
Subject: Re: Any faster way to find frequency of words?
Date: Sun, 9 May 2021 22:03:21 +0300
To: help-gnu-emacs@gnu.org
In-Reply-To: <87v97rzt1d.fsf@zoho.eu>

* Emanuel Berg via Users list for the GNU Emacs text editor [2021-05-09 21:01]:
> Jean Louis wrote:
>
> > I think that your (4) is not necessary, as counting is
> > not necessary.
>
> Some counting is if you are to learn the frequency.

Iterating and incrementing a value is not the same as counting. That
step is what first creates the word frequencies.

Counting could be useful when finding the most frequent words. But
even in that case a programmatic comparison of which value is greater
seems to be enough. Maybe the underlying C program is counting.

> BTW the theoretical worst-case would be a buffer where all
> words are unique. Buffer cost is almost 1, ultimately n.
> With the theoretical worst-case, data structure would be, if
> linear, like this

Thank heaven it is not the theoretical case; in practice it just finds
the frequencies of words in a few kilobytes of text.

For speedy searching by word frequencies I am using PostgreSQL with an
Emacs interface.

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

Sign an open letter in support of Richard M. Stallman
https://stallmansupport.org/
https://rms-support-letter.github.io/
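
To illustrate the distinction drawn in the message above, here is a minimal sketch. It is written in Python rather than Emacs Lisp and is not code from the thread; the function names `word_frequencies` and `most_frequent` and the `\w+` word pattern are assumptions made for the example. One pass increments a per-word value, and the most frequent word is then found purely by comparing those values.

```python
import re


def word_frequencies(text):
    """Map each lowercased word to how many times it occurs in text."""
    freqs = {}
    for word in re.findall(r"\w+", text.lower()):
        # Increment the value stored for this word (starting from 0).
        freqs[word] = freqs.get(word, 0) + 1
    return freqs


def most_frequent(freqs):
    """Return the (word, count) pair with the largest count, or None if empty."""
    best = None
    for word, n in freqs.items():
        # Only a "which is greater" comparison is needed here.
        if best is None or n > best[1]:
            best = (word, n)
    return best


freqs = word_frequencies("the cat and the hat and the bat")
print(most_frequent(freqs))
```

In the worst case mentioned in the quoted text, where every word is unique, the table simply ends up with one entry per word, each with the value 1.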