From: Udyant Wig
Newsgroups: gmane.emacs.help
Subject: Re: Most used words in current buffer
Date: Thu, 19 Jul 2018 11:03:45 +0530
Organization: A noiseless patient Spider
References: <861sc1iu1m.fsf@zoho.com> <87pnzkcgna.fsf@bsb.me.uk>
To: help-gnu-emacs@gnu.org

On 07/19/2018 06:15 AM, Bob Proulx wrote:
> Not wanting to be too annoying but I see no hashing in the awk
> solution.  It is using an awk associative array to store the words.
> Perl and Python call those "hashes" but they are just associative
> arrays.

I understand that associative arrays in awk are built upon hashing.
Kernighan and Pike say

    The implementation of associative memory uses a hashing scheme to
    ensure that access to any element takes about the same time as to
    any other, and that (at least for moderate array sizes) the time
    doesn't depend on how many elements are in the array.

However, on the previous page, in introducing the language construct,
they do use the name _associative array_.

> I will continue to be contrary here and say that awk does a much
> better job of cutting by whitespace separated fields than does cut.
> Both are standard and should be available everywhere.  And here
> because awk is already in use I expect it to be somewhat more
> efficient to use awk again in the pipeline than to use a different
> program.
>
> I also wish to improve the command line somewhat.  Using $* by itself
> does not sufficiently quote program arguments with whitespace.  One
> should use "$@" for that purpose.  Also the old forms of sort and head
> would be better left behind and use the new portable option set for
> them instead.  Let me suggest:
>
>   ' "$@" | sort -k2,2nr | head -n10 | awk '{ print $1 }'
>
> Bob

Thank you for the portable pipeline.
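
For reference, here is a minimal sketch of how such a pipeline might be wrapped up as shell functions. The awk program is only a stand-in written for illustration (it is neither the script from the thread nor the one from the book), and the function names wordfreq and topwords are hypothetical. It assumes that words are whitespace-separated fields and that case should be folded.

```shell
#!/bin/sh

# wordfreq: hypothetical stand-in that prints "word count" pairs.
# The associative array count[] is indexed by the word itself.
wordfreq() {
    awk '
        {
            # Treat every whitespace-separated field as a word,
            # folding case so "The" and "the" are counted together.
            for (i = 1; i <= NF; i++)
                count[tolower($i)]++
        }
        END {
            # Traversal order of "for (w in count)" is unspecified,
            # so the ordering is left to sort further down the pipe.
            for (w in count)
                print w, count[w]
        }
    ' "$@"    # "$@" keeps file names containing whitespace intact
}

# topwords: the ten most frequent words, one per line.
# sort -k2,2nr orders numerically on the count field, largest first;
# the final awk keeps only the word column.
topwords() {
    wordfreq "$@" | sort -k2,2nr | head -n10 | awk '{ print $1 }'
}
```

With these definitions, something like `topwords "my notes.txt"` should list the most frequent words of that file, and the space in the name survives because of "$@".
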
It is interesting to compare Bob's pipeline with the one given in the
book:

    $ wordfreq ch4.* | sort +1 -nr | sed 20q | 4

where wordfreq is the awk script proper, 4 is a shell script that
prints its input in 4 columns, and sed 20q does the equivalent of
head -20.  On the last point, they say that given the ease of typing a
sed command, they felt no need to write the program head.

Udyant Wig
-- 
We make our discoveries through our mistakes: we watch one another's
success: and where there is freedom to experiment there is hope to
improve.
                -- Arthur Quiller-Couch