From: Bob Proulx
Newsgroups: gmane.emacs.help
Subject: Re: Most used words in current buffer
Date: Thu, 19 Jul 2018 14:42:36 -0600
Message-ID: <20180719140935156302029@bob.proulx.com>
References: <861sc1iu1m.fsf@zoho.com> <87pnzkcgna.fsf@bsb.me.uk>
To: help-gnu-emacs@gnu.org

Udyant Wig wrote:
> Bob Proulx wrote:
> > Hmm... I think looking behind the abstract data type at how it might
> > or might not be implemented is a stretch to say the least. The entire
> > reason it is an abstract data type is to hide those types of
> > implementation details. :-)
>
> Indeed. The principle generalizes to abstract functionality as well,
> doesn't it? E.g. qsort in the C library may or may not be an
> implementation of quick sort; or closer to the topic of the newsgroup,
> one ought not to care about the actual algorithm of the SORT function in
> Emacs.

Yes! That is it exactly!

> > The naming of things usually says more about the person that named the
> > thing than the thing itself. Associative arrays is a naming that
> > reflects the concepts involved in what it does. This is the same as
> > when someone else names it a map table, or a dictionary. Those are
> > all the same thing. Just using different names because people took
> > different paths to get there.
>
> Yes.
> Just as, arguably, vectors are a special case of the general
> concept of arrays, though the terms are commonly used to name the same
> thing.

Agree completely.

> > For such things I generally prefer balanced tree structures because
> > work is amortized instead of lumped. But the important point here is
> > that for every algorithm + data structure there is a trade-off of some
> > sort between one thing and another thing.
>
> Hmm. I had written a tree version of the word counter I had mentioned
> before. I had stumbled upon the AVL tree package in Emacs and thought I
> might try using it. This tree-based attempt turned out to be slower
> than my straightforward hashing solution.
>
> I have no doubts this code could be written better by someone more
> experienced than I.

I don't know if the AVL package you used was implemented in elisp or
in C or otherwise. And even though I am a long time user of emacs I
have never acquired the elisp skill to the same level as other
languages and therefore can't comment on that part. But I know that
when people have implemented such data structures in Perl, the result
has never been as fast and efficient as a native C version. If the
AVL package is written in elisp then that may easily account for the
performance difference.

Also, the native implementations of "hashes" in awk, perl, python, and
ruby are quite optimized and very fast. They have had years of eyes
and tweaking upon them.

> > I am in total agreement over using sed instead of head if you want to
> > do that. Seeing 'sed 20q' should roll off the keyboard as print lines
> > until line 20 and then quit. Very simple and to the point. There is
> > definitely no need for a separate head command. Other than for
> > symmetry with tail which is not as simple in sed.
>
> I see that. You could implement head on top of sed if you wanted to. I
> myself have been using head for long enough for its stated purpose that
> grasping a sed equivalent was not immediately obvious.
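
To make the head/sed equivalence quoted above concrete, here is a
minimal sketch. It assumes a POSIX-style shell with head and sed, plus
the seq utility to generate sample input (seq is an assumption for
illustration only; any source of lines would do). The variable names
are hypothetical.

```shell
# Sample input: the numbers 1..100, one per line (assumes seq exists).
# Take the first 20 lines two ways and compare the results.
first_by_head=$(seq 1 100 | head -n 20)
first_by_sed=$(seq 1 100 | sed 20q)

if [ "$first_by_head" = "$first_by_sed" ]; then
    echo "same output"
else
    echo "different output"
fi
```

sed prints every line by default, and the q command makes it quit once
line 20 has been printed, which is why the two pipelines agree.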

Writing clear code that can be understood immediately by the entire
range of programmer skill is important in my not so humble opinion.
One shouldn't need to be a master programmer to understand what has
been written. Seeing "head -n40" is not going to confuse anyone.
Therefore I usually use 'head' instead of "sed 40q", specifically for
the clarity of it to everyone, even though I could remove 'head'
entirely from my system if I were to uniformly implement one in terms
of the other. Clarity is more important.

And before someone mentions performance let me remind everyone that we
are talking about shell scripts. In a shell script clarity is more
important than performance. Always. If the resulting shell script has
a performance problem then choosing a better algorithm will almost
certainly be the better solution. And if not, then choosing a
different language more efficient at the task is next.

I do expect some skill to be learned with 'awk' however. It is so
very useful that seeing "awk '{print$1}'" should not be confusing: it
prints the first field. Likewise '{print$NF}' is a common idiom for
printing the last field. (NF is the Number of Fields in the line that
was split by whitespace. $NF is therefore the last field. If NF is 5
then $NF is saying $5 and therefore always the last field of the
line.) A little bit of awk learning pays back a large return on the
investment.

> These things do take time to gain currency, don't they? Under Linux,
> for example, the ip set of commands has been named the successor to
> ifconfig, and it too is taking time to diffuse into general knowledge.

Yes. And 'ip' is an excellent example! Even I have converted to
using ip and the iproute2 family instead of ifconfig. One thing to
note about the iproute2 family is that it is reasonably well written.
We are not forced to use it.
Instead we are attracted to using it in order to get access to the
entire set of new networking features available only through them. It
is a carrot not a stick.

> (And, although there have been a number of revisions of Standard C since
> 1989/1990, a lot of projects still write to that now legacy standard.
> But there may be other issues to consider here.)

Another good example. :-)

Bob