From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Bob Proulx <bob@proulx.com>
Newsgroups: gmane.emacs.help
Subject: Re: Most used words in current buffer
Date: Thu, 19 Jul 2018 14:42:36 -0600
Message-ID: <20180719140935156302029@bob.proulx.com>
References: <pikcs5$6sm$1@dont-email.me> <861sc1iu1m.fsf@zoho.com>
	<pin1no$l8e$1@dont-email.me> <87pnzkcgna.fsf@bsb.me.uk>
	<mailman.3785.1531961144.1292.help-gnu-emacs@gnu.org>
	<pip7rt$v3m$1@dont-email.me>
	<mailman.3796.1531983885.1292.help-gnu-emacs@gnu.org>
	<piq3hm$sff$1@dont-email.me>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1532032864 4042 195.159.176.226 (19 Jul 2018 20:41:04 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Thu, 19 Jul 2018 20:41:04 +0000 (UTC)
User-Agent: Mutt/1.10.0 (2018-05-17)
To: help-gnu-emacs@gnu.org
Mail-Followup-To: help-gnu-emacs@gnu.org
Content-Disposition: inline
In-Reply-To: <piq3hm$sff$1@dont-email.me>
Xref: news.gmane.org gmane.emacs.help:117503
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/117503>

Udyant Wig wrote:
> Bob Proulx wrote:
> > Hmm...  I think looking behind the abstract data type at how it might
> > or might not be implemented is a stretch to say the least.  The entire
> > reason it is an abstract data type is to hide those types of
> > implementation details.  :-)
> 
> Indeed.  The principle generalizes to abstract functionality as well,
> doesn't it?  E.g. qsort in the C library may or may not be an
> implementation of quick sort; or closer to the topic of the newsgroup,
> one ought not to care about the actual algorithm of the SORT function in
> Emacs.

Yes!  That is it exactly!

> > The naming of things usually says more about the person that named the
> > thing than the thing itself.  Associative arrays is a naming that
> > reflects the concepts involved in what it does.  This is the same as
> > when someone else names it a map table, or a dictionary.  Those are
> > all the same thing.  Just using different names because people took
> > different paths to get there.
> 
> Yes.  Just as, arguably, vectors are a special case of the general
> concept of arrays, though the terms are commonly used to name the same
> thing.

Agree completely.

> > For such things I generally prefer balanced tree structures because
> > work is amortized instead of lumped.  But the important point here is
> > that for every algorithm + data structure there is a trade-off of some
> > sort between one thing and another thing.
> 
> Hmm.  I had written a tree version of the word counter I had mentioned
> before.  I had stumbled upon the AVL tree package in Emacs and thought I
> might try using it.  This tree-based attempt turned out to be slower
> than my straightforward hashing solution.
> 
> I have no doubts this code could be written better by someone more
> experienced than I.

I don't know if the AVL package you used was implemented in elisp or
in C or otherwise.  And even though I am a long time user of emacs, I
have never acquired the elisp skill to the same level as other
languages and therefore can't comment on that part.  But I know that
when people have implemented such data structures in Perl, the result
has never been as fast and efficient as a native C version.  If the
AVL package is implemented in elisp, that may easily account for the
performance difference.  Also, the native implementations of "hashes"
in awk, perl, python, and ruby are highly optimized and very fast.
They have had years of eyes and tweaking upon them.
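
Since the thread is about finding the most used words, here is a
minimal sketch of a word counter built on awk's native associative
arrays, the kind of built-in "hash" described above.  It is not the
code anyone in the thread posted.  Assumptions: a "word" is a run of
ASCII letters, case is folded to lowercase, input comes from standard
input, and the function name word_freq is hypothetical.

```shell
#!/bin/sh
# Sketch: count word frequencies on standard input using awk's built-in
# associative arrays, then print the N (default 10) most frequent words.
# Assumptions: a word is a run of ASCII letters; case is folded.
# The name word_freq is hypothetical.
word_freq() {
    awk '
        {
            # Split the lowercased line on runs of non-letters.
            n = split(tolower($0), words, "[^a-z]+")
            for (i = 1; i <= n; i++)
                if (words[i] != "")          # skip empty leading/trailing pieces
                    count[words[i]]++        # associative array keyed by word
        }
        END {
            for (w in count)
                print count[w], w
        }
    ' | LC_ALL=C sort -k1,1nr -k2,2 | head -n "${1:-10}"
    # sort: highest count first, ties broken alphabetically by word.
}

# Example: the five most frequent words in a file.
# word_freq 5 < notes.txt
```

All of the hashing work happens inside awk; the shell pipeline only
sorts the totals and trims the list, again using 'head' for clarity.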

> > I am in total agreement over using sed instead of head if you want to
> > do that.  Seeing 'sed 20q' should roll off the keyboard as print lines
> > until line 20 and then quit.  Very simple and to the point.  There is
> > definitely no need for a separate head command.  Other than for
> > symmetry with tail which is not as simple in sed.
> 
> I see that.  You could implement head on top of sed if you wanted to.  I
> myself have been using head for long enough for its stated purpose that
> grasping a sed equivalent was not immediately obvious.

Writing clear code that programmers across the entire range of skill
can understand immediately is important, in my not so humble opinion.
One shouldn't need to be a master programmer to understand what has
been written.  That is why I usually use 'head': it is clear to
everyone.  Seeing "head -n40" is not going to confuse anyone.  So I
use it instead of "sed 40q", even though I could remove 'head' from my
system entirely if I were to uniformly implement one in terms of the
other.  Clarity is more important.
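
A small sketch of the two spellings side by side.  The function names
are hypothetical and 'seq' is used only to generate test input.  'sed'
prints each line by default and the 'q' command quits after the
addressed line, so "sed Nq" stops after line N just as "head -n N"
does.

```shell
#!/bin/sh
# Print the first N lines of standard input, two ways.
# N must be at least 1 for the sed form: GNU sed rejects "0q" as an
# invalid line address, while "head -n 0" simply prints nothing.
first_lines_head() { head -n "$1"; }   # states the intent directly
first_lines_sed()  { sed "${1}q"; }    # print each line, quit after line N

# Compare the two on the same input.
a=$(seq 1 100 | first_lines_head 40)
b=$(seq 1 100 | first_lines_sed 40)
if [ "$a" = "$b" ]; then
    echo "identical"     # this branch is taken
else
    echo "different"
fi
```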

And before someone mentions performance, let me point out that we are
talking about shell scripts.  In a shell script clarity is more
important than performance.  Always.  If a shell script turns out to
have a performance problem, then choosing a better algorithm will
almost certainly be the better solution.  And if not, then choosing a
different language more efficient at the task is next.

I do expect some skill to be learned with 'awk', however.  It is so
very useful that seeing "awk '{print$1}'" should not be confusing: it
prints the first field.  Likewise '{print$NF}' is a common idiom for
printing the last field.  (NF is the Number of Fields in the line
after it has been split on whitespace, so $NF is the last field.  If
NF is 5 then $NF is $5, which is the last field of the line.)  A
little bit of awk learning pays back a large return on the investment.
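
To make the idioms concrete, a short sketch.  The helper names
first_field and last_field are hypothetical; the awk programs are the
two idioms from the paragraph above, run with awk's default field
splitting (runs of spaces and tabs, leading blanks ignored).

```shell
#!/bin/sh
# Field extraction with awk's default whitespace splitting.
first_field() { awk '{ print $1 }'; }    # first field of each line
last_field()  { awk '{ print $NF }'; }   # NF = number of fields, so $NF is the last

line='alpha beta   gamma delta epsilon'
printf '%s\n' "$line" | first_field          # prints: alpha
printf '%s\n' "$line" | last_field           # prints: epsilon
printf '%s\n' "$line" | awk '{ print NF }'   # prints: 5
```

One edge case worth knowing: on an empty line NF is 0, so $NF is $0,
the (empty) whole line, and '{print$NF}' prints a blank line.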

> These things do take time to gain currency, don't they?  Under Linux,
> for example, the ip set of commands has been named the successor to
> ifconfig, and it too is taking time to diffuse into general knowledge.

Yes.  And 'ip' is an excellent example!  Even I have converted to
using ip and the iproute2 family instead of ifconfig.

One thing to note about the iproute2 family is that it is reasonably
well written.  We are not forced to use it.  Instead we are attracted
to using it in order to get access to the entire set of new networking
features available only through it.  It is a carrot, not a stick.

> (And, although there have been a number of revisions of Standard C since
> 1989/1990, a lot of projects still write to that now legacy standard.
> But there may be other issues to consider here.)

Another good example. :-)

Bob