From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Newsgroups: gmane.emacs.help Subject: Re: Most used words in current buffer Date: Mon, 23 Jul 2018 09:34:28 +0200 Message-ID: <20180723073428.GA14641@tuxteam.de> References: <861sc1iu1m.fsf@zoho.com> <87pnzkcgna.fsf@bsb.me.uk> <20180722090242.GA8678@tuxteam.de> <20180723000338486239475@bob.proulx.com> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; x-action=pgp-signed X-Trace: blaine.gmane.org 1532331179 13590 195.159.176.226 (23 Jul 2018 07:32:59 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Mon, 23 Jul 2018 07:32:59 +0000 (UTC) User-Agent: Mutt/1.5.21 (2010-09-15) To: help-gnu-emacs@gnu.org Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Mon Jul 23 09:32:55 2018 Return-path: Envelope-to: geh-help-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhVKx-0003QF-Fb for geh-help-gnu-emacs@m.gmane.org; Mon, 23 Jul 2018 09:32:55 +0200 Original-Received: from localhost ([::1]:33146 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fhVN4-0006d7-8i for geh-help-gnu-emacs@m.gmane.org; Mon, 23 Jul 2018 03:35:06 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:36954) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fhVMX-0006Yc-CQ for help-gnu-emacs@gnu.org; Mon, 23 Jul 2018 03:34:34 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fhVMU-0005Df-9b for help-gnu-emacs@gnu.org; Mon, 23 Jul 2018 03:34:33 -0400 Original-Received: from mail.tuxteam.de ([5.199.139.25]:53698) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fhVMT-0005DR-W6 for help-gnu-emacs@gnu.org; Mon, 23 Jul 2018 03:34:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tuxteam.de; s=mail; h=From:In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:To:Date; bh=IKRhHWhN+44KUjbIS/CsGyBXssrsFcLd3n4eWSOyukw=; b=K+2L/1FBa5PB5dKNcrF6QRHGhE6hkNpi7XgkV9Lu+Oextd+93B92BcPZxhpVn0AXSJ+bmdwnpB4ZRkSV93aLAIZyrZu8xgz01j8Pi9WDN2iShgH6aVVxldt6SvjzwUHR//CjGF0aqbvDJ4V1ndoyyb5pLLIR2FU1iS4otKfGoOvpuC6IXmvp6eH6L6uJ34KT/VtmZ9Xt1HzeB0JTFE5hNrZ5OL6wWG8StUck9iFprqxQmDRmiVNDX84ipmOJI/of3yRFEc8Tb8ZeZPPj40PCVzea1DWPyZS9CH+EPzzPekIXtKE/Lb4Kv+yTy7F22eiluElWP4N3i1OwCpHYm5IvMg==; Original-Received: from tomas by mail.tuxteam.de with local (Exim 4.80) (envelope-from ) id 1fhVMS-000443-DJ for help-gnu-emacs@gnu.org; Mon, 23 Jul 2018 09:34:28 +0200 In-Reply-To: <20180723000338486239475@bob.proulx.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 5.199.139.25 X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Users list for the GNU Emacs text editor List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "help-gnu-emacs" Xref: news.gmane.org gmane.emacs.help:117571 Archived-At: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, Jul 23, 2018 at 12:09:15AM -0600, Bob Proulx wrote: [...] > Again the hash table is good until you need to grow the table. Then > there is a pause while memory is shuffled around. Precisely. To get this "amortized", you usually grow by some constant factor, and there's where the log n comes in (which a tree has naturally built in). For really big hash tables (think all *dbm variations which are not some cousins of B-trees), you even have several "levels", at which point the hash tables start resembling B-trees somewhat (see below). > But that never happens with a balanced tree because that cost is > always amortized along as you go. But once you get big, you start "feeling" the structure of your storage (i.e. that the "storage as a big constant-time access array" is just a lie). Having two pointers per node is horribly wasteful, and if those pointers point to different continents in your RAM, your CPU ends up waiting on cache fills all the time. > Also if you need to extract the entries in sorted order then the > balanced tree is already sorted. Whereas with the hash table an extra > sort step is needed. That's right: once you need sorted walks or even lookup by sorted index (i.e. "find the first entry whose value is greater or equal to X" -- or "find all entries whose values are between X and Y"), ordered trees are (probably) the way to go. But I'd guess, once the tree becomes significant in size (yeah, handwaving here :) some kind of B-tree, where you manage several entries in one node, should be at an advantage. > One of looking at this is that it is global optimization versus local > optimization. And the ratio of lookups per insert. And whether you want a sorted index. > In some ways it is swings and roundabouts. Nicely put :-) > But in other ways it depends upon what you need. Absolutely Cheers - -- t -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iEYEARECAAYFAltVhQQACgkQBcgs9XrR2kYaIACfdi/XvTyfdbJ9yk4my+eJziMt yyYAnjw7NadASHDsgDrCJpWdzvDpmxDm =5dmS -----END PGP SIGNATURE-----