unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
From: <tomas@tuxteam.de>
To: help-gnu-emacs@gnu.org
Subject: Re: Most used words in current buffer
Date: Mon, 23 Jul 2018 09:34:28 +0200	[thread overview]
Message-ID: <20180723073428.GA14641@tuxteam.de> (raw)
In-Reply-To: <20180723000338486239475@bob.proulx.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Jul 23, 2018 at 12:09:15AM -0600, Bob Proulx wrote:

[...]

> Again the hash table is good until you need to grow the table.  Then
> there is a pause while memory is shuffled around.

Precisely. To get this "amortized", you usually grow by some constant
factor, and there's where the log n comes in (which a tree has naturally
built in).

For really big hash tables (think all *dbm variations which are not
some cousins of B-trees), you even have several "levels", at which
point the hash tables start resembling B-trees somewhat (see below).

> But that never happens with a balanced tree because that cost is
> always amortized along as you go.

But once you get big, you start "feeling" the structure of your
storage (i.e. that the "storage as a big constant-time access
array" is just a lie). Having two pointers per node is horribly
wasteful, and if those pointers point to different continents
in your RAM, your CPU ends up waiting on cache fills all the
time.

> Also if you need to extract the entries in sorted order then the
> balanced tree is already sorted.  Whereas with the hash table an extra
> sort step is needed.

That's right: once you need sorted walks or even lookup by sorted
index (i.e. "find the first entry whose value is greater or equal
to X" -- or "find all entries whose values are between X and Y"),
ordered trees are (probably) the way to go. But I'd guess, once
the tree becomes significant in size (yeah, handwaving here :)
some kind of B-tree, where you manage several entries in one node,
should be at an advantage.

> One of looking at this is that it is global optimization versus local
> optimization.

And the ratio of lookups per insert. And whether you want a sorted
index.

> In some ways it is swings and roundabouts.

Nicely put :-)

> But in other ways it depends upon what you need.

Absolutely

Cheers
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAltVhQQACgkQBcgs9XrR2kYaIACfdi/XvTyfdbJ9yk4my+eJziMt
yyYAnjw7NadASHDsgDrCJpWdzvDpmxDm
=5dmS
-----END PGP SIGNATURE-----



  reply	other threads:[~2018-07-23  7:34 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-17  9:28 Most used words in current buffer Udyant Wig
2018-07-17 18:41 ` Emanuel Berg
2018-07-18  9:36   ` Udyant Wig
2018-07-18 11:48     ` Emanuel Berg
2018-07-18 14:50       ` Udyant Wig
2018-07-18 16:32         ` Emanuel Berg
2018-07-18 22:39     ` Ben Bacarisse
2018-07-19  0:45       ` Bob Proulx
     [not found]       ` <mailman.3785.1531961144.1292.help-gnu-emacs@gnu.org>
2018-07-19  5:33         ` Udyant Wig
2018-07-19  7:04           ` Bob Proulx
2018-07-19  7:25             ` tomas
2018-07-19 17:19             ` Nick Dokos
2018-07-19 17:30               ` Eli Zaretskii
2018-07-19 20:08               ` Bob Proulx
2018-07-20 16:39                 ` Nick Dokos
     [not found]                 ` <mailman.3909.1532104802.1292.help-gnu-emacs@gnu.org>
2018-07-20 18:13                   ` Udyant Wig
2018-07-20 22:24                     ` Bob Newell
2018-07-21  0:00                       ` Nick Dokos
2018-07-21  0:18                     ` Nick Dokos
     [not found]               ` <mailman.3843.1532030947.1292.help-gnu-emacs@gnu.org>
2018-07-20  6:19                 ` Udyant Wig
2018-07-20 23:25                   ` Bob Proulx
2018-07-21  0:26                     ` Nick Dokos
2018-07-21  4:03                       ` Bob Proulx
     [not found]                   ` <mailman.3934.1532129163.1292.help-gnu-emacs@gnu.org>
2018-07-21 13:39                     ` Udyant Wig
     [not found]             ` <mailman.3826.1532020800.1292.help-gnu-emacs@gnu.org>
2018-07-20  5:52               ` Udyant Wig
     [not found]           ` <mailman.3796.1531983885.1292.help-gnu-emacs@gnu.org>
2018-07-19 13:26             ` Udyant Wig
2018-07-19 20:42               ` Bob Proulx
2018-07-20  3:08                 ` Bob Newell
     [not found]                 ` <mailman.3861.1532056120.1292.help-gnu-emacs@gnu.org>
2018-07-21 12:51                   ` Udyant Wig
2018-07-21 16:15                     ` Eric Abrahamsen
     [not found]                     ` <mailman.3982.1532189751.1292.help-gnu-emacs@gnu.org>
2018-07-21 19:46                       ` Udyant Wig
2018-07-22  3:57                         ` Eric Abrahamsen
2018-07-22  4:00                           ` Eric Abrahamsen
2018-07-22  4:05                             ` Eric Abrahamsen
     [not found]                           ` <mailman.4008.1532232144.1292.help-gnu-emacs@gnu.org>
2018-07-22 18:28                             ` Udyant Wig
2018-07-22 20:05                               ` Eric Abrahamsen
     [not found]                         ` <mailman.4007.1532231884.1292.help-gnu-emacs@gnu.org>
2018-07-22 18:19                           ` Udyant Wig
     [not found]               ` <mailman.3845.1532032966.1292.help-gnu-emacs@gnu.org>
2018-07-20 13:18                 ` Udyant Wig
2018-07-21 18:22               ` Stefan Monnier
2018-07-22  9:02                 ` tomas
2018-07-23  6:09                   ` Bob Proulx
2018-07-23  7:34                     ` tomas [this message]
     [not found]                   ` <mailman.4074.1532326162.1292.help-gnu-emacs@gnu.org>
2018-07-23  7:26                     ` Udyant Wig
     [not found]                 ` <mailman.4013.1532250176.1292.help-gnu-emacs@gnu.org>
2018-07-22 18:58                   ` Udyant Wig
     [not found]               ` <mailman.3991.1532197378.1292.help-gnu-emacs@gnu.org>
2018-07-21 19:39                 ` Udyant Wig
2018-07-21 20:54                   ` Stefan Monnier
     [not found]                   ` <mailman.3995.1532206511.1292.help-gnu-emacs@gnu.org>
2018-07-22 18:43                     ` Udyant Wig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180723073428.GA14641@tuxteam.de \
    --to=tomas@tuxteam.de \
    --cc=help-gnu-emacs@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).