From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.help
Subject: Re: A couple of lisp questions
Date: Wed, 12 Nov 2003 18:28:27 GMT
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
Original-To: help-gnu-emacs@gnu.org
Stefan> Take a look at how flyspell does it.  Or maybe auto-fill.

> I will.  I think auto-fill cheats though, as it's tied directly into
> the command loop.  I seem to remember reading that somewhere.

Not the command loop, just the `self-insert-command' command (which is
implemented in C).  You can hijack the auto-fill-function for your own
non-auto-fill use.

> usage-hash:  "the" --> ("the" . 4)
>              "and" --> ("and" . 6)

Why not just

    "the" --> 4
    "and" --> 6

> Then a suffix hash
> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
>              "th"  --> (("the" . 4) etc )
>              "the" --> (("the" . 4) etc )

Is `try-completion' too slow (because the usage-hash is too large?) to
build the suffixes on the fly?

> In this case the cons cells for each word are shared between the
> hashes, so this is not a massive memory waste as the written version
> appears.

Each word of N letters has:
- one string (i.e. N + 16 bytes)
- one cons-cell (8 bytes)
- one hash-table entry (16 bytes)
in usage-hash, plus:
- N cons-cells (N*8 bytes)
- N hash entries shared with other words (at least 16 bytes).
For a total of 9*N + 56 bytes per word.  Probably not a big deal.

> Ideally I would want to build up these word usage statistics as they
> are typed, but as you say it's hard to do this.  I think a
> flyspell-like approach combined with text properties should work okay.

How do you avoid counting the same instance of a word several times?
Oh, you mark them with a text property, I see.  More like font-lock
than flyspell.

> Anyway the idea with the weakness is that I want to garbage collect
> the dictionary periodically, throwing away old, or rarely used words.

I don't think weakness gives you that.  It seems difficult to use
weakness here to get even a vague approximation of what you want.
You can use a gc-hook to flush stuff every once in a while, but you
could just as well use an idle-timer for that.

> The serialization would be to enable saving across sessions.
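Returning to the garbage-collection point above, the idle-timer
alternative can be sketched roughly as follows (a minimal sketch; the
variable `usage-hash', the function names, and the thresholds are all
hypothetical, not part of the original discussion):

```elisp
(defvar usage-hash (make-hash-table :test 'equal)
  "Hypothetical word-usage table mapping a word string to its count.")

(defun usage-flush-rare-words (threshold)
  "Drop entries of `usage-hash' whose count is below THRESHOLD.
Keys are collected first, then removed, to avoid modifying the
table while `maphash' is traversing it."
  (let (victims)
    (maphash (lambda (word count)
               (when (< count threshold)
                 (push word victims)))
             usage-hash)
    (dolist (word victims)
      (remhash word usage-hash))))

;; After 60 seconds of idle time, repeatedly flush words seen
;; fewer than 2 times.
(run-with-idle-timer 60 t #'usage-flush-rare-words 2)
```

Unlike hash-table weakness, which only discards entries that have
become unreachable, this lets you pick the eviction policy explicitly.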
> Most of the packages I know that do this depend on their objects
> having a read syntax, which doesn't work with hashes.  I think the
> solution here is to convert the thing into a big alist to save it,
> and then reconstruct the hashes on loading.

Why not reconstruct the suffix hash upon loading?  This way you have
no sharing to worry about and you can just dump the usage hash via
maphash & pp.

> Anyway the idea for all of this was to do a nifty version of
> abbreviation expansion, something like dabbrev-expand, but instead of
> searching local buffers, it would grab word stats as it's going, and
> use these to offer appropriate suggestions.  I was thinking of a user
> interface a little bit like the buffer/file switching of ido.el, of
> which I have become a committed user.

Sounds neat.

> By the way, building a decent UI around this will probably take 10
> times as much code!

And even more time,


        Stefan
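The maphash & pp save/load idea above might look something like this
(a minimal sketch; `usage-hash' and both function names are
hypothetical, and only the usage table is saved, with any derived
suffix table assumed to be rebuilt separately after loading):

```elisp
(defun usage-save (file)
  "Write the contents of `usage-hash' to FILE as a printed alist.
Since the alist has a plain read syntax, `pp' can dump it directly."
  (with-temp-file file
    (let (alist)
      (maphash (lambda (word count)
                 (push (cons word count) alist))
               usage-hash)
      (pp alist (current-buffer)))))

(defun usage-load (file)
  "Read the alist saved in FILE and rebuild `usage-hash' from it."
  (with-temp-buffer
    (insert-file-contents file)
    (clrhash usage-hash)
    (dolist (pair (read (current-buffer)))
      (puthash (car pair) (cdr pair) usage-hash))))
```

Because only the flat word-to-count table is serialized, the shared
cons cells between the two hash tables never need a read syntax at
all.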