* A couple of lisp questions
From: Phillip Lord @ 2003-11-11 14:00 UTC

I have a couple of questions which will help me with a package that I
am writing. 


First, I want to call a function every time a new word has been typed
into a buffer. The only way that I can think of doing this at the
moment is:

add to post-command-hook, 

check whether self-insert-command was the last command, 

if so, check whether the char before point is a word constituent, 

if it is not, then the word before it has probably just been entered. 
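
In code, that approach might look something like this (a rough
sketch; `my-word-function' is a made-up placeholder for whatever
should run per word):

    ;; In post-command-hook, the command that just ran is still in
    ;; `this-command'.  If it inserted a non-word char, a word has
    ;; probably just been completed.
    (defun my-check-for-new-word ()
      (when (and (eq this-command 'self-insert-command)
                 (char-before)
                 (not (eq (char-syntax (char-before)) ?w)))
        (my-word-function)))

    (add-hook 'post-command-hook 'my-check-for-new-word)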

This does not work in all cases, so better ideas would be welcome. 



Second, my data structures currently use a hashtable and a set of
lists. The hashtable has a nice feature, which is key/value weakness. I
would really like to use this feature, but over an ordered list
structure rather than a hash. As far as I can tell, the only way I can
get a weak reference is through the hashtable. Are there no other weak
data structures? 

Third, is there a good way of serializing hashtables, so that I can
load them again next time from a file? To get my system to work I need
multiple hashtables sharing the same objects, not just objects with the
same values, so it's fairly complicated. 

Cheers

Phil


* Re: A couple of lisp questions
From: Stefan Monnier @ 2003-11-12 14:11 UTC


> First, I want to call a function every time a new word has been typed
> into a buffer. The only way that I can think of doing this at the

As you can imagine there's no perfect answer here, since words can be typed
piecemeal, or backwards, or split into two, or joined, or modified in some
other way.  So it's not even clear what you mean by "every time a new
word has been typed".

> This does not work in all cases, so better ideas would be welcome. 

Take a look at how flyspell does it.  Or maybe auto-fill.

> Second, my data structures currently use a hashtable and a set of
> lists. The hashtable has a nice feature, which is key/value weakness. I
> would really like to use this feature, but over an ordered list
> structure rather than a hash. As far as I can tell, the only way I can
> get a weak reference is through the hashtable. Are there no other weak
> data structures? 

> Third, is there a good way of serializing hashtables, so that I can
> load them again next time from a file? To get my system to work I need
> multiple hashtables sharing the same objects, not just objects with the
> same values, so it's fairly complicated. 

As you probably know, the answer to both is "no can do".
But if you provide more info about what you're trying to do (rather than
how you're trying to do it), maybe there's a good answer that does not
involve the usual "patches welcome".


        Stefan


* Re: A couple of lisp questions
From: Phillip Lord @ 2003-11-12 16:29 UTC


>>>>> "Stefan" == Stefan Monnier <monnier@iro.umontreal.ca> writes:

  >> First, I want to call a function every time a new word has been
  >> typed into a buffer. The only way that I can think of doing this
  >> at the

  Stefan> As you can imagine there's no perfect answer here, since
  Stefan> words can be typed piecemeal, or backwards, or split into
  Stefan> two, or joined, or modified in some other way.  So it's not
  Stefan> even clear what you mean by "every time a new word has been
  Stefan> typed".

This is the case. I've been thinking about this a bit more, and think
I have a somewhat better solution, which is to scan some way behind
point, gathering words, and then mark them up with text properties to
say that they have been found. 
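
Something like this, perhaps (a rough sketch; the property name and
`record-word' are placeholders):

    ;; Mark each gathered word with a text property, so that it is
    ;; only ever counted once.
    (save-excursion
      (forward-word -1)
      (let ((start (point)))
        (forward-word 1)
        (unless (get-text-property start 'word-counted)
          (record-word (buffer-substring-no-properties start (point)))
          (put-text-property start (point) 'word-counted t))))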

  >> This does not work in all cases, so better ideas would be
  >> welcome.

  Stefan> Take a look at how flyspell does it.  Or maybe auto-fill.

I will. I think auto-fill cheats though, as it's tied directly into
the command loop. I seem to remember reading that somewhere. 

  >> Second, my data structures currently use a hashtable and a set
  >> of lists. The hashtable has a nice feature, which is key/value
  >> weakness. I would really like to use this feature, but over an
  >> ordered list structure rather than a hash. As far as I can tell,
  >> the only way I can get a weak reference is through the
  >> hashtable. Are there no other weak data structures?

  >> Third, is there a good way of serializing hashtables, so that I
  >> can load them again next time from a file? To get my system to
  >> work I need multiple hashtables sharing the same objects, not
  >> just objects with the same values, so it's fairly complicated.

  Stefan> As you probably know, the answer to both is "no can do".
  Stefan> But if you provide more info about what you're trying to do
  Stefan> (rather than how you're trying to do it), maybe there's a
  Stefan> good answer that does not involve the usual "patches
  Stefan> welcome".

I'm building up a dictionary of words used as the user types, along
with word usage statistics.

I have two hashes like so....


usage-hash: "the"  -->  ("the" . 4)
            "and"  -->  ("and" . 6)



which records the usages of a specific word. 

Then a suffix hash

suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
             "th"  --> (("the" . 4) etc )
             "the" --> (("the" . 4) etc )


which records the prefixes of the words (the hash's name
notwithstanding). 

In this case the cons cells for each word are shared between the
hashes, so this is not as massive a memory waste as the written
version makes it appear. 
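
To make the sharing concrete, the update code looks roughly like this
(a simplified sketch, not the real code):

    (defvar usage-hash (make-hash-table :test 'equal))
    (defvar suffix-hash (make-hash-table :test 'equal))

    (defun record-word (word)
      (let ((entry (or (gethash word usage-hash)
                       (puthash word (cons word 0) usage-hash))))
        ;; Bump the count; the suffix hash sees the change too,
        ;; because both hashes share the same cons cell.
        (setcdr entry (1+ (cdr entry)))
        (dotimes (i (length word))
          (let* ((prefix (substring word 0 (1+ i)))
                 (entries (gethash prefix suffix-hash)))
            (unless (memq entry entries)
              (puthash prefix (cons entry entries) suffix-hash))))))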

Ideally I would want to build up these word usage statistics as they
are typed, but as you say it's hard to do this. I think a flyspell-like
approach combined with text properties should work okay. 

Anyway, the idea with the weakness is that I want to garbage collect
the dictionary periodically, throwing away old or rarely used words. 
But currently I have to keep the two hashes in sync by hand. I was
wondering whether it would be possible to use weakness to do this
automatically. But the second of the two hashes has values which are
in an alist, which would defeat this. 

I'm not sure that this is too much of a problem. The performance that
I am getting from these data structures is fairly good. 

The serialization would be to enable saving across sessions. Most of
the packages I know that do this depend on their objects having a read
syntax, which doesn't work with hashes. I think the solution here is
to convert the thing into a big alist to save it, and then reconstruct
the hashes on loading. 
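
Something along these lines, perhaps (a hedged sketch; the function
names are made up, and `add-to-suffix-hash' stands for the prefix
loop from the update code):

    (defun save-word-stats (file)
      ;; Dump usage-hash as one big alist of (WORD . COUNT).
      (with-temp-file file
        (let (alist)
          (maphash (lambda (word entry)
                     (push (cons word (cdr entry)) alist))
                   usage-hash)
          (prin1 alist (current-buffer)))))

    (defun load-word-stats (file)
      ;; Read the alist back and rebuild both hashes, so that the
      ;; cons cells end up shared again.
      (with-temp-buffer
        (insert-file-contents file)
        (dolist (pair (read (current-buffer)))
          (let ((entry (puthash (car pair)
                                (cons (car pair) (cdr pair))
                                usage-hash)))
            (add-to-suffix-hash entry)))))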

Anyway, the idea for all of this was to do a nifty version of
abbreviation expansion, something like dabbrev-expand, but instead of
searching local buffers, it would grab word stats as it goes, and
use these to offer appropriate suggestions. I was thinking of a user
interface a little bit like the buffer/file switching of ido.el, of
which I have become a committed user. 

It's just an idea at the moment, with the basic data structures. As is
the way, building a decent UI around this will probably take 10 times
as much code! I think the chances are it will be too intrusive to be
of any use for most users. But you never know till you try. 

Cheers

Phil


* Re: A couple of lisp questions
From: Stefan Monnier @ 2003-11-12 18:28 UTC


Stefan> Take a look at how flyspell does it.  Or maybe auto-fill.

> I will. I think auto-fill cheats though, as it's tied directly into
> the command loop. I seem to remember reading that somewhere.

Not the command loop, just the self-insert-command command (which is
implemented in C).  You can hijack the auto-fill-function for
your own non-auto-fill use.
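
For instance (a hedged sketch; `my-word-function' is your code, and
note that self-insert-command only calls auto-fill-function for chars
in the auto-fill-chars table -- by default spaces and newlines, i.e.
word boundaries, which may be just what you want):

    (defun my-fake-auto-fill ()
      ;; Called by self-insert-command right after inserting a char.
      (my-word-function))

    (setq auto-fill-function 'my-fake-auto-fill)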

> usage-hash: "the"  -->  ("the" . 4)
>             "and"  -->  ("and" . 6)

Why not just

   "the" --> 4
   "and" --> 6

> Then a suffix hash

> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
>              "th"  --> (("the" . 4) etc )
>              "the" --> (("the" . 4) etc )

Is `try-completion' too slow (because the usage-hash is too large?) to
build the suffixes on the fly?
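
(For illustration: in current Emacsen the completion functions accept
a hash table as the collection, so the per-prefix list could be built
on demand; a hedged sketch using all-completions, the list-returning
sibling of try-completion:)

    (defun words-with-prefix (prefix)
      ;; All known words starting with PREFIX, most common first.
      (sort (all-completions prefix usage-hash)
            (lambda (a b)
              (> (cdr (gethash a usage-hash))
                 (cdr (gethash b usage-hash))))))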

> In this case the cons cells for each word are shared between the
> hashes, so this is not as massive a memory waste as the written
> version makes it appear. 

Each word of N letters has:
- one string (i.e. N + 16 bytes)
- one cons-cell (8 bytes)
- one hash-table entry (16 bytes)
in usage-hash, plus:
- N cons-cells (N*8 bytes)
- N hash entries shared with other words (at least 16 bytes).
For a total of 9*N + 56 bytes per word.  Probably not a big deal.

> Ideally I would want to build up these word usage statistics as they
> are typed, but as you say it's hard to do this. I think a flyspell-like
> approach combined with text properties should work okay.

How do you avoid counting the same instance of a word several times?  Oh,
you mark them with a text-property, I see.  More like font-lock than flyspell.

> Anyway the idea with the weakness is that I want to garbage collect
> the dictionary periodically, throwing away old or rarely used words.

I don't think weakness gives you that.  It seems difficult to use
weakness here to get even a vague approximation of what you want.

You can use a gc-hook to flush stuff every once in a while, but you
could just as well use an idle-timer for that.
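
E.g. (a hedged sketch; the cutoff is arbitrary, and suffix-hash would
need the same pruning to stay in sync):

    (defun prune-word-stats ()
      ;; Collect the victims first, then remove them; maphash
      ;; shouldn't remove arbitrary entries while iterating.
      (let (victims)
        (maphash (lambda (word entry)
                   (when (< (cdr entry) 2)   ; "rarely used"
                     (push word victims)))
                 usage-hash)
        (dolist (word victims)
          (remhash word usage-hash))))

    (run-with-idle-timer 300 t 'prune-word-stats)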

> The serialization would be to enable saving across sessions. Most of
> the packages I know that do this depend on their objects having a read
> syntax, which doesn't work with hashes. I think the solution here is
> to convert the thing into a big alist to save it, and then reconstruct
> the hashes on loading.

Why not reconstruct the suffix hash upon loading?  This way you have
no sharing to worry about and you can just dump the usage hash via
maphash & pp.

> Anyway the idea for all of this was to do a nifty version of
> abbreviation expansion, something like dabbrev-expand, but instead of
> searching local buffers, it would grab word stats as it goes, and
> use these to offer appropriate suggestions. I was thinking of a user
> interface a little bit like the buffer/file switching of ido.el, of
> which I have become a committed user.

Sounds neat.

> the way, building a decent UI around this will probably take 10 times
> as much code!

And even more time,


        Stefan


* Re: A couple of lisp questions
From: Phillip Lord @ 2003-11-12 19:00 UTC


>>>>> "Stefan" == Stefan Monnier <monnier@iro.umontreal.ca> writes:

  Stefan> Take a look at how flyspell does it.  Or maybe auto-fill.

  >> I will. I think auto-fill cheats though, as it's tied directly
  >> into the command loop. I seem to remember reading that somewhere.

  Stefan> Not the command loop, just the self-insert-command command
  Stefan> (which is implemented in C). 

Yes, you are right. I was a little confused by

"In general, this is the only way to do that, since the facilities for
customizing `self-insert-command' are limited to special cases
(designed for abbrevs and Auto Fill mode). Do not try substituting
your own definition of `self-insert-command' for the standard one.
The editor command loop handles this function specially."

So auto-fill is tied in slightly indirectly. 


  Stefan> You can hijack the auto-fill-function for your own
  Stefan> non-auto-fill use.

I would not want it to interfere with auto-fill though. I think I have
it working reasonably well now. 

  >> usage-hash: "the"  -->  ("the" . 4)
  >>             "and"  -->  ("and" . 6)

  Stefan> Why not just

  Stefan>    "the" --> 4 "and" --> 6

It makes no difference. The suffix hash must contain cons cells, and I
share them with the usage hash. For the usage hash, you are correct:
the car of the cons cell is not used. 

  >> Then a suffix hash

  >> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
  >>              "th"  --> (("the" . 4) etc )
  >>              "the" --> (("the" . 4) etc )

  Stefan> Is `try-completion' too slow (because the usage-hash is too
  Stefan> large?) to build the suffixes on the fly ?

I'm not convinced it does what I want. Perhaps I am wrong. 

When the letter "t" is pressed I get an alist back. The alist is
actually ordered, with the most commonly occurring words first. So I
pick the preferred usage straight off the front. So I have constant
time access to the hash, and constant time access to the list. 

Updating takes a bit longer....



  >> In this case the cons cells for each word are shared between the
  >> hashes, so this is not as massive a memory waste as the written
  >> version makes it appear.

  Stefan> Each word of N letters has:
  Stefan> - one string (i.e. N + 16 bytes)
  Stefan> - one cons-cell (8 bytes)
  Stefan> - one hash-table entry (16 bytes)
  Stefan> in usage-hash, plus:
  Stefan> - N cons-cells (N*8 bytes)
  Stefan> - N hash entries shared with other words (at least 16 bytes).
  Stefan> For a total of 9*N + 56 bytes per word.  Probably not a big
  Stefan> deal.

Well, there are other reasons as well. When I update the cons in the
usage hash, it's automatically "updated" in the suffix hash as well.
That was the main reason. 

  >> Ideally I would want to build up these word usage statistics as
  >> they are typed, but as you say it's hard to do this. I think a
  >> flyspell-like approach combined with text properties should work
  >> okay.

  Stefan> How do you avoid counting the same instance of a word
  Stefan> several times?  Oh, you mark them with a text-property, I
  Stefan> see.  More like font-lock than flyspell.

Just so. 


  >> The serialization would be to enable saving across sessions. Most
  >> of the packages I know that do this depend on their objects
  >> having a read syntax, which doesn't work with hashes. I think the
  >> solution here is to convert the thing into a big alist to save
  >> it, and then reconstruct the hashes on loading.

  Stefan> Why not reconstruct the suffix hash upon loading?  This way
  Stefan> you have no sharing to worry about and you can just dump the
  Stefan> usage hash via maphash & pp.

Yes, I think that's going to be my plan. Normally I sort the alist in
the suffix hash after every update, but if I disable this, and then
sort them all at once at the end, it should be quicker....
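
Roughly (a sketch; replacing the value of the key currently being
visited is allowed during maphash):

    ;; After bulk loading, sort every per-prefix list just once.
    (maphash (lambda (prefix entries)
               (puthash prefix
                        (sort entries
                              (lambda (a b) (> (cdr a) (cdr b))))
                        suffix-hash))
             suffix-hash)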

  >> Anyway the idea for all of this was to do a nifty version of
  >> abbreviation expansion, something like dabbrev-expand, but
  >> instead of searching local buffers, it would grab word stats as
  >> it goes, and use these to offer appropriate suggestions. I was
  >> thinking of a user interface a little bit like the buffer/file
  >> switching of ido.el, of which I have become a committed user.

  Stefan> Sounds neat.

  >> the way, building a decent UI around this will probably take 10
  >> times as much code!

  Stefan> And even more time,

Just so. 

I've almost got a nasty version (where you build the dictionary
explicitly rather than automatically) working. 

Cheers

Phil


* Re: A couple of lisp questions
From: Stefan Monnier @ 2003-11-13 16:31 UTC


>>> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
>>>              "th"  --> (("the" . 4) etc )
>>>              "the" --> (("the" . 4) etc )
Stefan> Is `try-completion' too slow (because the usage-hash is too
Stefan> large?) to build the suffixes on the fly?
> I'm not convinced it does what I want. Perhaps I am wrong. 
> When the letter "t" is pressed I get an alist back. The alist is
> actually ordered, with the most commonly occurring words first. So I
> pick the preferred usage straight off the front. So I have constant
> time access to the hash, and constant time access to the list. 

Admittedly, `try-completion' does not sort things as you want them, but
finding the most common word out of the ones returned is not too
difficult either.  If `try-completion' is fast enough (even though it
takes time proportional to the total number of words), then finding the
most common word out of the ones returned shouldn't be a problem either
(it's proportional to the number of words returned, although the
constant factor might indeed be higher, since the loop would be in
elisp and requires a "slow" hash lookup for each iteration).

I have no idea whether it'd be slow or not, but it would save you the
effort of maintaining a second data structure in sync.  I.e. it'd save
code and live memory at the cost of CPU and dynamic memory allocation
(i.e. GC, which translates into yet more CPU time eaten).


        Stefan

