* A couple of lisp questions
From: Phillip Lord @ 2003-11-11 14:00 UTC

I have a couple of questions which will help me with a package that I
am writing.

First, I want to call a function every time a new word has been typed
into a buffer. The only way that I can think of doing this at the
moment is:

 - add to post-command-hook;
 - check whether self-insert-command was last-command;
 - if so, check whether the char before point is a word constituent;
 - if it is not, then the word before it has probably just been entered.

This does not work in all cases, so better ideas would be welcome.

Second, my data structures currently use a hashtable and a set of
lists. The hashtable has a nice feature, which is key/value weakness.
I would really like to use this feature, but over an ordered list
structure rather than a hash. As far as I can tell, the only way I can
use a weak reference is through the hashtable. Are there no other weak
data structures?

Third, is there a good way of serializing hashtables, so that I can
load them again next time from a file? To get my system to work I need
multiple hashtables sharing the same objects, not just objects with
the same values, so it's fairly complicated.

Cheers

Phil
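[A minimal sketch of the approach described above, for illustration;
the names `my-word-function' and `my-detect-new-word' are hypothetical,
and the exact end-of-word test is an assumption, not part of the
original post.]

;; Note that inside `post-command-hook' the command that just ran is
;; still visible as `this-command'; `last-command' is the one before it.
(defvar my-word-function #'ignore
  "Hypothetical callback, called with each word as it is completed.")

(defun my-detect-new-word ()
  "Call `my-word-function' when a non-word char follows a word char."
  (when (and (eq this-command 'self-insert-command)
             (> (point) (1+ (point-min)))
             ;; The char just typed is not a word constituent...
             (not (eq (char-syntax (char-before)) ?w))
             ;; ...but the one before it is, so a word just ended.
             (eq (char-syntax (char-before (1- (point)))) ?w))
    (save-excursion
      (backward-char)                   ; step back over the delimiter
      (funcall my-word-function
               (buffer-substring-no-properties
                (progn (backward-word) (point))
                (progn (forward-word) (point)))))))

(add-hook 'post-command-hook #'my-detect-new-word)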
* Re: A couple of lisp questions
From: Stefan Monnier @ 2003-11-12 14:11 UTC

> First, I want to call a function every time a new word has been typed
> into a buffer. The only way that I can think of doing this at the

As you can imagine, there's no perfect answer here, since words can be
typed piecemeal, or backwards, or split into two, or joined, or
modified in some other way. So it's not even clear what you mean by
"every time a new word has been typed".

> This does not work in all cases, so better ideas would be welcome.

Take a look at how flyspell does it. Or maybe auto-fill.

> Second, my data structures currently use a hashtable and a set of
> lists. The hashtable has a nice feature, which is key/value weakness.
> I would really like to use this feature, but over an ordered list
> structure rather than a hash. As far as I can tell, the only way I
> can use a weak reference is through the hashtable. Are there no
> other weak data structures?

> Third, is there a good way of serializing hashtables, so that I can
> load them again next time from a file? To get my system to work I
> need multiple hashtables sharing the same objects, not just objects
> with the same values, so it's fairly complicated.

As you probably know, the answer to both is "no can do". But if you
provide more info about what you're trying to do (rather than how
you're trying to do it), maybe there's a good answer that does not
involve the usual "patches welcome".

Stefan
* Re: A couple of lisp questions
From: Phillip Lord @ 2003-11-12 16:29 UTC

>>>>> "Stefan" == Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> First, I want to call a function every time a new word has been
>> typed into a buffer. The only way that I can think of doing this
>> at the

Stefan> As you can imagine, there's no perfect answer here, since
Stefan> words can be typed piecemeal, or backwards, or split into
Stefan> two, or joined, or modified in some other way. So it's not
Stefan> even clear what you mean by "every time a new word has been
Stefan> typed".

This is the case. I've been thinking about this a bit more, and I
think I have a somewhat better solution, which is to move back behind
point, gathering words, and then mark them up with text properties to
say that they have been found.

>> This does not work in all cases, so better ideas would be
>> welcome.

Stefan> Take a look at how flyspell does it. Or maybe auto-fill.

I will. I think auto-fill cheats though, as it's tied directly into
the command loop. I seem to remember reading that somewhere.

>> Second, my data structures currently use a hashtable and a set of
>> lists. The hashtable has a nice feature, which is key/value
>> weakness. I would really like to use this feature, but over an
>> ordered list structure rather than a hash. As far as I can tell,
>> the only way I can use a weak reference is through the hashtable.
>> Are there no other weak data structures?

>> Third, is there a good way of serializing hashtables, so that I
>> can load them again next time from a file? To get my system to
>> work I need multiple hashtables sharing the same objects, not
>> just objects with the same values, so it's fairly complicated.

Stefan> As you probably know, the answer to both is "no can do".
Stefan> But if you provide more info about what you're trying to do
Stefan> (rather than how you're trying to do it), maybe there's a
Stefan> good answer that does not involve the usual "patches
Stefan> welcome".

I'm building up a dictionary of words used as the user types, along
with word usage statistics. I have two hashes, like so:

  usage-hash:  "the" --> ("the" . 4)
               "and" --> ("and" . 6)

which records the number of usages of each word. Then a suffix hash:

  suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
               "th"  --> (("the" . 4) etc)
               "the" --> (("the" . 4) etc)

which records, for each prefix, the words that complete it. In this
case the cons cells for each word are shared between the hashes, so
this is not as massive a memory waste as the written version makes it
appear.

Ideally I would want to build up these word usage statistics as words
are typed, but as you say it's hard to do this. I think a
flyspell-like approach combined with text properties should work okay.

Anyway, the idea with the weakness is that I want to garbage collect
the dictionary periodically, throwing away old or rarely used words.
But currently I have to keep the two hashes in sync by hand. I was
wondering whether it would be possible to use weakness to do this
automatically. But the second of the two hashes has values which are
in an alist, which would defeat this. I'm not sure that this is too
much of a problem. The performance that I am getting from these data
structures is fairly good.

The serialization would be to enable saving across sessions. Most of
the packages I know that do this depend on their objects having a read
syntax, which doesn't work with hashes. I think the solution here is
to convert the thing into a big alist to save it, and then reconstruct
the hashes on loading.

Anyway, the idea for all of this was to do a nifty version of
abbreviation expansion, something like dabbrev-expand, but instead of
searching local buffers, it would gather word stats as it goes, and
use these to offer appropriate suggestions. I was thinking of a user
interface a little bit like the buffer/file switching of ido.el, of
which I have become a committed user.

It's just an idea at the moment, with the basic data structures. As is
the way, building a decent UI around this will probably take 10 times
as much code! I think the chances are it will be too intrusive to be
of any use for most users. But you never know till you try.

Cheers

Phil
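[A sketch of the two-hash scheme described above; `my-usage-hash',
`my-suffix-hash' and `my-record-word' are hypothetical names, since the
message describes the structure but not the code.]

;; One (WORD . COUNT) cell per word, stored in the usage hash and in
;; the alist of every prefix of the word.
(defvar my-usage-hash (make-hash-table :test 'equal))
(defvar my-suffix-hash (make-hash-table :test 'equal))

(defun my-record-word (word)
  "Record one use of WORD, sharing its count cell between both hashes."
  (let ((cell (gethash word my-usage-hash)))
    (if cell
        ;; Bumping the count here also "updates" every alist in the
        ;; suffix hash, since they all hold this very same cons cell.
        (setcdr cell (1+ (cdr cell)))
      (setq cell (cons word 1))
      (puthash word cell my-usage-hash)
      ;; File the new cell under each prefix of WORD.
      (dotimes (i (length word))
        (let ((prefix (substring word 0 (1+ i))))
          (puthash prefix
                   (cons cell (gethash prefix my-suffix-hash))
                   my-suffix-hash))))))

[This omits keeping each alist ordered by use count, which the
follow-up messages discuss.]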
* Re: A couple of lisp questions
From: Stefan Monnier @ 2003-11-12 18:28 UTC

Stefan> Take a look at how flyspell does it. Or maybe auto-fill.

> I will. I think auto-fill cheats though, as it's tied directly into
> the command loop. I seem to remember reading that somewhere.

Not the command loop, just self-insert-command (which is implemented
in C). You can hijack the auto-fill-function for your own
non-auto-fill use.

> usage-hash:  "the" --> ("the" . 4)
>              "and" --> ("and" . 6)

Why not just

  "the" --> 4
  "and" --> 6

> Then a suffix hash
> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
>              "th"  --> (("the" . 4) etc)
>              "the" --> (("the" . 4) etc)

Is `try-completion' too slow (because the usage-hash is too large?) to
build the suffixes on the fly?

> In this case the cons cells for each word are shared between the
> hashes, so this is not as massive a memory waste as the written
> version makes it appear.

Each word of N letters has:
- one string (i.e. N + 16 bytes)
- one cons cell (8 bytes)
- one hash-table entry (16 bytes)
in usage-hash, plus:
- N cons cells (N*8 bytes)
- N hash entries shared with other words (at least 16 bytes).
For a total of 9*N + 56 bytes per word. Probably not a big deal.

> Ideally I would want to build up these word usage statistics as
> words are typed, but as you say it's hard to do this. I think a
> flyspell-like approach combined with text properties should work
> okay.

How do you avoid counting the same instance of a word several times?
Oh, you mark them with a text property, I see. More like font-lock
than flyspell.

> Anyway, the idea with the weakness is that I want to garbage collect
> the dictionary periodically, throwing away old or rarely used words.

I don't think weakness gives you that. It seems difficult to use
weakness here to get even a vague approximation of what you want. You
can use a gc-hook to flush stuff every once in a while, but you could
just as well use an idle-timer for that.

> The serialization would be to enable saving across sessions. Most of
> the packages I know that do this depend on their objects having a
> read syntax, which doesn't work with hashes. I think the solution
> here is to convert the thing into a big alist to save it, and then
> reconstruct the hashes on loading.

Why not reconstruct the suffix hash upon loading? This way you have no
sharing to worry about, and you can just dump the usage hash via
maphash & pp.

> Anyway, the idea for all of this was to do a nifty version of
> abbreviation expansion, something like dabbrev-expand, but instead
> of searching local buffers, it would gather word stats as it goes,
> and use these to offer appropriate suggestions. I was thinking of a
> user interface a little bit like the buffer/file switching of
> ido.el, of which I have become a committed user.

Sounds neat.

> As is the way, building a decent UI around this will probably take
> 10 times as much code!

And even more time,

Stefan
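[A minimal sketch of this suggestion, reusing the hypothetical hashes
from the earlier sketch: dump only the word counts via `maphash' and
`pp', and recreate the suffix hash, and the sharing, at load time.]

(defun my-save-dictionary (file)
  "Dump `my-usage-hash' to FILE as a readable alist of (WORD . COUNT)."
  (let (alist)
    (maphash (lambda (word cell) (push (cons word (cdr cell)) alist))
             my-usage-hash)
    (with-temp-file file
      (pp alist (current-buffer)))))

(defun my-load-dictionary (file)
  "Read FILE and rebuild both hashes, recreating the shared cells."
  (clrhash my-usage-hash)
  (clrhash my-suffix-hash)
  (dolist (entry (with-temp-buffer
                   (insert-file-contents file)
                   (read (current-buffer))))
    (let ((cell (cons (car entry) (cdr entry)))) ; one fresh shared cell
      (puthash (car entry) cell my-usage-hash)
      (dotimes (i (length (car entry)))
        (let ((prefix (substring (car entry) 0 (1+ i))))
          (puthash prefix
                   (cons cell (gethash prefix my-suffix-hash))
                   my-suffix-hash))))))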
* Re: A couple of lisp questions
From: Phillip Lord @ 2003-11-12 19:00 UTC

>>>>> "Stefan" == Stefan Monnier <monnier@iro.umontreal.ca> writes:

Stefan> Take a look at how flyspell does it. Or maybe auto-fill.

>> I will. I think auto-fill cheats though, as it's tied directly
>> into the command loop. I seem to remember reading that somewhere.

Stefan> Not the command loop, just self-insert-command (which is
Stefan> implemented in C).

Yes, you are right. I was a little confused by

  "In general, this is the only way to do that, since the facilities
  for customizing `self-insert-command' are limited to special cases
  (designed for abbrevs and Auto Fill mode). Do not try substituting
  your own definition of `self-insert-command' for the standard
  one. The editor command loop handles this function specially."

So auto-fill is tied in slightly indirectly.

Stefan> You can hijack the auto-fill-function for your own
Stefan> non-auto-fill use.

I would not want it to interfere with auto-fill though. I think I have
it working reasonably well now.

>> usage-hash:  "the" --> ("the" . 4)
>>              "and" --> ("and" . 6)

Stefan> Why not just
Stefan>   "the" --> 4
Stefan>   "and" --> 6

It makes no difference. The suffix hash must contain cons cells, and I
share them with this one. For the usage hash you are correct: the car
of the cons cell is not used.

>> Then a suffix hash
>> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
>>              "th"  --> (("the" . 4) etc)
>>              "the" --> (("the" . 4) etc)

Stefan> Is `try-completion' too slow (because the usage-hash is too
Stefan> large?) to build the suffixes on the fly?

I'm not convinced it does what I want. Perhaps I am wrong. When the
letter "t" is pressed, I get an alist back. The alist is actually
ordered, with the most commonly occurring words first, so I pick the
preferred usage straight off the front. So I have constant-time access
to the hash, and constant-time access to the list. Updating takes a
bit longer....

>> In this case the cons cells for each word are shared between the
>> hashes, so this is not as massive a memory waste as the written
>> version makes it appear.

Stefan> Each word of N letters has:
Stefan> - one string (i.e. N + 16 bytes)
Stefan> - one cons cell (8 bytes)
Stefan> - one hash-table entry (16 bytes)
Stefan> in usage-hash, plus:
Stefan> - N cons cells (N*8 bytes)
Stefan> - N hash entries shared with other words (at least 16 bytes).
Stefan> For a total of 9*N + 56 bytes per word. Probably not a big
Stefan> deal.

Well, there are other reasons as well. When I update the cons cell in
the usage hash, it's automatically "updated" in the suffix hash as
well. That was the main reason.

>> Ideally I would want to build up these word usage statistics as
>> words are typed, but as you say it's hard to do this. I think a
>> flyspell-like approach combined with text properties should work
>> okay.

Stefan> How do you avoid counting the same instance of a word
Stefan> several times? Oh, you mark them with a text property, I
Stefan> see. More like font-lock than flyspell.

Just so.

>> The serialization would be to enable saving across sessions. Most
>> of the packages I know that do this depend on their objects
>> having a read syntax, which doesn't work with hashes. I think the
>> solution here is to convert the thing into a big alist to save
>> it, and then reconstruct the hashes on loading.

Stefan> Why not reconstruct the suffix hash upon loading? This way
Stefan> you have no sharing to worry about, and you can just dump
Stefan> the usage hash via maphash & pp.

Yes, I think that's going to be my plan. Normally I sort the alist in
the suffix hash after every update, but if I disable this and then do
them all at once, it should be quicker....

>> Anyway, the idea for all of this was to do a nifty version of
>> abbreviation expansion, something like dabbrev-expand, but
>> instead of searching local buffers, it would gather word stats as
>> it goes, and use these to offer appropriate suggestions. I was
>> thinking of a user interface a little bit like the buffer/file
>> switching of ido.el, of which I have become a committed user.

Stefan> Sounds neat.

>> As is the way, building a decent UI around this will probably
>> take 10 times as much code!

Stefan> And even more time,

Just so. I've almost got a nasty version (where you build the
dictionary explicitly rather than automatically) working.

Cheers

Phil
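[A sketch of the deferred-sort idea, under the same assumed names as
before: skip the per-update sort while bulk loading, then sort every
suffix alist once at the end.]

(defun my-sort-all-suffixes ()
  "Sort every alist in `my-suffix-hash' by descending use count.
Meant to run once after a bulk load, instead of re-sorting an
alist on every single update."
  (let (prefixes)
    ;; Collect the keys first, keeping the traversal separate from
    ;; the updates.
    (maphash (lambda (prefix _cells) (push prefix prefixes))
             my-suffix-hash)
    (dolist (prefix prefixes)
      (puthash prefix
               (sort (gethash prefix my-suffix-hash)
                     (lambda (a b) (> (cdr a) (cdr b))))
               my-suffix-hash))))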
* Re: A couple of lisp questions
From: Stefan Monnier @ 2003-11-13 16:31 UTC

>>> Then a suffix hash
>>> suffix-hash: "t"   --> (("the" . 4) ("then" . 3) ("talk" . 2) etc)
>>>              "th"  --> (("the" . 4) etc)
>>>              "the" --> (("the" . 4) etc)

Stefan> Is `try-completion' too slow (because the usage-hash is too
Stefan> large?) to build the suffixes on the fly?

> I'm not convinced it does what I want. Perhaps I am wrong. When the
> letter "t" is pressed, I get an alist back. The alist is actually
> ordered, with the most commonly occurring words first, so I pick the
> preferred usage straight off the front. So I have constant-time
> access to the hash, and constant-time access to the list.

Admittedly, `try-completion' does not sort things as you want them,
but finding the most common word out of the ones returned is not too
difficult either. If `try-completion' is fast enough (even though it
takes time proportional to the total number of words), then finding
the most common word out of the ones returned shouldn't be a problem
either (it's proportional to the number of words returned, although
the constant factor might indeed be higher, since the loop would be in
elisp and requires a "slow" hash lookup for each iteration).

I have no idea whether it'd be slow or not, but it would save you the
effort of maintaining a second data structure in sync. I.e. it'd save
code and live memory at the cost of CPU and dynamic memory allocation
(i.e. GC, which translates into yet more CPU time eaten).

Stefan
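[A sketch of this alternative, using the earlier hypothetical
`my-usage-hash'. Stefan names `try-completion', but the function that
actually enumerates the matching words is `all-completions', which in
modern Emacsen accepts a hash table as its collection; the loop below
then picks the most used candidate.]

(defun my-most-common-completion (prefix)
  "Return the most used word in `my-usage-hash' starting with PREFIX.
Scans the candidates on the fly instead of maintaining a suffix hash."
  (let (best (best-count 0))
    (dolist (word (all-completions prefix my-usage-hash))
      (let ((count (cdr (gethash word my-usage-hash))))
        (when (> count best-count)
          (setq best word best-count count))))
    best))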