[Meant to send this to the list] There's an implementation of a trie in guile-snmp, but it's one of the first things I wrote in scheme and is pretty hideous. I keep meaning to redo it using records instead of GOOPs. I'd ask for code review, but, to be honest, I didn't really have a clue what I was doing when I wrote it (things have only improved slightly since then). I've attached it, the trie code is about a third of the way in. It appears I never bothered writing the delete-node code. On 13 August 2012 11:18, nalaginrut wrote: > On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote: >> Hi Mark, >> >> Very appreciate for your reply. >> >> I see you mention that it's useful to implement a larger library of >> efficient data structure, and I'm interested in that very much. I used to >> work on projects which involve complicated but very interesting data >> structures, implementing them could be challenging, but once done I feel a >> great sense of achievement. >> > > good > >> One such project is implementing a language model (LM) which is a core >> component of speech recognition and machine translation. I don't know if >> you heard of it before. Unfortunately, I can't cover it too detailed here, >> that would complicate things too much. >> >> Basically, one of the key operations LM supports is it should return a >> probability associated with any given id sequence. All id sequences are of >> the same length, and there are a mass amount of such id sequences (a >> commonly-seen LM may contain billions of them). So it's required to store >> LM in a concise way, and at the same time make the search for each id >> sequence very quickly. >> > > OK, it's very good > > >> Trie is finally chosen to be the data structure for LM (there were many >> papers discussing this issue). All id sequences with the same prefix share >> the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>, >> only one copy of <1, 2, 3> will be stored in LM, and a search for a id >> sequence is done by a sequence of binary search until the leaf is met. One >> extra thing worth mentioning is that I store the whole trie structure in a >> single large piece of memory (usually around 2 gigabytes), which makes >> it convenient to write out to disk and load into memory by simply using >> mmap, and I think it also makes the system faster than if you allocate >> memory every time it's needed. >> > > Seems we don't have any prefix-tree implementation yet? > Maybe some hero too busy to share it? ;-) > I'd like to see you make it, or I must write myself one. > IIRC, many guys here wrote their own data-structure/algorithm > implementations. > Sharing makes our world better. > But, sometimes we reinvent wheels just for fun. > So just do what you want to do if it's interesting to you. > > >> There are some other projects I worked or working on like Spell Corrector, >> which also involve complicated data structures, but due to privacy policy, >> I can't say much about it. >> > > Actually, there's no privacy policy, that's why GNU and GPL exists. > If something force you not to share, you may rewrite it all by > yourself(or other guys), and GPL it. Then no more privacy policy. Your > friends will see your creativity, and your work be enhanced by others. > >> All in all, I'm very interested in it, and I really really hope I can help. >> >> Looking forward to your reply. Thanks in advance. >> >> Have fun! >> > > happy hacking! > >> Rushan Chen > > > -- Tristan Colgate-McFarlane ---- "You can get all your daily vitamins from 52 pints of guiness, and a glass of milk"