[Meant to send this to the list]

There's an implementation of a trie in guile-snmp, but it's one of the
first things I wrote in scheme and is pretty hideous. I keep meaning
to redo it using records instead of GOOPs.

I'd ask for code review, but, to be honest, I didn't really have a
clue what I was doing when I wrote it (things have only improved
slightly since then).

I've attached it, the trie code is about a third of the way in. It
appears I never bothered writing the delete-node code.


On 13 August 2012 11:18, nalaginrut <nalaginrut@gmail.com> wrote:
> On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote:
>> Hi Mark,
>>
>> Very appreciate for your reply.
>>
>> I see you mention that it's useful to implement a larger library of
>> efficient data structure, and I'm interested in that very much. I used to
>> work on projects which involve complicated but very interesting data
>> structures, implementing them could be challenging, but once done I feel a
>> great sense of achievement.
>>
>
> good
>
>> One such project is implementing a language model (LM) which is a core
>> component of speech recognition and machine translation. I don't know if
>> you heard of it before. Unfortunately, I can't cover it too detailed here,
>> that would complicate things too much.
>>
>> Basically, one of the key operations LM supports is it should return a
>> probability associated with any given id sequence. All id sequences are of
>> the same length, and there are a mass amount of such id sequences (a
>> commonly-seen LM may contain billions of them). So it's required to store
>> LM in a concise way, and at the same time make the search for each id
>> sequence very quickly.
>>
>
> OK, it's very good
>
>
>> Trie is finally chosen to be the data structure for LM (there were many
>> papers discussing this issue). All id sequences with the same prefix share
>> the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
>> only one copy of <1, 2, 3> will be stored in LM, and a search for a id
>> sequence is done by a sequence of binary search until the leaf is met. One
>> extra thing worth mentioning is that I store the whole trie structure in a
>> single large piece of memory (usually around 2 gigabytes), which makes
>> it convenient to write out to disk and load into memory by simply using
>> mmap, and I think it also makes the system faster than if you allocate
>> memory every time it's needed.
>>
>
> Seems we don't have any prefix-tree implementation yet?
> Maybe some hero too busy to share it? ;-)
> I'd like to see you make it, or I must write myself one.
> IIRC, many guys here wrote their own data-structure/algorithm
> implementations.
> Sharing makes our world better.
> But, sometimes we reinvent wheels just for fun.
> So just do what you want to do if it's interesting to you.
>
>
>> There are some other projects I worked or working on like Spell Corrector,
>> which also involve complicated data structures, but due to privacy policy,
>> I can't say much about it.
>>
>
> Actually, there's no privacy policy, that's why GNU and GPL exists.
> If something force you not to share, you may rewrite it all by
> yourself(or other guys), and GPL it. Then no more privacy policy. Your
> friends will see your creativity, and your work be enhanced by others.
>
>> All in all, I'm very interested in it, and I really really hope I can help.
>>
>> Looking forward to your reply. Thanks in advance.
>>
>> Have fun!
>>
>
> happy hacking!
>
>> Rushan Chen
>
>
>


-- 
Tristan Colgate-McFarlane
----
  "You can get all your daily vitamins from 52 pints of guiness, and a
glass of milk"