From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: rushan chen Newsgroups: gmane.lisp.guile.devel Subject: Re: a passionate guy who want to join in as a developer Date: Mon, 13 Aug 2012 23:47:11 +0800 Message-ID: References: <5024F643.8040706@netris.org> <1344853122.30422.23.camel@Renee-SUSE.suse> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=f46d044401de6b91fa04c7279b35 X-Trace: dough.gmane.org 1344872847 21465 80.91.229.3 (13 Aug 2012 15:47:27 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Mon, 13 Aug 2012 15:47:27 +0000 (UTC) Cc: guile-devel@gnu.org To: nalaginrut@gmail.com, Mark H Weaver Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Mon Aug 13 17:47:23 2012 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1T0wrQ-0000e0-9N for guile-devel@m.gmane.org; Mon, 13 Aug 2012 17:47:20 +0200 Original-Received: from localhost ([::1]:38289 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T0wrP-0005jE-7b for guile-devel@m.gmane.org; Mon, 13 Aug 2012 11:47:19 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:47291) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T0wrL-0005j3-E0 for guile-devel@gnu.org; Mon, 13 Aug 2012 11:47:17 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1T0wrJ-00044I-Qj for guile-devel@gnu.org; Mon, 13 Aug 2012 11:47:15 -0400 Original-Received: from mail-we0-f169.google.com ([74.125.82.169]:56284) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T0wrJ-00044C-GQ for guile-devel@gnu.org; Mon, 13 Aug 2012 11:47:13 -0400 Original-Received: by weys10 with SMTP id s10so2905434wey.0 for ; Mon, 13 Aug 2012 08:47:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=MzUwxo0cgl1JJfuhmkgREDXeJxxj6E4x6JtjjouNWqE=; b=vLcvj0xlprKAlxTmgwW/zxfcDPWUhECp3x65tSh+9OYIycyOAnSa1i3CPCqFx5C6r9 3aVWUDTnAsHbhZqcKPchXDQrCEPLRoo/CtMGKvQO4NRbXADpRHQL/6RXsamOehV0nGM1 CERSDmLTQjqwEB1AzpShlA/aeYVFfspwRtLB//6TAroUFZhkVofyWNdysmpx/37OE2BU MBAGoV3pbcVionXCoF0aybEEIwLf7DLlHGPxDyxFsuaMlKlqm+9f1v6cljDE/U/5fX/Z hfIgioejoU+m/FS1Vk/CFcLWeus6LYaU2KDZHlag+2ar7OaN24HabmEufA3jZlLiGCrV m/qQ== Original-Received: by 10.180.81.133 with SMTP id a5mr19474665wiy.17.1344872832012; Mon, 13 Aug 2012 08:47:12 -0700 (PDT) Original-Received: by 10.216.0.139 with HTTP; Mon, 13 Aug 2012 08:47:11 -0700 (PDT) In-Reply-To: <1344853122.30422.23.camel@Renee-SUSE.suse> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 74.125.82.169 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:14800 Archived-At: --f46d044401de6b91fa04c7279b35 Content-Type: text/plain; charset=ISO-8859-1 Hi nalaginrut, Thanks for your reply. I'd like to start with the vlist implementation Mark mentioned previously, it would be cool if this can help speed up the compiler. But since I'm new here, would anybody be kind enough to give me some clue on how to integrate a c-version vlist into current system? or show me some documents so that I can find the answer myself. Thanks in advance. Rushan Chen On Mon, Aug 13, 2012 at 6:18 PM, nalaginrut wrote: > On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote: > > Hi Mark, > > > > Very appreciate for your reply. 
> >
> > I see you mention that it would be useful to implement a larger library of
> > efficient data structures, and I'm very interested in that. I used to
> > work on projects involving complicated but very interesting data
> > structures. Implementing them could be challenging, but once done I felt a
> > great sense of achievement.
> >
>
> good
>
> > One such project was implementing a language model (LM), which is a core
> > component of speech recognition and machine translation. I don't know if
> > you have heard of it before. Unfortunately, I can't cover it in much detail
> > here; that would complicate things too much.
> >
> > Basically, one of the key operations an LM supports is returning the
> > probability associated with any given id sequence. All id sequences are of
> > the same length, and there is a massive number of them (a
> > typical LM may contain billions). So the LM must be stored
> > compactly while still making the search for each id
> > sequence very fast.
> >
>
> OK, it's very good
>
>
> > A trie was finally chosen as the data structure for the LM (there are many
> > papers discussing this issue). All id sequences with the same prefix share
> > the same internal nodes. For example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
> > only one copy of <1, 2, 3> is stored in the LM, and a search for an id
> > sequence is done by a series of binary searches until the leaf is reached.
> > One extra thing worth mentioning is that I store the whole trie structure in
> > a single large block of memory (usually around 2 gigabytes), which makes
> > it convenient to write out to disk and load back into memory simply by using
> > mmap, and I think it also makes the system faster than allocating
> > memory every time it's needed.
> >
>
> It seems we don't have a prefix-tree implementation yet.
> Maybe some hero has been too busy to share one? ;-)
> I'd like to see you make it; otherwise I'll have to write one myself.
> IIRC, many people here have written their own data-structure/algorithm
> implementations.
> Sharing makes our world better.
> But sometimes we reinvent wheels just for fun.
> So just do what you want to do if it's interesting to you.
>
>
> > There are some other projects I have worked on or am working on, such as
> > a spell corrector, which also involve complicated data structures, but due
> > to privacy policy, I can't say much about them.
> >
>
> Actually, there's no privacy policy; that's why GNU and the GPL exist.
> If something forces you not to share, you (or others) may rewrite it all
> from scratch and GPL it. Then no more privacy policy. Your
> friends will see your creativity, and your work will be enhanced by others.
>
> > All in all, I'm very interested in it, and I really really hope I can
> > help.
> >
> > Looking forward to your reply. Thanks in advance.
> >
> > Have fun!
> >
>
> happy hacking!
>
> > Rushan Chen
>
>
>

--f46d044401de6b91fa04c7279b35
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi nalaginrut,

Thanks for your reply.

I'd like to start with the vlist implementation Mark mentioned previously; it would be cool if this can help speed up the compiler. But since I'm new here, would anybody be kind enough to give me some clues on how to integrate a C version of vlist into the current system, or point me to some documents so that I can find the answer myself?

Thanks in advance.

Rushan Chen
On Mon, Aug 13, 2012 at 6:18 PM, nalaginrut <nalaginrut@gmail.com> wrote:
On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote:
> Hi Mark,
>
> I very much appreciate your reply.
>
> I see you mention that it would be useful to implement a larger library of
> efficient data structures, and I'm very interested in that. I used to
> work on projects involving complicated but very interesting data
> structures. Implementing them could be challenging, but once done I felt a
> great sense of achievement.
>

good

> One such project was implementing a language model (LM), which is a core
> component of speech recognition and machine translation. I don't know if
> you have heard of it before. Unfortunately, I can't cover it in much detail here;
> that would complicate things too much.
>
> Basically, one of the key operations an LM supports is returning the
> probability associated with any given id sequence. All id sequences are of
> the same length, and there is a massive number of them (a
> typical LM may contain billions). So the LM must be stored
> compactly while still making the search for each id
> sequence very fast.
>

OK, it's very good


> A trie was finally chosen as the data structure for the LM (there are many
> papers discussing this issue). All id sequences with the same prefix share
> the same internal nodes. For example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
> only one copy of <1, 2, 3> is stored in the LM, and a search for an id
> sequence is done by a series of binary searches until the leaf is reached. One
> extra thing worth mentioning is that I store the whole trie structure in a
> single large block of memory (usually around 2 gigabytes), which makes
> it convenient to write out to disk and load back into memory simply by using
> mmap, and I think it also makes the system faster than allocating
> memory every time it's needed.
>
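
The trie layout described in the quoted paragraph can be illustrated with a small sketch. This is not Rushan's implementation, which the thread does not show; it is a minimal Python example under stated assumptions. The function names `build_flat_trie` and `lookup` and the four-array layout are hypothetical. The sketch packs all nodes into flat arrays in breadth-first order, so each node's children sit in one sorted, contiguous slice that can be binary-searched.

```python
from array import array
from bisect import bisect_left
from collections import deque


def build_flat_trie(entries):
    """Pack {id_sequence: probability} into flat arrays (hypothetical layout).

    Nodes are emitted in breadth-first order, so the children of a node form
    one contiguous, sorted slice. Assumes all sequences have the same length.
    """
    # Build an ordinary nested-dict trie first; the last id maps to the probability.
    root = {}
    for seq, prob in entries.items():
        node = root
        for tok in seq[:-1]:
            node = node.setdefault(tok, {})
        node[seq[-1]] = prob

    ids = array("q", [-1])      # id stored at each node (slot 0 is the root)
    first = array("q", [0])     # index of the first child, or -1 for a leaf
    count = array("q", [0])     # number of children
    probs = array("d", [0.0])   # probability at leaves, 0.0 elsewhere

    queue = deque([(0, root)])
    while queue:
        slot, children = queue.popleft()
        first[slot] = len(ids)
        count[slot] = len(children)
        for tok in sorted(children):
            child = children[tok]
            is_leaf = not isinstance(child, dict)
            ids.append(tok)
            first.append(-1)
            count.append(0)
            probs.append(child if is_leaf else 0.0)
            if not is_leaf:
                queue.append((len(ids) - 1, child))
    return ids, first, count, probs


def lookup(trie, seq):
    """Return the probability stored for seq, or None if it is absent."""
    ids, first, count, probs = trie
    slot = 0
    for tok in seq:
        if first[slot] == -1:          # reached a leaf before consuming seq
            return None
        lo = first[slot]
        hi = lo + count[slot]
        pos = bisect_left(ids, tok, lo, hi)   # binary search among the children
        if pos == hi or ids[pos] != tok:
            return None
        slot = pos
    return probs[slot] if first[slot] == -1 else None


trie = build_flat_trie({(1, 2, 3, 4): 0.25, (1, 2, 3, 5): 0.5})
print(lookup(trie, (1, 2, 3, 5)))   # 0.5
print(lookup(trie, (1, 2, 3, 6)))   # None
```

Because the nodes live in a few flat arrays rather than separately allocated objects, the same data could be written to a file and mapped back with mmap, as the quoted message describes; the sketch leaves that step out.
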

It seems we don't have a prefix-tree implementation yet.
Maybe some hero has been too busy to share one? ;-)
I'd like to see you make it; otherwise I'll have to write one myself.
IIRC, many people here have written their own data-structure/algorithm
implementations.
Sharing makes our world better.
But sometimes we reinvent wheels just for fun.
So just do what you want to do if it's interesting to you.


> There are some other projects I have worked on or am working on, such as a spell corrector,
> which also involve complicated data structures, but due to privacy policy,
> I can't say much about them.
>

Actually, there's no privacy policy; that's why GNU and the GPL exist.
If something forces you not to share, you (or others) may rewrite it all
from scratch and GPL it. Then no more privacy policy. Your
friends will see your creativity, and your work will be enhanced by others.

> All in all, I'm very interested in it, and I really really hope I can help.
>
> Looking forward to your reply. Thanks in advance.
>
> Have fun!
>

happy hacking!

> Rushan Chen



--f46d044401de6b91fa04c7279b35--