From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: rushan chen <chenrslucky@gmail.com>
Newsgroups: gmane.lisp.guile.devel
Subject: Re: a passionate guy who want to join in as a developer
Date: Sun, 12 Aug 2012 22:31:22 +0800
Message-ID: <CAP-p8Ea_aaH2kU6C7zu+tgSn9RTN0=PX-b-iRYNz-rx=r_9Lfg@mail.gmail.com>
References: <CAP-p8EZ-jJ8KpFU3rJs1iZWqW_6yYY0iMmLBs0oNehwzSCeK2g@mail.gmail.com>
	<5024F643.8040706@netris.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8f3b9f65653be104c7126e87
X-Trace: dough.gmane.org 1344781892 16120 80.91.229.3 (12 Aug 2012 14:31:32 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Sun, 12 Aug 2012 14:31:32 +0000 (UTC)
Cc: guile-devel@gnu.org
To: Mark H Weaver <mhw@netris.org>
Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Sun Aug 12 16:31:32 2012
Return-path: <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>
Envelope-to: guile-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1T0ZCV-0000lU-Sq
	for guile-devel@m.gmane.org; Sun, 12 Aug 2012 16:31:32 +0200
Original-Received: from localhost ([::1]:60730 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1T0ZCU-0001JQ-Sf
	for guile-devel@m.gmane.org; Sun, 12 Aug 2012 10:31:30 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:40829)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <chenrslucky@gmail.com>) id 1T0ZCQ-0001Fu-MO
	for guile-devel@gnu.org; Sun, 12 Aug 2012 10:31:28 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <chenrslucky@gmail.com>) id 1T0ZCO-0006OM-7H
	for guile-devel@gnu.org; Sun, 12 Aug 2012 10:31:26 -0400
Original-Received: from mail-wg0-f49.google.com ([74.125.82.49]:54672)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <chenrslucky@gmail.com>) id 1T0ZCN-0006NY-TZ
	for guile-devel@gnu.org; Sun, 12 Aug 2012 10:31:24 -0400
Original-Received: by wgbez12 with SMTP id ez12so2119074wgb.30
	for <guile-devel@gnu.org>; Sun, 12 Aug 2012 07:31:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=1PphRchLxCle4rsZyLTgloYsSRbR8TkkPqOSwSNsQng=;
	b=FZ7IniC6zdE6trVBMU7GI5db9csBNHCLq24JRtbwoPS1JMQ9M+jMW/MuqgVXZ9zVAO
	DHbh8iFwhZiTn1P50jlZp8iq5EB3pYXfz1QcqERnH9d2PiTQ5PLpfcB83KV7+P1lmhrh
	bHAcraS0yg5IJKyKIgyMLirKPpc3FTlFuJMihMYTBBD8jCh/whgLqrgM79OvBmVfJDaJ
	67xvlWg65dKT/JqbU1PnDYEp/vZbw1jlYLL/lRB4Q+5nH9lW31MxabDcUfTFyI+vXGR7
	ngdsWLX1hhHoHput007FntzsVTPhTKOd03pDG+RW30zowlWw6okF4pwPcRXvcM1ZkLWC
	szqQ==
Original-Received: by 10.180.107.2 with SMTP id gy2mr11014076wib.2.1344781882309; Sun,
	12 Aug 2012 07:31:22 -0700 (PDT)
Original-Received: by 10.216.0.139 with HTTP; Sun, 12 Aug 2012 07:31:22 -0700 (PDT)
In-Reply-To: <5024F643.8040706@netris.org>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 74.125.82.49
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Developers list for Guile,
	the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/guile-devel>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.lisp.guile.devel:14795
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/14795>

--e89a8f3b9f65653be104c7126e87
Content-Type: text/plain; charset=ISO-8859-1

Hi Mark,

Very appreciate for your reply.

I see you mention that it's useful to implement a larger library of
efficient data structure, and I'm interested in that very much. I used to
work on projects which involve complicated but very interesting data
structures, implementing them could be challenging, but once done I feel a
great sense of achievement.

One such project is implementing a language model (LM) which is a core
component of speech recognition and machine translation. I don't know if
you heard of it before. Unfortunately, I can't cover it too detailed here,
that would complicate things too much.

Basically, one of the key operations LM supports is it should return a
probability associated with any given id sequence. All id sequences are of
the same length, and there are a mass amount of such id sequences (a
commonly-seen LM may contain billions of them). So it's required to store
LM in a concise way, and at the same time make the search for each id
sequence very quickly.

Trie is finally chosen to be the data structure for LM (there were many
papers discussing this issue). All id sequences with the same prefix share
the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
only one copy of <1, 2, 3> will be stored in LM, and a search for a id
sequence is done by a sequence of binary search until the leaf is met. One
extra thing worth mentioning is that I store the whole trie structure in a
single large piece of memory (usually around 2 gigabytes), which makes
it convenient to write out to disk and load into memory by simply using
mmap, and I think it also makes the system faster than if you allocate
memory every time it's needed.

There are some other projects I worked or working on like Spell Corrector,
which also involve complicated data structures, but due to privacy policy,
I can't say much about it.

All in all, I'm very interested in it, and I really really hope I can help.

Looking forward to your reply. Thanks in advance.

Have fun!

Rushan Chen

--e89a8f3b9f65653be104c7126e87
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Mark,<div><br></div><div>Very appreciate for your reply.=A0</div><div><b=
r></div><div>I see you mention that it&#39;s useful to implement a larger l=
ibrary of efficient data structure, and I&#39;m interested in that very muc=
h. I used to work on projects which involve complicated but very interestin=
g data structures, implementing them could be challenging, but once done I =
feel a great sense of achievement.=A0</div>
<div><br></div><div>One such project is implementing a language model (LM) =
which is a core component of=A0speech recognition and machine translation. =
I don&#39;t know if you heard of it before.=A0Unfortunately, I can&#39;t co=
ver it too detailed here, that would complicate things too much.=A0</div>
<div><br></div><div>Basically, one of the key operations LM supports is it =
should return a probability associated with any given id sequence. All id s=
equences are of the same length, and there are a mass amount of such id seq=
uences (a commonly-seen LM may contain billions of them). So it&#39;s requi=
red to store LM in a concise way, and at the same time make the search for =
each id sequence very quickly.</div>
<div><br></div><div>Trie is finally=A0chosen=A0to be the data structure for=
 LM (there were many papers discussing this issue). All id sequences with t=
he same prefix share the same internal node, for example, for &lt;1, 2, 3, =
4&gt; and &lt;1, 2, 3, 5&gt;, only one copy of &lt;1, 2, 3&gt; will be stor=
ed in LM, and a search for a id sequence is done by a sequence of binary se=
arch until the leaf is met. One extra thing worth mentioning is that I stor=
e the whole trie structure in a single large piece of memory (usually aroun=
d 2 gigabytes), which makes it=A0convenient=A0to write out to disk and load=
 into memory by simply using mmap, and I think it also makes the system fas=
ter than if you allocate memory every time it&#39;s needed.</div>
<div><div><br></div></div><div>There are some other projects I worked or wo=
rking on like Spell Corrector, which also involve complicated data structur=
es, but due to privacy policy, I can&#39;t say much about it.</div><div>
<br></div><div>All in all, I&#39;m very interested in it, and I really real=
ly hope I can help.</div><div><br></div><div>Looking forward to your reply.=
 Thanks in advance.</div><div><br></div><div>Have fun!</div><div><br></div>
<div>Rushan Chen</div>

--e89a8f3b9f65653be104c7126e87--