a passionate guy who want to join in as a developer

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

* a passionate guy who want to join in as a developer
@ 2012-08-06 14:50 rushan chen
  2012-08-10 11:53 ` Mark H Weaver
  0 siblings, 1 reply; 10+ messages in thread
From: rushan chen @ 2012-08-06 14:50 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 338 bytes --]

Hi all!

I'm just a guy who is passionate about programming especially programming
in C, and want to enjoy it with people like-minded. Guile is an interesting
project to me, and I really really hope I can make some contribution to it
as a developer.

Hope someone could point me some direction, thanks in advance.

have fun!

rushan chen

[-- Attachment #2: Type: text/html, Size: 456 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-08-06 14:50 a passionate guy who want to join in as a developer rushan chen
@ 2012-08-10 11:53 ` Mark H Weaver
  2012-08-12 14:31   ` rushan chen
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Mark H Weaver @ 2012-08-10 11:53 UTC (permalink / raw)
  To: rushan chen; +Cc: guile-devel

Hi Rushan,

On 08/06/2012 10:50 AM, rushan chen wrote:
> I'm just a guy who is passionate about programming especially
> programming in C, and want to enjoy it with people like-minded. Guile is
> an interesting project to me, and I really really hope I can make some
> contribution to it as a developer.

Sounds great, we can certainly use more help :)

> Hope someone could point me some direction, thanks in advance.

It's hard to make specific suggestions without knowing more about your 
skills and interests.  Most Guile developers started out by using Guile 
for their own work, and ended up fixing bugs that they found, and added 
features that they needed.  I think this is the best path, because it 
makes it far more likely that you will understand the needs of users who 
need a particular feature.

There's been a recent trend in Guile to write more code in Scheme and 
less in C.  Most of the new C code is fairly deep in the guts of Guile, 
and I'm reluctant to point you in that direction until you've gained 
some experience working on Guile.

If you prefer to write C code, then one category of useful jobs is to 
write bindings for C libraries so that they can be easily accessed from 
Scheme.  Of course they should be C libraries that Guile programmers 
would like to use, and that would not be better to implement directly in 
Scheme.

Another thing we need that makes sense to write in C is a larger library 
of efficient data structures.  For example, we could really use an 
efficient balanced tree implementation.  However, if you want to work on 
a new data structure, please first post a proposed API to this list so 
that we provide feedback before you get started on the implementation, 
to avoid wasted work.

Here's one specific data structure idea: we have a "vlist" data 
structure, currently written in Scheme, but in my opinion it would be 
useful to rewrite it in C for better efficiency.  It is already used 
quite heavily in our compiler, so you could make our compiler run much 
faster this way.  Also, users might be more inclined to use vlists if 
they were more efficient.

And of course, I should invite you to take a look at our bug tracker
http://debbugs.gnu.org/guile and see if there are any that you can fix. 
  While this work is not as fun as adding new features, it is deeply 
appreciated and a great way to learn about Guile internals.

Well, those are a few ideas anyway, but I still think the best path is 
for you to scratch your own itches.  Use Guile for your own work, report 
bugs that you find (and fix them if you can), and more generally try to 
find ways to improve Guile that would have made your work easier and 
more fun.

Thanks so much for your interest, and happy hacking!

      Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-08-10 11:53 ` Mark H Weaver
@ 2012-08-12 14:31   ` rushan chen
  2012-08-13 10:18     ` nalaginrut
  2012-08-13 16:47   ` Optimizing vlists Ludovic Courtès
  2012-12-13  8:58   ` a passionate guy who want to join in as a developer Abhinav Jauhri
  2 siblings, 1 reply; 10+ messages in thread
From: rushan chen @ 2012-08-12 14:31 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2015 bytes --]

Hi Mark,

Very appreciate for your reply.

I see you mention that it's useful to implement a larger library of
efficient data structure, and I'm interested in that very much. I used to
work on projects which involve complicated but very interesting data
structures, implementing them could be challenging, but once done I feel a
great sense of achievement.

One such project is implementing a language model (LM) which is a core
component of speech recognition and machine translation. I don't know if
you heard of it before. Unfortunately, I can't cover it too detailed here,
that would complicate things too much.

Basically, one of the key operations LM supports is it should return a
probability associated with any given id sequence. All id sequences are of
the same length, and there are a mass amount of such id sequences (a
commonly-seen LM may contain billions of them). So it's required to store
LM in a concise way, and at the same time make the search for each id
sequence very quickly.

Trie is finally chosen to be the data structure for LM (there were many
papers discussing this issue). All id sequences with the same prefix share
the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
only one copy of <1, 2, 3> will be stored in LM, and a search for a id
sequence is done by a sequence of binary search until the leaf is met. One
extra thing worth mentioning is that I store the whole trie structure in a
single large piece of memory (usually around 2 gigabytes), which makes
it convenient to write out to disk and load into memory by simply using
mmap, and I think it also makes the system faster than if you allocate
memory every time it's needed.

There are some other projects I worked or working on like Spell Corrector,
which also involve complicated data structures, but due to privacy policy,
I can't say much about it.

All in all, I'm very interested in it, and I really really hope I can help.

Looking forward to your reply. Thanks in advance.

Have fun!

Rushan Chen

[-- Attachment #2: Type: text/html, Size: 2332 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-08-12 14:31   ` rushan chen
@ 2012-08-13 10:18     ` nalaginrut
  2012-08-13 10:41       ` Tristan Colgate
  2012-08-13 15:47       ` rushan chen
  0 siblings, 2 replies; 10+ messages in thread
From: nalaginrut @ 2012-08-13 10:18 UTC (permalink / raw)
  To: rushan chen; +Cc: Mark H Weaver, guile-devel

On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote: 
> Hi Mark,
> 
> Very appreciate for your reply.
> 
> I see you mention that it's useful to implement a larger library of
> efficient data structure, and I'm interested in that very much. I used to
> work on projects which involve complicated but very interesting data
> structures, implementing them could be challenging, but once done I feel a
> great sense of achievement.
> 

good

> One such project is implementing a language model (LM) which is a core
> component of speech recognition and machine translation. I don't know if
> you heard of it before. Unfortunately, I can't cover it too detailed here,
> that would complicate things too much.
> 
> Basically, one of the key operations LM supports is it should return a
> probability associated with any given id sequence. All id sequences are of
> the same length, and there are a mass amount of such id sequences (a
> commonly-seen LM may contain billions of them). So it's required to store
> LM in a concise way, and at the same time make the search for each id
> sequence very quickly.
> 

OK, it's very good


> Trie is finally chosen to be the data structure for LM (there were many
> papers discussing this issue). All id sequences with the same prefix share
> the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
> only one copy of <1, 2, 3> will be stored in LM, and a search for a id
> sequence is done by a sequence of binary search until the leaf is met. One
> extra thing worth mentioning is that I store the whole trie structure in a
> single large piece of memory (usually around 2 gigabytes), which makes
> it convenient to write out to disk and load into memory by simply using
> mmap, and I think it also makes the system faster than if you allocate
> memory every time it's needed.
> 

Seems we don't have any prefix-tree implementation yet?
Maybe some hero too busy to share it? ;-)
I'd like to see you make it, or I must write myself one.
IIRC, many guys here wrote their own data-structure/algorithm
implementations. 
Sharing makes our world better. 
But, sometimes we reinvent wheels just for fun. 
So just do what you want to do if it's interesting to you.


> There are some other projects I worked or working on like Spell Corrector,
> which also involve complicated data structures, but due to privacy policy,
> I can't say much about it.
> 

Actually, there's no privacy policy, that's why GNU and GPL exists.
If something force you not to share, you may rewrite it all by
yourself(or other guys), and GPL it. Then no more privacy policy. Your
friends will see your creativity, and your work be enhanced by others.

> All in all, I'm very interested in it, and I really really hope I can help.
> 
> Looking forward to your reply. Thanks in advance.
> 
> Have fun!
> 

happy hacking!

> Rushan Chen





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-08-13 10:18     ` nalaginrut
@ 2012-08-13 10:41       ` Tristan Colgate
  2012-08-13 15:47       ` rushan chen
  1 sibling, 0 replies; 10+ messages in thread
From: Tristan Colgate @ 2012-08-13 10:41 UTC (permalink / raw)
  To: nalaginrut; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3625 bytes --]

[Meant to send this to the list]

There's an implementation of a trie in guile-snmp, but it's one of the
first things I wrote in scheme and is pretty hideous. I keep meaning
to redo it using records instead of GOOPs.

I'd ask for code review, but, to be honest, I didn't really have a
clue what I was doing when I wrote it (things have only improved
slightly since then).

I've attached it, the trie code is about a third of the way in. It
appears I never bothered writing the delete-node code.


On 13 August 2012 11:18, nalaginrut <nalaginrut@gmail.com> wrote:
> On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote:
>> Hi Mark,
>>
>> Very appreciate for your reply.
>>
>> I see you mention that it's useful to implement a larger library of
>> efficient data structure, and I'm interested in that very much. I used to
>> work on projects which involve complicated but very interesting data
>> structures, implementing them could be challenging, but once done I feel a
>> great sense of achievement.
>>
>
> good
>
>> One such project is implementing a language model (LM) which is a core
>> component of speech recognition and machine translation. I don't know if
>> you heard of it before. Unfortunately, I can't cover it too detailed here,
>> that would complicate things too much.
>>
>> Basically, one of the key operations LM supports is it should return a
>> probability associated with any given id sequence. All id sequences are of
>> the same length, and there are a mass amount of such id sequences (a
>> commonly-seen LM may contain billions of them). So it's required to store
>> LM in a concise way, and at the same time make the search for each id
>> sequence very quickly.
>>
>
> OK, it's very good
>
>
>> Trie is finally chosen to be the data structure for LM (there were many
>> papers discussing this issue). All id sequences with the same prefix share
>> the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
>> only one copy of <1, 2, 3> will be stored in LM, and a search for a id
>> sequence is done by a sequence of binary search until the leaf is met. One
>> extra thing worth mentioning is that I store the whole trie structure in a
>> single large piece of memory (usually around 2 gigabytes), which makes
>> it convenient to write out to disk and load into memory by simply using
>> mmap, and I think it also makes the system faster than if you allocate
>> memory every time it's needed.
>>
>
> Seems we don't have any prefix-tree implementation yet?
> Maybe some hero too busy to share it? ;-)
> I'd like to see you make it, or I must write myself one.
> IIRC, many guys here wrote their own data-structure/algorithm
> implementations.
> Sharing makes our world better.
> But, sometimes we reinvent wheels just for fun.
> So just do what you want to do if it's interesting to you.
>
>
>> There are some other projects I worked or working on like Spell Corrector,
>> which also involve complicated data structures, but due to privacy policy,
>> I can't say much about it.
>>
>
> Actually, there's no privacy policy, that's why GNU and GPL exists.
> If something force you not to share, you may rewrite it all by
> yourself(or other guys), and GPL it. Then no more privacy policy. Your
> friends will see your creativity, and your work be enhanced by others.
>
>> All in all, I'm very interested in it, and I really really hope I can help.
>>
>> Looking forward to your reply. Thanks in advance.
>>
>> Have fun!
>>
>
> happy hacking!
>
>> Rushan Chen
>
>
>



-- 
Tristan Colgate-McFarlane
----
  "You can get all your daily vitamins from 52 pints of guiness, and a
glass of milk"

[-- Attachment #2: ipv4-router.scm --]
[-- Type: application/octet-stream, Size: 19026 bytes --]

(define-module (ipv4-router)
  #:use-module (oop goops)
  #:use-module (srfi srfi-11)
  #:export 
    (<ipv4-address> ip 
     <ipv4-network> network prefix prefix-length mask
     prefixlen->intmask
     ip-in-network?
     <ipv4-route> route net gw misc
     <ipv4-table>
     add-ipv4-route
     add-ipv4-route!
     remove-ipv4-route
     remove-ipv4-route!
     find-ipv4-route
     trie-root
     <trie-node>
     value value-length
     add-trie-node
     trie-node->dot
     rootnode
     nullnode
     object-equal?
     ))

; A class to represent an ip address
(define (ipv4str->ipv4int ipstr)
  (if (string? ipstr)
    (car (vector-ref (gethostbyname ipstr) 4 ))
    (error "ipstr expects a string")))

(define-class <ipv4-address> ()
  (int-ip #:accessor int-ip 
          #:init-value 0
          #:init-keyword #:int-ip)
  (ip #:init-keyword #:ip
      #:allocation #:virtual
      #:slot-ref (lambda(this)(int-ip this))
      #:slot-set! (lambda(this val)(set! (int-ip this) val))
      #:accessor ip ))

(define-method ((setter int-ip) (this <ipv4-address>) (ip-string <string>))
  (slot-set! this 'int-ip (ipv4str->ipv4int ip-string)))

(define-method ((setter int-ip) (this <ipv4-address>) (ip-integer <integer>))
  (slot-set! this 'int-ip  ip-integer))

(define-method ((setter int-ip) (this <ipv4-address>) default)
  (format (current-error-port) "Cannot convert to IP: ~a~%" default)
  (throw 'error))

(define-method (object-equal? (a <ipv4-address>) (b <ipv4-address>))
  (equal? (int-ip a) (int-ip b)))

(define-method (display (this <ipv4-address>) port)
  (display (inet-ntoa (int-ip this))))

(define-method (write (this <ipv4-address>) port)
  (format port "#:ip(~s)" (inet-ntoa (int-ip this))))

; Class for representing a network
(define-class <ipv4-network> ()
  (prefix #:accessor prefix 
          #:init-value (make <ipv4-address>)
          #:init-keyword #:prefix)
  (prefix-length  #:accessor prefix-length 
               #:init-value 0
               #:init-keyword #:prefix-len)
  (network #:allocation #:virtual 
           #:accessor network
           #:slot-ref (lambda(this)(prefix this))
           #:slot-set! (lambda(this val)(set! (network this) val))
           #:init-keyword #:network)
  (mask #:allocation #:virtual 
        #:accessor mask
        #:slot-ref (lambda(this)(inet-ntoa (prefixlen->intmask (prefix-length this))))
        #:slot-set! (lambda(this val)(set! (mask this) val))
        #:init-keyword #:mask))

; setters
(define-method ((setter prefix) (this <ipv4-network>) (value <ipv4-address>))
  (slot-set! this 'prefix value))
(define-method ((setter prefix) (this <ipv4-network>) value)
  (slot-set! this 'prefix (make <ipv4-address> #:ip value)))

(define-method ((setter prefix-len) (this <ipv4-network>) (value <integer>))
  (slot-set! this 'prefix-length value))

(define-method ((setter mask) (this <ipv4-network>) (value <string>))
  (set! (mask this) (inet-aton value)))

(define-method ((setter mask) (this <ipv4-network>) (value <integer>))
  (let* ((intlen  (integer-length value))
         (len (- 32  (integer-length (logxor value (- (integer-expt 2 32) 1))))))
    (if (and (or (eq? intlen 0) (eq? intlen 32))
             (eq? len (logcount value)))
     (set! (prefix-length this) len)
     (error "invalid (or discontiguous) netmask"))))

(define (prefixlen->intmask preflen)
  (ash (- (integer-expt 2 preflen) 1) (- 32 preflen )))
 
(define-method ((setter network) (this <ipv4-network>) (networkspec <string>))
  (if (string-index networkspec #\/)
    (let ((spec (string-split networkspec #\/)))
      (set! (network this) (list (make <ipv4-address> #:ip (car spec)) (string->number (cadr spec)))))
    (if (string-index networkspec #\ )
      (let ((spec (string-split networkspec #\ )))
        (set! (network this) (list (make <ipv4-address> #:ip (car spec)) (cadr spec))))
      (error "Bad network specification"))))

(define-method ((setter network) (this <ipv4-network>) (networkspec <pair>))
  (let ((net (car networkspec))
        (maskspec (cadr networkspec)))
    (set! (prefix this) net)
    (cond 
       ((equal? (class-of maskspec) <string>)
        (set! (mask this) maskspec))
       ((equal? (class-of maskspec) <integer>)
        (set! (prefix-length this) maskspec))
       (else (throw 'error)))))

(define-method (object-equal? (a <ipv4-network>) (b <ipv4-network>))
  (and
    (object-equal? (prefix a) (prefix b))
    (equal? (prefix-length a) (prefix-length b))))

(define-method (display (this <ipv4-network>) port)
  (format port "~a/~a" (prefix this) (prefix-length this)))
(define-method (write (this <ipv4-network>) port)
  (format port "#:net(~s/~s)" (prefix this) (prefix-length this)))

(define (ip-in-network? testip testnet)
  (eqv? (logand (ip testip) (prefixlen->intmask (prefix-length testnet)))
          (ip (prefix testnet))))

; Class for representing a route
(define-class <ipv4-route> ()
  (_net #:init-value (make <ipv4-network> ))
  (_gw  #:init-value (make <ipv4-address> ))
  (misc #:init-value '()
        #:accessor misc
        #:init-keyword #:misc)
  (net  #:allocation #:virtual
        #:accessor net 
        #:slot-ref (lambda(this)(slot-ref this '_net))
        #:slot-set! (lambda(this val)(set! (net this) val))
        #:init-keyword #:net)
  (gw   #:allocation #:virtual
        #:accessor gw 
        #:slot-ref (lambda(this)(slot-ref this '_gw))
        #:slot-set! (lambda(this val)(set! (gw this) val))
        #:init-keyword #:gw))

; setters
(define-method ((setter net) (this <ipv4-route>) (val <ipv4-network>))
  (slot-set! this '_net val))
(define-method ((setter net) (this <ipv4-route>) val)
  (slot-set! this '_net (make <ipv4-network> #:network val)))

(define-method ((setter gw) (this <ipv4-route>) (val <ipv4-address>))
  (slot-set! this '_gw val))
(define-method ((setter gw) (this <ipv4-route>) val)
  (slot-set! this '_gw (make <ipv4-address> #:ip val)))

(define-method (object-equal? (a <ipv4-route>) (b <ipv4-route>))
  (and
    (object-equal? (net a) (net b))
    (object-equal? (gw a) (gw b))))

(define-method (display (this <ipv4-route>) port)
  (format port "~a via ~a" (net this) (gw this)))
(define-method (write (this <ipv4-route>) port)
  (format port "#:route(~s via ~s)" (net this) (gw this)))

; Class for implementing a compressed radix tree
;
(define-generic value-binstr)
(define nullnode)
(define-class <trie-node> ()
  (value         #:accessor value         
                 #:init-value  0 
                 #:init-keyword #:val)
  (value-binstr  #:allocation #:virtual   
                 #:slot-ref value-binstr 
                 #:slot-set! value-binstr
                 #:init-keyword #:valbin)
  (value-length  #:accessor value-length  
                 #:init-value 0 
                 #:init-keyword #:len)
  (left          #:accessor left          
                 #:init-value nullnode 
                 #:init-keyword #:l)
  (right         #:accessor right         
                 #:init-value nullnode
                 #:init-keyword #:r)
  (userdata      #:accessor userdata      
                 #:init-value #f 
                 #:init-keyword #:udata))
(set! nullnode (make <trie-node>))
(define rootnode (make <trie-node> #:l nullnode #:r nullnode))

; setters
(define-method ((setter value) (this <trie-node>) (val <integer>))
  (slot-set! this 'value val))
(define-method ((setter value-length) (this <trie-node>) (len <integer>))
  (slot-set! this 'value-length len))
(define-method ((setter left) (this <trie-node>) (l <trie-node>))
  (slot-set! this 'left l))
(define-method ((setter right) (this <trie-node>) (r <trie-node>))
  (slot-set! this 'right r))
(define-method ((setter userdata) (this <trie-node>) ud)
  (slot-set! this 'userdata ud))

; virtual getters
(define-method (value-binstr (this <trie-node>))
  (with-output-to-string
    (lambda()
      (format #t 
        (string-append
          (string-append "~" 
            (string-append (number->string (value-length this))))
          ",'0b")
        (value this)))))

(define (shared-prefix v1 l1 v2 l2)
  "Takes two binary string (number and length) and returns, using multiple values
   the common prefix and its length"
  (cond 
     ((= l1 l2)
      (let again ((rotv1  v1)
                  (rotv2  v2)
                  (l      l1))
        (if (equal? rotv1 rotv2)
            (values rotv1 l)
            (again (ash rotv1 -1) (ash rotv2 -1) (- l 1)))))
     ((> l1 l2)
      (shared-prefix (ash v1 (- l2 l1)) l2 v2 l2))
     ((< l1 l2)
      (shared-prefix v1 l1 (ash v2 (- l1 l2)) l1))))
  
(define-generic add-trie-node)
(define-method (add-trie-node (root <trie-node>) 
                              (val <integer>) 
                              (len <integer>) 
                              udata)
  (if (eq? root nullnode)
    ; We are being asked to add a new leaf
    (make <trie-node> #:val val #:len len #:l nullnode #:r nullnode #:udata udata)
    ; Trying to add to an existing node
      (let-values (((common-pref common-len) (shared-prefix (value root)(value-length root)
                                              val len)))
        (if (equal? common-len (value-length root) len)
          ; We are an exact match for this prefix
          (if (equal? udata (userdata root))
             ; We are an exact match for this node, just return
             root
             ; Userdata is different
             (make <trie-node> #:val (value root) 
                               #:len (value-length root) 
                               #:l (left root) 
                               #:r (right root)
                               #:udata udata))
          (if (equal? common-len (value-length root))
            ; We are alering a sub tree
            (let* ((nextlen (- len common-len))
                   (nextval (logand val (- (integer-expt 2 nextlen) 1))))
              (if (not (logbit? (- nextlen 1) val))
                ;add to left for a zero
                (let ((nextnode (add-trie-node (left root) nextval nextlen udata)))
                  (if (eq? nextnode (left root))
                      root
                      (make <trie-node> #:val (value root) 
                                        #:len (value-length root) 
                                        #:l nextnode 
                                        #:r (right root)
                                        #:udata (userdata root))))
                ;add to right for a one 
                (let ((nextnode (add-trie-node (right root) nextval nextlen udata)))
                  (if (eq? nextnode (right root))
                      root
                      (make <trie-node> #:val (value root) 
                                        #:len (value-length root) 
                                        #:l (left root) 
                                        #:r nextnode
                                        #:udata (userdata root))))))
            ; We need to split our current node
            (let* ((common-remains (make <trie-node> 
                                      #:val
                                      (logand (value root) 
                                              (- (integer-expt 2 
                                                               (- (value-length root)
                                                                  common-len)) 
                                                 1))
                                      #:len
                                      (- (value-length root) common-len)
                                      #:l (left root)
                                      #:r (right root)
                                      #:udata (userdata root)))
                   (common-newdata 
                                   (if (equal? len common-len)
                                     nullnode
                                     (make <trie-node>
                                        #:val
                                        (logand val
                                                (- (integer-expt 2 
                                                                 (- len 
                                                                    common-len)) 
                                                   1))
                                        #:len (- len common-len)
                                        #:l nullnode
                                        #:r nullnode
                                        #:udata udata)))
                   (nextudata      (if (equal? len common-len)
                                      udata
                                      #f))
                   (nextleft       (if (not (logbit? (- (value-length common-remains) 1)
                                                     (value common-remains)))
                                      common-remains
                                      common-newdata))
                   (nextright      (if (eq? nextleft common-remains)
                                      common-newdata
                                      common-remains)))
              (make <trie-node> #:val common-pref #:len common-len #:l nextleft #:r nextright #:udata nextudata)))))))

(define-generic delete-trie-node)
(define-method (delete-trie-node (rootnode <trie-node>) (prefix <integer>)))

(define-generic longest-prefix-match)
(define-method (longest-prefix-match (root <trie-node>) (val <integer>) (len <integer>))
  (let  again ((currnode root)
               (currval val)
               (currlen len)
               (prevmatch 0)
               (prevmatchlen 0)
               (prevnode nullnode)
               (bestmatch 0) 
               (bestmatchlen 0) 
               (bestmatchnode nullnode)) 
   (if (eq? currnode nullnode)
     (values bestmatch bestmatchlen bestmatchnode)
     (let-values (((common-pref common-len) (shared-prefix (value currnode)(value-length currnode)
                                             currval currlen)))
       (cond 
         ; We have gone as far as we can, prevmatch holds the longest full match
         ((< common-len (value-length currnode))
          (values bestmatch bestmatchlen bestmatchnode))
         ; This node is an exact match))) 
         ((equal? common-len currlen) 
          (values
             (logand (ash prevmatch currlen) currval)
             (+ currlen prevmatchlen)
             currnode))
         (else 
           (letrec ((nextval (logand currval (- (integer-expt 2 (- currlen (value-length currnode))) 1))) 
                    (nextlen (- currlen (value-length currnode))))
             (again  
                 (if (logbit? (- nextlen 1) nextval)
                   (right currnode)
                   (left currnode))
                 nextval
                 nextlen
                 (logior (ash prevmatch (value-length currnode)) (value currnode))
                 (+ (value-length currnode) prevmatchlen)
                 currnode
                 (if (equal? (userdata currnode) #f)
                    bestmatch
                    (logior (ash prevmatch (value-length currnode)) (value currnode)))
                 (if (equal? (userdata currnode) #f)
                    bestmatchlen
                    (+ (value-length currnode) prevmatchlen))
                 (if (equal? (userdata currnode) #f)
                    bestmatchnode
                    currnode)))))))))
  

; Display methods
(define-method (display (this <trie-node>) port)
  (if (eq? this nullnode)
    (format port "null")
    (format port "( ~b ~a ~a ~a ~a)"
                 (value this)
                 (value-length this)
                 (left this)
                 (right this)
                 (userdata this))))
(define-method (write (this <trie-node>) port)
  (if (eq? this nullnode)
    (format port "null")
    (format port "#:trie-node(val: ~b len: ~a l: ~a r: ~a ud: ~a)" 
                 (value this)
                 (value-length this)
                 (left this)
                 (right this)
                 (userdata this))))

(define (walk-trie root func)
  (if (not (eq? nullnode root))
    (begin
      (func root)
      (walk-trie (left root) func)
      (walk-trie (right root) func))))

(define (trie-node->dot port start)
  (format port "digraph G {~%")
  (walk-trie start (lambda(node)
                     (if (not (eq? node nullnode))
                       (if (not (eq? (value-length node) 0))
                         (format port "~a [label=~s];~%" (object-address node)(with-output-to-string (lambda()(format #t "~a via ~a" (value-binstr node) (userdata node)))))
                         (format port "~a [label=\"root\"];~%" (object-address node))))))
  (walk-trie start (lambda(node)
                     (if (not (eq? node nullnode))
                       (begin
                         (if (not (eq? (left node) nullnode))
                           (format port "~a->~a;~%" (object-address node)
                                                   (object-address (left node))))
                         (if (not (eq? (right node) nullnode))
                           (format port "~a->~a;~%" (object-address node)
                                                   (object-address (right node))))))))
  (format port "}~%"))

; Wrapper around compressed radix code to support ipv4 routing table.
(define-class <ipv4-table> ()
  (trie-root #:init-value rootnode 
             #:accessor trie-root))
`
(define-generic add-ipv4-route)
(define-method (add-ipv4-route (table <ipv4-table>)(route <ipv4-route>))
  (let* ((newtable    (make <ipv4-table>))
         (addr    (net route))
         (pref    (ip (prefix addr)))
         (preflen (prefix-length addr)))
    (set! (trie-root newtable) (add-trie-node (trie-root table) (ash pref (- preflen 32)) preflen route))
    newtable))

(define-generic add-ipv4-route!)
(define-method (add-ipv4-route! (table <ipv4-table>)(route <ipv4-route>))
  (let*((addr    (net route))
        (pref    (ip (prefix addr)))
        (preflen (prefix-length addr))) 
    (set! (trie-root table) (add-trie-node (trie-root table) (ash pref (- preflen 32)) preflen route))))

(define-generic remove-ipv4-route)
(define-method (remove-ipv4-route (table <ipv4-table>)(route <ipv4-route>))
  (let ((newtable    (make <ipv4-table>)))
    (set! (trie-root newtable) (remove-trie-node (trie-root table) (net route) route))
    newtable))

(define-generic remove-ipv4-route!)
(define-method (remove-ipv4-route! (table <ipv4-table>)(route <ipv4-route>))
  (set! (trie-root table) (remove-trie-node (trie-root table) (net route) route)))

(define-generic find-ipv4-route)
(define-method (find-ipv4-route (table <ipv4-table>)(route <ipv4-address>))
  (let-values (((foundpref foundlen foundnode)
                                  (longest-prefix-match
                                     (trie-root table)
                                     (ip route)
                                     32)))
    (userdata foundnode)))


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-08-13 10:18     ` nalaginrut
  2012-08-13 10:41       ` Tristan Colgate
@ 2012-08-13 15:47       ` rushan chen
  1 sibling, 0 replies; 10+ messages in thread
From: rushan chen @ 2012-08-13 15:47 UTC (permalink / raw)
  To: nalaginrut, Mark H Weaver; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3478 bytes --]

Hi nalaginrut,

Thanks for your reply.

I'd like to start with the vlist implementation Mark mentioned previously,
it would be cool if this can help speed up the compiler. But since I'm new
here, would anybody be kind enough to give me some clue on how to integrate
a c-version vlist into current system? or show me some documents so that I
can find the answer myself.

Thanks in advance.

Rushan Chen

On Mon, Aug 13, 2012 at 6:18 PM, nalaginrut <nalaginrut@gmail.com> wrote:

> On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote:
> > Hi Mark,
> >
> > Very appreciate for your reply.
> >
> > I see you mention that it's useful to implement a larger library of
> > efficient data structure, and I'm interested in that very much. I used to
> > work on projects which involve complicated but very interesting data
> > structures, implementing them could be challenging, but once done I feel
> a
> > great sense of achievement.
> >
>
> good
>
> > One such project is implementing a language model (LM) which is a core
> > component of speech recognition and machine translation. I don't know if
> > you heard of it before. Unfortunately, I can't cover it too detailed
> here,
> > that would complicate things too much.
> >
> > Basically, one of the key operations LM supports is it should return a
> > probability associated with any given id sequence. All id sequences are
> of
> > the same length, and there are a mass amount of such id sequences (a
> > commonly-seen LM may contain billions of them). So it's required to store
> > LM in a concise way, and at the same time make the search for each id
> > sequence very quickly.
> >
>
> OK, it's very good
>
>
> > Trie is finally chosen to be the data structure for LM (there were many
> > papers discussing this issue). All id sequences with the same prefix
> share
> > the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
> > only one copy of <1, 2, 3> will be stored in LM, and a search for a id
> > sequence is done by a sequence of binary search until the leaf is met.
> One
> > extra thing worth mentioning is that I store the whole trie structure in
> a
> > single large piece of memory (usually around 2 gigabytes), which makes
> > it convenient to write out to disk and load into memory by simply using
> > mmap, and I think it also makes the system faster than if you allocate
> > memory every time it's needed.
> >
>
> Seems we don't have any prefix-tree implementation yet?
> Maybe some hero too busy to share it? ;-)
> I'd like to see you make it, or I must write myself one.
> IIRC, many guys here wrote their own data-structure/algorithm
> implementations.
> Sharing makes our world better.
> But, sometimes we reinvent wheels just for fun.
> So just do what you want to do if it's interesting to you.
>
>
> > There are some other projects I worked or working on like Spell
> Corrector,
> > which also involve complicated data structures, but due to privacy
> policy,
> > I can't say much about it.
> >
>
> Actually, there's no privacy policy, that's why GNU and GPL exists.
> If something force you not to share, you may rewrite it all by
> yourself(or other guys), and GPL it. Then no more privacy policy. Your
> friends will see your creativity, and your work be enhanced by others.
>
> > All in all, I'm very interested in it, and I really really hope I can
> help.
> >
> > Looking forward to your reply. Thanks in advance.
> >
> > Have fun!
> >
>
> happy hacking!
>
> > Rushan Chen
>
>
>

[-- Attachment #2: Type: text/html, Size: 4365 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Optimizing vlists
  2012-08-10 11:53 ` Mark H Weaver
  2012-08-12 14:31   ` rushan chen
@ 2012-08-13 16:47   ` Ludovic Courtès
  2012-12-13  8:58   ` a passionate guy who want to join in as a developer Abhinav Jauhri
  2 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2012-08-13 16:47 UTC (permalink / raw)
  To: guile-devel

Hello,

Mark H Weaver <mhw@netris.org> skribis:

> Here's one specific data structure idea: we have a "vlist" data
> structure, currently written in Scheme, but in my opinion it would be
> useful to rewrite it in C for better efficiency.

FWIW, I’m not convinced that this is a fruitful approach.

First, because Guile 2.0 has allowed us to write more things in Scheme,
and I’d like to keep it this way.  ;-)

Second, because a very important optimizations would rather be to fiddle
with the GC so that vlists, which are base + offset pairs, would not
have to be heap-allocated, as vlist.scm says:

  ;; XXX: Allocating a <vlist> record in addition to the block at each
  ;; `vlist-cons' call is inefficient.  However, Bagwell's hack to avoid it
  ;; (Section 2.2) would require GC_ALL_INTERIOR_POINTERS, which would be a
  ;; performance hit for everyone.

I remember Andy did some measurements showing that ‘vhash-assoc’ was
often high in the profile when running the compiler.  One thing that
could be done is to run a C-level profiler to see exactly what’s going
on, and which optimizations could help.

WDYT?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-08-10 11:53 ` Mark H Weaver
  2012-08-12 14:31   ` rushan chen
  2012-08-13 16:47   ` Optimizing vlists Ludovic Courtès
@ 2012-12-13  8:58   ` Abhinav Jauhri
  2012-12-13 15:31     ` Ludovic Courtès
  2 siblings, 1 reply; 10+ messages in thread
From: Abhinav Jauhri @ 2012-12-13  8:58 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

Hello everyone,

I was trying to find something interesting to start contributing, and
somehow landed on this thread. My interests are in data structures and
algorithms.

`vlist` implementation hasn't been done in C, right?
Is this still required by the community?

Any resources/examples material for newbies to start in C @guile? I have
started using guile.
Other tricks which devs have been using as part of their guile development
would be helpful too.

Thanks!
On Fri, Aug 10, 2012 at 7:53 AM, Mark H Weaver <mhw@netris.org> wrote:

> vlist

-- 
Abhinav

[-- Attachment #2: Type: text/html, Size: 1229 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-12-13  8:58   ` a passionate guy who want to join in as a developer Abhinav Jauhri
@ 2012-12-13 15:31     ` Ludovic Courtès
  2012-12-13 23:32       ` Abhinav Jauhri
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2012-12-13 15:31 UTC (permalink / raw)
  To: Abhinav Jauhri; +Cc: Mark H. Weaver, guile-devel

Hi,

Abhinav Jauhri <aj910martin@gmail.com> skribis:

> `vlist` implementation hasn't been done in C, right?
> Is this still required by the community?

FWIW, I’m still skeptical about this approach:

  http://thread.gmane.org/gmane.lisp.guile.devel/14801

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a passionate guy who want to join in as a developer
  2012-12-13 15:31     ` Ludovic Courtès
@ 2012-12-13 23:32       ` Abhinav Jauhri
  0 siblings, 0 replies; 10+ messages in thread
From: Abhinav Jauhri @ 2012-12-13 23:32 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mark H. Weaver, guile-devel

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

On Thu, Dec 13, 2012 at 10:31 AM, Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> Abhinav Jauhri <aj910martin@gmail.com> skribis:
>
> > `vlist` implementation hasn't been done in C, right?
> > Is this still required by the community?
>
> FWIW, I’m still skeptical about this approach:
>
>   http://thread.gmane.org/gmane.lisp.guile.devel/14801

Can you point  me in the right direction such that I can perform some
profiling in this regard? On #guile someone suggested about valgrind.


>
>
> Thanks,
> Ludo’.
>



-- 
Abhinav

[-- Attachment #2: Type: text/html, Size: 1187 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-12-13 23:32 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-06 14:50 a passionate guy who want to join in as a developer rushan chen
2012-08-10 11:53 ` Mark H Weaver
2012-08-12 14:31   ` rushan chen
2012-08-13 10:18     ` nalaginrut
2012-08-13 10:41       ` Tristan Colgate
2012-08-13 15:47       ` rushan chen
2012-08-13 16:47   ` Optimizing vlists Ludovic Courtès
2012-12-13  8:58   ` a passionate guy who want to join in as a developer Abhinav Jauhri
2012-12-13 15:31     ` Ludovic Courtès
2012-12-13 23:32       ` Abhinav Jauhri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).