unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Tristan Colgate <tcolgate@gmail.com>
To: nalaginrut <nalaginrut@gmail.com>
Cc: guile-devel <guile-devel@gnu.org>
Subject: Re: a passionate guy who want to join in as a developer
Date: Mon, 13 Aug 2012 11:41:16 +0100	[thread overview]
Message-ID: <CAPGZSGJwnrHjPPBTAdna+rx5zsHHCg3bR3md6yyQqwL7JFaSwA@mail.gmail.com> (raw)
In-Reply-To: <1344853122.30422.23.camel@Renee-SUSE.suse>

[-- Attachment #1: Type: text/plain, Size: 3625 bytes --]

[Meant to send this to the list]

There's an implementation of a trie in guile-snmp, but it's one of the
first things I wrote in scheme and is pretty hideous. I keep meaning
to redo it using records instead of GOOPs.

I'd ask for code review, but, to be honest, I didn't really have a
clue what I was doing when I wrote it (things have only improved
slightly since then).

I've attached it, the trie code is about a third of the way in. It
appears I never bothered writing the delete-node code.


On 13 August 2012 11:18, nalaginrut <nalaginrut@gmail.com> wrote:
> On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote:
>> Hi Mark,
>>
>> Very appreciate for your reply.
>>
>> I see you mention that it's useful to implement a larger library of
>> efficient data structure, and I'm interested in that very much. I used to
>> work on projects which involve complicated but very interesting data
>> structures, implementing them could be challenging, but once done I feel a
>> great sense of achievement.
>>
>
> good
>
>> One such project is implementing a language model (LM) which is a core
>> component of speech recognition and machine translation. I don't know if
>> you heard of it before. Unfortunately, I can't cover it too detailed here,
>> that would complicate things too much.
>>
>> Basically, one of the key operations LM supports is it should return a
>> probability associated with any given id sequence. All id sequences are of
>> the same length, and there are a mass amount of such id sequences (a
>> commonly-seen LM may contain billions of them). So it's required to store
>> LM in a concise way, and at the same time make the search for each id
>> sequence very quickly.
>>
>
> OK, it's very good
>
>
>> Trie is finally chosen to be the data structure for LM (there were many
>> papers discussing this issue). All id sequences with the same prefix share
>> the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
>> only one copy of <1, 2, 3> will be stored in LM, and a search for a id
>> sequence is done by a sequence of binary search until the leaf is met. One
>> extra thing worth mentioning is that I store the whole trie structure in a
>> single large piece of memory (usually around 2 gigabytes), which makes
>> it convenient to write out to disk and load into memory by simply using
>> mmap, and I think it also makes the system faster than if you allocate
>> memory every time it's needed.
>>
>
> Seems we don't have any prefix-tree implementation yet?
> Maybe some hero too busy to share it? ;-)
> I'd like to see you make it, or I must write myself one.
> IIRC, many guys here wrote their own data-structure/algorithm
> implementations.
> Sharing makes our world better.
> But, sometimes we reinvent wheels just for fun.
> So just do what you want to do if it's interesting to you.
>
>
>> There are some other projects I worked or working on like Spell Corrector,
>> which also involve complicated data structures, but due to privacy policy,
>> I can't say much about it.
>>
>
> Actually, there's no privacy policy, that's why GNU and GPL exists.
> If something force you not to share, you may rewrite it all by
> yourself(or other guys), and GPL it. Then no more privacy policy. Your
> friends will see your creativity, and your work be enhanced by others.
>
>> All in all, I'm very interested in it, and I really really hope I can help.
>>
>> Looking forward to your reply. Thanks in advance.
>>
>> Have fun!
>>
>
> happy hacking!
>
>> Rushan Chen
>
>
>



-- 
Tristan Colgate-McFarlane
----
  "You can get all your daily vitamins from 52 pints of guiness, and a
glass of milk"

[-- Attachment #2: ipv4-router.scm --]
[-- Type: application/octet-stream, Size: 19026 bytes --]

(define-module (ipv4-router)
  #:use-module (oop goops)
  #:use-module (srfi srfi-11)
  #:export 
    (<ipv4-address> ip 
     <ipv4-network> network prefix prefix-length mask
     prefixlen->intmask
     ip-in-network?
     <ipv4-route> route net gw misc
     <ipv4-table>
     add-ipv4-route
     add-ipv4-route!
     remove-ipv4-route
     remove-ipv4-route!
     find-ipv4-route
     trie-root
     <trie-node>
     value value-length
     add-trie-node
     trie-node->dot
     rootnode
     nullnode
     object-equal?
     ))

; A class to represent an ip address
(define (ipv4str->ipv4int ipstr)
  (if (string? ipstr)
    (car (vector-ref (gethostbyname ipstr) 4 ))
    (error "ipstr expects a string")))

(define-class <ipv4-address> ()
  (int-ip #:accessor int-ip 
          #:init-value 0
          #:init-keyword #:int-ip)
  (ip #:init-keyword #:ip
      #:allocation #:virtual
      #:slot-ref (lambda(this)(int-ip this))
      #:slot-set! (lambda(this val)(set! (int-ip this) val))
      #:accessor ip ))

(define-method ((setter int-ip) (this <ipv4-address>) (ip-string <string>))
  (slot-set! this 'int-ip (ipv4str->ipv4int ip-string)))

(define-method ((setter int-ip) (this <ipv4-address>) (ip-integer <integer>))
  (slot-set! this 'int-ip  ip-integer))

(define-method ((setter int-ip) (this <ipv4-address>) default)
  (format (current-error-port) "Cannot convert to IP: ~a~%" default)
  (throw 'error))

(define-method (object-equal? (a <ipv4-address>) (b <ipv4-address>))
  (equal? (int-ip a) (int-ip b)))

(define-method (display (this <ipv4-address>) port)
  (display (inet-ntoa (int-ip this))))

(define-method (write (this <ipv4-address>) port)
  (format port "#:ip(~s)" (inet-ntoa (int-ip this))))

; Class for representing a network
(define-class <ipv4-network> ()
  (prefix #:accessor prefix 
          #:init-value (make <ipv4-address>)
          #:init-keyword #:prefix)
  (prefix-length  #:accessor prefix-length 
               #:init-value 0
               #:init-keyword #:prefix-len)
  (network #:allocation #:virtual 
           #:accessor network
           #:slot-ref (lambda(this)(prefix this))
           #:slot-set! (lambda(this val)(set! (network this) val))
           #:init-keyword #:network)
  (mask #:allocation #:virtual 
        #:accessor mask
        #:slot-ref (lambda(this)(inet-ntoa (prefixlen->intmask (prefix-length this))))
        #:slot-set! (lambda(this val)(set! (mask this) val))
        #:init-keyword #:mask))

; setters
(define-method ((setter prefix) (this <ipv4-network>) (value <ipv4-address>))
  (slot-set! this 'prefix value))
(define-method ((setter prefix) (this <ipv4-network>) value)
  (slot-set! this 'prefix (make <ipv4-address> #:ip value)))

(define-method ((setter prefix-len) (this <ipv4-network>) (value <integer>))
  (slot-set! this 'prefix-length value))

(define-method ((setter mask) (this <ipv4-network>) (value <string>))
  (set! (mask this) (inet-aton value)))

(define-method ((setter mask) (this <ipv4-network>) (value <integer>))
  (let* ((intlen  (integer-length value))
         (len (- 32  (integer-length (logxor value (- (integer-expt 2 32) 1))))))
    (if (and (or (eq? intlen 0) (eq? intlen 32))
             (eq? len (logcount value)))
     (set! (prefix-length this) len)
     (error "invalid (or discontiguous) netmask"))))

(define (prefixlen->intmask preflen)
  (ash (- (integer-expt 2 preflen) 1) (- 32 preflen )))
 
(define-method ((setter network) (this <ipv4-network>) (networkspec <string>))
  (if (string-index networkspec #\/)
    (let ((spec (string-split networkspec #\/)))
      (set! (network this) (list (make <ipv4-address> #:ip (car spec)) (string->number (cadr spec)))))
    (if (string-index networkspec #\ )
      (let ((spec (string-split networkspec #\ )))
        (set! (network this) (list (make <ipv4-address> #:ip (car spec)) (cadr spec))))
      (error "Bad network specification"))))

(define-method ((setter network) (this <ipv4-network>) (networkspec <pair>))
  (let ((net (car networkspec))
        (maskspec (cadr networkspec)))
    (set! (prefix this) net)
    (cond 
       ((equal? (class-of maskspec) <string>)
        (set! (mask this) maskspec))
       ((equal? (class-of maskspec) <integer>)
        (set! (prefix-length this) maskspec))
       (else (throw 'error)))))

(define-method (object-equal? (a <ipv4-network>) (b <ipv4-network>))
  (and
    (object-equal? (prefix a) (prefix b))
    (equal? (prefix-length a) (prefix-length b))))

(define-method (display (this <ipv4-network>) port)
  (format port "~a/~a" (prefix this) (prefix-length this)))
(define-method (write (this <ipv4-network>) port)
  (format port "#:net(~s/~s)" (prefix this) (prefix-length this)))

(define (ip-in-network? testip testnet)
  (eqv? (logand (ip testip) (prefixlen->intmask (prefix-length testnet)))
          (ip (prefix testnet))))

; Class for representing a route
(define-class <ipv4-route> ()
  (_net #:init-value (make <ipv4-network> ))
  (_gw  #:init-value (make <ipv4-address> ))
  (misc #:init-value '()
        #:accessor misc
        #:init-keyword #:misc)
  (net  #:allocation #:virtual
        #:accessor net 
        #:slot-ref (lambda(this)(slot-ref this '_net))
        #:slot-set! (lambda(this val)(set! (net this) val))
        #:init-keyword #:net)
  (gw   #:allocation #:virtual
        #:accessor gw 
        #:slot-ref (lambda(this)(slot-ref this '_gw))
        #:slot-set! (lambda(this val)(set! (gw this) val))
        #:init-keyword #:gw))

; setters
(define-method ((setter net) (this <ipv4-route>) (val <ipv4-network>))
  (slot-set! this '_net val))
(define-method ((setter net) (this <ipv4-route>) val)
  (slot-set! this '_net (make <ipv4-network> #:network val)))

(define-method ((setter gw) (this <ipv4-route>) (val <ipv4-address>))
  (slot-set! this '_gw val))
(define-method ((setter gw) (this <ipv4-route>) val)
  (slot-set! this '_gw (make <ipv4-address> #:ip val)))

(define-method (object-equal? (a <ipv4-route>) (b <ipv4-route>))
  (and
    (object-equal? (net a) (net b))
    (object-equal? (gw a) (gw b))))

(define-method (display (this <ipv4-route>) port)
  (format port "~a via ~a" (net this) (gw this)))
(define-method (write (this <ipv4-route>) port)
  (format port "#:route(~s via ~s)" (net this) (gw this)))

; Class for implementing a compressed radix tree
;
(define-generic value-binstr)
(define nullnode)
(define-class <trie-node> ()
  (value         #:accessor value         
                 #:init-value  0 
                 #:init-keyword #:val)
  (value-binstr  #:allocation #:virtual   
                 #:slot-ref value-binstr 
                 #:slot-set! value-binstr
                 #:init-keyword #:valbin)
  (value-length  #:accessor value-length  
                 #:init-value 0 
                 #:init-keyword #:len)
  (left          #:accessor left          
                 #:init-value nullnode 
                 #:init-keyword #:l)
  (right         #:accessor right         
                 #:init-value nullnode
                 #:init-keyword #:r)
  (userdata      #:accessor userdata      
                 #:init-value #f 
                 #:init-keyword #:udata))
(set! nullnode (make <trie-node>))
(define rootnode (make <trie-node> #:l nullnode #:r nullnode))

; setters
(define-method ((setter value) (this <trie-node>) (val <integer>))
  (slot-set! this 'value val))
(define-method ((setter value-length) (this <trie-node>) (len <integer>))
  (slot-set! this 'value-length len))
(define-method ((setter left) (this <trie-node>) (l <trie-node>))
  (slot-set! this 'left l))
(define-method ((setter right) (this <trie-node>) (r <trie-node>))
  (slot-set! this 'right r))
(define-method ((setter userdata) (this <trie-node>) ud)
  (slot-set! this 'userdata ud))

; virtual getters
(define-method (value-binstr (this <trie-node>))
  (with-output-to-string
    (lambda()
      (format #t 
        (string-append
          (string-append "~" 
            (string-append (number->string (value-length this))))
          ",'0b")
        (value this)))))

(define (shared-prefix v1 l1 v2 l2)
  "Takes two binary string (number and length) and returns, using multiple values
   the common prefix and its length"
  (cond 
     ((= l1 l2)
      (let again ((rotv1  v1)
                  (rotv2  v2)
                  (l      l1))
        (if (equal? rotv1 rotv2)
            (values rotv1 l)
            (again (ash rotv1 -1) (ash rotv2 -1) (- l 1)))))
     ((> l1 l2)
      (shared-prefix (ash v1 (- l2 l1)) l2 v2 l2))
     ((< l1 l2)
      (shared-prefix v1 l1 (ash v2 (- l1 l2)) l1))))
  
(define-generic add-trie-node)
(define-method (add-trie-node (root <trie-node>) 
                              (val <integer>) 
                              (len <integer>) 
                              udata)
  (if (eq? root nullnode)
    ; We are being asked to add a new leaf
    (make <trie-node> #:val val #:len len #:l nullnode #:r nullnode #:udata udata)
    ; Trying to add to an existing node
      (let-values (((common-pref common-len) (shared-prefix (value root)(value-length root)
                                              val len)))
        (if (equal? common-len (value-length root) len)
          ; We are an exact match for this prefix
          (if (equal? udata (userdata root))
             ; We are an exact match for this node, just return
             root
             ; Userdata is different
             (make <trie-node> #:val (value root) 
                               #:len (value-length root) 
                               #:l (left root) 
                               #:r (right root)
                               #:udata udata))
          (if (equal? common-len (value-length root))
            ; We are alering a sub tree
            (let* ((nextlen (- len common-len))
                   (nextval (logand val (- (integer-expt 2 nextlen) 1))))
              (if (not (logbit? (- nextlen 1) val))
                ;add to left for a zero
                (let ((nextnode (add-trie-node (left root) nextval nextlen udata)))
                  (if (eq? nextnode (left root))
                      root
                      (make <trie-node> #:val (value root) 
                                        #:len (value-length root) 
                                        #:l nextnode 
                                        #:r (right root)
                                        #:udata (userdata root))))
                ;add to right for a one 
                (let ((nextnode (add-trie-node (right root) nextval nextlen udata)))
                  (if (eq? nextnode (right root))
                      root
                      (make <trie-node> #:val (value root) 
                                        #:len (value-length root) 
                                        #:l (left root) 
                                        #:r nextnode
                                        #:udata (userdata root))))))
            ; We need to split our current node
            (let* ((common-remains (make <trie-node> 
                                      #:val
                                      (logand (value root) 
                                              (- (integer-expt 2 
                                                               (- (value-length root)
                                                                  common-len)) 
                                                 1))
                                      #:len
                                      (- (value-length root) common-len)
                                      #:l (left root)
                                      #:r (right root)
                                      #:udata (userdata root)))
                   (common-newdata 
                                   (if (equal? len common-len)
                                     nullnode
                                     (make <trie-node>
                                        #:val
                                        (logand val
                                                (- (integer-expt 2 
                                                                 (- len 
                                                                    common-len)) 
                                                   1))
                                        #:len (- len common-len)
                                        #:l nullnode
                                        #:r nullnode
                                        #:udata udata)))
                   (nextudata      (if (equal? len common-len)
                                      udata
                                      #f))
                   (nextleft       (if (not (logbit? (- (value-length common-remains) 1)
                                                     (value common-remains)))
                                      common-remains
                                      common-newdata))
                   (nextright      (if (eq? nextleft common-remains)
                                      common-newdata
                                      common-remains)))
              (make <trie-node> #:val common-pref #:len common-len #:l nextleft #:r nextright #:udata nextudata)))))))

(define-generic delete-trie-node)
(define-method (delete-trie-node (rootnode <trie-node>) (prefix <integer>)))

(define-generic longest-prefix-match)
(define-method (longest-prefix-match (root <trie-node>) (val <integer>) (len <integer>))
  (let  again ((currnode root)
               (currval val)
               (currlen len)
               (prevmatch 0)
               (prevmatchlen 0)
               (prevnode nullnode)
               (bestmatch 0) 
               (bestmatchlen 0) 
               (bestmatchnode nullnode)) 
   (if (eq? currnode nullnode)
     (values bestmatch bestmatchlen bestmatchnode)
     (let-values (((common-pref common-len) (shared-prefix (value currnode)(value-length currnode)
                                             currval currlen)))
       (cond 
         ; We have gone as far as we can, prevmatch holds the longest full match
         ((< common-len (value-length currnode))
          (values bestmatch bestmatchlen bestmatchnode))
         ; This node is an exact match))) 
         ((equal? common-len currlen) 
          (values
             (logand (ash prevmatch currlen) currval)
             (+ currlen prevmatchlen)
             currnode))
         (else 
           (letrec ((nextval (logand currval (- (integer-expt 2 (- currlen (value-length currnode))) 1))) 
                    (nextlen (- currlen (value-length currnode))))
             (again  
                 (if (logbit? (- nextlen 1) nextval)
                   (right currnode)
                   (left currnode))
                 nextval
                 nextlen
                 (logior (ash prevmatch (value-length currnode)) (value currnode))
                 (+ (value-length currnode) prevmatchlen)
                 currnode
                 (if (equal? (userdata currnode) #f)
                    bestmatch
                    (logior (ash prevmatch (value-length currnode)) (value currnode)))
                 (if (equal? (userdata currnode) #f)
                    bestmatchlen
                    (+ (value-length currnode) prevmatchlen))
                 (if (equal? (userdata currnode) #f)
                    bestmatchnode
                    currnode)))))))))
  

; Display methods
(define-method (display (this <trie-node>) port)
  (if (eq? this nullnode)
    (format port "null")
    (format port "( ~b ~a ~a ~a ~a)"
                 (value this)
                 (value-length this)
                 (left this)
                 (right this)
                 (userdata this))))
(define-method (write (this <trie-node>) port)
  (if (eq? this nullnode)
    (format port "null")
    (format port "#:trie-node(val: ~b len: ~a l: ~a r: ~a ud: ~a)" 
                 (value this)
                 (value-length this)
                 (left this)
                 (right this)
                 (userdata this))))

(define (walk-trie root func)
  (if (not (eq? nullnode root))
    (begin
      (func root)
      (walk-trie (left root) func)
      (walk-trie (right root) func))))

(define (trie-node->dot port start)
  (format port "digraph G {~%")
  (walk-trie start (lambda(node)
                     (if (not (eq? node nullnode))
                       (if (not (eq? (value-length node) 0))
                         (format port "~a [label=~s];~%" (object-address node)(with-output-to-string (lambda()(format #t "~a via ~a" (value-binstr node) (userdata node)))))
                         (format port "~a [label=\"root\"];~%" (object-address node))))))
  (walk-trie start (lambda(node)
                     (if (not (eq? node nullnode))
                       (begin
                         (if (not (eq? (left node) nullnode))
                           (format port "~a->~a;~%" (object-address node)
                                                   (object-address (left node))))
                         (if (not (eq? (right node) nullnode))
                           (format port "~a->~a;~%" (object-address node)
                                                   (object-address (right node))))))))
  (format port "}~%"))

; Wrapper around compressed radix code to support ipv4 routing table.
(define-class <ipv4-table> ()
  (trie-root #:init-value rootnode 
             #:accessor trie-root))
`
(define-generic add-ipv4-route)
(define-method (add-ipv4-route (table <ipv4-table>)(route <ipv4-route>))
  (let* ((newtable    (make <ipv4-table>))
         (addr    (net route))
         (pref    (ip (prefix addr)))
         (preflen (prefix-length addr)))
    (set! (trie-root newtable) (add-trie-node (trie-root table) (ash pref (- preflen 32)) preflen route))
    newtable))

(define-generic add-ipv4-route!)
(define-method (add-ipv4-route! (table <ipv4-table>)(route <ipv4-route>))
  (let*((addr    (net route))
        (pref    (ip (prefix addr)))
        (preflen (prefix-length addr))) 
    (set! (trie-root table) (add-trie-node (trie-root table) (ash pref (- preflen 32)) preflen route))))

(define-generic remove-ipv4-route)
(define-method (remove-ipv4-route (table <ipv4-table>)(route <ipv4-route>))
  (let ((newtable    (make <ipv4-table>)))
    (set! (trie-root newtable) (remove-trie-node (trie-root table) (net route) route))
    newtable))

(define-generic remove-ipv4-route!)
(define-method (remove-ipv4-route! (table <ipv4-table>)(route <ipv4-route>))
  (set! (trie-root table) (remove-trie-node (trie-root table) (net route) route)))

(define-generic find-ipv4-route)
(define-method (find-ipv4-route (table <ipv4-table>)(route <ipv4-address>))
  (let-values (((foundpref foundlen foundnode)
                                  (longest-prefix-match
                                     (trie-root table)
                                     (ip route)
                                     32)))
    (userdata foundnode)))


  reply	other threads:[~2012-08-13 10:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-06 14:50 a passionate guy who want to join in as a developer rushan chen
2012-08-10 11:53 ` Mark H Weaver
2012-08-12 14:31   ` rushan chen
2012-08-13 10:18     ` nalaginrut
2012-08-13 10:41       ` Tristan Colgate [this message]
2012-08-13 15:47       ` rushan chen
2012-08-13 16:47   ` Optimizing vlists Ludovic Courtès
2012-12-13  8:58   ` a passionate guy who want to join in as a developer Abhinav Jauhri
2012-12-13 15:31     ` Ludovic Courtès
2012-12-13 23:32       ` Abhinav Jauhri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPGZSGJwnrHjPPBTAdna+rx5zsHHCg3bR3md6yyQqwL7JFaSwA@mail.gmail.com \
    --to=tcolgate@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=nalaginrut@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).