unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Introducing Jitgen
@ 2010-11-28 19:52 Noah Lavine
  2010-12-10  9:09 ` Ludovic Courtès
  0 siblings, 1 reply; 3+ messages in thread
From: Noah Lavine @ 2010-11-28 19:52 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3909 bytes --]

Hello all,

Attached is the first version of Jitgen, a system for writing virtual
machines and matching JIT compilers without so much duplicated code.
It's pretty rough, but I wanted to see what you thought of this
version and ask for comments.

The problem it's trying to solve is that if I made a JIT compiler the
normal way, I would have to basically write the virtual machine over
again. This is because each VM opcode would have a compiler method
which would implement exactly the same algorithm as the
already-existing C implementation, but with slightly different
function calls. For example, here's the C implementation of
scm_op_assert_nargs_ee_locals (with a few macros expanded):

{
  scm_t_ptrdiff n;
  SCM *old_sp;

  /* nargs = n & 0x7, nlocs = nargs + (n >> 3) */
  n = (*ip++);

  if (sp - (fp - 1) != (n & 0x7))
    goto vm_error_wrong_num_args;

  old_sp = sp;
  sp += (n >> 3);
  while (old_sp < sp)
    *++old_sp = SCM_UNDEFINED;

  NEXT;
}

And here's the JIT implementation:

{
  jit_value_t tmp1, tmp2, tmp3, val_scm_undefined;
  jit_label_t good_path = jit_label_undefined;
  int n_val, count;

  objcodep++; ip = jit_insn_add (function, ip, ipup);
  n_val = *objcodep;
  tmp1 = jit_insn_sub (function, fp, spup);
  tmp2 = jit_insn_sub (function, sp, tmp1);
  tmp3 = jit_insn_eq (function, tmp2,
                      jit_value_create_nint_constant (function,
                                                      jit_type_void_ptr,
                                                      n_val&0x7));
  jit_insn_branch_if (function, tmp3, &good_path);
  /** goto vm_error_wrong_num_args **/
  /* restore the ip, sp and fp pointers, and return. */
  jit_insn_store_relative (function, ipp, 0, ip);
  jit_insn_store_relative (function, spp, 0, sp);
  jit_insn_store_relative (function, fpp, 0, fp);
  jit_insn_return
    (function,
     jit_value_create_nint_constant (function, jit_type_int,
                                     jit_return_wrong_num_args));
  jit_insn_label (function, &good_path);

  val_scm_undefined =
    jit_value_create_nint_constant (function, jit_type_void_ptr,
                                    (jit_nint)SCM_UNDEFINED);
  for (count = 0;
       count < (n_val >> 3);
       count++)
    {
      sp = jit_insn_add (function, sp, spup);
      jit_insn_store_relative (function, sp, 0, val_scm_undefined);
    }
  break;
}

The goal of Jitgen is to generate both of these from the following instructions:

'((ip <- ip + ipup)
  (tmp1 <- fp - spup)
  (tmp2 <- sp - tmp1)
  (tmp3 <- tmp2 == (n_val & 0x7))
  (branch-if tmp3 good_path)
  (store-relative ipp 0 ip)
  (store-relative spp 0 sp)
  (store-relative fpp 0 fp)
  (return jit_return_wrong_num_args)
  (label good_path)
  (sp <- sp + spup)
  (store-relative sp 0 SCM_UNDEFINED)
  )

(Actually these instructions will only generate part of that, but I
did say it was rough. :-) )

The attached file jitgen.scm has two functions, instructions->c and
instructions->libjit, which implement this. The rest of that file is
supporting functions for those two.

The file test.scm has a few tests for these, but they're pretty primitive.

There's also a file called expand.scm, which is my idea for eventually
using these. It's a Guile script that takes a file and copies it to an
output file, except that in between occurrences of `%%` it expects to
find Scheme code, and it evaluates the code and pastes the output into
its output file. I'm thinking that we could use this to generate
vm-i-system.c and vm-jit.c from templates, using a common set of
definitions for the VM operations.

(As a side note, something similar is already done for foreign.c and
gsubr.c, but with Elisp to generate the code.)

My biggest question is, is this something that you would be interested
in having in Guile?

If so, then also what do you think of the code style and interfaces,
and do you have any other feedback?

Thanks,
Noah

[-- Attachment #2: jitgen.scm --]
[-- Type: application/octet-stream, Size: 10883 bytes --]

#!/usr/bin/guile
!#

(define-module (jitgen)
  #:export (instructions->c instructions->libjit))

(use-modules (ice-9 match)
             (srfi srfi-1)
             (srfi srfi-2))

; this next bit is supposed to be in (rnrs base), but it's not.
(define (assert cond)
  (if (not cond)
      (error "Assertion failed.")))

; jitgen is a system that can take a description of an algorithm as a
; series of instructions for a register machine and do any of the
; following things with it:
;   * generate C code that will have the same effect, given C
;     arguments
;   * generate C code that calls libjit functions to JIT some machine
;     code that will have the same effect, given libjit values as arguments

; instructions->c: turn a list of instructions into a string of C code
(define* (instructions->c insns c-env #:key (line-prefix "") (indent "\t"))
  (assert (pair? insns))

  ; there are two steps to generating C code from instructions.
  ; 1: scan the instructions to figure out what C variable
  ; declarations we'll need and what their types should be. this is
  ; new-env.
  ; 2: generate the code, from both new-env and the given
  ; instructions.
  (let ((new-env (get-new-names-and-types insns c-env)))
    (string-append
     "{\n"
     (env->c-declarations new-env line-prefix)
     "\n"
     (string-concatenate
      (map (lambda (insn) (instruction->c insn line-prefix indent)) insns))
     "}\n")))

; a C environment is an association list of names (which are symbols)
; and types.

; a type is either a symbol or a pair ('pointer . <type>).
; a symbol represents any type that could be in this C declaration:
;    <type> variable;
; this includes native C types and typedef'd types.

; functions for taking a set of instructions and a given set of C
; variables and figuring out what new C variables we will need to
; declare to generate code for the instructions and what their types
; should be
(define (get-new-names-and-types insns c-env)
  (fold
   (lambda (insn new-env)
     (cond ((new-name-and-type insn c-env new-env)
            => (lambda (new-entry) (cons new-entry new-env)))
           (else new-env)))
   '()
   insns))

(define (new-name-and-type insn env-a env-b)
  (and-let* ((name-and-type (target-name-and-type insn env-a env-b))
             ((not (lookup-in-either (car name-and-type) env-a env-b))))
    name-and-type))

(define (lookup-in-either name env-a env-b)
  (or (assq name env-a) (assq name env-b)))

(define (target-name-and-type insn env-a env-b)
  (match insn
     ((target '<- . rest) (cond ((expression-type rest env-a env-b)
                                 => (lambda (t) (cons target t)))
                                (else #f)))
     (_ #f)))

; return the type of the expression, where env-a and env-b contain
; active variables, or #f if the expression has no type.
(define (expression-type expr env-a env-b)
  (match expr
     ((a (or '+ '- '* '/ '&) b) (arithmetic-type a b env-a env-b))
     ((a '== b) 'int) ; right?
     (_ #f)))

; a list of basic C types. Types earlier in the list degenerate into
; types later in the list.
(define numerical-types-tower
  '(int float double))

(define (type-from-either sym env-a env-b)
  (and-let* ((entry (lookup-in-either sym env-a env-b)))
    (cdr entry)))

(define (arithmetic-type a b env-a env-b)
  (and-let* ((a-type (if (symbol? a)
                         (type-from-either a env-a env-b)
                         (expression-type a env-a env-b)))
             (b-type (if (symbol? b)
                         (type-from-either b env-a env-b)
                         (expression-type b env-a env-b))))
    (let iter ((typ numerical-types-tower))
      (cond ((null? typ) #f)
            ((eq? (car typ) a-type) b-type)
            ((eq? (car typ) b-type) a-type)
            (else (iter (cdr typ)))))))

; functions for generating C code from some instructions and the new
; environment they require.

(define (env->c-declarations env line-prefix)
  (string-concatenate
   (map (lambda (entry) (env-entry->c-declaration entry line-prefix))
        env)))

(define (env-entry->c-declaration entry line-prefix)
  (assert (pair? entry)) ; the entry should be the pair (name . type)
  
  (string-append line-prefix
                 (->string (cdr entry)) ; this won't work for pointers.
                 " "
                 (symbol->string (car entry))
                 ";\n"))
      
(define (instruction->c insn line-prefix indent)
  (match insn
    ((target '<- . expr)
     (simple-format #f "~A~A = ~A;\n"
                    line-prefix target (expression->c expr line-prefix indent)))
    (('return val)
     (simple-format #f "~Areturn ~A;\n"
                    line-prefix (->string val)))
    (('branch-if val label)
     (simple-format #f "~Aif (~A) goto ~A;\n"
                    line-prefix (->string val) (->string label)))
    (('store-relative ptr off val)
     (simple-format #f "~A~A[~A] = ~A;\n"
                    line-prefix (->string ptr) (->string off) (->string val)))
    (('label lbl)
     (simple-format #f "~A~A:\n" line-prefix (->string lbl)))
    (_ (error "Unrecognized instruction" insn "in instruction->c"))))

; expressions are treated differently because they can be nested
(define (expression->c expr line-prefix indent)
  (match expr
    ((a '+ b)
     (simple-format #f "(~A + ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '- b)
     (simple-format #f "(~A - ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '* b)
     (simple-format #f "(~A * ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '/ b)
     (simple-format #f "(~A / ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '& b)
     (simple-format #f "(~A & ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '== b)
     (simple-format #f "(~A == ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    (a ; ideally this should restrict to symbols, numbers and
       ; strings. oh, well...
     (simple-format #f "~S" a))
    (_ (error "Unrecognized expression" expr "in expression->c"))))

(define (->string obj)
  (simple-format #f "~A" obj))

; instructions->libjit: turn a list of instructions into a string of libjit code.
(define* (instructions->libjit insns libjit-env #:key (line-prefix "") (indent "\t"))
  ; insns should be a list of jitgen instructions.
  ; libjit-env should be a list where each entry is (name . kind),
  ; where `name' is a symbol and `kind' is either 'value or 'label
  ; depending on what sort of thing has been declared.
  ; line-prefix and indent are strings, and are used to format the
  ; resulting code.
  
  (let ((new-names (get-new-names-and-kinds insns libjit-env)))
    (string-append
     "{\n"
     (new-names->libjit-declarations new-names line-prefix indent)
     "\n"
     (string-concatenate
      (map (lambda (insn) (instruction->libjit insn line-prefix indent))
           insns))
     "}")))

; get new names from a list of jitgen instructions and some
; already-defined names.
(define (get-new-names-and-kinds insns libjit-env)
  ; todo: this doesn't take into account the possibility that a value
  ; and a label will have the same name.
  (fold
   (lambda (insn new-names)
     (match insn
       ((target '<- . rest)
        (if (or (assq target new-names)
                (assq target libjit-env))
            new-names
            (acons target 'value new-names)))
       (('label lbl)
        (if (assq lbl libjit-env)
            new-names
            (acons lbl 'label new-names)))
       (_ new-names)))
   '()
   insns))

; functions to generate C code given a list of jitgen instructions and
; a list of new declarations to make.

(define (new-names->libjit-declarations new-names line-prefix indent)
  (string-concatenate
   (map (lambda (entry)
          (match entry
            ((name . 'value)
             (string-append line-prefix "jit_value_t " (->string name) ";\n"))
            ((name . 'label)
             (string-append line-prefix "jit_label_t " (->string name) ";\n"))
            (_ (error "Unrecognized entry in new libjit environment" entry))))
        new-names)))

(define (instruction->libjit insn line-prefix indent)
  (match insn
    ((target '<- . expr)
     (simple-format #f "~A~A = ~A;\n"
                    line-prefix target (expression->libjit expr line-prefix indent)))
    (('return val)
     (simple-format #f "~Ajit_insn_return (function, ~A);\n"
                    line-prefix (->string val)))
    (('branch-if val label)
     (simple-format #f "~Ajit_insn_branch_if (function, ~A, ~A)\n"
                    line-prefix (->string val) (->string label)))
    (('store-relative ptr off val)
     (simple-format #f "~Ajit_insn_store_relative (function, ~A, ~A, ~A);\n"
                    line-prefix (->string ptr) (->string off) (->string val)))
    (('label lbl)
     (simple-format #f "~Ajit_insn_label (function, &~A)\n"
                    line-prefix (->string lbl)))
    (_ (error "Unrecognized instruction" insn "in instruction->c"))))

; expressions are treated differently because they can be nested
(define (expression->libjit expr line-prefix indent)
  (match expr
    ((a '+ b)
     (simple-format #f "jit_insn_add (function, ~A, ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '- b)
     (simple-format #f "jit_insn_sub (function, ~A, ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '* b)
     (simple-format #f "jit_insn_mul (function, ~A, ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '/ b)
     (simple-format #f "jit_insn_div (function, ~A, ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '& b)
     (simple-format #f "jit_insn_and (function, ~A, ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    ((a '== b)
     (simple-format #f "jit_insn_eq (function, ~A, ~A)"
                    (expression->c a line-prefix indent)
                    (expression->c b line-prefix indent)))
    (a ; ideally this should restrict to symbols, numbers and
       ; strings. oh, well...
     (simple-format #f "~S" a))
    (_ (error "Unrecognized expression" expr "in expression->c"))))

[-- Attachment #3: expand.scm --]
[-- Type: application/octet-stream, Size: 2082 bytes --]

#!/usr/bin/guile -s
!#

(use-modules (rnrs control))

; this program takes a file, given as its first command-line argument,
; and expands the Scheme expressions which are preceded by %%.

; here is the procedure: copy the input file to the output file until
; we reach a `%%`. We expect that in between pairs of `%%` will be
; Scheme code. We will evaluate that Scheme code with
; 'current-output-port' set to the outfile, and additionally display
; whatever each expression of that code returns to the output file if
; it is not *unspecified*.

(define args (command-line))

(if (or (not (pair? args))
        (not (pair? (cdr args)))
        (not (string? (cadr args))))
  (error "Malformed argument list.\n"))

(define infile (open-file (cadr args) "r"))
(define outfile (open-file (substring (cadr args) ; take off the '.t'
                                      0
                                      (- (string-length (cadr args)) 2))
                           "w"))

(define (repl)
  (let ((expr (read infile))
        (current-output (current-output-port)))
    (when (not (eq? expr '%%))
          (dynamic-wind
              (lambda () (set-current-output-port outfile))
              (lambda () (let ((val (eval expr (interaction-environment))))
                      (when (not (eq? val *unspecified*))
                        (display val))))
              (lambda () (set-current-output-port current-output)))
          (repl))))

(let loop ()
  (let ((next-char (read-char infile)))
    (cond ((eof-object? next-char)) ; success!
          ((not (char=? next-char #\%))
           (write-char next-char outfile)
           (loop))
          (else
           (let ((second-char (read-char infile)))
             (cond ((eof-object? second-char)
                    (write-char next-char outfile))
                   ((not (char=? second-char #\%))
                    (write-char next-char outfile)
                    (write-char second-char outfile)
                    (loop))
                   (else
                    (repl)
                    (loop))))))))

[-- Attachment #4: test.scm --]
[-- Type: application/octet-stream, Size: 2312 bytes --]

#!guile
!#

(load "jitgen.scm")

(use-modules (jitgen))

(display
 (instructions->c
  '((x <- y + z))
  '()))

; this is a translation of the jit code for scm_op_assert_nargs_ee_locals
(display
 (instructions->c
  '((ip <- ip + ipup)
    (tmp1 <- fp - spup)
    (tmp2 <- sp - tmp1)
    (tmp3 <- tmp2 == (n_val & 0x7))
    (branch-if tmp3 good_path)
    (store-relative ipp 0 ip)
    (store-relative spp 0 sp)
    (store-relative fpp 0 fp)
    (return jit_return_wrong_num_args)
    (label good_path)
    (sp <- sp + spup)
    (store-relative sp 0 SCM_UNDEFINED)
    )
  '((ip . int) (ipup . int) (sp . int) (spup . int) (fp . int))))

; this should return something like
; "{
;  void *tmp1;
;  void *tmp2;
;  void *tmp3;
;  ip = ip + ipup;
;  tmp1 = fp - spup;
;  tmp2 = sp - tmp1;
;  tmp3 = tmp2 == (n_val & 0x7);
;  if tmp3
;      goto good_path;
;  ipp[0] = ip;
;  spp[0] = sp;
;  fpp[0] = fp;
;  return jit_return_wrong_num_args;
;  good_path:
;  sp = sp + spup;
;  sp[0] = SCM_UNDEFINED;
;  }"

(display
 (expression-type '(a + (b * c)) '((a . int) (b . double)) '((c . float))))
; this should be 'double

(display
 (instructions->libjit
  '((ip <- ip + ipup)
    (tmp1 <- fp - spup)
    (tmp2 <- sp - tmp1)
    (tmp3 <- tmp2 == (n_val & 0x7))
    (branch-if tmp3 good_path)
    (store-relative ipp 0 ip)
    (store-relative spp 0 sp)
    (store-relative fpp 0 fp)
    (return jit_return_wrong_num_args)
    (label good_path)
    (sp <- sp + spup)
    (store-relative sp 0 SCM_UNDEFINED)
    )
  '((ip . value) (ipup . value) (sp . value) (spup . value) (fp . value))))

; this should return something like
; "{
; jit_value_t tmp1;
; jit_value_t tmp2;
; jit_value_t tmp3;
; jit_label_t good_path;
;
; ip = jit_insn_add (function, ip, ipup);
; tmp1 = jit_insn_sub (function, fp, spup);
; tmp2 - jit_insn_sub (function, sp, tmp1);
; tmp3 = jit_insn_eq (function, tmp2, jit_insn_and(function, n_val,
; 0x7));
; jit_insn_branch_if (function, tmp3, good_path);
; jit_insn_store_relative (function, ipp, 0, ip);
; jit_insn_store_relative (function, spp, 0, sp);
; jit_insn_store_relative (function, fpp, 0, fp);
; jit_insn_return (function, jit_return_wrong_num_args);
; jit_insn_label (function, &good_path);
; sp = jit_insn_add (function, sp, spup);
; jit_insn_store_relative (function, sp, 0, SCM_UNDEFINED);


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Introducing Jitgen
  2010-11-28 19:52 Introducing Jitgen Noah Lavine
@ 2010-12-10  9:09 ` Ludovic Courtès
  2011-01-28 10:34   ` Andy Wingo
  0 siblings, 1 reply; 3+ messages in thread
From: Ludovic Courtès @ 2010-12-10  9:09 UTC (permalink / raw)
  To: guile-devel

Hello!

Better late than never...

Noah Lavine <noah.b.lavine@gmail.com> writes:

[...]

> The goal of Jitgen is to generate both of these from the following instructions:
>
> '((ip <- ip + ipup)
>   (tmp1 <- fp - spup)
>   (tmp2 <- sp - tmp1)
>   (tmp3 <- tmp2 == (n_val & 0x7))
>   (branch-if tmp3 good_path)
>   (store-relative ipp 0 ip)
>   (store-relative spp 0 sp)
>   (store-relative fpp 0 fp)
>   (return jit_return_wrong_num_args)
>   (label good_path)
>   (sp <- sp + spup)
>   (store-relative sp 0 SCM_UNDEFINED)
>   )

Excellent!

The challenge will be to see if it’s sufficiently expressive to express
all the actual VM instructions, and allows the generated code to be
about the same as the hand-written one.

> My biggest question is, is this something that you would be interested
> in having in Guile?

I think so, yes.  Not in 2.0, I guess, but when it starts stabilizing it
could go into a branch.

> If so, then also what do you think of the code style and interfaces,
> and do you have any other feedback?

The code style looks OK to me.

This approach raises bootstrapping issues, or semi-issues.  For
instance, tarballs could come with all the pre-generated C files
containing the VM instructions, but we may also have to check them in.
That’s how psyntax bootstrapping is handled already, so it’s probably
OK.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Introducing Jitgen
  2010-12-10  9:09 ` Ludovic Courtès
@ 2011-01-28 10:34   ` Andy Wingo
  0 siblings, 0 replies; 3+ messages in thread
From: Andy Wingo @ 2011-01-28 10:34 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Heya Noah,

On Fri 10 Dec 2010 10:09, ludo@gnu.org (Ludovic Courtès) writes:

> Better late than never...

Indeed, and my apologies for only getting back to you now.

> Noah Lavine <noah.b.lavine@gmail.com> writes:
>
>> My biggest question is, is this something that you would be interested
>> in having in Guile?
>
> I think so, yes.  Not in 2.0, I guess, but when it starts stabilizing it
> could go into a branch.

I am skeptical.  I think that the jitgen language sounds like a fine
intermediate thing, but I don't know if it will be good enough for
describing the whole VM.

OTOH something like it does seem useful for implementing all of the VM
ops in the JIT engine.

Have you considered going the other way?  That's to say, writing a
limited C -> Jitgen parser, and thus keeping the primary representation
of the VM in C?  It doesn't have to be a full parser.  It doesn't have
to expand macros -- i.e. it can interpret NEXT() directly, without
relying on details of its expansion.  If there is an idiom that the
parser is having a hard time with, we can change the C VM implementation
to make the meanings more explicit, via more macro use.

In this way we would keep the VM's canonical representation in a
well-defined language.  This makes it easier for anyone to maintain and
fix the VM.  It also frees you up to make changes in jitgen without
worrying about impacting the non-jit case.

What do you think?

Regards,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-01-28 10:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-28 19:52 Introducing Jitgen Noah Lavine
2010-12-10  9:09 ` Ludovic Courtès
2011-01-28 10:34   ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).