unofficial mirror of guile-user@gnu.org 
* using guile like a awk filter in a C program.
@ 2024-05-10 13:55 Pierre LINDENBAUM
  2024-05-17 18:09 ` Simon Tournier
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Pierre LINDENBAUM @ 2024-05-10 13:55 UTC (permalink / raw)
  To: guile-user


Hi all,

I tried to learn guile a few years ago with a side project that went 
nowhere.

I'm now back to Guile, which I would like to use as a filter for my data,
just like awk.
I have some questions about the general design of such a program.

My program uses a C library (https://github.com/samtools/htslib) to scan
mutations/variants in large VCF files
(https://en.wikipedia.org/wiki/Variant_Call_Format).

A typical C program looks like this (pseudo-code):

	```
	header = read_header(input);
	variant = new_variant();
	while(read_variant(input,header,variant)) {
		do_something(header,variant)
		}
	dispose_variant(variant)
	dispose_header(header)
	```

I would like to use Guile to filter VCF files with a custom, user-supplied
Guile expression/program, so my program would now look like this:

	```
	header = read_header(input);
	guile_context = my_initialize_guile(header, argc_argv_user_script)
	variant = new_variant();
	while(read_variant(input,header,variant)) {
		if(!my_guile_test(guile_context,header,variant)) {
			continue;
			}
		do_something(header,variant)
		}
	dispose_variant(variant)
	my_dispose_guile(guile_context)
	dispose_header(header)
	```

and might be invoked like this:

```
./a.out -e '(and (variant-is-snp?) (= (variant-allele-count) 2))' input.vcf > output.vcf
```

where `variant-is-snp?` would test, using `bcf_is_snp`
(https://github.com/samtools/htslib/blob/develop/htslib/vcf.h#L889),
whether the current variant in the `while` loop is a single nucleotide
polymorphism (SNP).


So my questions are:

- Is it possible to use Guile for such a task? More precisely, can I
compile the Guile script just once in `my_initialize_guile` and use it
in the while loop without recompiling it?
- Furthermore, what is the best practice for including the user's script in
a larger script that provides the definitions of `variant-is-snp?`,
`variant-allele-count`, etc.?
- Is there any existing implementation that works like this (something like
an AWK script in Guile)?
- Is there any way to make the program stateless? I mean that each
invocation of `my_guile_test` would erase the definition of the previous
variant.

Thanks for your answers,

Pierre L




* Re: using guile like a awk filter in a C program.
  2024-05-10 13:55 using guile like a awk filter in a C program Pierre LINDENBAUM
@ 2024-05-17 18:09 ` Simon Tournier
  2024-05-21 13:39   ` Pierre Lindenbaum
  2024-05-21 14:58 ` Basile Starynkevitch
  2024-05-30  7:50 ` Pjotr Prins
  2 siblings, 1 reply; 9+ messages in thread
From: Simon Tournier @ 2024-05-17 18:09 UTC (permalink / raw)
  To: Pierre LINDENBAUM, guile-user

Hi,

On Fri, 10 May 2024 at 15:55, Pierre LINDENBAUM <Pierre.Lindenbaum@univ-nantes.fr> wrote:

> 	```
> 	header = read_header(input);
> 	guile_context = my_initialize_guile(header, argc_argv_user_script)
> 	variant = new_variant();
> 	while(read_variant(input,header,variant)) {
> 		if(!my_guile_test(guile_context,header,variant)) {
> 			continue;
> 			}
> 		do_something(header,variant)
> 		}
> 	dispose_variant(variant)
> 	my_dispose_guile(guile_context)
> 	dispose_header(header)
> 	```
>
> and might be invoked like this:
>
> ```
> ./a.out -e '(and (variant-is-snp?) (= (variant-allele-count) 2))' input.vcf > output.vcf
> ```

[...]


> - Is it possible to use Guile for such a task? More precisely, can I
> compile the Guile script just once in `my_initialize_guile` and use it
> in the while loop without recompiling it?

I am not sure I understand correctly, since it could be read two ways:

 + Call Scheme from C
 + Call C from Scheme

Will the main loop be in C or in Guile?


> - Furthermore, what is the best practice for including the user's script in
> a larger script that provides the definitions of `variant-is-snp?`,
> `variant-allele-count`, etc.?

Maybe modules?

> - Is there any existing implementation that works like this (something like
> an AWK script in Guile)?

What you want to replace is the driver, currently implemented in AWK:
you would like to have the driver (glue code) in Guile.  Is that correct?

> - Is there any way to make the program stateless? I mean that each
> invocation of `my_guile_test` would erase the definition of the previous
> variant.

Well, this question depends on the above, I guess. :-)

Cheers,
simon




* Re: using guile like a awk filter in a C program.
  2024-05-17 18:09 ` Simon Tournier
@ 2024-05-21 13:39   ` Pierre Lindenbaum
  2024-05-22 10:39     ` Simon Tournier
  2024-05-24  7:31     ` Pierre Lindenbaum
  0 siblings, 2 replies; 9+ messages in thread
From: Pierre Lindenbaum @ 2024-05-21 13:39 UTC (permalink / raw)
  To: Simon Tournier, guile-user

Thanks for your answer
>> - Is it possible to use Guile for such a task? More precisely, can I
>> compile the Guile script just once in `my_initialize_guile` and use it
>> in the while loop without recompiling it?
> I am not sure I understand correctly, since it could be read two ways:
>
>   + Call Scheme from C

I want to call Scheme from C.

The C loop scans the binary data, and a Scheme script would be used to accept or reject each record, something like:


         ./my-c-program --filter '(and (is-male?) (is-german?) )' persons.data > male_from_germany.data


>> - Furthermore, what is the best practice for including the user's script in
>> a larger script that provides the definitions of `variant-is-snp?`,
>> `variant-allele-count`, etc.?
> Maybe modules?

Right. If I call Scheme from C, should I put my library in a directory, or is there any way to embed the code in the C executable (in a const char*...)?

> What you want to replace is the driver, currently implemented in AWK:
> you would like to have the driver (glue code) in Guile.  Is that correct?

Hmm, not sure. Again, my program is written in C, and I want to provide a Guile program that would be used as a filter for each record:


./my-c-program --filter '(and (is-male?) (is-german?) )' persons.data > male_from_germany.data


I hope it's clearer now.


Pierre






* Re: using guile like a awk filter in a C program.
  2024-05-10 13:55 using guile like a awk filter in a C program Pierre LINDENBAUM
  2024-05-17 18:09 ` Simon Tournier
@ 2024-05-21 14:58 ` Basile Starynkevitch
  2024-05-30  7:50 ` Pjotr Prins
  2 siblings, 0 replies; 9+ messages in thread
From: Basile Starynkevitch @ 2024-05-21 14:58 UTC (permalink / raw)
  To: guile-user; +Cc: Pierre LINDENBAUM

Hello Pierre,

(If you reply privately, feel free to reply in French; we are both French.)

You could definitely use libguile in your C program. Since Guile has a
conservative garbage collector, and assuming your program is
single-threaded, this should be reasonably easy. If your C program needs
to be multi-threaded (with each thread doing garbage collection or Guile
object allocation), it would probably be a lot harder. If you need the
full power of Scheme's call/cc primitive (capturing continuations across
C primitives on a mixed C/Scheme call stack), you could be in trouble.


You probably know about Bigloo, the Scheme implementation (compiling to
C) by Manuel Serrano: https://github.com/manuel-serrano/bigloo (I don't
know whether it fits your needs better, but that code is very well
written and robust).

BTW, my pet open source project is
https://github.com/RefPerSys/RefPerSys/ ("REFlexive PERsistent SYStem").
It is not Guile, but it provides orthogonal persistence (in JSON files),
is GPLv3+ licensed, is coded mostly in C++ for Linux, and has a precise
garbage collector (with some moving/copying of simple immutable data) and
a multi-threaded agenda mechanism. It is a work in progress (aiming to
become an open source inference engine). We are seeking interest from
ITEA, ANR or Horizon Europe in this open source project (or even
informal student contributors).

On 5/10/24 15:55, Pierre LINDENBAUM wrote:
>
> Hi all,
>
> I tried to learn guile a few years ago with a side project that went 
> nowhere.
>
> I'm now back to Guile, which I would like to use as a filter for my
> data, just like awk.
> I have some questions about the general design of such a program.
>
> My program uses a C library (https://github.com/samtools/htslib) to
> scan mutations/variants in large VCF files
> (https://en.wikipedia.org/wiki/Variant_Call_Format).
>
> A typical C program looks like this (pseudo-code):
>
>     ```
>     header = read_header(input);
>     variant = new_variant();
>     while(read_variant(input,header,variant)) {
>         do_something(header,variant)
>         }
>     dispose_variant(variant)
>     dispose_header(header)
>     ```
>
> I would like to use Guile to filter VCF files with a custom, user-supplied
> Guile expression/program, so my program would now look like this:
>
>     ```
>     header = read_header(input);
>     guile_context = my_initialize_guile(header, argc_argv_user_script)
>     variant = new_variant();
>     while(read_variant(input,header,variant)) {
>         if(!my_guile_test(guile_context,header,variant)) {
>             continue;
>             }
>         do_something(header,variant)
>         }
>     dispose_variant(variant)
>     my_dispose_guile(guile_context)
>     dispose_header(header)
>     ```
>
> and might be invoked like this:
>
> ```
> ./a.out -e '(and (variant-is-snp?) (= (variant-allele-count) 2))' input.vcf > output.vcf
> ```
>
> where `variant-is-snp?` would test, using `bcf_is_snp`
> (https://github.com/samtools/htslib/blob/develop/htslib/vcf.h#L889),
> whether the current variant in the `while` loop is a single nucleotide
> polymorphism (SNP).
>
>
> So my questions are:
>
> - Is it possible to use Guile for such a task? More precisely, can I
> compile the Guile script just once in `my_initialize_guile` and use it
> in the while loop without recompiling it?


Yes, by using scm_boot_guile and, most importantly, the internal functions
it calls. It is worth studying the source code of GNU Guile.

> - Furthermore, what is the best practice for including the user's script
> in a larger script that provides the definitions of
> `variant-is-snp?`, `variant-allele-count`, etc.?
> - Is there any existing implementation that works like this (something
> like an AWK script in Guile)?
> - Is there any way to make the program stateless? I mean that each
> invocation of `my_guile_test` would erase the definition of the
> previous variant.


Probably by having some Guile global variable. You also want to read
https://www.gnu.org/software/guile/manual/html_node/Foreign-Object-Memory-Management.html

>
> Thanks for your answers,
>

Regards.

-- 
Basile Starynkevitch             <basile@starynkevitch.net>
(my opinions only)
8 rue de la Faïencerie, 92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/
See/voir:   https://github.com/RefPerSys/RefPerSys





* Re: using guile like a awk filter in a C program.
  2024-05-21 13:39   ` Pierre Lindenbaum
@ 2024-05-22 10:39     ` Simon Tournier
  2024-05-24  7:31     ` Pierre Lindenbaum
  1 sibling, 0 replies; 9+ messages in thread
From: Simon Tournier @ 2024-05-22 10:39 UTC (permalink / raw)
  To: Pierre Lindenbaum, guile-user

Hi Pierre,

> I want to call scheme from C.

Well, I would start with:

https://www.gnu.org/software/guile/manual/html_node/A-Sample-Guile-Main-Program.html

Then maybe:

https://www.gnu.org/software/guile/manual/html_node/General-Libguile-Concepts.html


Personally, I have never done that.  The memory management can be
painful.  And instead of directly using libguile, my first attempt would
be a very simplistic approach: call system() or popen(), i.e., start
Guile this way and have it process some script.  If that is not doable
because the objects to process only live inside the C program, that’s an
indication it will not be straightforward. ;-)

Beyond that, I do not know the answer to your question. :-)

Cheers,
simon




* Re: using guile like a awk filter in a C program.
  2024-05-21 13:39   ` Pierre Lindenbaum
  2024-05-22 10:39     ` Simon Tournier
@ 2024-05-24  7:31     ` Pierre Lindenbaum
  2024-05-28  8:42       ` Zelphir Kaltstahl
  2024-06-10  9:08       ` Maxime Devos via General Guile related discussions
  1 sibling, 2 replies; 9+ messages in thread
From: Pierre Lindenbaum @ 2024-05-24  7:31 UTC (permalink / raw)
  To: guile-user


> I want to call Scheme from C.
>
> The C loop scans the binary data, and a Scheme script would be used to accept or reject each record, something like:
>
>
>         ./my-c-program --filter '(and (is-male?) (is-german?) )' persons.data > male_from_germany.data
>
>
>>> - Furthermore, what is the best practice for including the user's script in
>>> a larger script that provides the definitions of `variant-is-snp?`,
>>> `variant-allele-count`, etc.?
>> Maybe modules?

OK, this is what I have so far. I tested my idea with a C program that scans a stream of integers on stdin. The C code and the Makefile are below (you can also read them at https://gist.github.com/lindenb/7b7793ced81fc3a4a566333f5149d65a):

The invocation looks like:

seq 1 100 | ./a.out '(= (modulo (_get) 7) 0)'
7
14
21
28
35
42
49
56
63
70
77
84
91
98

where (_get) returns the current integer.

Is this the correct way to initialize the script?

It looks ugly. Currently the user just defines the 'filter' part, and then I include this string in a larger piece of code. And how should I design this if I want to include modules, custom functions, ...?

Furthermore, I'd like to include a module embedded in the C code, in a big const string, to avoid any external file. Is it possible?


Thank you for your suggestions

Pierre


/***************** C code ************************************************/

#include <stdio.h>
#include <libguile.h>


struct Global {
     /** inputstream */
     FILE* in;
     /** current value */
     int value;
     /** contains script as string */
     char buffer[2048];
     };

static struct Global global;

/** retrieve the next record, return #t on success */
static SCM _next () {
   int ret = fscanf(global.in,"%d", &(global.value));
   if(ret!=1) return SCM_BOOL_F;
   return SCM_BOOL_T;
}
/** get current number */
static SCM _get () {
   return scm_from_int(global.value);
}

/** output current number */
static SCM _emit () {
   fprintf(stderr,"%d\n",global.value);
   return SCM_BOOL_T;
}

/** dispose memory associated */
static SCM _dispose () {
   return SCM_BOOL_T;
}


static void*
inner_main (void *data)
{
     struct Global* g= (struct Global*)data;
     scm_c_define_gsubr("_next", 0, 0, 0, _next);
     scm_c_define_gsubr("_get", 0, 0, 0, _get);
     scm_c_define_gsubr("_emit", 0, 0, 0, _emit);
     scm_c_define_gsubr("_dispose", 0, 0, 0, _dispose);
     scm_c_eval_string(g->buffer);
     return 0;
}

int main(int argc,char** argv) {
     if(argc!=2) return -1;
     global.value=1;
     global.in = stdin;
     sprintf(global.buffer,"(while (_next) (if %s  (_emit)   ) (_dispose) )  ",argv[1]);
     scm_with_guile(inner_main,(void*)&global);
     return 0;
     }

/***************** Makefile ************************************************/
GUILE_VERSION=2.2
# Tell the C compiler where to find <libguile.h>
CFLAGS=`pkg-config --cflags guile-$(GUILE_VERSION)`

# Tell the linker what libraries to use and where to find them.
LIBS=`pkg-config --libs guile-$(GUILE_VERSION)`


test: a.out
	seq 1 100 | ./a.out '(= (modulo (_get) 7) 0)'
a.out: filter.c
	$(CC) $(CFLAGS) -o $@ $< $(LIBS)






* Re: using guile like a awk filter in a C program.
  2024-05-24  7:31     ` Pierre Lindenbaum
@ 2024-05-28  8:42       ` Zelphir Kaltstahl
  2024-06-10  9:08       ` Maxime Devos via General Guile related discussions
  1 sibling, 0 replies; 9+ messages in thread
From: Zelphir Kaltstahl @ 2024-05-28  8:42 UTC (permalink / raw)
  To: Pierre Lindenbaum; +Cc: Guile User

Hello Pierre,

To me, the easiest way might be to write a declarative description of
your filters, put it into a JSON file (or a similar file format) and read
that as input, rather than using a whole programming language.

Is there a specific requirement that makes a JSON file infeasible and
Scheme necessary?

Is it because you do not want to implement the logic for processing the 
description of the filter in your C program? (That could be a reasonable 
concern, depending on how complicated the filters become.)

Regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl





* Re: using guile like a awk filter in a C program.
  2024-05-10 13:55 using guile like a awk filter in a C program Pierre LINDENBAUM
  2024-05-17 18:09 ` Simon Tournier
  2024-05-21 14:58 ` Basile Starynkevitch
@ 2024-05-30  7:50 ` Pjotr Prins
  2 siblings, 0 replies; 9+ messages in thread
From: Pjotr Prins @ 2024-05-30  7:50 UTC (permalink / raw)
  To: Pierre LINDENBAUM; +Cc: guile-user

Note that the Scheme shell (scsh) has an awk implementation using
macros. Maybe the code helps:

https://carlstrom.com/publications/scsh-manual.pdf





* RE: using guile like a awk filter in a C program.
  2024-05-24  7:31     ` Pierre Lindenbaum
  2024-05-28  8:42       ` Zelphir Kaltstahl
@ 2024-06-10  9:08       ` Maxime Devos via General Guile related discussions
  1 sibling, 0 replies; 9+ messages in thread
From: Maxime Devos via General Guile related discussions @ 2024-06-10  9:08 UTC (permalink / raw)
  To: Pierre Lindenbaum, guile-user@gnu.org

>It looks ugly. Currently the user just defines the 'filter' part, and then I include this string in a larger piece of code. And how should I design this if I want to include modules, custom functions, ...?

I propose replacing the interface for the filter part with (lambda (record) [more code here]) and evaluating this only once, i.e., evaluating the lambda expression itself once to get a procedure, and then applying that procedure to each record. Then the user can just do

(letrec* () (define custom ...) (lambda (record) [stuff here]))

for any custom functions.

To include modules, you can just support --use-module (foo bar) in the main function, using the reflection API to import functions.

Another option (IMO the best one) is to let the user define an entire module (with an optional initial (define-module (...)); that’s not really important for the purposes of the program) and let the last expression be the returned filter. You can use the compilation API for this.

If you do that, it becomes easy to support languages other than Scheme as well, by passing #:from to procedures like ‘compile’ and adding a ‘--language language’ argument to the main function.

>Furthermore, I'd like to include a module embedded in the C code, in a big const string, to avoid any external file. Is it possible?

Just do a big

scm_c_eval_string("(define-module (foo bar)) [code here]");

Be aware that this switches the current module to (foo bar), so you might want to do a module excursion or something.

    /** contains script as string */
     char buffer[2048];
    sprintf(global.buffer,"(while (_next) (if %s  (_emit)   ) (_dispose) )  ",argv[1]);

This sprintf is terrible; please never do this. Maybe use scm_string_append together with a function that makes Scheme strings from C strings (e.g. scm_from_utf8_string) instead.

In particular, note that if the total length of the text (excluding the terminating \0) is 2048 or more, sprintf writes past the end of global.buffer, which is undefined behaviour (a buffer overflow), so scm_c_eval_string(g->buffer) is not well-defined either. And even with a bounded snprintf, which truncates instead of overflowing, only part of the script would be run.

The “int ret = fscanf(global.in,"%d", &(global.value));” + error handling is also bad. Why does ‘main’ return 0 on input errors and syntax errors? Why is the cause of the error never printed (say, with perror or something)? That would tell you whether it is a mere syntax error or whether the underlying device has I/O errors.

And why are you writing normal output to stderr instead of stdout?

The sprintf is also suboptimal for a different reason.
For example, suppose that argv[1] is “#true #true)) (if”. This is not a valid expression, but once spliced into the template it will be interpreted as something, likely leading to errors, but not the right kind of error (i.e., a syntax error).

The simplest way to avoid this while still staying in C (I think it would be simpler to instead let Scheme call C, but whatever(*)) is to construct the relevant S-expressions as S-expressions: first define “SCM filter_code” with scm_read(scm_open_input_string(scm_from_utf8_string(argv[1]))), then define

SCM filter_lambda = scm_list_3(scm_from_utf8_symbol("lambda"), SCM_EOL, filter_code);

and so on, until you have built the C equivalent of

(define filter-code (call-with-input-string (cadr (command-line)) read))
(define code
  `(let ((filter (lambda () ,filter-code)))
     (while (_next)
       (if (filter) (_emit))
       (_dispose))))

Finally, you can do scm_eval(code, scm_current_module()).

(*) I.e., write a C library that reads binary data from a port and converts it into some Scheme record, e.g. with a function that reads a single record and returns its Scheme representation (or an end-of-file object). Such a function can then be made available to Scheme with scm_c_define_gsubr, and constants with scm_c_define, the C function that defines a Scheme ‘variable’ (in this case, a constant).

Best regards,
Maxime Devos



Thread overview: 9+ messages
2024-05-10 13:55 using guile like a awk filter in a C program Pierre LINDENBAUM
2024-05-17 18:09 ` Simon Tournier
2024-05-21 13:39   ` Pierre Lindenbaum
2024-05-22 10:39     ` Simon Tournier
2024-05-24  7:31     ` Pierre Lindenbaum
2024-05-28  8:42       ` Zelphir Kaltstahl
2024-06-10  9:08       ` Maxime Devos via General Guile related discussions
2024-05-21 14:58 ` Basile Starynkevitch
2024-05-30  7:50 ` Pjotr Prins
