emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* speeding up Babel Gnuplot
@ 2016-12-28 20:33 Thierry Banel
  2016-12-29 20:04 ` Nicolas Goaziou
  2017-01-01 20:17 ` Thierry Banel
  0 siblings, 2 replies; 17+ messages in thread
From: Thierry Banel @ 2016-12-28 20:33 UTC (permalink / raw)
  To: emacs-orgmode

Babel Gnuplot is quite slow on large tables.
Example: 45 seconds for a 1500 rows table.

Why? Because orgtbl-to-generic is too slow (or too generic). Its
behavior seems to be quadratic O(size^2). Should we bypass it? Should
we try to optimize it?

Here is a fix to speed up the rendering to a mere fraction of a second.

#+BEGIN_SRC elisp
(defun org-babel-gnuplot-table-to-data (table data-file params)
  "Export TABLE to DATA-FILE in a format readable by gnuplot."
  (let ((org-babel-gnuplot-timestamp-fmt
     (or (plist-get params :timefmt) "%Y-%m-%d-%H:%M:%S")))
    (with-temp-file data-file
      (mapc (lambda (line)
          (mapc (lambda (cell)
              (insert (org-babel-gnuplot-quote-tsv-field cell))
              (insert "\t"))
            line)
          (insert "\n"))
        table)))
  data-file)
#+END_SRC

And here is a test case.

First generate a 1500 rows table:

#+BEGIN_SRC elisp :results none
  (goto-char (point-max))
  (insert "#+name: data\n")
  (let ((i 1500))
    (while (> i 0)
      (goto-char (point-max))
      (insert (format "| %4s |\n" i))
      (setq i (1- i))))
#+END_SRC

Then run Babel Gnuplot:

#+BEGIN_SRC gnuplot :var data=data :file x.svg :session none :term svg
plot data
#+END_SRC

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2016-12-28 20:33 speeding up Babel Gnuplot Thierry Banel
@ 2016-12-29 20:04 ` Nicolas Goaziou
  2016-12-29 20:34   ` Thierry Banel
  2017-01-01 20:17 ` Thierry Banel
  1 sibling, 1 reply; 17+ messages in thread
From: Nicolas Goaziou @ 2016-12-29 20:04 UTC (permalink / raw)
  To: Thierry Banel; +Cc: emacs-orgmode

Hello,

Thierry Banel <tbanelwebmin@free.fr> writes:

> Babel Gnuplot is quite slow on large tables.
> Example: 45 seconds for a 1500 rows table.
>
> Why? Because orgtbl-to-generic is too slow (or too generic). Its
> behavior seems to be quadratic O(size^2). Should we bypass it?

I don't think so.

> Should we try to optimize it?

I did some optimizations in master branch. I go below 1 sec for the 1500
rows table.

> Here is a fix to speed up the rendering to a mere fraction of a second.
>
> #+BEGIN_SRC elisp
> (defun org-babel-gnuplot-table-to-data (table data-file params)
>   "Export TABLE to DATA-FILE in a format readable by gnuplot."
>   (let ((org-babel-gnuplot-timestamp-fmt
>      (or (plist-get params :timefmt) "%Y-%m-%d-%H:%M:%S")))
>     (with-temp-file data-file
>       (mapc (lambda (line)
>           (mapc (lambda (cell)
>               (insert (org-babel-gnuplot-quote-tsv-field cell))
>               (insert "\t"))
>             line)
>           (insert "\n"))
>         table)))
>   data-file)
> #+END_SRC

The comparison is not fair, because the function doesn't handle all the
cases `orgtbl-to-generic' handles.

Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2016-12-29 20:04 ` Nicolas Goaziou
@ 2016-12-29 20:34   ` Thierry Banel
  0 siblings, 0 replies; 17+ messages in thread
From: Thierry Banel @ 2016-12-29 20:34 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Le 29/12/2016 21:04, Nicolas Goaziou a écrit :
> I did some optimizations in master branch. I go below 1 sec for the 1500
> rows table.
Confirmed! Your latest commit givesa huge boost.

>> Here is a fix to speed up the rendering to a mere fraction of a second.
>>
>> #+BEGIN_SRC elisp
>> (defun org-babel-gnuplot-table-to-data (table data-file params)
>>   "Export TABLE to DATA-FILE in a format readable by gnuplot."
>>   (let ((org-babel-gnuplot-timestamp-fmt
>>      (or (plist-get params :timefmt) "%Y-%m-%d-%H:%M:%S")))
>>     (with-temp-file data-file
>>       (mapc (lambda (line)
>>           (mapc (lambda (cell)
>>               (insert (org-babel-gnuplot-quote-tsv-field cell))
>>               (insert "\t"))
>>             line)
>>           (insert "\n"))
>>         table)))
>>   data-file)
>> #+END_SRC
> The comparison is not fair, because the function doesn't handle all the
> cases `orgtbl-to-generic' handles.
>
Of course it was not fair.
It was just a quick-and-dirty-not-to-be-commited patch to discuss the issue.
And of course improving orgtbl-to-generic benefits to many usages,
besidesBabel Gnuplot.
Thanks for taking care and doing so so quickly.

Thierry

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2016-12-28 20:33 speeding up Babel Gnuplot Thierry Banel
  2016-12-29 20:04 ` Nicolas Goaziou
@ 2017-01-01 20:17 ` Thierry Banel
  2017-01-01 23:34   ` Nicolas Goaziou
  1 sibling, 1 reply; 17+ messages in thread
From: Thierry Banel @ 2017-01-01 20:17 UTC (permalink / raw)
  To: emacs-orgmode

Babel Gnuplot can be further accelerated. A table is converted to a
temporary file readable by Gnuplot. I found two issues in this process:
1. the temporary file is generated twice,
2. the generation is quadratic.

I have not provided a committable patch because I am not happy with myfixes.

Of course there is no hurry in fixing that because large tables do not
happen so often.

------------------------------

1. Temporary generated twice
Because org-babel-gnuplot-process-vars is called twice.

There is no obvious fix. Here is a dirty patch. It caches the name of
the temporary file in the 'param' list.

#+BEGIN_SRC elisp
  (defun org-babel-gnuplot-process-vars (params)
    "Extract variables from PARAMS and process the variables.
  Dumps all vectors into files and returns an association list
  of variable names and the related value to be used in the gnuplot
  code."
    (let ((*org-babel-gnuplot-missing* (cdr (assq :missing params))))
      (mapcar
       (lambda (pair)
     (cons
      (car pair) ;; variable name
      (let ((val (cdr pair))) ;; variable value
        (if (not (listp val))
        val
          (let ((temp-file (org-babel-temp-file "gnuplot-"))
            (first  (car val)))
        (setcdr pair temp-file)  ;; <------ caching here
        (org-babel-gnuplot-table-to-data
         (if (or (listp first) (symbolp first))
             val
           (mapcar 'list val))
         temp-file params))))))
       (org-babel--get-vars params))))
#+END_SRC

------------------------------

2. Quadratic behavior
The spot is at ox.el::5119(the lambda in org-export-table-row-number).

This lambda is called a number of times equal to the square of thesize
of the table being plotted. For a 2000 rows table, this is
2000x2000 = four millions times. The cache a few lines before does
nothelp because each row is visited only once.

I don't know how to fix that. Hereafter is a quick-and-dirty patch to
turn the behavior into a linear one (lambda called 2000 timesinstead).
The cache is pre-filled upon creation with a single call to
org-element-map. So instead of calling org-element-map for each row
(forcing it to go through all previous rows over and over), it iscalled
only once.

#+BEGIN_SRC elisp
(defun prefill-cache (table cache info)
  (let ((number 0))
    (org-element-map   ;; <--- called once here
          table
      'table-row
      (lambda (row)
        (if (eq (org-element-property :type row) 'standard)
           (progn
          (puthash row number cache)
          (cl-incf number))))
      info )))

(defun org-export-table-row-number (table-row info)
  "Return TABLE-ROW number.
INFO is a plist used as a communication channel.  Return value is
zero-based and ignores separators.  The function returns nil for
special columns and separators."
  (let* ((cache (or (plist-get info :table-row-number-cache)
                    (let ((table (make-hash-table :test #'eq)))
                      (plist-put info :table-row-number-cache table)
                      (prefill-cache
                        (org-export-get-parent-table table-row)
                        table
                        info)
                     table)))
         (cached (gethash table-row cache 'no-cache)))
    (if (not (eq cached 'no-cache)) cached
      (puthash table-row
       (and (eq (org-element-property :type table-row) 'standard)
            (not (org-export-table-row-is-special-p table-row info))
            (let ((number 0))
              (org-element-map   ;; <--- called many times here
                (org-export-get-parent-table table-row)
                'table-row
                (lambda (row)
                  (cond ((eq row table-row) number)
                    ((eq (org-element-property :type row) 'standard)
                     (cl-incf number) nil)))
                info 'first-match)))
           cache))))
#+END_SRC

------------------------------

3. Test case

First generate a 2000 rows table:

#+BEGIN_SRC elisp :results none
  (goto-char (point-max))
  (insert "#+name: data\n")
  (let ((i 2000))
    (while (> i 0)
      (goto-char (point-max))
      (insert (format "| %4s |\n" i))
      (setq i (1- i))))
#+END_SRC

Then run Babel Gnuplot:

#+BEGIN_SRC gnuplot :var data=data :file x.svg :session none :term svg
plot data
#+END_SRC

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-01 20:17 ` Thierry Banel
@ 2017-01-01 23:34   ` Nicolas Goaziou
  2017-01-02 20:11     ` Thierry Banel
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolas Goaziou @ 2017-01-01 23:34 UTC (permalink / raw)
  To: Thierry Banel; +Cc: emacs-orgmode

Hello,

Thierry Banel <tbanelwebmin@free.fr> writes:

> Babel Gnuplot can be further accelerated. A table is converted to a
> temporary file readable by Gnuplot. I found two issues in this process:
> 1. the temporary file is generated twice,
> 2. the generation is quadratic.
>
> I have not provided a committable patch because I am not happy with myfixes.
>
> Of course there is no hurry in fixing that because large tables do not
> happen so often.
>
> ------------------------------
>
> 1. Temporary generated twice
> Because org-babel-gnuplot-process-vars is called twice.
>
> There is no obvious fix. Here is a dirty patch. It caches the name of
> the temporary file in the 'param' list.

This may not be an issue if `orgtbl-to-generic' is sufficiently fast.

> 2. Quadratic behavior
> The spot is at ox.el::5119(the lambda in org-export-table-row-number).
>
> This lambda is called a number of times equal to the square of thesize
> of the table being plotted. For a 2000 rows table, this is
> 2000x2000 = four millions times. The cache a few lines before does
> nothelp because each row is visited only once.

Fixed. Thank you.

I also optimized a bit more `orgtbl-to-generic'. Hopefully, Babel
Gnuplot should be responsive again of large tables.

Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-01 23:34   ` Nicolas Goaziou
@ 2017-01-02 20:11     ` Thierry Banel
  2017-01-03 21:40       ` Thierry Banel
  0 siblings, 1 reply; 17+ messages in thread
From: Thierry Banel @ 2017-01-02 20:11 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org >> Org Mode

Le 02/01/2017 00:34, Nicolas Goaziou a écrit :
> Hello,
>
> Thierry Banel <tbanelwebmin@free.fr> writes:
>
>> 1. Temporary generated twice
>> Because org-babel-gnuplot-process-vars is called twice.
>>
>> There is no obvious fix. Here is a dirty patch. It caches the name of
>> the temporary file in the 'param' list.
> This may not be an issue if `orgtbl-to-generic' is sufficiently fast.

I will look further into that.

>> 2. Quadratic behavior
>> The spot is at ox.el::5119(the lambda in org-export-table-row-number).
>>
>> This lambda is called a number of times equal to the square of thesize
>> of the table being plotted. For a 2000 rows table, this is
>> 2000x2000 = four millions times. The cache a few lines before does
>> nothelp because each row is visited only once.
> Fixed. Thank you.
>
> I also optimized a bit more `orgtbl-to-generic'. Hopefully, Babel
> Gnuplot should be responsive again of large tables.
>
Great improvement!
So,filling the org-export-table-row-numbercache at once was the way to go.

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-02 20:11     ` Thierry Banel
@ 2017-01-03 21:40       ` Thierry Banel
  2017-01-03 21:55         ` Nicolas Goaziou
  2017-01-04 17:32         ` Achim Gratz
  0 siblings, 2 replies; 17+ messages in thread
From: Thierry Banel @ 2017-01-03 21:40 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 742 bytes --]

Here is a patch to avoid generating temporary files multiple times.

There is no way to ensure a single call to
(org-babel-gnuplot-process-vars) without modifying ob-core.el. I don't
want to do that because I would have to change a lot of babel backends.
Thus, I come back to my first light patch.

A 'param' list is passed around. It reflects the #+BEGIN_SRC header. My
patch changes it in-place from:
  (((:var data (3000) (2999) (2998) (2997) ...
to:
  (((:var data . "/tmp/babel-16991kSr/gnuplot-16991YBq") ...

The 'param' list behaves as a cache. There is nothing wrong with that.
The worst thing that can happen is the caching no longer working in case
'param' would be copied some day. Results would stay correct.

Regards,
Thierry



[-- Attachment #2: 0001-ob-gnuplot-temp-files-generated-only-once.patch --]
[-- Type: text/x-diff, Size: 1411 bytes --]

From 6e72396c6d5b280be900676fdcddcb14cdaf2f60 Mon Sep 17 00:00:00 2001
From: Thierry Banel <tbanelwebmin@free.fr>
Date: Tue, 3 Jan 2017 22:24:07 +0100
Subject: [PATCH] ob-gnuplot: temp files generated only once

* lisp/ob-gnuplot.el (org-babel-gnuplot-process-vars): still called
  multiple times but caches temporary files generation
---
 lisp/ob-gnuplot.el | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/lisp/ob-gnuplot.el b/lisp/ob-gnuplot.el
index e91e05a..76d47fd 100644
--- a/lisp/ob-gnuplot.el
+++ b/lisp/ob-gnuplot.el
@@ -84,15 +84,17 @@ code."
      (lambda (pair)
        (cons
 	(car pair) ;; variable name
-	(let* ((val (cdr pair)) ;; variable value
-	       (lp  (listp val)))
-	  (if lp
+	(let ((val (cdr pair))) ;; variable value
+	  (if (not (listp val))
+	      val
+	    (let ((temp-file (org-babel-temp-file "gnuplot-"))
+		  (first  (car val)))
+	      (setcdr pair temp-file) ;; <------ caching here
 	      (org-babel-gnuplot-table-to-data
-	       (let* ((first  (car val))
-		      (tablep (or (listp first) (symbolp first))))
-		 (if tablep val (mapcar 'list val)))
-	       (org-babel-temp-file "gnuplot-") params)
-	  val))))
+	       (if (or (listp first) (symbolp first))
+		   val
+		 (mapcar 'list val))
+	       temp-file params))))))
      (org-babel--get-vars params))))
 
 (defun org-babel-expand-body:gnuplot (body params)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-03 21:40       ` Thierry Banel
@ 2017-01-03 21:55         ` Nicolas Goaziou
  2017-01-03 23:06           ` Thierry Banel
  2017-01-04 17:32         ` Achim Gratz
  1 sibling, 1 reply; 17+ messages in thread
From: Nicolas Goaziou @ 2017-01-03 21:55 UTC (permalink / raw)
  To: Thierry Banel; +Cc: emacs-orgmode

Hello,

Thierry Banel <tbanelwebmin@free.fr> writes:

> Here is a patch to avoid generating temporary files multiple times.
>
> There is no way to ensure a single call to
> (org-babel-gnuplot-process-vars) without modifying ob-core.el. I don't
> want to do that because I would have to change a lot of babel backends.
> Thus, I come back to my first light patch.
>
> A 'param' list is passed around. It reflects the #+BEGIN_SRC header. My
> patch changes it in-place from:
>   (((:var data (3000) (2999) (2998) (2997) ...
> to:
>   (((:var data . "/tmp/babel-16991kSr/gnuplot-16991YBq") ...
>
> The 'param' list behaves as a cache. There is nothing wrong with that.
> The worst thing that can happen is the caching no longer working in case
> 'param' would be copied some day. Results would stay correct.

Thank you.

What is the benefit of this patch? I mean,
`org-babel-gnuplot-process-vars' is already quite fast here. Do you have
some benchmark for that?

>  	(car pair) ;; variable name
> -	(let* ((val (cdr pair)) ;; variable value
> -	       (lp  (listp val)))
> -	  (if lp
> +	(let ((val (cdr pair))) ;; variable value
> +	  (if (not (listp val))
> +	      val
> +	    (let ((temp-file (org-babel-temp-file "gnuplot-"))
> +		  (first  (car val)))
> +	      (setcdr pair temp-file) ;; <------ caching here

It would be nice to expunge the comment a bit.

>  	      (org-babel-gnuplot-table-to-data
> -	       (let* ((first  (car val))
> -		      (tablep (or (listp first) (symbolp first))))
> -		 (if tablep val (mapcar 'list val)))
> -	       (org-babel-temp-file "gnuplot-") params)
> -	  val))))
> +	       (if (or (listp first) (symbolp first))
> +		   val
> +		 (mapcar 'list val))
> +	       temp-file params))))))
>       (org-babel--get-vars params))))
>  
>  (defun org-babel-expand-body:gnuplot (body params)

Another option would be to generate a file according to the hash of
contents so `org-babel-gnuplot-process-vars' knows when to create a new
file.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-03 21:55         ` Nicolas Goaziou
@ 2017-01-03 23:06           ` Thierry Banel
  2017-01-04 22:36             ` Nicolas Goaziou
  0 siblings, 1 reply; 17+ messages in thread
From: Thierry Banel @ 2017-01-03 23:06 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Le 03/01/2017 22:55, Nicolas Goaziou a écrit :
> Hello,
>
> Thierry Banel <tbanelwebmin@free.fr> writes:
>
>> Here is a patch to avoid generating temporary files multiple times.
>>
>> There is no way to ensure a single call to
>> (org-babel-gnuplot-process-vars) without modifying ob-core.el. I don't
>> want to do that because I would have to change a lot of babel backends.
>> Thus, I come back to my first light patch.
>>
>> A 'param' list is passed around. It reflects the #+BEGIN_SRC header. My
>> patch changes it in-place from:
>>   (((:var data (3000) (2999) (2998) (2997) ...
>> to:
>>   (((:var data . "/tmp/babel-16991kSr/gnuplot-16991YBq") ...
>>
>> The 'param' list behaves as a cache. There is nothing wrong with that.
>> The worst thing that can happen is the caching no longer working in case
>> 'param' would be copied some day. Results would stay correct.
> Thank you.
>
> What is the benefit of this patch? I mean,
> `org-babel-gnuplot-process-vars' is already quite fast here. Do you have
> some benchmark for that?
The benefit is Babel Gnuplot running twice as fast on large Org tables
(thousands of rows). On small tables there is no real benefit. Two
temporary files are left over in /tmp. They have identical content: data
suitably formatted for Gnuplot. Creating such large temp files
out-weights any other Babel processing.

Granted, you have already speeded-up `org-babel-gnuplot-process-vars'
quite a lot by reworking `org-export-table-row-number'. Now, going down
from 4 seconds to 2 seconds on a 5000 rows table (on my computer) is
quite pleasant.

My patch is very light: `org-babel-gnuplot-process-vars' is the only
modified function, and the change involves only adding a (setcdr)
instruction to cache the result of a heavy processing.


>>  	(car pair) ;; variable name
>> -	(let* ((val (cdr pair)) ;; variable value
>> -	       (lp  (listp val)))
>> -	  (if lp
>> +	(let ((val (cdr pair))) ;; variable value
>> +	  (if (not (listp val))
>> +	      val
>> +	    (let ((temp-file (org-babel-temp-file "gnuplot-"))
>> +		  (first  (car val)))
>> +	      (setcdr pair temp-file) ;; <------ caching here
> It would be nice to expunge the comment a bit.

Yes sure. If the patch is accepted, I'll clean it.

> Another option would be to generate a file according to the hash of
> contents so `org-babel-gnuplot-process-vars' knows when to create a new
> file.
Your proposal provides an additional benefit: caching file generation
between several invocations of Babel. (The cache in my patch is intended
to be used within a single Babel invocation, and is then garbage
collected.). The drawback is that we need to go through all rows of the
table, compute the hash, just to discover that the hash was already
known. The purpose of the cache was precisely to avoid going through the
table again.

Your proposal involves substantial work. We may also want to extend it
to all other Babel backends (R, shell, C, etc.). I may help if enough
users need it.

Regards
Thierry

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-03 21:40       ` Thierry Banel
  2017-01-03 21:55         ` Nicolas Goaziou
@ 2017-01-04 17:32         ` Achim Gratz
  2017-01-04 20:29           ` Thierry Banel
  2017-01-04 23:15           ` Charles C. Berry
  1 sibling, 2 replies; 17+ messages in thread
From: Achim Gratz @ 2017-01-04 17:32 UTC (permalink / raw)
  To: emacs-orgmode

Thierry Banel writes:
> There is no way to ensure a single call to
> (org-babel-gnuplot-process-vars) without modifying ob-core.el. I don't
> want to do that because I would have to change a lot of babel backends.

But that is the right fix to apply, unless there is a reason for the
input vars to be processed multiple times.  I haven't looked at the
Babel code in the last two years, but generally I'd suggest that each
argument should only be processed once per Babel block since the second
processing could have unwanted side-effects.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-04 17:32         ` Achim Gratz
@ 2017-01-04 20:29           ` Thierry Banel
  2017-01-04 23:15           ` Charles C. Berry
  1 sibling, 0 replies; 17+ messages in thread
From: Thierry Banel @ 2017-01-04 20:29 UTC (permalink / raw)
  To: emacs-orgmode

Le 04/01/2017 18:32, Achim Gratz a écrit :
> Thierry Banel writes:
>> There is no way to ensure a single call to
>> (org-babel-gnuplot-process-vars) without modifying ob-core.el. I don't
>> want to do that because I would have to change a lot of babel backends.
> But that is the right fix to apply, unless there is a reason for the
> input vars to be processed multiple times.  I haven't looked at the
> Babel code in the last two years, but generally I'd suggest that each
> argument should only be processed once per Babel block since the second
> processing could have unwanted side-effects.
>
>

Absolutely! But not so easy.

Here is a simplified version of the involved functions:

#+BEGIN_SRC elisp
(defun org-babel-expand-body:gnuplot (body params)
   (let ((vars (org-babel-gnuplot-process-vars params))) ;; <-- 1st call
       (org-babel-variable-assignments:gnuplot params)))

(defun org-babel-variable-assignments:gnuplot (params)
   (org-babel-gnuplot-process-vars params))              ;; <-- 2nd call

(defun org-babel-gnuplot-process-vars (params)
  (... generate temp file ...))
#+END_SRC

Following the flow of calls, we can see that starting from
(org-babel-expand-body:gnuplot), the function
(org-babel-gnuplot-process-vars) is called twice.

I would like to pass `vars' around (which is the result of the first
call) to avoid the second call. To do that I need to add a parameter
`vars' to (org-babel-variable-assignments:gnuplot). Unfortunately I
cannot because the parameter of this function (and all functions
matching (org-babel-variable-assignments:*)) is enforced by the Babel core.

Therefore to pass information around, the only channel is the `params'
parameter. I use it as a cache in my patch.

Moreover, the above simplified ELisp sketch is not the whole story.
There is also the (org-babel-prep-session:gnuplot) function involved. I
have not yet investigated that. But, the `params' cache trick takes care
of this flow without having to understand it.

To sum-up: yes, ob-gnuplot.el is doing the double `params' processing.
But to avoid that without the cache trick, ob-core.el should be changed,
as well as all dependent ob-*.el backends. Lot of work. Or I may be
missing something...

Regards
Thierry

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-03 23:06           ` Thierry Banel
@ 2017-01-04 22:36             ` Nicolas Goaziou
  2017-01-05 20:47               ` Thierry Banel
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolas Goaziou @ 2017-01-04 22:36 UTC (permalink / raw)
  To: Thierry Banel; +Cc: emacs-orgmode

Hello,

Thierry Banel <tbanelwebmin@free.fr> writes:

>>>  	(car pair) ;; variable name
>>> -	(let* ((val (cdr pair)) ;; variable value
>>> -	       (lp  (listp val)))
>>> -	  (if lp
>>> +	(let ((val (cdr pair))) ;; variable value
>>> +	  (if (not (listp val))
>>> +	      val
>>> +	    (let ((temp-file (org-babel-temp-file "gnuplot-"))
>>> +		  (first  (car val)))
>>> +	      (setcdr pair temp-file) ;; <------ caching here

[...]

> Your proposal provides an additional benefit: caching file generation
> between several invocations of Babel. (The cache in my patch is intended
> to be used within a single Babel invocation, and is then garbage
> collected.). The drawback is that we need to go through all rows of the
> table, compute the hash, just to discover that the hash was already
> known. The purpose of the cache was precisely to avoid going through the
> table again.

I'm not sure to understand.

I suggest to compute the hash of VAL before it is sent through
`org-babel-gnuplot-table-to-data', i.e., before `orgtbl-to-generic' is
called. There's no "going through the table" involved, is it?

Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-04 17:32         ` Achim Gratz
  2017-01-04 20:29           ` Thierry Banel
@ 2017-01-04 23:15           ` Charles C. Berry
  2017-01-05 20:23             ` Thierry Banel
  1 sibling, 1 reply; 17+ messages in thread
From: Charles C. Berry @ 2017-01-04 23:15 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-orgmode

On Wed, 4 Jan 2017, Achim Gratz wrote:

> Thierry Banel writes:
>> There is no way to ensure a single call to
>> (org-babel-gnuplot-process-vars) without modifying ob-core.el. I don't
>> want to do that because I would have to change a lot of babel backends.
>
> But that is the right fix to apply, unless there is a reason for the
> input vars to be processed multiple times.  I haven't looked at the
> Babel code in the last two years, but generally I'd suggest that each
> argument should only be processed once per Babel block since the second
> processing could have unwanted side-effects.
>

I'm late to this party, but AFAICS input vars are processed just once.

Running this:

#+BEGIN_SRC emacs-lisp :var a=(setq runvar (+ 1 runvar))
a
#+END_SRC

increments runvar by one each time it is run.

So this seems not to be a general babel issue.

??

Chuck

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-04 23:15           ` Charles C. Berry
@ 2017-01-05 20:23             ` Thierry Banel
  0 siblings, 0 replies; 17+ messages in thread
From: Thierry Banel @ 2017-01-05 20:23 UTC (permalink / raw)
  To: emacs-orgmode

Le 05/01/2017 00:15, Charles C. Berry a écrit :
> On Wed, 4 Jan 2017, Achim Gratz wrote:
>
>
> I'm late to this party, but AFAICS input vars are processed just once.
>
> Running this:
>
> #+BEGIN_SRC emacs-lisp :var a=(setq runvar (+ 1 runvar))
> a
> #+END_SRC
>
> increments runvar by one each time it is run.
>
> So this seems not to be a general babel issue.
>
> ??
>
> Chuck
>
>

Right!
This is a Babel Gnuplot issue, not a Babel Core one.

My findings:
1- Babel Gnuplot generates two temporary files with identical content.
2- This consumes most of the time for large Org tables.
3- This is a Babel Gnuplot issue (only).
4- Babel Core put constraints on the parameters of functions
(xxx:gnuplot) (which is fine).

To bypass the second file generation under those constraints I have
found no other way than a temporary cache.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-04 22:36             ` Nicolas Goaziou
@ 2017-01-05 20:47               ` Thierry Banel
  2017-01-06  9:41                 ` Nicolas Goaziou
  0 siblings, 1 reply; 17+ messages in thread
From: Thierry Banel @ 2017-01-05 20:47 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Le 04/01/2017 23:36, Nicolas Goaziou a écrit :
>
>> Your proposal provides an additional benefit: caching file generation
>> between several invocations of Babel. (The cache in my patch is intended
>> to be used within a single Babel invocation, and is then garbage
>> collected.). The drawback is that we need to go through all rows of the
>> table, compute the hash, just to discover that the hash was already
>> known. The purpose of the cache was precisely to avoid going through the
>> table again.
> I'm not sure to understand.
>
> I suggest to compute the hash of VAL before it is sent through
> `org-babel-gnuplot-table-to-data', i.e., before `orgtbl-to-generic' is
> called. There's no "going through the table" involved, is it?
>
>

I have the same understanding.

The table is recorded in the `params' parameter in a structure similar to this:
  (((:var data (row1..) (row2..) (row3..) (row4..) ...

By "going through the table" I mean going through this structure. Remember the issue in the first place was a slow conversion of this huge structure into a temporary file. Now we need to convert this huge structure into a hash. Ok, let us assume hashing is faster than writing to disk.

Now, what do we do with this hash? Were do we record it? The obvious answer is: we record it in the name of the temporary file. Then, before generating a file we compute the hash and see if there is already a temporary file with this hash in its name.

It is heavier than my proposed `param' cache, but it provides an additional memory between two Babel invocations. I'm fine with that. It solves the issue. If we really need the additional feature (memory between two invocations), let's begin working on it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-05 20:47               ` Thierry Banel
@ 2017-01-06  9:41                 ` Nicolas Goaziou
  2017-01-06 18:24                   ` Thierry Banel
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolas Goaziou @ 2017-01-06  9:41 UTC (permalink / raw)
  To: Thierry Banel; +Cc: emacs-orgmode

Hello,

Thierry Banel <tbanelwebmin@free.fr> writes:

> By "going through the table" I mean going through this structure.
> Remember the issue in the first place was a slow conversion of this
> huge structure into a temporary file. Now we need to convert this huge
> structure into a hash. Ok, let us assume hashing is faster than
> writing to disk.

Converting this huge structure into a temporary file is (relatively)
slow because of `orgtbl-to-generic', isn't it? Computing the hash
doesn't call this function.

Why do you expect computing the hash to be slow?

Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: speeding up Babel Gnuplot
  2017-01-06  9:41                 ` Nicolas Goaziou
@ 2017-01-06 18:24                   ` Thierry Banel
  0 siblings, 0 replies; 17+ messages in thread
From: Thierry Banel @ 2017-01-06 18:24 UTC (permalink / raw)
  To: emacs-orgmode

Le 06/01/2017 10:41, Nicolas Goaziou a écrit :
> Hello,
>
> Thierry Banel <tbanelwebmin@free.fr> writes:
>
>> By "going through the table" I mean going through this structure.
>> Remember the issue in the first place was a slow conversion of this
>> huge structure into a temporary file. Now we need to convert this huge
>> structure into a hash. Ok, let us assume hashing is faster than
>> writing to disk.
> Converting this huge structure into a temporary file is (relatively)
> slow because of `orgtbl-to-generic', isn't it? Computing the hash
> doesn't call this function.
>
> Why do you expect computing the hash to be slow?
>
>

I don't. I "assume hashing is faster than writing to disk."

Regards
Thierry

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-01-06 18:24 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-28 20:33 speeding up Babel Gnuplot Thierry Banel
2016-12-29 20:04 ` Nicolas Goaziou
2016-12-29 20:34   ` Thierry Banel
2017-01-01 20:17 ` Thierry Banel
2017-01-01 23:34   ` Nicolas Goaziou
2017-01-02 20:11     ` Thierry Banel
2017-01-03 21:40       ` Thierry Banel
2017-01-03 21:55         ` Nicolas Goaziou
2017-01-03 23:06           ` Thierry Banel
2017-01-04 22:36             ` Nicolas Goaziou
2017-01-05 20:47               ` Thierry Banel
2017-01-06  9:41                 ` Nicolas Goaziou
2017-01-06 18:24                   ` Thierry Banel
2017-01-04 17:32         ` Achim Gratz
2017-01-04 20:29           ` Thierry Banel
2017-01-04 23:15           ` Charles C. Berry
2017-01-05 20:23             ` Thierry Banel

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).