From: tsd@tsdye.com (Thomas S. Dye)
To: Eric Schulte <eric.schulte@gmx.com>
Cc: Michael Hannon <jm_hannon@yahoo.com>,
Org-Mode List <emacs-orgmode@gnu.org>
Subject: Re: Babel: communicating irregular data to R source-code block
Date: Mon, 23 Apr 2012 06:46:50 -1000
Message-ID: <m1y5pmz98l.fsf@tsdye.com>
In-Reply-To: <87ipgrn4by.fsf@gmx.com> (Eric Schulte's message of "Sun, 22 Apr 2012 11:58:40 -0400")
Hi Eric,
Eric Schulte <eric.schulte@gmx.com> writes:
> tsd@tsdye.com (Thomas S. Dye) writes:
>
>> Aloha Michael,
>>
>> Michael Hannon <jm_hannon@yahoo.com> writes:
>>
>>> Greetings. I'm sitting in on a weekly, informal, "brown-bag" seminar on data
>>> technologies in statistics. There are more people attending the seminar than
>>> there are weeks in which to give talks, so I may get by with being my usual,
>>> passive-slug self.
>>>
>>> But I thought it might be useful to have a contingency plan and decided that
>>> giving a brief talk about Babel might be useful/instructive. I thought (and
>>> think) that mushing together (with attribution) some of the content of the
>>> paper [1] by The Gang of Four and the content of Eric's talk [2] might be a
>>> good approach. (BTW, if this isn't legal, desirable, permissible, etc., this
>>> would be a good time to tell me.)
>>>
>
> I would be happy for you to re-use these materials.
>
>>>
>>> I liked the Pascal's Triangle example (which morphed from elisp to Python, or
>>> vice versa, in the two references), but I was afraid that the elisp routine
>>> "pst-check", used as a check on the correctness of the previously-generated
>>> Pascal's triangle, might be too esoteric for this audience, not to mention me.
>>> (The recursive Fibonacci function is virtually identical in all languages,
>>> but the second part is more obscure.)
>>>
>
> I was giving a presentation to a local lisp/scheme user group, so I
> figured I'd spare them the pain of trying to read python code :).
>
>>>
>>> I thought it should be possible to use R to do the same sanity check, as R
>>> would be much more-familiar to this audience (and its use would still
>>> demonstrate the meta-language feature of Babel).
>>>
>>> Unfortunately, I haven't been able to find a way to communicate the output of
>>> the Pascal's Triangle example to an R source-code block. The gist of the
>>> problem seems to be that regardless of how I try to grab the data (scan,
>>> readLines, etc.) Babel always ends up trying to read a data frame (table) and
>>> I get an error similar to:
>>>
>
> I present some options below specific to Tom's discussion, but another
> option may be to use the ":results output" option on a python code block
> which prints the table to STDOUT, and then use something like readLines
> to read from the resulting string into R.
>
I didn't have any luck with :results output, but didn't spend much time
trying to figure it out.
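
A minimal sketch of the printing half of that suggestion, in plain Python 3 and outside Babel. The helper name format_rows is hypothetical, and whether Babel then hands the resulting string to R unchanged is not verified here; the sketch only shows how ragged rows can be written one per line, so that a line-oriented reader never has to treat them as a rectangular table.

```python
def format_rows(rows):
    """Render ragged rows as text: one row per line, entries space-separated."""
    return "\n".join(" ".join(str(x) for x in row) for row in rows)


# Sample data: the first four rows of Pascal's triangle.
triangle = [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1]]
print(format_rows(triangle))
```

Each printed line keeps its own length, which is the property a line-by-line reader such as readLines relies on.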
>>>
>>> <<<<<<
>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>> : line 1 did not have 5 elements
>>>
>>> Enter a frame number, or 0 to exit
>>>
>>> 1: read.table("/tmp/babel-3780tje/R-import-3780Akj", header = FALSE, row.names
>>> = NULL, sep = "
>>>>>>>>>
>>>
>>> If I construct a table "by hand" with all of the cells occupied, everything
>>> goes OK. For instance:
>>>
>>> <<<<<<
>>> #+TBLNAME: some-junk
>>> | 1 | 0 | 0 | 0 |
>>> | 1 | 1 | 0 | 0 |
>>> | 1 | 2 | 1 | 0 |
>>> | 1 | 3 | 3 | 1 |
>>>
>>> #+NAME: read-some-junk(sj_input=some-junk)
>>> #+BEGIN_SRC R
>>>
>>> rowSums(sj_input)
>>>
>>> #+END_SRC
>>>
>>> #+RESULTS: read-some-junk
>>> | 1 |
>>> | 2 |
>>> | 4 |
>>> | 8 |
>>>>>>>>>
>>>
>>> But the following gives the kind of error I described above:
>>>
>>> <<<<<<
>>> #+name: pascals_triangle
>>> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
>>> def pascals_triangle(n):
>>>     if n == 0:
>>>         return [[1]]
>>>     prev_triangle = pascals_triangle(n-1)
>>>     prev_row = prev_triangle[n-1]
>>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>>     return prev_triangle + [this_row]
>>>
>>> pascals_triangle(n)
>>> #+end_src
>>
>> A few things are wrong at this point. It seems the JSS article has
>> an error in the header of the pascals_triangle source block. AFAIK
>> there is no header argument :return. I don't know how :return
>> pascals_triangle(5) got there, but am fairly certain it shouldn't be.
>>
>
> The :return header argument *is* a supported header argument of python
> code blocks and is not an error. The python code block should run w/o
> error and without the extra "return pascals_triangle(n)" at the bottom.
> The following works for me.
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> #+end_src
>
> #+RESULTS: pascals_triangle
> | 1 | | | | | |
> | 1 | 1 | | | | |
> | 1 | 2 | 1 | | | |
> | 1 | 3 | 3 | 1 | | |
> | 1 | 4 | 6 | 4 | 1 | |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
> [...]
I'm beginning to see why you have strong feelings about python. In the
code above, the blank line before #+end_src is necessary and must not
contain any spaces, and :var n can be set to anything, since it is
declared for initialization only.
The code in the JSS article doesn't run for me with a recent Org-mode
unless I add a blank line before #+end_src, or remove the :return header
argument. If I remove the :return header argument, then the need for
the blank line goes away. The following code block seems to work:
#+name: pascals-triangle
#+begin_src python :var n=2 :exports none
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]
return pascals_triangle(n)
#+end_src
#+RESULTS: pascals-triangle
| 1 | | |
| 1 | 1 | |
| 1 | 2 | 1 |
I'm guessing that the need for a blank line when using :return has
arisen since the JSS article was published, because the article was
generated from source code and didn't show any errors.
If I have this right (a big if), then might it be possible to
re-establish the old behavior so the JSS code works?
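
For anyone who wants to check the recursion outside Babel, here is a minimal sketch in plain Python 3. It is an adaptation, not the JSS code: a list comprehension replaces map(sum, ...), on the assumption that the interpreter is Python 3, where map returns a lazy iterator rather than a list, and the result is printed instead of returned at top level, since a bare return is only valid inside the function that Babel wraps around the block body.

```python
def pascals_triangle(n):
    """Return rows 0..n of Pascal's triangle as a list of lists."""
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n - 1)
    prev_row = prev_triangle[n - 1]
    # Pad the previous row with a 0 on each side and add neighbouring entries.
    this_row = [a + b for a, b in zip([0] + prev_row, prev_row + [0])]
    return prev_triangle + [this_row]


print(pascals_triangle(2))  # [[1], [1, 1], [1, 2, 1]]
```

The rows have different lengths, which is why the Org-mode results table shows empty cells above the diagonal.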
>>
>> I vaguely remember that it once was possible to pass variables in
>> through the name line, but I couldn't find this syntax in some fairly
>> recent documentation.
>
> This style of passing arguments is still supported, but not necessarily
> encouraged by the documentation.
>
>> It does appear to work still using a recent Org-mode. If I rename the
>> results and then pass that to the source code block, all is well.
>>
>> #+RESULTS: pascals-tri
>> | 1 | | | | | |
>> | 1 | 1 | | | | |
>> | 1 | 2 | 1 | | | |
>> | 1 | 3 | 3 | 1 | | |
>> | 1 | 4 | 6 | 4 | 1 | |
>> | 1 | 5 | 10 | 10 | 5 | 1 |
>>
>>
>> #+name: pst-checkR(p=pascals-tri)
>> #+BEGIN_SRC R
>> p
>> #+END_SRC
>>
>> #+RESULTS: pst-checkR
>>
>> | 1 | nil | nil | nil | nil | nil |
>> | 1 | 1 | nil | nil | nil | nil |
>> | 1 | 2 | 1 | nil | nil | nil |
>> | 1 | 3 | 3 | 1 | nil | nil |
>> | 1 | 4 | 6 | 4 | 1 | nil |
>> | 1 | 5 | 10 | 10 | 5 | 1 |
>>
>> This looks like a bug to me, but Eric S. will know better what might be
>> going on.
>
> The above is due to the inability of R (or at least of the read.table
> function) to read in tables with different row length. The process of
> writing to an Org-mode table and *then* referencing that table as Tom
> suggests above has the side effect of filling in blank spots in the
> final exported table, turning what would otherwise be something like
>
> 1
> 1 1
> 1 2 1
>
> into something like
>
> 1 "" ""
> 1 1 ""
> 1 2 1
>
Thanks for this explanation. It makes sense that mapping a python data
structure to an R data structure would involve an intermediate
representation.
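
To make that intermediate step concrete, here is a minimal sketch in plain Python 3 of padding ragged rows out to a rectangle, the same idea as Eric's before-and-after illustration. The name pad_rows and the fill parameter are hypothetical, and nothing here claims to be what Babel does internally.

```python
def pad_rows(rows, fill=""):
    """Pad every row with `fill` so that all rows have the same length."""
    width = max(len(row) for row in rows)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


ragged = [[1], [1, 1], [1, 2, 1]]
print(pad_rows(ragged))  # [[1, '', ''], [1, 1, ''], [1, 2, 1]]
```

Once every row has the same number of cells, a table reader that insists on a fixed column count has nothing to complain about.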
All the best,
Tom
> You could also use a function like the following to explicitly fill in
> these missing cells.
>
> #+name: padded_pascals_triangle
> #+begin_src emacs-lisp :var data=pascals_triangle
> (let ((max-length (apply #'max (mapcar #'length data))))
>   (mapcar (lambda (row)
>             (append row (make-vector (- max-length (length row)) "") nil))
>           data))
> #+end_src
>
>> I can't do much more than this, but I'm optimistic things will be
>> sorted out before your turn to speak at the seminar rolls around.
>>
>> Thanks for bringing the error in the JSS article to light.
>>
>> All the best,
>> Tom
>>
>
> I often have to explicitly convert data read into R code blocks as a
> table into some other data structure like a vector or a matrix. I run
> into this myself when trying to use the statistical functions of R. It
> generally takes a while to look up the function to do the conversion,
> but I imagine that there is a reason why people who know more R than I
> do chose to make tables the default data type for data read into R
> blocks.
>
> Best,
>
> Combining the examples above yields the following,
>
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :return pascals_triangle(5) :results vector
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> #+end_src
>
> #+name: padded_pascals_triangle
> #+begin_src emacs-lisp :var data=pascals_triangle
> (let ((max-length (apply #'max (mapcar #'length data))))
>   (mapcar (lambda (row)
>             (append row (make-vector (- max-length (length row)) "") nil))
>           data))
> #+end_src
>
> #+begin_src R :var data=padded_pascals_triangle
> data
> #+end_src
>
> #+RESULTS:
> | 1 | nil | nil | nil | nil | nil |
> | 1 | 1 | nil | nil | nil | nil |
> | 1 | 2 | 1 | nil | nil | nil |
> | 1 | 3 | 3 | 1 | nil | nil |
> | 1 | 4 | 6 | 4 | 1 | nil |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
>
>>
>>>>>>>>>
>>>
>>> Note that I don't really want to do rowSums in this case. I'm just trying to
>>> demonstrate the error.
>>>
>>> Of course, it's clear that the first line does NOT contain five elements, nor
>>> does the second, etc., as all of the above-diagonal elements are blanks.
>>>
>>> But I've been unable to find an R input function that doesn't end up treating
>>> the source data as a table, i.e., in the context of Babel source blocks -- R
>>> is "happy" to read a lower-diagonal structure. See the appendix for an
>>> example.
>>>
>>> Any suggestions? Note that I'm happy to acknowledge that my own ignorance of
>>> R and/or Babel might be the source of the problem. If so, please enlighten
>>> me.
>>>
>>> Thanks.
>>>
>>> -- Mike
>>>
>>> [1] http://www.jstatsoft.org/v46/i03
>>> [2] https://github.com/eschulte/babel-presentation
>>>
>>> <<<<<<
>>> Appendix
>>> --------
>>>
>>>
>>> $ cat pascal.dat
>>> 1
>>> 1 1
>>> 1 2 1
>>> 1 3 3 1
>>> 1 4 6 4 1
>>>
>>> $ R --vanilla < pascal.R
>>>
>>> R version 2.15.0 (2012-03-30)
>>> Copyright (C) 2012 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>> .
>>> .
>>> .
>>>
>>>> x <- readLines("pascal.dat")
>>>> x
>>> [1] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>>> str(x)
>>> chr [1:5] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>>>
>>>> y <- scan("pascal.dat")
>>> Read 15 items
>>>> y
>>> [1] 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1
>>>> str(y)
>>> num [1:15] 1 1 1 1 2 1 1 3 3 1 ...
>>>>
>>>> z <- read.table("pascal.dat", header=FALSE)
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>> line 1 did not have 5 elements
>>> Calls: read.table -> scan
>>> Execution halted
>>>
>>>
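
The thread is after an R version of the sanity check. The sketch below only illustrates the logic in plain Python 3, under the assumption that the data arrives as ragged text like pascal.dat above. It splits the text line by line, much as readLines followed by a split would, and then tests each row against the one before it. The names parse_ragged and is_pascal are hypothetical.

```python
def parse_ragged(text):
    """Split text into rows of integers, one row per non-blank line."""
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def is_pascal(rows):
    """Check that row i has i+1 entries, 1s at both ends, and that each
    interior entry is the sum of the two entries above it."""
    for i, row in enumerate(rows):
        if len(row) != i + 1 or row[0] != 1 or row[-1] != 1:
            return False
        prev = rows[i - 1] if i > 0 else []
        for j in range(1, i):
            if row[j] != prev[j - 1] + prev[j]:
                return False
    return True


data = "1\n1 1\n1 2 1\n1 3 3 1\n1 4 6 4 1\n"
rows = parse_ragged(data)
print(is_pascal(rows))  # True
```

Because each line is parsed on its own, no row is ever required to have five elements, which is the requirement read.table fails on.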
--
Thomas S. Dye
http://www.tsdye.com