emacs-orgmode@gnu.org archives
* [babel] R questions
@ 2009-12-04 22:31 Sébastien Vauban
  2009-12-05  0:45 ` Dan Davison
  0 siblings, 1 reply; 8+ messages in thread
From: Sébastien Vauban @ 2009-12-04 22:31 UTC (permalink / raw)
  To: emacs-orgmode-mXXj517/zsQ

Hello,

One of these questions is a bit borderline, but I'm still trying ;-)

I have this table generated by a script:

--8<---------------cut here---------------start------------->8---
#+results: abc2008
| "2008/1"  | -78.59 |   1627.24 |
| "2008/2"  | -80.17 |    700.33 |
| "2008/3"  | -80.17 |     879.8 |
| "2008/4"  | -80.17 | -25823.17 |
| "2008/5"  | -80.17 |   3570.75 |
| "2008/6"  | -81.77 |    2377.8 |
| "2008/7"  | -81.77 |    2889.4 |
| "2008/8"  | -81.77 |   2612.92 |
| "2008/9"  | -81.77 |   1585.21 |
| "2008/10" |  -83.4 |   1561.42 |
| "2008/11" |  -83.4 |   2189.17 |
| "2008/12" |     "" |        "" |
--8<---------------cut here---------------end--------------->8---

I want to draw the 12 months with the values side by side.

Problem #1: the "" entries in the last line prevent the graph from being
generated (format error).

--8<---------------cut here---------------start------------->8---
#+srcname: expenses-bar-plot(abc = abc2008)
#+begin_src R :results file :file abc2008.pdf
    barplot(abc[,3], col = "red", main = "Profit and Loss 2008", las = 1,
            xlab = "Months", ylab = "EUR")
#+end_src
--8<---------------cut here---------------end--------------->8---

Problem #2: I don't know how to draw the two columns together. I've tried
putting the arguments in a list and I've tried `cbind' (as seen in one of the
Org papers), but nothing worked. This is the borderline question.

The first one (at least) merits an answer, as it's a generic problem of
handling empty results. Is there a spec for treating an empty result as
equivalent to an empty string or to 0? How can we set it to 0 here, instead of
""? I guess that is the problem for R.

Best regards,
  Seb

-- 
Sébastien Vauban



_______________________________________________
Emacs-orgmode mailing list
Please use `Reply All' to send replies to the list.
Emacs-orgmode-mXXj517/zsQ@public.gmane.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode


* Re: [babel] R questions
  2009-12-04 22:31 [babel] R questions Sébastien Vauban
@ 2009-12-05  0:45 ` Dan Davison
  2009-12-08  9:50   ` Sébastien Vauban
  2009-12-08  9:58   ` Sébastien Vauban
  0 siblings, 2 replies; 8+ messages in thread
From: Dan Davison @ 2009-12-05  0:45 UTC (permalink / raw)
  To: Sébastien Vauban; +Cc: emacs-orgmode

Sébastien Vauban <wxhgmqzgwmuf@spammotel.com> writes:

> Hello,
>
> One of these questions is a bit borderline, but I'm still trying ;-)
>
> I have this table generated by a script:
>
> #+results: abc2008
> | "2008/1"  | -78.59 |   1627.24 |
> | "2008/2"  | -80.17 |    700.33 |
> | "2008/3"  | -80.17 |     879.8 |
> | "2008/4"  | -80.17 | -25823.17 |
> | "2008/5"  | -80.17 |   3570.75 |
> | "2008/6"  | -81.77 |    2377.8 |
> | "2008/7"  | -81.77 |    2889.4 |
> | "2008/8"  | -81.77 |   2612.92 |
> | "2008/9"  | -81.77 |   1585.21 |
> | "2008/10" |  -83.4 |   1561.42 |
> | "2008/11" |  -83.4 |   2189.17 |
> | "2008/12" |     "" |        "" |
>
> I want to draw the 12 months with the values side by side.
>
> Problem #1: the "" in the last line hinder the generation of the graph. Format
> error.

Missing values in R are represented by the value NA. If you change the
last line of your table to

| "2008/12" |     NA |        NA |

then it works [1], [2], [3].

>
> #+srcname: expenses-bar-plot(abc = abc2008)
> #+begin_src R :results file :file abc2008.pdf
>     barplot(abc[,3], col = "red", main = "Profit and Loss 2008", las = 1, xlab
>     = "Months", ylab = "EUR")
> #+end_src
>
> Problem #2: I don't know how to ask for drawing the 2 columns. I've tried

OK, so one point that is arguably relevant to this mailing list is that
when org tables are read into R, the object that is created in R is a
*data frame*. Not a matrix. (A data frame can have columns of
different types; matrices are all one type). [4]

So to solve your problem, you'd need to read the description of the
height argument in the help page for barplot (?barplot), noting that it
says "either a vector or matrix", and also noting that it says that bars
correspond to columns (not rows), thus realising that you need to
explicitly convert the relevant columns of the data frame to a matrix
and then transpose.

However, your two columns hold values of rather different magnitudes and
so are not very well suited for plotting on the same scale. Below I
rescaled the first column by a factor of 20 so you can at least see the
bars.

#+srcname: expenses-bar-plot-two-columns(abc = abc2008)
#+begin_src R :file abc2008.png
  ## select the two columns, convert to matrix, transpose and rescale top row.
  x <- t(as.matrix(abc[,2:3])) * c(20,1)
  barplot(x, col = rep(c("red","blue"), ncol(x)), main = "Profit and Loss 2008", las = 1, xlab= "Months", ylab = "EUR", beside=TRUE)
#+end_src
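
A minimal sketch of why the transpose and the c(20,1) factor do what is
described above, using a hypothetical three-month excerpt of the table. The
column names `fees' and `pnl' are invented for illustration; org-babel assigns
its own names.

```r
# Hypothetical three-month excerpt of the table; column names are made up.
abc <- data.frame(month = c("2008/1", "2008/2", "2008/3"),
                  fees  = c(-78.59, -80.17, -80.17),
                  pnl   = c(1627.24, 700.33, 879.8))

m <- as.matrix(abc[, 2:3])   # 3 x 2 numeric matrix: one row per month
x <- t(m)                    # 2 x 3: one column per month, as barplot expects

# R stores matrices column by column, so the length-2 vector c(20, 1) is
# recycled down each column: row 1 is multiplied by 20, row 2 by 1.
scaled <- x * c(20, 1)

print(dim(x))
print(scaled)
```

With beside=TRUE, barplot then draws one group of two bars per column, that is,
one group per month.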

Dan



Footnotes:

[1] Note no quotes around NA here. You asked a good question about
quoting in org-babel; it will be answered.

[2] I guess one could potentially think about dealing with missing
values more explicitly in org-babel. E.g. there could be a header arg
specifying what values are to be treated as missing. Nothing like that
exists currently.

[3] You might think that an alternative would be to
do something like this in R

abc[abc == "\"\""] <- NA

but the trouble is that with those double quotes present, R will
interpret the column as containing character data rather than numeric,
and things will not be pretty.
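
A small illustration of that pitfall, assuming the empty cells reach R as the
two-character string "" (an assumption; what org-babel actually passes may
differ). The vector below is made up and stands in for one table column.

```r
# Made-up column as it might look when one cell holds a quoted empty string.
col <- c("-83.4", "-83.4", "\"\"")
was_character <- is.character(col)     # the stray cell forces character type

col[col == "\"\""] <- NA               # the replacement itself works...
still_character <- is.character(col)   # ...but the column is still character

num <- as.numeric(col)                 # explicit coercion is needed as well
print(num)
```

So the replacement alone would not be enough; the column would also have to be
coerced to numeric before it could be plotted.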

[4] org-babel uses orgtbl-to-tsv followed by read.table() to convert the
org table into a data.frame in R. A source of much confusion with
R-beginners is that by default, read.table converts character columns
into the *factor* data type. Note that org-babel currently uses
'as.is=TRUE' when calling read.table and therefore does *not* convert to
factor. This may avoid some confusion among users but is
memory-inefficient and misses out on other advantages of factors.
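
A sketch of the difference, using a made-up tab-separated string as a stand-in
for the output of orgtbl-to-tsv. The exact read.table call that org-babel makes
is not reproduced here; only the effect of as.is is shown.

```r
# Made-up tab-separated data standing in for an exported org table.
tsv <- "2008/1\t-78.59\t1627.24\n2008/2\t-80.17\t700.33\n"

# stringsAsFactors = TRUE is set explicitly so the contrast does not depend
# on the default of the R version in use.
as_factor <- read.table(text = tsv, sep = "\t", stringsAsFactors = TRUE)
as_is     <- read.table(text = tsv, sep = "\t", as.is = TRUE)

print(class(as_factor$V1))  # factor
print(class(as_is$V1))      # character
print(class(as_is$V2))      # numeric
```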


* Re: [babel] R questions
  2009-12-05  0:45 ` Dan Davison
@ 2009-12-08  9:50   ` Sébastien Vauban
  2009-12-08 16:00     ` Thomas S. Dye
  2009-12-08  9:58   ` Sébastien Vauban
  1 sibling, 1 reply; 8+ messages in thread
From: Sébastien Vauban @ 2009-12-08  9:50 UTC (permalink / raw)
  To: emacs-orgmode-mXXj517/zsQ

Hi Dan,

Dan Davison wrote:
> Sébastien Vauban <wxhgmqzgwmuf-geNee64TY+gS+FvcfC7Uqw@public.gmane.org> writes:
>>
>> I have this table generated by a script:
>>
>> #+results: abc2008
>> | "2008/1"  | -78.59 |   1627.24 |
>> | "2008/2"  | -80.17 |    700.33 |
>> | "2008/3"  | -80.17 |     879.8 |
>> | "2008/4"  | -80.17 | -25823.17 |
>> | "2008/5"  | -80.17 |   3570.75 |
>> | "2008/6"  | -81.77 |    2377.8 |
>> | "2008/7"  | -81.77 |    2889.4 |
>> | "2008/8"  | -81.77 |   2612.92 |
>> | "2008/9"  | -81.77 |   1585.21 |
>> | "2008/10" |  -83.4 |   1561.42 |
>> | "2008/11" |  -83.4 |   2189.17 |
>> | "2008/12" |     "" |        "" |
>>
>> I want to draw the 12 months with the values side by side.
>>
>> Problem #1: the "" in the last line hinder the generation of the graph.
>> Format error.
>
> Missing values in R are represented by the value NA. If you change the last
> line of your table to
>
> | "2008/12" |     NA |        NA |
>
> then it works [1], [2], [3].
>
> [1] Note no quotes around NA here. You asked a good question about quoting
>     in org-babel; it will be answered.

OK.


> [2] I guess one could potentially think about dealing with missing values
>     more explicitly in org-babel. E.g. there could be a header arg
>     specifying what values are to be treated as missing. Nothing like that
>     exists currently.

I guess such a feature would be required on the long term. Of course, even
specifying what the needed behavior would be is already difficult, I think.
One must have good knowledge of the multiple languages and environments, and
try to abstract the best behavior out of these.

Side note -- I know, for example, that there is an option in Access to let it
consider the empty string ('') as the NULL value, or not. Clear.

But what's a "NA" value in general?  Is 0 always a meaningful value as
numeric?  Context-sensitive...

Side question -- You talked of some way to remember the bugs or features to be
added to Org. Same question here: where will these little things be added in
order to avoid forgetting them?  Is it in one of the Worg documents itself?


> [3] You might think that an alternative would be to do something like this
>     in R
>
> abc[abc == "\"\""] <- NA
>
> but the trouble is that with those double quotes present, R will interpret
> the column as containing character data rather than numeric, and things will
> not be pretty.

I believe you...


>> #+srcname: expenses-bar-plot(abc = abc2008)
>> #+begin_src R :results file :file abc2008.pdf
>>     barplot(abc[,3], col = "red", main = "Profit and Loss 2008", las = 1, xlab
>>     = "Months", ylab = "EUR")
>> #+end_src
>>
>> Problem #2: I don't know how to ask for drawing the 2 columns. I've tried
>
> OK, so one point that is arguably relevant to this mailing list is that when
> org tables are read into R, the object that is created in R is a *data
> frame*. Not a matrix. (A data frame can have columns of different types;
> matrices are all one type). [4]
>
> [4] org-babel uses orgtbl-to-tsv followed by read.table() to convert the
> org table into a data.frame in R. A source of much confusion with
> R-beginners is that by default, read.table converts character columns into
> the *factor* data type. Note that org-babel currently uses 'as.is=TRUE' when
> calling read.table and therefore does *not* convert to factor. This may
> avoid some confusion among users but is memory-inefficient and misses out on
> other advantages of factors.
>
> So to solve your problem, you'd need to read the description of the height
> argument in the help page for barplot (?barplot), noting that it says
> "either a vector or matrix", and also noting that it says that bars
> correspond to columns (not rows), thus realising that you need to explicitly
> convert the relevant columns of the data frame to a matrix and then
> transpose.
>
> However, your two columns have rather different magnitude values and so are
> not very well suited for plotting on the same scale. Below I rescaled the
> first column by a factor of 20 so you can at least see the bars.
>
> #+srcname: expenses-bar-plot-two-columns(abc = abc2008)
> #+begin_src R :file abc2008.png
>   ## select the two columns, convert to matrix, transpose and rescale top
>   ## row.
>   x <- t(as.matrix(abc[,2:3])) * c(20,1)
>   barplot(x, col = rep(c("red","blue"), ncol(x)),
>           main = "Profit and Loss 2008", las = 1,
>           xlab= "Months", ylab = "EUR", beside=TRUE)
> #+end_src

Thanks a lot for the enlightening explanation and for the correction to the
R code.

Best regards,
  Seb

-- 
Sébastien Vauban




* Re: [babel] R questions
  2009-12-05  0:45 ` Dan Davison
  2009-12-08  9:50   ` Sébastien Vauban
@ 2009-12-08  9:58   ` Sébastien Vauban
  2009-12-08 16:26     ` Dan Davison
  1 sibling, 1 reply; 8+ messages in thread
From: Sébastien Vauban @ 2009-12-08  9:58 UTC (permalink / raw)
  To: emacs-orgmode-mXXj517/zsQ

Hi Dan and Eric,

I have a side question, but I think this is of general interest for others as
well.

I know almost nothing about either gnuplot or R -- yes, before seeing the
light, I used Excel for all my graphs.

So, my question is: for typical small plots (pie charts and bar plots), is
there any Org-babel reason for preferring one of these two languages over the
other?

Reasons could be better integration (for editing or (re-)generating the
graphs), simpler semantics (with NA values, for example), etc.

Best regards,
  Seb

-- 
Sébastien Vauban




* Re: Re: [babel] R questions
  2009-12-08  9:50   ` Sébastien Vauban
@ 2009-12-08 16:00     ` Thomas S. Dye
  2009-12-08 21:37       ` Sébastien Vauban
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas S. Dye @ 2009-12-08 16:00 UTC (permalink / raw)
  To: Sébastien Vauban; +Cc: emacs-orgmode

Aloha Sébastien,

On Dec 7, 2009, at 11:50 PM, Sébastien Vauban wrote:

> But what's a "NA" value in general?  Is 0 always a meaningful value as
> numeric?  Context-sensitive..
>

NA is a logical constant of length 1 which contains a missing value
indicator.  Whether or not 0 is a meaningful value as numeric depends
on your data and the questions you are asking of it.  You don't ask
this question, but if I read this thread correctly and you are trying
to work around a data input problem with R in Org-babel, then replacing
missing values with 0 in a numeric context to get around the Org-babel
problem is NOT a good idea.
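
A short sketch of the point, with made-up numbers in which the last month is
not yet known. It shows only how a substituted 0 changes a summary statistic.

```r
pnl <- c(1627.24, 700.33, NA)   # made-up values; the last month is missing

na_is_logical <- is.logical(NA)        # NA itself is a logical constant
na_length     <- length(NA)            # of length 1

mean_raw   <- mean(pnl)                # NA: the missing value propagates
mean_known <- mean(pnl, na.rm = TRUE)  # average of the two known months

# Replacing NA with 0 silently counts a fake third month in the average.
zeroed      <- replace(pnl, is.na(pnl), 0)
mean_zeroed <- mean(zeroed)

print(c(known = mean_known, zeroed = mean_zeroed))
```

The average over the known months and the average after substituting 0 differ,
which is why the substitution changes the answer rather than just the input
format.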

HTH,
Tom


* Re: Re: [babel] R questions
  2009-12-08  9:58   ` Sébastien Vauban
@ 2009-12-08 16:26     ` Dan Davison
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Davison @ 2009-12-08 16:26 UTC (permalink / raw)
  To: Sébastien Vauban; +Cc: emacs-orgmode

Sébastien Vauban <wxhgmqzgwmuf@spammotel.com> writes:

> Hi Dan and Eric,
>
> I have a side question, but I think this is of general interest for others as
> well.
>
> I know almost nothing about either gnuplot or R -- yes, before seeing the
> light, I used Excel for all my graphs.
>
> So, my question is: for typical small plots (pie charts and bar plots), is
> there any Org-babel reason for preferring one of these two languages over the
> other?

> Reasons could be better integration (for editing or (re-)generating the
> graphs), simpler semantics (with NA values, for example), etc.

Org-babel wants to support both languages as well as possible. So there
is no such purely org-babel reason; or if there is, there shouldn't be,
so tell us about it and we'll try to fix it.

With respect to graphics, I'm sure that each one has things it can do
better than the other (e.g. I get the impression that gnuplot is better
for "3D" graphics).

But yes, if there were someone who (a) didn't know either language,
(b) was limited in the amount of time they could devote to learning
computer languages, and (c) thought they might one day have some use for
some of the things that R can do and gnuplot can't, then I would suggest
that they start using R over gnuplot.

R is a fully-featured programming language with a very large number of
numerical/statistical/scientific procedures available (2094 add-on
packages currently listed at http://cran.r-project.org/). One wouldn't
normally compare R with gnuplot; more appropriate comparisons might be
to the scientific libraries for Python, Perl and C++, and to things like
Excel, SAS, Matlab, and Mathematica (although R is not a symbolic
mathematics engine).

Dan

>
>
> Best regards,
>   Seb


* Re: [babel] R questions
  2009-12-08 16:00     ` Thomas S. Dye
@ 2009-12-08 21:37       ` Sébastien Vauban
  0 siblings, 0 replies; 8+ messages in thread
From: Sébastien Vauban @ 2009-12-08 21:37 UTC (permalink / raw)
  To: emacs-orgmode-mXXj517/zsQ

Hi Thomas,

"Thomas S. Dye" wrote:
> On Dec 7, 2009, at 11:50 PM, Sébastien Vauban wrote:
>
>>> [2] I guess one could potentially think about dealing with missing values
>>>     more explicitly in org-babel. E.g. there could be a header arg
>>>     specifying what values are to be treated as missing. Nothing like
>>>     that exists currently.
>>
>> I guess such a feature would be required on the long term. Of course, even
>> specifying what would be the needed behavior is already difficult, I think.
>> One must have good knowledge of the multiple languages and environments,
>> and try to abstract the best behavior out of these.
>>
>> Side note -- I know, for example, that there is an option in Access to let
>> it consider the empty string ('') as the NULL value, or not. Clear.
>>
>> But what's a "NA" value in general?  Is 0 always a meaningful value as
>> numeric?  Context-sensitive..
>
> NA is a logical constant of length 1 which contains a missing value
> indicator. Whether or not 0 is a meaningful value as numeric depends on your
> data and the questions you are asking of it. You don't ask this question,

?  I thought I addressed that when asking (to myself) "Is 0 always a
meaningful value as numeric?" and answering [that it certainly is]
"context-sensitive.."


> but if I read this thread correctly and you are trying to workaround a data
> input problem with R in Org-babel,

No, you misread, or I mis-wrote ;-)

I wasn't speaking of R only, saying that "such a feature would be required on
the long term [... for] the multiple languages".

I was thinking of shell scripts (with empty strings), SQL code (with empty
strings and NULL values), etc.


> then replacing missing values with 0 in a numeric context to get around the
> Org-babel problem is NOT a good idea.

Implementing a fixed interpretation is NOT a good idea. I share your point of
view.

My comments were:

- I think we must be able to write a rule for interpreting "empty" (whatever
  it means) values;

- We should think about what's needed to cover current and future needs, not
  focusing on one specific language (R), but considering all of them (shell
  commands, SQL, etc.).

Best regards,
  Seb

-- 
Sébastien Vauban




* Re: Re: [babel] R questions
@ 2009-12-08 22:40 Thomas S. Dye
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas S. Dye @ 2009-12-08 22:40 UTC (permalink / raw)
  To: emacs-orgmode list



Hi Sebastien,

On Dec 8, 2009, at 11:37 AM, Sébastien Vauban wrote:

> Hi Thomas,
>
> "Thomas S. Dye" wrote:
>> On Dec 7, 2009, at 11:50 PM, Sébastien Vauban wrote:
>>
> >>>> [2] I guess one could potentially think about dealing with missing
> >>>>    values more explicitly in org-babel. E.g. there could be a header
> >>>>    arg specifying what values are to be treated as missing. Nothing
> >>>>    like that exists currently.
> >>>
> >>> I guess such a feature would be required on the long term. Of course,
> >>> even specifying what would be the needed behavior is already difficult,
> >>> I think. One must have good knowledge of the multiple languages and
> >>> environments, and try to abstract the best behavior out of these.
> >>>
> >>> Side note -- I know, for example, that there is an option in Access to
> >>> let it consider the empty string ('') as the NULL value, or not. Clear.
> >>>
> >>> But what's a "NA" value in general?  Is 0 always a meaningful value as
> >>> numeric?  Context-sensitive..
> >>
> >> NA is a logical constant of length 1 which contains a missing value
> >> indicator. Whether or not 0 is a meaningful value as numeric depends on
> >> your data and the questions you are asking of it. You don't ask this
> >> question,
> >
> > ?  I thought I addressed that when asking (to myself) "Is 0 always a
> > meaningful value as numeric?" and answering [that it certainly is]
> > "context-sensitive.."
> >
> >
> >> but if I read this thread correctly and you are trying to workaround a
> >> data input problem with R in Org-babel,
> >
> > No, you misread, or I mis-wrote ;-)
> >
> > I wasn't speaking of R only, saying that "such a feature would be
> > required on the long term [... for] the multiple languages".
> >
> > Thinking at shell-script (with empty strings), SQL code (with empty
> > strings and NULL values), etc.
> >
> >
> >> then replacing missing values with 0 in a numeric context to get around
> >> the Org-babel problem is NOT a good idea.
> >
> > Implementing a fixed interpretation is NOT a good idea. I share your
> > point of view.
> >
> > My comments were:
> >
> > - I think we must be able to write a rule for interpreting "empty"
> >   (whatever it means) values;
> >
> > - We should think at what's needed to cover the current and future needs,
> >   not focusing on one specific language (R), but thinking at all of them
> >   (shell commands, SQL, etc.).
>
> Best regards,
>  Seb
>
> -- 
> Sébastien Vauban

I agree with you on the importance of having some way to represent  
missing values in Org-babel that can be translated cleanly and  
transparently to the representations used by specific languages.

I was responding to one part of your longer message in the context of  
the message subject, "R questions."  I see now that you were asking a  
more general question.  Mea culpa.

All the best,
Tom

Thomas S. Dye, Ph.D.
T. S. Dye & Colleagues, Archaeologists, Inc.
Phone: (808) 529-0866 Fax: (808) 529-0884
http://www.tsdye.com



