Re: [O] Babel: communicating irregular data to R source-code block


From: Eric Schulte
Subject: Re: [O] Babel: communicating irregular data to R source-code block
Date: Sun, 22 Apr 2012 11:58:40 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux)

address@hidden (Thomas S. Dye) writes:

> Aloha Michael,
>
> Michael Hannon <address@hidden> writes:
>
>> Greetings.  I'm sitting in on a weekly, informal, "brown-bag" seminar on data
>> technologies in statistics.  There are more people attending the seminar than
>> there are weeks in which to give talks, so I may get by with being my usual,
>> passive-slug self.
>>
>> But I thought it might be useful to have a contingency plan and decided that
>> giving a brief talk about Babel might be useful/instructive.  I thought (and
>> think) that mushing together (with attribution) some of the content of the
>> paper [1] by The Gang of Four and the content of Eric's talk [2] might be a
>> good approach.  (BTW, if this isn't legal, desirable, permissible, etc., this
>> would be a good time to tell me.)
>>

I would be happy for you to re-use these materials.

>>
>> I liked the Pascal's Triangle example (which morphed from elisp to Python, or
>> vice versa, in the two references), but I was afraid that the elisp routine
>> "pst-check", used as a check on the correctness of the previously-generated
>> Pascal's triangle, might be too esoteric for this audience, not to mention me.
>> (The recursive Fibonacci function is virtually identical in all languages,
>> but the second part is more obscure.)
>>

I was giving a presentation to a local lisp/scheme user group, so I
figured I'd spare them the pain of trying to read python code :).

>>
>> I thought it should be possible to use R to do the same sanity check, as R
>> would be much more-familiar to this audience (and its use would still
>> demonstrate the meta-language feature of Babel).
>>
>> Unfortunately, I haven't been able to find a way to communicate the output of
>> the Pascal's Triangle example to an R source-code block.  The gist of the
>> problem seems to be that regardless of how I try to grab the data (scan,
>> readLines, etc.) Babel always ends up trying to read a data frame (table) and
>> I get an error similar to:
>>

I present some options below specific to Tom's discussion, but another
option may be to use the ":results output" option on a python code block
which prints the table to STDOUT, and then use something like readLines
to read the resulting string into R.
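
As a minimal sketch of the Python side of that idea: the block below
builds the triangle iteratively and prints one space-separated row per
line, which is the kind of text readLines could pick up.  The helper
name format_rows is hypothetical, and the Babel header arguments and
the R side are not shown.

```python
def pascals_triangle(n):
    """Return rows 0..n of Pascal's triangle as a list of lists."""
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        # Each new entry is the sum of the two entries above it.
        rows.append([a + b for a, b in zip([0] + prev, prev + [0])])
    return rows


def format_rows(rows):
    """Join each row with spaces and the rows with newlines (hypothetical helper)."""
    return "\n".join(" ".join(str(x) for x in row) for row in rows)


# Printing to STDOUT keeps the ragged rows as plain text rather than a table.
print(format_rows(pascals_triangle(4)))
```

Each printed line then holds one row of a different length, so nothing
on the Python side forces the data into a rectangular table.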

>>
>> <<<<<<
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>> : line 1 did not have 5 elements
>>
>> Enter a frame number, or 0 to exit   
>>
>> 1: read.table("/tmp/babel-3780tje/R-import-3780Akj", header = FALSE, row.names = NULL, sep = "
>>>>>>>>
>>
>> If I construct a table "by hand" with all of the cells occupied, everything
>> goes OK.  For instance:
>>
>> <<<<<<
>> #+TBLNAME: some-junk
>> | 1 | 0 | 0 | 0 |
>> | 1 | 1 | 0 | 0 |
>> | 1 | 2 | 1 | 0 |
>> | 1 | 3 | 3 | 1 | 
>>
>> #+NAME: read-some-junk(sj_input=some-junk)
>> #+BEGIN_SRC R
>>
>> rowSums(sj_input)
>>
>> #+END_SRC  
>>
>> #+RESULTS: read-some-junk
>> | 1 |
>> | 2 |
>> | 4 |
>> | 8 |
>>>>>>>>
>>
>> But the following gives the kind of error I described above:
>>
>> <<<<<<
>> #+name: pascals_triangle
>> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
>> def pascals_triangle(n):
>>     if n == 0:
>>         return [[1]]
>>     prev_triangle = pascals_triangle(n-1)
>>     prev_row = prev_triangle[n-1]
>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>     return prev_triangle + [this_row]
>>
>> pascals_triangle(n)
>> #+end_src
>
> A few things are wrong at this point.  It seems the JSS article has
> an error in the header of the pascals_triangle source block.  AFAIK
> there is no header argument :return.  I don't know how :return
> pascals_triangle(5) got there, but am fairly certain it shouldn't be.
>

The :return header argument *is* a supported header argument of python
code blocks and is not an error.  The python code block should run
without error once the extra "pascals_triangle(n)" line at the bottom is
removed.  The following works for me.

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5)
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]

#+end_src

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |
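
A note for anyone running this under Python 3 (an assumption; the block
above was written for Python 2): there map returns an iterator rather
than a list, so "prev_row + [0]" fails on the next recursive step.  A
minimal adaptation wraps the call in list().  This is a sketch of the
function body only, not the block from the paper.

```python
def pascals_triangle(n):
    """Recursive Pascal's triangle, rows 0..n, adapted for Python 3."""
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n - 1)
    prev_row = prev_triangle[n - 1]
    # list() forces the map iterator so the row can be concatenated later.
    this_row = list(map(sum, zip([0] + prev_row, prev_row + [0])))
    return prev_triangle + [this_row]


triangle = pascals_triangle(5)
```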

[...]
>
> I vaguely remember that it once was possible to pass variables in
> through the name line, but I couldn't find this syntax in some fairly
> recent documentation.

This style of passing arguments is still supported, but not necessarily
encouraged by the documentation.

> It does appear to work still using a recent Org-mode.  If I rename the
> results and then pass that to the source code block, all is well.
>
> #+RESULTS: pascals-tri
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
>   
> #+name: pst-checkR(p=pascals-tri)
> #+BEGIN_SRC R
> p
> #+END_SRC
>
> #+RESULTS: pst-checkR
>
> | 1 | nil | nil | nil | nil | nil |
> | 1 |   1 | nil | nil | nil | nil |
> | 1 |   2 |   1 | nil | nil | nil |
> | 1 |   3 |   3 |   1 | nil | nil |
> | 1 |   4 |   6 |   4 | 1   | nil |
> | 1 |   5 |  10 |  10 | 5   | 1   |
>
> This looks like a bug to me, but Eric S. will know better what might be
> going on.

The above is due to the inability of R (or at least of the read.table
function) to read in tables whose rows differ in length.  The process of
writing to an Org-mode table and *then* referencing that table as Tom
suggests above has the side effect of filling in blank spots in the
final exported table, turning what would otherwise be something like

1
1  1
1  2  1

into something like

1  ""  ""
1   1  ""
1   2  1

You could also use a function like the following to explicitly fill in
these missing cells.

#+name: padded_pascals_triangle
#+begin_src emacs-lisp :var data=pascals_triangle
  (let ((max-length (apply #'max (mapcar #'length data))))
    (mapcar (lambda (row)
              (append row (make-vector (- max-length (length row)) "") nil))
            data))
#+end_src
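
For readers who would rather do the padding in Python, the same idea
can be sketched as follows.  The function name pad_rows and its fill
parameter are hypothetical; like the emacs-lisp version, it pads every
row with empty strings up to the length of the longest row.

```python
def pad_rows(rows, fill=""):
    """Pad each row with `fill` so that all rows have the same length."""
    # default=0 keeps max() from raising on an empty list of rows.
    max_length = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (max_length - len(row)) for row in rows]


padded = pad_rows([[1], [1, 1], [1, 2, 1]])
```

Here padded is a rectangular list of lists, with "" in the cells above
the diagonal.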

> I can't do much more than this, but I'm optimistic things will be
> sorted out before your turn to speak at the seminar rolls around.
>
> Thanks for bringing the error in the JSS article to light.
>
> All the best,
> Tom
>

I often have to explicitly convert data read into R code blocks as a
table into some other data structure like a vector or a matrix.  I run
into this myself when trying to use the statistical functions of R.  It
generally takes a while to look up the function to do the conversion,
but I imagine that there is a reason why people who know more R than I
do chose to make tables the default data type for data read into R
blocks.

Best,

Combining the examples above yields the following,

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5) :results vector
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]

#+end_src

#+name: padded_pascals_triangle
#+begin_src emacs-lisp :var data=pascals_triangle
  (let ((max-length (apply #'max (mapcar #'length data))))
    (mapcar (lambda (row)
              (append row (make-vector (- max-length (length row)) "") nil))
            data))
#+end_src

#+begin_src R :var data=padded_pascals_triangle
data
#+end_src

#+RESULTS:
| 1 | nil | nil | nil | nil | nil |
| 1 |   1 | nil | nil | nil | nil |
| 1 |   2 |   1 | nil | nil | nil |
| 1 |   3 |   3 |   1 | nil | nil |
| 1 |   4 |   6 |   4 | 1   | nil |
| 1 |   5 |  10 |  10 | 5   | 1   |
>
>>>>>>>>
>>
>> Note that I don't really want to do rowSums in this case.  I'm just trying to
>> demonstrate the error.
>>
>> Of course, it's clear that the first line does NOT contain five elements, nor
>> does the second, etc., as all of the above-diagonal elements are blanks.
>>
>> But I've been unable to find an R input function that doesn't end up treating
>> the source data as a table, i.e., in the context of Babel source blocks -- R
>> is "happy" to read a lower-diagonal structure.  See the appendix for an
>> example.
>>
>> Any suggestions?  Note that I'm happy to acknowledge that my own ignorance of
>> R and/or Babel might be the source of the problem.  If so, please enlighten
>> me.
>>
>> Thanks.
>>
>> -- Mike
>>
>> [1] http://www.jstatsoft.org/v46/i03
>> [2] https://github.com/eschulte/babel-presentation
>>
>> <<<<<<
>> Appendix
>> --------
>>
>>
>> $ cat pascal.dat
>> 1
>> 1 1
>> 1 2 1
>> 1 3 3 1
>> 1 4 6 4 1
>>
>> $ R --vanilla < pascal.R
>>
>> R version 2.15.0 (2012-03-30)
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>> .
>> .
>> .
>>
>>> x <- readLines("pascal.dat")
>>> x
>> [1] "1"         "1 1"       "1 2 1"     "1 3 3 1"   "1 4 6 4 1"
>>> str(x)
>>  chr [1:5] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>> 
>>> y <- scan("pascal.dat")
>> Read 15 items
>>> y
>>  [1] 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1
>>> str(y)
>>  num [1:15] 1 1 1 1 2 1 1 3 3 1 ...
>>> 
>>> z <- read.table("pascal.dat", header=FALSE)
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>   line 1 did not have 5 elements
>> Calls: read.table -> scan
>> Execution halted
>>
>>

-- 
Eric Schulte
http://cs.unm.edu/~eschulte/
