[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: state of the art in org-mode tables e.g. join, etc
From: |
John Kitchin |
Subject: |
Re: state of the art in org-mode tables e.g. join, etc |
Date: |
Thu, 25 Feb 2021 09:50:40 -0500 |
User-agent: |
mu4e 1.4.13; emacs 27.1.90 |
That is remarkably slim code to get those results!
Cook, Malcolm <MEC@stowers.org> writes:
> John,
>
> Checkout what R sqldf package makes easy:
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday | Red | 30 | 11 |
> | Monday | Blue | 25 | 3 |
> | Tuesday | Red | 51 | 12 |
> | Tuesday | Red | 45 | 15 |
> | Tuesday | Blue | 33 | 18 |
> | Wednesday | Red | 27 | 23 |
> | Wednesday | Blue | 12 | 16 |
> | Wednesday | Blue | 15 | 15 |
> | Thursday | Red | 39 | 24 |
> | Thursday | Red | 41 | 29 |
> | Thursday | Red | 49 | 30 |
> | Friday | Blue | 7 | 5 |
> | Friday | Blue | 6 | 8 |
> | Friday | Blue | 11 | 9 |
>
> #+PROPERTY: header-args:R :session *R*
>
> #+begin_src R :results none
> library(sqldf)
> #+end_src
>
>
> #+begin_src R :var original=original :colnames yes
> sqldf('select Color, count(*) from original group by Color;')
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue | 7 |
> | Red | 7 |
>
>
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | onion | 1.3 | 4.4 | 1.3 | 9.0 |
> | egg | 0 | 18.3 | 31.9 | 18.3 |
> | rice | 0.2 | 0 | 1.5 | 16.0 |
> | bread | 0.7 | 0.7 | 3.3 | 16.0 |
> | orange | 3.1 | 11.9 | 1.3 | 17.6 |
> | banana | 2.1 | 9.9 | 0.9 | 18.5 |
> | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
> | nut | 2.6 | 1.3 | 4.9 | 7.2 |
> | corn | 4.7 | 1.8 | 2.8 | 21.3 |
>
>
> #+name: recipe
> | type | quty |
> |----------+------|
> | onion | 70 |
> | tomatoe | 120 |
> | eggplant | 300 |
> | tofu | 100 |
>
>
> #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
> sqldf('select * from recipe, nutrition where recipe.type=nutrition.type')
> #+end_src
>
> #+RESULTS:
> | type | quty | type | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9 |
> | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
>
>
>
> This should also be possible but I cannot get it to work now:
>
> #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
> :prologue sqldf(' :epilogue ')
> select * from recipe, nutrition where recipe.type=nutrition.type
> #+end_src
>
>
>
>
>
> From: Emacs-orgmode <emacs-orgmode-bounces+mec=stowers.org@gnu.org> On Behalf
> Of John Kitchin
> Sent: Sunday, February 21, 2021 10:24
> To: Tim Cross <theophilusx@gmail.com>
> Cc: org-mode-email <emacs-orgmode@gnu.org>
> Subject: Re: state of the art in org-mode tables e.g. join, etc
>
> ATTENTION: This email came from an external source. Do not open attachments
> or click on links from unknown senders or unexpected emails.
>
> For fun, here is the sqlite equivalent of the Pandas example using the same
> tables as before
>
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday | Red | 30 | 11 |
> | Monday | Blue | 25 | 3 |
> | Tuesday | Red | 51 | 12 |
> | Tuesday | Red | 45 | 15 |
> | Tuesday | Blue | 33 | 18 |
> | Wednesday | Red | 27 | 23 |
> | Wednesday | Blue | 12 | 16 |
> | Wednesday | Blue | 15 | 15 |
> | Thursday | Red | 39 | 24 |
> | Thursday | Red | 41 | 29 |
> | Thursday | Red | 49 | 30 |
> | Friday | Blue | 7 | 5 |
> | Friday | Blue | 6 | 8 |
> | Friday | Blue | 11 | 9 |
>
>
> #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
> drop table if exists testtable;
> create table testtable(Day str, Color str, Level int, Quantity int);
> .mode csv testtable
> .import $orgtable testtable
> select Color, count(*) from testtable group by Color;
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue | 7 |
> | Red | 7 |
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | onion | 1.3 | 4.4 | 1.3 | 9.0 |
> | egg | 0 | 18.3 | 31.9 | 18.3 |
> | rice | 0.2 | 0 | 1.5 | 16.0 |
> | bread | 0.7 | 0.7 | 3.3 | 16.0 |
> | orange | 3.1 | 11.9 | 1.3 | 17.6 |
> | banana | 2.1 | 9.9 | 0.9 | 18.5 |
> | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
> | nut | 2.6 | 1.3 | 4.9 | 7.2 |
> | corn | 4.7 | 1.8 | 2.8 | 21.3 |
>
>
> #+name: recipe
> | type | quty |
> |----------+------|
> | onion | 70 |
> | tomatoe | 120 |
> | eggplant | 300 |
> | tofu | 100 |
>
>
> #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes
> drop table if exists nutrition;
> drop table if exists recipe;
> create table nutrition(type str, Fiber float, Sugar float, Protein float,
> Carb float);
> create table recipe(type str, quty int);
>
> .mode csv nutrition
> .import $nut nutrition
>
> .mode csv recipe
> .import $rec recipe
>
> select * from recipe, nutrition where recipe.type=nutrition.type;
> #+end_src
>
> #+RESULTS:
> | type | quty | type | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 |
> | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
> On Sun, Feb 21, 2021 at 10:03 AM John Kitchin
> <jkitchin@andrew.cmu.edu<mailto:jkitchin@andrew.cmu.edu>> wrote:
> Thanks Tim and Greg. I had mostly come to the same conclusions that it is
> probably best to outsource this. I worked out some examples from the
> orgtbljoin and orgaggregate packages with Pandas below, in case anyone is
> interested in seeing how it works. A key point is using the ":colnames no"
> header args to get the column names for Pandas. It seems like a pretty good
> approach.
>
> * org-mode tables with Pandas
> ** Aggregating from a table
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday | Red | 30 | 11 |
> | Monday | Blue | 25 | 3 |
> | Tuesday | Red | 51 | 12 |
> | Tuesday | Red | 45 | 15 |
> | Tuesday | Blue | 33 | 18 |
> | Wednesday | Red | 27 | 23 |
> | Wednesday | Blue | 12 | 16 |
> | Wednesday | Blue | 15 | 15 |
> | Thursday | Red | 39 | 24 |
> | Thursday | Red | 41 | 29 |
> | Thursday | Red | 49 | 30 |
> | Friday | Blue | 7 | 5 |
> | Friday | Blue | 6 | 8 |
> | Friday | Blue | 11 | 9 |
>
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> import pandas as pd
>
> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [1]:
> # text/plain
> : Color
> : Blue 7
> : Red 7
> : dtype: int64
> :end:
>
> The categorical stuff here is just to get the days sorted the same way as the
> example. It is otherwise not needed. I feel there should be a more clever way
> to do this, but didn't think of it.
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> df = pd.DataFrame(data[1:], columns=data[0])
> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
> 'Sunday']
> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>
> (df
> .groupby('Day')
> .agg({'Level': 'mean',
> 'Quantity': 'sum'})
> .sort_values('Day'))
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [2]:
> # text/plain
> : Level Quantity
> : Day
> : Monday 27.5 14
> : Tuesday 43.0 45
> : Wednesday 18.0 54
> : Thursday 43.0 83
> : Friday 8.0 22
> : Saturday NaN 0
> : Sunday NaN 0
>
> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
> :end:
>
> ** Joining tables
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | onion | 1.3 | 4.4 | 1.3 | 9.0 |
> | egg | 0 | 18.3 | 31.9 | 18.3 |
> | rice | 0.2 | 0 | 1.5 | 16.0 |
> | bread | 0.7 | 0.7 | 3.3 | 16.0 |
> | orange | 3.1 | 11.9 | 1.3 | 17.6 |
> | banana | 2.1 | 9.9 | 0.9 | 18.5 |
> | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
> | nut | 2.6 | 1.3 | 4.9 | 7.2 |
> | corn | 4.7 | 1.8 | 2.8 | 21.3 |
>
>
> #+name: recipe
> | type | quty |
> |----------+------|
> | onion | 70 |
> | tomatoe | 120 |
> | eggplant | 300 |
> | tofu | 100 |
>
>
> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>
> pd.merge(rec, nutrition, on='type')
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [4]:
> # text/plain
> : type quty Fiber Sugar Protein Carb
> : 0 onion 70 1.3 4.4 1.3 9.0
> : 1 tomatoe 120 0.6 2.1 0.8 3.4
> : 2 eggplant 300 2.5 3.2 0.8 8.6
> : 3 tofu 100 0.7 0.5 6.6 1.4
> :end:
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross
> <theophilusx@gmail.com<mailto:theophilusx@gmail.com>> wrote:
>
> Greg Minshall <minshall@umich.edu<mailto:minshall@umich.edu>> writes:
>
>> John,
>>
>>> Is there a state of the art in using org-tables as little databases
>>> with joins and stuff?
>>
>> i have to admit i do all that with an R code source block. (the dplyr
>> package has the relevant joins, e.g. dplyr::inner_join().) and, in R,
>> ":colnames yes" as a header argument gives you header lines on results.
>> (maybe that's ?now? for "all" languages?)
>>
>
> For really complex joins and ad hoc queries, I would do similar or put
> the data into sqlite. For more simple ones, I just define a table which
> uses table formulas to extract the values from the other tables - the
> downside being the tables need to have the same data ordering or the
> formulas need to be somewhat complex. Provided the tables have the same
> number of records in the same order, table formulas are usually fairly
> easy.
>
> I did think about writing some elisp functions to use in my table
> formulas to make things easier, but then decided I was just re-inventing
> and well defined database solution and figured when I need it, just use
> sqlite. However, it has been a while since I needed this level of
> complexity, so perhaps things have moved on and there are better ways
> now.
--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
- state of the art in org-mode tables e.g. join, etc, John Kitchin, 2021/02/20
- Re: state of the art in org-mode tables e.g. join, etc, Greg Minshall, 2021/02/20
- Re: state of the art in org-mode tables e.g. join, etc, Tim Cross, 2021/02/21
- Re: state of the art in org-mode tables e.g. join, etc, John Kitchin, 2021/02/21
- Re: state of the art in org-mode tables e.g. join, etc, John Kitchin, 2021/02/21
- RE: state of the art in org-mode tables e.g. join, etc, Cook, Malcolm, 2021/02/22
- Re: state of the art in org-mode tables e.g. join, etc, Greg Minshall, 2021/02/22
- RE: state of the art in org-mode tables e.g. join, etc, Cook, Malcolm, 2021/02/22
- Re: state of the art in org-mode tables e.g. join, etc, Greg Minshall, 2021/02/22
- Re: state of the art in org-mode tables e.g. join, etc,
John Kitchin <=
- Re: state of the art in org-mode tables e.g. join, etc, Derek Feichtinger, 2021/02/22
- Re: state of the art in org-mode tables e.g. join, etc, John Kitchin, 2021/02/24