From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id btNeDluJMmDtSgAA0tVLHw (envelope-from ) for ; Sun, 21 Feb 2021 16:24:59 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id 8GDXCVuJMmDoNAAA1q6Kng (envelope-from ) for ; Sun, 21 Feb 2021 16:24:59 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id C7BECBFFF for ; Sun, 21 Feb 2021 17:24:57 +0100 (CET) Received: from localhost ([::1]:54720 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lDrXU-0002Ym-Gi for larch@yhetil.org; Sun, 21 Feb 2021 11:24:56 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:52408) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lDrWU-00029l-1s for emacs-orgmode@gnu.org; Sun, 21 Feb 2021 11:23:54 -0500 Received: from mail-wr1-x432.google.com ([2a00:1450:4864:20::432]:36887) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lDrWQ-0000w9-NA for emacs-orgmode@gnu.org; Sun, 21 Feb 2021 11:23:53 -0500 Received: by mail-wr1-x432.google.com with SMTP id v15so16551610wrx.4 for ; Sun, 21 Feb 2021 08:23:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=andrew-cmu-edu.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gaWV//khViORN9EuBfIQyTe7NUJYboCD8/W8+pkAYUE=; b=YPN8sTCPXd0yZtOsbsN+f3ALCN5FAba6+VRKvE+yw7tGxtUwi+NeTYHFz86aaNtRxZ 51VWtse81T14i2SPm5G6Q4xGbaGEdNl6z/WA1fAYQ7hVSQ90fDAz90gox/puCn3lnPXs AUr+1lFj7FsWUIvpMiXdUcdwicbFgLky5tnTFEbELvJ2BFGqqsZ3s5EgHk+CZkwUEtaB ffbcBX5xO83i9UmhKgBAgLQ35beF4/loDNi6Iwd+iiJbfvsKDPSPNIh5UQ7firyuJwm9 TmF3elcl7MMZjnciQxQaNwM0pjg1iq/vw0EuQwLmBtZrQeZGJvytpEMQBCWqexgFSOpq xF0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=gaWV//khViORN9EuBfIQyTe7NUJYboCD8/W8+pkAYUE=; b=XL+OG6uomEFhnU1iYA2OPvKcYmpaeUVivCHVhS/HJiBAaHdVxdoK4x/3DblB5LeJuK 7CQDYk1iGSjRpyuEGvxv+uyf+G6edE9Q8RiKDvid8wA+ZQggOfisuwM1/hlLTAIJG/Hw NVp81OLTxaXHnA72EvJ+BCouvOwPZO9rdgK5shJrB7AR9QKyTr/udwbc1IHSy1rbnzUK hFe4uapy5n4h7Bx+avvKAcaOAUKFTnA+tmOHdkLu/eoPiGdTewp0d3QFNV/8gLczBZsm SsogJgFieVr6R92eosFM3YW9+jOL316vItp07fD4hIZi9L3kigiuq+WDz9lg6PQPqfZ5 COEw== X-Gm-Message-State: AOAM531cmoUfRT5e146jV/8lG0KSk8hKJ44iDQDNqr/hK959xsCM9ROf h+6bjw3Cgg2Zq4Lw6bWt0FTIAQpQ+XKqbwnJr/k= X-Google-Smtp-Source: ABdhPJzQ5I9tpw5YU5L0zXPMMAUigx3h5AVtkyZ3YQCvoX1bNiSUGfXf/UKt/cgzZI3Vs69nhDqEhR48qfm9WpnR3cE= X-Received: by 2002:a05:6000:1816:: with SMTP id m22mr17930080wrh.91.1613924629163; Sun, 21 Feb 2021 08:23:49 -0800 (PST) MIME-Version: 1.0 References: <529055.1613882450@apollo2.minshall.org> <875z2lgbco.fsf@gmail.com> In-Reply-To: From: John Kitchin Date: Sun, 21 Feb 2021 11:23:38 -0500 Message-ID: Subject: Re: state of the art in org-mode tables e.g. join, etc To: Tim Cross Content-Type: multipart/alternative; boundary="00000000000037330405bbdb1cb5" Received-SPF: pass client-ip=2a00:1450:4864:20::432; envelope-from=johnrkitchin@gmail.com; helo=mail-wr1-x432.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: org-mode-email Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN X-Migadu-Spam-Score: -2.47 Authentication-Results: aspmx1.migadu.com; dkim=pass header.d=andrew-cmu-edu.20150623.gappssmtp.com header.s=20150623 header.b=YPN8sTCP; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=andrew.cmu.edu (policy=none); spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: C7BECBFFF X-Spam-Score: -2.47 X-Migadu-Scanner: scn0.migadu.com X-TUID: yQ4DxWTU16IQ --00000000000037330405bbdb1cb5 Content-Type: text/plain; charset="UTF-8" For fun, here is the sqlite equivalent of the Pandas example using the same tables as before ** aggregation example Examples from https://github.com/tbanel/orgaggregate #+NAME: original | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | | Tuesday | Red | 51 | 12 | | Tuesday | Red | 45 | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12 | 16 | | Wednesday | Blue | 15 | 15 | | Thursday | Red | 39 | 24 | | Thursday | Red | 41 | 29 | | Thursday | Red | 49 | 30 | | Friday | Blue | 7 | 5 | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes drop table if exists testtable; create table testtable(Day str, Color str, Level int, Quantity int); .mode csv testtable .import $orgtable testtable select Color, count(*) from testtable group by Color; #+end_src #+RESULTS: | Color | count(*) | |-------+----------| | Blue | 7 | | Red | 7 | ** join example Example from https://github.com/tbanel/orgtbljoin #+name: nutrition | type | Fiber | Sugar | Protein | Carb | |----------+-------+-------+---------+------| | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | | onion | 1.3 | 4.4 | 1.3 | 9.0 | | egg | 0 | 18.3 | 31.9 | 18.3 | | rice | 0.2 | 0 | 1.5 | 16.0 | | bread | 0.7 | 0.7 | 3.3 | 16.0 | | orange | 3.1 | 11.9 | 1.3 | 17.6 | | banana | 2.1 | 9.9 | 0.9 | 18.5 | | tofu | 0.7 | 0.5 | 6.6 | 1.4 | | nut | 2.6 | 1.3 | 4.9 | 7.2 | | corn | 4.7 | 1.8 | 2.8 | 21.3 | #+name: recipe | type | quty | |----------+------| | onion | 70 | | tomatoe | 120 | | eggplant | 300 | | tofu | 100 | #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes drop table if exists nutrition; drop table if exists recipe; create table nutrition(type str, Fiber float, Sugar float, Protein float, Carb float); create table recipe(type str, quty int); .mode csv nutrition .import $nut nutrition .mode csv recipe .import $rec recipe select * from recipe, nutrition where recipe.type=nutrition.type; #+end_src #+RESULTS: | type | quty | type | Fiber | Sugar | Protein | Carb | |----------+------+----------+-------+-------+---------+------| | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 | | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 | John ----------------------------------- Professor John Kitchin Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu On Sun, Feb 21, 2021 at 10:03 AM John Kitchin wrote: > Thanks Tim and Greg. I had mostly come to the same conclusions that it is > probably best to outsource this. I worked out some examples from > the orgtbljoin and orgaggregate packages with Pandas below, in case anyone > is interested in seeing how it works. A key point is using the ":colnames > no" header args to get the column names for Pandas. It seems like a pretty > good approach. > > * org-mode tables with Pandas > ** Aggregating from a table > > Examples from https://github.com/tbanel/orgaggregate > > > #+NAME: original > | Day | Color | Level | Quantity | > |-----------+-------+-------+----------| > | Monday | Red | 30 | 11 | > | Monday | Blue | 25 | 3 | > | Tuesday | Red | 51 | 12 | > | Tuesday | Red | 45 | 15 | > | Tuesday | Blue | 33 | 18 | > | Wednesday | Red | 27 | 23 | > | Wednesday | Blue | 12 | 16 | > | Wednesday | Blue | 15 | 15 | > | Thursday | Red | 39 | 24 | > | Thursday | Red | 41 | 29 | > | Thursday | Red | 49 | 30 | > | Friday | Blue | 7 | 5 | > | Friday | Blue | 6 | 8 | > | Friday | Blue | 11 | 9 | > > > #+BEGIN_SRC ipython :var data=original :colnames no > import pandas as pd > > pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size() > #+END_SRC > > #+RESULTS: > :results: > # Out [1]: > # text/plain > : Color > : Blue 7 > : Red 7 > : dtype: int64 > :end: > > The categorical stuff here is just to get the days sorted the same way as > the example. It is otherwise not needed. I feel there should be a more > clever way to do this, but didn't think of it. > > #+BEGIN_SRC ipython :var data=original :colnames no > df = pd.DataFrame(data[1:], columns=data[0]) > days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', > 'Saturday', 'Sunday'] > df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True) > > (df > .groupby('Day') > .agg({'Level': 'mean', > 'Quantity': 'sum'}) > .sort_values('Day')) > #+END_SRC > > #+RESULTS: > :results: > # Out [2]: > # text/plain > : Level Quantity > : Day > : Monday 27.5 14 > : Tuesday 43.0 45 > : Wednesday 18.0 54 > : Thursday 43.0 83 > : Friday 8.0 22 > : Saturday NaN 0 > : Sunday NaN 0 > > > [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]] > :end: > > ** Joining tables > > Example from https://github.com/tbanel/orgtbljoin > > #+name: nutrition > | type | Fiber | Sugar | Protein | Carb | > |----------+-------+-------+---------+------| > | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | egg | 0 | 18.3 | 31.9 | 18.3 | > | rice | 0.2 | 0 | 1.5 | 16.0 | > | bread | 0.7 | 0.7 | 3.3 | 16.0 | > | orange | 3.1 | 11.9 | 1.3 | 17.6 | > | banana | 2.1 | 9.9 | 0.9 | 18.5 | > | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > | nut | 2.6 | 1.3 | 4.9 | 7.2 | > | corn | 4.7 | 1.8 | 2.8 | 21.3 | > > > #+name: recipe > | type | quty | > |----------+------| > | onion | 70 | > | tomatoe | 120 | > | eggplant | 300 | > | tofu | 100 | > > > #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no > nutrition = pd.DataFrame(nut[1:], columns=nut[0]) > rec = pd.DataFrame(recipe[1:], columns=recipe[0]) > > pd.merge(rec, nutrition, on='type') > #+END_SRC > > #+RESULTS: > :results: > # Out [4]: > # text/plain > : type quty Fiber Sugar Protein Carb > : 0 onion 70 1.3 4.4 1.3 9.0 > : 1 tomatoe 120 0.6 2.1 0.8 3.4 > : 2 eggplant 300 2.5 3.2 0.8 8.6 > : 3 tofu 100 0.7 0.5 6.6 1.4 > :end: > > > John > > ----------------------------------- > Professor John Kitchin > Doherty Hall A207F > Department of Chemical Engineering > Carnegie Mellon University > Pittsburgh, PA 15213 > 412-268-7803 > @johnkitchin > http://kitchingroup.cheme.cmu.edu > > > > On Sun, Feb 21, 2021 at 1:54 AM Tim Cross wrote: > >> >> Greg Minshall writes: >> >> > John, >> > >> >> Is there a state of the art in using org-tables as little databases >> >> with joins and stuff? >> > >> > i have to admit i do all that with an R code source block. (the dplyr >> > package has the relevant joins, e.g. dplyr::inner_join().) and, in R, >> > ":colnames yes" as a header argument gives you header lines on results. >> > (maybe that's ?now? for "all" languages?) >> > >> >> For really complex joins and ad hoc queries, I would do similar or put >> the data into sqlite. For more simple ones, I just define a table which >> uses table formulas to extract the values from the other tables - the >> downside being the tables need to have the same data ordering or the >> formulas need to be somewhat complex. Provided the tables have the same >> number of records in the same order, table formulas are usually fairly >> easy. >> >> I did think about writing some elisp functions to use in my table >> formulas to make things easier, but then decided I was just re-inventing >> and well defined database solution and figured when I need it, just use >> sqlite. However, it has been a while since I needed this level of >> complexity, so perhaps things have moved on and there are better ways >> now. >> >> -- >> Tim Cross >> >> --00000000000037330405bbdb1cb5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
For fun, here is the sqlite equivalent of the Pandas examp= le using the same tables as before


** aggregation ex= ample

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original| Day =C2=A0 =C2=A0 =C2=A0 | Color | Level | Quantity |
|-----------+-= ------+-------+----------|
| Monday =C2=A0 =C2=A0| Red =C2=A0 | =C2=A0 = =C2=A030 | =C2=A0 =C2=A0 =C2=A0 11 |
| Monday =C2=A0 =C2=A0| Blue =C2=A0= | =C2=A0 =C2=A025 | =C2=A0 =C2=A0 =C2=A0 =C2=A03 |
| Tuesday =C2=A0 | Re= d =C2=A0 | =C2=A0 =C2=A051 | =C2=A0 =C2=A0 =C2=A0 12 |
| Tuesday =C2=A0 = | Red =C2=A0 | =C2=A0 =C2=A045 | =C2=A0 =C2=A0 =C2=A0 15 |
| Tuesday =C2= =A0 | Blue =C2=A0| =C2=A0 =C2=A033 | =C2=A0 =C2=A0 =C2=A0 18 |
| Wednesd= ay | Red =C2=A0 | =C2=A0 =C2=A027 | =C2=A0 =C2=A0 =C2=A0 23 |
| Wednesda= y | Blue =C2=A0| =C2=A0 =C2=A012 | =C2=A0 =C2=A0 =C2=A0 16 |
| Wednesday= | Blue =C2=A0| =C2=A0 =C2=A015 | =C2=A0 =C2=A0 =C2=A0 15 |
| Thursday = =C2=A0| Red =C2=A0 | =C2=A0 =C2=A039 | =C2=A0 =C2=A0 =C2=A0 24 |
| Thurs= day =C2=A0| Red =C2=A0 | =C2=A0 =C2=A041 | =C2=A0 =C2=A0 =C2=A0 29 |
| T= hursday =C2=A0| Red =C2=A0 | =C2=A0 =C2=A049 | =C2=A0 =C2=A0 =C2=A0 30 || Friday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A0 7 | =C2=A0 =C2=A0 =C2= =A0 =C2=A05 |
| Friday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A0 6 | =C2= =A0 =C2=A0 =C2=A0 =C2=A08 |
| Friday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 = =C2=A011 | =C2=A0 =C2=A0 =C2=A0 =C2=A09 |


#+be= gin_src sqlite :db ":memory:" :var orgtable=3Doriginal :colnames = yes
drop table if exists testtable;
create table testtable(Day str, C= olor str, Level int, Quantity int);
.mode csv testtable
.import $orgt= able testtable
select Color, count(*) from testtable group by Color;
= #+end_src

#+RESULTS:
| Color | count(*) |
|-------+----------|=
| Blue =C2=A0| =C2=A0 =C2=A0 =C2=A0 =C2=A07 |
| Red =C2=A0 | =C2=A0 = =C2=A0 =C2=A0 =C2=A07 |

** join example

Example from https://github.com/tbanel/orgtblj= oin

#+name: nutrition
| type =C2=A0 =C2=A0 | Fiber | Sugar | = Protein | Carb |
|----------+-------+-------+---------+------|
| eggp= lant | =C2=A0 2.5 | =C2=A0 3.2 | =C2=A0 =C2=A0 0.8 | =C2=A08.6 |
| tomat= oe =C2=A0| =C2=A0 0.6 | =C2=A0 2.1 | =C2=A0 =C2=A0 0.8 | =C2=A03.4 |
| o= nion =C2=A0 =C2=A0| =C2=A0 1.3 | =C2=A0 4.4 | =C2=A0 =C2=A0 1.3 | =C2=A09.0= |
| egg =C2=A0 =C2=A0 =C2=A0| =C2=A0 =C2=A0 0 | =C2=A018.3 | =C2=A0 =C2= =A031.9 | 18.3 |
| rice =C2=A0 =C2=A0 | =C2=A0 0.2 | =C2=A0 =C2=A0 0 | = =C2=A0 =C2=A0 1.5 | 16.0 |
| bread =C2=A0 =C2=A0| =C2=A0 0.7 | =C2=A0 0.= 7 | =C2=A0 =C2=A0 3.3 | 16.0 |
| orange =C2=A0 | =C2=A0 3.1 | =C2=A011.9= | =C2=A0 =C2=A0 1.3 | 17.6 |
| banana =C2=A0 | =C2=A0 2.1 | =C2=A0 9.9 = | =C2=A0 =C2=A0 0.9 | 18.5 |
| tofu =C2=A0 =C2=A0 | =C2=A0 0.7 | =C2=A0 = 0.5 | =C2=A0 =C2=A0 6.6 | =C2=A01.4 |
| nut =C2=A0 =C2=A0 =C2=A0| =C2=A0= 2.6 | =C2=A0 1.3 | =C2=A0 =C2=A0 4.9 | =C2=A07.2 |
| corn =C2=A0 =C2=A0= | =C2=A0 4.7 | =C2=A0 1.8 | =C2=A0 =C2=A0 2.8 | 21.3 |


#+name: = recipe
| type =C2=A0 =C2=A0 | quty |
|----------+------|
| onion = =C2=A0 =C2=A0| =C2=A0 70 |
| tomatoe =C2=A0| =C2=A0120 |
| eggplant |= =C2=A0300 |
| tofu =C2=A0 =C2=A0 | =C2=A0100 |


#+begin_src sqlite :db ":memory:" :var nut=3Dnutrition rec= =3Drecipe :colnames yes
drop table if exists nutrition;
drop table if= exists recipe;
create table nutrition(type str, Fiber float, Sugar floa= t, Protein float, Carb float);
create table recipe(type str, quty int);<= br>
.mode csv nutrition
.import $nut nutrition

.mode csv recip= e
.import $rec recipe

select * from recipe, nutrition where recip= e.type=3Dnutrition.type;
#+end_src

#+RESULTS:
| type =C2=A0 = =C2=A0 | quty | type =C2=A0 =C2=A0 | Fiber | Sugar | Protein | Carb |
|-= ---------+------+----------+-------+-------+---------+------|
| onion = =C2=A0 =C2=A0| =C2=A0 70 | onion =C2=A0 =C2=A0| =C2=A0 1.3 | =C2=A0 4.4 | = =C2=A0 =C2=A0 1.3 | =C2=A09.0 |
| tomatoe =C2=A0| =C2=A0120 | tomatoe = =C2=A0| =C2=A0 0.6 | =C2=A0 2.1 | =C2=A0 =C2=A0 0.8 | =C2=A03.4 |
| eggp= lant | =C2=A0300 | eggplant | =C2=A0 2.5 | =C2=A0 3.2 | =C2=A0 =C2=A0 0.8 |= =C2=A08.6 |
| tofu =C2=A0 =C2=A0 | =C2=A0100 | tofu =C2=A0 =C2=A0 | =C2= =A0 0.7 | =C2=A0 0.5 | =C2=A0 =C2=A0 6.6 | =C2=A01.4 |


John

-----------------------------------
Professor John= Kitchin=C2=A0
Doherty Hall A207F
Department of Chemical Engineering<= br>Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
=


On Sun, Feb 21, 2021 at 10:03 AM John Kitch= in <jkitchin@andrew.cmu.edu> wrote:
Thanks Tim and Greg. I had mostly come to the same conclusio= ns that it is probably best to outsource this. I worked out some examples f= rom the=C2=A0orgtbljoin and orgaggregate=C2=A0packages with Pandas below, i= n case anyone is interested in seeing how it works. A key point is using th= e ":colnames no" header args to get the column names for Pandas. = It seems like a pretty good approach.

* org-mode tables = with Pandas
** Aggregating from a table

Examples from
https://github.co= m/tbanel/orgaggregate


#+NAME: original
| Day =C2=A0 =C2= =A0 =C2=A0 | Color | Level | Quantity |
|-----------+-------+-------+---= -------|
| Monday =C2=A0 =C2=A0| Red =C2=A0 | =C2=A0 =C2=A030 | =C2=A0 = =C2=A0 =C2=A0 11 |
| Monday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A025 = | =C2=A0 =C2=A0 =C2=A0 =C2=A03 |
| Tuesday =C2=A0 | Red =C2=A0 | =C2=A0 = =C2=A051 | =C2=A0 =C2=A0 =C2=A0 12 |
| Tuesday =C2=A0 | Red =C2=A0 | =C2= =A0 =C2=A045 | =C2=A0 =C2=A0 =C2=A0 15 |
| Tuesday =C2=A0 | Blue =C2=A0|= =C2=A0 =C2=A033 | =C2=A0 =C2=A0 =C2=A0 18 |
| Wednesday | Red =C2=A0 | = =C2=A0 =C2=A027 | =C2=A0 =C2=A0 =C2=A0 23 |
| Wednesday | Blue =C2=A0| = =C2=A0 =C2=A012 | =C2=A0 =C2=A0 =C2=A0 16 |
| Wednesday | Blue =C2=A0| = =C2=A0 =C2=A015 | =C2=A0 =C2=A0 =C2=A0 15 |
| Thursday =C2=A0| Red =C2= =A0 | =C2=A0 =C2=A039 | =C2=A0 =C2=A0 =C2=A0 24 |
| Thursday =C2=A0| Red= =C2=A0 | =C2=A0 =C2=A041 | =C2=A0 =C2=A0 =C2=A0 29 |
| Thursday =C2=A0|= Red =C2=A0 | =C2=A0 =C2=A049 | =C2=A0 =C2=A0 =C2=A0 30 |
| Friday =C2= =A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A0 7 | =C2=A0 =C2=A0 =C2=A0 =C2=A05 || Friday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A0 6 | =C2=A0 =C2=A0 =C2= =A0 =C2=A08 |
| Friday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A011 | =C2= =A0 =C2=A0 =C2=A0 =C2=A09 |


#+BEGIN_SRC ipython :var data=3Dorig= inal :colnames no
import pandas as pd

pd.DataFrame(data[1:], colu= mns=3Ddata[0]).groupby('Color').size()
#+END_SRC

#+RESULT= S:
:results:
# Out [1]:
# text/plain
: Color
: Blue =C2=A0 = =C2=A07
: Red =C2=A0 =C2=A0 7
: dtype: int64
:end:

The cate= gorical stuff here is just to get the days sorted the same way as the examp= le. It is otherwise not needed. I feel there should be a more clever way to= do this, but didn't think of it.

#+BEGIN_SRC ipython :va= r data=3Doriginal :colnames no
df =3D pd.DataFrame(data[1:], columns=3Dd= ata[0])
days =3D ['Monday', 'Tuesday', 'Wednesday= 9;, 'Thursday', 'Friday', 'Saturday', 'Sunday&#= 39;]
df['Day'] =3D pd.Categorical(df['Day'], categories= =3Ddays, ordered=3DTrue)

(df
=C2=A0.groupby('Day')
=C2= =A0.agg({'Level': 'mean',
=C2=A0 =C2=A0 =C2=A0 =C2=A0= 9;Quantity': 'sum'})
=C2=A0.sort_values('Day'))
#= +END_SRC

#+RESULTS:
:results:
# Out [2]:
# text/plain
: = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Level =C2=A0Quantity
: Day
:= Monday =C2=A0 =C2=A0 =C2=A027.5 =C2=A0 =C2=A0 =C2=A0 =C2=A014
: Tuesday= =C2=A0 =C2=A0 43.0 =C2=A0 =C2=A0 =C2=A0 =C2=A045
: Wednesday =C2=A0 18.= 0 =C2=A0 =C2=A0 =C2=A0 =C2=A054
: Thursday =C2=A0 =C2=A043.0 =C2=A0 =C2= =A0 =C2=A0 =C2=A083
: Friday =C2=A0 =C2=A0 =C2=A0 8.0 =C2=A0 =C2=A0 =C2= =A0 =C2=A022
: Saturday =C2=A0 =C2=A0 NaN =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<= br>: Sunday =C2=A0 =C2=A0 =C2=A0 NaN =C2=A0 =C2=A0 =C2=A0 =C2=A0 0

[= [file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9= a.html]]
:end:

** Joining tables

Example from https://github.com/t= banel/orgtbljoin

#+name: nutrition
| type =C2=A0 =C2=A0 | Fib= er | Sugar | Protein | Carb |
|----------+-------+-------+---------+----= --|
| eggplant | =C2=A0 2.5 | =C2=A0 3.2 | =C2=A0 =C2=A0 0.8 | =C2=A08.6= |
| tomatoe =C2=A0| =C2=A0 0.6 | =C2=A0 2.1 | =C2=A0 =C2=A0 0.8 | =C2= =A03.4 |
| onion =C2=A0 =C2=A0| =C2=A0 1.3 | =C2=A0 4.4 | =C2=A0 =C2=A0 = 1.3 | =C2=A09.0 |
| egg =C2=A0 =C2=A0 =C2=A0| =C2=A0 =C2=A0 0 | =C2=A018= .3 | =C2=A0 =C2=A031.9 | 18.3 |
| rice =C2=A0 =C2=A0 | =C2=A0 0.2 | =C2= =A0 =C2=A0 0 | =C2=A0 =C2=A0 1.5 | 16.0 |
| bread =C2=A0 =C2=A0| =C2=A0 = 0.7 | =C2=A0 0.7 | =C2=A0 =C2=A0 3.3 | 16.0 |
| orange =C2=A0 | =C2=A0 3= .1 | =C2=A011.9 | =C2=A0 =C2=A0 1.3 | 17.6 |
| banana =C2=A0 | =C2=A0 2.= 1 | =C2=A0 9.9 | =C2=A0 =C2=A0 0.9 | 18.5 |
| tofu =C2=A0 =C2=A0 | =C2= =A0 0.7 | =C2=A0 0.5 | =C2=A0 =C2=A0 6.6 | =C2=A01.4 |
| nut =C2=A0 =C2= =A0 =C2=A0| =C2=A0 2.6 | =C2=A0 1.3 | =C2=A0 =C2=A0 4.9 | =C2=A07.2 |
| = corn =C2=A0 =C2=A0 | =C2=A0 4.7 | =C2=A0 1.8 | =C2=A0 =C2=A0 2.8 | 21.3 |

#+name: recipe
| type =C2=A0 =C2=A0 | quty |
|----------+--= ----|
| onion =C2=A0 =C2=A0| =C2=A0 70 |
| tomatoe =C2=A0| =C2=A0120 = |
| eggplant | =C2=A0300 |
| tofu =C2=A0 =C2=A0 | =C2=A0100 |

=
#+BEGIN_SRC ipython :var nut=3Dnutrition recipe=3Drecipe :colnames nonutrition =3D pd.DataFrame(nut[1:], columns=3Dnut[0])
rec =3D pd.DataF= rame(recipe[1:], columns=3Drecipe[0])

pd.merge(rec, nutrition, on=3D= 'type')
#+END_SRC

#+RESULTS:
:results:
# Out [4]:# text/plain
: =C2=A0 =C2=A0 =C2=A0 =C2=A0type =C2=A0quty =C2=A0Fiber = =C2=A0Sugar =C2=A0Protein =C2=A0Carb
: 0 =C2=A0 =C2=A0 onion =C2=A0 =C2= =A070 =C2=A0 =C2=A01.3 =C2=A0 =C2=A04.4 =C2=A0 =C2=A0 =C2=A01.3 =C2=A0 9.0<= br>: 1 =C2=A0 tomatoe =C2=A0 120 =C2=A0 =C2=A00.6 =C2=A0 =C2=A02.1 =C2=A0 = =C2=A0 =C2=A00.8 =C2=A0 3.4
: 2 =C2=A0eggplant =C2=A0 300 =C2=A0 =C2=A02= .5 =C2=A0 =C2=A03.2 =C2=A0 =C2=A0 =C2=A00.8 =C2=A0 8.6
: 3 =C2=A0 =C2=A0= =C2=A0tofu =C2=A0 100 =C2=A0 =C2=A00.7 =C2=A0 =C2=A00.5 =C2=A0 =C2=A0 =C2= =A06.6 =C2=A0 1.4
:end:
<= div dir=3D"ltr">

John<= br>
-----------------------------------
Professor John Kitchin=C2=A0<= br>Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mel= lon University
Pittsburgh, PA 15213
412-268-7803


On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com>= ; wrote:

Greg Minshall <m= inshall@umich.edu> writes:

> John,
>
>> Is there a state of the art in using org-tables as little database= s
>> with joins and stuff?
>
> i have to admit i do all that with an R code source block.=C2=A0 (the = dplyr
> package has the relevant joins, e.g. dplyr::inner_join().)=C2=A0 and, = in R,
> ":colnames yes" as a header argument gives you header lines = on results.
> (maybe that's ?now? for "all" languages?)
>

For really complex joins and ad hoc queries, I would do similar or put
the data into sqlite. For more simple ones, I just define a table which
uses table formulas to extract the values from the other tables - the
downside being the tables need to have the same data ordering or the
formulas need to be somewhat complex. Provided the tables have the same
number of records in the same order, table formulas are usually fairly
easy.

I did think about writing some elisp functions to use in my table
formulas to make things easier, but then decided I was just re-inventing and well defined database solution and figured when I need it, just use
sqlite. However, it has been a while since I needed this level of
complexity, so perhaps things have moved on and there are better ways
now.

--
Tim Cross

--00000000000037330405bbdb1cb5--