From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id EO5uOLBtM2ByKAAA0tVLHw (envelope-from ) for ; Mon, 22 Feb 2021 08:39:12 +0000 Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id YGdANLBtM2B3fAAAB5/wlQ (envelope-from ) for ; Mon, 22 Feb 2021 08:39:12 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 1A7D8213F8 for ; Mon, 22 Feb 2021 09:39:12 +0100 (CET) Received: from localhost ([::1]:53920 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lE6kJ-0001vI-0S for larch@yhetil.org; Mon, 22 Feb 2021 03:39:11 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:51902) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lE6gG-0004BJ-PT for emacs-orgmode@gnu.org; Mon, 22 Feb 2021 03:35:00 -0500 Received: from psi-seppmail1.ethz.ch ([129.132.93.141]:26870) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lE6gD-0001XK-4I for emacs-orgmode@gnu.org; Mon, 22 Feb 2021 03:35:00 -0500 Received: from mailg210.ethz.ch (mailg210.ethz.ch [129.132.198.194]) by psi-seppmail1.ethz.ch (Postfix) with ESMTPS; Mon, 22 Feb 2021 09:27:51 +0100 (CET) Received: from mailm116.d.ethz.ch (2001:67c:10ec:5602::28) by mailg210.ethz.ch (2001:67c:10ec:5606::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Mon, 22 Feb 2021 09:27:47 +0100 Received: from dflt2w (85.0.184.218) by mailm116.d.ethz.ch (2001:67c:10ec:5602::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Mon, 22 Feb 2021 09:27:51 +0100 References: <529055.1613882450@apollo2.minshall.org> <875z2lgbco.fsf@gmail.com> User-agent: mu4e 1.4.9; emacs 27.1.90 From: Derek Feichtinger To: John Kitchin Subject: Re: state of the art in org-mode tables e.g. join, etc In-Reply-To: Date: Mon, 22 Feb 2021 09:27:50 +0100 Message-ID: <87r1l8tsl5.fsf@psi.ch> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [85.0.184.218] X-ClientProxiedBy: mailm215.d.ethz.ch (2001:67c:10ec:5603::29) To mailm116.d.ethz.ch (2001:67c:10ec:5602::28) X-TM-SNTS-SMTP: 36571FD9C4418D9762AF90E0B100D36E61D96BB2E8AB618191A91DF1AB1816B02000:8 X-ExSBR-RoutingRule: mailg210:psi.ch;mailg210:psi.ch;mailg210:psi.ch; X-SM-outgoing: yes Received-SPF: pass client-ip=129.132.93.141; envelope-from=derek.feichtinger@psi.ch; helo=psi-seppmail1.ethz.ch X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tim Cross , emacs-orgmode@gnu.org Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN X-Migadu-Spam-Score: -2.37 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: 1A7D8213F8 X-Spam-Score: -2.37 X-Migadu-Scanner: scn1.migadu.com X-TUID: 9g4MMInGTwGy Hi John, I invested time some years ago in preparing babel examples, and a lot of the description went into using tables. The most detailed documents I had for elisp and python. In order to be productive, e.g. for producing all kinds of scientific graphs, but also for doing the finances and planning for our scientific computing section I ended up the same as you with mostly going to python and leveraging Pandas. I think all of us end up using ":colnames no" as the most convenient solution. https://github.com/dfeich/org-babel-examples/blob/master/python3/python3-babel.org (especially look at the Pandas section 10) In that file I also tangle a python library "orgbabelhelper" that is available in Conda and PyPi. I mainly use that to work with my tables. Best regards Derek -- Paul Scherrer Institut Dr. Derek Feichtinger Phone: +41 56 310 47 33 Group Head HPC and Emerging Technologies Email: derek.feichtinger@psi.ch Building/Room No. OHSA/D17 Forschungsstrasse 111 CH-5232 Villigen PSI On Sun, Feb 21 2021, John Kitchin wrote: > For fun, here is the sqlite equivalent of the Pandas example using the same > tables as before > > > ** aggregation example > > Examples from https://github.com/tbanel/orgaggregate > > > #+NAME: original > | Day | Color | Level | Quantity | > |-----------+-------+-------+----------| > | Monday | Red | 30 | 11 | > | Monday | Blue | 25 | 3 | > | Tuesday | Red | 51 | 12 | > | Tuesday | Red | 45 | 15 | > | Tuesday | Blue | 33 | 18 | > | Wednesday | Red | 27 | 23 | > | Wednesday | Blue | 12 | 16 | > | Wednesday | Blue | 15 | 15 | > | Thursday | Red | 39 | 24 | > | Thursday | Red | 41 | 29 | > | Thursday | Red | 49 | 30 | > | Friday | Blue | 7 | 5 | > | Friday | Blue | 6 | 8 | > | Friday | Blue | 11 | 9 | > > > #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes > drop table if exists testtable; > create table testtable(Day str, Color str, Level int, Quantity int); > .mode csv testtable > .import $orgtable testtable > select Color, count(*) from testtable group by Color; > #+end_src > > #+RESULTS: > | Color | count(*) | > |-------+----------| > | Blue | 7 | > | Red | 7 | > > ** join example > > Example from https://github.com/tbanel/orgtbljoin > > #+name: nutrition > | type | Fiber | Sugar | Protein | Carb | > |----------+-------+-------+---------+------| > | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | egg | 0 | 18.3 | 31.9 | 18.3 | > | rice | 0.2 | 0 | 1.5 | 16.0 | > | bread | 0.7 | 0.7 | 3.3 | 16.0 | > | orange | 3.1 | 11.9 | 1.3 | 17.6 | > | banana | 2.1 | 9.9 | 0.9 | 18.5 | > | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > | nut | 2.6 | 1.3 | 4.9 | 7.2 | > | corn | 4.7 | 1.8 | 2.8 | 21.3 | > > > #+name: recipe > | type | quty | > |----------+------| > | onion | 70 | > | tomatoe | 120 | > | eggplant | 300 | > | tofu | 100 | > > > #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames > yes > drop table if exists nutrition; > drop table if exists recipe; > create table nutrition(type str, Fiber float, Sugar float, Protein float, > Carb float); > create table recipe(type str, quty int); > > .mode csv nutrition > .import $nut nutrition > > .mode csv recipe > .import $rec recipe > > select * from recipe, nutrition where recipe.type=nutrition.type; > #+end_src > > #+RESULTS: > | type | quty | type | Fiber | Sugar | Protein | Carb | > |----------+------+----------+-------+-------+---------+------| > | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > > > John > > ----------------------------------- > Professor John Kitchin > Doherty Hall A207F > Department of Chemical Engineering > Carnegie Mellon University > Pittsburgh, PA 15213 > 412-268-7803 > @johnkitchin > http://kitchingroup.cheme.cmu.edu > > > > On Sun, Feb 21, 2021 at 10:03 AM John Kitchin > wrote: > >> Thanks Tim and Greg. I had mostly come to the same conclusions that it is >> probably best to outsource this. I worked out some examples from >> the orgtbljoin and orgaggregate packages with Pandas below, in case anyone >> is interested in seeing how it works. A key point is using the ":colnames >> no" header args to get the column names for Pandas. It seems like a pretty >> good approach. >> >> * org-mode tables with Pandas >> ** Aggregating from a table >> >> Examples from https://github.com/tbanel/orgaggregate >> >> >> #+NAME: original >> | Day | Color | Level | Quantity | >> |-----------+-------+-------+----------| >> | Monday | Red | 30 | 11 | >> | Monday | Blue | 25 | 3 | >> | Tuesday | Red | 51 | 12 | >> | Tuesday | Red | 45 | 15 | >> | Tuesday | Blue | 33 | 18 | >> | Wednesday | Red | 27 | 23 | >> | Wednesday | Blue | 12 | 16 | >> | Wednesday | Blue | 15 | 15 | >> | Thursday | Red | 39 | 24 | >> | Thursday | Red | 41 | 29 | >> | Thursday | Red | 49 | 30 | >> | Friday | Blue | 7 | 5 | >> | Friday | Blue | 6 | 8 | >> | Friday | Blue | 11 | 9 | >> >> >> #+BEGIN_SRC ipython :var data=original :colnames no >> import pandas as pd >> >> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size() >> #+END_SRC >> >> #+RESULTS: >> :results: >> # Out [1]: >> # text/plain >> : Color >> : Blue 7 >> : Red 7 >> : dtype: int64 >> :end: >> >> The categorical stuff here is just to get the days sorted the same way as >> the example. It is otherwise not needed. I feel there should be a more >> clever way to do this, but didn't think of it. >> >> #+BEGIN_SRC ipython :var data=original :colnames no >> df = pd.DataFrame(data[1:], columns=data[0]) >> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', >> 'Saturday', 'Sunday'] >> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True) >> >> (df >> .groupby('Day') >> .agg({'Level': 'mean', >> 'Quantity': 'sum'}) >> .sort_values('Day')) >> #+END_SRC >> >> #+RESULTS: >> :results: >> # Out [2]: >> # text/plain >> : Level Quantity >> : Day >> : Monday 27.5 14 >> : Tuesday 43.0 45 >> : Wednesday 18.0 54 >> : Thursday 43.0 83 >> : Friday 8.0 22 >> : Saturday NaN 0 >> : Sunday NaN 0 >> >> >> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]] >> :end: >> >> ** Joining tables >> >> Example from https://github.com/tbanel/orgtbljoin >> >> #+name: nutrition >> | type | Fiber | Sugar | Protein | Carb | >> |----------+-------+-------+---------+------| >> | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | >> | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | >> | onion | 1.3 | 4.4 | 1.3 | 9.0 | >> | egg | 0 | 18.3 | 31.9 | 18.3 | >> | rice | 0.2 | 0 | 1.5 | 16.0 | >> | bread | 0.7 | 0.7 | 3.3 | 16.0 | >> | orange | 3.1 | 11.9 | 1.3 | 17.6 | >> | banana | 2.1 | 9.9 | 0.9 | 18.5 | >> | tofu | 0.7 | 0.5 | 6.6 | 1.4 | >> | nut | 2.6 | 1.3 | 4.9 | 7.2 | >> | corn | 4.7 | 1.8 | 2.8 | 21.3 | >> >> >> #+name: recipe >> | type | quty | >> |----------+------| >> | onion | 70 | >> | tomatoe | 120 | >> | eggplant | 300 | >> | tofu | 100 | >> >> >> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no >> nutrition = pd.DataFrame(nut[1:], columns=nut[0]) >> rec = pd.DataFrame(recipe[1:], columns=recipe[0]) >> >> pd.merge(rec, nutrition, on='type') >> #+END_SRC >> >> #+RESULTS: >> :results: >> # Out [4]: >> # text/plain >> : type quty Fiber Sugar Protein Carb >> : 0 onion 70 1.3 4.4 1.3 9.0 >> : 1 tomatoe 120 0.6 2.1 0.8 3.4 >> : 2 eggplant 300 2.5 3.2 0.8 8.6 >> : 3 tofu 100 0.7 0.5 6.6 1.4 >> :end: >> >> >> John >> >> ----------------------------------- >> Professor John Kitchin >> Doherty Hall A207F >> Department of Chemical Engineering >> Carnegie Mellon University >> Pittsburgh, PA 15213 >> 412-268-7803 >> @johnkitchin >> http://kitchingroup.cheme.cmu.edu >> >> >> >> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross wrote: >> >>> >>> Greg Minshall writes: >>> >>> > John, >>> > >>> >> Is there a state of the art in using org-tables as little databases >>> >> with joins and stuff? >>> > >>> > i have to admit i do all that with an R code source block. (the dplyr >>> > package has the relevant joins, e.g. dplyr::inner_join().) and, in R, >>> > ":colnames yes" as a header argument gives you header lines on results. >>> > (maybe that's ?now? for "all" languages?) >>> > >>> >>> For really complex joins and ad hoc queries, I would do similar or put >>> the data into sqlite. For more simple ones, I just define a table which >>> uses table formulas to extract the values from the other tables - the >>> downside being the tables need to have the same data ordering or the >>> formulas need to be somewhat complex. Provided the tables have the same >>> number of records in the same order, table formulas are usually fairly >>> easy. >>> >>> I did think about writing some elisp functions to use in my table >>> formulas to make things easier, but then decided I was just re-inventing >>> and well defined database solution and figured when I need it, just use >>> sqlite. However, it has been a while since I needed this level of >>> complexity, so perhaps things have moved on and there are better ways >>> now. >>> >>> -- >>> Tim Cross >>> >>>