Re: best practices query: non-emacs packages based on tangled source

From: Tom Gillespie <tgbugs@gmail.com>
To: Greg Minshall <minshall@umich.edu>
Cc: emacs-orgmode <emacs-orgmode@gnu.org>, TEC <tecosaur@gmail.com>
Subject: Re: best practices query: non-emacs packages based on tangled source
Date: Sun, 18 Oct 2020 02:01:20 -0400
Message-ID: <CA+G3_PNo8i8U9rOrC8BBydjEv9=LKgtJBopwX9J6vyNXdnjtXg@mail.gmail.com>
In-Reply-To: <87h7qup5zs.fsf@gmail.com>

Hi Greg,

Great question. This came out a bit longer than I anticipated since I
wrote up a couple of relevant workflows. Sync between org source
blocks and tangled code is something that I think needs improvement. I
have covered the difference in semantics between tangled code and
babel evaluated code (along with some other factors) since it can
have an impact on what workflows you might choose.

Following on Timothy's suggestion, one key thing that I think is
needed is the ability to detangle nested and arbitrary code. Detangle
of code tangled with :comments noweb is not fully implemented. A full
detangling implementation would make it easier and safer to
automatically detangle back to the org source blocks using a
pre-commit hook or similar. Detangling from the org sources is also
something that needs to be implemented for this to work. I think that
a more complete detangling implementation could go a long way toward
making it easier for those who are not used to org to commit to a
project. In the absence of this, I have found that most of my existing
workflows actively avoid keeping tangled code and org sources tracked
in git at the same time unless absolutely necessary, and even with git
there to back me up I have shot myself in the foot tangling over files
that I forgot to detangle.
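
One way to guard against tangling over a file that was edited by hand
and never detangled is to record a checksum of each tangled file right
after tangling and to compare before the next tangle. The following is
only a minimal sketch of that idea, not something Org provides: the
manifest file and the function names ~record_tangle~ and
~edited_since_tangle~ are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_tangle(manifest, tangled_files):
    """Call right after tangling: remember what each tangled file looked like.

    `manifest` is a hypothetical JSON file kept next to the org source.
    """
    digests = {str(p): file_digest(p) for p in tangled_files}
    Path(manifest).write_text(json.dumps(digests, indent=2))


def edited_since_tangle(manifest):
    """Return the tangled files whose contents differ from the recorded state.

    A non-empty result means retangling would overwrite manual edits.
    Files that no longer exist are ignored, since there is nothing to lose.
    """
    digests = json.loads(Path(manifest).read_text())
    return [
        p for p, digest in digests.items()
        if Path(p).exists() and file_digest(p) != digest
    ]
```

A pre-commit hook or a wrapper around the tangle step could refuse to
continue whenever ~edited_since_tangle~ returns a non-empty list.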

Below are a few examples. In all cases I have had to consciously work
around the issue of having tangled code that is outside the source of
truth that is the org file.

As an additional note before the examples, I have found that the
trade-offs when tangling code also depend on the language you are using. For
example, I consider some languages, such as Python, to be obligate
tanglers since their semantics conflate modules and files. Org babel
might be able to work around this in some cases, but it would mean,
for example, that ob-python would have to explicitly compensate for
this deficiency by implementing the ability to treat source blocks as
modules to be loaded into a session or somehow pulled in during the
prologue by pre-parsing blocks to look for import statements, etc. The
deficiencies of a language mean that if you want certain functionality
for that language then org babel can't just treat the code as text,
and might have to go to great lengths to try to keep the semantics of
babel evaluation and of tangled code aligned.

The workflow that I have found to be the most reasonable I developed
while working on an elisp project (it is public but I'm not quite
ready to link it on this list). In this workflow I add a hook via
~(add-hook 'before-save-hook #'org-babel-tangle nil t)~ for any org
file that should be tangled, that way I don't have to worry about
whether I remember to tangle etc. However, there are a couple of
issues. It does not work in reverse, you still always have to edit the
org blocks. It will become annoyingly slow if you have many blocks to
tangle. You really want it to run only when the source blocks change,
not whenever the org files changes. Finally, The exact semantics of
tangling multiple blocks to the same file can have a major impact on
performance. So even this best case is not the greatest and doesn't
enable your specific use case (detangle issues in particular are a
show stopper). Since this is an elisp project I have to run tests on
the tangled file in a separate instance of Emacs to ensure that it
works as expected.
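
To illustrate what "run only when the source blocks change" could
mean, here is a rough sketch that hashes just the source blocks of an
org file and compares the result before and after an edit. It is not
how ~org-babel-tangle~ works, the function names are hypothetical, and
it deliberately ignores anything outside the blocks that can also
affect tangling (for example ~#+PROPERTY~ lines or property drawers
that set header arguments).

```python
import hashlib
import re

# Matches a whole source block: the #+begin_src line (with its header
# arguments), the body, and the closing #+end_src line.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src\b.*?^[ \t]*#\+end_src[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def src_block_digest(org_text):
    """Hash only the source blocks found in the text of an org file."""
    h = hashlib.sha256()
    for match in SRC_BLOCK.finditer(org_text):
        h.update(match.group(0).encode("utf-8"))
        h.update(b"\0")  # separator so block boundaries affect the hash
    return h.hexdigest()


def needs_retangle(old_text, new_text):
    """Return True if any source block differs between the two versions."""
    return src_block_digest(old_text) != src_block_digest(new_text)
```

With this, an edit that only touches prose leaves the digest unchanged,
while an edit to a block body or its header line changes it.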

In this project I also have a completely unreadable file that is valid
and executable in 3 languages. Tangled blocks that are easily readable
in the org sources are commented out intentionally in the tangled
file. This is a worst case for detangling. I suspect that it can be
done, but it will push the detangling implementation to the limit. At
the moment, there is no way to detangle this file back to readable
form at all, and it is not clear that anyone should try to edit the
tangled file in the first place. All this to say, if we reason from
this extreme example, maybe the best thing is to tangle at the last
possible moment, never keep the tangled form under version control
etc. Unfortunately the use case for this file is to bootstrap Emacs,
which means that in order for it to be useful it _must_ be tangled and
put under version control since the systems it needs to run on don't
have Emacs.

The worst experience I have had was when I was developing python code
where I needed to capture the output of the block in order to populate
tables. Over time the code grew to the point where it needed to become
a library. This is where Python being an obligate tangler reared its
head, and the differences in semantics between tangled and evaluated
code became a major pain. On top of this, my testing workflows in
Python essentially require me to edit the tangled code if I am not to
lose my mind, but I would then forget to detangle and sometimes
overwrite my edits on retangle. I was quite unhappy.

As a result I eventually gave up and moved all python development out
of the org file except for the few critical parts that were needed to
produce the tables. There were simply too many steps between modifying
a file and being able to test changes (my time writing elisp and
common lisp has massively reduced my tolerance for this kind of
thing). The risk of forgetting whether I had tangled, or needed to
tangle or detangle, went to zero. I was much more productive and could do sane
things like safely import the python code into other modules etc.

Even if you automatically tangle code to the file system on save, you
still have to be able to use it from the org file. In obligate
tanglers like Python this means that you must figure out how to do
something like setting ~PYTHONPATH~ so that org babel can find
it. There weren't good ways to do this inside a single org file, and
adding a random path to your .bashrc for each one of these would be a
nightmare, not to mention that it completely defeats the purpose of
using org to simplify documentation of code (this is one use case for
the elisp project discussed above).
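
For comparison, a per-session alternative to a global ~PYTHONPATH~
entry is to extend ~sys.path~ from inside the Python session itself.
This is only a sketch under the assumption that a block calling it is
evaluated before any block that imports the tangled modules; the helper
name ~use_tangled~ is hypothetical.

```python
import importlib
import sys
from pathlib import Path


def use_tangled(tangle_dir):
    """Make modules tangled into tangle_dir importable in this session."""
    path = str(Path(tangle_dir).resolve())
    if path not in sys.path:
        # Put the tangle directory first so it wins over installed copies.
        sys.path.insert(0, path)
    # Let the import system notice files that were tangled after startup.
    importlib.invalidate_caches()
    return path
```

This keeps the path local to one session, but it still requires the
files to have been tangled to disk first.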

A slightly better experience with Python is one where I have an
existing code base with a single module containing most of the
implementation. I then wrote a developer guide as an org mode file and
I tangle that code to a submodule.
https://github.com/SciCrunch/sparc-curation/blob/master/docs/developer-guide.org#datasets

This was not nearly as bad as the other python project because I wrote
each source block as if it were its own complete file and module. This
severely limited the style that I could use and recombination and
reuse within the org file is difficult (as noted). I still have to
tangle everything before I test, and I have to (if I have not already)
add the tangled files to .gitignore so that other developers cannot
accidentally edit them (runnable documentation is cool, except when
people don't read it and start modifying just the runnable part). To
compensate for this I now have a build time dependency on Emacs (major
WTF right there ya?) that all the python packaging tools know nothing
about, just so that there is only a single source of truth for the
python code.
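
A build-time tangle step of this kind can be driven from Python by
calling Emacs in batch mode. The sketch below is an assumption about
how such a step might look, not the build setup of the project linked
above: the function names are hypothetical, and it assumes an ~emacs~
executable with Org is on the path.

```python
import subprocess


def tangle_command(org_file, emacs="emacs"):
    """Build a batch-mode Emacs command line that tangles org_file."""
    # Escape backslashes and double quotes for the elisp string literal.
    quoted = org_file.replace("\\", "\\\\").replace('"', '\\"')
    return [
        emacs, "--batch",
        "-l", "org",
        "--eval", f'(org-babel-tangle-file "{quoted}")',
    ]


def tangle(org_file, emacs="emacs"):
    """Run the tangle step and raise CalledProcessError if Emacs fails."""
    subprocess.run(tangle_command(org_file, emacs), check=True)
```

Calling ~tangle("docs/developer-guide.org")~ from a build script would
then regenerate the ignored files before packaging.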

Now, you would think that I could use the source block header
arguments with the modularized example code to run the code via org
babel directly, but it is not really possible because when tangled I
import code from other source blocks as a module, but in org babel
that means those files would still have to be tangled, otherwise the
python import system could not find the code. Maybe ob-python could be
enhanced to dynamically load other source blocks as modules?  I'm sure
that other languages have similar issues.
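
On the Python side, loading a block's text as a module without a file
on disk is possible in principle. The following sketch shows the
mechanism only; it is not something ob-python does, and the function
name ~load_block_as_module~ and the module name in the usage note are
hypothetical.

```python
import sys
import types


def load_block_as_module(name, source):
    """Execute source text as a module and register it under name.

    After this call a plain `import name` in the same session finds the
    module in sys.modules, so no tangled file is needed.
    """
    module = types.ModuleType(name)
    module.__file__ = f"<org-block:{name}>"
    exec(compile(source, module.__file__, "exec"), module.__dict__)
    # Register only after the code ran without raising.
    sys.modules[name] = module
    return module
```

For example, after
~load_block_as_module("helpers", "def double(x):\n    return 2 * x\n")~
a later block in the same session could run ~import helpers~ and call
~helpers.double(21)~.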

Just to be a bit less harsh on Python, I have had similar issues
developing code in bash that needed to be sourced in order for the
functions to be available for use in a shell. Sometimes I would find
myself accidentally editing the tangled source and forget to detangle,
or was unable to detangle because I was using :comments noweb. As
mentioned above, I think this is the single largest issue preventing
sane workflows for keeping tangled files and org sources in sync.

Here is another example of how the quality and experience of the
workflow depend on the language you are working in. I started a project
(https://github.com/tgbugs/git-share) in common lisp that includes
other languages such as sql, bash, and elisp. I wanted to write
everything in a single org file. In this case I have been able to
develop two separate workflows. For production release I tangle all
the files and then run ~save-lisp-and-die~ in sbcl. For development I
have a workflow where everything is set up and runs via slime and all
modifications can be made and run via org babel directly. For this
project I also explicitly never commit the tangled code to git.

This winds up being less of an issue for this project compared to the
others because dumping the files to disk is only needed to create the
production build (and might not even be required for that). One
disadvantage of this approach (which also applies to a pure elisp
babel approach) is that there aren't concrete source files so you
can't use ~xref-find-definitions~ or ~slime-edit-definition~ to jump
to a definition. I imagine that this is something that could be fixed
though, so that the source location for definitions could point to
lines in an org file.

This kind of split setup is really only possible in languages where
the semantics for an org babel session are the same as the semantics
when tangled (common lisp and elisp being two examples). As mentioned,
in Python this is virtually impossible because the semantics of the
babel session and the semantics of a tangled file that start from the
same block(s) are radically different. This is understandable due to
the fact that the CL community put an enormous amount of effort into
making sure that compiled code and interpreted code, top level and
nested code had semantics that were as close to each other as possible
(and it shows).

Best!
Tom
