Re: LLM Experiments, Part 1: Corrections

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: "João Távora" <joaotavora@gmail.com>
To: Andrew Hyatt <ahyatt@gmail.com>
Cc: emacs-devel@gnu.org, sskostyaev@gmail.com
Subject: Re: LLM Experiments, Part 1: Corrections
Date: Tue, 23 Jan 2024 01:36:28 +0000	[thread overview]
Message-ID: <CALDnm51D8hFRG7wqDr5pdfBeA9o9YntFW7nLONHdr-DV_b6eSg@mail.gmail.com> (raw)
In-Reply-To: <m2il3mj961.fsf@gmail.com>

On Mon, Jan 22, 2024 at 4:16 AM Andrew Hyatt <ahyatt@gmail.com> wrote:
>
>
> Hi everyone,

Hi Andrew,

I have some ideas to share, though keep in mind this is mainly
thinking out loud and I'm largely an LLM newbie.

> Question 1: Does the llm-flows.el file really belong in the llm
> package?

Maybe, but keep the functions isolated.  I'd be interested in
a diff-mode flow which is different from this ediff-one you
demo.  So it should be possible to build both.

The diff-mode flow I'm thinking of would be similar to the
diff option of LSP-proposed edits if your code, btw.  See the
variable eglot-confirm-server-edits for an idea of the interface.

> Question 3: How should we deal with context? The code that has the
> text corrector doesn't include surrounding context (the text
> before and after the text to rewrite), but it usually is helpful.
> How much context should we add?

Karthik of gptel.el explained to me that this is one of
the biggest challenges of working with LLMs, and that GitHub
Copilot and other code-assistance tools work by sending
not only the region you're interested in having the LLM help you
with but also some auxiliary functions and context discovered
heuristically.  This is potentially complex, and likely doesn't
belong in the your base llm.el but it should be possible to do
somehow with an application build on top of llm.el (Karthik
suggests tree-sitter or LSP's reference finding abilities to
discover what's nearest in terms of context).

In case noone mentinoed this already, i think a good logging
facility is essential.  This could go in the base llm.el library.
I'm obviously biased towards my own jsonrpc.el logging facilities,
where a separate easy-to-find buffer for each JSON-RPC connection
lists all the JSON transport-level conversation details in a
consistent format.  jsonrpc.el clients can also use those logging
facilities to output application-level details.

In an LLM library, I suppose the equivalent to JSON transport-level
details are the specific API calls to each provider, how it gathers
context, prompts, etc.  Those would be distinct for each LLM.
A provider-agnosntic application built on top of llm.el's abstraction
could log in a much more consistent way.

So my main point regarding logging is that is should live in a
readable log buffer, so it's easy to piece together what happened
and debug.  Representing JSON as pretty-printed plists is often
very practical in my experience (though a bit slow if loads of text
is to be printed).

Maybe these logging transcripts could even be used to produce
automated tests, in case there's a way to achieve any kind of
determinism with LLMs (not sure if there is).

Similarly to logging, it would be good to have some kind
of visual feedback of what context is being sent in each
LLM request.  Like momentarily highlighting the regions
to be sent alongside the prompt.  Sometimes that is
not feasible. So it could make sense to summarize that extra
context in a few lines shown in the minibuffer perhaps.  Like
"lines 2..10 from foo.cpp\nlines42-420 from bar.cpp"

So just my 200c,
Good luck,
João

next prev parent reply	other threads:[~2024-01-23  1:36 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <m2il3mj961.fsf@gmail.com>
2024-01-22 18:50 ` LLM Experiments, Part 1: Corrections Sergey Kostyaev
2024-01-22 20:31   ` Andrew Hyatt
2024-01-22 22:06     ` T.V Raman
2024-01-23  0:52       ` Andrew Hyatt
2024-01-23  1:57         ` T.V Raman
2024-01-23  3:00         ` Emanuel Berg
2024-01-23  3:49           ` Andrew Hyatt
2024-01-23  1:36 ` João Távora [this message]
2024-01-23  4:17   ` T.V Raman
2024-01-23 19:19   ` Andrew Hyatt
2024-01-24  1:26 ` contact
2024-01-24  4:17   ` T.V Raman
2024-01-24 15:00     ` Andrew Hyatt
2024-01-24 15:14       ` T.V Raman
2024-01-24 14:55   ` Andrew Hyatt
2024-01-24  2:28 ` Karthik Chikmagalur
2024-05-20 17:28 ` Juri Linkov
2024-01-22 12:57 Psionic K
2024-01-22 20:21 ` Andrew Hyatt
2024-01-23  6:49   ` Psionic K
2024-01-23 15:19     ` T.V Raman
2024-01-23 19:36     ` Andrew Hyatt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALDnm51D8hFRG7wqDr5pdfBeA9o9YntFW7nLONHdr-DV_b6eSg@mail.gmail.com \
    --to=joaotavora@gmail.com \
    --cc=ahyatt@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=sskostyaev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).