* LLM Experiments, Part 1: Corrections
@ 2024-01-22 12:57 Psionic K
  2024-01-22 20:21 ` Andrew Hyatt
  0 siblings, 1 reply; 22+ messages in thread
From: Psionic K @ 2024-01-22 12:57 UTC (permalink / raw)
  To: ahyatt, Emacs developers

> I think things have to be synchronous here.

 Snapshot isolation is the best strategy for merging here.  We don't
know what user commands affected the region in question, so using undo
states to merge might need to undo really arbitrary user commands.  To
snapshot isolate, basically you store a copy of the buffer text and
hold two markers where that text was.  You can merge the result if it
arrives on time and then diff the snapshot with the buffer text
between the markers.  If things are too different for a valid merge,
you can give up and drop the results.
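
A rough, untested sketch of what I mean (these names are invented;
nothing here is llm.el API, and a real implementation would diff
rather than demand strict equality):

  ;; Take a snapshot: save the region's text plus two markers bounding it.
  (defun my-snapshot-region (beg end)
    "Return a snapshot of BEG..END as (TEXT BEG-MARKER END-MARKER)."
    (list (buffer-substring-no-properties beg end)
          (copy-marker beg)
          (copy-marker end t)))      ; end marker advances on insertion

  ;; When the LLM result arrives, only merge if the snapshotted region
  ;; is still intact; otherwise give up and drop the result.
  (defun my-merge-result (snapshot new-text)
    "Replace the snapshotted region with NEW-TEXT if it is unchanged."
    (pcase-let ((`(,old-text ,beg ,end) snapshot))
      (if (equal old-text (buffer-substring-no-properties beg end))
          (save-excursion
            (goto-char beg)
            (delete-region beg end)
            (insert new-text))
        (message "Region changed too much; dropping the result"))))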

These days various CRDT (conflict-free replicated data type)
treatments have great insights into dealing with much worse problems
of multiple asynchronous writers, and it's a good place to look.
There is a crdt.el package for some inspiration.

But definitely not synchronous.

As a package author, I would want to treat my LLM like a fancy
process.  I create it, I handle results.  I have a merging strategy
(this is mainly up to the client, not the library), but I don't care
about the asynchronous details and I don't want to be tied to each
call.

> Question 6

A rock solid library that sticks to the domain model is best for
ecosystem growth.  When that doesn't happen, we get four or five 75%
finished packages because every author is having to figure out and
integrate their high level features with so many backends.  If you
want to work on high level things, build a client for your library and
experience both sides.

Every model will have some mostly static configuration, dynamic
arguments that could change all the time but in practice change just a
few times, and then the input data.  The static configuration, if
absolutely necessary, can be updated for one call via dynamic binding.
The dynamic arguments should be abstracted into a "context" object
that the model backend figures out how to translate into a valid API
call.  The input data is an arbitrary bundle of whatever that model
type consumes as input.  The library user will want to get a valid
context of the dynamic arguments from the library, enabling them to
make changes to it in subsequent calls, but they don't really want to
touch it that much.
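
As a hypothetical sketch of the shape I mean (none of these names
exist in llm.el):

  (require 'cl-lib)

  ;; The "context" bundles the mostly-static provider with the dynamic
  ;; arguments; the backend decides how to turn it into a real API call.
  (cl-defstruct my-llm-context
    provider    ; backend object, mostly static configuration
    settings)   ; plist of dynamic arguments, e.g. (:temperature 0.2)

  (cl-defgeneric my-llm-dispatch (provider settings input)
    "Translate SETTINGS and INPUT into a provider-specific call.")

  (defun my-llm-call (context input)
    "Send INPUT using CONTEXT, leaving translation to the backend."
    (my-llm-dispatch (my-llm-context-provider context)
                     (my-llm-context-settings context)
                     input))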

As a package author, I would want to focus on integrating outputs and
piping in inputs.  I don't want to write a UI for tuning the model
parameters.  If the model can ask the user to make adjustments and
just give me a record of their decision I can use later, that would be
fantastic.  I should be able to integrate more closely with backends I
know about but otherwise just call with the provided context and my
inputs.

Providers offer multiple models.  As a library user, it's inconvenient
if I have to go through long incantations to get each context that
represents the capability to make valid calls for the provider.  I
want to initialize once and then use an existing context to pull out
the correct context based on the input or output type I need, and then
make refinements that are specific to a call, such as changing quality
or entropy etc.  Input or output type and settings that tune the call
are two different things.  Settings are mostly provider-specific
argument data that doesn't affect the validity of connecting one model
to another.  Input and output type affect which pipes can be connected
to which other pipes.  This distinction between input or output types
and other arguments becomes important in composition.  I should be able
to connect any string to string model with any other model that
handles strings no matter what the other settings are.
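
To make the composition point concrete, another hypothetical sketch
(invented names again):

  ;; A context records its input and output types; whether two contexts
  ;; can be piped together depends only on those types, never on
  ;; provider-specific settings.
  (defun my-context-for (provider in-type out-type)
    "Return a fresh context for PROVIDER converting IN-TYPE to OUT-TYPE."
    (list :provider provider :in in-type :out out-type :settings nil))

  (defun my-composable-p (a b)
    "Non-nil if context A's output can feed context B's input."
    (eq (plist-get a :out) (plist-get b :in)))

  ;; Per-call refinements (quality, entropy, ...) live in :settings and
  ;; never affect `my-composable-p'.
  (defun my-context-with (context &rest settings)
    "Return a copy of CONTEXT with per-call SETTINGS merged in."
    (plist-put (copy-sequence context) :settings settings))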

Integrating these systems will be more like distributed streaming
programming than feeding inputs to a GPU with tight synchronization
and everything under our watch, although a local model might work that
way inside its own box.  We should treat them like unreliable external
services.  A call to the model is a command.  When I send a command, I
should store how to handle the reply, but I shouldn't couple myself to
it with nested callbacks or async, which we fortunately don't have
anyway.  The call data just goes into a pile.  If the reply shows up
and it matches a call, we handle it.  If things time out, we
dead-letter and drop the record of making a call.  This is a very good
way to get around the limitations of the process as our main
asynchronous primitive for now.  It works for big distributed services
which by their very nature cannot lock each other or share memory.  It
will work for connecting many models to each other.
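
A toy sketch of the "pile of pending calls" idea (invented names,
not llm.el API):

  (require 'subr-x)

  (defvar my-llm-pending (make-hash-table :test #'equal)
    "Map of call id to (HANDLER . SEND-TIME).")

  (defvar my-llm-timeout 20
    "Seconds before an unanswered call is dead-lettered.")

  (defun my-llm-send (id handler send-fn)
    "Record HANDLER for ID, then issue the request with SEND-FN."
    (puthash id (cons handler (float-time)) my-llm-pending)
    (funcall send-fn id))

  (defun my-llm-on-reply (id result)
    "Run the stored handler for ID if the call is still outstanding."
    (when-let* ((entry (gethash id my-llm-pending)))
      (remhash id my-llm-pending)
      (funcall (car entry) result)))

  (defun my-llm-sweep ()
    "Dead-letter calls that timed out; run this from a repeating timer."
    (let (stale)
      (maphash (lambda (id entry)
                 (when (> (- (float-time) (cdr entry)) my-llm-timeout)
                   (push id stale)))
               my-llm-pending)
      (dolist (id stale)
        (remhash id my-llm-pending))))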



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
       [not found] <m2il3mj961.fsf@gmail.com>
@ 2024-01-22 18:50 ` Sergey Kostyaev
  2024-01-22 20:31   ` Andrew Hyatt
  2024-01-23  1:36 ` João Távora
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Sergey Kostyaev @ 2024-01-22 18:50 UTC (permalink / raw)
  To: Andrew Hyatt; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 8358 bytes --]

Hello everyone,
This is a cool idea, and I will definitely use it in ellama. But I have some suggestions:
1. Every prompt should be customizable. And since llm is a low-level library, prompts should be customizable at function call time (to manage custom variables on the caller’s side); otherwise it will be easier to just reimplement this functionality. A sketch of what I mean follows this list.
2. Maybe it would be useful to make corrections another way as well (not instead of the current solution, but together with it): the user presses some keybinding, changes the prompt or other parameters, and redoes the query. Follow-up revision is also useful, so don’t remove it.
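
Here is a rough sketch of the call-time customization I mean; these
names are made up, not real ellama or llm functions:

  ;; Hypothetical sketch: the prompt template is a defcustom on the
  ;; caller's side, and the command accepts an override at call time.
  (defcustom my-correct-prompt
    "Rewrite the following text, fixing grammar only:\n%s"
    "Prompt template for corrections; %s is replaced by the region text."
    :type 'string
    :group 'my-llm)

  (defun my-correct-region (beg end &optional prompt)
    "Correct the region BEG..END; PROMPT overrides `my-correct-prompt'."
    (interactive "r")
    (let ((template (or prompt my-correct-prompt)))
      ;; In real code this would go to the low-level llm call; here we
      ;; only show the filled-in prompt, since this is just a sketch.
      (message "%s" (format template
                            (buffer-substring-no-properties beg end)))))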

About your questions:
1. I think these should be different require calls, but which package they end up in doesn’t matter to me. Do it however you are comfortable.
2. I don’t know the fsm library, but I understand how to manage finite state machines. I would prefer simpler code. If it is readable with this library - ok; if without - also fine.
3. This should have a small default length (256 - 1000 tokens or words or something like that) and be extendable by the caller’s code. It should differ between scenarios; we need maximum flexibility here.
4. 20 seconds of blocked Emacs is way too long. Some big local models are very good, but not very fast; for example, mixtral 8x7b instruct provides great quality but is slow. I prefer not to break the user’s flow by blocking. I think a configurable ability to show the generation stream (or not show it, if the user doesn’t want it) would be perfect.
5. See https://github.com/karthink/gptel as an example of flexibility.
6. Emacs has great explainability. There are ‘M-x’ commands and which-key integration for remembering keybindings faster. And we can add other interfaces (for example, grouping actions by meaning with completing-read).

Best regards,

Sergey Kostyaev


> On 22 Jan 2024, at 11:15, Andrew Hyatt <ahyatt@gmail.com> wrote:
> 
> 
> Hi everyone,
> 
> This email is a demo and a summary of some questions which could use your feedback in the context of using LLMs in Emacs, and specifically the development of the llm GNU ELPA package. If that interests you, read on.
> 
> I'm starting to experiment with what LLMs and Emacs, together, are capable of. I've written the llm package to act as a base layer, allowing communication with various LLMs: servers, local LLMs, free, and nonfree. ellama, also a GNU ELPA package, is showing some interesting functionality - asking about a region, translating a region, adding code, getting a code review, etc.
> 
> My goal is to take that basic approach that ellama is doing (providing useful functionality beyond chat that only the LLM can give), and expand it to a new set of more complicated interactions. Each new interaction is a new demo, and as I write them, I'll continue to develop a library that can support these more complicated experiences. The demos should be interesting, and more importantly, developing them brings up interesting questions that this mailing list may have some opinions on.
> 
> To start, I have a demo showing the user using an LLM to rewrite existing text.
> 
> <rewrite-demo.gif>
> I've created a function that will ask for a rewrite of the current region. The LLM offers a suggestion, which the user can review with ediff, and ask for a revision. This can continue until the user is satisfied, and then the user can accept the rewrite, which will replace the region.
> 
> You can see the version of code in a branch of my llm source here:
> https://raw.githubusercontent.com/ahyatt/llm/flows/llm-flows.el
> 
> And you can see the code that uses it to write the text corrector function here:
> https://gist.githubusercontent.com/ahyatt/63d0302c007223eaf478b84e64bfd2cc/raw/c1b89d001fcbe948cf563d5ee2eeff00976175d4/llm-flows-example.el
> 
> There's a few questions I'm trying to figure out in all these demos, so let me state them and give my current guesses.  These are things I'd love feedback on.
> 
> Question 1: Does the llm-flows.el file really belong in the llm package?  It does help people code against llms, but it expands the scope of the llm package from being just about connecting to different LLMs to offering a higher level layer necessary for these more complicated flows.  I think this probably does make sense, there's no need to have a separate package just for this one part.
> 
> Question 2: What's the best way to write these flows with multiple stages, in which some stages sometimes need to be repeated? It's kind of a state machine when you think about it, and there's a state machine GNU ELPA library already (fsm). I opted to not model it explicitly as a state machine, optimizing instead to just use the most straightforward code possible.
> 
> Question 3: How should we deal with context? The code that has the text corrector doesn't include surrounding context (the text before and after the text to rewrite), but it usually is helpful. How much context should we add? The llm package does know about model token limits, but more tokens add more cost in terms of actual money (per/token billing for services, or just the CPU energy costs for local models). Having it be customizable makes sense to some extent, but users are not expected to have a good sense of how much context to include. My guess is that we should just have a small amount of context that won't be a problem for most models. But there's other questions as well when you think about context generally: How would context work in different modes? What about when context may spread in multiple files? It's a problem that I don't have any good insight into yet.
> 
> Question 4: Should the LLM calls be synchronous? In general, it's not great to block all of Emacs on a sync call to the LLM. On the other hand, the LLM calls are generally fast enough (a few seconds, the current timeout is 20s) that the user isn't going to be accomplishing much while the LLM works, and is likely to get into a state where the workflow is waiting for their input and we have to get them back to a state where they are interacting with the workflow.  Streaming calls are a way that works well for just getting a response from the LLM, but when we have a workflow, the response isn't useful until it is processed (in the demo's case, until it is an input into ediff-buffers).  I think things have to be synchronous here.
> 
> Question 5: Should there be a standard set of user behaviors about editing the prompt? In another demo (one I'll send as a followup), with a universal argument, the user can edit the prompt, minus context and content (in this case the content is the text to correct). Maybe that should always be the case. However, that prompt can be long, perhaps a bit long for the minibuffer. Using a buffer instead seems like it would complicate the flow. Also, if the context and content is embedded in that prompt, they would have to be replaced with some placeholder. I think the prompt should always be editable, we should have some templating system. Perhaps emacs already has some templating system, and one that can pass arguments for number of tokens from context would be nice.
> 
> Question 6: How do we avoid having a ton of very specific functions for all the various ways that LLMs can be used? Besides correcting text, I could have had it expand it, summarize it, translate it, etc. Ellama offers all these things (but without the diff and other workflow-y aspects). I think these are too much for the user to remember. It'd be nice to have one function when the user wants to do something, and we work out what to do in the workflow. But the user shouldn't be developing the prompt themselves; at least at this point, it's kind of hard to just think of everything you need to think of in a good prompt.  They need to be developed, updated, etc. What might be good is a system in which the user chooses what they want to do to a region as a secondary input, kind of like another kind of execute-extended-command.
> 
> These are the issues as I see them now. As I continue to develop demos, and as people in the list give feedback, I'll try to work through them.
> 
> BTW, I plan on continuing these emails, one for every demo, until the questions seem worked out. If this mailing list is not the appropriate place for this, let me know.


[-- Attachment #2: Type: text/html, Size: 8999 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-22 12:57 LLM Experiments, Part 1: Corrections Psionic K
@ 2024-01-22 20:21 ` Andrew Hyatt
  2024-01-23  6:49   ` Psionic K
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-22 20:21 UTC (permalink / raw)
  To: Psionic K; +Cc: Emacs developers

 
On 22 January 2024 21:57, Psionic K <psionik@positron.solutions> 
wrote: 

    > I think things have to be synchronous here. 
     
     Snapshot isolation is the best strategy for merging here.  We 
     don't 
    know what user commands affected the region in question, so 
    using undo states to merge might need to undo really arbitrary 
    user commands.  To snapshot isolate, basically you store a 
    copy of the buffer text and hold two markers where that text 
    was.  You can merge the result if it arrives on time and then 
    diff the snapshot with the buffer text between the markers. 
    If things are too different for a valid merge, you can give up 
    and drop the results.   These days various CRDT (conflict-free 
    replicated data type) treatments have great insights into 
    dealing with much worse problems of multiple asynchronous 
    writers, and it's a good place to look.  There is a crdt.el 
    package for some inspiration.

This is a good tip, thank you.
     
    But definitely not synchronous.  
I think the text changing out from under you is just one problem 
to solve; the other is that we start some llm-powered workflow, 
and then the user is free to do whatever they want for roughly 10 
seconds. It requires both us and the user to do more - we would 
need to communicate to the user that something is awaiting their 
input. Then the user would need to run a command to go back to 
the experience we want to put them in (in this case, an ediff 
session). It's a bit weird, and I think a bit too complicated. 
I'm still leaning toward the synchronous side, but it's worth 
trying out an async solution and seeing just how bad it is.

    As a package author, I would want to treat my LLM like a fancy 
    process.  I create it, I handle results.  I have a merging 
    strategy (this is mainly up to the client, not the library), 
    but I don't care about the asynchronous details and I don't 
    want to be tied to each call.

The LLM library does work like this already.  It has async methods 
that have callbacks.  This is about higher-level functionality.

     
    > Question 6 
     
    A rock solid library that sticks to the domain model is best 
    for ecosystem growth.  When that doesn't happen, we get four 
    or five 75% finished packages because every author is having 
    to figure out and integrate their high level features with so 
    many backends.  If you want to work on high level things, 
    build a client for your library and experience both sides.

Totally agree, and that's what I've started and will continue to 
do with these demos, which inform the development of the llm-flows 
layer I'm building.

     
    Every model will have some mostly static configuration, 
    dynamic arguments that could change all the time but in 
    practice change just a few times, and then the input data. 
    The static configuration, if absolutely necessary, can be 
    updated for one call via dynamic binding.  The dynamic 
    arguments should be abstracted into a "context" object that 
    the model backend figures out how to translate into a valid 
    API call.  The input data is an arbitrary bundle of whatever 
    that model type consumes as input.  The library user will want 
    to get a valid context of the dynamic arguments from the 
    library, enabling them to make changes to it in subsequent 
    calls, but they don't really want to touch it that much.   As 
    a package author, I would want to focus on integrating outputs 
    and piping in inputs.  I don't want to write a UI for tuning 
    the model parameters.  If the model can ask the user to make 
    adjustments and just give me a record of their decision I can 
    use later, that would be fantastic.  I should be able to 
    integrate more closely with backends I know about but 
    otherwise just call with the provided context and my inputs. 

Agreed, such adjustments should be part of a common layer.

 
    Providers offer multiple models.  As a library user, it's 
    inconvenient if I have to go through long incantations to get 
    each context that represents the capability to make valid 
    calls for the provider.  I want to initialize once and then 
    use an existing context to pull out the correct context based 
    on the input or output type I need, and then make refinements 
    that are specific to a call, such as changing quality or 
    entropy etc.  Input or output type and settings that tune the 
    call are two different things.  Settings are mostly 
    provider-specific argument data that doesn't affect the 
    validity of connecting one model to another.  Input and output 
    type affect which pipes can be connected to which other pipes. 
    This distinction between input or output types and other 
    arguments become important in composition.  I should be able 
    to connect any string to string model with any other model 
    that handles strings no matter what the other settings are.  
I think we're on the same page here. Anything for quality tuning 
should be generic, I hope - perhaps a knob on quality to price 
tradeoff that can be used to many things, including understanding 
how much context to provide.  The rest is already generic in the 
llm package.

    Integrating these systems will be more like distributed 
    streaming programming than feeding inputs to a GPU with tight 
    synchronization and everything under our watch, although a 
    local model might work that way inside its own box.  We should 
    treat them like unreliable external services.  A call to the 
    model is a command.  When I send a command, I should store how 
    to handle the reply, but I shouldn't couple myself to it with 
    nested callbacks or async, which we fortunately don't have 
    anyway.  The call data just goes into a pile.  If the reply 
    shows up and it matches a call, we handle it.  If things time 
    out, we dead-letter and drop the record of making a call. 
    This is a very good way to get around the limitations of the 
    process as our main asynchronous primitive for now.  It works 
    for big distributed services which by their very nature cannot 
    lock each other or share memory.  It will work for connecting 
    many models to each other. 

I'm not sure I understand this part. Yes, we can have a system 
that stores callbacks in some hashmap or something, and that's 
better than tying it directly to a specific process. However, 
something must always know when the process is done or has timed 
out, and that something is the process itself. I'm not sure how 
the centralized storage reduces the coupling to the process. But 
if I'm reading this correctly, it seems like an argument for using 
state machines with the centralized storage acting as a driver for 
state changes, which may be a good way to think about this.

Thank you for your thorough and thoughtful response!



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-22 18:50 ` Sergey Kostyaev
@ 2024-01-22 20:31   ` Andrew Hyatt
  2024-01-22 22:06     ` T.V Raman
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-22 20:31 UTC (permalink / raw)
  To: Sergey Kostyaev; +Cc: emacs-devel

 
On 23 January 2024 01:50, Sergey Kostyaev <sskostyaev@gmail.com> 
wrote: 

    Hello everyone, This is cool Idea, I will definitely use it in 
    ellama. But I have some suggestions: 1. Every prompt should be 
    customizable. And since llm is low level library, it should be 
    customizable at function call time (to manage custom variables 
    on caller’s side). Or easier will be to reimplement this 
    functionality.  2. Maybe it will be useful to make corrections 
    other way (not instead of current solution, but together with 
    it): user press some keybinding and change prompt or other 
    parameters and redo query. Follow up revision also useful, so 
    don’t remove it.   About your questions: 1. I think it should 
    be different require calls, but what package it will be - 
    doesn’t matter for me. Do it anyhow you will be comfortable 
    with.  2. I don’t know fsm library, but understand how to 
    manage finite state machines. I would prefer simpler code. If 
    it will be readable with this library - ok, if without - also 
    fine.

Agree to all the above.  Seems worth trying out fsm, but not sure 
how much it will help.
 
    3. This should have small default length (256 - 1000 tokens or 
    words or something like that) and be extendable by caller’s 
    code. This should be different in different scenarios. Need 
    maximum flexibility here.

Agreed, probably a small default length is sufficient - but it 
might be good to have options for maximizing the length.  The 
extensibility here may be tricky to design, but it's important.
 
    4. 20 seconds of blocked Emacs is way too long. Some big local 
    models are very good, but not very fast. For example mixtral 
    8x7b instruct provides great quality, but not very fast. I 
    prefer not break user’s flow by blocking. I think configurable 
    ability to show generation stream (or don’t show if user don’t 
    want it) will be perfect.

How do you see this working in the demo I shared, though? 
Streaming wouldn't help at all, AFAICT. If you don't block, how 
does the user get to the ediff screen? Does it just pop up in the 
middle of whatever they were doing? That seems intrusive. Better 
would be to message the user that they can do something to get 
back into the workflow. Still, at least for me, I'd prefer to just 
wait.  I'm doing something that I'm turning my attention to, so 
even if it takes a while, I want to maintain my focus on that 
task.  At least as long as I don't get bored, but LLMs are fast 
enough that I'm not losing focus here.
 
    5. See https://github.com/karthink/gptel as an example of 
    flexibility.

Agreed, it's a very full system for prompt editing.

    6. Emacs has great explainability. There are ‘M-x’ commands and which-key integration for remembering keybindings faster. And we
    can add other interfaces (for example, grouping actions by meaning with completing-read).
    
    Best regards,
    
    Sergey Kostyaev
    
     On 22 Jan 2024, at 11:15, Andrew Hyatt <ahyatt@gmail.com> wrote:
    
     Hi everyone,
    
     This email is a demo and a summary of some questions which could use your feedback in the context of using LLMs in Emacs, and
     specifically the development of the llm GNU ELPA package. If that interests you, read on.
    
     I'm starting to experiment with what LLMs and Emacs, together, are capable of. I've written the llm package to act as a base layer,
>      allowing communication with various LLMs: servers, local LLMs, free, and nonfree. ellama, also a GNU ELPA package, is
     showing some interesting functionality - asking about a region, translating a region, adding code, getting a code review, etc.
    
     My goal is to take that basic approach that ellama is doing (providing useful functionality beyond chat that only the LLM can give),
     and expand it to a new set of more complicated interactions. Each new interaction is a new demo, and as I write them, I'll continue
     to develop a library that can support these more complicated experiences. The demos should be interesting, and more importantly,
     developing them brings up interesting questions that this mailing list may have some opinions on.
    
>      To start, I have a demo showing the user using an LLM to rewrite existing text.
    
     <rewrite-demo.gif>
     I've created a function that will ask for a rewrite of the current region. The LLM offers a suggestion, which the user can review with
     ediff, and ask for a revision. This can continue until the user is satisfied, and then the user can accept the rewrite, which will replace
     the region.
    
     You can see the version of code in a branch of my llm source here:
     https://raw.githubusercontent.com/ahyatt/llm/flows/llm-flows.el
    
     And you can see the code that uses it to write the text corrector function here:
     https://gist.githubusercontent.com/ahyatt/63d0302c007223eaf478b84e64bfd2cc/raw/c1b89d001fcbe948cf563d5ee2eeff00976175d4/llm-flows-example.el
     
    
     There's a few questions I'm trying to figure out in all these demos, so let me state them and give my current guesses.  These are
     things I'd love feedback on.
    
     Question 1: Does the llm-flows.el file really belong in the llm package?  It does help people code against llms, but it expands the
     scope of the llm package from being just about connecting to different LLMs to offering a higher level layer necessary for these more
     complicated flows.  I think this probably does make sense, there's no need to have a separate package just for this one part.
    
     Question 2: What's the best way to write these flows with multiple stages, in which some stages sometimes need to be repeated? It's
     kind of a state machine when you think about it, and there's a state machine GNU ELPA library already (fsm). I opted to not model
     it explicitly as a state machine, optimizing instead to just use the most straightforward code possible.
    
     Question 3: How should we deal with context? The code that has the text corrector doesn't include surrounding context (the text
     before and after the text to rewrite), but it usually is helpful. How much context should we add? The llm package does know about
     model token limits, but more tokens add more cost in terms of actual money (per/token billing for services, or just the CPU energy
     costs for local models). Having it be customizable makes sense to some extent, but users are not expected to have a good sense of
     how much context to include. My guess is that we should just have a small amount of context that won't be a problem for most
     models. But there's other questions as well when you think about context generally: How would context work in different modes?
     What about when context may spread in multiple files? It's a problem that I don't have any good insight into yet.
    
     Question 4: Should the LLM calls be synchronous? In general, it's not great to block all of Emacs on a sync call to the LLM. On the
     other hand, the LLM calls are generally fast enough (a few seconds, the current timeout is 20s) that the user isn't going to be
     accomplishing much while the LLM works, and is likely to get into a state where the workflow is waiting for their input and we
     have to get them back to a state where they are interacting with the workflow.  Streaming calls are a way that works well for just
     getting a response from the LLM, but when we have a workflow, the response isn't useful until it is processed (in the demo's case,
     until it is an input into ediff-buffers).  I think things have to be synchronous here.
    
     Question 5: Should there be a standard set of user behaviors about editing the prompt? In another demo (one I'll send as a
     followup), with a universal argument, the user can edit the prompt, minus context and content (in this case the content is the text to
     correct). Maybe that should always be the case. However, that prompt can be long, perhaps a bit long for the minibuffer. Using a
     buffer instead seems like it would complicate the flow. Also, if the context and content is embedded in that prompt, they would have
     to be replaced with some placeholder. I think the prompt should always be editable, we should have some templating system.
     Perhaps emacs already has some templating system, and one that can pass arguments for number of tokens from context would be
     nice.
    
     Question 6: How do we avoid having a ton of very specific functions for all the various ways that LLMs can be used? Besides
     correcting text, I could have had it expand it, summarize it, translate it, etc. Ellama offers all these things (but without the diff and
     other workflow-y aspects). I think these are too much for the user to remember. It'd be nice to have one function when the user wants
     to do something, and we work out what to do in the workflow. But the user shouldn't be developing the prompt themselves; at least
     at this point, it's kind of hard to just think of everything you need to think of in a good prompt.  They need to be developed, updated,
     etc. What might be good is a system in which the user chooses what they want to do to a region as a secondary input, kind of like
     another kind of execute-extended-command.
    
     These are the issues as I see them now. As I continue to develop demos, and as people in the list give feedback, I'll try to work
     through them.
    
     BTW, I plan on continuing these emails, one for every demo, until the questions seem worked out. If this mailing list is not the
     appropriate place for this, let me know.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-22 20:31   ` Andrew Hyatt
@ 2024-01-22 22:06     ` T.V Raman
  2024-01-23  0:52       ` Andrew Hyatt
  0 siblings, 1 reply; 22+ messages in thread
From: T.V Raman @ 2024-01-22 22:06 UTC (permalink / raw)
  To: Andrew Hyatt; +Cc: Sergey Kostyaev, emacs-devel

Some more related thoughts below, mostly thinking aloud:
1. From using gptel and ellama against the same model, I see different
   style responses, and that kind of inconsistency would be good to get
   a handle on; LLMs are difficult enough to figure out re what they're
   doing without this additional variation.
2. Package LLM has the laudable goal of bridging between models and
   front-ends, and this is going to be vital.
3. (1,2) above lead to the following question:
4. Can we write down a list of common configuration vars --- here
   common across the model axis. Make it a union of all such params.
5. Next, write down a list of all configurable params on the UI side.
6. When stable, define a single data-structure in elisp that acts as
   the bridge between the front-end emacs UI and the LLM module.
7. Finally, factor out the settings of that structure and make it
   possible to create "profiles" so that one can predictably experiment
   across front-ends and models; a rough sketch follows below.
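
Purely illustrative of what such a profile could look like (not an
existing data structure; model names and keys are only examples):

  ;; A profile bundles model-side and UI-side settings so experiments
  ;; across front-ends and models are reproducible.
  (defvar my-llm-profiles
    '((precise . (:provider openai :model "gpt-4" :temperature 0.2
                  :show-stream nil :confirm-with ediff))
      (chatty  . (:provider ollama :model "mixtral" :temperature 0.8
                  :show-stream t   :confirm-with diff)))
    "Named profiles mixing model parameters and front-end preferences.")

  (defun my-llm-profile (name)
    "Return the settings plist for profile NAME."
    (alist-get name my-llm-profiles))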
-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-22 22:06     ` T.V Raman
@ 2024-01-23  0:52       ` Andrew Hyatt
  2024-01-23  1:57         ` T.V Raman
  2024-01-23  3:00         ` Emanuel Berg
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-23  0:52 UTC (permalink / raw)
  To: T.V Raman; +Cc: Sergey Kostyaev, emacs-devel

 
On 22 January 2024 14:06, "T.V Raman" <raman@google.com> wrote: 

    Some more related thoughts below, mostly thinking aloud: 1. 
    From using gptel and ellama against the same model, I see 
    different 
       style responses, and that kind of inconsistency would be 
       good to get a handle on; LLMs are difficult enough to 
       figure out re what they're doing without this additional 
       variation.

Is this keeping the prompt and temperature constant?  There's 
inconsistency, though, even keeping everything constant, due to 
the randomness of the LLM.  I often get very different results; 
for example, to make the demo I shared, I had to run it about 
five times because it would either do things too well (no need to 
demo corrections) or not well enough (for example, it wouldn't 
follow my orders to put everything in one paragraph).
 
    2. Package LLM has the laudible goal of bridgeing between 
    models and 
       front-ends, and this is going to be vital. 
    3. (1,2) above lead  to the following question: 4. Can we 
    write down  a list of common configuration vars --- here 
      common across the model axis. Make  it a union of all such 
      params.

I think the list of common model-and-prompt configuration is 
already in the llm package, but we will probably need to keep 
expanding it.

 
    5. Next, write down a list of all configurable params on the 
    UI side.

This will change quite a bit depending on the task. It's unclear 
how much should be configurable - for example, in the demo, I use 
ediff so the user can see and evaluate the diff. But maybe that 
should be configurable, so that if the user wants to see just a 
diff output instead, that is allowed? When I was thinking about a 
state machine, I imagined that parts of it might be overridable 
by the user, with "have the user check the results of the 
operation" being a state for which the user can define their own 
function.  I suspect we'll have a better idea of this after a few 
more demos.
 
    6. When stable, define a single data-structure in elisp that 
    acts as 
       the bridge between the front-end emacs UI and the LLM 
       module.

If I understand you correctly, this would be the configuration you 
listed in your points (4) and (5)?

 
    7. Finally factor out  the settings of that structure and make 
    it 
       possible to create "profiles" so that one can predictably 
       experiment across front-ends and models. 

I like this idea, thanks!




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
       [not found] <m2il3mj961.fsf@gmail.com>
  2024-01-22 18:50 ` Sergey Kostyaev
@ 2024-01-23  1:36 ` João Távora
  2024-01-23  4:17   ` T.V Raman
  2024-01-23 19:19   ` Andrew Hyatt
  2024-01-24  1:26 ` contact
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: João Távora @ 2024-01-23  1:36 UTC (permalink / raw)
  To: Andrew Hyatt; +Cc: emacs-devel, sskostyaev

On Mon, Jan 22, 2024 at 4:16 AM Andrew Hyatt <ahyatt@gmail.com> wrote:
>
>
> Hi everyone,

Hi Andrew,

I have some ideas to share, though keep in mind this is mainly
thinking out loud and I'm largely an LLM newbie.

> Question 1: Does the llm-flows.el file really belong in the llm
> package?

Maybe, but keep the functions isolated.  I'd be interested in
a diff-mode flow which is different from the ediff one you
demo.  So it should be possible to build both.

The diff-mode flow I'm thinking of would be similar to the
diff option for LSP-proposed edits to your code, btw.  See the
variable eglot-confirm-server-edits for an idea of the interface.

> Question 3: How should we deal with context? The code that has the
> text corrector doesn't include surrounding context (the text
> before and after the text to rewrite), but it usually is helpful.
> How much context should we add?

Karthik of gptel.el explained to me that this is one of
the biggest challenges of working with LLMs, and that GitHub
Copilot and other code-assistance tools work by sending
not only the region you're interested in having the LLM help you
with but also some auxiliary functions and context discovered
heuristically.  This is potentially complex, and likely doesn't
belong in your base llm.el, but it should be possible to do
somehow with an application built on top of llm.el (Karthik
suggests tree-sitter or LSP's reference finding abilities to
discover what's nearest in terms of context).

In case no one mentioned this already, I think a good logging
facility is essential.  This could go in the base llm.el library.
I'm obviously biased towards my own jsonrpc.el logging facilities,
where a separate easy-to-find buffer for each JSON-RPC connection
lists all the JSON transport-level conversation details in a
consistent format.  jsonrpc.el clients can also use those logging
facilities to output application-level details.

In an LLM library, I suppose the equivalent to JSON transport-level
details are the specific API calls to each provider, how it gathers
context, prompts, etc.  Those would be distinct for each LLM.
A provider-agnostic application built on top of llm.el's abstraction
could log in a much more consistent way.

So my main point regarding logging is that it should live in a
readable log buffer, so it's easy to piece together what happened
and debug.  Representing JSON as pretty-printed plists is often
very practical in my experience (though a bit slow if loads of text
is to be printed).
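
As a throwaway illustration of the pretty-printed-plist idea (the
function name and buffer naming are made up, not jsonrpc.el or
llm.el API):

  ;; Log each request or response as a pretty-printed plist in a
  ;; per-provider buffer, one timestamped entry per event.
  (defun my-llm-log (provider type payload)
    "Append PAYLOAD (a plist) to PROVIDER's log buffer, tagged with TYPE."
    (with-current-buffer
        (get-buffer-create (format "*llm log: %s*" provider))
      (goto-char (point-max))
      (insert (format ";; %s %s\n" (format-time-string "%T") type)
              (pp-to-string payload)
              "\n")))

  ;; e.g. (my-llm-log 'openai 'request '(:model "gpt-4" :prompt "..."))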

Maybe these logging transcripts could even be used to produce
automated tests, in case there's a way to achieve any kind of
determinism with LLMs (not sure if there is).

Similarly to logging, it would be good to have some kind
of visual feedback of what context is being sent in each
LLM request.  Like momentarily highlighting the regions
to be sent alongside the prompt.  Sometimes that is
not feasible. So it could make sense to summarize that extra
context in a few lines shown in the minibuffer perhaps.  Like
"lines 2..10 from foo.cpp\nlines42-420 from bar.cpp"

So just my 200c,
Good luck,
João



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  0:52       ` Andrew Hyatt
@ 2024-01-23  1:57         ` T.V Raman
  2024-01-23  3:00         ` Emanuel Berg
  1 sibling, 0 replies; 22+ messages in thread
From: T.V Raman @ 2024-01-23  1:57 UTC (permalink / raw)
  To: ahyatt; +Cc: raman, sskostyaev, emacs-devel

I'm seeing short responses with gptel that are good; roundabout
responses for the same question with ellama.
-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  0:52       ` Andrew Hyatt
  2024-01-23  1:57         ` T.V Raman
@ 2024-01-23  3:00         ` Emanuel Berg
  2024-01-23  3:49           ` Andrew Hyatt
  1 sibling, 1 reply; 22+ messages in thread
From: Emanuel Berg @ 2024-01-23  3:00 UTC (permalink / raw)
  To: emacs-devel

Andrew Hyatt wrote:

> [...]  1.     From using gptel and ellama against the same
> model, I see     different        style responses, and that
> kind of inconsistency would be        good to get a handle on;
> LLMs are difficult enough to        figure out re what they're
> doing without this additional        variation.
>
> Is this keeping the prompt and temperature constant?  There's
> inconsistency, though, even keeping everything constant due to
> the randomness of the LLM.  I often get very different
> results, for example, to make the demo I shared, I had to run
> it like 5 times because it would either do things too well (no
> need to demo corrections), or not well enough (for example, it
> wouldn't follow my orders to put everything in one paragraph).
>
>    2. Package LLM has the laudible goal of bridgeing between
>    models and        front-ends, and this is going to be
>    vital.     3. (1,2) above lead  to the following question:
>    4. Can we     write down  a list of common configuration
>    vars --- here       common across the model axis. Make  it
>    a union of all such       params. [...]

Uhm, pardon me for asking but why are the e-mails looking
like this?

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  3:00         ` Emanuel Berg
@ 2024-01-23  3:49           ` Andrew Hyatt
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-23  3:49 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1701 bytes --]

Thanks for pointing this out - I was using gnus to respond to email, and
it looks like it messed things up for reasons probably having to do with
quoting.  I don't think I've configured anything strange here, but who
knows.  For now, I'll just use gmail to respond.

On Mon, Jan 22, 2024 at 11:11 PM Emanuel Berg <incal@dataswamp.org> wrote:

> Andrew Hyatt wrote:
>
> > [...]  1.     From using gptel and ellama against the same
> > model, I see     different        style responses, and that
> > kind of inconsistency would be        good to get a handle on;
> > LLMs are difficult enough to        figure out re what they're
> > doing without this additional        variation.
> >
> > Is this keeping the prompt and temperature constant?  There's
> > inconsistency, though, even keeping everything constant due to
> > the randomness of the LLM.  I often get very different
> > results, for example, to make the demo I shared, I had to run
> > it like 5 times because it would either do things too well (no
> > need to demo corrections), or not well enough (for example, it
> > wouldn't follow my orders to put everything in one paragraph).
> >
> >    2. Package LLM has the laudible goal of bridgeing between
> >    models and        front-ends, and this is going to be
> >    vital.     3. (1,2) above lead  to the following question:
> >    4. Can we     write down  a list of common configuration
> >    vars --- here       common across the model axis. Make  it
> >    a union of all such       params. [...]
>
> Uhm, pardon me for asking but why are the e-mails looking
> like this?
>
> --
> underground experts united
> https://dataswamp.org/~incal
>
>
>

[-- Attachment #2: Type: text/html, Size: 2258 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  1:36 ` João Távora
@ 2024-01-23  4:17   ` T.V Raman
  2024-01-23 19:19   ` Andrew Hyatt
  1 sibling, 0 replies; 22+ messages in thread
From: T.V Raman @ 2024-01-23  4:17 UTC (permalink / raw)
  To: João Távora; +Cc: Andrew Hyatt, emacs-devel, sskostyaev

These are good observations.

Since pretty much all the LLM APIs take JSON, logging the JSON is the
most friction-free approach. This consistency will also help us monitor
the LLM traffic to ensure that rogue clients don't leak
context one doesn't want leaked.
-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-22 20:21 ` Andrew Hyatt
@ 2024-01-23  6:49   ` Psionic K
  2024-01-23 15:19     ` T.V Raman
  2024-01-23 19:36     ` Andrew Hyatt
  0 siblings, 2 replies; 22+ messages in thread
From: Psionic K @ 2024-01-23  6:49 UTC (permalink / raw)
  To: Andrew Hyatt; +Cc: Psionic K, Emacs developers

>  I'm not sure how
> the centralized storage reduces the coupling to the process. But
> if I'm reading this correctly, it seems like an argument for using
> state machines with the centralized storage acting as a driver for
> state changes, which may be a good way to think about this.

If you start by having a callback closure waiting on each individual
request, you don't want these callbacks to perform reconciliation
between one another.  It's too much logic in every callback.  A single
handler should reconcile results and decide if the callback closure of
that specific message is still valid to be called.

For example, the user sends two requests, each with a logical
timestamp (a counter).  The second one gets back first.  Should we
even handle the first or throw it away?  Sometimes this can only be
decided with knowledge of all the requests that I made.  Rather than
make the callbacks smart, it's usually better to first try to
reconcile what you receive with everything you sent.  You might need
to wait for more output before sending the next input.  This would
require the callback to know about past and future messages and what
to do with them.  You want to decouple reconciliation from handling
reconciled outputs.
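
A toy sketch of that kind of reconciler (invented names, not any
existing package's API):

  (require 'cl-lib)

  (defvar my-reqs-sent 0
    "Logical timestamp of the latest request sent.")
  (defvar my-reqs-handled 0
    "Highest logical timestamp whose reply has been handled.")

  (defun my-send (callback send-fn)
    "Send a request tagged with the next logical timestamp."
    (let ((stamp (cl-incf my-reqs-sent)))
      (funcall send-fn stamp
               (lambda (result) (my-reconcile stamp callback result)))))

  (defun my-reconcile (stamp callback result)
    "Run CALLBACK only if STAMP is newer than anything handled so far."
    (if (> stamp my-reqs-handled)
        (progn (setq my-reqs-handled stamp)
               (funcall callback result))
      (message "Dropping stale reply %d" stamp)))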



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  6:49   ` Psionic K
@ 2024-01-23 15:19     ` T.V Raman
  2024-01-23 19:36     ` Andrew Hyatt
  1 sibling, 0 replies; 22+ messages in thread
From: T.V Raman @ 2024-01-23 15:19 UTC (permalink / raw)
  To: Psionic K; +Cc: Andrew Hyatt, Emacs developers

See how mpv (music player) and empv (emacs front end) do this.
-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  1:36 ` João Távora
  2024-01-23  4:17   ` T.V Raman
@ 2024-01-23 19:19   ` Andrew Hyatt
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-23 19:19 UTC (permalink / raw)
  To: João Távora; +Cc: emacs-devel, sskostyaev

[-- Attachment #1: Type: text/plain, Size: 4823 bytes --]

On Mon, Jan 22, 2024 at 9:36 PM João Távora <joaotavora@gmail.com> wrote:

> On Mon, Jan 22, 2024 at 4:16 AM Andrew Hyatt <ahyatt@gmail.com> wrote:
> >
> >
> > Hi everyone,
>
> Hi Andrew,
>
> I have some ideas to share, though keep in mind this is mainly
> thinking out loud and I'm largely an LLM newbie.
>
> > Question 1: Does the llm-flows.el file really belong in the llm
> > package?
>
> Maybe, but keep the functions isolated.  I'd be interested in
> a diff-mode flow which is different from this ediff-one you
> demo.  So it should be possible to build both.
>
> The diff-mode flow I'm thinking of would be similar to the
> diff option for LSP-proposed edits to your code, btw.  See the
> variable eglot-confirm-server-edits for an idea of the interface.
>

Great call-out, thanks.  I'll check it out.  I'm starting to get a better
idea of how this might all work out in my mind.


>
> > Question 3: How should we deal with context? The code that has the
> > text corrector doesn't include surrounding context (the text
> > before and after the text to rewrite), but it usually is helpful.
> > How much context should we add?
>
> Karthik of gptel.el explained to me that this is one of
> the biggest challenges of working with LLMs, and that GitHub
> Copilot and other code-assistance tools work by sending
> not only the region you're interested in having the LLM help you
> with but also some auxiliary functions and context discovered
> heuristically.  This is potentially complex, and likely doesn't
> belong in the your base llm.el but it should be possible to do
> somehow with an application build on top of llm.el (Karthik
> suggests tree-sitter or LSP's reference finding abilities to
> discover what's nearest in terms of context).
>

Interesting idea - yes, this should be customizable, and it will be quite
complicated in some cases.


>
> In case no one mentioned this already, I think a good logging
> facility is essential.  This could go in the base llm.el library.
> I'm obviously biased towards my own jsonrpc.el logging facilities,
> where a separate easy-to-find buffer for each JSON-RPC connection
> lists all the JSON transport-level conversation details in a
> consistent format.  jsonrpc.el clients can also use those logging
> facilities to output application-level details.
>
> In an LLM library, I suppose the equivalent to JSON transport-level
> details are the specific API calls to each provider, how it gathers
> context, prompts, etc.  Those would be distinct for each LLM.
> A provider-agnostic application built on top of llm.el's abstraction
> could log in a much more consistent way.
>
> So my main point regarding logging is that it should live in a
> readable log buffer, so it's easy to piece together what happened
> and debug.  Representing JSON as pretty-printed plists is often
> very practical in my experience (though a bit slow if loads of text
> is to be printed).
>

Good feedback, mainly for what I've already released in the llm package so
far.  JSON is useful for the initial request, but there's a lot of
streaming that happens, which isn't really valid JSON as a whole, although
it sometimes contains valid JSON, or is just JSON streamed a chunk at a
time.  So in general you have to deal with that stuff as plain text.

So far, setting url-debug to non-nil is sufficient for basic debugging, but
making a more standard and better logging facility would be very nice. I'll
work on it.


>
> Maybe these logging transcripts could even be used to produce
> automated tests, in case there's a way to achieve any kind of
> determinism with LLMs (not sure if there is).
>

Probably not, unfortunately; even if you can remove the randomness, little
changes are always happening with newer versions of the model or the
processing around it.  I do have a fake LLM included that can be used to
test whatever flow, though.


>
> Similarly to logging, it would be good to have some kind
> of visual feedback of what context is being sent in each
> LLM request.  Like momentarily highlighting the regions
> to be sent alongside the prompt.  Sometimes that is
> not feasible. So it could make sense to summarize that extra
> context in a few lines shown in the minibuffer perhaps.  Like
> "lines 2..10 from foo.cpp\nlines42-420 from bar.cpp"
>

I like the idea of logging the context.  I think it might make sense to
just add that to the debug buffer instead of the minibuffer, though.
Hopefully things just work, and so the minibuffer would just show something
like it does in the demo, just saying it's sending things to whatever LLM
(perhaps a reference to the debug buffer would be nice, though).


>
> So just my 200c,
> Good luck,
> João
>

[-- Attachment #2: Type: text/html, Size: 6244 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-23  6:49   ` Psionic K
  2024-01-23 15:19     ` T.V Raman
@ 2024-01-23 19:36     ` Andrew Hyatt
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-23 19:36 UTC (permalink / raw)
  To: Psionic K; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

On Tue, Jan 23, 2024 at 2:49 AM Psionic K <psionik@positron.solutions>
wrote:

> >  I'm not sure how
> > the centralized storage reduces the coupling to the process. But
> > if I'm reading this correctly, it seems like an argument for using
> > state machines with the centralized storage acting as a driver for
> > state changes, which may be a good way to think about this.
>
> If you start by having a callback closure waiting on each individual
> request, you don't want these callbacks to perform reconciliation
> between one another.  It's too much logic in every callback.  A single
> handler should reconcile results and decide if the callback closure of
> that specific message is still valid to be called.
>
> For example, the user sends two requests, each with a logical
> timestamp (a counter).  The second one gets back first.  Should we
> even handle the first or throw it away?  Sometimes this can only be
> decided with knowledge of all the requests that I made.  Rather than
> make the callbacks smart, it's usually better to first try to
> reconcile what you receive with everything you sent.  You might need
> to wait for more output before sending the next input.  This would
> require the callback to know about past and future messages and what
> to do with them.  You want to decouple reconciliation from handling
> reconciled outputs.
>

That makes sense, but it isn't how things work today, where the requests
are independent.  I think they should stay that way, but in case they
don't, your suggestion seems like a good one, so thank you!

[-- Attachment #2: Type: text/html, Size: 1928 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
       [not found] <m2il3mj961.fsf@gmail.com>
  2024-01-22 18:50 ` Sergey Kostyaev
  2024-01-23  1:36 ` João Távora
@ 2024-01-24  1:26 ` contact
  2024-01-24  4:17   ` T.V Raman
  2024-01-24 14:55   ` Andrew Hyatt
  2024-01-24  2:28 ` Karthik Chikmagalur
  2024-05-20 17:28 ` Juri Linkov
  4 siblings, 2 replies; 22+ messages in thread
From: contact @ 2024-01-24  1:26 UTC (permalink / raw)
  To: Andrew Hyatt, emacs-devel; +Cc: sskostyaev

Hi Andrew,

Having worked on similar problems in gptel for about nine months now
(without much success), here are some thoughts.

> Question 1: Does the llm-flows.el file really belong in the llm 
> package?  It does help people code against llms, but it expands 
> the scope of the llm package from being just about connecting to 
> different LLMs to offering a higher level layer necessary for 
> these more complicated flows.  I think this probably does make 
> sense, there's no need to have a separate package just for this 
> one part.

I think llm-flows works better as an independent package for now,
since it's not clear yet what the "primitive" operations of working
with LLMs look like.  I suspect these usage patterns will also be a
matter of preference for users, as I've realized from comparing how I
use gptel to the feature requests I get.

> Question 2: What's the best way to write these flows with multiple 
> stages, in which some stages sometimes need to be repeated? It's 
> kind of a state machine when you think about it, and there's a 
> state machine GNU ELPA library already (fsm). I opted to not model 
> it explicitly as a state machine, optimizing instead to just use 
> the most straightforward code possible.

An FSM might be overkill here.  At the same time, I'm not sure that
all possible interactions will fit this multi-step paradigm like
rewriting text does.

> Question 3: How should we deal with context? The code that has the 
> text corrector doesn't include surrounding context (the text 
> before and after the text to rewrite), but it usually is helpful. 
> How much context should we add? The llm package does know about 
> model token limits, but more tokens add more cost in terms of 
> actual money (per/token billing for services, or just the CPU 
> energy costs for local models). Having it be customizable makes 
> sense to some extent, but users are not expected to have a good 
> sense of how much context to include. My guess is that we should 
> just have a small amount of context that won't be a problem for 
> most models. But there's other questions as well when you think 
> about context generally: How would context work in different 
> modes? What about when context may spread in multiple files? It's 
> a problem that I don't have any good insight into yet.

I see different questions here:

1.  What should be the default amount of context included with
requests?
2.  How should this context be determined? (single buffer, across
files etc)
3.  How should this be different between modes of usage, and how
should this be communicated unambiguously?
4.  Should we track token costs (when applicable) and communicate them
to the user?

Some lessons from gptel, which focuses mostly on a chat interface:

1.  Users seem to understand gptel's model intuitively since they
think of it like a chat application, where the context is expected to
be everything in the buffer up to the cursor position.  The only
addition is to use the region contents instead when the region is
active.  This default works well for more than chat, actually.  It's
good enough when rewriting-in-place or for continuing your prose/code.

2.  This is tricky, I don't have any workable ideas yet.  In gptel
I've experimented with providing the options "full buffer" and "open
project buffers" in addition to the default, but these are both
overkill, expensive and rarely useful -- they often confuse the LLM
more than they help.  Additionally, in Org mode documents I've
experimented with using sparse trees as the context -- this is
inexpensive and can work very well but the document has to be
(semantically) structured a certain way.  This becomes obvious after a
couple of sessions, but the behavior has to be learned nevertheless.

3a.  For coding projects I think it might be possible to construct a
"sparse tree" with LSP or via treesitter, and send (essentially) an
"API reference" along with smaller chunks of code.  This should make
basic copilot-style usage viable.  I don't use LSP or treesitter
seriously, so I don't know how to do this.

3b. Communicating this unambiguously to users is a UI design question,
and I can imagine many ways to do it.

4.  I think optionally showing the cumulative token count for a
"session" (however defined) makes sense.

> Question 5: Should there be a standard set of user behaviors about 
> editing the prompt? In another demo (one I'll send as a followup), 
> with a universal argument, the user can edit the prompt, minus 
> context and content (in this case the content is the text to 
> correct). Maybe that should always be the case. However, that 
> prompt can be long, perhaps a bit long for the minibuffer. Using a 
> buffer instead seems like it would complicate the flow. Also, if 
> the context and content is embedded in that prompt, they would 
> have to be replaced with some placeholder. I think the prompt 
> should always be editable, we should have some templating system. 
> Perhaps emacs already has some templating system, and one that can 
> pass arguments for number of tokens from context would be nice.

Another unsolved problem in gptel right now.  Here's what it uses
currently:

- prompt: from the minibuffer
- context and content: selected region only

The main problem with including context separate from the content here
is actually not the UI, it's convincing the LLM to consistently
rewrite only the content and use the context as context.  Using the
prompt+context as the "system message" works, but not all LLM APIs
provide a "system message" field.

> Question 6: How do we avoid having a ton of very specific 
> functions for all the various ways that LLMs can be used? Besides 
> correcting text, I could have had it expand it, summarize it, 
> translate it, etc. Ellama offers all these things (but without the 
> diff and other workflow-y aspects). I think these are too much for 
> the user to remember.

Yes, this was the original reason I wrote gptel -- the first few
packages for LLM interaction (only GPT-3.5 back then) wanted to
prescribe the means of interaction via dedicated commands, which I
thought overwhelmed the user while also missing what makes LLMs
different from (say) language checkers like proselint and vale, and
from code refactoring tools.

> It'd be nice to have one function when the 
> user wants to do something, and we work out what to do in the 
> workflow. But the user shouldn't be developing the prompt 
> themselves; at least at this point, it's kind of hard to just 
> think of everything you need to think of in a good prompt.  They 
> need to be developed, updated, etc. What might be good is a system 
> in which the user chooses what they want to do to a region as a 
> secondary input, kind of like another kind of 
> execute-extended-command.

I think having users type out their intention in natural language into
a prompt is fine -- the prompt can then be saved and added to a
persistent collection.  We will never be able to cover
(programmatically) even a reasonable fraction of the things the user
might want to do.

The things the user might need help with are what I'd call "prompt
decoration".  There are standard things you can specify in a prompt to
change the brevity and tone of a response.  LLMs tend to generate
purple prose, summarize their responses, apologize or warn
excessively, etc.  We can offer a mechanism to quick-add templates to
the prompt to stem these behaviors, or encourage other ones.
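
As a sketch of what such quick-add decoration could look like (the
names are hypothetical, not an existing gptel feature):

(defvar my-llm-prompt-decorations
  '(("terse" . "Be terse.  Do not summarize or restate the request.")
    ("no-apologies" . "Do not apologize or add warnings.")
    ("plain-prose" . "Use plain, direct prose; avoid flowery language."))
  "Alist of named prompt decorations.  Illustrative only.")

(defun my-llm-decorate-prompt (prompt)
  "Append user-chosen entries from `my-llm-prompt-decorations' to PROMPT."
  (let ((choices (completing-read-multiple
                  "Decorations: " my-llm-prompt-decorations)))
    (concat prompt "\n"
            (mapconcat (lambda (name)
                         (cdr (assoc name my-llm-prompt-decorations)))
                       choices
                       "\n"))))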

Karthik



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
       [not found] <m2il3mj961.fsf@gmail.com>
                   ` (2 preceding siblings ...)
  2024-01-24  1:26 ` contact
@ 2024-01-24  2:28 ` Karthik Chikmagalur
  2024-05-20 17:28 ` Juri Linkov
  4 siblings, 0 replies; 22+ messages in thread
From: Karthik Chikmagalur @ 2024-01-24  2:28 UTC (permalink / raw)
  To: Andrew Hyatt, emacs-devel; +Cc: sskostyaev

Hi Andrew,

Having worked on similar problems in gptel for about nine months now
(without much success), here are some thoughts.

> Question 1: Does the llm-flows.el file really belong in the llm 
> package?  It does help people code against llms, but it expands 
> the scope of the llm package from being just about connecting to 
> different LLMs to offering a higher level layer necessary for 
> these more complicated flows.  I think this probably does make 
> sense, there's no need to have a separate package just for this 
> one part.

I think llm-flows works better as an independent package for now,
since it's not clear yet what the "primitive" operations of working
with LLMs look like.  I suspect these usage patterns will also be a
matter of preference for users, as I've realized from comparing how I
use gptel to the feature requests I get.

> Question 2: What's the best way to write these flows with multiple 
> stages, in which some stages sometimes need to be repeated? It's 
> kind of a state machine when you think about it, and there's a 
> state machine GNU ELPA library already (fsm). I opted to not model 
> it explicitly as a state machine, optimizing instead to just use 
> the most straightforward code possible.

An FSM might be overkill here.  At the same time, I'm not sure that
all possible interactions will fit this multi-step paradigm like
rewriting text does.

> Question 3: How should we deal with context? The code that has the 
> text corrector doesn't include surrounding context (the text 
> before and after the text to rewrite), but it usually is helpful. 
> How much context should we add? The llm package does know about 
> model token limits, but more tokens add more cost in terms of 
> actual money (per/token billing for services, or just the CPU 
> energy costs for local models). Having it be customizable makes 
> sense to some extent, but users are not expected to have a good 
> sense of how much context to include. My guess is that we should 
> just have a small amount of context that won't be a problem for 
> most models. But there's other questions as well when you think 
> about context generally: How would context work in different 
> modes? What about when context may spread in multiple files? It's 
> a problem that I don't have any good insight into yet.

I see different questions here:

1.  What should be the default amount of context included with
requests?
2.  How should this context be determined? (single buffer, across
files etc)
3.  How should this be different between modes of usage, and how
should this be communicated unambiguously?
4.  Should we track token costs (when applicable) and communicate them
to the user?

Some lessons from gptel, which focuses mostly on a chat interface:

1.  Users seem to understand gptel's model intuitively since they
think of it like a chat application, where the context is expected to
be everything in the buffer up to the cursor position.  The only
addition is to use the region contents instead when the region is
active.  This default works well for more than chat, actually.  It's
good enough when rewriting-in-place or for continuing your prose/code.

2.  This is tricky, I don't have any workable ideas yet.  In gptel
I've experimented with providing the options "full buffer" and "open
project buffers" in addition to the default, but these are both
overkill, expensive and rarely useful -- they often confuse the LLM
more than they help.  Additionally, in Org mode documents I've
experimented with using sparse trees as the context.  This is
inexpensive and can work very well but the document has to be
(semantically) structured a certain way.  This becomes obvious after a
couple of sessions, but the behavior has to be learned nevertheless.

3a.  For coding projects I think it might be possible to construct a
"sparse tree" with LSP or via treesitter, and send (essentially) an
"API reference" along with smaller chunks of code.  This should make
copilot-style usage viable.  I don't use LSP or treesitter seriously,
so I don't know how to do this.

3b. Communicating this unambiguously to users is a UI design question,
and I can imagine many ways to do it.

4.  I think optionally showing the cumulative token count for a
"session" (however defined) makes sense.

> Question 5: Should there be a standard set of user behaviors about 
> editing the prompt? In another demo (one I'll send as a followup), 
> with a universal argument, the user can edit the prompt, minus 
> context and content (in this case the content is the text to 
> correct). Maybe that should always be the case. However, that 
> prompt can be long, perhaps a bit long for the minibuffer. Using a 
> buffer instead seems like it would complicate the flow. Also, if 
> the context and content is embedded in that prompt, they would 
> have to be replaced with some placeholder. I think the prompt 
> should always be editable, we should have some templating system. 
> Perhaps emacs already has some templating system, and one that can 
> pass arguments for number of tokens from context would be nice.

Another unsolved problem in gptel right now.  Here's what it uses
currently:

- prompt: from the minibuffer
- context and content: selected region only

The main problem with including context separate from the content here
is actually not the UI, it's convincing the LLM to consistently
rewrite only the content and use the context as context.

> Question 6: How do we avoid having a ton of very specific 
> functions for all the various ways that LLMs can be used? Besides 
> correcting text, I could have had it expand it, summarize it, 
> translate it, etc. Ellama offers all these things (but without the 
> diff and other workflow-y aspects). I think these are too much for 
> the user to remember.

Yes, this was the original reason I wrote gptel -- the first few
packages for LLM interaction (only GPT-3.5 back then) wanted to
prescribe the means of interaction via dedicated commands, which I
thought overwhelmed the user while also missing what makes LLMs
different from (say) language checkers like proselint and vale, and
from code refactoring tools.

> It'd be nice to have one function when the 
> user wants to do something, and we work out what to do in the 
> workflow. But the user shouldn't be developing the prompt 
> themselves; at least at this point, it's kind of hard to just 
> think of everything you need to think of in a good prompt.  They 
> need to be developed, updated, etc. What might be good is a system 
> in which the user chooses what they want to do to a region as a 
> secondary input, kind of like another kind of 
> execute-extended-command.

I think having users type out their intention in natural language into
a prompt is fine -- the prompt can then be saved and added to a
persistent collection.  We will never be able to cover
(programmatically) even a reasonable fraction of the things the user
might want to do.

The things the user might need help with are what I'd call "prompt
decoration".  There are standard things you can specify in a prompt to
change the brevity and tone of a response.  LLMs tend to generate
purple prose, summarize their responses, apologize or warn
excessively, etc.  We can offer a mechanism to quick-add templates to
the prompt to stem these behaviors, or encourage other ones.

Karthik



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-24  1:26 ` contact
@ 2024-01-24  4:17   ` T.V Raman
  2024-01-24 15:00     ` Andrew Hyatt
  2024-01-24 14:55   ` Andrew Hyatt
  1 sibling, 1 reply; 22+ messages in thread
From: T.V Raman @ 2024-01-24  4:17 UTC (permalink / raw)
  To: contact; +Cc: Andrew Hyatt, emacs-devel, sskostyaev

All very good points, Karthik!

Some related thoughts below:

1. I think we should for now treat  prose-rewriting vs code-rewriting as
   separate flows -- but that said, limit our types of "flows"
   to 2. More might emerge over time, but it's too early.
2. Multi-step flows with LLMs are still early -- or feel early to me; I
   think that for now, we should just have human-in-the-loop at each
   step, but then leverage the power of Emacs to help the user stay
   efficient in the human-in-the-loop step, starting with simple things
   like putting point and mark in the right place, populating Emacs
   completions with the right choices, etc.
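
For example, the human-in-the-loop step could be as small as offering
the model's candidate rewrites through the normal completion machinery
(a sketch with a made-up name, nothing more):

(defun my-llm-choose-rewrite (candidates)
  "Let the user pick one of CANDIDATES (strings returned by the model)
or type something else entirely.  Illustrative sketch only."
  (completing-read "Use rewrite: " candidates))

The region could then be replaced with the chosen string, leaving point
and mark around the replacement so the user can keep editing.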
   
-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-24  1:26 ` contact
  2024-01-24  4:17   ` T.V Raman
@ 2024-01-24 14:55   ` Andrew Hyatt
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-24 14:55 UTC (permalink / raw)
  To: contact; +Cc: emacs-devel, sskostyaev


Thanks for this super useful response. BTW, I'm going to try again to
use Gnus to respond, after making some changes, so apologies if the
formatting goes awry. If it does, I'll re-respond in Gmail.

On Tue, Jan 23, 2024 at 05:26 PM contact@karthinks.com wrote:

> Hi Andrew,
>
> Having worked on similar problems in gptel for about nine months now
> (without much success), here are some thoughts.
>
>> Question 1: Does the llm-flows.el file really belong in the llm 
>> package?  It does help people code against llms, but it expands 
>> the scope of the llm package from being just about connecting to 
>> different LLMs to offering a higher level layer necessary for 
>> these more complicated flows.  I think this probably does make 
>> sense, there's no need to have a separate package just for this 
>> one part.
>
> I think llm-flows works better as an independent package for now,
> since it's not clear yet what the "primitive" operations of working
> with LLMs look like.  I suspect these usage patterns will also be a
> matter of preference for users, as I've realized from comparing how I
> use gptel to the feature requests I get.

I agree that the more we assume specific patterns (which indeed seems to
be where things are going), the more it should go into its own package
(or maybe be part of ellama or something).  But there may be some
commonalities that are just useful regardless of the usage patterns.  I
think we'll have to see how this plays out.

>> Question 2: What's the best way to write these flows with multiple 
>> stages, in which some stages sometimes need to be repeated? It's 
>> kind of a state machine when you think about it, and there's a 
>> state machine GNU ELPA library already (fsm). I opted to not model 
>> it explicitly as a state machine, optimizing instead to just use 
>> the most straightforward code possible.
>
> An FSM might be overkill here.  At the same time, I'm not sure that
> all possible interactions will fit this multi-step paradigm like
> rewriting text does.
>
>> Question 3: How should we deal with context? The code that has the 
>> text corrector doesn't include surrounding context (the text 
>> before and after the text to rewrite), but it usually is helpful. 
>> How much context should we add? The llm package does know about 
>> model token limits, but more tokens add more cost in terms of 
>> actual money (per/token billing for services, or just the CPU 
>> energy costs for local models). Having it be customizable makes 
>> sense to some extent, but users are not expected to have a good 
>> sense of how much context to include. My guess is that we should 
>> just have a small amount of context that won't be a problem for 
>> most models. But there's other questions as well when you think 
>> about context generally: How would context work in different 
>> modes? What about when context may spread in multiple files? It's 
>> a problem that I don't have any good insight into yet.
>
> I see different questions here:
>
> 1.  What should be the default amount of context included with
> requests?
> 2.  How should this context be determined? (single buffer, across
> files etc)
> 3.  How should this be different between modes of usage, and how
> should this be communicated unambiguously?
> 4.  Should we track token costs (when applicable) and communicate them
> to the user?
>
> Some lessons from gptel, which focuses mostly on a chat interface:
>
> 1.  Users seem to understand gptel's model intuitively since they
> think of it like a chat application, where the context is expected to
> be everything in the buffer up to the cursor position.  The only
> addition is to use the region contents instead when the region is
> active.  This default works well for more than chat, actually.  It's
> good enough when rewriting-in-place or for continuing your prose/code.

I think it may still be useful to use the context even when modifying
the region, though.

>
> 2.  This is tricky, I don't have any workable ideas yet.  In gptel
> I've experimented with providing the options "full buffer" and "open
> project buffers" in addition to the default, but these are both
> overkill, expensive and rarely useful -- they often confuse the LLM
> more than they help.  Additionally, in Org mode documents I've
> experimented with using sparse trees as the context -- this is
> inexpensive and can work very well but the document has to be
> (semantically) structured a certain way.  This becomes obvious after a
> couple of sessions, but the behavior has to be learned nevertheless.
>
> 3a.  For coding projects I think it might be possible to construct a
> "sparse tree" with LSP or via treesitter, and send (essentially) an
> "API reference" along with smaller chunks of code.  This should make
> basic copilot-style usage viable.  I don't use LSP or treesitter
> seriously, so I don't know how to do this.
>
> 3b. Communicating this unambiguously to users is a UI design question,
> and I can imagine many ways to do it.

Thanks, this is very useful.  My next demo is with org-mode, and there
I'm currently sending the tree (just the structure) as context.
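
Roughly, the structure I send looks like what the sketch below
collects -- headline stars plus headline text, nothing else.  The name
is made up and this is only an approximation of the idea, not the
demo's actual code; it assumes a reasonably recent Org.

(require 'org)

(defun my-llm-org-outline ()
  "Return the current Org buffer's headline skeleton as a string.
Illustrative sketch only."
  (mapconcat
   (lambda (entry)
     (concat (make-string (car entry) ?*) " " (cadr entry)))
   (org-map-entries
    (lambda () (list (org-current-level) (org-get-heading t t t t))))
   "\n"))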

There's a meta-question here, which is that we don't know what the
right thing to do is, because we don't have a quality process. If we
were doing this seriously, we'd want a corpus of examples to test
with, and a way to judge quality (LLMs can actually do this as well).
The fact that people could be running any model is a significant
complicating factor.  But as it is, we just have our own anecdotal
evidence and reasonable
hunches.  As these things get better, I think it would converge to what
we would think is reasonable context anyway.
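
Even a small harness would help.  As a sketch (everything here is
hypothetical -- REWRITE-FN stands in for whatever flow is being
tested, and judging the collected outputs is left to a person or to
another model):

(defvar my-llm-eval-corpus
  '(("I has went to the store yesterday" . "correct the grammar")
    ("Their going to announce the results tomorrow" . "correct the grammar"))
  "Hypothetical (input . task) pairs for spot-checking a flow.")

(defun my-llm-run-eval (rewrite-fn)
  "Run REWRITE-FN over `my-llm-eval-corpus' and collect (input . output) pairs.
REWRITE-FN takes an input string and a task string and returns the
model's rewrite.  The results still need to be judged separately."
  (mapcar (lambda (case)
            (cons (car case)
                  (funcall rewrite-fn (car case) (cdr case))))
          my-llm-eval-corpus))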

>
> 4.  I think optionally showing the cumulative token count for a
> "session" (however defined) makes sense.

The token count should only be applicable to the current LLM call. My
understanding is that the machinery around the LLMs is adding the
previous conversation, in whole, or in summary, as context. So it's
really hard to understand the actual token usage once you start having
conversations. That's fine, though, since most tokens are used in the
first message where context appears, and subsequent rounds are fairly
light.

>
>> Question 5: Should there be a standard set of user behaviors about 
>> editing the prompt? In another demo (one I'll send as a followup), 
>> with a universal argument, the user can edit the prompt, minus 
>> context and content (in this case the content is the text to 
>> correct). Maybe that should always be the case. However, that 
>> prompt can be long, perhaps a bit long for the minibuffer. Using a 
>> buffer instead seems like it would complicate the flow. Also, if 
>> the context and content is embedded in that prompt, they would 
>> have to be replaced with some placeholder. I think the prompt 
>> should always be editable, we should have some templating system. 
>> Perhaps emacs already has some templating system, and one that can 
>> pass arguments for number of tokens from context would be nice.
>
> Another unsolved problem in gptel right now.  Here's what it uses
> currently:
>
> - prompt: from the minibuffer
> - context and content: selected region only
>
> The main problem with including context separate from the content here
> is actually not the UI, it's convincing the LLM to consistently
> rewrite only the content and use the context as context.  Using the
> prompt+context as the "system message" works, but not all LLM APIs
> provide a "system message" field.

Yes, the LLM library already separates the "system message" (which we
call context, which I now realize is a bit confusing), and each provider
just deals with it in the best way possible.  Hopefully as LLM
instruction following gets better, it will stop treating context as
anything other than context.  In the end, IIUC, it all ends up in the
same place when feeding text to the LLM anyway.

>
>> Question 6: How do we avoid having a ton of very specific 
>> functions for all the various ways that LLMs can be used? Besides 
>> correcting text, I could have had it expand it, summarize it, 
>> translate it, etc. Ellama offers all these things (but without the 
>> diff and other workflow-y aspects). I think these are too much for 
>> the user to remember.
>
> Yes, this was the original reason I wrote gptel -- the first few
> packages for LLM interaction (only GPT-3.5 back then) wanted to
> prescribe the means of interaction via dedicated commands, which I
> thought overwhelmed the user while also missing what makes LLMs
> different from (say) language checkers like proselint and vale, and
> from code refactoring tools.
>
>> It'd be nice to have one function when the 
>> user wants to do something, and we work out what to do in the 
>> workflow. But the user shouldn't be developing the prompt 
>> themselves; at least at this point, it's kind of hard to just 
>> think of everything you need to think of in a good prompt.  They 
>> need to be developed, updated, etc. What might be good is a system 
>> in which the user chooses what they want to do to a region as a 
>> secondary input, kind of like another kind of 
>> execute-extended-command.
>
> I think having users type out their intention in natural language into
> a prompt is fine -- the prompt can then be saved and added to a
> persistent collection.  We will never be able to cover
> (programmatically) even a reasonable fraction of the things the user
> might want to do.

Agreed. Increasingly, it seems like a really advanced prompt management
system might be necessary.  How to do that is its own separate set of
questions.  How should they be stored?  Are these variables?  Files?
Data in sqlite?
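
One low-tech possibility, sketched with made-up names: a directory of
plain files, which keeps prompts greppable and easy to version.

(defvar my-llm-prompt-directory (expand-file-name "~/.emacs.d/llm-prompts/")
  "Directory holding named prompts, one file per prompt.  Illustrative only.")

(defun my-llm-read-prompt (name)
  "Return the stored prompt NAME as a string."
  (with-temp-buffer
    (insert-file-contents (expand-file-name name my-llm-prompt-directory))
    (buffer-string)))

(defun my-llm-save-prompt (name text)
  "Save TEXT as the prompt NAME."
  (make-directory my-llm-prompt-directory t)
  (with-temp-file (expand-file-name name my-llm-prompt-directory)
    (insert text)))

Files, variables, and sqlite could all work; files have the advantage
that users can edit them without any special UI.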

>
> The things the user might need help with are what I'd call "prompt
> decoration".  There are standard things you can specify in a prompt to
> change the brevity and tone of a response.  LLMs tend to generate
> purple prose, summarize their responses, apologize or warn
> excessively, etc.  We can offer a mechanism to quick-add templates to
> the prompt to stem these behaviors, or encourage other ones.

Agreed.  I have a prompting system in my ekg project, and it allows
transcluding other prompts, which is quite useful.  So you might want to
just include information about your health, or your project, or even
more dynamic things like the date, org agenda for the day, or whatever.
This is powerful and a good thing to include in this future prompt
management system.
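
This is not ekg's actual mechanism, but the general idea can be
sketched as expanding placeholders in a template, where a placeholder
names either a stored snippet or a dynamic value such as the date (all
names here are invented):

(defvar my-llm-prompt-snippets
  '(("project" . "This is part of my Emacs configuration project.")
    ("style" . "Prefer short sentences and concrete examples."))
  "Named snippets that can be transcluded into prompts.  Illustrative only.")

(defun my-llm-expand-prompt (template)
  "Expand {{name}} placeholders in TEMPLATE.
\"date\" expands to today's date; other names are looked up in
`my-llm-prompt-snippets' and left untouched if unknown."
  (replace-regexp-in-string
   "{{\\([^}]+\\)}}"
   (lambda (match)
     (let ((name (match-string 1 match)))
       (if (string= name "date")
           (format-time-string "%Y-%m-%d")
         (or (cdr (assoc name my-llm-prompt-snippets)) match))))
   template t t))

;; Example: (my-llm-expand-prompt "Today is {{date}}.  {{style}}")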

>
> Karthik



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-24  4:17   ` T.V Raman
@ 2024-01-24 15:00     ` Andrew Hyatt
  2024-01-24 15:14       ` T.V Raman
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Hyatt @ 2024-01-24 15:00 UTC (permalink / raw)
  To: T.V Raman; +Cc: contact, emacs-devel, sskostyaev

On Tue, Jan 23, 2024 at 08:17 PM "T.V Raman" <raman@google.com> wrote:

> All very good points, Karthik!
>
> Some related thoughts below:
>
> 1. I think we should for now treat  prose-rewriting vs code-rewriting as
>    separate flows -- but that said, limit our types of "flows"
>    to 2. More might emerge over time, but it's too early.

How do you see the code and prose rewriting requiring different UI or processing?

> 2. Multi-step flows with LLMs are still early -- or feel early to me; I
>    think that for now, we should just have human-in-the-loop at each
>    step, but then leverage the power of Emacs to help the user stay
>    efficient in the human-in-the-loop step, start with simple things
>    like putting point and mark in the right place, populate Emacs
>    completions with the right choices etc.

It can't be at every step, though. Maybe you wouldn't consider this a
step, but in my next demo, one step is to get JSON from the LLM, which
requires parsing out the JSON (which tends to be either the entire
response, or often in a markdown block, or if none of the above, we
retry a certain number of times). But agreed that in general we do
want humans to be in control, especially when things get complicated.
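
For what it's worth, that step looks roughly like the sketch below.
The names are made up and REQUEST-FN stands in for whatever function
actually performs the call and returns the model's text; this is not
the demo's real code.

(defun my-llm-extract-json (response)
  "Try to parse RESPONSE as JSON, unwrapping a ```json block if present.
Return the parsed value as an alist, or nil on failure.  Illustrative only."
  (let ((text (if (string-match
                   "```\\(?:json\\)?\n\\(\\(?:.\\|\n\\)*?\\)```" response)
                  (match-string 1 response)
                response)))
    (ignore-errors (json-parse-string text :object-type 'alist))))

(defun my-llm-json-with-retries (request-fn &optional retries)
  "Call REQUEST-FN until its response parses as JSON.
Give up after RETRIES attempts (default 3) and return nil in that case."
  (let ((attempts (or retries 3))
        result)
    (while (and (> attempts 0)
                (not (setq result (my-llm-extract-json (funcall request-fn)))))
      (setq attempts (1- attempts)))
    result))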



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
  2024-01-24 15:00     ` Andrew Hyatt
@ 2024-01-24 15:14       ` T.V Raman
  0 siblings, 0 replies; 22+ messages in thread
From: T.V Raman @ 2024-01-24 15:14 UTC (permalink / raw)
  To: ahyatt; +Cc: raman, contact, emacs-devel, sskostyaev

1. Code rewrites and prose rewrites just feel very different to me --
   starting with simple things like white-space formatting, etc.
2. Code rewrites therefore require a different type of mental activity
   -- side-by-side diff, whereas prose rewrites are more about whether
   the meaning has been preserved -- and that is not conveyed as
   directly by whitespace.
3. You're likely right about JSON parsing and follow-on steps being
   "atomic" actions in some sense from the perspective of using AI as
   a tool, but I still feel it is too early to connect too many steps
   into one because it happens to work sometimes at present; it'll
   likely both get better and change, so we might end up abstracting
   early and perhaps erroneously at this stage.  So if you do
   pool/group steps -- keep that an implementation detail.
   

Andrew Hyatt writes:
 > On Tue, Jan 23, 2024 at 08:17 PM "T.V Raman" <raman@google.com> wrote:
 > 
 > > All very good points, Karthik!
 > >
 > > Some related thoughts below:
 > >
 > > 1. I think we should for now treat  prose-rewriting vs code-rewriting as
 > >    separate flows -- but that said, limit our types of "flows"
 > >    to 2. More might emerge over time, but it's too early.
 > 
 > How do you see the code and prose rewriting requiring different UI or processing?
 > 
 > > 2. Multi-step flows with LLMs are still early -- or feel early to me; I
 > >    think that for now, we should just have human-in-the-loop at each
 > >    step, but then leverage the power of Emacs to help the user stay
 > >    efficient in the human-in-the-loop step, start with simple things
 > >    like putting point and mark in the right place, populate Emacs
 > >    completions with the right choices etc.
 > 
 > It can't be at every step, though. Maybe you wouldn't consider this a
 > step, but in my next demo, one step is to get JSON from the LLM, which
 > requires parsing out the JSON (which tends to be either the entire
 > response, or often in a markdown block, or if none of the above, we
 > retry a certain amount of times). But agreed that in general that we do
 > want humans to be in control, especially when things get complicated.

-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: LLM Experiments, Part 1: Corrections
       [not found] <m2il3mj961.fsf@gmail.com>
                   ` (3 preceding siblings ...)
  2024-01-24  2:28 ` Karthik Chikmagalur
@ 2024-05-20 17:28 ` Juri Linkov
  4 siblings, 0 replies; 22+ messages in thread
From: Juri Linkov @ 2024-05-20 17:28 UTC (permalink / raw)
  To: Andrew Hyatt; +Cc: emacs-devel

> Question 3: How should we deal with context? The code that has the text
> corrector doesn't include surrounding context (the text before and after
> the text to rewrite), but it usually is helpful. How much context should we
> add? The llm package does know about model token limits, but more tokens
> add more cost in terms of actual money (per/token billing for services, or
> just the CPU energy costs for local models). Having it be customizable
> makes sense to some extent, but users are not expected to have a good sense
> of how much context to include. My guess is that we should just have
> a small amount of context that won't be a problem for most models. But
> there's other questions as well when you think about context generally: How
> would context work in different modes? What about when context may spread
> in multiple files? It's a problem that I don't have any good insight into
> yet.

I suppose you are already familiar with different methods
of obtaining the relevant context used by copilot:
https://github.blog/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code/
https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html
etc. where tabs correspond to Emacs buffers, so in Emacs the context
could be collected from existing buffers like e.g. dabbrev-expand does.
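
A crude Emacs analogue of that "neighboring tabs" idea could take a
small window of text from every other buffer in the same major mode,
as in the sketch below (names invented; the relevance ranking those
posts describe is left out):

(require 'seq)

(defun my-llm-sibling-buffer-context (&optional chars-per-buffer)
  "Collect up to CHARS-PER-BUFFER characters (default 1000) from the top
of every other buffer sharing the current major mode.  A real version
would rank candidate windows by similarity to the text around point."
  (let ((mode major-mode)
        (limit (or chars-per-buffer 1000)))
    (mapconcat
     (lambda (buf)
       (with-current-buffer buf
         (format ";; From %s:\n%s" (buffer-name)
                 (buffer-substring-no-properties
                  (point-min) (min (point-max) (+ (point-min) limit))))))
     (seq-filter (lambda (buf)
                   (and (not (eq buf (current-buffer)))
                        (eq (buffer-local-value 'major-mode buf) mode)))
                 (buffer-list))
     "\n\n")))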



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-05-20 17:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22 12:57 LLM Experiments, Part 1: Corrections Psionic K
2024-01-22 20:21 ` Andrew Hyatt
2024-01-23  6:49   ` Psionic K
2024-01-23 15:19     ` T.V Raman
2024-01-23 19:36     ` Andrew Hyatt
     [not found] <m2il3mj961.fsf@gmail.com>
2024-01-22 18:50 ` Sergey Kostyaev
2024-01-22 20:31   ` Andrew Hyatt
2024-01-22 22:06     ` T.V Raman
2024-01-23  0:52       ` Andrew Hyatt
2024-01-23  1:57         ` T.V Raman
2024-01-23  3:00         ` Emanuel Berg
2024-01-23  3:49           ` Andrew Hyatt
2024-01-23  1:36 ` João Távora
2024-01-23  4:17   ` T.V Raman
2024-01-23 19:19   ` Andrew Hyatt
2024-01-24  1:26 ` contact
2024-01-24  4:17   ` T.V Raman
2024-01-24 15:00     ` Andrew Hyatt
2024-01-24 15:14       ` T.V Raman
2024-01-24 14:55   ` Andrew Hyatt
2024-01-24  2:28 ` Karthik Chikmagalur
2024-05-20 17:28 ` Juri Linkov
