From: João Távora
Newsgroups: gmane.emacs.devel
Subject: Re: LLM Experiments, Part 1: Corrections
Date: Tue, 23 Jan 2024 01:36:28 +0000
To: Andrew Hyatt
Cc: emacs-devel@gnu.org, sskostyaev@gmail.com
On Mon, Jan 22, 2024 at 4:16 AM Andrew Hyatt wrote:
>
> Hi everyone,

Hi Andrew,

I have some ideas to share, though keep in mind this is mainly
thinking out loud and I'm largely an LLM newbie.

> Question 1: Does the llm-flows.el file really belong in the llm
> package?

Maybe, but keep the functions isolated.  I'd be interested in a
diff-mode flow, which is different from the ediff one you demo, so it
should be possible to build both.  The diff-mode flow I'm thinking of
would be similar to the diff presentation of LSP-proposed edits to
your code, btw.  See the variable eglot-confirm-server-edits for an
idea of the interface.

> Question 3: How should we deal with context?  The code that has the
> text corrector doesn't include surrounding context (the text
> before and after the text to rewrite), but it usually is helpful.
> How much context should we add?

Karthik of gptel.el explained to me that this is one of the biggest
challenges of working with LLMs, and that GitHub Copilot and other
code-assistance tools work by sending not only the region you're
interested in having the LLM help you with but also some auxiliary
functions and context discovered heuristically.  This is potentially
complex, and likely doesn't belong in your base llm.el, but it should
be possible to do somehow with an application built on top of llm.el
(Karthik suggests tree-sitter or LSP's reference-finding abilities to
discover what's nearest in terms of context).

In case no one mentioned this already, I think a good logging
facility is essential.  This could go in the base llm.el library.
I'm obviously biased towards my own jsonrpc.el logging facilities,
where a separate easy-to-find buffer for each JSON-RPC connection
lists all the JSON transport-level conversation details in a
consistent format.  jsonrpc.el clients can also use those logging
facilities to output application-level details.

In an LLM library, I suppose the equivalent to JSON transport-level
details are the specific API calls to each provider, how it gathers
context, prompts, etc.  Those would be distinct for each LLM.  A
provider-agnostic application built on top of llm.el's abstraction
could log in a much more consistent way.

So my main point regarding logging is that it should live in a
readable log buffer, so it's easy to piece together what happened and
debug.  Representing JSON as pretty-printed plists is often very
practical in my experience (though a bit slow if loads of text is to
be printed).
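To make the plist idea concrete, here's a very rough sketch of what I
mean (all names invented, nothing like this exists in llm.el today):
a per-provider log buffer that just accumulates pretty-printed
plists.

;; Rough sketch only: `my/llm-log-event' and everything around it is
;; invented for illustration, not part of llm.el.
(defun my/llm-log-event (provider type plist)
  "Append an event of TYPE with payload PLIST to PROVIDER's log buffer."
  (with-current-buffer
      (get-buffer-create (format "*llm-log: %s*" provider))
    (goto-char (point-max))
    (insert (format ";; [%s] %s\n" (format-time-string "%T") type))
    (insert (pp-to-string plist) "\n")))

;; An application could then call it around each request, e.g.
;; (my/llm-log-event 'some-provider 'request
;;                   '(:model "some-model" :prompt "Correct this text ..."))

Something that small already makes it easy to eyeball what was sent
and to replay it later.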
Maybe these logging transcripts could even be used to produce
automated tests, in case there's a way to achieve any kind of
determinism with LLMs (not sure if there is).

Similarly to logging, it would be good to have some kind of visual
feedback of what context is being sent in each LLM request, like
momentarily highlighting the regions to be sent alongside the prompt.
Sometimes that is not feasible, so it could make sense to summarize
that extra context in a few lines shown in the minibuffer, perhaps
something like "lines 2-10 from foo.cpp, lines 42-420 from bar.cpp".

So just my 2 cents,
Good luck,
João
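P.S. For the "momentary highlight" idea above, I imagine something
roughly like this (again an invented sketch, just to show the shape
of it):

(defun my/llm-flash-context (regions &optional seconds)
  "Briefly highlight REGIONS, a list of (BUFFER BEG . END), for SECONDS."
  (let (overlays)
    (dolist (region regions)
      (with-current-buffer (car region)
        (let ((ov (make-overlay (cadr region) (cddr region))))
          (overlay-put ov 'face 'highlight)
          (push ov overlays))))
    ;; Remove the highlights after a short delay.
    (run-with-timer (or seconds 0.5) nil
                    #'mapc #'delete-overlay overlays)))

The minibuffer summary could then be a simple `message' built from
the same list of regions.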