all messages for Emacs-related lists mirrored at yhetil.org
* Org tag generator?
From: Christopher Howard @ 2024-12-18 16:39 UTC
  To: Help Gnu Emacs Mailing List

Hi, I use org-set-tags-command a lot in an Org document where I store links and notes on various subjects of interest to me. I was wondering if I could have Emacs auto-generate tag suggestions based on the content of the node being tagged. For example, if the node has words like "senator" or "lawmakers", it would suggest ":politics:". The suggestions would be trained on the other nodes and tags in the document.

This sounds kind of like what an LLM does but the requirements are much lighter, since it just needs to know how I have tagged all my other nodes, which is only a few thousand nodes.

Is there a project out there like this already for Emacs? Or a suggestion on a not too complicated way to tie the tagging functionality to some external software that does this sort of thing?

-- 
📛 Christopher Howard
🚀 gemini://gem.librehacker.com
🌐 http://gem.librehacker.com

בראשית ברא אלהים את השמים ואת הארץ
(Genesis 1:1: "In the beginning God created the heaven and the earth.")




* Re: Org tag generator?
From: Jean Louis @ 2024-12-20 12:14 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

* Christopher Howard <christopher@librehacker.com> [2024-12-18 19:42]:
> Hi, I use org-set-tags-command a lot in an org document where I
> store links and notes to various subjects of interest to me. I was
> wondering if I could have Emacs auto-generate tag suggestions based
> on the content in the node being tagged. Like, if the node has words
> like "senator" or "lawmakers" it suggests ":politics:". The
> suggestions being trained from the other nodes and tags in the
> document.

Well... sure, it can work somehow; as you know, Emacs and Lisp can do
basically anything.

(require 'seq)     ; seq-uniq
(require 'subr-x)  ; string-join

(defun too-short (word)
  "Return WORD, or nil if it is shorter than 4 characters."
  (unless (< (length word) 4)
    word))

(defun generate-keywords-1 (s)
  "Return the unique words of S with 4 or more characters, joined by \", \"."
  (let* ((s (replace-regexp-in-string "[^[:word:]]" " " s)) ; non-word chars -> space
         (s (split-string s))                     ; split on whitespace, no empty strings
         (s (seq-uniq s))                         ; drop repeated words
         (s (delq nil (mapcar #'too-short s))))   ; drop words shorter than 4 chars
    (string-join s ", ")))

(defun generate-keywords (string &optional from to)
  "Return keywords for STRING, or replace the text from FROM to TO with them.
Interactively, work on the active region or on the paragraph at point."
  (interactive
   (let ((bds (if (use-region-p)
                  (cons (region-beginning) (region-end))
                (bounds-of-thing-at-point 'paragraph))))
     (list nil (car bds) (cdr bds))))
  (if string
      (generate-keywords-1 string)
    (let ((keywords (generate-keywords-1
                     (buffer-substring-no-properties from to))))
      (save-excursion
        (delete-region from to)
        (goto-char from)
        (insert keywords)))))

So now I can use this function like this:

(generate-keywords "This sounds kind of like what an LLM does but the requirements are much lighter, since it just needs to know how I have tagged all my other nodes, which is only a few thousand nodes.") ➜ "This, sounds, kind, like, what, does, requirements, much, lighter, since, just, needs, know, have, tagged, other, nodes, which, only, thousand"

That is not very specific, though. It would be better to use a local
LLM and have it generate the keywords:

(rcd-llm (concat "Generate only 3 major keywords for: " "Is there a project out there like this already for Emacs? Or a suggestion on a not too complicated way to tie the tagging functionality to some external software that does this sort of thing?"))

1. Emacs
2. Tagging
3. Integration

> Is there a project out there like this already for Emacs? Or a suggestion on a not too complicated way to tie the tagging functionality to some external software that does this sort of thing?

Here are 3 major keywords for the given text:

1. Emacs
2. Tagging
3. Integration


> -- 
> 📛 Christopher Howard
> 🚀 gemini://gem.librehacker.com
> 🌐 http://gem.librehacker.com
> 
> בראשית ברא אלהים את השמים ואת הארץ
> (Genesis 1:1: "In the beginning God created the heaven and the earth.")

Based on the provided text, the 3 major keywords are:

1. **Gemini**
2. **Librehacker**
3. **Hebrew** (or **Bible**, as the text appears to be a quote from the Book of Genesis)

* Christopher Howard <christopher@librehacker.com> [2024-12-18 19:42]:
> Hi, I use org-set-tags-command a lot in an org document where I
> store links and notes to various subjects of interest to me. I was
> wondering if I could have Emacs auto-generate tag suggestions based
> on the content in the node being tagged. Like, if the node has words
> like "senator" or "lawmakers" it suggests ":politics:". The
> suggestions being trained from the other nodes and tags in the
> document.


Here are 3 major keywords for the given text:

1. **Org-Mode**
2. **Tag Suggestions**
3. **Emacs Automation**

-- 
Jean Louis




* Re: Org tag generator?
From: Christopher Howard @ 2024-12-23 17:23 UTC
  To: Help Gnu Emacs Mailing List

Thank you Jean, for the response. I haven't gone over your code carefully yet, but it seems like there is something missing here that I want. I perhaps should have stated this more explicitly. I want the generated tags to be based on the software learning how I tagged previous nodes, not simply how it thinks those nodes should be tagged in general. E.g., ":emacs:integration:" may be tags that summarize the contents of a node well for a general audience, but if I am likely to tag it as ":emacs_tricks:", then ideally that is the tag it should recommend. This would be based on the words in other nodes that I have tagged as ":emacs_tricks:".

I have a system in my mind whereby I apply certain tags based on how I think about the data or plan to use the data. What I want is for the software to learn that system based on how I have applied it to previous nodes.

Is this what is happening in the code you provided?

-- 
Christopher Howard




* Re: Org tag generator?
From: Jean Louis @ 2024-12-23 20:21 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

* Christopher Howard <christopher@librehacker.com> [2024-12-23 20:25]:
> Thank you Jean, for the response. I haven't gone over your code
> carefully yet, but it seems like there is something missing here
> that I want. I perhaps should have stated this more explicitly. I
> want the generated tags to be based on the software learning how I
> tagged previous nodes, not simply how it thinks those nodes should
> be tagged in general. E.g., ":emacs:integration:" may be tags
> that summarize the contents of a node well for a general
> audience, but if I am likely to tag it as ":emacs_tricks:", then
> ideally that is the tag it should recommend. This would be
> based on the words in other nodes that I have tagged as
> ":emacs_tricks:".

> I have a system in my mind whereby I apply certain tags based on how
> I think about the data or plan to use the data. What I want is for
> the software to learn that system based on how I have applied it to
> previous nodes.

learning:software:tags:system:previously:tagged:nodes:audience:general:emacs:integration:emacs_tricks:data:use

I have just generated the tags above with this query:

generate tags based on this text and return single words as tags
separated by colon in single line

using the free software model:

mixtral-8x7b-32768, Groq, https://api.groq.com/openai/v1/chat/completions

> Is this what is happening in the code you provided?

About that. 

There are many solutions, though I see the answer in running an LLM.
I have a small free software model from Alibaba under the Apache 2.0
license, named Qwen (the file is QwQ-LCoT-3B-Instruct.Q4_K_M.gguf),
running locally on my computer with an Nvidia GTX 1050 Ti with 4 GB of
VRAM.

It can work well when given better instructions, though here I got the
result ":sources:integration:".

Another possibility is that you make a list of your own tags, supply
that list together with the text, and get back only those tags that
are relevant to the text.
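A minimal sketch of that idea in Emacs Lisp, assuming the model is asked to answer with Org-style tags such as ":botany:garden:", as in the query above. The function names are hypothetical and the LLM call itself is left out: one function builds a prompt that lists only your own tags, and the other parses the reply and discards anything that is not on your list.

```elisp
(require 'seq)
(require 'subr-x)

(defun my-llm-tag-prompt (tags text)
  "Return a prompt asking an LLM to pick, from TAGS only, those relevant to TEXT."
  (concat "From this list of tags: " (string-join tags ", ")
          "\nreturn only the tags that are relevant to the text below,"
          " joined with colons on a single line, like :tag1:tag2:"
          "\n\nText:\n" text))

(defun my-llm-allowed-tags (reply tags)
  "Parse REPLY, such as \":botany:garden:\", keeping only members of TAGS.
Case is ignored; the spelling used in TAGS is returned, in TAGS order."
  (let ((words (mapcar #'downcase (split-string reply "[ \t\n,:]+" t))))
    (seq-filter (lambda (tag) (member (downcase tag) words)) tags)))
```

Your own tag list could be collected from the Org buffer first, for example with `(mapcar #'car (org-get-buffer-tags))`. Filtering the reply this way means an answer like ":sources:integration:" yields nothing rather than tags you never use.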

> What I'm looking for is a way to generate tags based on how I've
> tagged nodes in the past, rather than just how the software thinks a
> node should be tagged in general.

Above are your words, though rewritten.

Yes, I understand it well. I have 1806 tags, and 157971 documents and
people tagged with various tags. Tags represent flexible INTERSECTION,
and INTERSECTION is very important to me; I use it very often.

For example, I may search for "coffee,price" and get the items tagged
with both coffee and price: Arabica, Robusta, Single Origin, etc.
There are many ways in which I use tags.
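Tag intersection, as in the coffee,price search, can be sketched as a filter that keeps only the items carrying every requested tag. The data and names below are hypothetical; for Org files, an agenda tags search with the match string "coffee+price" performs the same kind of intersection.

```elisp
(require 'seq)

(defun my-tag-intersection (items tags)
  "Return the names of ITEMS that carry every tag in TAGS.
ITEMS is an alist of (NAME . TAG-LIST)."
  (mapcar #'car
          (seq-filter (lambda (item)
                        ;; keep ITEM only if each requested tag is present
                        (seq-every-p (lambda (tag) (member tag (cdr item)))
                                     tags))
                      items)))

;; Hypothetical sample data.
(defvar my-tagged-items
  '(("Arabica"          "coffee" "price")
    ("Robusta"          "coffee" "price" "uganda")
    ("Espresso machine" "coffee")
    ("Gold"             "price" "uganda")))
```

With this data, `(my-tag-intersection my-tagged-items '("coffee" "price"))` returns ("Arabica" "Robusta"): adding a tag to the query can only narrow the result.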

I think it is better to write your own function that works with your
own specific tags.

You put the function in one file and the data set in another file, or
both in the same one.

I have used this approach to link my Markdown website pages to
specific important links.

It can work this way.

(defvar my-tags
  '(;; Each entry is (TAG . WORDS): propose TAG for a paragraph
    ;; when any of WORDS is found in it.
    ;; Tag with "emacs" if "emacs" or "editing" is found:
    ("emacs" "emacs" "editing")
    ;; Tag with "tricks" if "emacs" or "tricks" is found:
    ("tricks" "emacs" "tricks")
    ;; Org tags may not contain "-", hence the underscores:
    ("my_other_tag" "my word" "next word"))
  "Tags to propose, each followed by the words that trigger it.")

That would be very deterministic. You define how to recognize that a
section should be tagged: a list of tags, and the conditions under
which each tag should be used.

I think that is a good way to go, and much simpler.
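A minimal sketch of such a deterministic tagger, assuming rules of the form (TAG . WORDS): TAG is proposed when any of its WORDS occurs in the text, ignoring case. The rule list and function names are hypothetical. The result is an Org tag string such as ":emacs:botany:", a form that `org-set-tags` accepts.

```elisp
(require 'seq)

(defvar my-tag-rules
  '(("emacs"        "emacs" "editing")
    ("emacs_tricks" "emacs trick" "keybinding" "elisp snippet")
    ("botany"       "flower" "rainforest" "leaf"))
  "Hypothetical rules (TAG . WORDS): propose TAG if any of WORDS occurs.")

(defun my-propose-tags (text &optional rules)
  "Return an Org tag string such as \":emacs:botany:\" for TEXT.
RULES defaults to `my-tag-rules'.  Matching ignores case and is a plain
substring match, so \"leaf\" also matches \"leaflet\"."
  (let* ((case-fold-search t)
         (matching (seq-filter
                    (lambda (rule)
                      (seq-some (lambda (word)
                                  (string-match-p (regexp-quote word) text))
                                (cdr rule)))
                    (or rules my-tag-rules))))
    (if matching
        (concat ":" (mapconcat #'car matching ":") ":")
      "")))
```

For instance, "A new keybinding for Emacs" yields ":emacs:emacs_tricks:". Since every rule is written by hand, the result is fully predictable, which is the point of this approach.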

> For example, even if ":emacs:integration:" might be a good summary
> of a node's content for a general audience, if I'm more likely to
> tag it as ":emacs_tricks:", then that's the tag I'd like the
> software to recommend.

Above words were rewritten.

Yes, I think this summarizes it well, though I did not explain it very
clearly.

> This recommendation would be based on the words in other nodes that
> I've tagged as ":emacs_tricks:".

I think that is a great idea.

Now I could give some conditions to each tag, as each of my tags can
itself be tagged and have its own description and properties. For
example, among my tags I have, let us say:

TAG: toddler                        

I could extract synonyms from a local dictionary:

- tot
- baby
- preschooler
- child

and then relate them to "toddler", and then if any such word is
mentioned, let us say, "more than once" in a section, it could get a tag.

So I could make a function to verify whether any of the defined words is
mentioned more than once.
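The "more than once" check could look like the following sketch, assuming a hand-made synonym list such as the one for toddler above. It counts whole-word, case-insensitive occurrences of all the related words together and proposes the tag only when the total exceeds a threshold (1 by default). The function names are hypothetical.

```elisp
(defun my-count-words (words text)
  "Count whole-word, case-insensitive occurrences of WORDS in TEXT."
  (let ((case-fold-search t)
        (count 0))
    (dolist (word words count)
      (let ((regexp (concat "\\b" (regexp-quote word) "\\b"))
            (start 0))
        ;; scan TEXT for successive matches of this word
        (while (string-match regexp text start)
          (setq count (1+ count)
                start (match-end 0)))))))

(defun my-tag-if-frequent (tag words text &optional threshold)
  "Return TAG if WORDS occur in TEXT more than THRESHOLD times (default 1)."
  (when (> (my-count-words words text) (or threshold 1))
    tag))
```

Whole-word matching keeps "tot" from firing on "Totally"; a single mention of "child" stays below the default threshold, while "toddler" plus "baby" in one section is enough to propose the tag.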

employee

 18     security                       

Similarly.

 19     encryption

I could add various encryption terms here, to relate them to the tag.

 22     graphviz

Various related words, such as dot, graph, etc., could be related to
the tag graphviz.

 33     surveillance

Various related tags could be added here.

 57     relationships                  

And so on.

That is basically, more or less, manual "machine learning". Once
prepared, the definitions serve to parse your text and determine which
tags to propose.

I say "propose" because it may still need human intervention. But it
would be so much faster.

I have been doing heavy text processing regularly for decades, since
1999. I built directories and specialized search engines before they
were popular on the market; at that time it was quite interesting.

Let me know if you agree with this concept, and then let us build it.

-- 
Jean Louis




* Re: Org tag generator?
From: Christopher Howard @ 2024-12-24 16:34 UTC
  To: Help Gnu Emacs Mailing List

Jean Louis <bugs@gnu.support> writes:

> and then relate them to "toddler", and then if any such word is
> mentioned, let us say, "more than once" in a section, it could get a tag.
>
> So I could make a function to verify whether any of the defined words is
> mentioned more than once.

Now you are recommending a system whereby I myself have to manually recognize and record all specific words that lead me to assign a specific tag. This works, in a sense, but does nothing to reduce my workload. There might be, for example, thousands of plant-related words that lead me to apply the tag :botany:. I don't want to have to figure out what those words are specifically and make a record of them. I just want to intuitively apply the tag :botany: and then the software figures out, after observing many nodes, that the words "rainforest" or "flower" should generate the tag :botany:.
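What is asked for here is ordinary supervised text classification. As one illustration (a swapped-in technique, not something proposed in this thread), a multinomial naive Bayes classifier can learn word statistics from the nodes you have already tagged and rank tags for a new node. This is a minimal sketch with hypothetical names: training data is a list of (TEXT . TAGS) pairs, tokenization is naive (lowercase words of three or more letters), a node with several tags counts toward each of them, and unseen words get add-one (Laplace) smoothing. The scores are log probabilities, useful only for ranking.

```elisp
(require 'seq)

(defun my-nb-tokenize (text)
  "Return the lowercase words of TEXT that have at least 3 letters."
  (seq-filter (lambda (w) (>= (length w) 3))
              (split-string (downcase text) "[^[:alpha:]]+" t)))

(defun my-nb-train (nodes)
  "Train a naive Bayes model on NODES, a list of (TEXT . TAGS)."
  (let ((word-counts (make-hash-table :test #'equal)) ; tag -> table of word counts
        (tag-totals (make-hash-table :test #'equal))  ; tag -> total words seen
        (tag-docs (make-hash-table :test #'equal))    ; tag -> nodes carrying it
        (vocab (make-hash-table :test #'equal)))      ; every word seen
    (dolist (node nodes)
      (let ((words (my-nb-tokenize (car node))))
        (dolist (w words) (puthash w t vocab))
        (dolist (tag (cdr node))
          (let ((table (or (gethash tag word-counts)
                           (puthash tag (make-hash-table :test #'equal)
                                    word-counts))))
            (puthash tag (1+ (gethash tag tag-docs 0)) tag-docs)
            (puthash tag (+ (length words) (gethash tag tag-totals 0)) tag-totals)
            (dolist (w words)
              (puthash w (1+ (gethash w table 0)) table))))))
    (list :word-counts word-counts :tag-totals tag-totals :tag-docs tag-docs
          :vocab-size (hash-table-count vocab) :node-count (length nodes))))

(defun my-nb-suggest (model text &optional n)
  "Return the N (default 3) most likely tags for TEXT according to MODEL."
  (let ((words (my-nb-tokenize text))
        (vocab-size (plist-get model :vocab-size))
        (node-count (plist-get model :node-count))
        (scores nil))
    (maphash
     (lambda (tag table)
       (let ((total (gethash tag (plist-get model :tag-totals)))
             ;; log prior: share of training nodes that carry TAG
             (score (log (/ (float (gethash tag (plist-get model :tag-docs)))
                            node-count))))
         ;; add the log likelihood of each word, with add-one smoothing
         (dolist (w words)
           (setq score (+ score (log (/ (+ 1.0 (gethash w table 0))
                                        (+ total vocab-size))))))
         (push (cons tag score) scores)))
     (plist-get model :word-counts))
    (mapcar #'car (seq-take (sort scores (lambda (a b) (> (cdr a) (cdr b))))
                            (or n 3)))))
```

For example, after training on `'(("Orchids and ferns grow in the rainforest" "botany") ("Useful keybinding for dired in Emacs" "emacs_tricks"))`, the call `(my-nb-suggest model "A rainforest flower" 1)` returns ("botany"). For an Org file, the training pairs could be collected by pairing each entry's text with `(org-get-tags nil t)`.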

-- 
Christopher Howard




* Re: Org tag generator?
From: Jean Louis @ 2024-12-24 20:47 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

* Christopher Howard <christopher@librehacker.com> [2024-12-24 19:35]:
> Jean Louis <bugs@gnu.support> writes:
> 
> > and then relate them to "toddler", and then if any such word is
> > mentioned, let us say, "more than once" in a section, it could get a tag.
> >
> > So I could make a function to verify whether any of the defined words is
> > mentioned more than once.
> 
> Now you are recommending a system whereby I myself have to manually recognize and record all specific words that lead me to assign a specific tag. This works, in a sense, but does nothing to reduce my workload. There might be, for example, thousands of plant-related words that lead me to apply the tag :botany:. I don't want to have to figure out what those words are specifically and make a record of them. I just want to intuitively apply the tag :botany: and then the software figures out, after observing many nodes, that the words "rainforest" or "flower" should generate the tag :botany:.

It sounds like you wish to use the text recognition features that
Large Language Model (LLM) software provides today.

Try it out here:
https://huggingface.co/spaces/Qwen/Qwen2.5

Give it instructions. Here is an example:


Give me name of 3 different flowers

Sure! Here are three different flowers:

    Rose - Known for its beauty and fragrance, the rose is a popular flower available in many colors such as red, pink, white, and yellow.

    Daisy - Characterized by its simple, charming appearance with white petals and a yellow center, the daisy symbolizes innocence and purity.

    Tulip - A spring favorite, tulips come in a variety of colors and are known for their cup-shaped blooms. They are often associated with the Netherlands.

Give me 3 different car models

Certainly! Here are three different car models:

    Toyota Camry - A popular mid-size sedan known for its reliability and fuel efficiency.
    Ford Mustang - An iconic American muscle car known for its powerful engines and sporty design.
    Tesla Model S - A high-performance electric sedan known for its advanced technology, long range, and sleek design.

I will now give you lines, and you will recognize if some of these tags apply: '("BOTANY", "TRANSPORT", "TRAVEL", "GARDEN"), return tags joined with colon : suitable for Org mode.

Sure, I can help with that! Please provide the lines, and I'll tag them accordingly.

Toyota Camry - A popular mid-size sedan known for its reliability and fuel efficiency.

TRANSPORT
(the answer TRANSPORT above was given by the software)

Daisy - Characterized by its simple, charming appearance with white petals and a yellow center, the daisy symbolizes innocence and purity.

BOTANY:GARDEN

--------------(づ。◕‿‿◕。)づ --------------------------

Once you understand how it works, you can integrate it into Emacs so
that tagging becomes effortless.

I would gather the list of your tags and then add tags with the LLM,
on the fly.

Here is a small demo of how integration within Emacs works, though I
do not think this model is suitable for the task of tagging:
https://yewtu.be/watch?v=XrK-X7LZ6aE

I would do it like this:

1. Write a function to tag your Org documents.
2. Copy original-file.org to tagged-file.org for safety.
3. Work on tagged-file.org.
4. Automatically collect the list of all Org files.
5. Parse each section and invoke the LLM with your list of tags, so that it returns the matching tags.
6. Insert the returned tags into the section heading with an Emacs function.
7. Continue parsing.
8. Repeat for all files.
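Steps 5 to 7 could be sketched as a loop over every heading in the current buffer. In this minimal sketch, SUGGEST-FN is a hypothetical function that receives the text of an entry (its heading and body, without children) and returns a list of tags; it could wrap an LLM call or a deterministic rule set. Existing local tags are kept and the new ones are appended. Following step 2, run it on a copy of the file.

```elisp
(require 'org)
(require 'seq)

(defun my-org-entry-text ()
  "Return the heading and body of the Org entry at point, without children."
  (buffer-substring-no-properties
   (line-beginning-position)
   (save-excursion (outline-next-heading) (point))))

(defun my-org-tag-all-entries (suggest-fn)
  "Add the tags returned by SUGGEST-FN to every heading in the current buffer.
SUGGEST-FN is called with the entry text and must return a list of tags."
  (save-excursion
    (goto-char (point-min))
    (unless (org-at-heading-p)
      (outline-next-heading))
    (while (org-at-heading-p)
      (let* ((old (org-get-tags nil t))   ; tags set on this heading only
             (new (funcall suggest-fn (my-org-entry-text))))
        (save-excursion
          (org-set-tags (seq-uniq (append old new)))))
      (outline-next-heading))))
```

Keeping the old tags and appending new ones means a manually chosen tag is never lost; a reviewing step could instead show the proposed tags and ask before calling `org-set-tags`.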

-- 
Jean Louis




* Re: Org tag generator?
From: Jean Louis @ 2024-12-26 20:44 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

* Christopher Howard <christopher@librehacker.com> [2024-12-24 19:35]:
> Now you are recommending a system whereby I myself have to manually
> recognize and record all specific words that lead me to assign a
> specific tag. This works, in a sense, but does nothing to reduce my
> workload. There might be, for example, thousands of plant-related
> words that lead me to apply the tag :botany:. I don't want to have
> to figure out what those words are specifically and make a record of
> them. I just want to intuitively apply the tag :botany: and then the
> software figures out, after observing many nodes, that the words
> "rainforest" or "flower" should generate the tag :botany:.

I have found a deterministic way of getting keywords, using some
manual preparation.

I am using a PostgreSQL database; I am aware that this may sound hard,
though it is not.

Full documents can easily be imported into PostgreSQL, and then a
tsvector for full-text search can be generated.

I have done that.

I have a table hyobjects with a column hyobjects_tokens of type
tsvector, for English (or another language).

Let us say I need some two-word terms from the database, ranked by how
often they occur; then it is a breeze:

SELECT 'Ugandan ' || sub_words[1] AS term, COUNT(*) AS count
FROM hyobjects
CROSS JOIN LATERAL regexp_matches(hyobjects_text, '\mUgandan\s+([a-zA-Z]+)', 'g') AS sub_words
WHERE NOT EXISTS (
    SELECT 1 
    FROM stopwords 
    WHERE LOWER(stopwords_name) = LOWER(sub_words[1])
)
GROUP BY sub_words[1]
ORDER BY count DESC;

         term         | count 
----------------------+-------
 Ugandan company      |    82
 Ugandan shillings    |    58
 Ugandan coffee       |    26
 Ugandan miners       |    12
 Ugandan people       |    12
 Ugandan shilling     |     8
 Ugandan Coffee       |     8
 Ugandan gold         |     7
 Ugandan Constitution |     6
 Ugandan government   |     6
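Roughly the same two-word ranking can be done inside Emacs on a plain string, without a database. This is a minimal sketch with a hypothetical function name, not a replacement for the SQL above: it finds each word that follows TERM, skips stopwords (compared case-insensitively, like the LOWER() comparison), and returns (PHRASE . COUNT) pairs, most frequent first. Like the SQL, the matching is otherwise case-sensitive, so "coffee" and "Coffee" are counted separately.

```elisp
(defun my-term-followers (term text &optional stopwords)
  "Count the words that follow TERM in TEXT, ignoring STOPWORDS.
Return a list of (\"TERM WORD\" . COUNT), most frequent first."
  (let ((case-fold-search nil)          ; case-sensitive, like the SQL
        (regexp (concat "\\<" (regexp-quote term) "[ \t\n]+\\([a-zA-Z]+\\)"))
        (stops (mapcar #'downcase stopwords))
        (counts (make-hash-table :test #'equal))
        (start 0)
        (result nil))
    (while (string-match regexp text start)
      (let ((word (match-string 1 text)))
        (setq start (match-end 0))
        ;; stopwords are compared case-insensitively
        (unless (member (downcase word) stops)
          (puthash word (1+ (gethash word counts 0)) counts))))
    (maphash (lambda (word n) (push (cons (concat term " " word) n) result))
             counts)
    (sort result (lambda (a b) (> (cdr a) (cdr b))))))
```

Running it over the text of a whole Org file (for example the result of `buffer-string`) gives candidate phrases that could then be turned into tags by hand.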

I like this very deterministic approach, so I will use it for automatic
referencing and tagging, to make sales and marketing easier.

If you happen to need help generating keywords this way, I am
available. 😄

-- 
Jean Louis




Thread overview: 7+ messages
2024-12-18 16:39 Org tag generator? Christopher Howard
2024-12-20 12:14 ` Jean Louis
2024-12-23 17:23   ` Christopher Howard
2024-12-23 20:21     ` Jean Louis
2024-12-24 16:34       ` Christopher Howard
2024-12-24 20:47         ` Jean Louis
2024-12-26 20:44         ` Jean Louis
