From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Jean Louis <bugs@gnu.support>
Newsgroups: gmane.emacs.help
Subject: Re: Org tag generator?
Date: Mon, 23 Dec 2024 23:21:36 +0300
Message-ID: <Z2nGULkpAl9FUiU5@lco2>
References: <87zfktueqr.fsf@librehacker.com> <Z2VfjsJSq4HBDwWS@lco2>
 <878qs6ia9d.fsf@librehacker.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
User-Agent: Mutt/2.2.12 (2023-09-09)
Cc: Help Gnu Emacs Mailing List <help-gnu-emacs@gnu.org>
To: Christopher Howard <christopher@librehacker.com>
Mail-Followup-To: Christopher Howard <christopher@librehacker.com>,
 Help Gnu Emacs Mailing List <help-gnu-emacs@gnu.org>
Content-Disposition: inline
In-Reply-To: <878qs6ia9d.fsf@librehacker.com>
Xref: news.gmane.io gmane.emacs.help:148934
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/148934>

* Christopher Howard <christopher@librehacker.com> [2024-12-23 20:25]:
> Thank you Jean, for the response. I haven't went over your code
> carefully yet, but it seems like there is something missing here
> that I want. I perhaps should have stated this more explicitly. I
> want the generated tags to be based on the software learning how I
> tagged previous nodes, not simply how it thinks those nodes should
> be tagged in general. E.g., ":emacs:integration:" maybe are tags
> that do summarize well the contents of a node for a general
> audience, but if I am likely to tag it as ":emacs_tricks:" then,
> ideally, those are the tags that it should recommend. This would be
> based on the words in other nodes that I have tagged as
> ":emacs_tricks:".

> I have a system in my mind whereby I apply certain tags based on how
> I think about the data or plan to use the data. What I want is for
> the software to learn that system based on how I have applied it to
> previous nodes.

learning:software:tags:system:previously:tagged:nodes:audience:general:emacs:integration:emacs_tricks:data:use

I have just generated the tags above by using this query:

generate tags based on this text and return single words as tags
separated by colon in single line

and I used this free software model:

mixtral-8x7b-32768, Groq, https://api.groq.com/openai/v1/chat/completions
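
As a minimal sketch of how such a request could be prepared in Emacs
Lisp: the function name below is hypothetical, and I assume the usual
OpenAI-style chat completions body (a model name plus a list of
messages). It only builds the JSON; sending it with an HTTP POST and
an API key is left out.

```elisp
(require 'json)

(defun my-llm-tag-request-body (text)
  "Return a JSON request body asking the model to tag TEXT.
Hypothetical helper; the model name and the query wording are the
ones quoted in this message."
  (json-encode
   (list (cons 'model "mixtral-8x7b-32768")
         (cons 'messages
               (vector
                (list (cons 'role "user")
                      (cons 'content
                            (concat "generate tags based on this text and "
                                    "return single words as tags separated "
                                    "by colon in single line\n\n"
                                    text))))))))
```

The resulting string would be the body of the POST to the URL above.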

> Is this what is happening in the code you provided?

About that. 

There are many solutions, though as I see it the solution lies in
running an LLM. I have a small free software model from Alibaba under
the Apache 2.0 license, named Qwen, in the file
QwQ-LCoT-3B-Instruct.Q4_K_M.gguf, running locally here on my computer
with an Nvidia GTX 1050 Ti with 4 GB VRAM.

It can work well when better instructed, though the result I got was
":sources:integration:".

One possibility is that you make a list of your own tags, supply that
list to the model together with the text, and get back only those
tags from the list that are relevant to the text.
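
A minimal sketch of that idea, with hypothetical function names and
prompt wording: one function builds the prompt from your tag list,
and the other keeps only those returned tags that really are in your
list, since a model may still invent new ones.

```elisp
(require 'seq)

(defun my-restricted-tag-prompt (tags text)
  "Return a prompt asking a model to choose only among TAGS for TEXT."
  (format (concat "Choose only from these tags: %s\n"
                  "Return the tags relevant to the text below, "
                  "separated by colons, in a single line.\n\n%s")
          (mapconcat #'identity tags ", ")
          text))

(defun my-filter-returned-tags (answer tags)
  "Split ANSWER on colons and keep only the entries found in TAGS."
  (seq-filter (lambda (tag) (member tag tags))
              (split-string answer ":" t "[ \t\n]+")))
```

For example, (my-filter-returned-tags ":emacs_tricks:sources:"
'("emacs_tricks" "org")) keeps only "emacs_tricks".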

> What I'm looking for is a way to generate tags based on how I've
> tagged nodes in the past, rather than just how the software thinks a
> node should be tagged in general.

Above are your words, though rewritten.

Yes, I can understand it well. I have 1806 tags, and 157971 documents
and people tagged with them. Tags represent a flexible INTERSECTION,
and INTERSECTION is very important to me; I use it very often.

For example, I may search for coffee,price and get the entries tagged
with both coffee and price: Arabica, Robusta, Single Origin, etc.
There are many ways in which I use tags.
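
To show what I mean by INTERSECTION, here is a minimal sketch with
hypothetical sample data and names (my real data is not stored like
this): an entry matches only if it carries every tag searched for.

```elisp
(require 'seq)

(defvar my-tagged-entries
  '(("Arabica price list" . ("coffee" "price" "arabica"))
    ("Robusta price list" . ("coffee" "price" "robusta"))
    ("Brewing guide"      . ("coffee" "howto")))
  "Hypothetical sample data: each entry is (TITLE . TAGS).")

(defun my-entries-with-all-tags (wanted entries)
  "Return the titles from ENTRIES that carry every tag in WANTED."
  (mapcar #'car
          (seq-filter
           (lambda (entry)
             (seq-every-p (lambda (tag) (member tag (cdr entry)))
                          wanted))
           entries)))
```

Here (my-entries-with-all-tags '("coffee" "price") my-tagged-entries)
returns the two price lists and leaves out the brewing guide.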

I think it is better that you make your own function which is related
to your own specific tags.

You make the function in one file and the data set in another file,
or in the same one.

I used that feature to link my Markdown website pages with specific
important links.

It can work this way.

(defvar my-tags
  '(;; Each entry is (TAG . WORDS): the paragraph would receive TAG
    ;; when it contains any of WORDS.
    ("emacs" . ("emacs" "editing"))
    ("tricks" . ("emacs" "tricks"))
    ("my-other-tag" . ("my word" "next word")))
  "Alist mapping each tag to the words that trigger it.")

That would be very deterministic. You should define how to know that
a section should be tagged, and then define the list of tags and the
conditions under which each tag should be used.

I think that is a good way to go, and much simpler.
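
A minimal sketch of a function that uses such a list, assuming each
entry is a tag followed by its trigger words. The variable, the
function name, and the sample words are all hypothetical.

```elisp
(require 'seq)

(defvar my-tag-words
  '(("emacs" . ("emacs" "editing"))
    ("emacs_tricks" . ("trick" "keybinding"))
    ("graphviz" . ("graphviz" "dot")))
  "Hypothetical alist of (TAG . WORDS).")

(defun my-propose-tags (text)
  "Return the tags in `my-tag-words' having at least one word in TEXT.
Words are matched as whole words, ignoring case."
  (let ((case-fold-search t)
        tags)
    (dolist (entry my-tag-words (nreverse tags))
      (when (seq-some
             (lambda (word)
               (string-match-p
                (concat "\\b" (regexp-quote word) "\\b") text))
             (cdr entry))
        (push (car entry) tags)))))
```

With this data, (my-propose-tags "A small trick for editing Org
files") proposes "emacs" and "emacs_tricks", but not "graphviz".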

> For example, even if ":emacs:integration:" might be a good summary
> of a node's content for a general audience, if I'm more likely to
> tag it as ":emacs_tricks:", then that's the tag I'd like the
> software to recommend.

The words above were rewritten.

Yes, I think this approach summarizes it well, though I didn't
explain it very clearly.

> This recommendation would be based on the words in other nodes that
> I've tagged as ":emacs_tricks:".

I think that is a great idea.
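
A very rough sketch of that, with hypothetical names and nothing
sophisticated: collect the words seen under each tag in nodes you
have already tagged, then rank tags for a new text by how many of its
words they share. There is no removal of common words like "the", so
in practice it would need refinement.

```elisp
(require 'seq)

(defun my-words (text)
  "Return the lowercase words of TEXT as a list."
  (split-string (downcase text) "[^[:alnum:]_]+" t))

(defun my-learn-tag-words (tagged-nodes)
  "Return a hash table mapping each tag to the words seen under it.
TAGGED-NODES is a list of (TEXT . TAGS)."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (node tagged-nodes table)
      (dolist (tag (cdr node))
        (dolist (word (my-words (car node)))
          (unless (member word (gethash tag table))
            (puthash tag (cons word (gethash tag table)) table)))))))

(defun my-suggest-tags (text table)
  "Return tags from TABLE sharing words with TEXT, best match first."
  (let ((words (delete-dups (my-words text)))
        scores)
    (maphash
     (lambda (tag known)
       (let ((n (length (seq-filter (lambda (w) (member w known))
                                    words))))
         (when (> n 0)
           (push (cons tag n) scores))))
     table)
    (mapcar #'car (sort scores (lambda (a b) (> (cdr a) (cdr b)))))))
```

You would call my-learn-tag-words once on your existing nodes and
then my-suggest-tags on each new node.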

Now I could give some conditions to each tag, as each of my tags can
itself be tagged and can have its own description and properties. For
example, among my many tags there is, let us say:

TAG: toddler

I could extract synonyms by using local dictionary:

- tot
- baby
- preschooler
- child

and then relate them to "toddler", so that if any of those words is
mentioned, let us say, "more than once" in a section, the section
could get the tag.

So I could make a function to verify whether any of the defined words
is mentioned more than once.
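
A minimal sketch of such a check, with hypothetical function names.
It matches whole words and ignores case, and it assumes the words are
not empty strings.

```elisp
(require 'seq)

(defun my-count-word (word text)
  "Return how many times WORD occurs as a whole word in TEXT.
WORD must be a non-empty string."
  (let ((case-fold-search t)
        (regexp (concat "\\b" (regexp-quote word) "\\b"))
        (start 0)
        (count 0))
    (while (string-match regexp text start)
      (setq count (1+ count)
            start (match-end 0)))
    count))

(defun my-tag-applies-p (words text)
  "Return non-nil if any of WORDS occurs more than once in TEXT."
  (seq-some (lambda (word) (> (my-count-word word text) 1))
            words))
```

For "toddler" the call would be (my-tag-applies-p '("tot" "baby"
"preschooler" "child") section-text), where section-text stands for
the text of the section.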

employee

 18     security                       

Similar

 19     encryption                     

Here I could add various encryption terms and relate them to the tag.

 22     graphviz                       

Various related words, such as dot, graph, etc., could be related to
the tag graphviz.

 33     surveillance                   

Various related terms could be added here.

 57     relationships                  

And so on.

That is basically, more or less, manual "machine learning". Once
prepared, the definition then serves to parse your text and decide
which tags to propose.

I say "to propose", as it may need human intervention. But it would
be so much faster.

I have been doing heavy text processing regularly for decades, since
1999. I made directories and specialized search engines before they
were popular on the market; at that time it was quite interesting.

Let me know if you agree with this concept, and then let us make it.

-- 
Jean Louis