From: Heime via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org>
To: Heime via Users list for the GNU Emacs text editor
	<help-gnu-emacs@gnu.org>
Subject: Locating repetitions of text sequences
Date: Sat, 22 Oct 2022 22:31:29 +0000
Message-ID: <zcivt9XZBb83tF-j06AVfn6gnL25_CcANzOC76q3tHKKMCFQxoHaVzDLXwajLwlNwflCMnN8S6n7kuYrRvUttde7-0ymwFoa8dqy033cp5Y=@protonmail.com>

https://emacs.stackexchange.com/posts/74219/timeline

I am currently implementing a function that finds repeated sequences of text, each N components (whitespace-separated words) long.

Here is some sample text:

Joseph Rudyard Kipling (30 December 1865 - 18 January 1936)
 was an English novelist, short-story writer, poet, and
 journalist. He was born in British India, which inspired
 much of his work.  English novelist, short-story writer,
 poet, and journalist.

 Kipling's works of fiction include the Jungle Book duology
 (The Jungle Book, 1894; The Second Jungle Book, 1895).  His

poems include "Mandalay" (1890), "Gunga Din" (1890), "The Gods

of the Copybook Headings" (1919), and "The White Man's Burden"
 (1899).

With N=5, the first "Search Sequence" with five components is

--------

Joseph Rudyard Kipling (30 December

--------
which I compare against consecutive "Text Extracts", each shifted by one component from the previous one:

--------

Joseph Rudyard Kipling (30 December

Rudyard Kipling (30 December 1865

Kipling (30 December 1865 -

--------

and so on.
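
The shifting of extracts described above can be sketched as a small helper. This is only an illustration under assumptions: the name my-word-windows is hypothetical, and a "component" is taken to be any whitespace-separated token, which is what the extracts above suggest.

```elisp
(require 'seq)

;; Hypothetical helper for illustration; not part of the function below.
(defun my-word-windows (text n)
  "Return a list of all N-word sequences in TEXT, in order.
Words are whitespace-separated tokens; each sequence is shifted
by one word relative to the previous one."
  (let ((words (split-string text))
        (windows '()))
    ;; Stop as soon as fewer than N words remain.
    (while (nthcdr (1- n) words)
      (push (mapconcat #'identity (seq-take words n) " ") windows)
      (setq words (cdr words)))
    (nreverse windows)))
```

Evaluating (my-word-windows "Joseph Rudyard Kipling (30 December 1865 -" 5) should return the three extracts listed above, in the same order.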

Then repeat again with "Search Sequence"

Joseph Rudyard Kipling (30 December

--------------------

Suppose I now reach the "Search Sequence"

---------

novelist, short-story writer, poet, and

---------

then use the following "Text Extracts"

--------

Kipling (30 December 1865 -

Joseph Rudyard Kipling (30 December

Rudyard Kipling (30 December 1865

(30 December 1865 - 18

December 1865 - 18 January

1865 - 18 January 1936)

--------

continued with

--------

English novelist, short-story writer, poet,

novelist, short-story writer, poet, and

short-story writer, poet, and journalist.

writer, poet, and journalist. Kipling's

--------

where a match is found in the second extract.

One then outputs the line number where the match was found, together with the
repeating part.

--------

4- novelist, short-story writer, poet, and

--------
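
A minimal sketch of that reporting step follows. It assumes that the words of a phrase may be separated by any whitespace, because in the sample text the repeat runs across a line break ("writer," ends one line and "poet," starts the next). The name my-phrase-line-numbers is hypothetical.

```elisp
;; Hypothetical helper for illustration; not part of the function below.
(defun my-phrase-line-numbers (phrase)
  "Return the line numbers where PHRASE starts in the current buffer.
Words in PHRASE may be separated by any whitespace, including
newlines.  Matching ignores case."
  (let ((regexp (mapconcat #'regexp-quote
                           (split-string phrase)
                           "[ \t\n]+"))
        (case-fold-search t)
        (lines '()))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        ;; Report the line on which the match begins.
        (push (line-number-at-pos (match-beginning 0)) lines)))
    (nreverse lines)))
```

Assuming the sample text starts on the first line of the buffer, the phrase "novelist, short-story writer, poet, and" should be reported at lines 2 and 4. The first entry is the original occurrence and the second is the repeat shown above.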

This continues until the end of the buffer.

I have started with the following function:

---------

(defun wseqn ()
  "Search buffer for repeating phrases with N number of words."
  (interactive)
  (let ((N (read-number "How many words to search?: " 5))
        (case-fold-search t)
        p1 p2 words regex-search)
    (save-excursion
      (goto-char (point-min))
      ;; Each pass starts at the next whitespace-delimited word.
      (while (progn (skip-chars-forward " \t\n") (not (eobp)))
        (setq p1 (point))
        ;; Move N words forward; p2 ends up just after the last
        ;; char of those words.
        (dotimes (_ N)
          (skip-chars-forward " \t\n")
          (skip-chars-forward "^ \t\n"))
        (setq p2 (point))
        (setq words (split-string (buffer-substring-no-properties p1 p2)))
        ;; Near the end of the buffer fewer than N words remain.
        (when (= (length words) N)
          ;; Quote each word and allow any whitespace (including
          ;; line breaks) between words.
          (setq regex-search
                (mapconcat #'regexp-quote words "[ \t\n]+"))
          ;; Only forward search is necessary.  If it was repeated
          ;; behind, it would have been caught in previous
          ;; iterations.  This implementation also reports the
          ;; same repeated phrase from multiple earlier searches.
          (save-excursion
            (while (re-search-forward regex-search nil t)
              (message "%d- %s"
                       (line-number-at-pos (match-beginning 0))
                       (mapconcat #'identity words " ")))))
        ;; Step to the end of the current first word.
        (goto-char p1)
        (skip-chars-forward "^ \t\n")))))
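
As a possible alternative to searching forward once per phrase, the buffer could be scanned a single time while recording every N-word sequence in a hash table. This is not the approach of the function above, only a sketch under assumptions: the name my-repeated-word-sequences is hypothetical, words are whitespace-separated tokens, and downcasing stands in for a case-insensitive comparison.

```elisp
(require 'seq)

;; Hypothetical alternative for illustration only.
(defun my-repeated-word-sequences (n)
  "Return an alist of (PHRASE . LINES) for N-word sequences seen more than once.
LINES lists, in buffer order, the line numbers where PHRASE starts.
The order of the alist entries themselves is unspecified."
  (let ((seen (make-hash-table :test #'equal))
        (starts '())
        (result '()))
    (save-excursion
      ;; Record each word (downcased) together with its start position.
      (goto-char (point-min))
      (while (re-search-forward "[^ \t\n]+" nil t)
        (push (cons (downcase (match-string-no-properties 0))
                    (match-beginning 0))
              starts))
      (setq starts (nreverse starts))
      ;; Slide an N-word window over the word list.
      (while (nthcdr (1- n) starts)
        (let ((key (mapconcat #'car (seq-take starts n) " "))
              (line (line-number-at-pos (cdar starts))))
          (puthash key (cons line (gethash key seen)) seen))
        (setq starts (cdr starts))))
    ;; Keep only the phrases that occurred at least twice.
    (maphash (lambda (key lines)
               (when (cdr lines)
                 (push (cons key (reverse lines)) result)))
             seen)
    result))
```

The trade-off is memory for speed: every N-word sequence in the buffer is stored as a key, but each one is looked at only once.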
