* Locating repetitions of text sequences
From: Heime via Users list for the GNU Emacs text editor @ 2022-10-22 22:31 UTC
To: Heime via Users list for the GNU Emacs text editor
https://emacs.stackexchange.com/posts/74219/timeline
I am currently implementing a function that finds repeated sequences of text that are N words long.
Here is some sample text:
Joseph Rudyard Kipling (30 December 1865 - 18 January 1936)
was an English novelist, short-story writer, poet, and
journalist. He was born in British India, which inspired
much of his work. English novelist, short-story writer,
poet, and journalist.
Kipling's works of fiction include the Jungle Book duology
(The Jungle Book, 1894; The Second Jungle Book, 1895). His
poems include "Mandalay" (1890), "Gunga Din" (1890), "The Gods
of the Copybook Headings" (1919), and "The White Man's Burden"
(1899).
With N=5, the first "Search Sequence" (five components) is
--------
Joseph Rudyard Kipling (30 December
--------
I compare it with consecutive "Text Extracts", each shifted forward by one component:
--------
Joseph Rudyard Kipling (30 December
Rudyard Kipling (30 December 1865
Kipling (30 December 1865 -
--------
and so on.
Then repeat with the next "Search Sequence"
Rudyard Kipling (30 December 1865
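The sliding "Search Sequence" described above can be sketched in Emacs Lisp as follows. This is a minimal sketch, assuming that a "component" is any whitespace-separated token (which matches the extracts shown, where "(30" and "-" each count as one component). `wseqn-windows' is a hypothetical helper name, and `seq-take' needs Emacs 25 or later.

```elisp
(require 'seq)

(defun wseqn-windows (text n)
  "Return every run of N consecutive components of TEXT, in order.
Components are whitespace-separated tokens (hypothetical helper);
each run is joined with single spaces."
  (let ((tokens (split-string text)) ; nil SEPARATORS: split on whitespace, drop empties
        windows)
    ;; Keep going while at least N tokens remain from the current one.
    (while (and (> n 0) (nthcdr (1- n) tokens))
      (push (mapconcat #'identity (seq-take tokens n) " ") windows)
      ;; Shift the window forward by one component.
      (setq tokens (cdr tokens)))
    (nreverse windows)))
```

For example, `(wseqn-windows "Joseph Rudyard Kipling (30 December 1865 -" 5)' returns the three extracts "Joseph Rudyard Kipling (30 December", "Rudyard Kipling (30 December 1865" and "Kipling (30 December 1865 -".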
--------------------
Suppose I now reach the "Search Sequence"
---------
novelist, short-story writer, poet, and
---------
I then compare it with the following "Text Extracts":
--------
Joseph Rudyard Kipling (30 December
Rudyard Kipling (30 December 1865
Kipling (30 December 1865 -
(30 December 1865 - 18
December 1865 - 18 January
1865 - 18 January 1936)
--------
continuing with
--------
English novelist, short-story writer, poet,
novelist, short-story writer, poet, and
short-story writer, poet, and journalist.
writer, poet, and journalist. Kipling's
--------
where a match is found with the second extract.
The function then outputs the line number where the match was found, together with the
repeated phrase:
--------
4- novelist, short-story writer, poet, and
--------
This continues until the end of the buffer.
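Searching forward once per window costs roughly one buffer scan per word. A swapped-in alternative (not the forward-search approach of `wseqn') is to walk the buffer once and remember every N-component window in a hash table, reporting a window whenever it has been seen before. This is a sketch under the same assumption that components are whitespace-separated tokens; `wseqn-find-repeats' is a hypothetical name, and keys are downcased to mimic `case-fold-search'.

```elisp
(require 'seq)

(defun wseqn-find-repeats (n)
  "Return a list of (LINE . PHRASE) for N-component phrases seen earlier.
Components are runs of non-whitespace characters, so a phrase may
span a line break.  LINE is where the repeated occurrence starts.
Comparison ignores case.  Hypothetical helper, not a built-in."
  (let ((seen (make-hash-table :test #'equal))
        tokens repeats)
    ;; Pass 1: collect (LINE . COMPONENT) for the whole buffer.
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "[^ \t\n]+" nil t)
        (push (cons (line-number-at-pos (match-beginning 0))
                    (match-string-no-properties 0))
              tokens)))
    (setq tokens (nreverse tokens))
    ;; Pass 2: slide an N-component window over the token list.
    (while (and (> n 0) (nthcdr (1- n) tokens))
      (let* ((window (seq-take tokens n))
             (phrase (mapconcat #'cdr window " "))
             (key (downcase phrase)))
        (if (gethash key seen)
            ;; Already seen: report the line of this later occurrence.
            (push (cons (car (car window)) phrase) repeats)
          (puthash key t seen)))
      (setq tokens (cdr tokens)))
    (nreverse repeats)))
```

On the Kipling sample with N=5 this reports three windows starting on line 4, including (4 . "novelist, short-story writer, poet, and"). Unlike the forward search, each later occurrence is reported exactly once, however many times the phrase repeats.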
I have started with the following function:
---------
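(defun wseqn (n)
  "Search buffer for repeated phrases of N words.
A \"word\" here is any run of non-whitespace characters.  For each
N-word phrase, search forward for later occurrences, ignoring case
and differences in whitespace (including line breaks), and report
the line number of each occurrence together with the phrase."
  (interactive (list (read-number "How many words to search?: " 5)))
  (let ((case-fold-search t)
        (word-re "[^ \t\n]+"))
    (save-excursion
      (goto-char (point-min))
      ;; Each pass starts the N-word window at the next word.
      (while (re-search-forward word-re nil t)
        (let ((p1 (match-beginning 0))
              (next (match-end 0))
              (count 1))
          ;; Extend the window by N-1 further words; point then
          ;; sits at the end of the last word.
          (while (and (< count n) (re-search-forward word-re nil t))
            (setq count (1+ count)))
          (when (= count n)
            (let* ((phrase (replace-regexp-in-string
                            "[ \t\n]+" " "
                            (buffer-substring-no-properties p1 (point))))
                   ;; Quote the phrase, then let each single space
                   ;; match any run of whitespace.
                   (regex-search (replace-regexp-in-string
                                  " " "[ \t\n]+" (regexp-quote phrase) t t)))
              ;; Only forward search is necessary.  If it was repeated
              ;; behind, it would have been caught in previous
              ;; iterations.  A phrase occurring three or more times
              ;; is therefore reported by more than one window.
              (while (re-search-forward regex-search nil t)
                (message "%d- %s"
                         (line-number-at-pos (match-beginning 0))
                         phrase))))
          ;; Resume just after the first word of this window; this
          ;; also guarantees the outer loop makes progress.
          (goto-char next))))))
---------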
* Re: Locating repetitions of text sequences
From: Emanuel Berg @ 2022-10-24 3:23 UTC
To: help-gnu-emacs
Heime via Users list for the GNU Emacs text editor wrote:
> https://emacs.stackexchange.com/posts/74219/timeline
>
> Currently implementing a function that finds repeating
> sequences of text, length N.
`re-search-forward' or `how-many' depending on what you want
to do exactly ...
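Following that suggestion, a sketch that builds one regexp per phrase and uses `how-many' to count its occurrences and `re-search-forward' to list the lines where they start. The helper names are hypothetical; whitespace runs in the phrase are allowed to match any run of spaces, tabs or newlines, which the Kipling sample needs because its repeat runs from line 4 onto line 5.

```elisp
(defun wseqn-phrase-regexp (phrase)
  "Return a regexp matching PHRASE with flexible whitespace (hypothetical)."
  ;; Quote each word literally; join words with "one or more blanks".
  (mapconcat #'regexp-quote (split-string phrase) "[ \t\n]+"))

(defun wseqn-count (phrase)
  "Return how many times PHRASE occurs in the accessible buffer."
  ;; With explicit region bounds and INTERACTIVE nil, `how-many'
  ;; prints nothing and just returns the count.
  (how-many (wseqn-phrase-regexp phrase) (point-min) (point-max)))

(defun wseqn-lines (phrase)
  "Return the line numbers where occurrences of PHRASE start."
  (let ((regexp (wseqn-phrase-regexp phrase))
        lines)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (push (line-number-at-pos (match-beginning 0)) lines)))
    (nreverse lines)))
```

`how-many' is enough to decide whether a phrase repeats (count above 1); `re-search-forward' is needed when the positions themselves are wanted.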
--
underground experts united
https://dataswamp.org/~incal