unofficial mirror of help-gnu-emacs@gnu.org
* Locating repetitions of text sequences
From: Heime via Users list for the GNU Emacs text editor @ 2022-10-22 22:31 UTC (permalink / raw)
  To: Heime via Users list for the GNU Emacs text editor

https://emacs.stackexchange.com/posts/74219/timeline

I am currently implementing a function that finds repeated sequences of N words in a buffer.

Here is some sample text:

Joseph Rudyard Kipling (30 December 1865 - 18 January 1936)
 was an English novelist, short-story writer, poet, and
 journalist. He was born in British India, which inspired
 much of his work.  English novelist, short-story writer,
 poet, and journalist.

 Kipling's works of fiction include the Jungle Book duology
 (The Jungle Book, 1894; The Second Jungle Book, 1895).  His
 poems include "Mandalay" (1890), "Gunga Din" (1890), "The Gods
 of the Copybook Headings" (1919), and "The White Man's Burden"
 (1899).

With N=5, the first "Search Sequence" of five words is

--------

Joseph Rudyard Kipling (30 December

--------
which I match against consecutive "Text Extracts" (each shifted by one word):

--------

Joseph Rudyard Kipling (30 December

Rudyard Kipling (30 December 1865

Kipling (30 December 1865 -

--------

and so on.
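
The shifting step above can be sketched as a small helper. This is only an illustration under the assumption that words are whitespace-delimited (so "(30" and "-" each count as one word, as in the extracts shown); `my-word-windows` is a hypothetical name, not an existing Emacs function.

```elisp
(require 'cl-lib)

(defun my-word-windows (text n)
  "Return every run of N consecutive words in TEXT, in order.
Words are whitespace-delimited; each run is joined with single spaces.
Hypothetical helper for illustration only."
  (let ((words (split-string text))
        (windows '()))
    ;; Take the first N words, then drop one word and repeat.
    (while (and (> n 0) (>= (length words) n))
      (push (mapconcat #'identity (cl-subseq words 0 n) " ") windows)
      (setq words (cdr words)))
    (nreverse windows)))
```

Calling `(my-word-windows "Joseph Rudyard Kipling (30 December 1865 -" 5)` returns the three extracts listed above, in the same order.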

Then repeat again with the "Search Sequence"

Joseph Rudyard Kipling (30 December

--------------------

Suppose I now reach the "Search Sequence"

---------

novelist, short-story writer, poet, and

---------

and then compare it against the following "Text Extracts"

--------

Joseph Rudyard Kipling (30 December

Rudyard Kipling (30 December 1865

Kipling (30 December 1865 -

(30 December 1865 - 18

December 1865 - 18 January

1865 - 18 January 1936)

--------

continued with

--------

English novelist, short-story writer, poet,

novelist, short-story writer, poet, and

short-story writer, poet, and journalist.

writer, poet, and journalist. Kipling's

--------

where a match is found in the second extract.

One then outputs the line number where the match was found, together with the repeated phrase.

--------

4- novelist, short-story writer, poet, and

--------
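
For comparison, the same "line number plus repeated phrase" report can be produced in a single pass by remembering every N-word phrase in a hash table. This is a different technique from the search-based approach described here, shown only as a sketch; it assumes whitespace-delimited words and case-insensitive comparison, and `my-repeated-windows` is a hypothetical name.

```elisp
(require 'cl-lib)

(defun my-repeated-windows (text n)
  "Return a list of (LINE . PHRASE) for each N-word PHRASE in TEXT seen before.
LINE is the 1-based line on which the repeated occurrence starts.
Hypothetical helper for illustration only."
  (let ((seen (make-hash-table :test #'equal))
        (line 1)
        (words '())
        (result '()))
    ;; Collect (WORD . LINE) pairs, line by line.
    (dolist (l (split-string text "\n"))
      (dolist (w (split-string l))
        (push (cons w line) words))
      (setq line (1+ line)))
    (setq words (nreverse words))
    ;; Slide an N-word window over the pairs, one word at a time.
    (while (and (> n 0) (>= (length words) n))
      (let ((phrase (mapconcat #'car (cl-subseq words 0 n) " ")))
        (if (gethash (downcase phrase) seen)
            (push (cons (cdar words) phrase) result)
          (puthash (downcase phrase) t seen)))
      (setq words (cdr words)))
    (nreverse result)))
```

For example, `(my-repeated-windows "a b c\nd a b c" 3)` returns `((2 . "a b c"))`: the phrase first appears on line 1 and is reported where it recurs on line 2.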

This continues until the end of the buffer.

I have started with the following function:

---------

(defun wseqn (n)
  "Search buffer for repeating phrases with N number of words.
Each repetition is reported as LINE- PHRASE in *Messages*."
  (interactive (list (read-number "How many words to search?: " 5)))
  (save-excursion
    (goto-char (point-min))
    ;; Words are whitespace-delimited, as in the examples above, so
    ;; "(30" and "-" each count as one word.
    (while (progn (skip-chars-forward " \t\n") (not (eobp)))
      (let ((p1 (point)) (count 0) (case-fold-search t) p2)
        ;; Move N words forward; P2 ends up just after the last
        ;; character of those words.
        (while (and (< count n) (not (eobp)))
          (skip-chars-forward "^ \t\n")
          (setq p2 (point) count (1+ count))
          (skip-chars-forward " \t\n"))
        (when (= count n)
          (let* ((words (split-string
                         (buffer-substring-no-properties p1 p2)))
                 (phrase (mapconcat #'identity words " "))
                 ;; Let any run of whitespace separate the words, so
                 ;; a phrase broken across lines still matches.
                 (regex-search
                  (mapconcat #'regexp-quote words "[ \t\n]+")))
            ;; Only forward search is necessary.  If it was repeated
            ;; behind, it would have been caught in previous
            ;; iterations.  Overlapping phrases from the same
            ;; repeated passage are each reported separately.
            (goto-char p2)
            (while (re-search-forward regex-search nil t)
              (message "%d- %s"
                       (line-number-at-pos (match-beginning 0))
                       phrase))))
        ;; Restart one word after P1 for the next search sequence.
        (goto-char p1)
        (skip-chars-forward "^ \t\n")))))


* Re: Locating repetitions of text sequences
From: Emanuel Berg @ 2022-10-24  3:23 UTC (permalink / raw)
  To: help-gnu-emacs

Heime via Users list for the GNU Emacs text editor wrote:

> https://emacs.stackexchange.com/posts/74219/timeline
>
> Currently implementing a function that finds repeating
> sequences of text, length N.

`re-search-forward' or `how-many' depending on what you want
to do exactly ...
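
As a rough sketch of the `how-many' route: the helper below counts the occurrences of one phrase in the current buffer, treating any run of whitespace between words as equivalent so that a phrase wrapped over two lines is still counted. `my-count-phrase` is a hypothetical name, and building the regexp this way is an assumption about what is wanted, not something stated in the thread.

```elisp
(defun my-count-phrase (phrase)
  "Return how many times PHRASE occurs in the current buffer.
Runs of whitespace in PHRASE match any whitespace, including newlines.
Hypothetical helper for illustration only."
  (let ((regexp (mapconcat #'regexp-quote
                           (split-string phrase)
                           "[ \t\n]+")))
    ;; Count over the whole accessible buffer, not just after point.
    (how-many regexp (point-min) (point-max))))
```

A result greater than 1 means the phrase is repeated somewhere in the buffer.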

-- 
underground experts united
https://dataswamp.org/~incal



