From: Evgeny Roubinchtein <zhenya@freeshell.org>
Subject: Re: How to implement line sorting, uniquifying and counting function in emacs?
Date: Mon, 30 Sep 2002 05:15:16 GMT
Message-ID: <874rc8dpnh.fsf@freeshell.org>
In-Reply-To: b00bb831.0209291813.72d27e23@posting.google.com
,----
| In shell you can do this:
| cat file | sort | uniq -d | wc
|
| to count the repeated lines. You can also do
|
| cat file | sort | uniq -u | wc
|
| to count the unique lines.
|
| Sometimes I have to do this on windows platform where I do have emacs.
| This means that I cannot escape to shell and that route is not available.
|
| Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
| know the equivalent to wc.
`----
Assuming the text you are interested in is in a buffer, one approach is
to use the `sort-lines' function. Once the lines are sorted, it's
pretty easy to count unique and non-unique lines, because each line
only needs to be compared with the one before it.
(defun count-repeated-lines (&optional beg end)
  "Return (UNIQUE . REPEATED) for the lines between BEG and END.
UNIQUE is the number of distinct lines; REPEATED is the number of
additional occurrences of lines already seen."
  (let ((buf (current-buffer))
        (repeated-count 0)
        (unique-count 0)
        (cur-line nil)
        (prev-line nil))
    (with-temp-buffer
      ;; copy the region, starting at the beginning of BEG's line
      (insert-buffer-substring buf
                               (and beg
                                    (with-current-buffer buf
                                      (save-excursion
                                        (goto-char beg)
                                        (line-beginning-position))))
                               end)
      (sort-lines nil (point-min) (point-max))
      ;; put a dummy line before the text to make the loop simpler
      (goto-char (point-min))
      (insert "\n")
      (goto-char (point-min))
      (while (and (zerop (forward-line 1)) (/= (point) (point-max)))
        (setq cur-line (buffer-substring-no-properties (point)
                                                       (save-excursion (end-of-line)
                                                                       (point))))
        ;; after sorting, duplicates are adjacent to each other
        (if (and prev-line (string= prev-line cur-line))
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count)))
        (setq prev-line cur-line))
      (cons unique-count repeated-count))))
Instead of sorting lines, you could use Emacs's built-in hash tables
(built in as of GNU Emacs v21; I'm not sure which version of XEmacs
first introduced hash tables) to keep track of the lines you've
encountered so far. (You also don't need a temporary buffer in that case.)
(defun count-repeated-lines (&optional beg end)
  "Return (UNIQUE . REPEATED) for the lines between BEG and END.
UNIQUE is the number of distinct lines; REPEATED is the number of
additional occurrences of lines already seen."
  (let ((beg (or (and beg (save-excursion (goto-char beg) (line-beginning-position)))
                 (point-min)))
        (end (or end (point-max)))
        (lines-hash (make-hash-table :test #'equal))
        (unique-count 0)
        (repeated-count 0)
        (cur-line nil))
    (save-excursion
      (goto-char beg)
      (beginning-of-line)
      ;; the `eobp' check keeps the loop from spinning if END lies
      ;; past the end of the accessible part of the buffer
      (while (and (< (point) end) (not (eobp)))
        (setq cur-line (buffer-substring-no-properties (point)
                                                       (save-excursion (end-of-line)
                                                                       (point))))
        (if (gethash cur-line lines-hash)
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count))
          (puthash cur-line t lines-hash))
        (forward-line))
      (cons unique-count repeated-count))))
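Note that both functions above count distinct lines and extra
occurrences, which is not quite what `uniq -u' and `uniq -d' report.
If you want those numbers instead, one possibility is to store a
per-line count in the hash table and classify the lines afterwards.
This is only a sketch: the name `my-count-uniq-lines' is made up for
illustration, and it assumes an Emacs where `gethash' accepts a
default value and `line-end-position' exists.

```elisp
;; Hypothetical helper, not part of the functions above: tally how
;; often each line occurs, then classify lines the way uniq(1) does.
(defun my-count-uniq-lines (&optional beg end)
  "Return (ONCE . REPEATED) for the lines between BEG and END.
ONCE is the number of distinct lines that occur exactly once (what
`sort | uniq -u' would print); REPEATED is the number of distinct
lines that occur more than once (what `sort | uniq -d' would print)."
  (let ((end (or end (point-max)))
        (counts (make-hash-table :test #'equal))
        (once 0)
        (repeated 0))
    (save-excursion
      (goto-char (or beg (point-min)))
      (beginning-of-line)
      ;; first pass: count the occurrences of every line
      (while (and (< (point) end) (not (eobp)))
        (let ((line (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position))))
          (puthash line (1+ (gethash line counts 0)) counts))
        (forward-line 1)))
    ;; second pass: classify each distinct line by its count
    (maphash (lambda (_line n)
               (if (= n 1)
                   (setq once (1+ once))
                 (setq repeated (1+ repeated))))
             counts)
    (cons once repeated)))
```

With no arguments it looks at the whole accessible part of the
current buffer, so you can try it with M-: (my-count-uniq-lines) RET.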