From: Evgeny Roubinchtein
Newsgroups: gmane.emacs.help
Subject: Re: How to implement line sorting, uniquifying and counting function in emacs?
Date: Mon, 30 Sep 2002 05:15:16 GMT
Message-ID: <874rc8dpnh.fsf@freeshell.org>

,----
| In shell you can do this:
| cat file | sort | uniq -d | wc
|
| to count the repeated lines. You can also do
|
| cat file | sort | uniq -u | wc
|
| to count the unique lines.
|
| Sometimes I have to do this on windows platform where I do have emacs.
| This means that I cannot escape to shell and that route is not available.
|
| Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
| know the equivalent to wc.
`----

Assuming the text you are interested in is in a buffer, one approach is
to use the `sort-lines' function. Once the lines are sorted, duplicates
are adjacent, so counting unique and non-unique lines takes a single
pass that compares each line with the previous one:

(defun count-repeated-lines (&optional beg end)
  "Return (DISTINCT . DUPLICATES) for the lines between BEG and END.
DISTINCT is the number of different lines; DUPLICATES is the number
of lines that repeat an earlier one."
  (let ((buf (current-buffer))
        (repeated-count 0)
        (unique-count 0)
        (cur-line nil)
        (prev-line nil))
    (with-temp-buffer
      ;; Work on a sorted copy so the original buffer is left untouched.
      (insert-buffer-substring
       buf
       (and beg (with-current-buffer buf
                  (save-excursion
                    (goto-char beg)
                    (line-beginning-position))))
       end)
      (sort-lines nil (point-min) (point-max))
      ;; put a dummy line before the text to make the loop simpler
      (goto-char (point-min))
      (insert "\n")
      (goto-char (point-min))
      (while (and (zerop (forward-line 1))
                  (/= (point) (point-max)))
        (setq cur-line (buffer-substring-no-properties
                        (point)
                        (save-excursion (end-of-line) (point))))
        (if (and prev-line (string= prev-line cur-line))
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count)))
        (setq prev-line cur-line))
      (cons unique-count repeated-count))))

Instead of sorting lines, you could use Emacs's built-in hash tables
(built in as of GNU Emacs 21; I'm not sure which version of XEmacs first
introduced them) to keep track of the lines you've encountered so far.
You also don't need a temporary buffer in that case.

(defun count-repeated-lines (&optional beg end)
  "Return (DISTINCT . DUPLICATES) for the lines between BEG and END.
Same result as the sorting version, but uses a hash table of the
lines seen so far."
  (let ((beg (or (and beg (save-excursion
                            (goto-char beg)
                            (line-beginning-position)))
                 (point-min)))
        (end (or end (point-max)))
        (lines-hash (make-hash-table :test #'equal))
        (unique-count 0)
        (repeated-count 0)
        (cur-line nil))
    (save-excursion
      (goto-char beg)
      (beginning-of-line)
      (while (< (point) end)
        (setq cur-line (buffer-substring-no-properties
                        (point)
                        (save-excursion (end-of-line) (point))))
        (if (gethash cur-line lines-hash)
            ;; Seen before: this occurrence is a duplicate.
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count))
          (puthash cur-line t lines-hash))
        (forward-line))
      (cons unique-count repeated-count))))
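
Note that the functions above count distinct lines and duplicate occurrences, which is not quite what `uniq -u' (lines occurring exactly once) and `uniq -d' (one copy of each line occurring more than once) report. If you want those numbers, a variation on the hash-table idea is to store a tally per line instead of just `t'. The following is only a sketch: the name `my-count-uniq-lines' is made up for this example, and it assumes a `gethash' that accepts a default value as its third argument.

```elisp
;; Sketch only: `my-count-uniq-lines' is a hypothetical name.
(defun my-count-uniq-lines (&optional beg end)
  "Return (ONCE . MULTI) for the lines between BEG and END.
ONCE is the number of lines occurring exactly once, roughly what
`uniq -u | wc -l' reports on sorted input.  MULTI is the number of
distinct lines occurring more than once, roughly `uniq -d | wc -l'.
BEG and END default to the whole accessible buffer."
  (let ((counts (make-hash-table :test #'equal))
        (end (or end (point-max)))
        (once 0)
        (multi 0))
    (save-excursion
      (goto-char (or beg (point-min)))
      (beginning-of-line)
      ;; First pass: tally how many times each line occurs.
      (while (< (point) end)
        (let ((line (buffer-substring-no-properties
                     (point) (line-end-position))))
          (puthash line (1+ (gethash line counts 0)) counts))
        (forward-line 1)))
    ;; Second pass: classify each distinct line by its tally.
    (maphash (lambda (_line n)
               (if (= n 1)
                   (setq once (1+ once))
                 (setq multi (1+ multi))))
             counts)
    (cons once multi)))
```

For a buffer containing the six lines a, b, a, c, b, a, evaluating (my-count-uniq-lines) should give (1 . 2): only "c" occurs once, while "a" and "b" each occur more than once.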