unofficial mirror of help-gnu-emacs@gnu.org
* How to implement line sorting, uniquifying and counting function in emacs?
@ 2002-09-30  2:13 gnuist006
  2002-09-30  5:15 ` Evgeny Roubinchtein
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: gnuist006 @ 2002-09-30  2:13 UTC


In shell you can do this:

   cat file | sort | uniq -d | wc 

to count the repeated lines. You can also do

   cat file | sort | uniq -u | wc

to count the unique lines.

Sometimes I have to do this on windows platform where I do have emacs.
This means that I cannot escape to shell and that route is not available.

Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
know the equivalent to wc.

This is where some help is requested. I think that this is not only a
problem of lisp programming, but also algorithms. Which group has this
kind of expertise?

Cheers!
gnuist


* Re: How to implement line sorting, uniquifying and counting function in emacs?
  2002-09-30  2:13 How to implement line sorting, uniquifying and counting function in emacs? gnuist006
@ 2002-09-30  5:15 ` Evgeny Roubinchtein
  2002-09-30  5:23 ` Evgeny Roubinchtein
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Evgeny Roubinchtein @ 2002-09-30  5:15 UTC



,----
| In shell you can do this:
|    cat file | sort | uniq -d | wc 
| 
| to count the repeated lines. You can also do
| 
|    cat file | sort | uniq -u | wc
| 
| to count the unique lines.
| 
| Sometimes I have to do this on windows platform where I do have emacs.
| This means that I cannot escape to shell and that route is not available.
| 
| Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
| know the equivalent to wc.
`----

Assuming the text you are interested in is in a buffer, one approach is
to use the `sort-lines' function.  Once the lines are sorted, it's
pretty easy to count unique and non-unique lines.

(defun count-repeated-lines (&optional beg end)
  (let ((buf (current-buffer))
	(repeated-count 0)
	(unique-count 0)
	(cur-line nil)
	(prev-line nil))
    (with-temp-buffer
      (insert-buffer-substring buf 
			       (and beg 
				    (with-current-buffer buf
				      (save-excursion
					(goto-char beg)
					(line-beginning-position))))  
			       end)
      (sort-lines nil (point-min) (point-max))
      ;; put a dummy line before the text to make the loop simpler
      (goto-char (point-min))
      (insert "\n")
      (goto-char (point-min))
      (while (and (zerop (forward-line 1)) (/= (point) (point-max)))
	(setq cur-line (buffer-substring-no-properties (point) 
						       (save-excursion (end-of-line) 
								       (point))))
	(if (and prev-line (string= prev-line cur-line))
	    (setq repeated-count (1+ repeated-count))
	  (setq unique-count (1+ unique-count)))
	(setq prev-line cur-line))
      (cons unique-count repeated-count))))

Instead of sorting lines, you could use Emacs built-in hash tables
(built-in as of GNU Emacs v21, not sure what version of XEmacs first
introduced hash tables) to keep track of lines you've encountered so
far.  (You also don't need a temporary buffer in that case).

(defun count-repeated-lines (&optional beg end)
  "Return (UNIQUE . REPEATED) line counts for the text between BEG and END.
UNIQUE counts distinct lines; REPEATED counts every later duplicate."
  (let ((beg (or (and beg (save-excursion (goto-char beg) (line-beginning-position)))
                 (point-min)))
        (end (or end (point-max)))
        ;; `equal' compares strings by content, so lines can serve as keys.
        (lines-hash (make-hash-table :test #'equal))
        (unique-count 0)
        (repeated-count 0)
        (cur-line nil))
    (save-excursion
      ;; BEG is already at the beginning of a line here.
      (goto-char beg)
      (while (< (point) end)
        (setq cur-line (buffer-substring-no-properties (point)
                                                       (line-end-position)))
        (if (gethash cur-line lines-hash)
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count))
          (puthash cur-line t lines-hash))
        (forward-line))
      (cons unique-count repeated-count))))


* Re: How to implement line sorting, uniquifying and counting function in emacs?
  2002-09-30  2:13 How to implement line sorting, uniquifying and counting function in emacs? gnuist006
  2002-09-30  5:15 ` Evgeny Roubinchtein
@ 2002-09-30  5:23 ` Evgeny Roubinchtein
  2002-09-30  7:04 ` Marc Spitzer
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Evgeny Roubinchtein @ 2002-09-30  5:23 UTC


Oops... Just noticed I didn't really need to insert a dummy newline
when using a temp buffer.

(defun count-repeated-lines (&optional beg end)
  (let ((buf (current-buffer))
	(repeated-count 0)
	(unique-count 0)
	(cur-line nil)
	(prev-line nil))
    (with-temp-buffer
      (insert-buffer-substring buf 
			       (and beg 
				    (with-current-buffer buf
				      (save-excursion
					(goto-char beg)
					(line-beginning-position))))  
			       end)
      (sort-lines nil (point-min) (point-max))
      (goto-char (point-min))
      (while (/= (point) (point-max))
	(setq cur-line (buffer-substring-no-properties (point) 
						       (save-excursion (end-of-line) 
								       (point))))
	(if (and prev-line (string= prev-line cur-line))
	    (setq repeated-count (1+ repeated-count))
	  (setq unique-count (1+ unique-count)))
	(setq prev-line cur-line)
	(forward-line))
      (cons unique-count repeated-count))))


* Re: How to implement line sorting, uniquifying and counting function in emacs?
  2002-09-30  2:13 How to implement line sorting, uniquifying and counting function in emacs? gnuist006
  2002-09-30  5:15 ` Evgeny Roubinchtein
  2002-09-30  5:23 ` Evgeny Roubinchtein
@ 2002-09-30  7:04 ` Marc Spitzer
  2002-09-30  8:12 ` Jens Schmidt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Marc Spitzer @ 2002-09-30  7:04 UTC


gnuist006@hotmail.com (gnuist006) wrote in 
news:b00bb831.0209291813.72d27e23@posting.google.com:

> In shell you can do this:
> 
>    cat file | sort | uniq -d | wc 
> 
> to count the repeated lines. You can also do
> 
>    cat file | sort | uniq -u | wc
> 
> to count the unique lines.
> 
> Sometimes I have to do this on windows platform where I do have emacs.
> This means that I cannot escape to shell and that route is not available.
> 
> Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
> know the equivalent to wc.
> 
> This is where some help is requested. I think that this is not only a
> problem of lisp programming, but also algorithms. Which group has this
> kind of expertise?
> 
> Cheers!
> gnuist
> 

If elisp has hashes, do the following:
1: open the file
2: for each line, use it as the key of the hash
   and add 1 to the previous value; the first time,
   set it to 1
3a: for the number of distinct lines (sort | uniq),
    count the number of keys
3b: for uniq -d, count the number of keys that have
    a value > 1 (or add those values up if you want
    every repeated occurrence), then print the total
3c: for the truly unique lines (uniq -u), count
    the number of keys that have a value == 1 and
    print
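
A minimal Emacs Lisp sketch of the hash tally described in the steps
above, assuming the text is already in a buffer rather than read from a
file.  The names `my-line-tally' and `my-line-tally-summary' are made up
for illustration and are not existing Emacs functions.

```elisp
;; Sketch only: `my-line-tally' and `my-line-tally-summary' are
;; hypothetical names, not part of Emacs.
(defun my-line-tally (&optional beg end)
  "Return a hash table mapping each line between BEG and END to its count.
BEG and END default to the whole accessible buffer."
  (let ((tally (make-hash-table :test 'equal))
        ;; Clamp END so the loop below always terminates.
        (end (min (or end (point-max)) (point-max))))
    (save-excursion
      (goto-char (or beg (point-min)))
      (beginning-of-line)
      (while (< (point) end)
        (let ((line (buffer-substring-no-properties
                     (point) (line-end-position))))
          ;; Add 1 to the previous value, treating a missing key as 0.
          (puthash line (1+ (gethash line tally 0)) tally))
        (forward-line 1)))
    tally))

(defun my-line-tally-summary (&optional beg end)
  "Return a list (DISTINCT REPEATED SINGLE) for the lines between BEG and END.
DISTINCT is the number of different lines, REPEATED the number of
different lines seen more than once, and SINGLE the number of lines
seen exactly once."
  (let ((distinct 0) (repeated 0) (single 0))
    (maphash (lambda (_line count)
               (setq distinct (1+ distinct))
               (if (> count 1)
                   (setq repeated (1+ repeated))
                 (setq single (1+ single))))
             (my-line-tally beg end))
    (list distinct repeated single)))
```

In a buffer containing the six lines b, a, b, c, a, b, evaluating
(my-line-tally-summary) returns (3 2 1).  REPEATED corresponds to
"sort | uniq -d | wc -l" and SINGLE to "sort | uniq -u | wc -l"; no
sorting is needed because the hash table groups equal lines by itself.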

marc


* Re: How to implement line sorting, uniquifying and counting function in emacs?
  2002-09-30  2:13 How to implement line sorting, uniquifying and counting function in emacs? gnuist006
                   ` (2 preceding siblings ...)
  2002-09-30  7:04 ` Marc Spitzer
@ 2002-09-30  8:12 ` Jens Schmidt
  2002-09-30 16:14 ` Kaz Kylheku
  2002-10-01  8:00 ` Steven M. Haflich
  5 siblings, 0 replies; 7+ messages in thread
From: Jens Schmidt @ 2002-09-30  8:12 UTC


Sorting:

  M-x apropos sort-.* RET

Counting:

  M-x apropos count.*lines RET

The only non-trivial part is uniquifying of buffer lines:

  M-x query-replace-regexp ^\(.*^Q^J\)\1+ \1 RET

where you need to type ^Q^J as C-q C-j, of course.  A non-interactive
variant should be as easy as the interactive.
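
One possible non-interactive variant, as a sketch: it applies the same
regexp with `re-search-forward' and `replace-match'.  The function name
`my-uniquify-sorted-lines' is made up for illustration, and the code
assumes the region is already sorted and that every line, including the
last one, ends with a newline.

```elisp
;; Sketch only: `my-uniquify-sorted-lines' is a hypothetical name.
(defun my-uniquify-sorted-lines (beg end)
  "Collapse runs of identical adjacent lines between BEG and END.
Return the number of lines left in the region.  The region should
already be sorted, for example with `sort-lines'."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; Compare lines exactly; the default value of `case-fold-search'
      ;; would also merge lines that differ only in letter case.
      (let ((case-fold-search nil))
        ;; Same pattern as the interactive command: a whole line
        ;; followed by one or more exact copies of itself.
        (while (re-search-forward "^\\(.*\n\\)\\1+" nil t)
          ;; The second argument t keeps the case of the text as is.
          (replace-match "\\1" t)))
      (count-lines (point-min) (point-max)))))
```

Calling (my-uniquify-sorted-lines (point-min) (point-max)) in a buffer
holding the lines a, a, b, c, c, c leaves a, b, c and returns 3, which
covers the "sort | uniq | wc -l" case.  Recent Emacs versions also
provide `delete-duplicate-lines', which may be simpler if it is
available in your build.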


* Re: How to implement line sorting, uniquifying and counting function in emacs?
  2002-09-30  2:13 How to implement line sorting, uniquifying and counting function in emacs? gnuist006
                   ` (3 preceding siblings ...)
  2002-09-30  8:12 ` Jens Schmidt
@ 2002-09-30 16:14 ` Kaz Kylheku
  2002-10-01  8:00 ` Steven M. Haflich
  5 siblings, 0 replies; 7+ messages in thread
From: Kaz Kylheku @ 2002-09-30 16:14 UTC


gnuist006@hotmail.com (gnuist006) wrote in message news:<b00bb831.0209291813.72d27e23@posting.google.com>... 
> Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
> know the equivalent to wc.

Lisp does not have sort-lines. *Emacs* Lisp has sort-lines. Please do
not include the comp.lang.lisp newsgroup in Emacs Lisp discussions.

Think before you crosspost; your question ought to have been directed
to the Emacs newsgroup only.


* Re: How to implement line sorting, uniquifying and counting function in emacs?
  2002-09-30  2:13 How to implement line sorting, uniquifying and counting function in emacs? gnuist006
                   ` (4 preceding siblings ...)
  2002-09-30 16:14 ` Kaz Kylheku
@ 2002-10-01  8:00 ` Steven M. Haflich
  5 siblings, 0 replies; 7+ messages in thread
From: Steven M. Haflich @ 2002-10-01  8:00 UTC


gnuist006 wrote:

> Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
> know the equivalent to wc.

A pure Common Lisp equivalent is the following, reading standard-input:

(loop with last-line
     for line in (sort (loop as x = (read-line *standard-input* nil nil)
			  while x collect x)
		      #'string<)
     unless (equal last-line line)
     count 1
     do (setf last-line line))

Probably not what you wanted.  Probably meaningless to you.
