From: Juri Linkov <juri@jurta.org>
To: Dani Moncayo <dmoncayo@gmail.com>
Cc: Juanma Barranquero <lekktu@gmail.com>, 13032@debbugs.gnu.org
Subject: bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
Date: Sun, 02 Dec 2012 02:45:44 +0200
Message-ID: <87mwxxfhd3.fsf@mail.jurta.org>
In-Reply-To: <CAH8Pv0gEcUD3wM4QbHjhskW5TW9ztLgucPfYjLntjTPHRnDTJQ@mail.gmail.com> (Dani Moncayo's message of "Sat, 1 Dec 2012 10:08:49 +0100")

> * I'm thinking that the ADJACENT argument is kinda unnecessary.  I
> can't think of a use-case where someone wants to remove only the
> _adjacent_ duplicate lines but not the ones which aren't adjacent.
> So, I think that both the interface and the implementation could be
> simplified by removing that argument.

The ADJACENT argument is an optimization: each line is compared only
with the previous one, so no additional memory is needed (to store
previous lines in the cache).  This matters when the user needs to
delete duplicate lines in a large sorted file.
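
As a rough illustration of that memory trade-off (a sketch only:
`my-dedup-adjacent' and `my-dedup-all' are hypothetical helpers that
work on a list of strings and are not part of the patch), the adjacent
variant has to remember one previous line, whereas the general variant
has to remember every distinct line seen so far:

```elisp
;; Hypothetical helpers for illustration; not part of the patch.

(defun my-dedup-adjacent (lines)
  "Return LINES without adjacent repeats.
Only the previous line is remembered, so memory use is constant."
  (let (prev result)
    (dolist (line lines)
      (unless (equal line prev)
        (push line result))
      (setq prev line))
    (nreverse result)))

(defun my-dedup-all (lines)
  "Return LINES without any repeats, keeping the first occurrence.
Every distinct line is stored in a hash table, so memory use grows
with the number of distinct lines."
  (let ((seen (make-hash-table :test 'equal))
        result)
    (dolist (line lines)
      (unless (gethash line seen)
        (puthash line t seen)
        (push line result)))
    (nreverse result)))
```

On sorted input such as ("a" "a" "b" "b" "c") both helpers return
("a" "b" "c").  On unsorted input such as ("a" "b" "a") only
`my-dedup-all' removes the second "a".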

> * Why is the INTERACTIVE argument needed?  I mean, can't that info
> (whether the function has been called interactively) be retrieved
> using some Lisp primitive?

There is `called-interactively-p', but as I understand it, it is unreliable.
This is why other similar commands like `flush-lines', `keep-lines',
`how-many' use the INTERACTIVE argument.  They use it for two purposes:
to decide whether the active region should be used, and to decide whether
the message should be displayed.
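
A minimal sketch of that convention, assuming a hypothetical command
`my-count-blank-lines' that is not part of the patch: the `interactive'
spec passes t as the last argument, so the body can tell an interactive
call from a Lisp call without `called-interactively-p'.  Only the
message-related use of the argument is shown here, not the
active-region one:

```elisp
;; Hypothetical command for illustration; not part of the patch.
(defun my-count-blank-lines (rstart rend &optional interactive)
  "Return the number of blank lines between RSTART and REND.
When INTERACTIVE is non-nil, also print the count in the echo area."
  (interactive
   ;; The trailing t is what marks the call as interactive.
   (list (region-beginning) (region-end) t))
  (let ((count 0))
    (save-excursion
      (goto-char rstart)
      (while (and (< (point) rend) (not (eobp)))
        ;; A line counts as blank if it holds only spaces and tabs.
        (when (looking-at-p "[ \t]*$")
          (setq count (1+ count)))
        (forward-line 1)))
    (when interactive
      (message "%d blank line%s" count (if (= count 1) "" "s")))
    count))
```

Called from Lisp as (my-count-blank-lines (point-min) (point-max)), it
returns the count silently; invoked with M-x, it also prints the count.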

> * In case the INTERACTIVE argument is indeed necessary, it should be
> explained in the docstring, no?

Yes, below I copied this part from the docstring of `how-many'.

> * I think that the docstring should explain also the return value
> (number of duplicate lines deleted).

Coincidentally, the return value will be explained in the same part
of the docstring.

The remaining problem is to decide where to put this command.
The file replace.el is unsuitable because unlike `flush-lines' and
unlike `how-many', `delete-duplicate-lines' doesn't use regexps.

It seems the right place is sort.el because it also contains a related
command `reverse-region'.  This patch puts `delete-duplicate-lines'
after `reverse-region' at the end of sort.el:

=== modified file 'lisp/sort.el'
--- lisp/sort.el	2012-08-03 08:15:24 +0000
+++ lisp/sort.el	2012-12-02 00:44:42 +0000
@@ -562,6 +562,59 @@ (defun reverse-region (beg end)
 	(setq ll (cdr ll)))
       (insert (car ll)))))
 
+;;;###autoload
+(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
+  "Delete duplicate lines in the region between RSTART and REND.
+
+If REVERSE is nil, search and delete duplicates forward keeping the first
+occurrence of duplicate lines.  If REVERSE is non-nil (when called
+interactively with C-u prefix), search and delete duplicates backward
+keeping the last occurrence of duplicate lines.
+
+If ADJACENT is non-nil (when called interactively with two C-u prefixes),
+delete repeated lines only if they are adjacent.
+
+When called from Lisp and INTERACTIVE is omitted or nil, return the number
+of deleted duplicate lines, do not print it; if INTERACTIVE is t, the
+function behaves in all respects as if it had been called interactively."
+  (interactive
+   (progn
+     (barf-if-buffer-read-only)
+     (list (region-beginning) (region-end)
+	   (equal current-prefix-arg '(4))
+	   (equal current-prefix-arg '(16))
+	   t)))
+  (let ((lines (unless adjacent (make-hash-table :test 'equal)))
+	line prev-line
+	(count 0)
+	(rstart (copy-marker rstart))
+	(rend (copy-marker rend)))
+    (save-excursion
+      (goto-char (if reverse rend rstart))
+      (if (and reverse (bolp)) (forward-char -1))
+      (while (if reverse
+		 (and (> (point) rstart) (not (bobp)))
+	       (and (< (point) rend) (not (eobp))))
+	(setq line (buffer-substring-no-properties
+		    (line-beginning-position) (line-end-position)))
+	(if (if adjacent (equal line prev-line) (gethash line lines))
+	    (progn
+	      (delete-region (progn (forward-line 0) (point))
+			     (progn (forward-line 1) (point)))
+	      (if reverse (forward-line -1))
+	      (setq count (1+ count)))
+	  (if adjacent (setq prev-line line) (puthash line t lines))
+	  (forward-line (if reverse -1 1)))))
+    (set-marker rstart nil)
+    (set-marker rend nil)
+    (when interactive
+      (message "Deleted %d %sduplicate line%s%s"
+	       count
+	       (if adjacent "adjacent " "")
+	       (if (= count 1) "" "s")
+	       (if reverse " backward" "")))
+    count))
+
 (provide 'sort)
 
 ;;; sort.el ends here
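
For a quick check from Lisp, here is a small sketch that assumes the
command above is installed.  `my-demo-dedup' is a hypothetical helper,
not part of the patch.  It runs the command over a temporary buffer
with the default forward search and returns the resulting text:

```elisp
;; Hypothetical helper for illustration; not part of the patch.
(defun my-demo-dedup (text)
  "Return TEXT with duplicate lines removed, keeping first occurrences."
  (with-temp-buffer
    (insert text)
    ;; REVERSE, ADJACENT and INTERACTIVE are omitted, so the search runs
    ;; forward over all lines and no message is printed.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))
```

With the input "b\na\nb\nc\na\n" this should return "b\na\nc\n": the
second "b" and the second "a" are deleted.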






