From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juri Linkov
Newsgroups: gmane.emacs.bugs
Subject: bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
Date: Sun, 02 Dec 2012 02:45:44 +0200
Organization: JURTA
Message-ID: <87mwxxfhd3.fsf@mail.jurta.org>
References: <87obig2ap2.fsf@mail.jurta.org> <874nk728ci.fsf@mail.jurta.org> <874nk63aps.fsf@mail.jurta.org>
Cc: Juanma Barranquero, 13032@debbugs.gnu.org
To: Dani Moncayo
In-Reply-To: (Dani Moncayo's message of "Sat, 1 Dec 2012 10:08:49 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (x86_64-pc-linux-gnu)

> * I'm thinking that the ADJACENT argument is kinda unnecessary.
> I can't think of a use-case where someone wants to remove only the
> _adjacent_ duplicate lines but not the ones which aren't adjacent.
> So, I think that both the interface and the implementation could be
> simplified by removing that argument.

The ADJACENT argument is an optimization: it requires no additional
memory, because no hash table of previously seen lines has to be kept.
This matters when the user needs to delete duplicate lines in a large
sorted file.

> * Why is needed the INTERACTIVE argument?  I mean, Cannot that info
> (whether the function has been called interactively) be retrieved
> using some Lisp primitive?

There is `called-interactively-p', but as I understand it, it is
unreliable.  This is why other similar commands like `flush-lines',
`keep-lines' and `how-many' use the INTERACTIVE argument.  They use it
for two purposes: to decide whether the active region should be used,
and to decide whether a message should be displayed.

> * In case the INTERACTIVE argument is indeed necessary, it should be
> explained in the docstring, no?

Yes, below I copied this part from the docstring of `how-many'.

> * I think that the docstring should explain also the return value
> (number of duplicate lines deleted).

Coincidentally, the return value is explained in the same part of the
docstring.

The remaining problem is to decide where to put this command.  The file
replace.el is unsuitable because, unlike `flush-lines' and `how-many',
`delete-duplicate-lines' doesn't use regexps.  It seems the right place
is sort.el because it also contains a related command, `reverse-region'.
This patch puts `delete-duplicate-lines' after `reverse-region' at the
end of sort.el:

=== modified file 'lisp/sort.el'
--- lisp/sort.el	2012-08-03 08:15:24 +0000
+++ lisp/sort.el	2012-12-02 00:44:42 +0000
@@ -562,6 +562,59 @@ (defun reverse-region (beg end)
 	  (setq ll (cdr ll)))
 	(insert (car ll)))))
 
+;;;###autoload
+(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
+  "Delete duplicate lines in the region between RSTART and REND.
+
+If REVERSE is nil, search and delete duplicates forward keeping the first
+occurrence of duplicate lines.  If REVERSE is non-nil (when called
+interactively with C-u prefix), search and delete duplicates backward
+keeping the last occurrence of duplicate lines.
+
+If ADJACENT is non-nil (when called interactively with two C-u prefixes),
+delete repeated lines only if they are adjacent.
+
+When called from Lisp and INTERACTIVE is omitted or nil, return the number
+of deleted duplicate lines, do not print it; if INTERACTIVE is t, the
+function behaves in all respects as if it had been called interactively."
+  (interactive
+   (progn
+     (barf-if-buffer-read-only)
+     (list (region-beginning) (region-end)
+           (equal current-prefix-arg '(4))
+           (equal current-prefix-arg '(16))
+           t)))
+  (let ((lines (unless adjacent (make-hash-table :weakness 'key :test 'equal)))
+        line prev-line
+        (count 0)
+        (rstart (copy-marker rstart))
+        (rend (copy-marker rend)))
+    (save-excursion
+      (goto-char (if reverse rend rstart))
+      (if (and reverse (bolp)) (forward-char -1))
+      (while (if reverse
+                 (and (> (point) rstart) (not (bobp)))
+               (and (< (point) rend) (not (eobp))))
+        (setq line (buffer-substring-no-properties
+                    (line-beginning-position) (line-end-position)))
+        (if (if adjacent (equal line prev-line) (gethash line lines))
+            (progn
+              (delete-region (progn (forward-line 0) (point))
+                             (progn (forward-line 1) (point)))
+              (if reverse (forward-line -1))
+              (setq count (1+ count)))
+          (if adjacent (setq prev-line line) (puthash line t lines))
+          (forward-line (if reverse -1 1)))))
+    (set-marker rstart nil)
+    (set-marker rend nil)
+    (when interactive
+      (message "Deleted %d %sduplicate line%s%s"
+               count
+               (if adjacent "adjacent " "")
+               (if (= count 1) "" "s")
+               (if reverse " backward " "")))
+    count))
+
 (provide 'sort)
 
 ;;; sort.el ends here
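
For illustration, here is a minimal sketch of how the command could be
called from Lisp once the patch is applied.  The helper name
`my-dedup-demo' is hypothetical and not part of the patch; it only
assumes that `delete-duplicate-lines' is defined with the argument
order shown above.  It compares the default mode, which remembers every
line seen so far, with the ADJACENT mode, which compares each line only
with the one before it:

```elisp
;; Hypothetical helper for illustration only; not part of the patch.
;; Assumes `delete-duplicate-lines' is already defined.
(defun my-dedup-demo (text &optional adjacent)
  "Run `delete-duplicate-lines' over TEXT in a temporary buffer.
Return a cons (COUNT . RESULT), where COUNT is the number of deleted
lines and RESULT is the remaining buffer text.  ADJACENT is passed
through unchanged."
  (with-temp-buffer
    (insert text)
    ;; REVERSE is nil (search forward).  INTERACTIVE is omitted, so the
    ;; count is returned and no message is displayed.
    (let ((count (delete-duplicate-lines (point-min) (point-max)
                                         nil adjacent)))
      (cons count (buffer-string)))))

;; Default mode deletes every later repeat of a line:
(my-dedup-demo "a\na\nb\na\n")      ; => (2 . "a\nb\n")
;; ADJACENT mode deletes only consecutive repeats:
(my-dedup-demo "a\na\nb\na\n" t)    ; => (1 . "a\nb\na\n")
```

On sorted input both modes delete the same lines, because all
duplicates are already adjacent; that is the case where ADJACENT saves
the memory of the hash table.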