From: Juri Linkov
Newsgroups: gmane.emacs.bugs
Subject: bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
Date: Sat, 01 Dec 2012 02:34:41 +0200
Organization: JURTA
Message-ID: <874nk63aps.fsf@mail.jurta.org>
References: <87obig2ap2.fsf@mail.jurta.org> <874nk728ci.fsf@mail.jurta.org>
To: Dani Moncayo
Cc: Juanma Barranquero, 13032@debbugs.gnu.org
In-Reply-To: (Dani Moncayo's message of "Fri, 30 Nov 2012 08:51:34 +0100")

>>>> C-u M-| awk -- '!a[$0]++' RET
>
> But I agree that it would be even better if
> `delete-duplicate-lines' did TRT even when the lines are not sorted.
> (I've just tested this feature in MS-Excel, and it is so: it doesn't
> require that the lines are previously sorted)

Actually I use a slightly different command:

  C-u M-| tac | awk -- '!a[$0]++' | tac RET

because I need to keep the last duplicate line instead of the first.
The first `tac' reverses the lines, `awk' removes the duplicates
keeping the first occurrence, and the second `tac' reverses the lines
back, so the net effect is to keep the last duplicate.

So for `delete-duplicate-lines' to be useful in this case, it could
also support a reverse search that keeps the last duplicate.  The
docstrings of the various functions at
http://emacswiki.org/emacs/DuplicateLines describe this limitation as
"keeping first occurrence", so these functions are of no help.

Adding an argument to keep either the first or the last duplicate and
an argument to delete only adjacent lines, using the same algorithm as
awk and the same calling interface as `flush-lines', gives the
following small function.  It can be called with `C-u' to keep the
last duplicate line, and with `C-u C-u' to delete only adjacent
duplicate lines:

(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
  "Delete duplicate lines in the region between RSTART and REND.
If REVERSE is nil, search and delete duplicates forward keeping the first
occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
duplicates backward keeping the last occurrence of duplicate lines.
If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (list (region-beginning) (region-end)
           (equal current-prefix-arg '(4))   ; C-u: keep the last duplicate
           (equal current-prefix-arg '(16))  ; C-u C-u: adjacent lines only
           t)))
  ;; Use a strong `equal' hash table: the keys are fresh strings that
  ;; nothing else references, so a key-weak table could lose entries
  ;; at garbage collection and miss duplicates.
  (let ((lines (unless adjacent (make-hash-table :test 'equal)))
        line prev-line
        (count 0)
        (rstart (copy-marker rstart))
        (rend (copy-marker rend)))
    (save-excursion
      (goto-char (if reverse rend rstart))
      (if (and reverse (bolp)) (forward-char -1))
      (while (if reverse
                 (and (> (point) rstart) (not (bobp)))
               (and (< (point) rend) (not (eobp))))
        (setq line (buffer-substring-no-properties
                    (line-beginning-position) (line-end-position)))
        (if (if adjacent (equal line prev-line) (gethash line lines))
            ;; Duplicate: delete the whole line including its newline.
            (progn
              (delete-region (progn (forward-line 0) (point))
                             (progn (forward-line 1) (point)))
              (if reverse (forward-line -1))
              (setq count (1+ count)))
          ;; First time seen: remember the line and move on.
          (if adjacent (setq prev-line line) (puthash line t lines))
          (forward-line (if reverse -1 1)))))
    (set-marker rstart nil)
    (set-marker rend nil)
    (when interactive
      (message "Deleted %d %sduplicate line%s%s"
               count
               (if adjacent "adjacent " "")
               (if (= count 1) "" "s")
               (if reverse " backward" "")))
    count))
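
For comparison, the three behaviours discussed here (keep the first
duplicate, keep the last duplicate, delete only adjacent duplicates)
can be sketched outside Emacs with the same awk idiom.  This is only
an illustration, assuming GNU `tac' is available; the function names
keep_first, keep_last and adjacent_only are made up for this example
and are not part of any existing tool:

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Keep the first occurrence of each line (the awk idiom quoted above).
keep_first() {
    awk '!seen[$0]++'
}

# Keep the last occurrence: reverse, keep the first, reverse back.
# Assumes GNU tac is installed.
keep_last() {
    tac | awk '!seen[$0]++' | tac
}

# Delete a line only when it repeats the line immediately before it.
adjacent_only() {
    awk 'NR == 1 || $0 != prev { print } { prev = $0 }'
}

printf 'a\nb\na\nc\nb\n' | keep_first      # prints a, b, c (one per line)
printf 'a\nb\na\nc\nb\n' | keep_last       # prints a, c, b (one per line)
printf 'a\na\nb\na\n'    | adjacent_only   # prints a, b, a (one per line)
```

In terms of the function above, keep_first corresponds to calling it
with no prefix argument, keep_last to a non-nil REVERSE, and
adjacent_only to a non-nil ADJACENT.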