From: Juri Linkov <juri@jurta.org>
To: Dani Moncayo <dmoncayo@gmail.com>
Cc: Juanma Barranquero <lekktu@gmail.com>, 13032@debbugs.gnu.org
Subject: bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
Date: Sat, 01 Dec 2012 02:34:41 +0200
Message-ID: <874nk63aps.fsf@mail.jurta.org>
In-Reply-To: <CAH8Pv0isa6HCA6rMJmyATkLEHixzLG3oWJ6_FW=M=4Z0CyBFgQ@mail.gmail.com> (Dani Moncayo's message of "Fri, 30 Nov 2012 08:51:34 +0100")

>>>>   C-u M-| awk -- '!a[$0]++' RET
>
> But I agree that it would be even better if `delete-duplicate-lines'
> did TRT even when the lines are not sorted.  (I've just tested this
> feature in MS-Excel, and it is so: it doesn't require that the lines
> be sorted beforehand)

Actually I use a slightly different command:

   C-u M-| tac | awk -- '!a[$0]++' | tac RET

because I need to keep the last duplicate line instead of the first.
The first `tac' reverses the lines, `awk' removes the duplicates keeping
the first occurrence, and the second `tac' reverses the lines back, so
the last duplicate in the original order is the one that is kept.
For `delete-duplicate-lines' to be useful in this case, it should also
support a reverse search that keeps the last duplicate.
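
To make the difference concrete, here is a small sketch on a made-up
four-line input.  The helper names `keep_first' and `keep_last' are only
illustrative, and `tac' is assumed to be the GNU coreutils one:

```shell
# Illustrative helpers; the names are not part of any existing tool.
# keep_first: print each distinct line only the first time it is seen.
keep_first() { awk -- '!a[$0]++'; }

# keep_last: reverse, keep first occurrences, reverse back, so the
# last occurrence of each line survives in the original order.
keep_last() { tac | awk -- '!a[$0]++' | tac; }

printf 'a\nb\na\nc\n' | keep_first   # a, b, c (the second "a" is dropped)
printf 'a\nb\na\nc\n' | keep_last    # b, a, c (the first "a" is dropped)
```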

You can see this limitation described in the docstrings of various functions at
http://emacswiki.org/emacs/DuplicateLines
as "keeping first occurrence", so those functions are of no help here.

Adding one argument to keep either the first or the last duplicate and
another to delete only adjacent duplicates, using an algorithm like the
one in awk and a calling interface like that of `flush-lines', gives
the following small function.  Call it with `C-u' to keep the last
duplicate line, and with `C-u C-u' to delete only adjacent duplicates:

(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
  "Delete duplicate lines in the region between RSTART and REND.
If REVERSE is nil, search and delete duplicates forward keeping the first
occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
duplicates backward keeping the last occurrence of duplicate lines.
If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (list (region-beginning) (region-end)
           (equal current-prefix-arg '(4))
           (equal current-prefix-arg '(16))
           t)))
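  ;; Seen lines are recorded in a hash table, like `a[$0]' in awk;
  ;; ADJACENT mode only needs to remember the previous line.
  ;; The table must not be weak: the key strings are referenced only
  ;; by the table, so a weak table could lose them on garbage collection.
  (let ((lines (unless adjacent (make-hash-table :test 'equal)))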
        line prev-line
        (count 0)
        (rstart (copy-marker rstart))
        (rend (copy-marker rend)))
    (save-excursion
      (goto-char (if reverse rend rstart))
      (if (and reverse (bolp)) (forward-char -1))
      (while (if reverse
                 (and (> (point) rstart) (not (bobp)))
               (and (< (point) rend) (not (eobp))))
        (setq line (buffer-substring-no-properties
                    (line-beginning-position) (line-end-position)))
        (if (if adjacent (equal line prev-line) (gethash line lines))
            (progn
              (delete-region (progn (forward-line 0) (point))
                             (progn (forward-line 1) (point)))
              (if reverse (forward-line -1))
              (setq count (1+ count)))
          (if adjacent (setq prev-line line) (puthash line t lines))
          (forward-line (if reverse -1 1)))))
    (set-marker rstart nil)
    (set-marker rend nil)
    (when interactive
      (message "Deleted %d %sduplicate line%s%s"
               count
               (if adjacent "adjacent " "")
               (if (= count 1) "" "s")
               (if reverse " backward" "")))
    count))
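
For comparison with the `C-u C-u' case, deleting only adjacent
duplicates is roughly what `uniq' does in the shell.  A small sketch on
made-up input, where the helper names are illustrative only:

```shell
# Illustrative helpers; the names are not part of any existing tool.
# drop_adjacent: remove a line only when it repeats the line just before it.
drop_adjacent() { uniq; }

# drop_all: remove every later repeat, whether adjacent or not.
drop_all() { awk -- '!a[$0]++'; }

printf 'a\na\nb\na\n' | drop_adjacent   # a, b, a (the final "a" is not adjacent to the others)
printf 'a\na\nb\na\n' | drop_all        # a, b
```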





