unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
@ 2012-11-29 19:23 Dani Moncayo
  2012-11-29 20:49 ` Juanma Barranquero
  2012-11-30  0:31 ` Juri Linkov
  0 siblings, 2 replies; 25+ messages in thread
From: Dani Moncayo @ 2012-11-29 19:23 UTC (permalink / raw)
  To: 13032

Severity: wishlist

Recent versions of MS-Excel and also LibreOffice's Calc have a feature
that I find very useful: the ability to remove duplicate lines from a
given list (range).  I think it would be worth adding such a feature
to Emacs.

That is: provide a function `delete-duplicate-lines' (or some such)
that removes all duplicate lines in the active region and prints in
the echo area a message like "Duplicate lines removed: <n>".

TIA.

PS: There has been some discussion about this in this thread:
http://lists.gnu.org/archive/html/help-gnu-emacs/2012-11/msg00417.html.
 Jambunathan K provided a possible implementation, but it lacks the
message in the echo area (which I think is important).


In GNU Emacs 24.3.50.1 (i386-mingw-nt6.1.7601)
 of 2012-11-28 on MS-W7-DANI
Bzr revision: 111021 jay.p.belanger@gmail.com-20121128045113-o6xvwncuryx8al3u
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --with-gcc (4.7) --no-opt --enable-checking --cflags
 -Ic:/emacs/libs/libXpm-3.5.10/include -Ic:/emacs/libs/libXpm-3.5.10/src
 -Ic:/emacs/libs/libpng-1.2.37-lib/include -Ic:/emacs/libs/zlib-1.2.5
 -Ic:/emacs/libs/giflib-4.1.4-1-lib/include
 -Ic:/emacs/libs/jpeg-6b-4-lib/include
 -Ic:/emacs/libs/tiff-3.8.2-1-lib/include
 -Ic:/emacs/libs/libxml2-2.7.8-w32-bin/include/libxml2
 -Ic:/emacs/libs/gnutls-3.0.9-w32-bin/include
 -Ic:/emacs/libs/libiconv-1.9.2-1-lib/include'

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1252
  default enable-multibyte-characters: t

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-29 19:23 bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command Dani Moncayo
@ 2012-11-29 20:49 ` Juanma Barranquero
  2012-11-29 21:43   ` Dani Moncayo
  2012-11-30  0:31 ` Juri Linkov
  1 sibling, 1 reply; 25+ messages in thread
From: Juanma Barranquero @ 2012-11-29 20:49 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: 13032

On Thu, Nov 29, 2012 at 8:23 PM, Dani Moncayo <dmoncayo@gmail.com> wrote:
> Severity: wishlist

> That is: provide a function `delete-duplicate-lines' (or some such)
> that removes all duplicate lines in the active region and prints in
> the echo area a message like "Duplicate lines removed: <n>".

Perhaps you can work from this (not very well tested):

(defun delete-duplicate-lines (beg end)
  "Delete consecutive duplicate lines in region BEG..END."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((last (buffer-substring (line-beginning-position)
                                    (line-end-position)))
            (removed 0)
            current)
        (forward-line 1)
        ;; The narrowing bounds the loop, so `eobp' alone is a sufficient
        ;; end test; END itself goes stale once lines are deleted.
        (while (not (eobp))
          (setq current (buffer-substring (line-beginning-position)
                                          (line-end-position)))
          (if (string= last current)
              (progn
                ;; Use `delete-region' rather than `kill-line' so the
                ;; kill ring is left alone.
                (delete-region (line-beginning-position)
                               (line-beginning-position 2))
                (setq removed (1+ removed)))
            (setq last current)
            (forward-line 1)))
        (message "Duplicate lines removed: %d" removed)))))






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-29 20:49 ` Juanma Barranquero
@ 2012-11-29 21:43   ` Dani Moncayo
  2012-11-29 22:45     ` Juanma Barranquero
  0 siblings, 1 reply; 25+ messages in thread
From: Dani Moncayo @ 2012-11-29 21:43 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 13032

> Perhaps you can work from this (not very well tested):

Thank you Juanma.  I've given it a quick try and it seems to work.

I've only seen a minor detail that I don't like: when the command does
nothing (because there are no consecutive duplicate lines), the region
remains active.  But this is a general problem in Emacs which I've
already complained about (bug #10056).  IMO, the mark should be
deactivated after every command that operates on the active region,
without regard to whether the buffer was changed or not.  There could
be some exception, but this should be the general principle.

I'll put your version in my init file for now, while the maintainers
decide whether it is appropriate to add this command to Emacs or not.

Thanks.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-29 21:43   ` Dani Moncayo
@ 2012-11-29 22:45     ` Juanma Barranquero
  0 siblings, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2012-11-29 22:45 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: 13032

> I've only seen a minor detail that I don't like: when the command does
> nothing (because there are no consecutive duplicate lines), the region
> remains active.

Add a call to deactivate-mark at the end.

    Juanma






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-29 19:23 bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command Dani Moncayo
  2012-11-29 20:49 ` Juanma Barranquero
@ 2012-11-30  0:31 ` Juri Linkov
  2012-11-30  0:46   ` Juanma Barranquero
  2012-11-30  7:51   ` Dani Moncayo
  1 sibling, 2 replies; 25+ messages in thread
From: Juri Linkov @ 2012-11-30  0:31 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: 13032

> That is: provide a function `delete-duplicate-lines' (or some such)
> that removes all duplicate lines in the active region and prints in
> the echo area a message like "Duplicate lines removed: <n>".

This is what I currently use to delete duplicate lines:

  C-u M-| awk -- '!a[$0]++' RET

Do you intend to create a Lisp function with the same result?
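
For readers who don't know awk: `!a[$0]++' prints a line only the first
time it is seen, using an associative array keyed by the line text.  A
minimal sketch of the same idea in Emacs Lisp, working on a list of
strings rather than on a buffer.  The name `my-unique-lines' is
hypothetical, not an existing Emacs function:

```elisp
(defun my-unique-lines (lines)
  "Return LINES without repeats, keeping the first occurrence of each."
  (let ((seen (make-hash-table :test 'equal)) ; line text -> t
        (result '()))
    (dolist (line lines)
      (unless (gethash line seen)
        (puthash line t seen)
        (push line result)))
    ;; `push' built the list backwards; restore the original order.
    (nreverse result)))
```

The hash table plays the role of awk's array `a': a line is kept only
if it has no entry yet.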






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  0:31 ` Juri Linkov
@ 2012-11-30  0:46   ` Juanma Barranquero
  2012-11-30  0:50     ` Juanma Barranquero
  2012-11-30  1:12     ` Juri Linkov
  2012-11-30  7:51   ` Dani Moncayo
  1 sibling, 2 replies; 25+ messages in thread
From: Juanma Barranquero @ 2012-11-30  0:46 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13032

On Fri, Nov 30, 2012 at 1:31 AM, Juri Linkov <juri@jurta.org> wrote:

>   C-u M-| awk -- '!a[$0]++' RET

Isn't

  C-u M-| uniq RET

shorter and easier to type?






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  0:46   ` Juanma Barranquero
@ 2012-11-30  0:50     ` Juanma Barranquero
  2012-11-30  0:57       ` Juri Linkov
  2012-11-30  1:12     ` Juri Linkov
  1 sibling, 1 reply; 25+ messages in thread
From: Juanma Barranquero @ 2012-11-30  0:50 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13032

(FWIW, yes, I'm aware that your awk script and uniq don't do the same
thing, but I think what Dani requested was in fact removing
consecutive duplicates...)






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  0:50     ` Juanma Barranquero
@ 2012-11-30  0:57       ` Juri Linkov
  2012-11-30  1:02         ` Juanma Barranquero
  0 siblings, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-11-30  0:57 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 13032

> (FWIW, yes, I'm aware that your awk script and uniq don't do the same
> thing, but I think what Dani requested was in fact removing
> consecutive duplicates...)

I wonder why only consecutive duplicates?  The existing functions
`delete-duplicates' and `delete-dups' that operate on lists
don't delete just consecutive duplicates.  They delete all duplicates.
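
As a small illustration of that list behaviour (the wrapper name
`my-delete-dups-demo' is made up for this example), `delete-dups'
removes repeats wherever they occur and keeps the first occurrence:

```elisp
(defun my-delete-dups-demo ()
  "Return a list with non-adjacent repeats removed by `delete-dups'."
  ;; `delete-dups' modifies its argument destructively, so build a
  ;; fresh list with `list' instead of passing a quoted constant.
  (delete-dups (list "a" "b" "a" "c" "b")))
```

Here the second "a" and the second "b" are dropped even though neither
is adjacent to its first occurrence.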






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  0:57       ` Juri Linkov
@ 2012-11-30  1:02         ` Juanma Barranquero
  0 siblings, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2012-11-30  1:02 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13032

On Fri, Nov 30, 2012 at 1:57 AM, Juri Linkov <juri@jurta.org> wrote:

> I wonder why only consecutive duplicates?  The existing functions
> `delete-duplicates' and `delete-dups' that operate on lists
> don't delete just consecutive duplicates.  They delete all duplicates.

Yes. Dani has not said what his use case is.

    Juanma






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  0:46   ` Juanma Barranquero
  2012-11-30  0:50     ` Juanma Barranquero
@ 2012-11-30  1:12     ` Juri Linkov
  2012-11-30  7:51       ` Dani Moncayo
  1 sibling, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-11-30  1:12 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 13032

>>   C-u M-| awk -- '!a[$0]++' RET
>
> Isn't
>
>   C-u M-| uniq RET
>
> shorter and easier to type?

I use `uniq' only on files where lines are sorted.  OTOH, something like
'!a[$0]++' that is not limited to consecutive duplicates is better for
files where lines are not sorted such as log files, etc.






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  1:12     ` Juri Linkov
@ 2012-11-30  7:51       ` Dani Moncayo
  2012-12-01  0:34         ` Juri Linkov
  0 siblings, 1 reply; 25+ messages in thread
From: Dani Moncayo @ 2012-11-30  7:51 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Juanma Barranquero, 13032

>>>   C-u M-| awk -- '!a[$0]++' RET
>>
>> Isn't
>>
>>   C-u M-| uniq RET
>>
>> shorter and easier to type?
>
> I use `uniq' only on files where lines are sorted.  OTOH, something like
> '!a[$0]++' that is not limited to consecutive duplicates is better for
> files where lines are not sorted such as log files, etc.

My use cases usually involve compacting a collection of lines
gathered from several places.  So the compacting operation is normally
coupled with a sort operation.

Thus, the command provided by Juanma is good enough for these use
cases (I first do a `sort-lines' and then a `delete-duplicate-lines').

But I agree that it would be even better if `delete-duplicate-lines'
did TRT even when the lines are not sorted.  (I've just tested this
feature in MS-Excel, and it is so: it doesn't require the lines to be
sorted beforehand.)

Thank you.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  0:31 ` Juri Linkov
  2012-11-30  0:46   ` Juanma Barranquero
@ 2012-11-30  7:51   ` Dani Moncayo
  2012-12-04  7:04     ` Thierry Volpiatto
  1 sibling, 1 reply; 25+ messages in thread
From: Dani Moncayo @ 2012-11-30  7:51 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13032

> This is what I currently use to delete duplicate lines:
>
>   C-u M-| awk -- '!a[$0]++' RET
>
> Do you intend to create a Lisp function with the same result?

I don't know awk, but I've tried that command and it seems to do what I
want: remove all duplicate lines in the region.  However, it doesn't
report the number of lines deleted, which is important to me.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  7:51       ` Dani Moncayo
@ 2012-12-01  0:34         ` Juri Linkov
  2012-12-01  9:08           ` Dani Moncayo
  0 siblings, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-12-01  0:34 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: Juanma Barranquero, 13032

>>>>   C-u M-| awk -- '!a[$0]++' RET
>
> But I agree that it would be even better if `delete-duplicate-lines'
> did TRT even when the lines are not sorted.  (I've just tested this
> feature in MS-Excel, and it is so: it doesn't require the lines to be
> sorted beforehand.)

Actually I use a slightly different command:

   C-u M-| tac | awk -- '!a[$0]++' | tac RET

because I need to keep the last duplicate line instead of the first.
`tac' reverses the lines, removes the duplicates keeping the first duplicate,
and another `tac' reverses lines back thus keeping the last duplicate.
So for `delete-duplicate-lines' to be useful in this case it could support
also the reverse search that keeps the last duplicate.
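
The reverse, deduplicate, reverse pipeline can be sketched on a list of
strings as follows.  This is only an illustration of the idea; the name
`my-unique-lines-keep-last' is hypothetical:

```elisp
(defun my-unique-lines-keep-last (lines)
  "Return LINES without repeats, keeping the last occurrence of each."
  ;; `reverse' returns a fresh copy, so the destructive `delete-dups'
  ;; and `nreverse' never touch the caller's list.
  (nreverse (delete-dups (reverse lines))))
```

Because `delete-dups' keeps the first occurrence it meets, running it
on the reversed list keeps what was the last occurrence in the original
order, just as the two `tac' calls do around awk.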

You can see this limitation described in docstrings of various functions at
http://emacswiki.org/emacs/DuplicateLines
as "keeping first occurrence", so these functions are of no help.

Adding an argument to keep either the first/last duplicate and an argument
to delete only adjacent lines, and using the algorithm like in awk,
and using the calling interface like in `flush-lines', necessitates
the following small function that can be called with the arg `C-u'
to keep the last duplicate line, and `C-u C-u' to delete only adjacent lines:

(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
  "Delete duplicate lines in the region between RSTART and REND.
If REVERSE is nil, search and delete duplicates forward keeping the first
occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
duplicates backward keeping the last occurrence of duplicate lines.
If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (list (region-beginning) (region-end)
           (equal current-prefix-arg '(4))
           (equal current-prefix-arg '(16))
           t)))
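  ;; Use a strong hash table: with :weakness 'key the freshly consed
  ;; line strings could be garbage-collected, forgetting seen lines.
  (let ((lines (unless adjacent (make-hash-table :test 'equal)))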
        line prev-line
        (count 0)
        (rstart (copy-marker rstart))
        (rend (copy-marker rend)))
    (save-excursion
      (goto-char (if reverse rend rstart))
      (if (and reverse (bolp)) (forward-char -1))
      (while (if reverse
                 (and (> (point) rstart) (not (bobp)))
               (and (< (point) rend) (not (eobp))))
        (setq line (buffer-substring-no-properties
                    (line-beginning-position) (line-end-position)))
        (if (if adjacent (equal line prev-line) (gethash line lines))
            (progn
              (delete-region (progn (forward-line 0) (point))
                             (progn (forward-line 1) (point)))
              (if reverse (forward-line -1))
              (setq count (1+ count)))
          (if adjacent (setq prev-line line) (puthash line t lines))
          (forward-line (if reverse -1 1)))))
    (set-marker rstart nil)
    (set-marker rend nil)
    (when interactive
      (message "Deleted %d %sduplicate line%s%s"
               count
               (if adjacent "adjacent " "")
               (if (= count 1) "" "s")
               (if reverse " backward " "")))
    count))
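
A short sketch of calling the command from Lisp.  It assumes a
`delete-duplicate-lines' with the RSTART REND REVERSE ADJACENT argument
order shown above is available, either by evaluating the definition or
from an Emacs that ships the command (24.4 or later, as far as I know).
The helper name `my-dedupe-string' is hypothetical:

```elisp
(defun my-dedupe-string (text &optional adjacent)
  "Return (RESULT . COUNT) after deleting duplicate lines from TEXT.
With ADJACENT non-nil, only adjacent repeats are deleted."
  (with-temp-buffer
    (insert text)
    ;; INTERACTIVE is left out, so no message is shown and only the
    ;; count of deleted lines is returned.
    (let ((count (delete-duplicate-lines (point-min) (point-max)
                                         nil adjacent)))
      (cons (buffer-string) count))))
```

For example, (my-dedupe-string "a\nb\na\nc\nb\n") should give
("a\nb\nc\n" . 2), while passing a non-nil ADJACENT on "a\na\nb\na\n"
should leave the final "a" in place.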






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-01  0:34         ` Juri Linkov
@ 2012-12-01  9:08           ` Dani Moncayo
  2012-12-01  9:22             ` Dani Moncayo
  2012-12-02  0:45             ` Juri Linkov
  0 siblings, 2 replies; 25+ messages in thread
From: Dani Moncayo @ 2012-12-01  9:08 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Juanma Barranquero, 13032

> (defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
>   "Delete duplicate lines in the region between RSTART and REND.
> If REVERSE is nil, search and delete duplicates forward keeping the first
> occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
> duplicates backward keeping the last occurrence of duplicate lines.
> If ADJACENT is non-nil, delete repeated lines only if they are adjacent."

Looks pretty fine to me.  Your version is more general and versatile.

Some comments:
* Why is the INTERACTIVE command needed?  I mean, can't that info
(whether the function has been called interactively) be retrieved
using some Lisp primitive?
* In case the INTERACTIVE command is indeed necessary, it should be
explained in the docstring, no?
* I think that the docstring should explain also the return value
(number of duplicate lines deleted).

Thank you Juri.  I hope Stefan or Chong add this feature to Emacs.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-01  9:08           ` Dani Moncayo
@ 2012-12-01  9:22             ` Dani Moncayo
  2012-12-02  0:45             ` Juri Linkov
  1 sibling, 0 replies; 25+ messages in thread
From: Dani Moncayo @ 2012-12-01  9:22 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Juanma Barranquero, 13032

>> (defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
>>   "Delete duplicate lines in the region between RSTART and REND.
>> If REVERSE is nil, search and delete duplicates forward keeping the first
>> occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
>> duplicates backward keeping the last occurrence of duplicate lines.
>> If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
>
> Looks pretty fine to me.  Your version is more general and versatile.
>
> Some comments:
> * Why is the INTERACTIVE command needed?  I mean, can't that info
> (whether the function has been called interactively) be retrieved
> using some Lisp primitive?
> * In case the INTERACTIVE command is indeed necessary, it should be
> explained in the docstring, no?
> * I think that the docstring should explain also the return value
> (number of duplicate lines deleted).

Sorry, replace "command" by "argument" in the above paragraph.

Another comment:
* I'm thinking that the ADJACENT argument is kinda unnecessary.  I
can't think of a use-case where someone wants to remove only the
_adjacent_ duplicate lines but not the ones which aren't adjacent.
So, I think that both the interface and the implementation could be
simplified by removing that argument.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-01  9:08           ` Dani Moncayo
  2012-12-01  9:22             ` Dani Moncayo
@ 2012-12-02  0:45             ` Juri Linkov
  2012-12-02  9:13               ` Dani Moncayo
  1 sibling, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-12-02  0:45 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: Juanma Barranquero, 13032

> * I'm thinking that the ADJACENT argument is kinda unnecessary.  I
> can't think of a use-case where someone wants to remove only the
> _adjacent_ duplicate lines but not the ones which aren't adjacent.
> So, I think that both the interface and the implementation could be
> simplified by removing that argument.

The ADJACENT argument is an optimization that doesn't require
additional memory (to store previous lines in the cache).
This is necessary when the user needs to delete duplicate lines
in a large sorted file.
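
To make the memory argument concrete, here is a minimal list-based
sketch of the adjacent-only strategy (the name `my-unique-adjacent' is
hypothetical).  It remembers just one previous line, whereas the
general strategy needs a table entry for every distinct line:

```elisp
(defun my-unique-adjacent (lines)
  "Return LINES with runs of identical adjacent strings collapsed to one."
  (let ((prev nil)                      ; the only state that is kept
        (result '()))
    (dolist (line lines)
      (unless (equal line prev)
        (push line result))
      (setq prev line))
    (nreverse result)))
```

On sorted input both strategies give the same answer, since sorting
makes all duplicates adjacent.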

> * Why is the INTERACTIVE argument needed?  I mean, can't that info
> (whether the function has been called interactively) be retrieved
> using some Lisp primitive?

There is `called-interactively-p', but as I understand it, it is unreliable.
This is why other similar commands like `flush-lines', `keep-lines',
`how-many' use the INTERACTIVE argument.  They use it for two purposes:
to decide whether the active region should be used, and to decide whether
the message should be displayed when called interactively.
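
A minimal sketch of that convention, using a made-up command
`my-count-region-lines' rather than any of the commands named above:
the `interactive' form itself supplies t for the last argument, so the
body never has to ask how it was called.

```elisp
(defun my-count-region-lines (beg end &optional interactive)
  "Return the number of lines between BEG and END.
When INTERACTIVE is non-nil, also show the count in the echo area."
  ;; Only an interactive call goes through this form, so only then
  ;; does INTERACTIVE become t.
  (interactive (list (region-beginning) (region-end) t))
  (let ((count (count-lines beg end)))
    (when interactive
      (message "Region has %d line%s" count (if (= count 1) "" "s")))
    count))
```

A Lisp caller that omits the argument gets the count back silently,
while M-x also prints it.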

> * In case the INTERACTIVE argument is indeed necessary, it should be
> explained in the docstring, no?

Yes, below I copied this part from the docstring of `how-many'.

> * I think that the docstring should explain also the return value
> (number of duplicate lines deleted).

Coincidentally, the return value will be explained in the same part
of the docstring.

The remaining problem is to decide where to put this command.
The file replace.el is unsuitable because unlike `flush-lines' and
unlike `how-many', `delete-duplicate-lines' doesn't use regexps.

It seems the right place is sort.el because it also contains a related
command `reverse-region'.  This patch puts `delete-duplicate-lines'
after `reverse-region' at the end of sort.el:

=== modified file 'lisp/sort.el'
--- lisp/sort.el	2012-08-03 08:15:24 +0000
+++ lisp/sort.el	2012-12-02 00:44:42 +0000
@@ -562,6 +562,59 @@ (defun reverse-region (beg end)
 	(setq ll (cdr ll)))
       (insert (car ll)))))
 
+;;;###autoload
+(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
+  "Delete duplicate lines in the region between RSTART and REND.
+
+If REVERSE is nil, search and delete duplicates forward keeping the first
+occurrence of duplicate lines.  If REVERSE is non-nil (when called
+interactively with C-u prefix), search and delete duplicates backward
+keeping the last occurrence of duplicate lines.
+
+If ADJACENT is non-nil (when called interactively with two C-u prefixes),
+delete repeated lines only if they are adjacent.
+
+When called from Lisp and INTERACTIVE is omitted or nil, return the number
+of deleted duplicate lines, do not print it; if INTERACTIVE is t, the
+function behaves in all respects as if it had been called interactively."
+  (interactive
+   (progn
+     (barf-if-buffer-read-only)
+     (list (region-beginning) (region-end)
+	   (equal current-prefix-arg '(4))
+	   (equal current-prefix-arg '(16))
+	   t)))
+  (let ((lines (unless adjacent (make-hash-table :test 'equal)))
+	line prev-line
+	(count 0)
+	(rstart (copy-marker rstart))
+	(rend (copy-marker rend)))
+    (save-excursion
+      (goto-char (if reverse rend rstart))
+      (if (and reverse (bolp)) (forward-char -1))
+      (while (if reverse
+		 (and (> (point) rstart) (not (bobp)))
+	       (and (< (point) rend) (not (eobp))))
+	(setq line (buffer-substring-no-properties
+		    (line-beginning-position) (line-end-position)))
+	(if (if adjacent (equal line prev-line) (gethash line lines))
+	    (progn
+	      (delete-region (progn (forward-line 0) (point))
+			     (progn (forward-line 1) (point)))
+	      (if reverse (forward-line -1))
+	      (setq count (1+ count)))
+	  (if adjacent (setq prev-line line) (puthash line t lines))
+	  (forward-line (if reverse -1 1)))))
+    (set-marker rstart nil)
+    (set-marker rend nil)
+    (when interactive
+      (message "Deleted %d %sduplicate line%s%s"
+	       count
+	       (if adjacent "adjacent " "")
+	       (if (= count 1) "" "s")
+	       (if reverse " backward " "")))
+    count))
+
 (provide 'sort)
 
 ;;; sort.el ends here







* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-02  0:45             ` Juri Linkov
@ 2012-12-02  9:13               ` Dani Moncayo
  2012-12-03 23:49                 ` Juri Linkov
  0 siblings, 1 reply; 25+ messages in thread
From: Dani Moncayo @ 2012-12-02  9:13 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Juanma Barranquero, 13032

>> * I'm thinking that the ADJACENT argument is kinda unnecessary.  I
>> can't think of a use-case where someone wants to remove only the
>> _adjacent_ duplicate lines but not the ones which aren't adjacent.
>> So, I think that both the interface and the implementation could be
>> simplified by removing that argument.
>
> The ADJACENT argument is an optimization that doesn't require
> additional memory (to store previous lines in the cache).
> This is necessary when the user needs to delete duplicate lines
> in a large sorted file.

Ah, good point.  I guess that the optimization is twofold: in memory
and also in performance.  Then, IMO this should be explained in the
docstring, so that users know that they should use this feature when
running this command over a large chunk of lines.

Thank you.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-02  9:13               ` Dani Moncayo
@ 2012-12-03 23:49                 ` Juri Linkov
  2012-12-04  0:05                   ` Juri Linkov
  0 siblings, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-12-03 23:49 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: Juanma Barranquero, 13032-done

>> The ADJACENT argument is an optimization that doesn't require
>> additional memory (to store previous lines in the cache).
>> This is necessary when the user needs to delete duplicate lines
>> in a large sorted file.
>
> Ah, good point.  I guess that the optimization is twofold: in memory
> and also in performance.  Then, IMO this should be explained in the
> docstring, so that users know that they should use this feature when
> running this command over a large chunk of lines.

Thanks for the suggestion, I added this as well.






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-03 23:49                 ` Juri Linkov
@ 2012-12-04  0:05                   ` Juri Linkov
  2012-12-04  9:13                     ` Dani Moncayo
  0 siblings, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-12-04  0:05 UTC (permalink / raw)
  To: 13032

>>> The ADJACENT argument is an optimization that doesn't require
>>> additional memory (to store previous lines in the cache).
>>> This is necessary when the user needs to delete duplicate lines
>>> in a large sorted file.
>>
>> Ah, good point.  I guess that the optimization is twofold: in memory
>> and also in performance.  Then, IMO this should be explained in the
>> docstring, so that users know that they should use this feature when
>> running this command over a large chunk of lines.
>
> Thanks for the suggestion, I added this as well.

It just occurred to me that we could also add an alias `uniq' that will
call the command `delete-duplicate-lines' with non-nil ADJACENT arg.

We already have aliases like `mkdir' for `make-directory',
so the command `uniq' would be handy too.
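
A plain `defalias' cannot supply arguments, so such a command would in
practice be a thin wrapper.  A sketch under the hypothetical name
`my-uniq', assuming a `delete-duplicate-lines' whose third and fourth
arguments are REVERSE and ADJACENT as in the patch:

```elisp
(defun my-uniq (beg end)
  "Delete adjacent duplicate lines between BEG and END, like uniq(1).
Return the number of lines deleted."
  (interactive "r")
  ;; REVERSE is nil and ADJACENT is t, so no hash table is used.
  (delete-duplicate-lines beg end nil t))
```

As with the shell tool, the region usually needs sorting first (for
example with `sort-lines') if all duplicates are to be removed.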






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-11-30  7:51   ` Dani Moncayo
@ 2012-12-04  7:04     ` Thierry Volpiatto
  2012-12-04 14:46       ` Stefan Monnier
  0 siblings, 1 reply; 25+ messages in thread
From: Thierry Volpiatto @ 2012-12-04  7:04 UTC (permalink / raw)
  To: 13032

Hi, just for info, here is a simple and fast version.

Dani Moncayo <dmoncayo@gmail.com> writes:

>> This is what I currently use to delete duplicate lines:
>>
>>   C-u M-| awk -- '!a[$0]++' RET
>>
>> Do you intend to create a Lisp function with the same result?
>
> I don't know awk, but I've tried that command and it seems to do what I
> want: remove all duplicate lines in the region.  However, it doesn't
> report the number of lines deleted, which is important to me.


--8<---------------cut here---------------start------------->8---
(defun delete-duplicate-lines (beg end)
  "Delete duplicate lines in region."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (let ((lines (helm-fast-remove-dups
                    (split-string (buffer-string) "\n" t)
                    :test 'equal)))
        (delete-region (point-min) (point-max))
        (loop for l in lines do (insert (concat l "\n")))))))
--8<---------------cut here---------------end--------------->8---

helm-fast-remove-dups is a function in helm:
https://github.com/emacs-helm/helm/blob/master/helm-utils.el
line 342

It would be easy to modify the function to report the number of lines
removed.

-- 
  Thierry
Get my Gnupg key:
gpg --keyserver pgp.mit.edu --recv-keys 59F29997 







* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-04  0:05                   ` Juri Linkov
@ 2012-12-04  9:13                     ` Dani Moncayo
  2012-12-04 23:51                       ` Juri Linkov
  0 siblings, 1 reply; 25+ messages in thread
From: Dani Moncayo @ 2012-12-04  9:13 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13032

> It just occurred to me that we could also add an alias `uniq' that will
> call the command `delete-duplicate-lines' with non-nil ADJACENT arg.
>
> We already have aliases like `mkdir' for `make-directory',
> so the command `uniq' would be handy too.

Fine with me.

BTW, I've just noticed that the command doesn't deactivate the mark
when there are no duplicate lines in the region.  Could that be fixed?

Thank you.

-- 
Dani Moncayo






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-04  7:04     ` Thierry Volpiatto
@ 2012-12-04 14:46       ` Stefan Monnier
  2012-12-04 15:02         ` Thierry Volpiatto
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2012-12-04 14:46 UTC (permalink / raw)
  To: Thierry Volpiatto; +Cc: 13032

>       (let ((lines (helm-fast-remove-dups
>                     (split-string (buffer-string) "\n" t)
>                     :test 'equal)))
>         (delete-region (point-min) (point-max))
>         (loop for l in lines do (insert (concat l "\n")))))))

The drawback of this version is that any overlays/markers will
be lost, and the buffer will be marked as modified even if there were no
duplicate lines.
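
The marker part of this can be seen in a small experiment.  The sketch
below uses a hypothetical helper `my-marker-after-dedupe', substitutes
the built-in `delete-dups' for `helm-fast-remove-dups', and assumes a
`delete-duplicate-lines' is available for the in-place case:

```elisp
(defun my-marker-after-dedupe (in-place)
  "Return the char a marker on the \"c\" line points at after deduping.
With IN-PLACE non-nil, delete duplicates in place; otherwise rewrite the
whole buffer text, as in the quoted version."
  (with-temp-buffer
    (insert "a\na\nc\n")
    (let ((m (copy-marker 5)))          ; position 5 is the "c"
      (if in-place
          (delete-duplicate-lines (point-min) (point-max))
        (let ((lines (delete-dups (split-string (buffer-string) "\n" t))))
          ;; Deleting everything collapses M to the start of the buffer.
          (delete-region (point-min) (point-max))
          (dolist (l lines) (insert l "\n"))))
      (char-after m))))
```

Deleting only the duplicate line leaves the marker on "c"; deleting and
re-inserting the whole text leaves it at the start of the buffer.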


        Stefan






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-04 14:46       ` Stefan Monnier
@ 2012-12-04 15:02         ` Thierry Volpiatto
  0 siblings, 0 replies; 25+ messages in thread
From: Thierry Volpiatto @ 2012-12-04 15:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 13032

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>       (let ((lines (helm-fast-remove-dups
>>                     (split-string (buffer-string) "\n" t)
>>                     :test 'equal)))
>>         (delete-region (point-min) (point-max))
>>         (loop for l in lines do (insert (concat l "\n")))))))
>
> The drawback of this version is that any overlays/markers will
> be lost, and the buffer will be marked as modified even if there were no
> duplicate lines.
Ok, was just for info on a fast alternative without such enhancements.

-- 
  Thierry
Get my Gnupg key:
gpg --keyserver pgp.mit.edu --recv-keys 59F29997 






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-04  9:13                     ` Dani Moncayo
@ 2012-12-04 23:51                       ` Juri Linkov
  2012-12-05  8:08                         ` Dani Moncayo
  0 siblings, 1 reply; 25+ messages in thread
From: Juri Linkov @ 2012-12-04 23:51 UTC (permalink / raw)
  To: Dani Moncayo; +Cc: 13032

>> It just occurred to me that we could also add an alias `uniq' that will
>> call the command `delete-duplicate-lines' with non-nil ADJACENT arg.
>>
>> We already have aliases like `mkdir' for `make-directory',
>> so the command `uniq' would be handy too.
>
> Fine with me.

But the problem is that `uniq' might be confused with a similarly named
feature `uniquify' that uniquifies buffer names.

> BTW, I've just noticed that the command doesn't deactivate the mark
> when there are no duplicate lines in the region.  Could that be fixed?

This problem is not specific to `delete-duplicate-lines'.
All similar functions like e.g. `delete-matching-lines',
`delete-non-matching-lines' and `delete-blank-lines'
behave the same way.






* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
  2012-12-04 23:51                       ` Juri Linkov
@ 2012-12-05  8:08                         ` Dani Moncayo
  0 siblings, 0 replies; 25+ messages in thread
From: Dani Moncayo @ 2012-12-05  8:08 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13032

>>> It just occurred to me that we could also add an alias `uniq' that will
>>> call the command `delete-duplicate-lines' with non-nil ADJACENT arg.
>>>
>>> We already have aliases like `mkdir' for `make-directory',
>>> so the command `uniq' would be handy too.
>>
>> Fine with me.
>
> But the problem is that `uniq' might be confused with a similarly named
> feature `uniquify' that uniquifies buffer names.

Indeed.  That is the problem of using such ambiguous names.  FWIW, I
have no particular interest in this `uniq' alias.

>> BTW, I've just noticed that the command doesn't deactivate the mark
>> when there are no duplicate lines in the region.  Could that be fixed?
>
> This problem is not specific to `delete-duplicate-lines'.
> All similar functions like e.g. `delete-matching-lines',
> `delete-non-matching-lines' and `delete-blank-lines'
> behave the same way.

Indeed.  I filed bug #10056 because of this kind of problem.  I've
included these cases in that bug report.

Thank you.

-- 
Dani Moncayo







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
