* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-11-29 19:23 UTC
To: 13032

Severity: wishlist

Recent versions of MS-Excel and also LibreOffice's Calc have a feature
that I find very useful: the ability to remove duplicate lines from a
given list (range).

I think it would be worth adding such a feature to Emacs.  That is:
provide a function `delete-duplicate-lines' (or some such) that
removes all duplicate lines in the active region and prints in the
echo area a message like "Duplicate lines removed: <n>".

TIA.

PS: There has been some discussion about this in this thread:
http://lists.gnu.org/archive/html/help-gnu-emacs/2012-11/msg00417.html.
Jambunathan K provided a possible implementation, but it lacks the
message in the echo area (which I think is important).

In GNU Emacs 24.3.50.1 (i386-mingw-nt6.1.7601)
 of 2012-11-28 on MS-W7-DANI
Bzr revision: 111021 jay.p.belanger@gmail.com-20121128045113-o6xvwncuryx8al3u
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --with-gcc (4.7) --no-opt --enable-checking --cflags
 -Ic:/emacs/libs/libXpm-3.5.10/include
 -Ic:/emacs/libs/libXpm-3.5.10/src
 -Ic:/emacs/libs/libpng-1.2.37-lib/include
 -Ic:/emacs/libs/zlib-1.2.5
 -Ic:/emacs/libs/giflib-4.1.4-1-lib/include
 -Ic:/emacs/libs/jpeg-6b-4-lib/include
 -Ic:/emacs/libs/tiff-3.8.2-1-lib/include
 -Ic:/emacs/libs/libxml2-2.7.8-w32-bin/include/libxml2
 -Ic:/emacs/libs/gnutls-3.0.9-w32-bin/include
 -Ic:/emacs/libs/libiconv-1.9.2-1-lib/include'

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1252
  default enable-multibyte-characters: t

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juanma Barranquero @ 2012-11-29 20:49 UTC
To: Dani Moncayo; +Cc: 13032

On Thu, Nov 29, 2012 at 8:23 PM, Dani Moncayo <dmoncayo@gmail.com> wrote:

> Severity: wishlist

> That is: provide a function `delete-duplicate-lines' (or some such)
> that removes all duplicate lines in the active region and prints in
> the echo area a message like "Duplicate lines removed: <n>".

Perhaps you can work from this (not very well tested):

(defun delete-duplicate-lines (beg end)
  "Delete consecutive duplicate lines in region BEG..END."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char beg)
      (let ((kill-whole-line t)
            (last (buffer-substring (line-beginning-position) (line-end-position)))
            (removed 0)
            current)
        (forward-line 1)
        (while (and (< (point) (or end 1)) (not (eobp)))
          (setq current (buffer-substring (line-beginning-position) (line-end-position)))
          (if (string= last current)
              (progn
                (kill-line)
                (setq removed (1+ removed)))
            (setq last current)
            (forward-line 1)))
        (message "Duplicate lines removed: %d" removed)))))
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-11-29 21:43 UTC
To: Juanma Barranquero; +Cc: 13032

> Perhaps you can work from this (not very well tested):

Thank you Juanma.  I've given it a quick try and it seems to work.

I've only seen a minor detail that I don't like: when the command does
nothing (because there are no consecutive duplicate lines), the region
remains active.

But this is a general problem in Emacs which I've already complained
about (bug #10056).  IMO, the mark should be deactivated after every
command that operates on the active region, without regard to whether
the buffer was changed or not.  There could be some exception, but
this should be the general principle.

I'll put your version in my init file for now, while the maintainers
decide whether it is appropriate to add this command to Emacs or not.

Thanks.

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juanma Barranquero @ 2012-11-29 22:45 UTC
To: Dani Moncayo; +Cc: 13032

> I've only seen a minor detail that I don't like: when the command does
> nothing (because there are no consecutive duplicate lines), the region
> remains active.

Add a call to deactivate-mark at the end.

    Juanma
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-11-30 0:31 UTC
To: Dani Moncayo; +Cc: 13032

> That is: provide a function `delete-duplicate-lines' (or some such)
> that removes all duplicate lines in the active region and prints in
> the echo area a message like "Duplicate lines removed: <n>".

This is what I currently use to delete duplicate lines:

  C-u M-| awk -- '!a[$0]++' RET

Do you intend to create a Lisp function with the same result?
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juanma Barranquero @ 2012-11-30 0:46 UTC
To: Juri Linkov; +Cc: 13032

On Fri, Nov 30, 2012 at 1:31 AM, Juri Linkov <juri@jurta.org> wrote:

>   C-u M-| awk -- '!a[$0]++' RET

Isn't

  C-u M-| uniq RET

shorter and easier to type?
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juanma Barranquero @ 2012-11-30 0:50 UTC
To: Juri Linkov; +Cc: 13032

(FWIW, yes, I'm aware that your awk script and uniq don't do the same
thing, but I think what Dani requested was in fact removing
consecutive duplicates...)
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-11-30 0:57 UTC
To: Juanma Barranquero; +Cc: 13032

> (FWIW, yes, I'm aware that your awk script and uniq don't do the same
> thing, but I think what Dani requested was in fact removing
> consecutive duplicates...)

I wonder why only consecutive duplicates?  The existing functions
`delete-duplicates' and `delete-dups' that operate on lists
don't delete just consecutive duplicates.  They delete all duplicates.
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juanma Barranquero @ 2012-11-30 1:02 UTC
To: Juri Linkov; +Cc: 13032

On Fri, Nov 30, 2012 at 1:57 AM, Juri Linkov <juri@jurta.org> wrote:

> I wonder why only consecutive duplicates?  The existing functions
> `delete-duplicates' and `delete-dups' that operate on lists
> don't delete just consecutive duplicates.  They delete all duplicates.

Yes.  Dani has not said what his use case is.

    Juanma
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-11-30 1:12 UTC
To: Juanma Barranquero; +Cc: 13032

>>   C-u M-| awk -- '!a[$0]++' RET
>
> Isn't
>
>   C-u M-| uniq RET
>
> shorter and easier to type?

I use `uniq' only on files where lines are sorted.  OTOH, something like
'!a[$0]++' that is not limited to consecutive duplicates is better for
files where lines are not sorted such as log files, etc.
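The difference between the two pipelines discussed above can be sketched in plain shell, outside Emacs. The function names `dedup_all` and `dedup_adjacent` are hypothetical labels chosen for illustration; the commands themselves are the ones quoted in the thread.

```shell
# Remove every repeated line, keeping the first occurrence (order preserved).
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is
# true only for first occurrences, and awk prints just those.
dedup_all() {
  awk '!seen[$0]++'
}

# Remove only repeats that directly follow an identical line.
dedup_adjacent() {
  uniq
}

printf 'b\na\nb\nb\na\n' | dedup_all       # prints: b, a
printf 'b\na\nb\nb\na\n' | dedup_adjacent  # prints: b, a, b, a
```

On unsorted input such as a log file, only the awk variant removes the non-adjacent repeats; `uniq` removes just the second of the two consecutive `b` lines.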
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-11-30 7:51 UTC
To: Juri Linkov; +Cc: Juanma Barranquero, 13032

>>>   C-u M-| awk -- '!a[$0]++' RET
>>
>> Isn't
>>
>>   C-u M-| uniq RET
>>
>> shorter and easier to type?
>
> I use `uniq' only on files where lines are sorted.  OTOH, something like
> '!a[$0]++' that is not limited to consecutive duplicates is better for
> files where lines are not sorted such as log files, etc.

My use cases usually involve compacting a collection of lines
gathered from several places.  So the compacting operation is normally
coupled with a sort operation.

Thus, the command provided by Juanma is good enough for these use
cases (I first do a `sort-lines' and then a `delete-duplicate-lines').

But I agree that it would be even better if `delete-duplicate-lines'
did TRT even when the lines are not sorted.  (I've just tested this
feature in MS-Excel, and it is so: it doesn't require that the lines
be sorted beforehand.)

Thank you.

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-12-01 0:34 UTC
To: Dani Moncayo; +Cc: Juanma Barranquero, 13032

>>>>   C-u M-| awk -- '!a[$0]++' RET
>
> But I agree that it would be even better if `delete-duplicate-lines'
> did TRT even when the lines are not sorted.  (I've just tested this
> feature in MS-Excel, and it is so: it doesn't require that the lines
> be sorted beforehand.)

Actually I use a slightly different command:

  C-u M-| tac | awk -- '!a[$0]++' | tac RET

because I need to keep the last duplicate line instead of the first.
`tac' reverses the lines, awk removes the duplicates keeping the first
of each, and another `tac' reverses the lines back, thus keeping the
last duplicate.

So for `delete-duplicate-lines' to be useful in this case it could
also support a reverse search that keeps the last duplicate.

You can see this limitation described in docstrings of various functions
at http://emacswiki.org/emacs/DuplicateLines as "keeping first occurrence",
so these functions are of no help.

Adding an argument to keep either the first or the last duplicate and
an argument to delete only adjacent lines, using the same algorithm as
awk and the same calling interface as `flush-lines', leads to the
following small function.  It can be called with `C-u' to keep the
last duplicate line, and with `C-u C-u' to delete only adjacent lines:

(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
  "Delete duplicate lines in the region between RSTART and REND.
If REVERSE is nil, search and delete duplicates forward keeping the first
occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
duplicates backward keeping the last occurrence of duplicate lines.
If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (list (region-beginning) (region-end)
           (equal current-prefix-arg '(4))
           (equal current-prefix-arg '(16))
           t)))
  (let ((lines (unless adjacent (make-hash-table :weakness 'key :test 'equal)))
        line prev-line
        (count 0)
        (rstart (copy-marker rstart))
        (rend (copy-marker rend)))
    (save-excursion
      (goto-char (if reverse rend rstart))
      (if (and reverse (bolp)) (forward-char -1))
      (while (if reverse
                 (and (> (point) rstart) (not (bobp)))
               (and (< (point) rend) (not (eobp))))
        (setq line (buffer-substring-no-properties
                    (line-beginning-position) (line-end-position)))
        (if (if adjacent (equal line prev-line) (gethash line lines))
            (progn
              (delete-region (progn (forward-line 0) (point))
                             (progn (forward-line 1) (point)))
              (if reverse (forward-line -1))
              (setq count (1+ count)))
          (if adjacent (setq prev-line line) (puthash line t lines))
          (forward-line (if reverse -1 1)))))
    (set-marker rstart nil)
    (set-marker rend nil)
    (when interactive
      (message "Deleted %d %sduplicate line%s%s"
               count
               (if adjacent "adjacent " "")
               (if (= count 1) "" "s")
               (if reverse " backward " "")))
    count))
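The keep-last behaviour of the `tac | awk | tac` pipeline can be checked with a small shell sketch. The function name `dedup_keep_last` is a hypothetical label, and `tac` is assumed to be available (it ships with GNU coreutils and may be missing elsewhere).

```shell
# Keep the LAST occurrence of each repeated line:
# reverse the input, keep first occurrences, then reverse back.
dedup_keep_last() {
  tac | awk '!seen[$0]++' | tac
}

# Keep the FIRST occurrence, for comparison.
dedup_keep_first() {
  awk '!seen[$0]++'
}

printf 'a\nb\na\nc\n' | dedup_keep_first  # prints: a, b, c
printf 'a\nb\na\nc\n' | dedup_keep_last   # prints: b, a, c
```

The surviving lines are the same in both cases; only the position of the kept copy of `a` differs, which is the distinction the REVERSE argument is meant to expose.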
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-12-01 9:08 UTC
To: Juri Linkov; +Cc: Juanma Barranquero, 13032

> (defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
>   "Delete duplicate lines in the region between RSTART and REND.
> If REVERSE is nil, search and delete duplicates forward keeping the first
> occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
> duplicates backward keeping the last occurrence of duplicate lines.
> If ADJACENT is non-nil, delete repeated lines only if they are adjacent."

Looks pretty fine to me.  Your version is more general and versatile.

Some comments:
* Why is the INTERACTIVE command needed?  I mean, can't that info
(whether the function has been called interactively) be retrieved
using some Lisp primitive?
* In case the INTERACTIVE command is indeed necessary, it should be
explained in the docstring, no?
* I think that the docstring should also explain the return value
(number of duplicate lines deleted).

Thank you Juri.  I hope Stefan or Chong add this feature to Emacs.

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-12-01 9:22 UTC
To: Juri Linkov; +Cc: Juanma Barranquero, 13032

>> (defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
>>   "Delete duplicate lines in the region between RSTART and REND.
>> If REVERSE is nil, search and delete duplicates forward keeping the first
>> occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
>> duplicates backward keeping the last occurrence of duplicate lines.
>> If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
>
> Looks pretty fine to me.  Your version is more general and versatile.
>
> Some comments:
> * Why is the INTERACTIVE command needed?  I mean, can't that info
> (whether the function has been called interactively) be retrieved
> using some Lisp primitive?
> * In case the INTERACTIVE command is indeed necessary, it should be
> explained in the docstring, no?
> * I think that the docstring should also explain the return value
> (number of duplicate lines deleted).

Sorry, replace "command" with "argument" in the above paragraph.

Another comment:
* I'm thinking that the ADJACENT argument is kinda unnecessary.  I
can't think of a use-case where someone wants to remove only the
_adjacent_ duplicate lines but not the ones which aren't adjacent.
So, I think that both the interface and the implementation could be
simplified by removing that argument.

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-12-02 0:45 UTC
To: Dani Moncayo; +Cc: Juanma Barranquero, 13032

> * I'm thinking that the ADJACENT argument is kinda unnecessary.  I
> can't think of a use-case where someone wants to remove only the
> _adjacent_ duplicate lines but not the ones which aren't adjacent.
> So, I think that both the interface and the implementation could be
> simplified by removing that argument.

The ADJACENT argument is an optimization that doesn't require
additional memory (to store previous lines in the cache).
This is necessary when the user needs to delete duplicate lines
in a large sorted file.

> * Why is the INTERACTIVE argument needed?  I mean, can't that info
> (whether the function has been called interactively) be retrieved
> using some Lisp primitive?

There is called-interactively-p but, as I understand it, it is unreliable.
This is why other similar commands like `flush-lines', `keep-lines',
`how-many' use the INTERACTIVE argument.  They use it for two purposes:
to decide whether the active region should be used, and to decide whether
the message should be displayed when called interactively.

> * In case the INTERACTIVE argument is indeed necessary, it should be
> explained in the docstring, no?

Yes, below I copied this part from the docstring of `how-many'.

> * I think that the docstring should also explain the return value
> (number of duplicate lines deleted).

Coincidentally, the return value will be explained in the same part
of the docstring.

The remaining problem is to decide where to put this command.
The file replace.el is unsuitable because unlike `flush-lines'
and unlike `how-many', `delete-duplicate-lines' doesn't use regexps.
It seems the right place is sort.el because it also contains
a related command `reverse-region'.

This patch puts `delete-duplicate-lines' after `reverse-region'
at the end of sort.el:

=== modified file 'lisp/sort.el'
--- lisp/sort.el	2012-08-03 08:15:24 +0000
+++ lisp/sort.el	2012-12-02 00:44:42 +0000
@@ -562,6 +562,59 @@ (defun reverse-region (beg end)
         (setq ll (cdr ll)))
       (insert (car ll)))))
 
+;;;###autoload
+(defun delete-duplicate-lines (rstart rend &optional reverse adjacent interactive)
+  "Delete duplicate lines in the region between RSTART and REND.
+
+If REVERSE is nil, search and delete duplicates forward keeping the first
+occurrence of duplicate lines.  If REVERSE is non-nil (when called
+interactively with C-u prefix), search and delete duplicates backward
+keeping the last occurrence of duplicate lines.
+
+If ADJACENT is non-nil (when called interactively with two C-u prefixes),
+delete repeated lines only if they are adjacent.
+
+When called from Lisp and INTERACTIVE is omitted or nil, return the number
+of deleted duplicate lines, do not print it; if INTERACTIVE is t, the
+function behaves in all respects as if it had been called interactively."
+  (interactive
+   (progn
+     (barf-if-buffer-read-only)
+     (list (region-beginning) (region-end)
+           (equal current-prefix-arg '(4))
+           (equal current-prefix-arg '(16))
+           t)))
+  (let ((lines (unless adjacent (make-hash-table :weakness 'key :test 'equal)))
+        line prev-line
+        (count 0)
+        (rstart (copy-marker rstart))
+        (rend (copy-marker rend)))
+    (save-excursion
+      (goto-char (if reverse rend rstart))
+      (if (and reverse (bolp)) (forward-char -1))
+      (while (if reverse
+                 (and (> (point) rstart) (not (bobp)))
+               (and (< (point) rend) (not (eobp))))
+        (setq line (buffer-substring-no-properties
+                    (line-beginning-position) (line-end-position)))
+        (if (if adjacent (equal line prev-line) (gethash line lines))
+            (progn
+              (delete-region (progn (forward-line 0) (point))
+                             (progn (forward-line 1) (point)))
+              (if reverse (forward-line -1))
+              (setq count (1+ count)))
+          (if adjacent (setq prev-line line) (puthash line t lines))
+          (forward-line (if reverse -1 1)))))
+    (set-marker rstart nil)
+    (set-marker rend nil)
+    (when interactive
+      (message "Deleted %d %sduplicate line%s%s"
+               count
+               (if adjacent "adjacent " "")
+               (if (= count 1) "" "s")
+               (if reverse " backward " "")))
+    count))
+
 (provide 'sort)
 
 ;;; sort.el ends here
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-12-02 9:13 UTC
To: Juri Linkov; +Cc: Juanma Barranquero, 13032

>> * I'm thinking that the ADJACENT argument is kinda unnecessary.  I
>> can't think of a use-case where someone wants to remove only the
>> _adjacent_ duplicate lines but not the ones which aren't adjacent.
>> So, I think that both the interface and the implementation could be
>> simplified by removing that argument.
>
> The ADJACENT argument is an optimization that doesn't require
> additional memory (to store previous lines in the cache).
> This is necessary when the user needs to delete duplicate lines
> in a large sorted file.

Ah, good point.  I guess that the optimization is twofold: in memory
and also in performance.  Then, IMO this should be explained in the
docstring, so that users know that they should use this feature when
running this command over a large chunk of lines.

Thank you.

-- 
Dani Moncayo
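For input that is already sorted, an adjacent-only pass and a pass that remembers every line should give the same result, which is why ADJACENT can serve as a low-memory option for large sorted files. A minimal shell sketch of that equivalence follows, with `uniq` standing in for adjacent-only deletion and the awk one-liner from earlier in the thread standing in for the hash-table approach. The function names are hypothetical, and `LC_ALL=C` is set only to make the sort order predictable.

```shell
# Adjacent-only pass: compares each line with the previous one only,
# so it needs to remember a single line at a time.
dedup_sorted_adjacent() {
  LC_ALL=C sort | uniq
}

# Hash-based pass: remembers every distinct line seen so far.
dedup_sorted_hashed() {
  LC_ALL=C sort | awk '!seen[$0]++'
}

printf 'pear\napple\npear\napple\n' | dedup_sorted_adjacent  # prints: apple, pear
printf 'pear\napple\npear\napple\n' | dedup_sorted_hashed    # prints: apple, pear
```

Sorting places identical lines next to each other, so comparing with the previous line is enough; the table of seen lines only matters when the input is unsorted.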
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-12-03 23:49 UTC
To: Dani Moncayo; +Cc: Juanma Barranquero, 13032-done

>> The ADJACENT argument is an optimization that doesn't require
>> additional memory (to store previous lines in the cache).
>> This is necessary when the user needs to delete duplicate lines
>> in a large sorted file.
>
> Ah, good point.  I guess that the optimization is twofold: in memory
> and also in performance.  Then, IMO this should be explained in the
> docstring, so that users know that they should use this feature when
> running this command over a large chunk of lines.

Thanks for the suggestion, I added this as well.
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-12-04 0:05 UTC
To: 13032

>>> The ADJACENT argument is an optimization that doesn't require
>>> additional memory (to store previous lines in the cache).
>>> This is necessary when the user needs to delete duplicate lines
>>> in a large sorted file.
>>
>> Ah, good point.  I guess that the optimization is twofold: in memory
>> and also in performance.  Then, IMO this should be explained in the
>> docstring, so that users know that they should use this feature when
>> running this command over a large chunk of lines.
>
> Thanks for the suggestion, I added this as well.

It just occurred to me that we could also add an alias `uniq' that will
call the command `delete-duplicate-lines' with non-nil ADJACENT arg.

We already have aliases like `mkdir' for `make-directory',
so the command `uniq' would be handy too.
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-12-04 9:13 UTC
To: Juri Linkov; +Cc: 13032

> It just occurred to me that we could also add an alias `uniq' that will
> call the command `delete-duplicate-lines' with non-nil ADJACENT arg.
>
> We already have aliases like `mkdir' for `make-directory',
> so the command `uniq' would be handy too.

Fine with me.

BTW, I've just noticed that the command doesn't deactivate the mark
when there are no duplicate lines in the region.  Could that be fixed?

Thank you.

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Juri Linkov @ 2012-12-04 23:51 UTC
To: Dani Moncayo; +Cc: 13032

>> It just occurred to me that we could also add an alias `uniq' that will
>> call the command `delete-duplicate-lines' with non-nil ADJACENT arg.
>>
>> We already have aliases like `mkdir' for `make-directory',
>> so the command `uniq' would be handy too.
>
> Fine with me.

But the problem is that `uniq' might be confused with a similarly named
feature `uniquify' that uniquifies buffer names.

> BTW, I've just noticed that the command doesn't deactivate the mark
> when there are no duplicate lines in the region.  Could that be fixed?

This problem is not specific to `delete-duplicate-lines'.
All similar functions like e.g. `delete-matching-lines',
`delete-non-matching-lines' and `delete-blank-lines'
behave the same way.
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-12-05 8:08 UTC
To: Juri Linkov; +Cc: 13032

>>> It just occurred to me that we could also add an alias `uniq' that will
>>> call the command `delete-duplicate-lines' with non-nil ADJACENT arg.
>>>
>>> We already have aliases like `mkdir' for `make-directory',
>>> so the command `uniq' would be handy too.
>>
>> Fine with me.
>
> But the problem is that `uniq' might be confused with a similarly named
> feature `uniquify' that uniquifies buffer names.

Indeed.  That is the problem with using such ambiguous names.

FWIW, I have no particular interest in this `uniq' alias.

>> BTW, I've just noticed that the command doesn't deactivate the mark
>> when there are no duplicate lines in the region.  Could that be fixed?
>
> This problem is not specific to `delete-duplicate-lines'.
> All similar functions like e.g. `delete-matching-lines',
> `delete-non-matching-lines' and `delete-blank-lines'
> behave the same way.

Indeed.  I filed bug #10056 because of this kind of problem.  I've
included these cases in that bug report.

Thank you.

-- 
Dani Moncayo
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Dani Moncayo @ 2012-11-30 7:51 UTC
To: Juri Linkov; +Cc: 13032

> This is what I currently use to delete duplicate lines:
>
>   C-u M-| awk -- '!a[$0]++' RET
>
> Do you intend to create a Lisp function with the same result?

I don't know awk, but I've tried that command and it seems to do what I
want: remove all duplicate lines in the region.  However, it doesn't
report the number of lines deleted, which is important to me.

-- 
Dani Moncayo
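The awk one-liner can be extended to report how many lines it dropped. The following is only a sketch for use in a terminal, not something proposed in the thread: the function name `dedup_report` and the choice to send the count to standard error are assumptions, and when a region is piped through `M-|` the error output may end up mixed into the result unless Emacs is configured to collect it separately.

```shell
# Print first occurrences on stdout and a summary line on stderr.
dedup_report() {
  awk '
    !seen[$0]++ { print; next }   # first occurrence: keep it
    { removed++ }                 # repeat: drop it and count it
    END {
      # Route the summary through "cat 1>&2" so it goes to stderr
      # without relying on awk-specific special file names.
      printf "Duplicate lines removed: %d\n", removed | "cat 1>&2"
    }
  '
}

printf 'x\ny\nx\nx\n' | dedup_report
# stdout: x, y
# stderr: Duplicate lines removed: 2
```

Keeping the count off standard output means the deduplicated text can still be piped onward unchanged.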
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Thierry Volpiatto @ 2012-12-04 7:04 UTC
To: 13032

Hi, just for info, here is a simple and fast version.

Dani Moncayo <dmoncayo@gmail.com> writes:

>> This is what I currently use to delete duplicate lines:
>>
>>   C-u M-| awk -- '!a[$0]++' RET
>>
>> Do you intend to create a Lisp function with the same result?
>
> I don't know awk, but I've tried that command and it seems to do what I
> want: remove all duplicate lines in the region.  However, it doesn't
> report the number of lines deleted, which is important to me.

--8<---------------cut here---------------start------------->8---
(defun delete-duplicate-lines (beg end)
  "Delete duplicate lines in region."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (let ((lines (helm-fast-remove-dups
                    (split-string (buffer-string) "\n" t)
                    :test 'equal)))
        (delete-region (point-min) (point-max))
        (loop for l in lines do (insert (concat l "\n")))))))
--8<---------------cut here---------------end--------------->8---

helm-fast-remove-dups is a function in helm:

https://github.com/emacs-helm/helm/blob/master/helm-utils.el

line 342

It is easy to modify the function to report the number of lines removed.

-- 
  Thierry
Get my Gnupg key:
gpg --keyserver pgp.mit.edu --recv-keys 59F29997
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Stefan Monnier @ 2012-12-04 14:46 UTC
To: Thierry Volpiatto; +Cc: 13032

>       (let ((lines (helm-fast-remove-dups
>                     (split-string (buffer-string) "\n" t)
>                     :test 'equal)))
>         (delete-region (point-min) (point-max))
>         (loop for l in lines do (insert (concat l "\n")))))))

The drawback of this version is that any overlays/markers will
be lost, and the buffer will be marked as modified even if there were no
duplicate lines.

        Stefan
* bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
From: Thierry Volpiatto @ 2012-12-04 15:02 UTC
To: Stefan Monnier; +Cc: 13032

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>       (let ((lines (helm-fast-remove-dups
>>                     (split-string (buffer-string) "\n" t)
>>                     :test 'equal)))
>>         (delete-region (point-min) (point-max))
>>         (loop for l in lines do (insert (concat l "\n")))))))
>
> The drawback of this version is that any overlays/markers will
> be lost, and the buffer will be marked as modified even if there were no
> duplicate lines.

Ok, it was just for info on a fast alternative without such enhancements.

-- 
  Thierry
Get my Gnupg key:
gpg --keyserver pgp.mit.edu --recv-keys 59F29997