* Sorting on compound keys?
From: Tim Landscheidt @ 2011-05-24 20:57 UTC
To: help-gnu-emacs

Hi,

sometimes I want to sort unified diffs of CSV files
(separated by tabs (here: \t)):

| +A 1\t1\tx
| +A 1\t2\ty
| +B 2\t3\tz
| -A 1\t1\tx
| -B 2\t2\ty
| -B 2\t3\tz

by the second column, then the first column, then "+" vs.
"-". Unfortunately, it seems that sort-regexp-fields doesn't
allow more than one match field as a key. sort-fields
doesn't work either as it requires the fields to be
surrounded by white space (no "+" vs. "-") and doesn't allow
white space inside the fields.

Is there any function in vanilla Emacs (23.1.1) that I
missed? I looked at pimping sort-regexp-fields, but it seems
to me that sort-subr would have to be rewritten from scratch
to achieve sorting on compound keys.

Tim

* Re: Sorting on compound keys?
From: Andreas Röhler @ 2011-05-25 5:58 UTC
To: help-gnu-emacs

On 24.05.2011 22:57, Tim Landscheidt wrote:
> Hi,
>
> sometimes I want to sort unified diffs of CSV files
> (separated by tabs (here: \t)):
>
> | +A 1\t1\tx
> | +A 1\t2\ty
> | +B 2\t3\tz
> | -A 1\t1\tx
> | -B 2\t2\ty
> | -B 2\t3\tz
>
> by the second column, then the first column, then "+" vs.
> "-". Unfortunately, it seems that sort-regexp-fields doesn't
> allow more than one match field as a key. sort-fields
> doesn't work either as it requires the fields to be
> surrounded by white space (no "+" vs. "-") and doesn't allow
> white space inside the fields.
>
> Is there any function in vanilla Emacs (23.1.1) that I
> missed? I looked at pimping sort-regexp-fields, but it seems
> to me that sort-subr would have to be rewritten from scratch

Hi,

last time I looked into that, the feature was indeed missing.
However, I didn't see a need for a rewrite from scratch, just
to extend the existing routine, i.e. introduce one or more
levels of sorting.

Cheers,

Andreas

> to achieve sorting on compound keys.
>
> Tim

* Re: Sorting on compound keys?
From: Tim Landscheidt @ 2011-05-25 22:08 UTC
To: help-gnu-emacs

Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:

>> sometimes I want to sort unified diffs of CSV files
>> (separated by tabs (here: \t)):

>> | +A 1\t1\tx
>> | +A 1\t2\ty
>> | +B 2\t3\tz
>> | -A 1\t1\tx
>> | -B 2\t2\ty
>> | -B 2\t3\tz

>> by the second column, then the first column, then "+" vs.
>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>> allow more than one match field as a key. sort-fields
>> doesn't work either as it requires the fields to be
>> surrounded by white space (no "+" vs. "-") and doesn't allow
>> white space inside the fields.

>> Is there any function in vanilla Emacs (23.1.1) that I
>> missed? I looked at pimping sort-regexp-fields, but it seems
>> to me that sort-subr would have to be rewritten from scratch
>> to achieve sorting on compound keys.

> last time I looked into that, the feature was indeed missing.
> However, I didn't see a need for a rewrite from scratch, just
> to extend the existing routine, i.e. introduce one or more
> levels of sorting.

I remember our discussion in de.comp.editoren :-), but as I
read sort-subr it is hard-coded that the sort key is one
literal, continuous part of the buffer as sort-lists is a
list of buffer positions.

Tim

* Re: Sorting on compound keys?
From: Andreas Röhler @ 2011-05-26 6:28 UTC
To: help-gnu-emacs

On 26.05.2011 00:08, Tim Landscheidt wrote:
> Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:
>
>>> sometimes I want to sort unified diffs of CSV files
>>> (separated by tabs (here: \t)):
>
>>> | +A 1\t1\tx
>>> | +A 1\t2\ty
>>> | +B 2\t3\tz
>>> | -A 1\t1\tx
>>> | -B 2\t2\ty
>>> | -B 2\t3\tz
>
>>> by the second column, then the first column, then "+" vs.
>>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>>> allow more than one match field as a key. sort-fields
>>> doesn't work either as it requires the fields to be
>>> surrounded by white space (no "+" vs. "-") and doesn't allow
>>> white space inside the fields.
>
>>> Is there any function in vanilla Emacs (23.1.1) that I
>>> missed? I looked at pimping sort-regexp-fields, but it seems
>>> to me that sort-subr would have to be rewritten from scratch
>>> to achieve sorting on compound keys.
>
>> last time I looked into that, the feature was indeed missing.
>> However, I didn't see a need for a rewrite from scratch, just
>> to extend the existing routine, i.e. introduce one or more
>> levels of sorting.
>
> I remember our discussion in de.comp.editoren :-), but as I
> read sort-subr it is hard-coded that the sort key is one
> literal, continuous part of the buffer as sort-lists is a
> list of buffer positions.
>
> Tim
>

sort-subr takes functions to determine the fields to sort.

As for the functions as arguments, maybe have a look at
`ar-th-sort' in thingatpt-utils-base.el

https://code.launchpad.net/s-x-emacs-werkstatt/

Cheers,

Andreas

* Re: Sorting on compound keys?
From: Tim Landscheidt @ 2011-05-26 22:49 UTC
To: help-gnu-emacs

Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:

>>>> sometimes I want to sort unified diffs of CSV files
>>>> (separated by tabs (here: \t)):

>>>> | +A 1\t1\tx
>>>> | +A 1\t2\ty
>>>> | +B 2\t3\tz
>>>> | -A 1\t1\tx
>>>> | -B 2\t2\ty
>>>> | -B 2\t3\tz

>>>> by the second column, then the first column, then "+" vs.
>>>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>>>> allow more than one match field as a key. sort-fields
>>>> doesn't work either as it requires the fields to be
>>>> surrounded by white space (no "+" vs. "-") and doesn't allow
>>>> white space inside the fields.

>>>> Is there any function in vanilla Emacs (23.1.1) that I
>>>> missed? I looked at pimping sort-regexp-fields, but it seems
>>>> to me that sort-subr would have to be rewritten from scratch
>>>> to achieve sorting on compound keys.

>>> last time I looked into that, the feature was indeed missing.
>>> However, I didn't see a need for a rewrite from scratch, just
>>> to extend the existing routine, i.e. introduce one or more
>>> levels of sorting.

>> I remember our discussion in de.comp.editoren :-), but as I
>> read sort-subr it is hard-coded that the sort key is one
>> literal, continuous part of the buffer as sort-lists is a
>> list of buffer positions.

> sort-subr takes functions to determine the fields to sort.

No, it accepts functions to determine the *boundaries* of
the fields that have to be part of the buffer as I have
written above.

> As for the functions as arguments, maybe have a look at
> `ar-th-sort' in thingatpt-utils-base.el

> https://code.launchpad.net/s-x-emacs-werkstatt/

How is this useful in this case?

Tim

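A side note on the sort-subr interface discussed above: as far as its docstring goes, STARTKEYFUN may also return a non-nil value, which is then used as the key itself rather than a buffer region (numeric field sorting gets its number keys that way). Under that reading, a compound key could be built as a string. The following is only a sketch and not something proposed in the thread: the function name `my-sort-lines-by-compound-key` and the NUL separator are assumptions, and it expects lines shaped like the sample (sign, first column, tab, second column).

```elisp
(require 'sort)

;; Hypothetical helper, not part of Emacs: sorts lines by second
;; column, then first column, then the leading "+" or "-".
(defun my-sort-lines-by-compound-key (beg end)
  "Sort lines between BEG and END on a compound string key."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((inhibit-field-text-motion t))
        (sort-subr
         nil
         #'forward-line                 ; NEXTRECFUN: go to next line
         #'end-of-line                  ; ENDRECFUN: a record is one line
         ;; STARTKEYFUN: return the key as a string instead of
         ;; marking a buffer region.
         (lambda ()
           (let ((line (buffer-substring-no-properties
                        (line-beginning-position)
                        (line-end-position))))
             (if (string-match
                  "\\`\\([-+]\\)\\([^\t]*\\)\t\\([^\t]*\\)" line)
                 ;; NUL separates the parts so that a shorter field
                 ;; sorts before a longer one with the same prefix.
                 (concat (match-string 3 line) "\0"
                         (match-string 2 line) "\0"
                         (match-string 1 line))
               ;; Lines that do not fit the pattern sort by their text.
               line))))))))
```

Because every key is a string, the comparison is a plain string comparison, so this does not cover numeric ordering either.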
* Re: Sorting on compound keys?
From: Andreas Röhler @ 2011-05-29 20:17 UTC
To: help-gnu-emacs
Cc: Tim Landscheidt

On 27.05.2011 00:49, Tim Landscheidt wrote:
> Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:
>
>>>>> sometimes I want to sort unified diffs of CSV files
>>>>> (separated by tabs (here: \t)):
>
>>>>> | +A 1\t1\tx
>>>>> | +A 1\t2\ty
>>>>> | +B 2\t3\tz
>>>>> | -A 1\t1\tx
>>>>> | -B 2\t2\ty
>>>>> | -B 2\t3\tz
>
>>>>> by the second column, then the first column, then "+" vs.
>>>>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>>>>> allow more than one match field as a key. sort-fields
>>>>> doesn't work either as it requires the fields to be
>>>>> surrounded by white space (no "+" vs. "-") and doesn't allow
>>>>> white space inside the fields.
>
>>>>> Is there any function in vanilla Emacs (23.1.1) that I
>>>>> missed? I looked at pimping sort-regexp-fields, but it seems
>>>>> to me that sort-subr would have to be rewritten from scratch
>>>>> to achieve sorting on compound keys.
>
>>>> last time I looked into that, the feature was indeed missing.
>>>> However, I didn't see a need for a rewrite from scratch, just
>>>> to extend the existing routine, i.e. introduce one or more
>>>> levels of sorting.
>
>>> I remember our discussion in de.comp.editoren :-), but as I
>>> read sort-subr it is hard-coded that the sort key is one
>>> literal, continuous part of the buffer as sort-lists is a
>>> list of buffer positions.
>
>> sort-subr takes functions to determine the fields to sort.
>
> No, it accepts functions to determine the *boundaries* of
> the fields that have to be part of the buffer as I have
> written above.
>
>> As for the functions as arguments, maybe have a look at
>> `ar-th-sort' in thingatpt-utils-base.el
>
>> https://code.launchpad.net/s-x-emacs-werkstatt/
>
> How is this useful in this case?
>
> Tim
>

Hi Tim,

you are right. It must not be done inside sort-subr, but on top of it.

BTW as sort-subr takes whitespace as field-delimiter, there
is no way to get +A considered as two fields. Besides this
limitation, the code below should provide multiple-field
sorting.

;;; sort-multiple-keys.el --- sort multiple fields

;; Author: Andreas Roehler <andreas.roehler@online.de>
;; Keywords: data

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; Sort lines in region lexicographically by the
;; ARG-LIST fields. Fields already sorted by a field
;; specified by a previous arg are sorted by the next
;; remaining. Uses any number of args given in a list.
;; Fields are separated by whitespace and numbered from
;; 1 up. With a negative arg, sorts by the ARGth field
;; counted from the right. Called from a program, there
;; are three arguments: BEG END and FIELD-LIST. BEG
;; and END specify region to sort. The variable
;; `sort-fold-case' determines whether alphabetic case
;; affects the sort order.

;; Example - assume the code below uncommented at the
;; beginning of a buffer:

;; +C 2 1 x
;; +A 2 2 y
;; +A 1 2 y
;; +A 1 2 z
;; +C 1 1 x
;; +A 4 2 z
;; +A 3 2 y
;; +B 3 3 x
;; +C 2 1 x
;; +B 2 3 z
;; -A 6 1 x
;; -B 1 2 y
;; -A 2 1 x
;; -B 1 3 z

;; sort region hierarchically with first, fourth and second field
;; (sort-multiple-fields 1 126 '(1 4 2))
;; ==>

;; +A 1 2 y
;; +A 2 2 y
;; +A 3 2 y
;; +A 1 2 z
;; +A 4 2 z
;; +B 2 3 z
;; +B 3 3 x
;; +C 1 1 x
;; +C 2 1 x
;; +C 2 1 x
;; -A 2 1 x
;; -A 6 1 x
;; -B 1 2 y
;; -B 1 3 z

;;; Code:

(eval-when-compile (require 'cl))       ; for `lexical-let'

(defun sort-multiple-fields (beg end fields)
  (interactive "*r\nnSort for field: ")
  (save-excursion
    (when (interactive-p)
      (while (yes-or-no-p "Sort another field?")
        (add-to-list 'fields (read-number "Sort for field: ")))
      (message "Sorting for fields %s" (prin1-to-string fields)))
    (let* ((positions (copy-sequence fields))
           (max-field (car (sort positions #'>))))
      (sort-multiple-fields-base beg end fields))))

(defun sort-multiple-fields-base (beg end fields)
  (lexical-let ((key (or (car-safe fields) (list fields)))
                (this-fields (copy-sequence fields))
                last)
    (save-restriction
      (narrow-to-region beg end)
      (sort-fields key beg end)
      (setq last (car fields))
      (when (cadr this-fields)
        (setq this-fields (cdr this-fields))
        (sort-multiple-fields-intern beg end last this-fields fields)))))

(defun sort-multiple-fields-intern (beg end &optional last this-fields fields)
  (lexical-let ((beg beg)
                (pos end)
                (end end)
                (last last)
                (fields fields)
                (this-fields (copy-sequence this-fields))
                regexp)
    (setq key (pop this-fields))
    (dotimes (i max-field)
      ;; i starts with 0, first field is done above
      (cond ((eq 0 i)
             (if (eq 1 last)
                 (setq regexp "^[ \t\n]*\\([^ \t\n]+\\)")
               (setq regexp "^[ \t\n]*[^ \t\n]+")))
            ((eq last (1+ i))
             (setq regexp (concat regexp "[ \t\n]+\\([^ \t\n]+\\)")))
            (t (setq regexp (concat regexp "[ \t\n]+[^ \t\n]+")))))
    (setq regexp (concat regexp ".*$"))
    (goto-char beg)
    (while (and (re-search-forward regexp pos t 1)
                (setq beg (line-beginning-position))
                (setq erg (match-string-no-properties 1)))
      ;; at least one success
      (when (and (re-search-forward regexp pos t 1)
                 (string= (match-string-no-properties 1) erg)
                 (setq end (line-end-position)))
        (while (and (re-search-forward regexp pos t 1)
                    (string= (match-string-no-properties 1) erg)
                    (setq end (line-end-position))))
        (when (and beg end)
          ;; we really moved, there is another region to sort
          (save-restriction
            (narrow-to-region beg end)
            (sort-fields key beg end)
            (when (car this-fields)
              (setq last key)
              (sort-multiple-fields-intern beg end last this-fields))))))))

(provide 'sort-multiple-keys)
;;; sort-multiple-keys.el ends here

* Re: Sorting on compound keys?
From: Tim Landscheidt @ 2011-06-10 0:26 UTC
To: help-gnu-emacs

Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:

> you are right. It must not be done inside sort-subr, but on top of it.

> BTW as sort-subr takes whitespace as field-delimiter, there
> is no way to get +A considered as two fields. Besides this
> limitation, the code below should provide multiple-field
> sorting.
> [...]

sort-subr allows arbitrary field definitions (as long as
they are literal, continuous parts of the buffer), but
Mark's article reminded me that not only can the effect of
your sort-multiple-fields be achieved with multiple calls
to sort-fields as well, but ...

* Re: Sorting on compound keys?
From: Andreas Röhler @ 2011-06-13 7:32 UTC
To: help-gnu-emacs

On 10.06.2011 02:26, Tim Landscheidt wrote:

[ ... ]

>> BTW as sort-subr takes whitespace as field-delimiter, there
>> is no way to get +A considered as two fields. Besides this
>> limitation, the code below should provide multiple-field
>> sorting.
>> [...]
>
> sort-subr allows arbitrary field definitions (as long as
> they are literal, continuous parts of the buffer),

Thanks for correcting that. I should have mentioned `sort-fields' instead.

* Re: Sorting on compound keys?
From: Mark Tilford @ 2011-05-29 21:50 UTC
To: help-gnu-emacs

On Tue, May 24, 2011 at 3:57 PM, Tim Landscheidt <tim@tim-landscheidt.de> wrote:
> Hi,
>
> sometimes I want to sort unified diffs of CSV files
> (separated by tabs (here: \t)):
>
> | +A 1\t1\tx
> | +A 1\t2\ty
> | +B 2\t3\tz
> | -A 1\t1\tx
> | -B 2\t2\ty
> | -B 2\t3\tz
>
> by the second column, then the first column, then "+" vs.
> "-". Unfortunately, it seems that sort-regexp-fields doesn't
> allow more than one match field as a key. sort-fields
> doesn't work either as it requires the fields to be
> surrounded by white space (no "+" vs. "-") and doesn't allow
> white space inside the fields.
>
> Is there any function in vanilla Emacs (23.1.1) that I
> missed? I looked at pimping sort-regexp-fields, but it seems
> to me that sort-subr would have to be rewritten from scratch
> to achieve sorting on compound keys.
>
> Tim

Is there an option to do a stable sort, such as mergesort?

* Re: Sorting on compound keys?
From: Tim Landscheidt @ 2011-06-10 0:27 UTC
To: help-gnu-emacs

Mark Tilford <ralphmerridew@gmail.com> wrote:

>> sometimes I want to sort unified diffs of CSV files
>> (separated by tabs (here: \t)):

>> | +A 1\t1\tx
>> | +A 1\t2\ty
>> | +B 2\t3\tz
>> | -A 1\t1\tx
>> | -B 2\t2\ty
>> | -B 2\t3\tz

>> by the second column, then the first column, then "+" vs.
>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>> allow more than one match field as a key. sort-fields
>> doesn't work either as it requires the fields to be
>> surrounded by white space (no "+" vs. "-") and doesn't allow
>> white space inside the fields.

>> Is there any function in vanilla Emacs (23.1.1) that I
>> missed? I looked at pimping sort-regexp-fields, but it seems
>> to me that sort-subr would have to be rewritten from scratch
>> to achieve sorting on compound keys.

> Is there an option to do a stable sort, such as mergesort?

Eureka! Of course! All Emacs sort functions are stable, so
99 % of my use cases can be dealt with by multiple calls to
sort-regexp-fields (the only exception being sorting
numerically and the like).

Unfortunately, those multiple calls can be tedious when done
interactively, so voilà:

| (defun tl-sort-regexp-fields (reverse record-regexp key-regexp beg end)
|   "Like `sort-regexp-fields', but KEY-REGEXP may list several keys."
|   (interactive "P\nsRegexp specifying records to sort: \nsRegexp specifying key within record: \nr")
|   ;; A compound key is a sequence of "\N" or "-\N" items.
|   (if (string-match "\\`\\(?:-\\\\[1-9]\\|\\(?:-?\\\\[1-9]\\)\\{2,\\}\\)\\'" key-regexp)
|       (let ((i (length key-regexp)))
|         ;; Walk the keys from right to left, so that the least
|         ;; significant key is sorted first.
|         (while (> i 0)
|           (let ((key-reverse (and (> i 2) (= (aref key-regexp (- i 3)) ?-)))
|                 (key (substring key-regexp (- i 2) i)))
|             (sort-regexp-fields (if reverse (not key-reverse) key-reverse)
|                                 record-regexp key beg end)
|             ;; Skip the "-" prefix, if any, then the "\N" itself.
|             (if key-reverse
|                 (setq i (- i 1)))
|             (setq i (- i 2)))))
|     (sort-regexp-fields reverse record-regexp key-regexp beg end)))

A key-regexp of "\2\3\1" will yield the region sorted by the
second field, then the third, then the first. The fields
can be prefixed with "-" to negate the sort order for this
field, e.g. "\2-\3\1" will sort by the second field in
ascending order, then the third in descending order, then
the first in ascending order.

With regard to performance, the region is sorted once for
every key, so it may not be suitable for larger datasets,
but up to a few thousand lines it's fast enough for me.

If someone wants to integrate this into Emacs, please go
ahead.

Thanks, also to Andreas,

Tim

P. S.: Is there really no xor in elisp?

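The multi-pass idea from this message can be tried on the sample from the start of the thread. What follows is a minimal sketch rather than Tim's own code: the helper name `my-compound-sort-demo` and the record regexp are assumptions made for the sample lines. It calls `sort-regexp-fields` once per key, least significant key first, and relies on the stability mentioned above to keep earlier passes intact among ties.

```elisp
(require 'sort)

;; Hypothetical demo helper, not part of Emacs.
(defun my-compound-sort-demo ()
  "Return the sample lines sorted by column 2, column 1, then sign."
  (with-temp-buffer
    (insert "+A 1\t1\tx\n+A 1\t2\ty\n+B 2\t3\tz\n"
            "-A 1\t1\tx\n-B 2\t2\ty\n-B 2\t3\tz\n")
    ;; Assumed record layout: \1 = "+" or "-", \2 = first column,
    ;; \3 = second column; the rest of the line is not a key.
    (let ((record "^\\([-+]\\)\\([^\t\n]*\\)\t\\([^\t\n]*\\)\t.*$")
          (sort-fold-case nil))
      ;; Least significant key first: sign, first column, second column.
      (dolist (key '("\\1" "\\2" "\\3"))
        (sort-regexp-fields nil record key (point-min) (point-max))))
    (buffer-string)))
```

Evaluating `(my-compound-sort-demo)` returns the buffer text after the three passes, with each "+" line followed by the "-" line that shares its second column.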