unofficial mirror of help-gnu-emacs@gnu.org
* Sorting on compound keys?
@ 2011-05-24 20:57 Tim Landscheidt
  2011-05-25  5:58 ` Andreas Röhler
  2011-05-29 21:50 ` Mark Tilford
  0 siblings, 2 replies; 10+ messages in thread
From: Tim Landscheidt @ 2011-05-24 20:57 UTC
  To: help-gnu-emacs

Hi,

sometimes I want to sort unified diffs of CSV files (separated
by tabs (here: \t)):

| +A 1\t1\tx
| +A 1\t2\ty
| +B 2\t3\tz
| -A 1\t1\tx
| -B 2\t2\ty
| -B 2\t3\tz

by the second column, then the first column, then "+" vs.
"-". Unfortunately, it seems that sort-regexp-fields doesn't
allow more than one match field as a key. sort-fields
doesn't work either as it requires the fields to be
surrounded by white space (no "+" vs. "-") and doesn't allow
white space inside the fields.

  Is there any function in vanilla Emacs (23.1.1) that I
missed? I looked at pimping sort-regexp-fields, but it seems
to me that sort-subr would have to be rewritten from scratch
to achieve sorting on compound keys.

Tim





* Re: Sorting on compound keys?
  2011-05-24 20:57 Sorting on compound keys? Tim Landscheidt
@ 2011-05-25  5:58 ` Andreas Röhler
  2011-05-25 22:08   ` Tim Landscheidt
  2011-05-29 21:50 ` Mark Tilford
  1 sibling, 1 reply; 10+ messages in thread
From: Andreas Röhler @ 2011-05-25  5:58 UTC
  To: help-gnu-emacs

Am 24.05.2011 22:57, schrieb Tim Landscheidt:
> Hi,
>
> sometimes I want to sort unified diffs of CSV files (sepa-
> rated by tabs (here: \t)):
>
> | +A 1\t1\tx
> | +A 1\t2\ty
> | +B 2\t3\tz
> | -A 1\t1\tx
> | -B 2\t2\ty
> | -B 2\t3\tz
>
> by the second column, then the first column, then "+" vs.
> "-". Unfortunately, it seems that sort-regexp-fields doesn't
> allow more than one match field as a key. sort-fields
> doesn't work either as it requires the fields to be sur-
> rounded by white space (no "+" vs. "-") and doesn't allow
> white space inside the fields.
>
>    Is there any function in vanilla Emacs (23.1.1) that I
> missed? I looked at pimping sort-regexp-fields, but it seems
> to me that sort-subr would have to be rewritten from scratch

Hi,

the last time I looked into it, that feature was indeed missing.
However, I didn't see a need for a rewrite from scratch, just to
extend the existing routine, i.e. introduce one or more levels of sorting.

Cheers,

Andreas

> to achieve sorting on compound keys.
>
> Tim
>
>
>





* Re: Sorting on compound keys?
  2011-05-25  5:58 ` Andreas Röhler
@ 2011-05-25 22:08   ` Tim Landscheidt
  2011-05-26  6:28     ` Andreas Röhler
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Landscheidt @ 2011-05-25 22:08 UTC
  To: help-gnu-emacs

Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:

>> sometimes I want to sort unified diffs of CSV files (sepa-
>> rated by tabs (here: \t)):

>> | +A 1\t1\tx
>> | +A 1\t2\ty
>> | +B 2\t3\tz
>> | -A 1\t1\tx
>> | -B 2\t2\ty
>> | -B 2\t3\tz

>> by the second column, then the first column, then "+" vs.
>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>> allow more than one match field as a key. sort-fields
>> doesn't work either as it requires the fields to be sur-
>> rounded by white space (no "+" vs. "-") and doesn't allow
>> white space inside the fields.

>>    Is there any function in vanilla Emacs (23.1.1) that I
>> missed? I looked at pimping sort-regexp-fields, but it seems
>> to me that sort-subr would have to be rewritten from scratch
>> to achieve sorting on compound keys.

> last time I looked into that feature was missing indeed.
> However, didn't look for a need of re-write from the
> scratch, just to extend to existing routine - ie. introduce
> one or more levels of sorting.

I remember our discussion in de.comp.editoren :-), but as I
read sort-subr, it is hard-coded that the sort key is one
literal, contiguous part of the buffer, since sort-lists is a
list of buffer positions.

Tim





* Re: Sorting on compound keys?
  2011-05-25 22:08   ` Tim Landscheidt
@ 2011-05-26  6:28     ` Andreas Röhler
  2011-05-26 22:49       ` Tim Landscheidt
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Röhler @ 2011-05-26  6:28 UTC
  To: help-gnu-emacs

Am 26.05.2011 00:08, schrieb Tim Landscheidt:
> Andreas Röhler<andreas.roehler@easy-emacs.de>  wrote:
>
>>> sometimes I want to sort unified diffs of CSV files (sepa-
>>> rated by tabs (here: \t)):
>
>>> | +A 1\t1\tx
>>> | +A 1\t2\ty
>>> | +B 2\t3\tz
>>> | -A 1\t1\tx
>>> | -B 2\t2\ty
>>> | -B 2\t3\tz
>
>>> by the second column, then the first column, then "+" vs.
>>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>>> allow more than one match field as a key. sort-fields
>>> doesn't work either as it requires the fields to be sur-
>>> rounded by white space (no "+" vs. "-") and doesn't allow
>>> white space inside the fields.
>
>>>     Is there any function in vanilla Emacs (23.1.1) that I
>>> missed? I looked at pimping sort-regexp-fields, but it seems
>>> to me that sort-subr would have to be rewritten from scratch
>>> to achieve sorting on compound keys.
>
>> last time I looked into that feature was missing indeed.
>> However, didn't look for a need of re-write from the
>> scratch, just to extend to existing routine - ie. introduce
>> one or more levels of sorting.
>
> I remember our discussion in de.comp.editoren :-), but as I
> read sort-subr it is hard-coded that the sort key is one
> literal, continuous part of the buffer as sort-lists is a
> list of buffer positions.
>
> Tim
>
>
>


sort-subr takes functions to determine the fields to sort.

As for the functions as arguments, maybe have a look at
`ar-th-sort' in thingatpt-utils-base.el

https://code.launchpad.net/s-x-emacs-werkstatt/

Cheers,

Andreas







* Re: Sorting on compound keys?
  2011-05-26  6:28     ` Andreas Röhler
@ 2011-05-26 22:49       ` Tim Landscheidt
  2011-05-29 20:17         ` Andreas Röhler
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Landscheidt @ 2011-05-26 22:49 UTC
  To: help-gnu-emacs

Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:

>>>> sometimes I want to sort unified diffs of CSV files (sepa-
>>>> rated by tabs (here: \t)):

>>>> | +A 1\t1\tx
>>>> | +A 1\t2\ty
>>>> | +B 2\t3\tz
>>>> | -A 1\t1\tx
>>>> | -B 2\t2\ty
>>>> | -B 2\t3\tz

>>>> by the second column, then the first column, then "+" vs.
>>>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>>>> allow more than one match field as a key. sort-fields
>>>> doesn't work either as it requires the fields to be sur-
>>>> rounded by white space (no "+" vs. "-") and doesn't allow
>>>> white space inside the fields.

>>>>     Is there any function in vanilla Emacs (23.1.1) that I
>>>> missed? I looked at pimping sort-regexp-fields, but it seems
>>>> to me that sort-subr would have to be rewritten from scratch
>>>> to achieve sorting on compound keys.

>>> last time I looked into that feature was missing indeed.
>>> However, didn't look for a need of re-write from the
>>> scratch, just to extend to existing routine - ie. introduce
>>> one or more levels of sorting.

>> I remember our discussion in de.comp.editoren :-), but as I
>> read sort-subr it is hard-coded that the sort key is one
>> literal, continuous part of the buffer as sort-lists is a
>> list of buffer positions.

> sort-subr takes functions to determine the fields to sort.

No, it accepts functions that determine the *boundaries* of
the fields, and those fields have to be part of the buffer,
as I wrote above.
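
A possibly relevant detail: the docstring of sort-subr says that
STARTKEYFUN may either move point to the start of the key or return a
non-nil value that is then used as the key itself. Assuming that
holds for the Emacs version at hand, a compound key might be built in
a single pass. The following is only a minimal sketch: the name
demo-sort-compound-key and the assumed line layout (sign, first
column, tab, second column) are hypothetical and not taken from
sort.el.

```elisp
;; Sketch only: demo-sort-compound-key is a hypothetical helper.
;; Assumed line layout: SIGN COLUMN1 <tab> COLUMN2 <tab> REST
(defun demo-sort-compound-key (beg end)
  "Sort lines between BEG and END by column 2, then column 1, then sign."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (sort-subr nil
                 #'forward-line         ; move to the next record
                 #'end-of-line          ; move to the end of this record
                 (lambda ()
                   ;; Return the key as a string instead of moving point.
                   ;; Tab sorts before any printable character, so it is
                   ;; a safe separator between the key parts.
                   (if (looking-at
                        "\\([-+]\\)\\([^\t\n]*\\)\t\\([^\t\n]*\\)")
                       (concat (match-string 3) "\t"
                               (match-string 2) "\t"
                               (match-string 1))
                     ""))))))

;; Usage on the sample from the first message:
(with-temp-buffer
  (insert "+A 1\t1\tx\n+A 1\t2\ty\n+B 2\t3\tz\n"
          "-A 1\t1\tx\n-B 2\t2\ty\n-B 2\t3\tz\n")
  (demo-sort-compound-key (point-min) (point-max))
  (buffer-string))
```

Since the returned keys are strings, sort-subr compares them with
string<, so this sketch sorts lexicographically only, like the rest
of the thread.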

> As for the functions as arguments, maybe have a look at
> `ar-th-sort' in thingatpt-utils-base.el

> https://code.launchpad.net/s-x-emacs-werkstatt/

How is this useful in this case?

Tim





* Re: Sorting on compound keys?
  2011-05-26 22:49       ` Tim Landscheidt
@ 2011-05-29 20:17         ` Andreas Röhler
  2011-06-10  0:26           ` Tim Landscheidt
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Röhler @ 2011-05-29 20:17 UTC
  To: help-gnu-emacs; +Cc: Tim Landscheidt

Am 27.05.2011 00:49, schrieb Tim Landscheidt:
> Andreas Röhler<andreas.roehler@easy-emacs.de>  wrote:
>
>>>>> sometimes I want to sort unified diffs of CSV files (sepa-
>>>>> rated by tabs (here: \t)):
>
>>>>> | +A 1\t1\tx
>>>>> | +A 1\t2\ty
>>>>> | +B 2\t3\tz
>>>>> | -A 1\t1\tx
>>>>> | -B 2\t2\ty
>>>>> | -B 2\t3\tz
>
>>>>> by the second column, then the first column, then "+" vs.
>>>>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>>>>> allow more than one match field as a key. sort-fields
>>>>> doesn't work either as it requires the fields to be sur-
>>>>> rounded by white space (no "+" vs. "-") and doesn't allow
>>>>> white space inside the fields.
>
>>>>>      Is there any function in vanilla Emacs (23.1.1) that I
>>>>> missed? I looked at pimping sort-regexp-fields, but it seems
>>>>> to me that sort-subr would have to be rewritten from scratch
>>>>> to achieve sorting on compound keys.
>
>>>> last time I looked into that feature was missing indeed.
>>>> However, didn't look for a need of re-write from the
>>>> scratch, just to extend to existing routine - ie. introduce
>>>> one or more levels of sorting.
>
>>> I remember our discussion in de.comp.editoren :-), but as I
>>> read sort-subr it is hard-coded that the sort key is one
>>> literal, continuous part of the buffer as sort-lists is a
>>> list of buffer positions.
>
>> sort-subr takes functions to determine the fields to sort.
>
> No, it accepts functions to determine the *boundaries* of
> the fields that have to be part of the buffer as I have
> written above.
>
>> As for the functions as arguments, maybe have a look at
>> `ar-th-sort' in thingatpt-utils-base.el
>
>> https://code.launchpad.net/s-x-emacs-werkstatt/
>
> How is this useful in this case?
>
> Tim
>
>
>

Hi Tim,

you are right. It does not have to be done inside sort-subr, but can be
done on top of it.

BTW, as sort-subr takes whitespace as the field delimiter, there is no way to
get +A considered as two fields. Besides this limitation, the code below
should provide multiple-field sorting.

;;; sort-multiple-keys.el --- sort multiple fields

;; Author: Andreas Roehler <andreas.roehler@online.de>

;; Keywords: data

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; Sort lines in region lexicographically by the
;; ARG-LIST fields. Fields already sorted by a field
;; specified by a previous arg are sorted by the next
;; remaining. Uses any number of args given in a list.

;; Fields are separated by whitespace and numbered from
;; 1 up. With a negative arg, sorts by the ARGth field
;; counted from the right. Called from a program, there
;; are three arguments:  BEG  END and FIELD-LIST. BEG
;; and END specify region to sort. The variable
;; `sort-fold-case' determines whether alphabetic case
;; affects the sort order.


;; Example: assume the lines below, uncommented, at the
;; beginning of a buffer:

;; +C 2	1	x
;; +A 2	2	y
;; +A 1	2	y
;; +A 1	2	z
;; +C 1	1	x
;; +A 4	2	z
;; +A 3	2	y
;; +B 3	3	x
;; +C 2	1	x
;; +B 2	3	z
;; -A 6	1	x
;; -B 1	2	y
;; -A 2	1	x
;; -B 1	3	z

;; sort region hierarchically with first, fourth and second field
;; (sort-multiple-fields 1 126 '(1 4 2))
;; ==>

;; +A 1	2	y
;; +A 2	2	y
;; +A 3	2	y
;; +A 1	2	z
;; +A 4	2	z
;; +B 2	3	z
;; +B 3	3	x
;; +C 1	1	x
;; +C 2	1	x
;; +C 2	1	x
;; -A 2	1	x
;; -A 6	1	x
;; -B 1	2	y
;; -B 1	3	z



;;; Code:

(eval-when-compile (require 'cl)) ; for `lexical-let'

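(defun sort-multiple-fields (beg end fields)
   (interactive
    ;; Collect the keys in the order entered; FIELDS must be a list.
    (let ((keys (list (read-number "Sort for field: "))))
      (barf-if-buffer-read-only)
      (while (yes-or-no-p "Sort another field? ")
        (setq keys (append keys (list (read-number "Sort for field: ")))))
      (message "Sorting for fields %s" (prin1-to-string keys))
      (list (region-beginning) (region-end) keys)))
   (save-excursion
     ;; `max-field' is read by `sort-multiple-fields-intern' through
     ;; dynamic binding, so this file must not enable lexical-binding.
     (let* ((positions (copy-sequence fields))
            (max-field (car (sort positions #'>))))
       (sort-multiple-fields-base beg end fields))))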

(defun sort-multiple-fields-base (beg end fields)
   (lexical-let ((key (or (car-safe fields) (list fields)))
                 (this-fields (copy-sequence fields))
                 last)
     (save-restriction
       (narrow-to-region beg end)
       (sort-fields key beg end)
         (setq last (car fields))
         (when (cadr this-fields)
           (setq this-fields (cdr this-fields))
         (sort-multiple-fields-intern beg end last this-fields fields)))))

(defun sort-multiple-fields-intern (beg end &optional last this-fields fields)
   (lexical-let ((beg beg)
                 (pos end)
                 (end end)
                 (last last)
                 (fields fields)
                 (this-fields (copy-sequence this-fields))
                 ;; Bind these locally; as globals they would be
                 ;; overwritten by the recursive call below.
                 regexp key erg)
     (setq key (pop this-fields))
     (dotimes (i max-field)
       ;; i starts with 0, first field is done above
       (cond ((eq 0 i)
              (if (eq 1 last)
                  (setq regexp "^[ \t\n]*\\([^ \t\n]+\\)")
                (setq regexp "^[ \t\n]*[^ \t\n]+")))
             ((eq last (1+ i))
              (setq regexp (concat regexp "[ \t\n]+\\([^ \t\n]+\\)")))
             (t (setq regexp (concat regexp "[ \t\n]+[^ \t\n]+")))))
     (setq regexp (concat regexp ".*$"))
     (goto-char beg)
     (while (and (re-search-forward regexp pos t 1)
                 (setq beg (line-beginning-position))
                 (setq erg (match-string-no-properties 1)))
       ;; at least one success
       (when (and (re-search-forward regexp pos t 1)
                  (string= (match-string-no-properties 1) erg)
                  (setq end (line-end-position)))
         (while (and (re-search-forward regexp pos t 1)
                     (string= (match-string-no-properties 1) erg)
                     (setq end (line-end-position))))
         (when (and beg end)
           ;; we really moved, there is another region to sort
           (save-restriction
             (narrow-to-region beg end)
             (sort-fields key beg end)
             (when (car this-fields)
               (setq last key)
               (sort-multiple-fields-intern beg end last this-fields))))))))

(provide 'sort-multiple-keys)
;;; sort-multiple-keys.el ends here














* Re: Sorting on compound keys?
  2011-05-24 20:57 Sorting on compound keys? Tim Landscheidt
  2011-05-25  5:58 ` Andreas Röhler
@ 2011-05-29 21:50 ` Mark Tilford
  2011-06-10  0:27   ` Tim Landscheidt
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Tilford @ 2011-05-29 21:50 UTC
  To: help-gnu-emacs

On Tue, May 24, 2011 at 3:57 PM, Tim Landscheidt <tim@tim-landscheidt.de> wrote:
> Hi,
>
> sometimes I want to sort unified diffs of CSV files (sepa-
> rated by tabs (here: \t)):
>
> | +A 1\t1\tx
> | +A 1\t2\ty
> | +B 2\t3\tz
> | -A 1\t1\tx
> | -B 2\t2\ty
> | -B 2\t3\tz
>
> by the second column, then the first column, then "+" vs.
> "-". Unfortunately, it seems that sort-regexp-fields doesn't
> allow more than one match field as a key. sort-fields
> doesn't work either as it requires the fields to be sur-
> rounded by white space (no "+" vs. "-") and doesn't allow
> white space inside the fields.
>
>  Is there any function in vanilla Emacs (23.1.1) that I
> missed? I looked at pimping sort-regexp-fields, but it seems
> to me that sort-subr would have to be rewritten from scratch
> to achieve sorting on compound keys.
>
> Tim

Is there an option to do a stable sort, such as mergesort?
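
For context, the Emacs Lisp manual documents the built-in `sort' as
stable: elements that compare equal keep their relative order. A
small sketch to check this; the name demo-stable-sort-by-key and the
(KEY . LABEL) record shape are made up for illustration.

```elisp
;; Sketch only: demo-stable-sort-by-key is a hypothetical helper.
;; Records are (KEY . LABEL) pairs; only KEY takes part in the comparison.
(defun demo-stable-sort-by-key (records)
  "Return a copy of RECORDS sorted by car; ties keep their input order."
  (sort (copy-sequence records)
        (lambda (a b) (< (car a) (car b)))))

(demo-stable-sort-by-key '((2 . "a") (1 . "b") (2 . "c") (1 . "d")))
;; => ((1 . "b") (1 . "d") (2 . "a") (2 . "c"))
```

The labels "b"/"d" and "a"/"c" stay in input order within each key,
which is the property that sorting in several passes relies on.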




* Re: Sorting on compound keys?
  2011-05-29 20:17         ` Andreas Röhler
@ 2011-06-10  0:26           ` Tim Landscheidt
  2011-06-13  7:32             ` Andreas Röhler
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Landscheidt @ 2011-06-10  0:26 UTC
  To: help-gnu-emacs

Andreas Röhler <andreas.roehler@easy-emacs.de> wrote:

> you are right. It must not be done inside sort-subr, but on the top of it.

> BTW as sort-subr takes whitespace as field-delimiter, there
> is no way to get +A considered as two fields. Beside this
> limitation, code below should provide multiple-fields
> sorting.
> [...]

sort-subr allows arbitrary field definitions (as long as
they are literal, continuous parts of the buffer), but
Mark's article reminded me that not only can the effect of
your sort-multiple-fields be achieved with multiple calls to
sort-fields as well, but ...





* Re: Sorting on compound keys?
  2011-05-29 21:50 ` Mark Tilford
@ 2011-06-10  0:27   ` Tim Landscheidt
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Landscheidt @ 2011-06-10  0:27 UTC
  To: help-gnu-emacs

Mark Tilford <ralphmerridew@gmail.com> wrote:

>> sometimes I want to sort unified diffs of CSV files (sepa-
>> rated by tabs (here: \t)):

>> | +A 1\t1\tx
>> | +A 1\t2\ty
>> | +B 2\t3\tz
>> | -A 1\t1\tx
>> | -B 2\t2\ty
>> | -B 2\t3\tz

>> by the second column, then the first column, then "+" vs.
>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>> allow more than one match field as a key. sort-fields
>> doesn't work either as it requires the fields to be sur-
>> rounded by white space (no "+" vs. "-") and doesn't allow
>> white space inside the fields.

>>  Is there any function in vanilla Emacs (23.1.1) that I
>> missed? I looked at pimping sort-regexp-fields, but it seems
>> to me that sort-subr would have to be rewritten from scratch
>> to achieve sorting on compound keys.

> Is there an option to do a stable sort, such as mergesort?

Eureka! Of course! All Emacs sort functions are stable, so
99 % of my use cases can be dealt with by multiple calls to
sort-regexp-fields (the only exception being sorting
numerically and the like).

  Unfortunately, those multiple calls can be tedious when
done interactively, so voilà:

| (defun tl-sort-regexp-fields (reverse record-regexp key-regexp beg end)
|   (interactive "P\nsRegexp specifying records to sort: 
| sRegexp specifying key within record: \nr")
|   (if (string-match "\\`\\(?:-\\\\[1-9]\\|\\(?:-?\\\\[1-9]\\)\\{2,\\}\\)\\'" key-regexp)
|       (let
|           ((i (length key-regexp)))
|         (while (> i 0)
|           (let ((key-reverse (and (> i 2) (= (aref key-regexp (- i 3)) ?-)))
|                 (key (substring key-regexp (- i 2) i)))
|             (sort-regexp-fields (if reverse (not key-reverse) key-reverse) record-regexp key beg end)
|             (if key-reverse
|                 (setq i (- i 1)))
|             (setq i (- i 2)))))
|     (sort-regexp-fields reverse record-regexp key-regexp beg end)))

A key-regexp of "\2\3\1" will yield the region sorted by the
second field, then the third, then the first. The fields can
be prefixed with "-" to negate the sort order for this
field, e.g. "\2-\3\1" will sort by the second field in
ascending order, then the third in descending order, then the
first in ascending order.

  With regard to performance, the region is sorted once for
every key, so it may not be suitable for larger datasets,
but up to a few thousand lines it's fast enough for me. If
someone wants to integrate this into Emacs, please go ahead.
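
The underlying idea (one stable pass per key, least significant key
first) can be tried on the sample from the first message with plain
sort-regexp-fields. This is a minimal sketch, not the function
above: the name demo-sort-by-passes and the record regexp are
assumptions about the data layout (sign, first column, tab, second
column, tab, rest).

```elisp
;; Sketch only: demo-sort-by-passes is a hypothetical helper.
;; Groups in the record regexp: \1 = sign, \2 = first column,
;; \3 = second column.
(defun demo-sort-by-passes (beg end)
  "Sort lines between BEG and END by column 2, then column 1, then sign."
  (let ((record "^\\([-+]\\)\\([^\t\n]*\\)\t\\([^\t\n]*\\)\t.*$"))
    ;; Least significant key first; each later pass leaves ties
    ;; in the order the earlier passes produced.
    (dolist (key '("\\1" "\\2" "\\3"))
      (sort-regexp-fields nil record key beg end))))

(with-temp-buffer
  (insert "+A 1\t1\tx\n+A 1\t2\ty\n+B 2\t3\tz\n"
          "-A 1\t1\tx\n-B 2\t2\ty\n-B 2\t3\tz\n")
  (demo-sort-by-passes (point-min) (point-max))
  (buffer-string))
;; => "+A 1\t1\tx\n-A 1\t1\tx\n+A 1\t2\ty\n-B 2\t2\ty\n+B 2\t3\tz\n-B 2\t3\tz\n"
```

The passes run in the reverse of the desired priority: the last call
decides the primary order, and stability preserves what the earlier
calls established among equal keys.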

Thanks, also to Andreas,
Tim

P. S.: Is there really no xor in elisp?





* Re: Sorting on compound keys?
  2011-06-10  0:26           ` Tim Landscheidt
@ 2011-06-13  7:32             ` Andreas Röhler
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Röhler @ 2011-06-13  7:32 UTC
  To: help-gnu-emacs

Am 10.06.2011 02:26, schrieb Tim Landscheidt:
[ ... ]
>> BTW as sort-subr takes whitespace as field-delimiter, there
>> is no way to get +A considered as two fields. Beside this
>> limitation, code below should provide multiple-fields
>> sorting.
>> [...]
>
> sort-subr allows arbitrary field definitions (as long as
> they are literal, continuous parts of the buffer),

Thanks for correcting that.

I should have mentioned `sort-fields' instead.



