From: Tim Landscheidt
Newsgroups: gmane.emacs.help
Subject: Re: Sorting on compound keys?
Date: Fri, 10 Jun 2011 00:27:37 +0000
To: help-gnu-emacs@gnu.org
Mark Tilford wrote:

>> sometimes I want to sort unified diffs of CSV files
>> (separated by tabs (here: \t)):

>> | +A 1\t1\tx
>> | +A 1\t2\ty
>> | +B 2\t3\tz
>> | -A 1\t1\tx
>> | -B 2\t2\ty
>> | -B 2\t3\tz

>> by the second column, then the first column, then "+" vs.
>> "-". Unfortunately, it seems that sort-regexp-fields doesn't
>> allow more than one match field as a key. sort-fields
>> doesn't work either as it requires the fields to be
>> surrounded by white space (no "+" vs. "-") and doesn't allow
>> white space inside the fields.

>> Is there any function in vanilla Emacs (23.1.1) that I
>> missed? I looked at pimping sort-regexp-fields, but it seems
>> to me that sort-subr would have to be rewritten from scratch
>> to achieve sorting on compound keys.

> Is there an option to do a stable sort, such as mergesort?

Eureka! Of course!
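Mark's hint works because of a general property of stable sorts: sorting on the least significant key first, then on each more significant key in turn, gives the same order as a single sort on the compound key. A minimal Emacs Lisp sketch of that property, assuming the records of the diff above have already been split into (SIGN COL1 COL2 COL3) by hand, and relying on Emacs's list `sort` being stable. All `my-` names are hypothetical and not part of Emacs; the sketch compares a multi-pass sort against one sort with a compound comparator.

```elisp
;; -*- lexical-binding: t; -*-
;; Hypothetical data: the diff records from above, split by hand
;; into (SIGN COL1 COL2 COL3).
(defvar my-diff-records
  '(("+" "A 1" "1" "x")
    ("+" "A 1" "2" "y")
    ("+" "B 2" "3" "z")
    ("-" "A 1" "1" "x")
    ("-" "B 2" "2" "y")
    ("-" "B 2" "3" "z")))

(defun my-sort-by-field (records n)
  "Return a copy of RECORDS sorted (stably) by the string in field N."
  (sort (copy-sequence records)
        (lambda (a b) (string< (nth n a) (nth n b)))))

(defun my-multi-pass-sort (records fields)
  "Sort RECORDS on FIELDS (most significant first), one pass per field.
Passes run from the least significant field to the most significant
one; because each pass is stable, earlier passes act as tie-breakers."
  (dolist (n (reverse fields) records)
    (setq records (my-sort-by-field records n))))

(defun my-compound-less-p (a b fields)
  "Non-nil if A sorts before B when comparing FIELDS in order."
  (catch 'decided
    (dolist (n fields nil)
      (let ((x (nth n a)) (y (nth n b)))
        (cond ((string< x y) (throw 'decided t))
              ((string< y x) (throw 'decided nil)))))))

(defun my-compound-sort (records fields)
  "Sort RECORDS once, using a comparator over all FIELDS."
  (sort (copy-sequence records)
        (lambda (a b) (my-compound-less-p a b fields))))

;; Sort by COL2, then COL1, then SIGN, as asked above.
(my-multi-pass-sort my-diff-records '(2 1 0))
```

Both `(my-multi-pass-sort my-diff-records '(2 1 0))` and `(my-compound-sort my-diff-records '(2 1 0))` return the records with SIGN and COL2 reading +1, -1, +2, -2, +3, -3. The multi-pass version only ever needs a single-key sort, which is exactly what sort-regexp-fields provides.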
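All Emacs sort functions are stable, so 99% of my use cases
can be handled by multiple calls to sort-regexp-fields (the
only exception being numeric sorting and the like, since
sort-regexp-fields compares keys as text). Unfortunately,
those multiple calls are tedious when done interactively, so
voilà:

| (defun tl-sort-regexp-fields (reverse record-regexp key-regexp beg end)
|   "Like `sort-regexp-fields', but KEY-REGEXP may name several fields.
| With KEY-REGEXP \"\\2-\\3\\1\", sort by field 2, then by field 3 in
| descending order, then by field 1.  A \"-\" before a field reverses
| the sort order for that field only."
|   (interactive "P\nsRegexp specifying records to sort: \nsRegexp specifying key within record: \nr")
|   (if (string-match "\\`\\(?:-\\\\[1-9]\\|\\(?:-?\\\\[1-9]\\)\\{2,\\}\\)\\'" key-regexp)
|       ;; Compound key: walk KEY-REGEXP from the end and sort once per
|       ;; field, least significant field first.  Since each pass is
|       ;; stable, the earlier passes act as tie-breakers.
|       (let ((i (length key-regexp)))
|         (while (> i 0)
|           (let ((key-reverse (and (> i 2) (= (aref key-regexp (- i 3)) ?-)))
|                 (key (substring key-regexp (- i 2) i)))
|             ;; Effective direction: REVERSE xor this field's "-".
|             (sort-regexp-fields (if reverse (not key-reverse) key-reverse)
|                                 record-regexp key beg end)
|             ;; Step back over "\N", plus its "-" prefix if present.
|             (if key-reverse
|                 (setq i (- i 1)))
|             (setq i (- i 2)))))
|     ;; Anything else (a single field, "\\&" or a regexp):
|     ;; defer to the stock function.
|     (sort-regexp-fields reverse record-regexp key-regexp beg end)))

A key-regexp of "\2\3\1" sorts the region by the second field,
then the third, then the first. Each field can be prefixed
with "-" to reverse the sort order for that field alone,
e.g. "\2-\3\1" sorts by the second field in ascending order,
then the third in descending order, then the first in
ascending order. A prefix argument (REVERSE) flips the
direction of every field, as with sort-regexp-fields.

With regard to performance, the region is sorted once per key
field, so this may not be suitable for larger datasets, but
for up to a few thousand lines it's fast enough for me.

If someone wants to integrate this into Emacs, please go
ahead.

Thanks, also to Andreas,
Tim

P.S.: Is there really no xor in elisp?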
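To check which passes tl-sort-regexp-fields makes for a given key-regexp without sorting a buffer, its loop can be lifted into a pure function. A minimal sketch: the helper `my-key-passes` is hypothetical, copies the loop above (including the `(if reverse (not key-reverse) key-reverse)` xor) and collects (KEY . DESCENDING) pairs instead of calling sort-regexp-fields. It assumes KEY-REGEXP already has the form accepted by the compound-key branch and does no validation of its own.

```elisp
;; -*- lexical-binding: t; -*-
(defun my-key-passes (key-regexp &optional reverse)
  "Return the passes tl-sort-regexp-fields would make for KEY-REGEXP.
Each pass is (KEY . DESCENDING), listed in the order it is applied.
Hypothetical helper; assumes KEY-REGEXP looks like \"\\2-\\3\\1\"."
  (let ((i (length key-regexp))
        (passes nil))
    (while (> i 0)
      (let ((key-reverse (and (> i 2) (= (aref key-regexp (- i 3)) ?-)))
            (key (substring key-regexp (- i 2) i)))
        ;; Same xor of the global REVERSE and the per-field "-".
        (push (cons key (if reverse (not key-reverse) key-reverse)) passes)
        ;; Step back over "\N" and, if present, its "-" prefix.
        (setq i (- i (if key-reverse 3 2)))))
    (nreverse passes)))

(my-key-passes "\\2-\\3\\1")
```

Here `(my-key-passes "\\2-\\3\\1")` returns `(("\\1") ("\\3" . t) ("\\2"))`: field 1 ascending is sorted first and field 2 last, so field 2 ends up as the primary key. Passing a non-nil REVERSE flips every DESCENDING flag.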