* Sorting lines by length
@ 2014-09-16 12:56 Loris Bennett
  2014-09-16 13:48 ` Doug Lewan
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Loris Bennett @ 2014-09-16 12:56 UTC
  To: help-gnu-emacs

Hi,

Is there a canonical way of sorting lines by length, longest first?

I have a file which might look like this:

7-Jan-2013 node025 node061
14-Jan-2013 node025 node034 node061
21-Jan-2013 node025 node034 node050 node061
28-Jan-2013 node025 node034 node061
4-Feb-2013 node025 node034 node061
11-Feb-2013 node025 node034 node061
18-Feb-2013 node034
25-Feb-2013 node034
11-Mar-2013 node025

I actually just need the longest line first.  For the example above this
is quite easy to see, but in the real file, there are around 100 lines
and the longest might have around 1000 characters.

My use case is reading the data into an R data frame.  The number of
columns in the resulting data frame seems to be determined by the number
of items in the first line.

Cheers,

Loris

-- 
This signature is currently under construction.



* RE: Sorting lines by length
  2014-09-16 12:56 Sorting lines by length Loris Bennett
@ 2014-09-16 13:48 ` Doug Lewan
  2014-09-16 13:56 ` Michael Heerdegen
  2014-09-17  2:14 ` Pascal J. Bourguignon
  2 siblings, 0 replies; 5+ messages in thread
From: Doug Lewan @ 2014-09-16 13:48 UTC
  To: Loris Bennett, help-gnu-emacs@gnu.org

The code for `sort-lines' looks simple enough. Look at `sort-subr', which is called in the last line of the definition of `sort-lines' in Emacs 24.3. You could use something like the following for a predicate:

(defun compare-lengths (left right)
  "Return non-nil if the string LEFT is shorter than the string RIGHT."
  (< (length left) (length right)))
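
A minimal sketch of how this predicate could be wired into `sort-subr'.
According to the `sort-subr' doc string, the predicate receives sort keys,
and without a STARTKEYFUN each key is a (START . END) cons of buffer
positions rather than a string.  The sketch therefore passes a STARTKEYFUN
that returns the line text as the key.  The command name
`my-sort-lines-by-length' is an assumption made for illustration; the rest
mirrors the body of `sort-lines'.

```elisp
(require 'sort)

(defun compare-lengths (left right)
  "Return non-nil if the string LEFT is shorter than the string RIGHT."
  (< (length left) (length right)))

;; Hypothetical command name, not part of Emacs.
(defun my-sort-lines-by-length (reverse beg end)
  "Sort lines between BEG and END by length, shortest first.
With a prefix argument REVERSE, sort longest first."
  (interactive "P\nr")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((inhibit-field-text-motion t))
        (sort-subr reverse #'forward-line #'end-of-line
                   ;; STARTKEYFUN: return the line text, so that the
                   ;; predicate compares strings and not position pairs.
                   (lambda ()
                     (buffer-substring-no-properties
                      (line-beginning-position) (line-end-position)))
                   nil
                   #'compare-lengths)))))
```

With the region covering the data, `C-u M-x my-sort-lines-by-length' should
put the longest line first.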

,Doug
Douglas Lewan
Shubert Ticketing
(201) 489-8600 ext 224 or ext 4335

"This is a slow pup," he said continuing his ascent.





* Re: Sorting lines by length
  2014-09-16 12:56 Sorting lines by length Loris Bennett
  2014-09-16 13:48 ` Doug Lewan
@ 2014-09-16 13:56 ` Michael Heerdegen
  2014-09-17  2:14 ` Pascal J. Bourguignon
  2 siblings, 0 replies; 5+ messages in thread
From: Michael Heerdegen @ 2014-09-16 13:56 UTC
  To: help-gnu-emacs

"Loris Bennett" <loris.bennett@fu-berlin.de> writes:

> Is there a canonical way of sorting lines by length, longest first?

`sort-subr' probably.  Use it like

  (sort-subr t #'forward-line #'end-of-line nil nil
             (lambda (l1 l2)
               (apply #'< (mapcar (lambda (range) (- (cdr range) (car range)))
                                  (list l1 l2)))))

You can even modify the lambda to do something more sophisticated like
counting words.
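
As a rough sketch of that word-counting variant: the predicate below calls
`count-words' on each (START . END) key instead of subtracting positions.
The command name `my-sort-lines-by-word-count' and the wrapper that narrows
to the region are assumptions for illustration, modelled on `sort-lines'.
Note that what counts as a word depends on the buffer's syntax table, so
"7-Jan-2013" is likely to count as three words.

```elisp
(require 'sort)

;; Hypothetical command name, not part of Emacs.
(defun my-sort-lines-by-word-count (beg end)
  "Sort lines between BEG and END by word count, most words first."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((inhibit-field-text-motion t))
        ;; REVERSE is t, so the `<' ordering below ends up descending.
        (sort-subr t #'forward-line #'end-of-line nil nil
                   (lambda (l1 l2)
                     ;; Each key is a (START . END) cons of buffer positions.
                     (< (count-words (car l1) (cdr l1))
                        (count-words (car l2) (cdr l2)))))))))
```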

Michael.





* Re: Sorting lines by length
  2014-09-16 12:56 Sorting lines by length Loris Bennett
  2014-09-16 13:48 ` Doug Lewan
  2014-09-16 13:56 ` Michael Heerdegen
@ 2014-09-17  2:14 ` Pascal J. Bourguignon
  2014-09-17  2:53   ` Drew Adams
  2 siblings, 1 reply; 5+ messages in thread
From: Pascal J. Bourguignon @ 2014-09-17  2:14 UTC
  To: help-gnu-emacs

"Loris Bennett" <loris.bennett@fu-berlin.de> writes:

> Hi,
>
> Is there a canonical way of sorting lines by length, longest first?
>
> I have a file which might look like this:
>
> 7-Jan-2013 node025 node061
> 14-Jan-2013 node025 node034 node061
> 21-Jan-2013 node025 node034 node050 node061
> 28-Jan-2013 node025 node034 node061
> 4-Feb-2013 node025 node034 node061
> 11-Feb-2013 node025 node034 node061
> 18-Feb-2013 node034
> 25-Feb-2013 node034
> 11-Mar-2013 node025
>
> I actually just need the longest line first.  For the example above this
> is quite easy to see, but in the real file, there are around 100 lines
> and the longest might have around 1000 characters.
>
> My use case is reading the data into an R data frame.  The number of
> columns in the resulting data frame seems to be determined by the number
> of items in the first line.

You want the longest line.  This is quite different from wanting to sort
lines.

Finding the longest line is an O(n) operation.
Sorting lines is an O(n*log(n)) operation.


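(defun goto-longest-line ()
  "Go to the start of the longest line in the buffer and set the mark after it."
  (interactive)
  (let ((max-line-start 0)
        (max-line-length 0))
    (goto-char (point-min))
    ;; Single O(n) pass over the lines, remembering the longest one so far.
    (while (< (point) (point-max))
      (let ((start (point)))
        (forward-line)
        ;; The length includes the trailing newline, if there is one.
        (let ((length (- (point) start)))
          (if (< max-line-length length)
              (setf max-line-length length
                    max-line-start  start)))))
    (goto-char max-line-start)
    (set-mark (+ max-line-start max-line-length))))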
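
Since the stated goal is only to have the longest line first, the same
single pass can be followed by moving that line to the top of the buffer.
The following is a sketch under that assumption; the command name
`my-move-longest-line-to-top' is made up for illustration, and lines are
measured without their newline.

```elisp
;; Hypothetical command name, not part of Emacs.
(defun my-move-longest-line-to-top ()
  "Move the longest line of the buffer to the first line.
Other lines keep their relative order.  Ties go to the earliest line."
  (interactive)
  (let ((best-start (point-min))
        (best-end (point-min))
        (best-length -1))
    (save-excursion
      (goto-char (point-min))
      ;; One pass over the buffer, measuring each line without its newline.
      (while (not (eobp))
        (let ((len (- (line-end-position) (line-beginning-position))))
          (when (> len best-length)
            (setq best-length len
                  best-start (line-beginning-position)
                  best-end (line-end-position))))
        (forward-line 1))
      (when (> best-start (point-min))
        ;; Cut the line's text, then the newline that preceded it,
        ;; and reinsert the text as a new first line.
        (let ((text (delete-and-extract-region best-start best-end)))
          (delete-region (1- best-start) best-start)
          (goto-char (point-min))
          (insert text "\n"))))))
```

This stays O(n) in the number of lines and leaves the order of the
remaining lines untouched, which a full sort would not.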
  

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk



* RE: Sorting lines by length
  2014-09-17  2:14 ` Pascal J. Bourguignon
@ 2014-09-17  2:53   ` Drew Adams
  0 siblings, 0 replies; 5+ messages in thread
From: Drew Adams @ 2014-09-17  2:53 UTC
  To: Pascal J. Bourguignon, help-gnu-emacs

> You want the longest line.  This is quite different from wanting to
> sort lines.  Finding the longest line is an O(n) operation.
> Sorting lines is an O(n*log(n)) operation.
> 
> (defun goto-longest-line ()
>   (interactive)
>   (let ((max-line-start 0)
>         (max-line-length 0))
>     (goto-char (point-min))
>     (while (< (point) (point-max))
>       (let ((start (point)))
>         (forward-line)
>         (let ((length (- (point) start)))
>           (if (< max-line-length length)
>              (setf max-line-length length
>                    max-line-start  start)))))
>     (goto-char max-line-start)
>     (set-mark (+ max-line-start max-line-length))))

FWIW -

The version of `goto-longest-line' (coincidentally the same name) in
`misc-cmds.el' is similar but does a bit more.  Here is the doc string:

,----
| goto-longest-line is an interactive Lisp function in `misc-cmds.el'.
| (goto-longest-line BEG END)
| 
| Go to the first of the longest lines in the region or buffer.
| If the region is active, it is checked.
| If not, the buffer (or its restriction) is checked.
| 
| Returns a list of four elements:
| 
|  (LINE LINE-LENGTH OTHER-LINES LINES-CHECKED)
| 
| LINE is the first of the longest lines measured.
| LINE-LENGTH is the length of LINE.
| OTHER-LINES is a list of other lines checked that are as long as LINE.
| LINES-CHECKED is the number of lines measured.
| 
| Interactively, a message displays this information.
| 
| If there is only one line in the active region, then the region is
| deactivated after this command, and the message mentions only LINE and
| LINE-LENGTH.
| 
| If this command is repeated, it checks for the longest line after the
| cursor.  That is *not* necessarily the longest line other than the
| current line.  That longest line could be before or after the current
| line.
| 
| To search only from the current line forward, not throughout the
| buffer, you can use `C-SPC' to set the mark, then use this
| (repeatedly).
`----

And if you use `isearch+.el' then `C-end' is (by default) bound to
this command during Isearch. `C-g' puts you back where you left off
searching, `C-s' resumes searching from wherever you stop hitting
`C-end', etc.  I use this quite often during Isearch.

http://www.emacswiki.org/emacs-en/download/misc-cmds.el
http://www.emacswiki.org/emacs-en/download/isearch%2b.el

http://www.emacswiki.org/IsearchPlus


