unofficial mirror of emacs-devel@gnu.org 
* sort-lines including non ASCII
@ 2016-07-05 20:58 Uwe Brauer
  2016-07-05 21:57 ` Óscar Fuentes
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Uwe Brauer @ 2016-07-05 20:58 UTC (permalink / raw)
  To: emacs-devel


Hello

Take the following Spanish example.

Arrieta 
Anton   
Álvarez


Using sort-lines does *not* result in

Álvarez
Anton   
Arrieta 

But in 

Anton   
Arrieta 
Álvarez

Which is counterintuitive. Does anybody know of a function with such
a feature? Any plans?


thanks

Uwe Brauer 




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sort-lines including non ASCII
  2016-07-05 20:58 sort-lines including non ASCII Uwe Brauer
@ 2016-07-05 21:57 ` Óscar Fuentes
  2016-07-07  7:35   ` Uwe Brauer
  2016-07-06 14:34 ` Eli Zaretskii
  2016-07-07  8:23 ` Teemu Likonen
  2 siblings, 1 reply; 32+ messages in thread
From: Óscar Fuentes @ 2016-07-05 21:57 UTC (permalink / raw)
  To: emacs-devel

Uwe Brauer <oub@mat.ucm.es> writes:

> Take the following Spanish example.
>
> Arrieta 
> Anton   
> Álvarez
>
>
> Using sort-lines does *not* result in
>
> Álvarez
> Anton   
> Arrieta 
>
> But in 
>
> Anton   
> Arrieta 
> Álvarez
>
> Which is counterintuitive. Does anybody know of a function with such
> a feature? Any plans?

(sort (list "Arrieta" "Antón" "Álvarez") 'string-collate-lessp)

If that doesn't work, check your locale environment variables (LC_ALL,
LC_COLLATE, or LANG; see the docstring of string-collate-lessp for
details).

Something like

(sort (list "Antón" "Arrieta" "Álvarez")
      (lambda (a b)
	(string-collate-lessp a b "es_ES.UTF-8" t)))

should do the right thing regardless of your environment variables (at
least on GNU/Linux).





* Re: sort-lines including non ASCII
  2016-07-05 20:58 sort-lines including non ASCII Uwe Brauer
  2016-07-05 21:57 ` Óscar Fuentes
@ 2016-07-06 14:34 ` Eli Zaretskii
  2016-07-06 14:52   ` Michael Heerdegen
  2016-07-07  7:41   ` Uwe Brauer
  2016-07-07  8:23 ` Teemu Likonen
  2 siblings, 2 replies; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-06 14:34 UTC (permalink / raw)
  To: Uwe Brauer; +Cc: emacs-devel

> From: Uwe Brauer <oub@mat.ucm.es>
> Date: Tue, 05 Jul 2016 20:58:46 +0000
> 
> Using sort-lines does *not* result in
> 
> Álvarez
> Anton   
> Arrieta 
> 
> But in 
> 
> Anton   
> Arrieta 
> Álvarez
> 
> Which is counterintuitive.

Because you are thinking Spanish, I presume.  Emacs by default is not
sensitive to the current locale or language, when it compares strings,
and instead does that in binary order of the characters' Unicode
codepoints.  The advantage is that the order comes out the same in any
locale.
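
A minimal sketch of the codepoint ordering described above, using only
the built-in `sort' and `string<'.  The names are the ones from the
original report; nothing here depends on the locale.

```elisp
;; `string<' compares character codes, so a name starting with the
;; accented capital sorts after every name starting with plain "A".
(sort (list "Arrieta" "Anton" "Álvarez") #'string<)
;; => ("Anton" "Arrieta" "Álvarez")

;; The codes behind that result: ?A is 65, while #xC1 (193) is the
;; code of the precomposed LATIN CAPITAL LETTER A WITH ACUTE.
(list ?A #xC1)
;; => (65 193)
```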

Óscar suggested string-collate-lessp, which is indeed what you want,
but please bear in mind that the resulting program will behave
differently in different locales.  Even if you specify the locale to
sort explicitly, the program might be in trouble on someone else's
machine if that locale is not installed or not available.




* Re: sort-lines including non ASCII
  2016-07-06 14:34 ` Eli Zaretskii
@ 2016-07-06 14:52   ` Michael Heerdegen
  2016-07-07  7:34     ` Uwe Brauer
  2016-07-07  7:41   ` Uwe Brauer
  1 sibling, 1 reply; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-06 14:52 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Óscar suggested string-collate-lessp, which is indeed what you want,

Should we make `sort-lines' accept a predicate argument passed to
`sort-subr'?  AFAIU one currently has to hack source code to get what
the OP wants.


Michael.





* Re: sort-lines including non ASCII
  2016-07-06 14:52   ` Michael Heerdegen
@ 2016-07-07  7:34     ` Uwe Brauer
  2016-07-07 15:17       ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Uwe Brauer @ 2016-07-07  7:34 UTC (permalink / raw)
  To: emacs-devel

>>> "Michael" == Michael Heerdegen <michael_heerdegen@web.de> writes:

   > Eli Zaretskii <eliz@gnu.org> writes:
   >> Óscar suggested string-collate-lessp, which is indeed what you want,

   > Should we make `sort-lines' accept a predicate argument passed to
   > `sort-subr'?  AFAIU one currently has to hack source code to get what
   > the OP wants.

I think that would be nice.

   > Michael.








* Re: sort-lines including non ASCII
  2016-07-05 21:57 ` Óscar Fuentes
@ 2016-07-07  7:35   ` Uwe Brauer
  0 siblings, 0 replies; 32+ messages in thread
From: Uwe Brauer @ 2016-07-07  7:35 UTC (permalink / raw)
  To: emacs-devel


   > Uwe Brauer <oub@mat.ucm.es> writes:

   > (sort (list "Arrieta" "Antón" "Álvarez") 'string-collate-lessp)

   > If that doesn't work, check your locale environment variables (LC_ALL,
   > LC_COLLATE, or LANG; see the docstring of string-collate-lessp for
   > details).

   > Something like

   > (sort (list "Antón" "Arrieta" "Álvarez")
   >       (lambda (a b)
   > 	(string-collate-lessp a b "es_ES.UTF-8" t)))

   > should do the right thing regardless of your environment variables (at
   > least on GNU/Linux).

Thanks, but suppose I have 100 lines; using your solution would be
cumbersome. Wouldn't it be better to modify sort-lines?
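
One way to apply the collation predicate to a whole region without
touching sort-lines could look like the sketch below.  The command
name `my-sort-lines-collate' is hypothetical, the code deliberately
bypasses `sort-subr', and it drops text properties; it assumes an
Emacs that provides `string-collate-lessp' (25.1 or later).

```elisp
(defun my-sort-lines-collate (beg end &optional locale ignore-case)
  "Sort the lines between BEG and END with `string-collate-lessp'.
LOCALE and IGNORE-CASE are passed through unchanged; a nil LOCALE
means the current collation locale.  Hypothetical helper: text
properties are dropped and `sort-subr' is not used."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (let* ((text (buffer-substring-no-properties (point-min) (point-max)))
             ;; Remember a final newline so it is not sorted as an empty line.
             (final-newline (string-suffix-p "\n" text))
             (lines (split-string
                     (if final-newline (substring text 0 -1) text)
                     "\n"))
             (sorted (sort lines
                           (lambda (a b)
                             (string-collate-lessp a b locale ignore-case)))))
        ;; Replace the region with the sorted lines.
        (delete-region (point-min) (point-max))
        (insert (mapconcat #'identity sorted "\n"))
        (when final-newline (insert "\n"))))))
```

With a region active, M-x my-sort-lines-collate sorts it under the
current locale; from Lisp, a call such as
(my-sort-lines-collate (point-min) (point-max) "es_ES.UTF-8" t)
requests a specific locale, provided that locale is installed.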







* Re: sort-lines including non ASCII
  2016-07-06 14:34 ` Eli Zaretskii
  2016-07-06 14:52   ` Michael Heerdegen
@ 2016-07-07  7:41   ` Uwe Brauer
  2016-07-07 15:20     ` Eli Zaretskii
  1 sibling, 1 reply; 32+ messages in thread
From: Uwe Brauer @ 2016-07-07  7:41 UTC (permalink / raw)
  To: emacs-devel



   > Because you are thinking Spanish, I presume.  Emacs by default is not
   > sensitive to the current locale or language, when it compares strings,
   > and instead does that in binary order of the characters' Unicode
   > codepoints.  The advantage is that the order comes out the same in any
   > locale.

Hm, I just ran an experiment with Hebrew, with and without niqqud, and
indeed

בית
אבא
אוויר

Is sorted correctly and also



אוויר
בית
אַבָא

So the niqqud does not influence the sorting, but the accent in Spanish
does. Most likely Unicode is the culprit here, but it is
counterintuitive.






* Re: sort-lines including non ASCII
  2016-07-05 20:58 sort-lines including non ASCII Uwe Brauer
  2016-07-05 21:57 ` Óscar Fuentes
  2016-07-06 14:34 ` Eli Zaretskii
@ 2016-07-07  8:23 ` Teemu Likonen
  2016-07-07 15:23   ` Eli Zaretskii
  2 siblings, 1 reply; 32+ messages in thread
From: Teemu Likonen @ 2016-07-07  8:23 UTC (permalink / raw)
  To: emacs-devel


If we are allowed to step outside Emacs, filtering the lines through
(GNU) /usr/bin/sort should do what you want.

-- 
/// Teemu Likonen   - .-..   <https://github.com/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///



* Re: sort-lines including non ASCII
  2016-07-07  7:34     ` Uwe Brauer
@ 2016-07-07 15:17       ` Eli Zaretskii
  2016-07-07 16:30         ` Michael Heerdegen
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-07 15:17 UTC (permalink / raw)
  To: Uwe Brauer, Michael Heerdegen; +Cc: emacs-devel

> From: Uwe Brauer <oub@mat.ucm.es>
> Date: Thu, 07 Jul 2016 07:34:00 +0000
> 
>    > Should we make `sort-lines' accept a predicate argument passed to
>    > `sort-subr'?  AFAIU one currently has to hack source code to get what
>    > the OP wants.
> 
> I think that would be nice.

How do you solve the problem of different argument lists in different
predicates?  And how do we let the user specify such an alternative
comparison in interactive usage?

Also, if string-collate-lessp is the only additional possibility we
can think of, why not change sort-subr to use it "when it's TRT", as
we do currently when sort-subr's PREDICATE argument is nil?  Making
sort-lines entirely open-ended like the suggestion says might be too
much.




* Re: sort-lines including non ASCII
  2016-07-07  7:41   ` Uwe Brauer
@ 2016-07-07 15:20     ` Eli Zaretskii
  2016-07-07 16:13       ` Uwe Brauer
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-07 15:20 UTC (permalink / raw)
  To: Uwe Brauer; +Cc: emacs-devel

> From: Uwe Brauer <oub@mat.ucm.es>
> Date: Thu, 07 Jul 2016 07:41:03 +0000
> 
>  > Because you are thinking Spanish, I presume.  Emacs by default is not
>    > sensitive to the current locale or language, when it compares strings,
>    > and instead does that in binary order of the characters' Unicode
>    > codepoints.  The advantage is that the order comes out the same in any
>    > locale.
> 
> Hm I just made an experiment with Hebrew, with and without niqqud and
> indeed 

> בית
> אבא
> אוויר

> Is sorted correctly and also

> אוויר
> בית
> אַבָא

> So the niqqud does not influence the sorting, but the accent in Spanish
> does. Most likely Unicode is the culprit here, but it is
> counterintuitive.

Unicode has nothing to do with this.  The difference between אַ and Á
is that the former is always 2 characters, while the latter is usually
only one.  That's why sort-lines produces what looks like correct
results with Hebrew.  To see the problem there, you need to sort אבא
with אַבָא and אתבשא, for example.  Or something similar.
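
The character counts behind this can be checked with a short sketch.
It assumes the precomposed form of the accented letter and builds both
strings from explicit codepoints so that the result does not depend on
how this message was encoded; `ucs-normalize-NFD-string' comes from
the bundled ucs-normalize library.

```elisp
(require 'ucs-normalize)

(let ((a-acute (string #xC1))            ; LATIN CAPITAL LETTER A WITH ACUTE
      (alef-patah (string #x5D0 #x5B7))) ; HEBREW LETTER ALEF + HEBREW POINT PATAH
  (list (length a-acute)
        (length alef-patah)
        ;; Decomposing the accented letter yields base letter + accent.
        (length (ucs-normalize-NFD-string a-acute))))
;; => (1 2 2)
```

Since the pointed alef starts with the same base letter as the plain
one, a codepoint comparison only notices the niqqud once the base
letters are equal, which is why short test lists can look correct.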




* Re: sort-lines including non ASCII
  2016-07-07  8:23 ` Teemu Likonen
@ 2016-07-07 15:23   ` Eli Zaretskii
  2016-07-08  4:17     ` Teemu Likonen
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-07 15:23 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: emacs-devel

> From: Teemu Likonen <tlikonen@iki.fi>
> Date: Thu, 07 Jul 2016 11:23:30 +0300
> 
> If we are allowed to step outside Emacs filtering lines through (GNU)
> /usr/bin/sort should do what you want.

Only if the locale outside Emacs is the one you want to use for
collation.  Emacs is a multilingual environment, so it supports
multiple collating locales, independently of the system one.




* Re: sort-lines including non ASCII
  2016-07-07 15:20     ` Eli Zaretskii
@ 2016-07-07 16:13       ` Uwe Brauer
  2016-07-07 16:35         ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Uwe Brauer @ 2016-07-07 16:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Uwe Brauer, emacs-devel







   > Unicode has nothing to do with this.  The difference between אַ and Á
   > is that the former is always 2 characters, while the latter is usually
   > only one.  That's why sort-lines produces what looks like correct
   > results with Hebrew.  To see the problem there, you need to sort אבא
   > with אַבָא and אתבשא, for example.  Or something similar.

OK, well then there is a simple solution at hand: run iso-unaccentuate
over the lines, sort them, and run iso-accentuate again (these functions
are now in an obsolete package, which proves to be useful). I tried it
out and it works nicely.

BTW, why is Á considered one character but אַ two?





* Re: sort-lines including non ASCII
  2016-07-07 15:17       ` Eli Zaretskii
@ 2016-07-07 16:30         ` Michael Heerdegen
  2016-07-07 16:56           ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-07 16:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Uwe Brauer, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> How do you solve the problem of different argument lists in different
> predicates?  And how do we allow to specify such an alternative
> comparison in interactive usage?

Yes, both objections are a problem indeed.

> Also, if string-collate-lessp is the only additional possibility we
> can think of, why not change sort-subr to use it "when it's TRT", as
> we do currently when sort-subr's PREDICATE argument is nil?  Making
> sort-lines entirely open-ended like the suggestion says might be too
> much.

Yeah, I came to the same conclusion.  But when exactly is "when it's
TRT"?

Michael.




* Re: sort-lines including non ASCII
  2016-07-07 16:13       ` Uwe Brauer
@ 2016-07-07 16:35         ` Eli Zaretskii
  0 siblings, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-07 16:35 UTC (permalink / raw)
  To: Uwe Brauer; +Cc: emacs-devel

> From: Uwe Brauer <oub@mat.ucm.es>
> Cc: Uwe Brauer <oub@mat.ucm.es>, emacs-devel@gnu.org
> Date: Thu, 07 Jul 2016 16:13:20 +0000
> 
> BTW why is Á considered as 1 but  אַ as two characters.

For historical reasons.  The European guys defined Á as a separate
character, while niqqud guys didn't do the same for אַ (and actually
doing so in any abjad-type writing system, if you know what that is,
is an anathema).




* Re: sort-lines including non ASCII
  2016-07-07 16:30         ` Michael Heerdegen
@ 2016-07-07 16:56           ` Eli Zaretskii
  2016-07-07 17:32             ` Michael Heerdegen
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-07 16:56 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: oub, emacs-devel

> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: Uwe Brauer <oub@mat.ucm.es>,  emacs-devel@gnu.org
> Date: Thu, 07 Jul 2016 18:30:57 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > How do you solve the problem of different argument lists in different
> > predicates?  And how do we allow to specify such an alternative
> > comparison in interactive usage?
> 
> Yes, both objections are a problem indeed.
> 
> > Also, if string-collate-lessp is the only additional possibility we
> > can think of, why not change sort-subr to use it "when it's TRT", as
> > we do currently when sort-subr's PREDICATE argument is nil?  Making
> > sort-lines entirely open-ended like the suggestion says might be too
> > much.
> 
> Yeah, I came to the same conclusion.  But when exactly is "when it's
> TRT"?

Maybe if we figure out how to allow this in interactive usage, we can
pass that bit down?  And if worse comes to worst, perhaps a separate
command is in order, which uses collation by default?




* Re: sort-lines including non ASCII
  2016-07-07 16:56           ` Eli Zaretskii
@ 2016-07-07 17:32             ` Michael Heerdegen
  2016-07-07 19:53               ` Eli Zaretskii
  2016-07-08 13:40               ` Richard Stallman
  0 siblings, 2 replies; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-07 17:32 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Maybe if we figure out how to allow this in interactive usage, we can
> pass that bit down?  And if worse comes to worst, perhaps a separate
> command is in order, which uses collation by default?

Since `sort-lines' calls `sort-subr' with a fixed second and third
argument, I guess we can assume that the key type the predicate must
accept is always the same: (#1=(beg . end) . #1#).  It would be nice if
`sort-lines' as a function would at least accept an arbitrary predicate,
and we transform it to accept the correct key type and pass it to
`sort-subr', so that `string-collate-lessp' would work as PREDICATE
argument.

BTW, a relevant question is: Is `compare-buffer-substrings' faster than
`buffer-substring'+`string<'?

I've no strong opinion about the command usage.  I would even find it
acceptable to leave it as is and force the user to call the thing as a
function with M-:, since a lambda as predicate might also be useful
quite often.

Michael.





* Re: sort-lines including non ASCII
  2016-07-07 17:32             ` Michael Heerdegen
@ 2016-07-07 19:53               ` Eli Zaretskii
  2016-07-07 22:55                 ` Michael Heerdegen
  2016-07-08 13:40               ` Richard Stallman
  1 sibling, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-07 19:53 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: emacs-devel

> From: Michael Heerdegen <michael_heerdegen@web.de>
> Date: Thu, 07 Jul 2016 19:32:17 +0200
> 
> BTW, a relevant question is: Is `compare-buffer-substrings' faster than
> `buffer-substring'+`string<'?

Hard to say.  Measuring is the easiest way to answer that.
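
A rough way to measure this, sketched with the `benchmark-run' macro
from the bundled benchmark library.  The two-line buffer, the
hard-coded positions, and the repetition count are arbitrary
assumptions; real numbers will depend on line length and machine.

```elisp
(require 'benchmark)

(with-temp-buffer
  (insert "Arrieta\nAnton\n")
  ;; Records as (BEG . END) buffer positions, one per line.
  (let ((a (cons 1 8))
        (b (cons 9 14))
        ;; `compare-buffer-substrings' honors `case-fold-search',
        ;; `string<' does not, so switch folding off for a fair test.
        (case-fold-search nil))
    (list
     :compare-buffer-substrings
     (benchmark-run 100000
       (< (compare-buffer-substrings nil (car a) (cdr a)
                                     nil (car b) (cdr b))
          0))
     :buffer-substring+string<
     (benchmark-run 100000
       (string< (buffer-substring-no-properties (car a) (cdr a))
                (buffer-substring-no-properties (car b) (cdr b)))))))
```

Each `benchmark-run' form returns a list of elapsed seconds, number of
garbage collections, and time spent in garbage collection, so the
second variant also shows the cost of allocating the substrings.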

> I've no strong opinion about the command usage.  I would even find it
> acceptable to leave it as is and force the user to call the thing as a
> function with M-:, since a lambda as predicate might also be useful
> quite often.

My opinion is the opposite: I think it's more important to have a
command that could collate-order strings according to a user-specified
locale, than make sort-lines more flexible on the Lisp level.




* Re: sort-lines including non ASCII
  2016-07-07 19:53               ` Eli Zaretskii
@ 2016-07-07 22:55                 ` Michael Heerdegen
  2016-07-08 10:01                   ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-07 22:55 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 275 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> > BTW, a relevant question is: Is `compare-buffer-substring' faster than
> > `buffer-substring'+`string<'?
>
> Hard to say.  Measuring is the easiest way to answer that.

Here is a first try.  The speed difference is negligible here.


[-- Attachment #2: 0001-Make-sort-lines-accept-a-predicate.patch --]
[-- Type: text/x-diff, Size: 1937 bytes --]

From 6229d19438d641d3ec81dec962984b3a9f5f72e7 Mon Sep 17 00:00:00 2001
From: Michael Heerdegen <michael_heerdegen@web.de>
Date: Fri, 8 Jul 2016 00:46:20 +0200
Subject: [PATCH] Make sort-lines accept a predicate

---
 lisp/sort.el | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lisp/sort.el b/lisp/sort.el
index 4d7311f..266b916 100644
--- a/lisp/sort.el
+++ b/lisp/sort.el
@@ -1,4 +1,4 @@
-;;; sort.el --- commands to sort text in an Emacs buffer
+;;; sort.el --- commands to sort text in an Emacs buffer  -*- lexical-binding : t -*-
 
 ;; Copyright (C) 1986-1987, 1994-1995, 2001-2016 Free Software
 ;; Foundation, Inc.
@@ -99,7 +99,7 @@ sort-subr
 	  (setq sort-lists
 		(sort sort-lists
 		      (cond (predicate
-			     `(lambda (a b) (,predicate (car a) (car b))))
+			     (lambda (a b) (funcall predicate (car a) (car b))))
 			    ((numberp (car (car sort-lists)))
 			     'car-less-than-car)
 			    ((consp (car (car sort-lists)))
@@ -197,7 +197,7 @@ sort-reorder-buffer
 	(delete-region max (1+ max))))))
 
 ;;;###autoload
-(defun sort-lines (reverse beg end)
+(defun sort-lines (reverse beg end &optional predicate)
   "Sort lines in region alphabetically; argument means descending order.
 Called from a program, there are three arguments:
 REVERSE (non-nil means reverse order), BEG and END (region to sort).
@@ -210,7 +210,13 @@ sort-lines
       (goto-char (point-min))
       (let ;; To make `end-of-line' and etc. to ignore fields.
 	  ((inhibit-field-text-motion t))
-	(sort-subr reverse 'forward-line 'end-of-line)))))
+	(sort-subr
+         reverse #'forward-line #'end-of-line nil nil
+         (and predicate
+              (lambda (a b)
+                (funcall predicate
+                         (buffer-substring (car a) (cdr a))
+                         (buffer-substring (car b) (cdr b))))))))))
 
 ;;;###autoload
 (defun sort-paragraphs (reverse beg end)
-- 
2.8.1


[-- Attachment #3: Type: text/plain, Size: 440 bytes --]


I had to convert the file to lexical binding to avoid a quoted lambda;
otherwise we would not have been able to name the optional argument
"predicate" (variable name clash).
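
To illustrate the point about lexical binding, here is a minimal
sketch with a hypothetical helper `my-wrap-predicate'.  It assumes the
file starts with the lexical-binding cookie shown on its first line.
Under dynamic binding the inner lambda would look up `predicate' only
when it is called, and, as far as I can tell, inside `sort-subr' it
would then find that function's own PREDICATE argument instead.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-wrap-predicate (predicate)
  "Return a function comparing the cars of two records with PREDICATE."
  ;; With lexical binding this lambda is a closure that remembers the
  ;; value PREDICATE had when `my-wrap-predicate' was called.
  (lambda (a b) (funcall predicate (car a) (car b))))

(sort (list '("b" . 1) '("a" . 2)) (my-wrap-predicate #'string<))
;; => (("a" . 2) ("b" . 1))
```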

> My opinion is the opposite: I think it's more important to have a
> command that could collate-order strings according to a user-specified
> locale, than make sort-lines more flexible on the Lisp level.

What would you do?  Just create an additional command?


Michael.


* Re: sort-lines including non ASCII
  2016-07-07 15:23   ` Eli Zaretskii
@ 2016-07-08  4:17     ` Teemu Likonen
  2016-07-08  6:32       ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Teemu Likonen @ 2016-07-08  4:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Eli Zaretskii [2016-07-07 18:23:03+03] wrote:

>> If we are allowed to step outside Emacs filtering lines through (GNU)
>> /usr/bin/sort should do what you want.
>
> Only if the locale outside Emacs is the one you want to use for
> collation. Emacs is a multilingual environment, so it supports
> multiple collating locales, independently of the system one.

Yes, Emacs is nice, but the system's locale is just some installed files
and environment variables. They can be changed too. Here's an example of
how to sort lines using the locale fi_FI.UTF-8 on GNU systems:

    M-x shell-command-on-region RET LC_COLLATE=fi_FI.UTF-8 sort RET
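
The same idea can be wrapped in a small command that replaces the
region in place.  This is only a sketch: the name
`my-sort-region-externally' is hypothetical, and it assumes a
POSIX-style shell, a `sort' program on the PATH, and that the
requested locale is installed.

```elisp
(defun my-sort-region-externally (beg end locale)
  "Replace the region BEG..END with the output of sort(1) under LOCALE.
Hypothetical helper; LOCALE is a string such as \"fi_FI.UTF-8\"."
  (interactive "r\nsLocale (e.g. fi_FI.UTF-8): ")
  (shell-command-on-region
   beg end
   ;; LC_COLLATE selects the collation rules; note that an LC_ALL
   ;; setting in the environment would take precedence over it.
   (concat "LC_COLLATE=" (shell-quote-argument locale) " sort")
   nil    ; OUTPUT-BUFFER: not used because REPLACE is non-nil
   t))    ; REPLACE: put the output in place of the region
```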

-- 
/// Teemu Likonen   - .-..   <https://github.com/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///



* Re: sort-lines including non ASCII
  2016-07-08  4:17     ` Teemu Likonen
@ 2016-07-08  6:32       ` Eli Zaretskii
  2016-07-08  6:36         ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-08  6:32 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: emacs-devel

> From: Teemu Likonen <tlikonen@iki.fi>
> Cc: emacs-devel@gnu.org
> Date: Fri, 08 Jul 2016 07:17:53 +0300
> 
> >> If we are allowed to step outside Emacs filtering lines through (GNU)
> >> /usr/bin/sort should do what you want.
> >
> > Only if the locale outside Emacs is the one you want to use for
> > collation. Emacs is a multilingual environment, so it supports
> > multiple collating locales, independently of the system one.
> 
> Yes, Emacs is nice, but system's locale is just some installed files and
> environment variables. They can be changed too. Here's an example how to
> sort lines using locale fi_FI.UTF-8 in GNU systems:
> 
>     M-x shell-command-on-region RET LC_COLLATE=fi_FI.UTF-8 sort RET

Why would one need that, given that Emacs includes the same
functionality built-in now?




* Re: sort-lines including non ASCII
  2016-07-08  6:32       ` Eli Zaretskii
@ 2016-07-08  6:36         ` Eli Zaretskii
  2016-07-08  6:50           ` Teemu Likonen
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-08  6:36 UTC (permalink / raw)
  To: tlikonen; +Cc: emacs-devel

> Date: Fri, 08 Jul 2016 09:32:54 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> >     M-x shell-command-on-region RET LC_COLLATE=fi_FI.UTF-8 sort RET
> 
> Why would one need that, given that Emacs includes the same
> functionality built-in now?

Also, this will only work on Posix systems, while the equivalent Emacs
functionality works on more (almost all?) systems.




* Re: sort-lines including non ASCII
  2016-07-08  6:36         ` Eli Zaretskii
@ 2016-07-08  6:50           ` Teemu Likonen
  0 siblings, 0 replies; 32+ messages in thread
From: Teemu Likonen @ 2016-07-08  6:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Eli Zaretskii [2016-07-08 09:36:39+03] wrote:

>> Why would one need that, given that Emacs includes the same
>> functionality built-in now?
>
> Also, this will only work on Posix systems, while the equivalent Emacs
> functionality works on more (almost all?) of them.

I'm just helping the original poster. That's all. Sorry.

Of course it's a good thing that Emacs can do locale-sensitive sorting.



* Re: sort-lines including non ASCII
  2016-07-07 22:55                 ` Michael Heerdegen
@ 2016-07-08 10:01                   ` Eli Zaretskii
  2016-07-14 21:10                     ` Michael Heerdegen
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2016-07-08 10:01 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: emacs-devel

> From: Michael Heerdegen <michael_heerdegen@web.de>
> Date: Fri, 08 Jul 2016 00:55:26 +0200
> 
> > > BTW, a relevant question is: Is `compare-buffer-substring' faster than
> > > `buffer-substring'+`string<'?
> >
> > Hard to say.  Measuring is the easiest way to answer that.
> 
> Here is a first try.  The speed difference is negligible here.

That's not unexpected, given that they do almost the same things.

> -(defun sort-lines (reverse beg end)
> +(defun sort-lines (reverse beg end &optional predicate)
>    "Sort lines in region alphabetically; argument means descending order.
>  Called from a program, there are three arguments:
>  REVERSE (non-nil means reverse order), BEG and END (region to sort).
> @@ -210,7 +210,13 @@ sort-lines
>        (goto-char (point-min))
>        (let ;; To make `end-of-line' and etc. to ignore fields.
>  	  ((inhibit-field-text-motion t))
> -	(sort-subr reverse 'forward-line 'end-of-line)))))
> +	(sort-subr
> +         reverse #'forward-line #'end-of-line nil nil
> +         (and predicate
> +              (lambda (a b)
> +                (funcall predicate
> +                         (buffer-substring (car a) (cdr a))
> +                         (buffer-substring (car b) (cdr b))))))))))

First, I suggest buffer-substring-no-properties, it should be faster
(properties are not needed in the predicate, right?).

More importantly, I might be missing something, but how does this
support additional arguments to predicate, like those that
string-collate-lessp accepts?  Do you expect users to write their own
predicate that hides those arguments?

> > My opinion is the opposite: I think it's more important to have a
> > command that could collate-order strings according to a user-specified
> > locale, than make sort-lines more flexible on the Lisp level.
> 
> What would you do?  Just create an additional command?

If that's the best idea, then yes.

Thanks.




* Re: sort-lines including non ASCII
  2016-07-07 17:32             ` Michael Heerdegen
  2016-07-07 19:53               ` Eli Zaretskii
@ 2016-07-08 13:40               ` Richard Stallman
  2016-07-08 14:36                 ` Michael Heerdegen
  1 sibling, 1 reply; 32+ messages in thread
From: Richard Stallman @ 2016-07-08 13:40 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Since `sort-lines' calls `sort-subr' with a fixed second and third
  > argument, I guess we can assume that the key type the predicate must
  > accept is always the same: (#1=(beg . end) . #1#).  It would be nice if
  > `sort-lines' as a function would at least accept an arbitrary predicate,
  > and we transform it to accept the correct key type and pass it to
  > `sort-subr', so that `string-collate-lessp' would work as PREDICATE
  > argument.

Could you please show a concrete example of the code you propose ought
to be accepted?  The only way that occurs to me, to transform an
arbitrary predicate, is to write a lambda expression around it which
will handle the arguments as they are actually passed.  That doesn't
require any change in Emacs.

That is a little inconvenient, and perhaps we could provide some
more convenient interface -- but what, precisely, would it be?


-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: sort-lines including non ASCII
  2016-07-08 13:40               ` Richard Stallman
@ 2016-07-08 14:36                 ` Michael Heerdegen
  2016-07-09 16:58                   ` Richard Stallman
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-08 14:36 UTC (permalink / raw)
  To: Richard Stallman; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Could you please show a concrete example of the code you propose ought
> to be accepted?  The only way that occurs to me, to transform an
> arbitrary predicate, is to write a lambda expression around it which
> will handle the arguments as they are actually passed.  That doesn't
> require any change in Emacs.

So far I only fixed a quoted lambda causing a bug, and made `sort-lines'
accept a PREDICATE argument.  This is useful because until now, the user
had to duplicate the function's code if he wanted to specify a
predicate for sorting lines.

The first step (implemented by the patch so far) allows calling
`sort-lines' like

  (sort-lines nil beg end #'string<)

I think it is more useful to make the predicate accept something
reasonable (strings) than some data structure used in the implementation
of `sort-subr'.  That's why the predicate passed to `sort-subr' needs to
be wrapped in a lambda.

The second step will be to implement a command named
`sort-lines-collate' or so that prompts for arguments (like the locale
to use) and calls `sort-lines' with the corresponding arguments.

Does that answer your question?


Michael.




* Re: sort-lines including non ASCII
  2016-07-08 14:36                 ` Michael Heerdegen
@ 2016-07-09 16:58                   ` Richard Stallman
  2016-07-12 23:06                     ` John Wiegley
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Stallman @ 2016-07-09 16:58 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The first step (implemented by the patch so far) allows to call
  > `sort-lines' like

  >   (sort-lines nil beg end #'string<)

That makes sense to me.

  > The second step will be to implement a command named
  > `sort-lines-collate' or so that prompts for arguments (like the locale
  > to use) and calls `sort-lines' with the corresponding arguments.

That seems ok to me.


-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: sort-lines including non ASCII
  2016-07-09 16:58                   ` Richard Stallman
@ 2016-07-12 23:06                     ` John Wiegley
  0 siblings, 0 replies; 32+ messages in thread
From: John Wiegley @ 2016-07-12 23:06 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Michael Heerdegen, emacs-devel


>>>>> "RS" == Richard Stallman <rms@gnu.org> writes:

>> The first step (implemented by the patch so far) allows to call
>> `sort-lines' like

>> (sort-lines nil beg end #'string<)

RS> That makes sense to me.

>> The second step will be to implement a command named
>> `sort-lines-collate' or so that prompts for arguments (like the locale
>> to use) and calls `sort-lines' with the corresponding arguments.

RS> That seems ok to me.

Makes sense to me as well.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2



* Re: sort-lines including non ASCII
  2016-07-08 10:01                   ` Eli Zaretskii
@ 2016-07-14 21:10                     ` Michael Heerdegen
  2016-07-14 21:14                       ` Clément Pit--Claudel
  2016-07-14 21:19                       ` Noam Postavsky
  0 siblings, 2 replies; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-14 21:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If that's the best idea, then yes.

Ok, so a new command - "sort-lines-collate" probably.  I'm a bit stuck.

First: Is there a natural way to read in a locale with completion?  Or
which source could I consult to get a list of all available locales
(from Elisp)?
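
One possible source, sketched under the assumption of a POSIX-like
system that ships the external `locale' program: ask `locale -a' for the
installed locales and offer them as completion candidates.  The names
`my-available-locales' and `my-read-locale' are hypothetical, and this
is not claimed to be the natural or portable way to do it.

```elisp
;; Sketch only.  Relies on the external `locale -a' command, which is an
;; assumption about the host system; returns nil where it is missing.

(defun my-available-locales ()
  "Return a list of locale names reported by `locale -a', or nil."
  (and (executable-find "locale")
       (ignore-errors (process-lines "locale" "-a"))))

(defun my-read-locale (prompt)
  "Read a locale name with completion, using PROMPT.
Return nil if the user enters an empty string."
  (let ((input (completing-read prompt (my-available-locales))))
    (if (string= input "") nil input)))
```

No match is required here, so a name that `locale -a' does not list can
still be typed by hand.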

I also wonder how the interactive spec should be - how the prefix arg
should be used.  We have now three parameters that `sort-lines-collate'
would control: REVERSE, LOCALE and IGNORE-CASE.  Maybe all prefix args
just turn on REVERSE (for compatibility with `sort-lines'), with the
exception of C-u C-u that prompts for values of all of them...?
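
The scheme just described could be sketched roughly as follows.  All
`my-' names are hypothetical, the prompts are placeholders, and the
sorting body goes through `sort-subr' directly rather than through the
patched `sort-lines'.

```elisp
;; Sketch only.  All `my-' names are hypothetical; nothing here is the
;; command from the patch.

(defun my-sort-lines-collate-args (arg)
  "Turn raw prefix ARG into a list (REVERSE LOCALE IGNORE-CASE)."
  (if (equal arg '(16))                 ; C-u C-u: ask for everything
      (list (y-or-n-p "Reverse order? ")
            (let ((name (read-string "Locale (empty for default): ")))
              (if (string= name "") nil name))
            (y-or-n-p "Ignore case? "))
    ;; Any other prefix arg only turns on REVERSE, as in `sort-lines'.
    (list (and arg t) nil nil)))

(defun my-sort-lines-collate (beg end &optional reverse locale ignore-case)
  "Sort lines between BEG and END using `string-collate-lessp'.
REVERSE, LOCALE and IGNORE-CASE are as described in the message above."
  (interactive
   (append (list (region-beginning) (region-end))
           (my-sort-lines-collate-args current-prefix-arg)))
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((inhibit-field-text-motion t))
        (sort-subr reverse #'forward-line #'end-of-line
                   ;; Use each line's text as its sort key.
                   (lambda ()
                     (buffer-substring-no-properties
                      (point) (line-end-position)))
                   nil
                   (lambda (a b)
                     (string-collate-lessp a b locale ignore-case)))))))
```

Keeping the prefix-arg decoding in its own function makes the mapping
easy to change without touching the sorting code.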


Thanks,

Michael.




* Re: sort-lines including non ASCII
  2016-07-14 21:10                     ` Michael Heerdegen
@ 2016-07-14 21:14                       ` Clément Pit--Claudel
  2016-07-14 21:19                       ` Noam Postavsky
  1 sibling, 0 replies; 32+ messages in thread
From: Clément Pit--Claudel @ 2016-07-14 21:14 UTC (permalink / raw)
  To: emacs-devel



On 2016-07-14 23:10, Michael Heerdegen wrote:
> I also wonder how the interactive spec should be - how the prefix arg
> should be used.  We have now three parameters that `sort-lines-collate'
> would control: REVERSE, LOCALE and IGNORE-CASE.  Maybe all prefix args
> just turn on REVERSE (for compatibility with `sort-lines'), with the
> exception of C-u C-u that prompts for values of all of them...?

Sounds good.




* Re: sort-lines including non ASCII
  2016-07-14 21:10                     ` Michael Heerdegen
  2016-07-14 21:14                       ` Clément Pit--Claudel
@ 2016-07-14 21:19                       ` Noam Postavsky
  2016-07-14 21:26                         ` Michael Heerdegen
  1 sibling, 1 reply; 32+ messages in thread
From: Noam Postavsky @ 2016-07-14 21:19 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: Eli Zaretskii, Emacs developers

On Thu, Jul 14, 2016 at 5:10 PM, Michael Heerdegen
<michael_heerdegen@web.de> wrote:
> I also wonder how the interactive spec should be - how the prefix arg
> should be used.  We have now three parameters that `sort-lines-collate'
> would control: REVERSE, LOCALE and IGNORE-CASE.  Maybe all prefix args
> just turn on REVERSE (for compatibility with `sort-lines'), with the
> exception of C-u C-u that prompts for values of all of them...?

Negative args for reverse, C-u for ignore case, C-- C-u gives reversed
ignore-case?
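
A sketch of how that mapping could be decoded from the raw prefix
argument.  The helper name `my-decode-sort-prefix' is hypothetical, and
the sketch assumes that C-- C-u produces the raw prefix value (-4),
which is how `universal-argument-more' appears to treat a pending minus
sign.

```elisp
;; Sketch only: `my-decode-sort-prefix' is a hypothetical helper.

(defun my-decode-sort-prefix (arg)
  "Decode raw prefix ARG into a cons (REVERSE . IGNORE-CASE).
A negative argument means REVERSE; a C-u style (list) argument
means IGNORE-CASE."
  (cons (< (prefix-numeric-value arg) 0)
        (consp arg)))
```

With this decoding, no prefix gives (nil . nil), C-- gives (t . nil),
C-u gives (nil . t), and a raw value of (-4) gives (t . t).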




* Re: sort-lines including non ASCII
  2016-07-14 21:19                       ` Noam Postavsky
@ 2016-07-14 21:26                         ` Michael Heerdegen
  2016-07-14 21:57                           ` Noam Postavsky
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Heerdegen @ 2016-07-14 21:26 UTC (permalink / raw)
  To: emacs-devel

Noam Postavsky <npostavs@users.sourceforge.net> writes:

> Negative args for reverse, C-u for ignore case, C-- C-u gives reversed
> ignore-case?

Normally I hate such stuff, but this suggestion seems quite memorable.

The downside is that people coming from `sort-lines' will try to
toggle REVERSE with a positive prefix arg.

Michael.





* Re: sort-lines including non ASCII
  2016-07-14 21:26                         ` Michael Heerdegen
@ 2016-07-14 21:57                           ` Noam Postavsky
  0 siblings, 0 replies; 32+ messages in thread
From: Noam Postavsky @ 2016-07-14 21:57 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: Emacs developers

On Thu, Jul 14, 2016 at 5:26 PM, Michael Heerdegen
<michael_heerdegen@web.de> wrote:
> Noam Postavsky <npostavs@users.sourceforge.net> writes:
>
>> Negative args for reverse, C-u for ignore case, C-- C-u gives reversed
>> ignore-case?
>
> Normally I hate such stuff, but this suggestion seems quite memorable.
>
> The downside is that people coming from `sort-lines' will try to
> toggle REVERSE with a positive prefix arg.

Solution: change `sort-lines' too ;)

>
> Michael.
>
>




end of thread, other threads:[~2016-07-14 21:57 UTC | newest]

Thread overview: 32+ messages
2016-07-05 20:58 sort-lines including non ASCII Uwe Brauer
2016-07-05 21:57 ` Óscar Fuentes
2016-07-07  7:35   ` Uwe Brauer
2016-07-06 14:34 ` Eli Zaretskii
2016-07-06 14:52   ` Michael Heerdegen
2016-07-07  7:34     ` Uwe Brauer
2016-07-07 15:17       ` Eli Zaretskii
2016-07-07 16:30         ` Michael Heerdegen
2016-07-07 16:56           ` Eli Zaretskii
2016-07-07 17:32             ` Michael Heerdegen
2016-07-07 19:53               ` Eli Zaretskii
2016-07-07 22:55                 ` Michael Heerdegen
2016-07-08 10:01                   ` Eli Zaretskii
2016-07-14 21:10                     ` Michael Heerdegen
2016-07-14 21:14                       ` Clément Pit--Claudel
2016-07-14 21:19                       ` Noam Postavsky
2016-07-14 21:26                         ` Michael Heerdegen
2016-07-14 21:57                           ` Noam Postavsky
2016-07-08 13:40               ` Richard Stallman
2016-07-08 14:36                 ` Michael Heerdegen
2016-07-09 16:58                   ` Richard Stallman
2016-07-12 23:06                     ` John Wiegley
2016-07-07  7:41   ` Uwe Brauer
2016-07-07 15:20     ` Eli Zaretskii
2016-07-07 16:13       ` Uwe Brauer
2016-07-07 16:35         ` Eli Zaretskii
2016-07-07  8:23 ` Teemu Likonen
2016-07-07 15:23   ` Eli Zaretskii
2016-07-08  4:17     ` Teemu Likonen
2016-07-08  6:32       ` Eli Zaretskii
2016-07-08  6:36         ` Eli Zaretskii
2016-07-08  6:50           ` Teemu Likonen
