* Re: [Emacs-diffs] trunk r117726: Add string collation.
From: Dmitry Antipov @ 2014-08-25 4:05 UTC
To: Michael Albinus; +Cc: Emacs development discussions

As of r117732, --enable-gcc-warnings leads to:

../../trunk/src/sysdep.c:3527:1: error: no previous prototype for ‘str_collate’ [-Werror=missing-prototypes]
 str_collate (Lisp_Object s1, Lisp_Object s2)
 ^
cc1: all warnings being treated as errors

Dmitry

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Paul Eggert @ 2014-08-25 5:48 UTC
To: Dmitry Antipov; +Cc: Michael Albinus, 18051

Dmitry Antipov wrote:
> ../../trunk/src/sysdep.c:3527:1: error: no previous prototype for
> ‘str_collate’ [-Werror=missing-prototypes]

I fixed that problem, along with some other minor glitches associated with the patch, in trunk bzr 117733.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Dmitry Antipov @ 2014-08-25 6:19 UTC
To: Paul Eggert, Michael Albinus; +Cc: 18051

On 08/25/2014 09:48 AM, Paul Eggert wrote:
> I fixed that problem, along with some other minor glitches
> associated with the patch, in trunk bzr 117733.

Thanks.

BTW, I think that collation functions with a 3rd optional argument to specify locale settings will be a bit more versatile, e.g.

    (string-collate-lessp a b "es_ES.UTF-8")

Dmitry

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-25 6:41 UTC
To: Dmitry Antipov; +Cc: Paul Eggert, 18051

Dmitry Antipov <dmantipov@yandex.ru> writes:

> BTW, I think that collation functions with a 3rd optional argument
> to specify locale settings will be a bit more versatile, e.g.
>
>     (string-collate-lessp a b "es_ES.UTF-8")

We discussed this already, see
<http://lists.gnu.org/archive/html/bug-gnu-emacs/2014-08/msg00623.html>

My major reservation about this approach is that it doesn't fit well with using string-collate-lessp as the predicate of sort. That's why I have proposed a global variable as an alternative, which could be let-bound.

> Dmitry

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-25 15:03 UTC
To: Michael Albinus; +Cc: dmantipov, 18051, eggert

> From: Michael Albinus <michael.albinus@gmx.de>
> Date: Mon, 25 Aug 2014 08:41:03 +0200
> Cc: Paul Eggert <eggert@cs.ucla.edu>, 18051@debbugs.gnu.org
>
> > BTW, I think that collation functions with a 3rd optional argument
> > to specify locale settings will be a bit more versatile, e.g.
> >
> >     (string-collate-lessp a b "es_ES.UTF-8")
>
> We discussed this already, see
> <http://lists.gnu.org/archive/html/bug-gnu-emacs/2014-08/msg00623.html>
>
> My major reservation about this approach is that it doesn't fit well
> with using string-collate-lessp as the predicate of sort. That's why I
> have proposed a global variable as an alternative, which could be
> let-bound.

I think that binding a variable will indeed be cleaner. Using process-environment for that purpose should be reserved for the application level.

Also, what if LC_COLLATE is not set in the environment, but 'setlocale' does return some value for it? Shouldn't we use that?

Here are a few more thoughts about related issues:

1. Why does str_collate return a ptrdiff_t value? AFAIK, wcscoll
   etc. return int data type, and of rather small values.

2. Should we signal an error if the input strings are not pure-ASCII
   or multibyte? Unibyte strings will at best cause incorrect
   results. And what about strings with invalid codepoints,
   e.g. those outside of the Unicode range, which can happen inside
   Lisp strings?

3. What about errors in wcscoll? The current code ignores them;
   however, the value returned by wcscoll in case of an error is not
   documented, so it could be random. Should we signal an error if
   errno gets set by wcscoll?

4. How to control the optional features of the collating sequence? I
   mean, for example, the fact that punctuation characters are ignored
   in the .UTF-8 locales on glibc hosts (or so it seems). At least on
   Windows, a somewhat higher degree of control is available, but it
   must be specified separately from the locale ID. E.g., the
   comparison function accepts flags to ignore punctuation and
   symbols, width differences, diacritics, etc. Should we have another
   variable, perhaps w32-specific, to request these features?
   Alternatively, we could use .UTF-8 on Windows to communicate that,
   although that sounds like a kludge.

5. The locale names on Windows are different from Posix: Windows uses
   3-letter abbreviations of the country and the language,
   e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
   string values used for let-binding the above-mentioned variable to
   be portable across systems? Then we'd need some conversion
   database on MS-Windows.

6. I think we will want a case-insensitive version of this function.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-25 16:01 UTC
To: michael.albinus; +Cc: dmantipov, 18051, eggert

This is now implemented for MS-Windows as well.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-27 11:24 UTC
To: Eli Zaretskii; +Cc: dmantipov, 18051, eggert

Eli Zaretskii <eliz@gnu.org> writes:

> Here are a few more thoughts about related issues:
>
> 1. Why does str_collate return a ptrdiff_t value? AFAIK, wcscoll
>    etc. return int data type, and of rather small values.

Hm, yes. Both wcscoll and w32_compare_strings return int, so I've changed that for str_collate accordingly.

> 2. Should we signal an error if the input strings are not pure-ASCII
>    or multibyte? Unibyte strings will at best cause incorrect
>    results.

Maybe we shall convert the strings to multibyte, via string_to_multibyte()? If the string is already multibyte, it does no harm.

>    And what about strings with invalid codepoints,
>    e.g. those outside of the Unicode range, which can happen inside
>    Lisp strings?
> 3. What about errors in wcscoll? The current code ignores them;
>    however, the value returned by wcscoll in case of an error is not
>    documented, so it could be random. Should we signal an error if
>    errno gets set by wcscoll?

wcscoll sets EINVAL when the codepoint is out of range. I've added a check for this case, returning an error.

    (string-collate-equalp (string 1) (string ?\U0020FFFF))
    => error: Non-Unicode character: 0x20ffff

> 4. How to control the optional features of the collating sequence? I
>    mean, for example, the fact that punctuation characters are ignored
>    in the .UTF-8 locales on glibc hosts (or so it seems). At least on
>    Windows, a somewhat higher degree of control is available, but it
>    must be specified separately from the locale ID. E.g., the
>    comparison function accepts flags to ignore punctuation and
>    symbols, width differences, diacritics, etc. Should we have another
>    variable, perhaps w32-specific, to request these features?
>    Alternatively, we could use .UTF-8 on Windows to communicate that,
>    although that sounds like a kludge.

On Posix systems, I'm not aware of a way to configure such optional features via glibc. The most granular selection is what you do with LC_COLLATE.

If we want to offer more granular settings, we would need to use a library like libicu (http://icu-project.org/). Could be done, but should be optional.

> 5. The locale names on Windows are different from Posix: Windows uses
>    3-letter abbreviations of the country and the language,
>    e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
>    string values used for let-binding the above-mentioned variable to
>    be portable across systems? Then we'd need some conversion
>    database on MS-Windows.

Here I'm a bit undecided. We could leave it to the users to find the proper locale name, but this is inconvenient. OTOH it would be much work to install a mapping system, and we would need to maintain it. What if there would be a new "en_SC" (Scotland) locale? We would need to maintain such changes in Emacs forever ...

> 6. I think we will want a case-insensitive version of this function.

That's also on my todo list. But I'm a little bit undecided whether we shall add it to string-collate-* functions, or whether there shall be further functions.

Maybe we could use sort-fold-case for this as indication? Or is this too specific?

Best regards, Michael.

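The errno check on wcscoll discussed in this message can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code committed to sysdep.c: the helper name `checked_wcscoll` and its return convention are hypothetical, and whether a given C library actually sets EINVAL for out-of-range code points varies between implementations.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical helper: compare S1 and S2 with wcscoll and report a
   failure instead of trusting an unspecified return value.
   Returns 0 and stores the comparison in *RESULT on success,
   or -1 if wcscoll set errno (for example to EINVAL).  */
static int
checked_wcscoll (const wchar_t *s1, const wchar_t *s2, int *result)
{
  errno = 0;                    /* wcscoll reserves no error return value */
  int cmp = wcscoll (s1, s2);
  if (errno != 0)
    return -1;                  /* caller can signal a Lisp-level error */
  *result = cmp;
  return 0;
}
```

Clearing errno before the call is the important step, since wcscoll has no return value reserved for errors.
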
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-27 15:40 UTC
To: Michael Albinus; +Cc: dmantipov, 18051, eggert

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: dmantipov@yandex.ru, eggert@cs.ucla.edu, 18051@debbugs.gnu.org
> Date: Wed, 27 Aug 2014 13:24:48 +0200
>
> > 2. Should we signal an error if the input strings are not pure-ASCII
> >    or multibyte? Unibyte strings will at best cause incorrect
> >    results.
>
> Maybe we shall convert the strings to multibyte, via string_to_multibyte()?

That will not help.

I say code that invokes these functions with unibyte non-ASCII strings has a bug that should be flagged.

> > 5. The locale names on Windows are different from Posix: Windows uses
> >    3-letter abbreviations of the country and the language,
> >    e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
> >    string values used for let-binding the above-mentioned variable to
> >    be portable across systems? Then we'd need some conversion
> >    database on MS-Windows.
>
> Here I'm a bit undecided. We could leave it to the users to find the
> proper locale name, but this is inconvenient. OTOH it would be much work
> to install a mapping system, and we would need to maintain it. What if
> there would be a new "en_SC" (Scotland) locale? We would need to
> maintain such changes in Emacs forever ...

I think these interfaces will almost always be used with the current locale. So with that in mind, I think we can document this issue, and then safely leave this problem to the code that needs to use non-default locales.

> > 6. I think we will want a case-insensitive version of this function.
>
> That's also on my todo list. But I'm a little bit undecided whether we
> shall add it to string-collate-* functions, or whether there shall be
> further functions.
>
> Maybe we could use sort-fold-case for this as indication? Or is this too
> specific?

See my suggestion in the other message.

Thanks.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-27 18:12 UTC
To: Eli Zaretskii; +Cc: dmantipov, 18051, eggert

Eli Zaretskii <eliz@gnu.org> writes:

>> > 2. Should we signal an error if the input strings are not pure-ASCII
>> >    or multibyte? Unibyte strings will at best cause incorrect
>> >    results.
>>
>> Maybe we shall convert the strings to multibyte, via string_to_multibyte()?
>
> That will not help.
>
> I say code that invokes these functions with unibyte non-ASCII strings
> has a bug that should be flagged.

Well, you have much more experience with Unicode than I have.

>> > 5. The locale names on Windows are different from Posix: Windows uses
>> >    3-letter abbreviations of the country and the language,
>> >    e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
>> >    string values used for let-binding the above-mentioned variable to
>> >    be portable across systems? Then we'd need some conversion
>> >    database on MS-Windows.
>>
>> Here I'm a bit undecided. We could leave it to the users to find the
>> proper locale name, but this is inconvenient. OTOH it would be much work
>> to install a mapping system, and we would need to maintain it. What if
>> there would be a new "en_SC" (Scotland) locale? We would need to
>> maintain such changes in Emacs forever ...
>
> I think these interfaces will almost always be used with the current
> locale. So with that in mind, I think we can document this issue, and
> then safely leave this problem to the code that needs to use
> non-default locales.

I don't get this. What do you propose here? Set the locale specific to the system Emacs is running on, or do you propose a mapping to something which is portable across system boundaries?

> Thanks.

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-27 18:26 UTC
To: Michael Albinus; +Cc: dmantipov, 18051, eggert

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: dmantipov@yandex.ru, eggert@cs.ucla.edu, 18051@debbugs.gnu.org
> Date: Wed, 27 Aug 2014 20:12:12 +0200
>
> >> > 5. The locale names on Windows are different from Posix: Windows uses
> >> >    3-letter abbreviations of the country and the language,
> >> >    e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
> >> >    string values used for let-binding the above-mentioned variable to
> >> >    be portable across systems? Then we'd need some conversion
> >> >    database on MS-Windows.
> >>
> >> Here I'm a bit undecided. We could leave it to the users to find the
> >> proper locale name, but this is inconvenient. OTOH it would be much work
> >> to install a mapping system, and we would need to maintain it. What if
> >> there would be a new "en_SC" (Scotland) locale? We would need to
> >> maintain such changes in Emacs forever ...
> >
> > I think these interfaces will almost always be used with the current
> > locale. So with that in mind, I think we can document this issue, and
> > then safely leave this problem to the code that needs to use
> > non-default locales.
>
> I don't get this. What do you propose here? Set the locale specific to
> the system Emacs is running on, or do you propose a mapping to something
> which is portable across system boundaries?

The former. IOW, the (rare, IMO) Lisp program that wants to override the default locale will have to figure out how to do that in a way that works on all the supported platforms. E.g., one way is

    (let ((locale (if (eq system-type 'windows-nt) "enu_USA" "en_US")))
      ...

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Paul Eggert @ 2014-08-27 19:00 UTC
To: Michael Albinus, Eli Zaretskii; +Cc: dmantipov, 18051

I found the following issues and installed what I hope are fixes as trunk bzr 117751.

First, the code should use wcscoll_l rather than uselocale, as uselocale modifies thread state and this is less robust; for example, it wasn't safe to call 'error' right after the first call to uselocale.

Second, if the locale is invalid, string-collate-lessp should throw an error, the same way it throws an error when the strings are invalid.

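The two points above can be sketched in C along these lines. This is a minimal illustration assuming a POSIX.1-2008 system such as glibc, not the code installed in bzr 117751; the helper name `collate_in_locale` and its return convention are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: compare S1 and S2 under the collation rules of
   LOCALE_NAME without changing the thread's or the process's locale.
   Returns 0 and stores the comparison in *RESULT on success, or -1 if
   the locale cannot be created, so the caller can signal an error.  */
static int
collate_in_locale (const wchar_t *s1, const wchar_t *s2,
                   const char *locale_name, int *result)
{
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return -1;                  /* invalid or unavailable locale */
  *result = wcscoll_l (s1, s2, loc);
  freelocale (loc);             /* no thread state needs restoring */
  return 0;
}
```

Because the locale object is passed explicitly to wcscoll_l, nothing has to be undone if the caller exits non-locally once the object has been freed, which is the robustness concern raised about uselocale.
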
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Paul Eggert @ 2014-08-27 19:08 UTC
To: Michael Albinus, Eli Zaretskii; +Cc: dmantipov, 18051

A couple more things.

First, the current algorithm looks only at LC_COLLATE, but the usual approach is to default LC_COLLATE to LANG if LC_COLLATE isn't set, and to have LC_ALL override LC_COLLATE. Shouldn't Emacs take a similar approach, for compatibility?

More generally, it strikes me that string-collate-lessp will be quite slow due to the overhead of looking up the locale environment string and creating and destroying a locale for each string comparison. Instead, shouldn't Emacs have a locale object that the Emacs Lisp programmer can create, an object that encapsulates the low level locale_t object, and which can be passed as an optional argument to string-collate-lessp? That way, string-collate-lessp would never have to inspect the environment itself, or to create or destroy a locale.

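The precedence described in the first point (LC_ALL overrides LC_COLLATE, which falls back to LANG) can be sketched in C as below. This only illustrates the lookup order; the function name `collation_locale_from_env` is hypothetical, and in practice setlocale (LC_COLLATE, "") or newlocale with an empty name applies the same rules internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the locale name that the usual POSIX
   precedence selects for collation.  LC_ALL overrides LC_COLLATE,
   which overrides LANG; empty values count as unset, and "C" is the
   final fallback.  */
static const char *
collation_locale_from_env (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_COLLATE", "LANG" };
  for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value != NULL && strcmp (value, "") != 0)
        return value;           /* first non-empty variable wins */
    }
  return "C";
}
```
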
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-27 19:54 UTC
To: Paul Eggert; +Cc: michael.albinus, dmantipov, 18051

> Date: Wed, 27 Aug 2014 12:08:52 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: dmantipov@yandex.ru, 18051@debbugs.gnu.org
>
> First, the current algorithm looks only at LC_COLLATE, but the usual
> approach is to default LC_COLLATE to LANG if LC_COLLATE isn't set, and
> to have LC_ALL override LC_COLLATE. Shouldn't Emacs take a similar
> approach, for compatibility?

I think we agreed to have a variable that holds the non-default locale as a Lisp string. LANG and LC_COLLATE will then be used internally by newlocale and/or wcscoll_l, as users expect.

I don't think it's appropriate for a primitive to take arguments from environment variables, certainly not those on process-environment. If some Lisp application would want to do that, let them.

> More generally, it strikes me that string-collate-lessp will be quite
> slow due to the overhead of looking up the locale environment string and
> creating and destroying a locale for each string comparison.

The lookup will no longer be relevant, when we switch to a variable. As for creating and destroying the locale, I guess you are right.

> Instead, shouldn't Emacs have a locale object that the Emacs
> Lisp programmer can create, an object that encapsulates the low
> level locale_t object, and which can be passed as an optional
> argument to string-collate-lessp?

That's what Guile does. But it will complicate using these functions in sorting routines. Perhaps binding a variable to the object will do.

Alternatively, a simple one-slot cache internal to str_collate will probably remove most of the overhead. (You will see that w32_compare_strings already employs a similar cache.)

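A one-slot cache of the kind suggested in the last paragraph might look like the following C sketch. It assumes a POSIX.1-2008 system and is not taken from sysdep.c or w32_compare_strings; the names `cached_name`, `cached_locale` and `cached_collation_locale` are hypothetical, and the code is not thread-safe as written.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* One-slot cache: the most recently used locale name and its object.
   Not thread-safe; a multithreaded caller would need a lock here.  */
static char *cached_name;
static locale_t cached_locale;

/* Hypothetical helper: return a locale_t for NAME, reusing the cached
   object when NAME matches the previous call.  Returns (locale_t) 0 if
   the locale cannot be created; the cache is then left unchanged.  */
static locale_t
cached_collation_locale (const char *name)
{
  if (cached_name != NULL && strcmp (cached_name, name) == 0)
    return cached_locale;       /* cache hit: no newlocale call */

  locale_t fresh = newlocale (LC_COLLATE_MASK, name, (locale_t) 0);
  char *copy = strdup (name);
  if (fresh == (locale_t) 0 || copy == NULL)
    {
      if (fresh != (locale_t) 0)
        freelocale (fresh);
      free (copy);
      return (locale_t) 0;
    }

  if (cached_name != NULL)      /* evict the previous entry */
    {
      freelocale (cached_locale);
      free (cached_name);
    }
  cached_name = copy;
  cached_locale = fresh;
  return fresh;
}
```

A sort that compares many strings under one locale would then pay for newlocale once rather than once per comparison.
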
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Paul Eggert @ 2014-08-27 21:27 UTC
To: Eli Zaretskii; +Cc: michael.albinus, dmantipov, 18051

Eli Zaretskii wrote:
> I think we agreed to have a variable that holds the non-default locale
> as a Lisp string.

Ah, sorry, missed that (it is a long thread...). Makes sense. I assume this is on someone's TODO list since it's not done that way now.

> Perhaps binding a variable to the object will do.

We could do both: i.e., give the comparison function an optional argument that defaults to the value of the bound variable. I'd think the value should be a locale object, though, not a string like "en_US". And perhaps the object should also record whether the comparison is case-sensitive, and other stuff like that.

> Alternatively, a simple one-slot cache internal to str_collate will
> probably remove most of the overhead.

It would now, but it would also add another obstacle to adding multithreading capabilities, as the locking around the cache would inhibit scalability. So I'd rather avoid such a cache if it's easy.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-27 21:37 UTC
To: Paul Eggert; +Cc: dmantipov, 18051

Paul Eggert <eggert@cs.ucla.edu> writes:

> Ah, sorry, missed that (it is a long thread...). Makes sense. I
> assume this is on someone's TODO list since it's not done that way
> now.

Eli, that means you or me :-)

I do not want to interfere with your work, but in case you are busy with other tasks, I could do it. Pls let me know.

>> Perhaps binding a variable to the object will do.
>
> We could do both: i.e., give the comparison function an optional
> argument that defaults to the value of the bound variable. I'd think
> the value should be a locale object, though, not a string like
> "en_US". And perhaps the object should also record whether the
> comparison is case-sensitive, and other stuff like that.

Good idea, that would also make Glenn happy. (That's not a joke, I mean it seriously!)

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-28 2:39 UTC
To: Michael Albinus; +Cc: eggert, 18051, dmantipov

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: Eli Zaretskii <eliz@gnu.org>, dmantipov@yandex.ru, 18051@debbugs.gnu.org
> Date: Wed, 27 Aug 2014 23:37:35 +0200
>
> Paul Eggert <eggert@cs.ucla.edu> writes:
>
> > Ah, sorry, missed that (it is a long thread...). Makes sense. I
> > assume this is on someone's TODO list since it's not done that way
> > now.
>
> Eli, that means you or me :-)
>
> I do not want to interfere with your work, but in case you are busy with
> other tasks, I could do it. Pls let me know.

Feel free, and thanks.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: martin rudalics @ 2014-08-29 8:59 UTC
To: Michael Albinus, Paul Eggert; +Cc: dmantipov, 18051

> Good idea, that would also make Glenn happy. (That's not a joke, I mean
> it seriously!)

It would make me happy as well. I have not yet started to convert my fairly insane sorting functions to the new ones because mine are generally based on case-insensitiveness. Also I'm not yet sure how the new predicates will relate to functions like `compare-strings' (which IIUC is needed until now to make sorting case-insensitive), `sort-lines', `sort-subr' and the like. I'd hope that all of these could profit from the new functions.

In any case, many thanks to you and Eli for the work.

martin

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-29 9:59 UTC
To: martin rudalics; +Cc: Paul Eggert, 18051, dmantipov

martin rudalics <rudalics@gmx.at> writes:

> I have not yet started to convert my
> fairly insane sorting functions to the new ones because mine are
> generally based on case-insensitiveness.

I'm just working on this. `string-collate-lessp' will have the signature

    (string-collate-lessp S1 S2 &optional LOCALE IGNORE-CASE)

> Also I'm not yet sure how the
> new predicates will relate to functions like `compare-strings' (which
> IIUC is needed until now to make sorting case-insensitive),

Likely, there shall also be `collate-strings'.

> `sort-lines', `sort-subr' and the like. I'd hope that all of these
> could profit from the new functions.

`sort-subr' has PREDICATE as argument, you could take `string-collate-lessp'. Maybe with some adaptions in `sort-subr', in order to use also LOCALE and IGNORE-CASE.

`sort-lines' uses `sort-subr', without PREDICATE. Might be also extended.

> martin

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: martin rudalics @ 2014-08-29 17:21 UTC
To: Michael Albinus; +Cc: Paul Eggert, 18051, dmantipov

> I'm just working on this. `string-collate-lessp' will have the signature
>
>     (string-collate-lessp S1 S2 &optional LOCALE IGNORE-CASE)

Fine. One additional question: Couldn't we also try to fix searching

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041

with the new functions?

martin

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-29 17:56 UTC
To: martin rudalics; +Cc: Paul Eggert, 18051, dmantipov

martin rudalics <rudalics@gmx.at> writes:

> Fine. One additional question: Couldn't we also try to fix searching
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041
>
> with the new functions?

Don't know (yet). Pushed on my TODO.

> martin

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-29 10:06 UTC
To: martin rudalics; +Cc: michael.albinus, eggert, 18051, dmantipov

> Date: Fri, 29 Aug 2014 10:59:37 +0200
> From: martin rudalics <rudalics@gmx.at>
> Cc: dmantipov@yandex.ru, 18051@debbugs.gnu.org
>
> > Good idea, that would also make Glenn happy. (That's not a joke, I mean
> > it seriously!)
>
> It would make me happy as well. I have not yet started to convert my
> fairly insane sorting functions to the new ones because mine are
> generally based on case-insensitiveness. Also I'm not yet sure how the
> new predicates will relate to functions like `compare-strings' (which
> IIUC is needed until now to make sorting case-insensitive),
> `sort-lines', `sort-subr' and the like. I'd hope that all of these
> could profit from the new functions.

Case-insensitive versions of the new functions are yet to be written; stay tuned.

For now, on MS-Windows, you can have that if you use the NORM_IGNORECASE flag as the second argument of CompareStringW inside w32_compare_strings.

For Posix, I guess we should run the 2 strings through towupper (or towupper_l, if it exists), and then compare the results with wcscoll/wcscoll_l.

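The Posix approach suggested in the last paragraph (upper-case both strings, then collate) could be sketched in C as follows. This is an illustration only, using the current locale via towupper and wcscoll; the helper names `upcase_copy` and `collate_ignoring_case` are hypothetical, and per-character case mapping will not handle every language-specific case rule.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Return a newly allocated upper-cased copy of S, or NULL if memory
   is exhausted.  The caller frees the result.  */
static wchar_t *
upcase_copy (const wchar_t *s)
{
  size_t len = wcslen (s);
  wchar_t *copy = malloc ((len + 1) * sizeof *copy);
  if (copy == NULL)
    return NULL;
  for (size_t i = 0; i < len; i++)
    copy[i] = (wchar_t) towupper ((wint_t) s[i]);
  copy[len] = L'\0';
  return copy;
}

/* Hypothetical helper: collate S1 and S2 ignoring case differences.
   Returns 0 and stores the comparison in *RESULT on success, or -1
   if memory is exhausted.  */
static int
collate_ignoring_case (const wchar_t *s1, const wchar_t *s2, int *result)
{
  wchar_t *u1 = upcase_copy (s1);
  wchar_t *u2 = upcase_copy (s2);
  int status = -1;
  if (u1 != NULL && u2 != NULL)
    {
      *result = wcscoll (u1, u2);
      status = 0;
    }
  free (u1);                    /* free (NULL) is a no-op */
  free (u2);
  return status;
}
```
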
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-29 18:01 UTC
To: Eli Zaretskii; +Cc: eggert, dmantipov, 18051

Eli Zaretskii <eliz@gnu.org> writes:

> Case-insensitive versions of the new functions are yet to be written;
> stay tuned.

I've just committed a patch to the trunk which adds optional arguments LOCALE and IGNORE-CASE to the collation functions.

> For now, on MS-Windows, you can have that if you use the
> NORM_IGNORECASE flag as the second argument of CompareStringW inside
> w32_compare_strings.

As usual, this I haven't implemented. I would leave it to you, Eli.

> For Posix, I guess we should run the 2 strings through towupper (or
> towupper_l, if it exists), and then compare the results with
> wcscoll/wcscoll_l.

Yes.

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-08-29 19:31 UTC
To: Michael Albinus; +Cc: eggert, dmantipov, 18051

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: martin rudalics <rudalics@gmx.at>, eggert@cs.ucla.edu, dmantipov@yandex.ru, 18051@debbugs.gnu.org
> Date: Fri, 29 Aug 2014 20:01:50 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Case-insensitive versions of the new functions are yet to be written;
> > stay tuned.
>
> I've just committed a patch to the trunk which adds optional arguments
> LOCALE and IGNORE-CASE to the collation functions.

Thanks.

> > For now, on MS-Windows, you can have that if you use the
> > NORM_IGNORECASE flag as the second argument of CompareStringW inside
> > w32_compare_strings.
>
> As usual, this I haven't implemented. I would leave it to you, Eli.

As usual, done.

I needed to introduce a w32-specific variable, which needs to be bound to a non-nil value in order to have UTS#10 (a.k.a. "Unicode Collation Algorithm", or "UCA") compliant collation order, which ignores punctuation differences, on MS-Windows. This is because Windows doesn't support UTF-8 as a codeset in its locales (and Windows locales have different names anyway).

This means that if a Lisp program needs to make sure it gets a UCA-compliant collation order on all platforms, it will have to pass a "xx_YY.UTF-8" locale on Posix platforms, and on Windows bind that w32-specific variable to a non-nil value.

Btw, I think we will need a lot of verbiage in the ELisp manual to make sure people understand what to expect from these functions. In particular, the results are extremely locale- and platform-specific, so one cannot expect exactly the same results in all cases, only something similar.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus @ 2014-08-29 21:01 UTC
To: Eli Zaretskii; +Cc: eggert, dmantipov, 18051

Eli Zaretskii <eliz@gnu.org> writes:

> Btw, I think we will need a lot of verbiage in the ELisp manual to
> make sure people understand what to expect from these functions. In
> particular, the results are extremely locale- and platform-specific,
> so one cannot expect exactly the same results in all cases, only
> something similar.

Oh yes. I will start on this in the next few days (as usual, I'm short on time) as well as adding test cases to fns-tests.el.

Best regards, Michael.

* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-09-01 15:20 UTC
To: Michael Albinus, michael_heerdegen; +Cc: 18051

In trunk revision 117797, ls-lisp acquired the ability to sort file
names using the new string-collate-lessp function, thus producing
results that should be similar, if not identical, to what GNU ls does,
at least on GNU/Linux in the same locale. (On MS-Windows, the behavior
will be similar; it cannot be identical because Windows doesn't
implement UTS#10 (a.k.a. "UCA", the Unicode Collation Algorithm) to
the letter in its locale-dependent collation routines.)

Trunk revision 117798 implements the GNU ls -v switch in ls-lisp.

Michael (Heerdegen), as you were the one who requested these features,
please give them some testing and see if you like them.

Many thanks to Michael Albinus for all the hard work on the
infrastructure that made this possible.
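
To see how collation order can differ from plain codepoint order for a set of file names, something like the following could be evaluated. It is an illustrative sketch only, not the ls-lisp implementation from revision 117797; `my-name-orderings` is a hypothetical helper.

```elisp
(defun my-name-orderings (names)
  "Return (CODEPOINT-ORDER . COLLATION-ORDER) for the strings in NAMES.
The car is sorted with `string-lessp', the cdr with
`string-collate-lessp' in the current locale.  NAMES is not modified."
  (cons (sort (copy-sequence names) #'string-lessp)
        (sort (copy-sequence names) #'string-collate-lessp)))
```

Evaluating `(my-name-orderings (directory-files default-directory))` returns both orderings for the current directory. Whether they differ, and how, depends on the locale and the platform, which is why the results can only be expected to be similar to those of GNU ls rather than identical everywhere.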
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Heerdegen @ 2014-09-01 20:46 UTC
To: Eli Zaretskii; +Cc: Michael Albinus, 18051

Eli Zaretskii <eliz@gnu.org> writes:

> Michael (Heerdegen), as you were the one who requested these features,
> please give them some testing and see if you like them.

I'll try and test ASAP.

> Many thanks to Michael Albinus for all the hard work on the
> infrastructure that made this possible.

I have to thank both of you! Presumably it was not easy.

Michael.
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Heerdegen @ 2014-10-17 20:26 UTC
To: Eli Zaretskii; +Cc: 18051, Michael Albinus

Hi Eli and Michael,

> > Michael (Heerdegen), as you were the one who requested these features,
> > please give them some testing and see if you like them.
>
> I'll try and test ASAP.

I have used that stuff for a while now, and I think everything worked
as expected.

If I remember correctly, I saw just one tiny inconsistency: with the
new ls-lisp -v switch, the sorting position of a backup file named
foo~ was different from ls -v when numbered backup files foo~n~ of the
same file also existed. Dunno if this is relevant; it's a corner case.

For string collation and locales, I must say that I'm no expert in
that field and don't really know which tests would be useful. I can
only say that everything seems to be ok with the locales I am using.
Let me know if you think that I could nonetheless be of any help
there.

Thanks both of you again for your work,

Michael.
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Eli Zaretskii @ 2014-10-18 5:38 UTC
To: michael_heerdegen; +Cc: 18051-done, michael.albinus

> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: 18051@debbugs.gnu.org, Michael Albinus <michael.albinus@gmx.de>
> Date: Fri, 17 Oct 2014 22:26:32 +0200
>
> If I remember correctly, I saw just one tiny inconsistency: with the new
> ls-lisp -v switch the sorting position of a backup file named foo~ was
> different from ls -v when also numbered backup files foo~n~ of the same
> file existed.

Is that on Windows or on Unix? On Windows, this is expected, as only
an approximation to the Unicode Collation Algorithm is available
there. On GNU/Linux, it would be strange, since 'ls' uses the same
functions as Emacs now does in ls-lisp.

> For string collation and locales, I must say that I'm no expert in that
> field and don't really know what tests could be useful for testing. I
> can only say that everything seems to be ok with the locales I am using.

That's good enough for me, so I'm closing the bug.

Thanks.
* bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Heerdegen @ 2014-10-18 14:27 UTC
To: Eli Zaretskii; +Cc: 18051-done, michael.albinus

Eli Zaretskii <eliz@gnu.org> writes:

> > with the new ls-lisp -v switch the sorting position of a backup file
> > named foo~ was different from ls -v when also numbered backup files
> > foo~n~ of the same file existed.

> On GNU/Linux, it would be strange, since 'ls' uses the same functions
> as Emacs now does in ls-lisp.

GNU/Linux. But I can't reproduce this anymore; it works as expected,
so I was probably mistaken.

Thanks,

Michael.