unofficial mirror of emacs-devel@gnu.org 
* Collation tests in fns-tests.el
@ 2015-10-30 17:51 Ken Brown
  2015-10-30 20:28 ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Ken Brown @ 2015-10-30 17:51 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Emacs

Hi Michael,

I'm curious why you put the following test in fns-tests.el:

   ;; Punctuation and whitespace characters are not taken into account
   ;; for collation in other locales.
   (should
    (equal
     (sort '("11" "12" "1 1" "1 2" "1.1" "1.2")
	  (lambda (a b)
	    (let ((w32-collate-ignore-punctuation t))
	      (string-collate-lessp
	       a b (if (eq system-type 'windows-nt) "enu_USA" "en_US.UTF-8")))))
     '("11" "1 1" "1.1" "12" "1 2" "1.2")))

This suggests that punctuation and whitespace should definitely not be 
taken into account in non-POSIX locales.  But the docstring of 'sort' is 
much less definitive:

"This function obeys the conventions for collation order in your locale 
settings.  For example, punctuation and whitespace characters *might* be 
considered less significant for sorting."  [My emphasis.]

Is there some place where emacs relies on punctuation and whitespace 
being ignored?  That certainly isn't the case on all supported systems, 
nor is it mandated by POSIX.
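
For contrast, here is a small sketch (not part of the original message) of the same list sorted in the "POSIX" locale. It assumes that `string-collate-lessp` compares by character code in that locale, so the space and the period sort before the digits and nothing is ignored.

```elisp
;; Illustrative sketch: sort the list from the quoted test in the
;; "POSIX" locale, where collation is assumed to follow character codes.
(defun demo-sort-posix (strings)
  "Return a sorted copy of STRINGS using the \"POSIX\" collation locale.
`demo-sort-posix' is a hypothetical helper name used only here."
  (sort (copy-sequence strings)
        (lambda (a b) (string-collate-lessp a b "POSIX"))))

(demo-sort-posix '("11" "12" "1 1" "1 2" "1.1" "1.2"))
;; => ("1 1" "1 2" "1.1" "1.2" "11" "12")
```

The test quoted above expects a different order in a non-POSIX locale because punctuation and whitespace are ignored there on the systems tested so far.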

Ken

P.S. My question is motivated by the fact that punctuation and 
whitespace are not ignored on Cygwin in non-POSIX locales, and it does 
not seem to be easy to make this happen.  If you're interested in the 
gory details, start here:

   https://www.cygwin.com/ml/cygwin/2015-10/msg00516.html




* Re: Collation tests in fns-tests.el
  2015-10-30 17:51 Collation tests in fns-tests.el Ken Brown
@ 2015-10-30 20:28 ` Eli Zaretskii
  2015-10-30 20:40   ` Eli Zaretskii
  2015-10-30 21:10   ` Ken Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Eli Zaretskii @ 2015-10-30 20:28 UTC (permalink / raw)
  To: Ken Brown; +Cc: michael.albinus, emacs-devel

> From: Ken Brown <kbrown@cornell.edu>
> Date: Fri, 30 Oct 2015 13:51:45 -0400
> Cc: Emacs <emacs-devel@gnu.org>
> 
> I'm curious why you put the following test in fns-tests.el:
> 
> ;; Punctuation and whitespace characters are not taken into account
>    ;; for collation in other locales.
>    (should
>     (equal
>      (sort '("11" "12" "1 1" "1 2" "1.1" "1.2")
> 	  (lambda (a b)
> 	    (let ((w32-collate-ignore-punctuation t))
> 	      (string-collate-lessp
> 	       a b (if (eq system-type 'windows-nt) "enu_USA" "en_US.UTF-8")))))
>      '("11" "1 1" "1.1" "12" "1 2" "1.2")))
> 
> This suggests that punctuation and whitespace should definitely not be 
> taken into account in non-POSIX locales.

They were found to be ignored in all the cases we tested until now.

> But the docstring of 'sort' is much less definitive:
> 
> "This function obeys the conventions for collation order in your locale 
> settings.  For example, punctuation and whitespace characters *might* be 
> considered less significant for sorting."  [My emphasis.]
> 
> Is there some place where emacs relies on punctuation and whitespace 
> being ignored?

Listing of files generally ignores them, as one example.  ls-lisp.el
relies on that to emulate what 'ls' the program does on Posix hosts.

> P.S. My question is motivated by the fact that punctuation and 
> whitespace are not ignored on Cygwin in non-POSIX locales, and it does 
> not seem to be easy to make this happen.  If you're interested in the 
> gory details, start here:
> 
>    https://www.cygwin.com/ml/cygwin/2015-10/msg00516.html

You already said in that discussion what I'd suggest ;-)

Since Cygwin tries to be compatible to GNU/Linux (i.e. glibc), it
should indeed use some non-zero flags in its implementation of string
collation-dependent comparison.  IMO, it makes no sense not to do
that, since users expect that to happen.  Then the above test will
work for it, and moreover, ls-lisp.el will, too.




* Re: Collation tests in fns-tests.el
  2015-10-30 20:28 ` Eli Zaretskii
@ 2015-10-30 20:40   ` Eli Zaretskii
  2015-10-30 21:10   ` Ken Brown
  1 sibling, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2015-10-30 20:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, kbrown, emacs-devel

> Date: Fri, 30 Oct 2015 22:28:09 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: michael.albinus@gmx.de, emacs-devel@gnu.org
> 
> > From: Ken Brown <kbrown@cornell.edu>
> > Date: Fri, 30 Oct 2015 13:51:45 -0400
> > Cc: Emacs <emacs-devel@gnu.org>
> > 
> > I'm curious why you put the following test in fns-tests.el:
> > 
> > ;; Punctuation and whitespace characters are not taken into account
> >    ;; for collation in other locales.
> >    (should
> >     (equal
> >      (sort '("11" "12" "1 1" "1 2" "1.1" "1.2")
> > 	  (lambda (a b)
> > 	    (let ((w32-collate-ignore-punctuation t))
> > 	      (string-collate-lessp
> > 	       a b (if (eq system-type 'windows-nt) "enu_USA" "en_US.UTF-8")))))
> >      '("11" "1 1" "1.1" "12" "1 2" "1.2")))
> > 
> > This suggests that punctuation and whitespace should definitely not be 
> > taken into account in non-POSIX locales.
> 
> They were found to be ignored in all the cases we tested until now.

I should have added "in UTF-8 locales" here, sorry.

But since Cygwin nowadays behaves like a UTF-8 locale (AFAIK), this
doesn't change the conclusions regarding Cygwin behavior, I think.




* Re: Collation tests in fns-tests.el
  2015-10-30 20:28 ` Eli Zaretskii
  2015-10-30 20:40   ` Eli Zaretskii
@ 2015-10-30 21:10   ` Ken Brown
  2015-10-30 21:35     ` Eli Zaretskii
  1 sibling, 1 reply; 9+ messages in thread
From: Ken Brown @ 2015-10-30 21:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, emacs-devel

On 10/30/2015 4:28 PM, Eli Zaretskii wrote:
> You already said in that discussion what I'd suggest ;-)
>
> Since Cygwin tries to be compatible to GNU/Linux (i.e. glibc), it
> should indeed use some non-zero flags in its implementation of string
> collation-dependent comparison.  IMO, it makes no sense not to do
> that, since users expect that to happen.

Yes, I agree completely.  The issue is implementation.  Simply using the 
NORM_IGNORESYMBOLS flag yields comparison functions that can return 0 on 
unequal strings.  Eric pointed out the problem with that; moreover, it 
seriously violates users' expectations and compatibility with glibc.  I 
thought I had a way around that, but Corinna pointed out in 
https://www.cygwin.com/ml/cygwin/2015-10/msg00559.html why my suggestion 
doesn't work.  At this point I'm out of ideas.
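
The problem with a comparison that simply ignores symbols can be sketched in Emacs Lisp. This is only an illustration under stated assumptions: `demo-strip-symbols` and `demo-ignore-symbols-compare` are hypothetical names, and stripping non-alphanumeric characters merely stands in for what a flag such as NORM_IGNORESYMBOLS does; it is not Cygwin's or glibc's implementation.

```elisp
;; Illustrative sketch: a three-way comparison that ignores every
;; non-alphanumeric character, mimicking an "ignore symbols" flag.
(defun demo-strip-symbols (s)
  "Return S with all non-alphanumeric characters removed."
  (replace-regexp-in-string "[^[:alnum:]]+" "" s))

(defun demo-ignore-symbols-compare (a b)
  "Compare A and B with symbols ignored.
Return -1, 0 or 1, in the style of strcoll."
  (let ((ka (demo-strip-symbols a))
        (kb (demo-strip-symbols b)))
    (cond ((string< ka kb) -1)
          ((string< kb ka) 1)
          (t 0))))

;; "11" and "1 1" are different strings, yet they compare as equal.
(demo-ignore-symbols-compare "11" "1 1")   ; => 0
(demo-ignore-symbols-compare "11" "1.2")   ; => -1
```

A strcoll-like function that returns 0 for "11" and "1 1" cannot tell the two strings apart, which is the behaviour described above.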

Ken





* Re: Collation tests in fns-tests.el
  2015-10-30 21:10   ` Ken Brown
@ 2015-10-30 21:35     ` Eli Zaretskii
  2015-10-30 22:16       ` Ken Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2015-10-30 21:35 UTC (permalink / raw)
  To: Ken Brown; +Cc: michael.albinus, emacs-devel

> Cc: michael.albinus@gmx.de, emacs-devel@gnu.org
> From: Ken Brown <kbrown@cornell.edu>
> Date: Fri, 30 Oct 2015 17:10:48 -0400
> 
> On 10/30/2015 4:28 PM, Eli Zaretskii wrote:
> > You already said in that discussion what I'd suggest ;-)
> >
> > Since Cygwin tries to be compatible to GNU/Linux (i.e. glibc), it
> > should indeed use some non-zero flags in its implementation of string
> > collation-dependent comparison.  IMO, it makes no sense not to do
> > that, since users expect that to happen.
> 
> Yes, I agree completely.  The issue is implementation.  Simply using the 
> NORM_IGNORESYMBOLS flag yields comparison functions that can return 0 on 
> unequal strings.  Eric pointed out the problem with that; moreover, it 
> seriously violates users' expectations and compatibility with glibc.  I 
> thought I had a way around that, but Corinna pointed out in 
> https://www.cygwin.com/ml/cygwin/2015-10/msg00559.html why my suggestion 
> doesn't work.  At this point I'm out of ideas.

I don't see why that conclusion is the only reasonable one (the
"seriously violates users' expectations" part surprises me), but I
don't really consider myself an expert on this, certainly not in
Cygwin.

If Cygwin's implementation of strcoll cannot be fixed, then we should
treat this test on Cygwin as expected failure.
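
One way to mark the test as an expected failure is ERT's `:expected-result` keyword. The following is a minimal sketch, not the change that was actually committed: the test name `demo-collate-ignores-punctuation` is hypothetical, the body is the test quoted at the start of this thread, and it assumes the "en_US.UTF-8" locale is installed on the host.

```elisp
(require 'ert)

(ert-deftest demo-collate-ignores-punctuation ()
  "Punctuation and whitespace are ignored in a non-POSIX locale.
The test name is hypothetical; the body is the test quoted in this thread."
  ;; Expect a failure on Cygwin, where punctuation and whitespace are
  ;; not ignored in non-POSIX locales; expect a pass everywhere else.
  :expected-result (if (eq system-type 'cygwin) :failed :passed)
  (should
   (equal
    (sort (list "11" "12" "1 1" "1 2" "1.1" "1.2")
          (lambda (a b)
            (let ((w32-collate-ignore-punctuation t))
              (string-collate-lessp
               a b (if (eq system-type 'windows-nt)
                       "enu_USA"
                     "en_US.UTF-8")))))
    '("11" "1 1" "1.1" "12" "1 2" "1.2"))))
```

With this form the test is still run on Cygwin, so ERT would report an unexpected pass if Cygwin's strcoll ever starts ignoring punctuation.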




* Re: Collation tests in fns-tests.el
  2015-10-30 21:35     ` Eli Zaretskii
@ 2015-10-30 22:16       ` Ken Brown
  2015-10-31  8:49         ` Michael Albinus
  0 siblings, 1 reply; 9+ messages in thread
From: Ken Brown @ 2015-10-30 22:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, emacs-devel

On 10/30/2015 5:35 PM, Eli Zaretskii wrote:
>> Cc: michael.albinus@gmx.de, emacs-devel@gnu.org
>> From: Ken Brown <kbrown@cornell.edu>
>> Date: Fri, 30 Oct 2015 17:10:48 -0400
>>
>> On 10/30/2015 4:28 PM, Eli Zaretskii wrote:
>>> You already said in that discussion what I'd suggest ;-)
>>>
>>> Since Cygwin tries to be compatible to GNU/Linux (i.e. glibc), it
>>> should indeed use some non-zero flags in its implementation of string
>>> collation-dependent comparison.  IMO, it makes no sense not to do
>>> that, since users expect that to happen.
>>
>> Yes, I agree completely.  The issue is implementation.  Simply using the
>> NORM_IGNORESYMBOLS flag yields comparison functions that can return 0 on
>> unequal strings.  Eric pointed out the problem with that; moreover, it
>> seriously violates users' expectations and compatibility with glibc.  I
>> thought I had a way around that, but Corinna pointed out in
>> https://www.cygwin.com/ml/cygwin/2015-10/msg00559.html why my suggestion
>> doesn't work.  At this point I'm out of ideas.
>
> I don't see why that conclusion is the only reasonable one (the
> "seriously violates users' expectations" part surprises me), but I
> don't really consider myself an expert on this, certainly not in
> Cygwin.
>
> If Cygwin's implementation of strcoll cannot be fixed, then we should
> treat this test on Cygwin as expected failure.

I'll probably do that, but I'll wait to see if Michael has anything to add.

Ken





* Re: Collation tests in fns-tests.el
  2015-10-30 22:16       ` Ken Brown
@ 2015-10-31  8:49         ` Michael Albinus
  2015-10-31  9:07           ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Albinus @ 2015-10-31  8:49 UTC (permalink / raw)
  To: Ken Brown; +Cc: Eli Zaretskii, emacs-devel

Ken Brown <kbrown@cornell.edu> writes:

>> If Cygwin's implementation of strcoll cannot be fixed, then we should
>> treat this test on Cygwin as expected failure.
>
> I'll probably do that, but I'll wait to see if Michael has anything to add.

I have no other idea, sorry. Let's mark the test case as expected to
fail for Cygwin. And maybe a note about this behaviour could be added in
doc/lispref/strings.texi.

> Ken

Best regards, Michael.




* Re: Collation tests in fns-tests.el
  2015-10-31  8:49         ` Michael Albinus
@ 2015-10-31  9:07           ` Eli Zaretskii
  2015-11-02 16:25             ` Ken Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2015-10-31  9:07 UTC (permalink / raw)
  To: Michael Albinus; +Cc: kbrown, emacs-devel

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Sat, 31 Oct 2015 09:49:30 +0100
> 
> Ken Brown <kbrown@cornell.edu> writes:
> 
> >> If Cygwin's implementation of strcoll cannot be fixed, then we should
> >> treat this test on Cygwin as expected failure.
> >
> > I'll probably do that, but I'll wait to see if Michael has anything to add.
> 
> I have no other idea, sorry. Let's mark the test case as expected to
> fail for Cygwin. And maybe a note about this behaviour could be added in
> doc/lispref/strings.texi.

Well, one other idea is for Cygwin's Emacs build to have its private
implementation of strcoll, similar to what w32_compare_strings does.




* Re: Collation tests in fns-tests.el
  2015-10-31  9:07           ` Eli Zaretskii
@ 2015-11-02 16:25             ` Ken Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Ken Brown @ 2015-11-02 16:25 UTC (permalink / raw)
  To: Eli Zaretskii, Michael Albinus; +Cc: emacs-devel

On 10/31/2015 5:07 AM, Eli Zaretskii wrote:
>> From: Michael Albinus <michael.albinus@gmx.de>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
>> Date: Sat, 31 Oct 2015 09:49:30 +0100
>>
>> Ken Brown <kbrown@cornell.edu> writes:
>>
>>>> If Cygwin's implementation of strcoll cannot be fixed, then we should
>>>> treat this test on Cygwin as expected failure.
>>>
>>> I'll probably do that, but I'll wait to see if Michael has anything to add.
>>
>> I have no other idea, sorry. Let's mark the test case as expected to
>> fail for Cygwin. And maybe a note about this behaviour could be added in
>> doc/lispref/strings.texi.
>
> Well, one other idea is for Cygwin's Emacs build to have its private
> implementation of strcoll, similar to what w32_compare_strings does.

I could do this in the future if necessary.  For now, I've just 
followed Michael's suggestion.

Ken



