From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Albinus
Newsgroups: gmane.emacs.bugs
Subject: bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
Date: Sat, 23 Aug 2014 18:42:44 +0200
Message-ID: <87a96vuph7.fsf@gmx.de>
References: <87ha2f5gp8.fsf@web.de> <87tx6c7f5v.fsf@web.de>
 <8338dw5zrf.fsf@gnu.org> <87lhro7dp4.fsf@web.de> <83zjg44jzd.fsf@gnu.org>
 <87wqb8mqqv.fsf@web.de> <83y4vo4fbr.fsf@gnu.org> <87silwmo8h.fsf@web.de>
 <83wqb84e7l.fsf@gnu.org> <87iomsgsqg.fsf@gmx.de> <83tx6c44x7.fsf@gnu.org>
 <87egxggigj.fsf@gmx.de> <877g28w19r.fsf@gmx.de> <83sikvcbqr.fsf@gnu.org>
 <83r40fc876.fsf@gnu.org> <87wqa7uf7w.fsf@gmx.de> <83oavjc5jj.fsf@gnu.org>
 <87y4uixleg.fsf@gmx.de> <83sikpc3cd.fsf@gnu.org> <87ha147gd5.fsf@gmx.de>
 <83a96vmv80.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
To: Eli Zaretskii
In-Reply-To: <83a96vmv80.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 23 Aug 2014 12:05:51 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

--=-=-=
Content-Type: text/plain

Eli Zaretskii writes:

> I think everything in str_collate starting with the "Convert byte
> stream to code pointers." comment (btw, I guess you meant "code
> points" here) should be in a separate function, and the best place for
> that function is sysdep.c. At least on MS-Windows, both the part that
> converts a Lisp string into wchar_t array, and the part that performs
> a locale-sensitive string comparison, will be implemented differently.

Well, I've moved (most of) str_collate to sysdep.c.

> Thanks. (You didn't attach the new patch.)

Oops. Appended this time.

> Btw, I wonder whether we should have a way to pass the locale string
> explicitly, instead of relying on $LC_COLLATE.

We could add an optional argument to string-collate-*. But this would
break signature equivalence with string-lessp and string-equal,
respectively. Or we could introduce a global var, which shall be
let-bound to the locale string.

>> I have added also configure checks HAVE_NEWLOCALE, HAVE_USELOCALE and
>> HAVE_FREELOCALE for the respective glibc functions. I don't know whether
>> it is overengineering, and whether I could simply apply the existing
>> HAVE_SETLOCALE check. I believe all these functions do exist in parallel
>> in locale.h, don't they?
>
> I'll defer to glibc experts on that.
> My knowledge of 'newlocale'
> facilities is limited to what I saw in Guile's i18n.c module.

According to the manpages, setlocale is conforming to "C89, C99,
POSIX.1-2001". {new,use,free}locale are conforming to "POSIX.1-2008".

So we must check for HAVE_USELOCALE, indeed. Checks for HAVE_NEWLOCALE
and HAVE_FREELOCALE are not necessary; the functions exist in parallel
to uselocale (introduced in glibc 2.3).

This raises the question of whether we should also use my first
setlocale approach when uselocale is absent.

Best regards, Michael.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=collate-patch

=== modified file 'src/fns.c'
--- src/fns.c	2014-08-02 15:56:18 +0000
+++ src/fns.c	2014-08-23 15:57:06 +0000
@@ -40,7 +40,7 @@
 #include "xterm.h"
 #endif

-Lisp_Object Qstring_lessp;
+Lisp_Object Qstring_lessp, Qstring_collate_lessp, Qstring_collate_equalp;
 static Lisp_Object Qprovide, Qrequire;
 static Lisp_Object Qyes_or_no_p_history;
 Lisp_Object Qcursor_in_echo_area;
@@ -343,6 +343,84 @@
     }
   return i1 < SCHARS (s2) ? Qt : Qnil;
 }
+
+#ifdef __STDC_ISO_10646__
+/* Defined in sysdep.c.  */
+extern ptrdiff_t str_collate (Lisp_Object, Lisp_Object);
+#endif /* __STDC_ISO_10646__ */
+
+DEFUN ("string-collate-lessp", Fstring_collate_lessp, Sstring_collate_lessp, 2, 2, 0,
+       doc: /* Return t if first arg string is less than second in collation order.
+
+Case is significant.  Symbols are also allowed; their print names are
+used instead.
+
+This function obeys the conventions for collation order in your
+locale settings.  For example, punctuation and whitespace characters
+are considered less significant for sorting.
+
+\(sort '\("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
+  => \("11" "1 1" "1.1" "12" "1 2" "1.2")
+
+If your system does not support a locale environment, this function
+behaves like `string-lessp'.
+
+If the environment variable \"LC_COLLATE\" is set in `process-environment',
+it overrides the setting of your current locale.  */)
+  (Lisp_Object s1, Lisp_Object s2)
+{
+#ifdef __STDC_ISO_10646__
+  /* Check parameters.  */
+  if (SYMBOLP (s1))
+    s1 = SYMBOL_NAME (s1);
+  if (SYMBOLP (s2))
+    s2 = SYMBOL_NAME (s2);
+  CHECK_STRING (s1);
+  CHECK_STRING (s2);
+
+  return (str_collate (s1, s2) < 0) ? Qt : Qnil;
+
+#else
+  return Fstring_lessp (s1, s2);
+#endif /* __STDC_ISO_10646__ */
+}
+
+DEFUN ("string-collate-equalp", Fstring_collate_equalp, Sstring_collate_equalp, 2, 2, 0,
+       doc: /* Return t if two strings have identical contents.
+
+Case is significant.  Symbols are also allowed; their print names are
+used instead.
+
+This function obeys the conventions for collation order in your locale
+settings.  For example, characters with different code points but
+the same meaning are considered as equal, like different grave accent
+Unicode characters.
+
+\(string-collate-equalp \(string ?\\uFF40) \(string ?\\u1FEF))
+  => t
+
+If your system does not support a locale environment, this function
+behaves like `string-equal'.
+
+If the environment variable \"LC_COLLATE\" is set in `process-environment',
+it overrides the setting of your current locale.  */)
+  (Lisp_Object s1, Lisp_Object s2)
+{
+#ifdef __STDC_ISO_10646__
+  /* Check parameters.  */
+  if (SYMBOLP (s1))
+    s1 = SYMBOL_NAME (s1);
+  if (SYMBOLP (s2))
+    s2 = SYMBOL_NAME (s2);
+  CHECK_STRING (s1);
+  CHECK_STRING (s2);
+
+  return (str_collate (s1, s2) == 0) ? Qt : Qnil;
+
+#else
+  return Fstring_equal (s1, s2);
+#endif /* __STDC_ISO_10646__ */
+}

 static Lisp_Object concat (ptrdiff_t nargs, Lisp_Object *args,
 			   enum Lisp_Type target_type, bool last_special);
@@ -4919,6 +4997,8 @@
   defsubr (&Sdefine_hash_table_test);

   DEFSYM (Qstring_lessp, "string-lessp");
+  DEFSYM (Qstring_collate_lessp, "string-collate-lessp");
+  DEFSYM (Qstring_collate_equalp, "string-collate-equalp");
   DEFSYM (Qprovide, "provide");
   DEFSYM (Qrequire, "require");
   DEFSYM (Qyes_or_no_p_history, "yes-or-no-p-history");
@@ -4972,6 +5052,8 @@
   defsubr (&Sstring_equal);
   defsubr (&Scompare_strings);
   defsubr (&Sstring_lessp);
+  defsubr (&Sstring_collate_lessp);
+  defsubr (&Sstring_collate_equalp);
   defsubr (&Sappend);
   defsubr (&Sconcat);
   defsubr (&Svconcat);

=== modified file 'src/sysdep.c'
--- src/sysdep.c	2014-07-14 19:23:18 +0000
+++ src/sysdep.c	2014-08-23 16:36:39 +0000
@@ -3513,3 +3513,63 @@
 }

 #endif /* !defined (WINDOWSNT) */
+
+/* Wide character string collation.  */
+
+#ifdef __STDC_ISO_10646__
+#include <wchar.h>
+
+#ifdef HAVE_USELOCALE
+#include <locale.h>
+#endif /* HAVE_USELOCALE */
+
+ptrdiff_t
+str_collate (Lisp_Object s1, Lisp_Object s2)
+{
+  register ptrdiff_t res, len, i, i_byte;
+  wchar_t *p1, *p2;
+#ifdef HAVE_USELOCALE
+  Lisp_Object lc_collate;
+  locale_t loc = (locale_t) 0, oldloc = (locale_t) 0;
+#endif /* HAVE_USELOCALE */
+
+  USE_SAFE_ALLOCA;
+
+  /* Convert byte stream to code points.  */
+  len = SCHARS (s1); i = i_byte = 0;
+  p1 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p1));
+  while (i < len)
+    FETCH_STRING_CHAR_ADVANCE (*(p1+i-1), s1, i, i_byte);
+  *(p1+len) = 0;
+
+  len = SCHARS (s2); i = i_byte = 0;
+  p2 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p2));
+  while (i < len)
+    FETCH_STRING_CHAR_ADVANCE (*(p2+i-1), s2, i, i_byte);
+  *(p2+len) = 0;
+
+#ifdef HAVE_USELOCALE
+  /* Create a new locale object, and set it.  */
+  lc_collate =
+    Fgetenv_internal (build_string ("LC_COLLATE"), Vprocess_environment);
+
+  if (STRINGP (lc_collate)
+      && (loc = newlocale (LC_COLLATE_MASK, SSDATA (lc_collate), (locale_t) 0)))
+    oldloc = uselocale (loc);
+#endif /* HAVE_USELOCALE */
+
+  res = wcscoll (p1, p2);
+
+#ifdef HAVE_USELOCALE
+  /* Reset the locale first, then free the locale object.  */
+  if (oldloc)
+    uselocale (oldloc);
+  if (loc)
+    freelocale (loc);
+#endif /* HAVE_USELOCALE */
+
+  /* Return result.  */
+  SAFE_FREE ();
+  return res;
+}
+#endif /* __STDC_ISO_10646__ */

--=-=-=--
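
For readers who want to try the {new,use,free}locale calls discussed in this message outside of Emacs, here is a minimal standalone sketch. It is not part of the patch: collate_in_locale is a hypothetical helper name, the locale is passed as an explicit argument rather than read from LC_COLLATE, and the code assumes a glibc-like system where the POSIX.1-2008 functions are declared in <locale.h>.

```c
#define _POSIX_C_SOURCE 200809L /* expose newlocale/uselocale/freelocale */
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of the patch: compare A and B using the
   collation rules of LOCALE_NAME without changing the caller's locale.
   If LOCALE_NAME is NULL or unknown, the current locale is used.  */
int
collate_in_locale (const wchar_t *a, const wchar_t *b, const char *locale_name)
{
  locale_t loc = (locale_t) 0, oldloc = (locale_t) 0;
  int res;

  assert (a != NULL && b != NULL);

  /* newlocale returns (locale_t) 0 when the named locale is unavailable.  */
  if (locale_name != NULL)
    loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc != (locale_t) 0)
    oldloc = uselocale (loc); /* affects the calling thread only */

  res = wcscoll (a, b);

  if (loc != (locale_t) 0)
    {
      /* Switch back before freeing, so the object being freed is no
         longer the thread's current locale.  */
      uselocale (oldloc);
      freelocale (loc);
    }
  return res;
}
```

Called as collate_in_locale (L"a", L"b", "C"), this returns a negative value. An unknown locale name leaves the current locale in effect, which is roughly what a setlocale-free fallback would look like when only wcscoll is available.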