From: "Roland Winkler" <winkler@gnu.org>
To: martin rudalics <rudalics@gmx.at>
Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
Subject: Re: strip accents and sorting [was: BibTeX issues]
Date: Fri, 30 Aug 2019 11:27:33 -0500
Message-ID: <20085.68375.750044.23913@gargle.gargle.HOWL>
In-Reply-To: <f87102db-6bc5-e1b6-f490-72554f15694c@gmx.at>

On Thu Aug 29 2019 martin rudalics wrote:
>  > But (string-lessp "ä-umlaut" "ö-combine") gives nil
> 
> But (string-collate-lessp "ä-umlaut" "ö-combine") gives t

...not for me, which is likely due to my locale setting LC_COLLATE=C.

I could instead use, say, LC_COLLATE=en_US.utf8.  Then the above
call of string-collate-lessp yields t.  But this also implies case
folding and ignoring dots in directory listings, which is not what I
want.  In other words, these locales bundle too many features
together.
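
One way to limit how far a locale reaches is to name it per call
instead of setting LC_COLLATE for the whole session:
string-collate-lessp accepts an optional LOCALE argument.  The
following is only a minimal sketch.  It assumes that a locale called
"en_US.UTF-8" is installed (the name varies between platforms) and
that "ö-combine" above stands for "o" followed by U+0308 COMBINING
DIAERESIS.  The helper name is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: compare two strings under
;; an explicitly named collation locale, independent of LC_COLLATE.
(defun my-collate-lessp-in (locale s1 s2)
  "Return the result of comparing S1 and S2 under LOCALE.
Return the symbol `unavailable' if the comparison signals an error,
e.g. because LOCALE is not installed on this system."
  (condition-case nil
      (string-collate-lessp s1 s2 locale)
    (error 'unavailable)))

;; The result depends on the platform and on the installed locales.
(my-collate-lessp-in "en_US.UTF-8"
                     (concat (string #xe4) "-umlaut")
                     (concat "o" (string #x308) "-combine"))
```

This only narrows the scope of the locale to a single comparison.  The
comparison still follows every rule that the named locale bundles.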

Maybe the feature sets of the different locales are documented
*somewhere* in a neat way, and there is a locale with a feature set
that does exactly what I want.  But to the best of my knowledge this
documentation resides outside Emacs, so things get rather
complicated when this affects an Emacs session in important or
possibly subtle ways.

> so it should be fairly easy to fix `sort-lines' and friends
> accordingly.

In that sense I am not sure I would like to see `sort-lines' and
friends be fixed "accordingly".  If at all, I'd vote for a user
option, which I would likely use to disable such things.

On the other hand, as Eli pointed out in his reply about accented
characters being represented via a single character as compared to
using combining characters:

> The Unicode Standard mandates that they be handled identically,
> including in searching and sorting.  We don't yet implement that
> 100%, but see char-fold.el for a partial (and not very efficient)
> implementation during search.

So I would assume that the locale should not matter at all in the
context of Unicode combining characters.  (Or there should be a way
to control exactly this aspect of Unicode combining characters with
no additional (mis)features bundled with it.)
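
A locale-independent treatment of just this aspect can be sketched by
normalizing both strings to NFC before comparing them by code point,
using ucs-normalize.el, which ships with Emacs.  This assumes that
"ö-combine" stands for "o" followed by U+0308 COMBINING DIAERESIS.
The helper name is hypothetical and not part of Emacs.

```elisp
(require 'ucs-normalize)

;; Hypothetical helper, not part of Emacs: compare by code point after
;; NFC normalization, so that precomposed and combining forms compare
;; alike whatever LC_COLLATE is set to.
(defun my-nfc-string-lessp (s1 s2)
  "Return non-nil if S1 sorts before S2 after NFC normalization."
  (string-lessp (ucs-normalize-NFC-string s1)
                (ucs-normalize-NFC-string s2)))

(let ((a (concat (string #xe4) "-umlaut"))         ; precomposed a-umlaut
      (o (concat "o" (string #x308) "-combine")))  ; o + combining diaeresis
  (list (string-lessp a o)           ; raw code points: U+00E4 > U+006F
        (my-nfc-string-lessp a o)))  ; after NFC: U+00E4 < U+00F6
;; => (nil t)
```

This only makes the two representations of a character equivalent.
It says nothing about how accented characters sort relative to each
other or relative to un-accented ones, since the comparison is still
by code point.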

I understand that it is a different matter how accented characters
are sorted relative to each other and also relative to un-accented
characters.  So it can make a lot of sense to have different locales
for that aspect.

Maybe I am missing something here.  (And I have not yet looked in
more detail at char-fold.el mentioned by Eli, which could be a
better way to go within the Emacs world.)

Roland


