* How to compare strings? @ 2007-04-29 16:23 David Kastrup 2007-04-29 19:38 ` Eli Zaretskii [not found] ` <mailman.2692.1177876391.7795.help-gnu-emacs@gnu.org> 0 siblings, 2 replies; 16+ messages in thread From: David Kastrup @ 2007-04-29 16:23 UTC (permalink / raw) To: help-gnu-emacs Hi, how do I compare strings in the sort order of the current language environment? Does Emacs have a concept of sort order depending on language? If not, why not? -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 16:23 How to compare strings? David Kastrup @ 2007-04-29 19:38 ` Eli Zaretskii 2007-04-29 20:06 ` Lennart Borgman (gmail) [not found] ` <mailman.2692.1177876391.7795.help-gnu-emacs@gnu.org> 1 sibling, 1 reply; 16+ messages in thread From: Eli Zaretskii @ 2007-04-29 19:38 UTC (permalink / raw) To: help-gnu-emacs > From: David Kastrup <dak@gnu.org> > Date: Sun, 29 Apr 2007 18:23:09 +0200 > > how do I compare strings in the sort order of the current language > environment? I don't understand the question. I'm sure you are aware that in the Emacs internal representation of strings, each character has a distinct codepoint. That is, unlike outside Emacs, where the same code can stand for different characters depending on the locale (because each locale assumes a certain default encoding of text), inside Emacs Latin-1 è and Latin-2 č are two different characters represented by two different codes, even though their respective 8-bit encodings are identical (\350 or hex E8). In the above example, these two internal codes are 2280 and 2408 decimal. (In Emacs 23, these codes will change, but will still be different.) Thus, as long as the string was decoded correctly, comparing such strings is a simple matter of using string< and its ilk. > Does Emacs have a concept of sort order depending on language? If > not, why not? Because characters that have different order depending on the language have different codepoints inside Emacs, and thus the issue doesn't exist. Or am I missing something? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 19:38 ` Eli Zaretskii @ 2007-04-29 20:06 ` Lennart Borgman (gmail) 2007-04-29 20:52 ` Maciej Katafiasz [not found] ` <mailman.2696.1177880336.7795.help-gnu-emacs@gnu.org> 0 siblings, 2 replies; 16+ messages in thread From: Lennart Borgman (gmail) @ 2007-04-29 20:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: help-gnu-emacs Eli Zaretskii wrote: > Because characters that have different order depending on the language > have different codepoints inside Emacs, and thus the issue doesn't > exist. > > Or am I missing something? I think that sorting differs more than that between different languages. Or at least it used to do that. Perhaps things have changed today, I am not sure. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 20:06 ` Lennart Borgman (gmail) @ 2007-04-29 20:52 ` Maciej Katafiasz [not found] ` <mailman.2696.1177880336.7795.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 16+ messages in thread From: Maciej Katafiasz @ 2007-04-29 20:52 UTC (permalink / raw) To: help-gnu-emacs Den Sun, 29 Apr 2007 22:06:29 +0200 skrev Lennart Borgman (gmail): > Eli Zaretskii wrote: > >> Because characters that have different order depending on the language >> have different codepoints inside Emacs, and thus the issue doesn't >> exist. >> >> Or am I missing something? > > I think that sorting differs more than that between different languages. > Or at least it used to do that. Perhaps things have changed today, I am > not sure. It does. Swedish: a ae o oe å ä ö German: a å ä ae o ö oe Cheers, Maciej ^ permalink raw reply [flat|nested] 16+ messages in thread
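[Editorial sketch, not part of the original thread.] The two orderings quoted above can be reproduced with hand-written sort keys. This is only a minimal illustration in Python: the tables, `swedish_key` and `german_key` are assumptions made up for this example, they cover just the letters in the message, and the German tie-break (å before ä) is chosen to match the list as quoted. Real collation rules are considerably more involved.

```python
# Sketch only: hand-written collation tables for the seven strings above.
# All names and tables here are hypothetical, for illustration.

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# German (as quoted in the message): accented letters sort with their base
# letter; the accent only breaks ties between otherwise equal strings.
GERMAN_BASE = {"å": "a", "ä": "a", "ö": "o", "ü": "u"}
GERMAN_ACCENT_RANK = {"å": 1, "ä": 2, "ö": 1, "ü": 1}


def swedish_key(word):
    """Treat å, ä, ö as separate letters placed after z."""
    # Characters outside the table are pushed behind all known letters.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_RANK) + ord(ch)) for ch in word.lower()]


def german_key(word):
    """Compare base letters first, then accents as a tie-breaker."""
    w = word.lower()
    primary = "".join(GERMAN_BASE.get(ch, ch) for ch in w)
    secondary = [GERMAN_ACCENT_RANK.get(ch, 0) for ch in w]
    return (primary, secondary)


words = ["ö", "oe", "ä", "å", "o", "ae", "a"]
by_code_point = sorted(words)              # plain code point comparison
swedish = sorted(words, key=swedish_key)   # a ae o oe å ä ö
german = sorted(words, key=german_key)     # a å ä ae o ö oe
```

The same input list yields three different orders, which is the point being made in the thread: comparing code points gives one fixed order, while each language needs its own key.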
* Re: How to compare strings? [not found] ` <mailman.2696.1177880336.7795.help-gnu-emacs@gnu.org> @ 2007-05-01 13:19 ` Malte Spiess 0 siblings, 0 replies; 16+ messages in thread From: Malte Spiess @ 2007-05-01 13:19 UTC (permalink / raw) To: help-gnu-emacs Maciej Katafiasz <mathrick@gmail.com> writes: > Den Sun, 29 Apr 2007 22:06:29 +0200 skrev Lennart Borgman (gmail): > >> Eli Zaretskii wrote: >> >>> Because characters that have different order depending on the language >>> have different codepoints inside Emacs, and thus the issue doesn't >>> exist. >>> >>> Or am I missing something? >> >> I think that sorting differs more than that between different languages. >> Or at least it used to do that. Perhaps things have changed today, I am >> not sure. > > It does. > > Swedish: > a ae o oe å ä ö > > German: > a å ä ae o ö oe Well, in Estonian it's even worse, since here the z is between the s and the t (going r s z t u v) - so even with normal letters the sorting is different. I should add that "z" is not part of the normal Estonian alphabet, but you sort in foreign words like this. > Cheers, > Maciej Greetings Malte ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? [not found] ` <mailman.2692.1177876391.7795.help-gnu-emacs@gnu.org> @ 2007-04-29 20:39 ` Joost Kremers 2007-04-29 21:31 ` sigvaldi ` (3 more replies) 2007-04-29 22:25 ` David Kastrup 1 sibling, 4 replies; 16+ messages in thread From: Joost Kremers @ 2007-04-29 20:39 UTC (permalink / raw) To: help-gnu-emacs Eli Zaretskii wrote: >> From: David Kastrup <dak@gnu.org> >> Does Emacs have a concept of sort order depending on language? If >> not, why not? > > Because characters that have different order depending on the language > have different codepoints inside Emacs, and thus the issue doesn't > exist. > > Or am I missing something? Well, in German dictionaries you will generally find words with ö interspersed with those with o, but within the letter O, o>ö. So both "Ode" and "öde" appear under O, but the former before the latter. Both, however, appear before "oder". Yet, other languages that use ö may well alphabetise it as a completely separate letter. IIRC this is done for example in Hungarian dictionaries, where O and Ö are different sections of the dictionary, Ö following after O. In Icelandic I think (may well be wrong, though), that it is customary to sort words with Ö at the end, that is, even *after* Z. -- Joost Kremers joostkremers@yahoo.com Selbst in die Unterwelt dringt durch Spalten Licht EN:SiS(9) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 20:39 ` Joost Kremers @ 2007-04-29 21:31 ` sigvaldi 2007-04-29 21:47 ` Harald Hanche-Olsen ` (2 subsequent siblings) 3 siblings, 0 replies; 16+ messages in thread From: sigvaldi @ 2007-04-29 21:31 UTC (permalink / raw) To: help-gnu-emacs Joost Kremers wrote: > Eli Zaretskii wrote: > >> From: David Kastrup <dak@gnu.org> > >> Does Emacs have a concept of sort order depending on language? If > >> not, why not? > > > > Because characters that have different order depending on the language > > have different codepoints inside Emacs, and thus the issue doesn't > > exist. > > > > Or am I missing something? > > Well, in German dictionaries you will generally find words with ö > interspersed with those with o, but within the letter O, o>ö. So both "Ode" > and "öde" appear under O, but the former before the latter. Both, however, > appear before "oder". > > Yet, other languages that use ö may well alphabetise it as a completely > separate letter. IIRC this is done for example in Hungarian dictionaries, > where O and Ö are different sections of the dictionary, Ö following after > O. In Icelandic I think (may well be wrong, though), that it is customary > to sort words with Ö at the end, that is, even *after* Z. > The Icelandic alphabet ends in XYZÞÆÖ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 20:39 ` Joost Kremers 2007-04-29 21:31 ` sigvaldi @ 2007-04-29 21:47 ` Harald Hanche-Olsen 2007-04-29 21:56 ` Lennart Borgman (gmail) [not found] ` <mailman.2701.1177884177.7795.help-gnu-emacs@gnu.org> 3 siblings, 0 replies; 16+ messages in thread From: Harald Hanche-Olsen @ 2007-04-29 21:47 UTC (permalink / raw) To: help-gnu-emacs + Joost Kremers <joostkremers@yahoo.com>: | Yet, other languages that use ö may well alphabetise it as a | completely separate letter. IIRC this is done for example in | Hungarian dictionaries, where O and Ö are different sections of the | dictionary, Ö following after O. In Icelandic I think (may well be | wrong, though), that it is customary to sort words with Ö at the | end, that is, even *after* Z. Swedish too: The Swedish alphabet ends ...XYZÅÄÖ. And the Danish and Norwegian, ...XYZÆØÅ. (The Danish/Norwegian use of Ø corresponds roughly to the Swedish Ö, and Æ to Ä.) -- * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/> - It is undesirable to believe a proposition when there is no ground whatsoever for supposing it is true. -- Bertrand Russell ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 20:39 ` Joost Kremers 2007-04-29 21:31 ` sigvaldi 2007-04-29 21:47 ` Harald Hanche-Olsen @ 2007-04-29 21:56 ` Lennart Borgman (gmail) 2007-04-29 22:22 ` Jesper Harder [not found] ` <mailman.2702.1177885779.7795.help-gnu-emacs@gnu.org> [not found] ` <mailman.2701.1177884177.7795.help-gnu-emacs@gnu.org> 3 siblings, 2 replies; 16+ messages in thread From: Lennart Borgman (gmail) @ 2007-04-29 21:56 UTC (permalink / raw) To: Joost Kremers; +Cc: help-gnu-emacs Joost Kremers wrote: > Eli Zaretskii wrote: >>> From: David Kastrup <dak@gnu.org> >>> Does Emacs have a concept of sort order depending on language? If >>> not, why not? >> Because characters that have different order depending on the language >> have different codepoints inside Emacs, and thus the issue doesn't >> exist. >> >> Or am I missing something? > > Well, in German dictionaries you will generally find words with ö > interspersed with those with o, but within the letter O, o>ö. So both "Ode" > and "öde" appear under O, but the former before the latter. Both, however, > appear before "oder". But I think there are completely different problems too. Does not some languages sort partly depending the phonetics instead of the spelling? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 21:56 ` Lennart Borgman (gmail) @ 2007-04-29 22:22 ` Jesper Harder [not found] ` <mailman.2702.1177885779.7795.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 16+ messages in thread From: Jesper Harder @ 2007-04-29 22:22 UTC (permalink / raw) To: help-gnu-emacs "Lennart Borgman (gmail)" <lennart.borgman@gmail.com> writes: > But I think there are completely different problems too. Does not some > languages sort partly depending the phonetics instead of the spelling? Yes. In Danish 'aa' is alphabetized according to how it's pronounced. If it is pronounced as two vowels (e.g. ekstraarbejde), it's alphabetized as two a's. If it is pronounced as one vowel (e.g. afrikaans), it is alphabetized as å (the last letter in the Danish alphabet). ^ permalink raw reply [flat|nested] 16+ messages in thread
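[Editorial sketch, not part of the original thread.] The Danish 'aa' rule described above cannot be derived from spelling alone, so a program would need a word list. The following Python sketch assumes a hypothetical exception set (`AA_IS_TWO_LETTERS`) naming the words in which 'aa' is two separate vowels; every other 'aa' is sorted as å. The names and the tiny word list are illustrative assumptions, not a real Danish collation implementation.

```python
# Sketch only: sort Danish words, treating the digraph "aa" as å unless the
# word is listed as an exception. All names here are hypothetical.

DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
DANISH_RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

# Words where "aa" is pronounced as two vowels and therefore stays "a" + "a".
# A real system would need a much larger, hand-maintained list.
AA_IS_TWO_LETTERS = {"ekstraarbejde"}


def danish_key(word):
    w = word.lower()
    if w not in AA_IS_TWO_LETTERS:
        w = w.replace("aa", "å")  # digraph sorts as the last letter, å
    # Characters outside the table are pushed behind all known letters.
    return [DANISH_RANK.get(ch, len(DANISH_RANK) + ord(ch)) for ch in w]


words = ["ekstrakt", "afrikaans", "ekstraarbejde", "afrikaner"]
danish = sorted(words, key=danish_key)
plain = sorted(words)
```

With this key, "afrikaans" sorts after "afrikaner" because its 'aa' counts as å, while "ekstraarbejde" still sorts before "ekstrakt" because it is on the exception list. A plain code point sort puts "afrikaans" first instead.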
* Re: How to compare strings? [not found] ` <mailman.2702.1177885779.7795.help-gnu-emacs@gnu.org> @ 2007-04-29 23:06 ` Joost Kremers 0 siblings, 0 replies; 16+ messages in thread From: Joost Kremers @ 2007-04-29 23:06 UTC (permalink / raw) To: help-gnu-emacs Jesper Harder wrote: > "Lennart Borgman (gmail)" <lennart.borgman@gmail.com> writes: > >> But I think there are completely different problems too. Does not some >> languages sort partly depending the phonetics instead of the spelling? > > Yes. In Danish 'aa' is alphabetized according to how it's > pronounced. > > If it is pronounced as two vowels (e.g. ekstraarbejde), it's > alphabetized as two a's. If it is pronounced as one vowel > (e.g. afrikaans) is alphabetized as å (the last letter in the Danish > alphabet). technically, this is not (if i understand things correctly, i don't speak danish) a case of alphabetising according to pronunciation. when 'aa' is, as you put it, pronounced as one vowel, it is technically a digraph, i.e. a combination of two letters that indicate a single sound. many languages have digraphs, e.g. english has th, ch, ph and ng, and quite a few vowel combinations that are pronounced as one vowel (or diphthong); dutch has quite a few vowel digraphs (with pronunciations that are somewhat more regular than in english ;-), e.g. oe, eu, ui, au, ou, ei and ij. in some languages, digraphs are treated as single letters for alphabetisation. the 'aa' case in danish above is an example. sometimes, digraphs present particularly interesting problems. in dutch dictionaries, the digraph ij is treated as two letters, so words starting with ij appear under i, but in phone books and the like, it's often treated as equivalent to y, so that names starting with ij appear intermingled with y. and then there's the case of nahuatl, which has a bunch of consonant digraphs (ch, cu/uc, hu/uh, qu, tl, tz). 
dictionaries often (though not always, there's no "standard" here) have separate sections for words starting with these digraphs, but for the rest treat them as two separate letters for alphabetisation within a section. (well, there's of course the whole issue of roots vs. stems and the fact that cu/uc and hu/uh change based on their position in the word, but let's not get into that. ;-) -- Joost Kremers joostkremers@yahoo.com Selbst in die Unterwelt dringt durch Spalten Licht EN:SiS(9) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? [not found] ` <mailman.2701.1177884177.7795.help-gnu-emacs@gnu.org> @ 2007-04-29 22:08 ` Joost Kremers 2007-04-30 7:50 ` Harald Hanche-Olsen 0 siblings, 1 reply; 16+ messages in thread From: Joost Kremers @ 2007-04-29 22:08 UTC (permalink / raw) To: help-gnu-emacs Lennart Borgman (gmail) wrote: > But I think there are completely different problems too. Does not some > languages sort partly depending the phonetics instead of the spelling? TBH i have no idea what you mean by that... could you give an example? -- Joost Kremers joostkremers@yahoo.com Selbst in die Unterwelt dringt durch Spalten Licht EN:SiS(9) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 22:08 ` Joost Kremers @ 2007-04-30 7:50 ` Harald Hanche-Olsen 0 siblings, 0 replies; 16+ messages in thread From: Harald Hanche-Olsen @ 2007-04-30 7:50 UTC (permalink / raw) To: help-gnu-emacs + Joost Kremers <joostkremers@yahoo.com>: | Lennart Borgman (gmail) wrote: |> But I think there are completely different problems too. Does not some |> languages sort partly depending the phonetics instead of the spelling? | | TBH i have no idea what you mean by that... could you give an example? It's true, at least in Norwegian phone books. Before the letter Å entered our alphabet, Aa was used instead. You don't find that in regular words anymore, but the practice survives in many family names. So a name like Aarnes would be alphabetized as if it were Årnes. And just to make matters really confusing, the rule is supposed not to be followed with foreign names where the aa really does not correspond to the letter å, so an algorithmic solution is impossible. (I strongly suspect that Norwegian phone books consistently alphabetize aa as å, though, regardless of the origin of the name.) -- * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/> - It is undesirable to believe a proposition when there is no ground whatsoever for supposing it is true. -- Bertrand Russell ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? [not found] ` <mailman.2692.1177876391.7795.help-gnu-emacs@gnu.org> 2007-04-29 20:39 ` Joost Kremers @ 2007-04-29 22:25 ` David Kastrup 2007-04-30 5:30 ` Stefan Monnier 2007-04-30 19:28 ` Eli Zaretskii 1 sibling, 2 replies; 16+ messages in thread From: David Kastrup @ 2007-04-29 22:25 UTC (permalink / raw) To: help-gnu-emacs Eli Zaretskii <eliz@gnu.org> writes: >> From: David Kastrup <dak@gnu.org> >> Date: Sun, 29 Apr 2007 18:23:09 +0200 >> >> how do I compare strings in the sort order of the current language >> environment? > > I don't understand the question. I'm sure you are aware that in the > Emacs internal representation of strings, each character has a > distinct codepoint. That is, unlike outside Emacs, where the same > code can stand for different characters depending on the locale > (because each locale assumes a certain default encoding of text), > inside Emacs Latin-1 è and Latin-2 č are two different characters > represented by two different codes, even though their respective 8-bit > encodings are identical (\350 or hex E8). And? > In the above example, these two internal codes are 2280 and 2408 > decimal. (In Emacs 23, these codes will change, but will still be > different.) > > Thus, as long as the string was decoded correctly, comparing such > strings is a simple matter of using string< and its ilk. But it does not establish the sort order of a language, but rather the sort order of Unicode (or MULE) code points. Something entirely different. >> Does Emacs have a concept of sort order depending on language? If >> not, why not? > > Because characters that have different order depending on the > language have different codepoints inside Emacs, and thus the issue > doesn't exist. > > Or am I missing something? You are seemingly talking about something entirely different. I can't even make sense of your explanations. Different languages have different orders of sorting characters. 
Look up the man pages of strcoll and strxfrm. Pick up some telephone directories or dictionaries of such languages. Please note that this is only partly related to the coding scheme (utf-8/latin-1 etc). For example, in some languages, accented letters will be right behind the corresponding unaccented letter. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 16+ messages in thread
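[Editorial sketch, not part of the original thread.] For readers who want to see what strcoll and strxfrm provide, Python's standard `locale` module exposes both. The helper below, `sort_for_locale`, is a hypothetical name introduced for this sketch. It assumes the requested locale is installed on the system (otherwise `locale.setlocale` raises `locale.Error`), and note that `setlocale` changes process-wide state, so this is not safe to call from several threads at once.

```python
import locale


def sort_for_locale(words, locale_name):
    """Return words sorted by the collation rules of locale_name.

    Raises locale.Error if the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        # strxfrm maps each string to a key whose plain comparison
        # matches what strcoll would return for the active locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore old setting


# The "C" locale always exists and compares by byte/code point value.
ascii_sorted = sort_for_locale(["b", "c", "a"], "C")
```

A call such as `sort_for_locale(["oder", "öde", "Ode"], "de_DE.UTF-8")` would use the German rules if that locale is installed; the locale name is an assumption, and the resulting order depends on the locale data shipped with the system's C library.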
* Re: How to compare strings? 2007-04-29 22:25 ` David Kastrup @ 2007-04-30 5:30 ` Stefan Monnier 2007-04-30 19:28 ` Eli Zaretskii 1 sibling, 0 replies; 16+ messages in thread From: Stefan Monnier @ 2007-04-30 5:30 UTC (permalink / raw) To: help-gnu-emacs >> Or am I missing something? > You are seemingly talking about something entirely different. I can't > even make sense of your explanations. It was just a roundabout way to say "No, Emacs does not support language-dependent sort order". Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to compare strings? 2007-04-29 22:25 ` David Kastrup 2007-04-30 5:30 ` Stefan Monnier @ 2007-04-30 19:28 ` Eli Zaretskii 1 sibling, 0 replies; 16+ messages in thread From: Eli Zaretskii @ 2007-04-30 19:28 UTC (permalink / raw) To: help-gnu-emacs > From: David Kastrup <dak@gnu.org> > Date: Mon, 30 Apr 2007 00:25:59 +0200 > > You are seemingly talking about something entirely different. I can't > even make sense of your explanations. I simply didn't understand what you were asking (I actually said so right at the beginning of my response). I thought you were asking about script-specific sorting, and that is what I responded to. But you in fact asked about language-specific sorting that goes beyond script. Emacs doesn't currently support any language-specific features (not just sorting, _any_ features), unless they happen to coincide with script-specific features. Adding such language-specific features should probably be part of the agenda for the Unicode-based Emacs (a.k.a. Emacs 23), and I expect it to be a lot of work. ^ permalink raw reply [flat|nested] 16+ messages in thread