upcase-word, &c. for non-latin alphabets

unofficial mirror of help-gnu-emacs@gnu.org

* upcase-word, &c. for non-latin alphabets
@ 2014-04-09 15:52 Will Parsons
  2014-04-09 16:30 ` Eli Zaretskii
       [not found] ` <mailman.19307.1397061058.10748.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 8+ messages in thread
From: Will Parsons @ 2014-04-09 15:52 UTC (permalink / raw)
  To: help-gnu-emacs

I'd like to be able to use the upcase/downcase/capitalize-word functions
(or equivalents) for non-Latin alphabets (utf-8 encoding).  Can anyone
give me any hints on how to go about it?  I assume that one step is to
define the correlation between uppercase and lowercase letters
somehow, but beyond that I don't have any idea.  (Since the standard
functions are built-in rather than elisp, I can't easily see how they
do it.)

-- 
Will


* Re: upcase-word, &c. for non-latin alphabets
  2014-04-09 15:52 upcase-word, &c. for non-latin alphabets Will Parsons
@ 2014-04-09 16:30 ` Eli Zaretskii
  2014-04-09 16:59   ` Yuri Khan
       [not found] ` <mailman.19307.1397061058.10748.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2014-04-09 16:30 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Will Parsons <varro@nodomain.invalid>
> Date: 9 Apr 2014 15:52:04 GMT
> 
> I'd like to be able to use the upcase/downcase/capitalize-word functions
> (or equivalents) for non-Latin alphabets (utf-8 encoding).  Can anyone
> give me any hints on how to go about it?

Just use M-u, M-l, and M-c.  Emacs already knows about all the
alphabets it supports (which is all of the Unicode).




* Re: upcase-word, &c. for non-latin alphabets
  2014-04-09 16:30 ` Eli Zaretskii
@ 2014-04-09 16:59   ` Yuri Khan
  2014-04-09 17:19     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Yuri Khan @ 2014-04-09 16:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs@gnu.org

On Wed, Apr 9, 2014 at 11:30 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Will Parsons <varro@nodomain.invalid>
>> Date: 9 Apr 2014 15:52:04 GMT
>>
>> I'd like to be able to use the upcase/downcase/capitalize-word functions
>> (or equivalents) for non-Latin alphabets (utf-8 encoding).  Can anyone
>> give me any hints on how to go about it?
>
> Just use M-u, M-l, and M-c.  Emacs already knows about all the
> alphabets it supports (which is all of the Unicode).

Does it also do the right thing for the Turkic dotted and dotless I
(which is about the only thing in Unicode that cannot be up/downcased
without knowing the language)?




* Re: upcase-word, &c. for non-latin alphabets
  2014-04-09 16:59   ` Yuri Khan
@ 2014-04-09 17:19     ` Eli Zaretskii
  2014-04-10  2:55       ` Yuri Khan
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2014-04-09 17:19 UTC (permalink / raw)
  To: Yuri Khan; +Cc: help-gnu-emacs

> Date: Wed, 9 Apr 2014 23:59:43 +0700
> From: Yuri Khan <yuri.v.khan@gmail.com>
> Cc: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
> 
> On Wed, Apr 9, 2014 at 11:30 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> >> From: Will Parsons <varro@nodomain.invalid>
> >> Date: 9 Apr 2014 15:52:04 GMT
> >>
> >> I'd like to be able to use the upcase/downcase/capitalize-word functions
> >> (or equivalents) for non-Latin alphabets (utf-8 encoding).  Can anyone
> >> give me any hints on how to go about it?
> >
> > Just use M-u, M-l, and M-c.  Emacs already knows about all the
> > alphabets it supports (which is all of the Unicode).
> 
> Does it also do the right thing for the Turkic dotted and dotless I
> (which is about the only thing in Unicode that cannot be up/downcased
> without knowing the language)?

No, Emacs doesn't yet support titlecase.




* Re: upcase-word, &c. for non-latin alphabets
       [not found] ` <mailman.19307.1397061058.10748.help-gnu-emacs@gnu.org>
@ 2014-04-10  0:38   ` Will Parsons
  2014-04-10  2:45     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Will Parsons @ 2014-04-10  0:38 UTC (permalink / raw)
  To: help-gnu-emacs

Eli Zaretskii wrote:
>> From: Will Parsons <varro@nodomain.invalid>
>> Date: 9 Apr 2014 15:52:04 GMT
>> 
>> I'd like to be able to use the upcase/downcase/capitalize-word functions
>> (or equivalents) for non-Latin alphabets (utf-8 encoding).  Can anyone
>> give me any hints on how to go about it?
>
> Just use M-u, M-l, and M-c.  Emacs already knows about all the
> alphabets it supports (which is all of the Unicode).

I thought I tried it and it didn't work, but on retrying, I see that
it does work, partially.  I.e., some letters change case, but not
others.  The alphabet I'm working with is fairly exotic, viz. Coptic,
so I'm not surprised there are problems (in fact I'm surprised it
works even partially), so perhaps this qualifies as a bug?

I'm curious though, since upcase-word &al. are built-in, does that
mean that the base Emacs code contains enormous correspondence
tables for the large number of Unicode code blocks that must exist?

-- 
Will


* Re: upcase-word, &c. for non-latin alphabets
  2014-04-10  0:38   ` Will Parsons
@ 2014-04-10  2:45     ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2014-04-10  2:45 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Will Parsons <varro@nodomain.invalid>
> Date: 10 Apr 2014 00:38:24 GMT
> 
> > Just use M-u, M-l, and M-c.  Emacs already knows about all the
> > alphabets it supports (which is all of the Unicode).
> 
> I thought I tried it and it didn't work, but on retrying, I see that
> it does work, partially.  I.e., some letters change case, but not
> others.  The alphabet I'm working with is fairly exotic, viz. Coptic,
> so I'm not surprised there are problems (in fact I'm surprised it
> works even partially), so perhaps this qualifies as a bug?

Yes, please report the problems with "M-x report-emacs-bug RET".

> I'm curious though, since upcase-word &al. are built-in, does that
> mean that the base Emacs code contains enormous correspondence
> tables for the large number of Unicode code blocks that must exist?

Yes.
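
As a rough illustration of what such correspondence tables contain, here
is a minimal Python sketch.  It is not Emacs code and says nothing about
how Emacs stores its tables; it only lists the one-to-one lowercase to
uppercase pairs that Python's bundled Unicode data gives for a range of
code points.  The helper name case_pairs and the chosen range (the start
of the Unicode Coptic block) are assumptions made for the example.

```python
# Illustration only: this uses Python's built-in Unicode data,
# not the case tables that ship with Emacs.

def case_pairs(first, last):
    """Return (lowercase, uppercase) pairs for code points in [first, last]."""
    pairs = []
    for code in range(first, last + 1):
        ch = chr(code)
        up = ch.upper()
        # Keep only one-to-one mappings where upcasing changes the character.
        if up != ch and len(up) == 1:
            pairs.append((ch, up))
    return pairs

# Hypothetical choice of range: U+2C80..U+2CB1, the start of the Coptic block.
coptic = case_pairs(0x2C80, 0x2CB1)

# Show a few entries of the resulting table.
for lower, upper in coptic[:3]:
    print(f"U+{ord(lower):04X} {lower} -> U+{ord(upper):04X} {upper}")
```

A table like this has to exist, in some form, for every script that
distinguishes case, which is why the full data set is large.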




* Re: upcase-word, &c. for non-latin alphabets
  2014-04-09 17:19     ` Eli Zaretskii
@ 2014-04-10  2:55       ` Yuri Khan
  2014-04-10 15:26         ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Yuri Khan @ 2014-04-10  2:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs@gnu.org

On Thu, Apr 10, 2014 at 12:19 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> Does it also do the right thing for the Turkic dotted and dotless I
>> (which is about the only thing in Unicode that cannot be up/downcased
>> without knowing the language)?
>
> No, Emacs doesn't yet support titlecase.

As far as I understand, titlecase is when you capitalize each word in
a sentence (in English; in other languages rules differ). My question
was not about that.

In Turkic languages (including but not limited to Turkish), the
lowercase of ?I is not ?i but ?ı (U+0131 Latin small letter dotless
i), and the uppercase of ?i is ?İ (U+0130 Latin capital letter I with
dot above).

$ emacs --version
GNU Emacs 24.3.1
[…]
$ emacs -Q
M-x set-language-environment
Turkish
(in *scratch*) M-< M-u
==> ;; THIS buffer… (wrong for Turkish)
<down> <home> M-l
==> ;; if you want… (also wrong for Turkish)

Further, if I paste the above paragraph and try M-u and M-l on those
funny letters, Emacs does nothing to them.

This happens even when I do “sudo locale-gen tr_TR.utf8” and/or start
Emacs as “LANG=tr_TR.utf8 emacs -Q”.
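
To make the expected Turkic behaviour concrete, here is a minimal sketch
in Python (not Emacs Lisp, and not a proposal for how Emacs should do
it).  Python's str.upper() and str.lower() are language-independent, so
the two hypothetical helpers below, turkic_upper and turkic_lower, remap
the dotted and dotless I by hand before calling them.  The sketch
ignores decomposed forms such as I followed by a combining dot above.

```python
# Hypothetical helpers for illustration; the names are not from any library.

DOTTED_CAPITAL_I = "\u0130"   # İ, Latin capital letter I with dot above
DOTLESS_SMALL_I = "\u0131"    # ı, Latin small letter dotless i

def turkic_upper(text):
    """Uppercase text with the Turkic pairs i -> İ and ı -> I."""
    # Map dotted i first; the default mapping already turns ı into I.
    return text.replace("i", DOTTED_CAPITAL_I).upper()

def turkic_lower(text):
    """Lowercase text with the Turkic pairs I -> ı and İ -> i."""
    # Map İ to plain i first, so lower() never sees İ.
    text = text.replace(DOTTED_CAPITAL_I, "i")
    return text.replace("I", DOTLESS_SMALL_I).lower()

print("this".upper(), "vs", turkic_upper("this"))
print("IF".lower(), "vs", turkic_lower("IF"))
```

The first value on each line is the language-independent result, which
is what the recipe above produces; the second is what a Turkish reader
would expect.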


* Re: upcase-word, &c. for non-latin alphabets
  2014-04-10  2:55       ` Yuri Khan
@ 2014-04-10 15:26         ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2014-04-10 15:26 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Thu, 10 Apr 2014 09:55:40 +0700
> From: Yuri Khan <yuri.v.khan@gmail.com>
> Cc: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
> 
> On Thu, Apr 10, 2014 at 12:19 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >>
> >> Does it also do the right thing for the Turkic dotted and dotless I
> >> (which is about the only thing in Unicode that cannot be up/downcased
> >> without knowing the language)?
> >
> > No, Emacs doesn't yet support titlecase.
> 
> As far as I understand, titlecase is when you capitalize each word in
> a sentence (in English; in other languages rules differ). My question
> was not about that.
> 
> In Turkic languages (including but not limited to Turkish), the
> lowercase of ?I is not ?i but ?ı (U+0131 Latin small letter dotless
> i), and the uppercase of ?i is ?İ (U+0130 Latin capital letter I with
> dot above).

I understood the question.  It doesn't matter what you call this
feature, the answer is still NO.  Emacs doesn't support this yet (as
it doesn't support other case-folding variations, like Greek letters
at the end of a word, or the German ß whose upcase variant is SS).
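
For readers unfamiliar with those two cases, here is a short sketch of
what they look like in an environment that does implement them.  It
uses Python's built-in string methods purely as an illustration; the
variable names are made up for the example, and nothing here describes
Emacs behaviour.

```python
# Illustration only: Python's str methods, not Emacs functions.

# German sharp s has no single-character uppercase in the default
# mapping, so one character becomes two.
german_upper = "straße".upper()

# Greek capital sigma lowercases to final sigma (U+03C2) at the end of
# a word and to ordinary sigma (U+03C3) elsewhere.
greek_lower = "ΣΟΦΟΣ".lower()

print(german_upper)
print(greek_lower)
print([f"U+{ord(ch):04X}" for ch in greek_lower])
```

Both cases need more than a one-to-one character table: the first
changes the length of the string, and the second depends on the
position of the letter within the word.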



