[yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]

unofficial mirror of emacs-devel@gnu.org 

* [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
@ 2007-03-31 20:43 Richard Stallman
  2007-04-01 17:39 ` Chong Yidong
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Richard Stallman @ 2007-03-31 20:43 UTC (permalink / raw)
  To: emacs-devel

Would people please DTRT, then ack?

------- Start of forwarded message -------
From: Volkan YAZICI <yazicivo@ttnet.net.tr>
To: bug-gnu-emacs@gnu.org
Date: Wed, 28 Mar 2007 00:37:21 +0300
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Subject: Locale Dependent Downcasing in smtpmail

Hi,

smtpmail downcases strings with the DOWNCASE function during the SMTP
dialogue, and this breaks in some locales. I spotted the problem when I
launched Emacs with the LC_CTYPE=tr_TR locale. In Turkish, the
lowercase form of I is the dotless ı. So when smtpmail-via-smtp
downcases the AUTH mechanisms advertised by the server, PLAIN and
LOGIN turn into plaın and logın. As a result,
(smtpmail-intersection smtpmail-auth-supported mechs) returns nil in
smtpmail-try-auth-methods.

IMHO, the smtpmail-via-smtp function should switch to an ASCII locale
(if that's possible) before calling DOWNCASE.

Regards.

_______________________________________________
bug-gnu-emacs mailing list
bug-gnu-emacs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs
------- End of forwarded message -------
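If all that is needed is a plain `tr A-Z a-z', one way to keep the locale out of the picture entirely is to map only the 26 ASCII capitals by hand. The helper `my-downcase-ascii' below is hypothetical (it is not part of smtpmail) and never consults the buffer's case table, so a Turkish language environment cannot turn PLAIN into plaın. Treat it as a minimal sketch, not as the fix that was eventually installed.

```elisp
;;; Sketch: locale-independent ASCII downcasing (hypothetical helper).

(defun my-downcase-ascii (string)
  "Return a copy of STRING with only the ASCII letters A-Z mapped to a-z.
Unlike `downcase', this ignores the current case table, so the
result does not depend on the language environment."
  (let ((result (copy-sequence string)))
    (dotimes (i (length result))
      (let ((c (aref result i)))
        ;; Only touch ASCII capitals; every other character is kept.
        (when (and (>= c ?A) (<= c ?Z))
          (aset result i (+ c (- ?a ?A))))))
    result))

;; Interning the downcased AUTH mechanisms now yields the same symbols
;; in every locale.
(mapcar (lambda (s) (intern (my-downcase-ascii s)))
        '("PLAIN" "LOGIN" "CRAM-MD5"))
;; => (plain login cram-md5)
```

Non-ASCII characters such as dotted İ (U+0130) pass through unchanged, which is enough for protocol keywords that are ASCII by definition.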

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-03-31 20:43 [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
@ 2007-04-01 17:39 ` Chong Yidong
  2007-04-02  6:51 ` Kenichi Handa
  2007-04-02 17:31 ` Volkan YAZICI
  2 siblings, 0 replies; 45+ messages in thread
From: Chong Yidong @ 2007-04-01 17:39 UTC (permalink / raw)
  To: rms, Volkan YAZICI; +Cc: emacs-devel

> smtpmail tries to downcase the strings using DOWNCASE function during
> the SMTP communication. But this leads to some problems in some
> locales. I spotted that problem when I tried to launch emacs with
> LC_CTYPE=tr_TR locale. In Turkish, downcased I is a dotless i.
> Therefore, while it tries to downcase some AUTH mechanisms (in
> smtpmail-via-smtp function), PLAIN and LOGIN turn into plaın and
> logın. And this causes (smtpmail-intersection smtpmail-auth-supported
> mechs) to return nil in smtpmail-try-auth-methods function.
>
> IMHO, smtpmail-via-smtp function should switch to ASCII locale (if
> that's possible) before calling DOWNCASE.

I've installed a fix for this.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-03-31 20:43 [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
  2007-04-01 17:39 ` Chong Yidong
@ 2007-04-02  6:51 ` Kenichi Handa
  2007-04-02 22:52   ` Chong Yidong
  2007-04-02 17:31 ` Volkan YAZICI
  2 siblings, 1 reply; 45+ messages in thread
From: Kenichi Handa @ 2007-04-02  6:51 UTC (permalink / raw)
  To: rms; +Cc: yazicivo, emacs-devel

In article <E1HXkPz-0003le-KI@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

> Would people please DTRT, then ack?
> ------- Start of forwarded message -------
> From: Volkan YAZICI <yazicivo@ttnet.net.tr>
> To: bug-gnu-emacs@gnu.org
> Date: Wed, 28 Mar 2007 00:37:21 +0300
> MIME-Version: 1.0
> Content-Type: text/plain; charset=utf-8
> Subject: Locale Dependent Downcasing in smtpmail

> Hi,

> smtpmail tries to downcase the strings using DOWNCASE function during
> the SMTP communication. But this leads to some problems in some
> locales. I spotted that problem when I tried to launch emacs with
> LC_CTYPE=tr_TR locale. In Turkish, downcased I is a dotless i.
> Therefore, while it tries to downcase some AUTH mechanisms (in
> smtpmail-via-smtp function), PLAIN and LOGIN turn into plaın and
> logın. And this causes (smtpmail-intersection smtpmail-auth-supported
> mechs) to return nil in smtpmail-try-auth-methods function.

Does the attached change fix the problem?

---
Kenichi Handa
handa@m17n.org

*** smtpmail.el	10 Feb 2007 16:30:14 +0900	1.91
--- smtpmail.el	02 Apr 2007 15:49:05 +0900	
***************
*** 691,697 ****
  			  (>= (car response-code) 400))
  		      (throw 'done nil)))
  	      (dolist (line (cdr (cdr response-code)))
! 		(let ((name (mapcar (lambda (s) (intern (downcase s)))
  				    (split-string (substring line 4) "[ ]"))))
  		  (and (eq (length name) 1)
  		       (setq name (car name)))
--- 691,704 ----
  			  (>= (car response-code) 400))
  		      (throw 'done nil)))
  	      (dolist (line (cdr (cdr response-code)))
! 		(let ((name (mapcar (lambda (s)
! 				      (setq s (downcase s))
! 				      ;; If `I' is downcased to dotless-i,
! 				      ;; convert it to `i'.
! 				      (if (/= (downcase ?I) ?i)
! 					  (subst-char-in-string
! 					   (downcase ?I) ?i s t))
! 				      (intern s))
  				    (split-string (substring line 4) "[ ]"))))
  		  (and (eq (length name) 1)
  		       (setq name (car name)))


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-03-31 20:43 [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
  2007-04-01 17:39 ` Chong Yidong
  2007-04-02  6:51 ` Kenichi Handa
@ 2007-04-02 17:31 ` Volkan YAZICI
  2007-04-03  8:06   ` Kenichi Handa
  2 siblings, 1 reply; 45+ messages in thread
From: Volkan YAZICI @ 2007-04-02 17:31 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: rms, emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> In article <E1HXkPz-0003le-KI@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> Does the attached change fix the problem?
>
> ---
> Kenichi Handa
> handa@m17n.org
>
> *** smtpmail.el	10 Feb 2007 16:30:14 +0900	1.91
> --- smtpmail.el	02 Apr 2007 15:49:05 +0900	
> ***************
> *** 691,697 ****
>   			  (>= (car response-code) 400))
>   		      (throw 'done nil)))
>   	      (dolist (line (cdr (cdr response-code)))
> ! 		(let ((name (mapcar (lambda (s) (intern (downcase s)))
>   				    (split-string (substring line 4) "[ ]"))))
>   		  (and (eq (length name) 1)
>   		       (setq name (car name)))
> --- 691,704 ----
>   			  (>= (car response-code) 400))
>   		      (throw 'done nil)))
>   	      (dolist (line (cdr (cdr response-code)))
> ! 		(let ((name (mapcar (lambda (s)
> ! 				      (setq s (downcase s))
> ! 				      ;; If `I' is downcased to dotless-i,
> ! 				      ;; convert it to `i'.
> ! 				      (if (/= (downcase ?I) ?i)
> ! 					  (subst-char-in-string
> ! 					   (downcase ?I) ?i s t))

Such a fix is not really feasible. What would you do about other
problematic characters? Add a new if-else clause for each one?

I am not familiar with the Emacs team's policy on introducing new
macros, but it would probably be handy to have something like this
macro:

(with-case-table 'ascii
  ;; Any call to DOWNCASE/UPCASE within this (dynamic?) scope will use
  ;; the case conversion table specified in the first argument of the
  ;; WITH-CASE-TABLE macro.
  ...)


Regards.
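A minimal sketch of the proposed macro, under the hypothetical name `my-with-case-table'. It does not bind anything dynamically: it temporarily installs TABLE as the current buffer's case table with `set-case-table' and restores the previous one in an `unwind-protect'. The sketch assumes the argument is a case table object rather than the symbol 'ascii used in the proposal.

```elisp
;;; Sketch of the proposed macro (hypothetical name `my-with-case-table').

(defmacro my-with-case-table (table &rest body)
  "Evaluate BODY with TABLE as the current buffer's case table.
The previous case table is restored afterwards, even on a non-local exit."
  (declare (indent 1))
  (let ((old-table (make-symbol "old-table"))
        (old-buffer (make-symbol "old-buffer")))
    `(let ((,old-table (current-case-table))
           (,old-buffer (current-buffer)))
       (unwind-protect
           (progn (set-case-table ,table)
                  ,@body)
         ;; BODY may have switched buffers; restore the table in the
         ;; buffer where it was changed.
         (with-current-buffer ,old-buffer
           (set-case-table ,old-table))))))

;; Usage: downcase with the standard table regardless of local changes.
(with-temp-buffer
  (my-with-case-table (standard-case-table)
    (downcase "PLAIN")))
```

Because the case table is per-buffer state, code in BODY that switches to another buffer sees that buffer's own table, not TABLE. Recent Emacs versions ship a `with-case-table' macro along these lines.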


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-02  6:51 ` Kenichi Handa
@ 2007-04-02 22:52   ` Chong Yidong
  2007-04-02 23:20     ` Volkan YAZICI
                       ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Chong Yidong @ 2007-04-02 22:52 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yazicivo, rms, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> smtpmail tries to downcase the strings using DOWNCASE function
>> during the SMTP communication.  In Turkish, downcased I is a
>> dotless i.  Therefore, while it tries to downcase some AUTH
>> mechanisms (in smtpmail-via-smtp function), PLAIN and LOGIN turn
>> into plaın and logın.
>
> Does the attached change fix the problem?
>
> ! 		(let ((name (mapcar (lambda (s)
> ! 				      (setq s (downcase s))
> ! 				      ;; If `I' is downcased to dotless-i,
> ! 				      ;; convert it to `i'.
> ! 				      (if (/= (downcase ?I) ?i)
> ! 					  (subst-char-in-string
> ! 					   (downcase ?I) ?i s t))
> ! 				      (intern s))

I wonder if there's a better way to do this.  Maybe we can define an
ascii case table that doesn't get overwritten by the locale; then code
like the above can bind to this case table temporarily (or we can
define a downcase-ascii function that does such a thing).

But maybe, for Emacs 22, the above hack is all we need.  Is it true
that in practice, all we have to worry about is "i"?

(I tried changing this another way, but that turned out to be bogus,
so I reverted my patch.  Sorry for the noise.)
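One way to realize the idea of an ASCII case table that the locale setup never touches is to build a fresh table containing only the 26 letter pairs, instead of copying the standard one. The names `my-make-ascii-case-table', `my-ascii-case-table' and `my-downcase-with-ascii-table' are hypothetical. The sketch assumes that a character with no entry in the table is left unchanged by `downcase'.

```elisp
;;; Sketch: a case table that knows only ASCII A-Z <-> a-z.

(defun my-make-ascii-case-table ()
  "Return a fresh case table containing only the 26 ASCII letter pairs."
  (let ((table (make-char-table 'case-table)))
    (dotimes (i 26)
      ;; The char-table itself is the DOWN mapping: upper -> lower.
      ;; `set-case-table' derives the up/canon/eqv tables when it
      ;; first installs the table.
      (aset table (+ ?A i) (+ ?a i)))
    table))

(defvar my-ascii-case-table (my-make-ascii-case-table)
  "Hypothetical ASCII-only case table, independent of the language setup.")

(defun my-downcase-with-ascii-table (object)
  "Downcase OBJECT, a string or character, with `my-ascii-case-table'.
The work happens in a temporary buffer, so the caller's case table
is never modified."
  (with-temp-buffer
    (set-case-table my-ascii-case-table)
    (downcase object)))

(my-downcase-with-ascii-table "ISO-8859-1") ; => "iso-8859-1"
```

Running `downcase' inside `with-temp-buffer' keeps the caller's buffer untouched, at the cost of creating a buffer per call. Code on a hot path would rather switch the table in place and restore it with `unwind-protect', as the downcase-ascii patch later in this thread does.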


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-02 22:52   ` Chong Yidong
@ 2007-04-02 23:20     ` Volkan YAZICI
  2007-04-03  1:24     ` Kenichi Handa
  2007-04-03 21:40     ` Richard Stallman
  2 siblings, 0 replies; 45+ messages in thread
From: Volkan YAZICI @ 2007-04-02 23:20 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel, rms, Kenichi Handa

Chong Yidong <cyd@stupidchicken.com> writes:
> Kenichi Handa <handa@m17n.org> writes:
>> Does the attached change fix the problem?
>>
>> ! 		(let ((name (mapcar (lambda (s)
>> ! 				      (setq s (downcase s))
>> ! 				      ;; If `I' is downcased to dotless-i,
>> ! 				      ;; convert it to `i'.
>> ! 				      (if (/= (downcase ?I) ?i)
>> ! 					  (subst-char-in-string
>> ! 					   (downcase ?I) ?i s t))
>> ! 				      (intern s))
>
> I wonder if there's a better way to do this.

Indeed. Here's my reply to Kenichi Handa (in case it didn't reach
you):

  Such a fix is not really feasible. What would you do about other
  problematic characters? Add a new if-else clause for each one?

  I am not familiar with the Emacs team's policy on introducing new
  macros, but it would probably be handy to have something like
  this macro:

  (with-case-table 'ascii
   ;; Any call to DOWNCASE/UPCASE within this (dynamic?) scope will
   ;; use the case conversion table specified in the first argument
   ;; of the WITH-CASE-TABLE macro.
   ...)

> Maybe we can define an ascii case table that doesn't get overwritten
> by the locale; then code like the above can bind to this case table
> temporarily (or we can define a downcase-ascii function that does
> such a thing).

I wonder whether there is really no way to switch properly between
the case conversion tables of different locales. At the very least,
can't we switch to an ASCII locale temporarily?

> But maybe, for Emacs 22, the above hack is all we need. Is it true
> that in practice, all we have to worry about is "i"?

Yes. But I can only answer for Turkish characters; somebody would need
to look through all the other locales. ;-) (BTW, did I mention that I
suspect similar problems in other places as well, like Gnus?)

Regards.


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-02 22:52   ` Chong Yidong
  2007-04-02 23:20     ` Volkan YAZICI
@ 2007-04-03  1:24     ` Kenichi Handa
  2007-04-03 21:40     ` Richard Stallman
  2 siblings, 0 replies; 45+ messages in thread
From: Kenichi Handa @ 2007-04-03  1:24 UTC (permalink / raw)
  To: Chong Yidong; +Cc: yazicivo, rms, emacs-devel

In article <87ejn2jsnu.fsf@stupidchicken.com>, Chong Yidong <cyd@stupidchicken.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>> smtpmail tries to downcase the strings using DOWNCASE function
>>> during the SMTP communication.  In Turkish, downcased I is a
>>> dotless i.  Therefore, while it tries to downcase some AUTH
>>> mechanisms (in smtpmail-via-smtp function), PLAIN and LOGIN turn
>>> into plaın and logın.
> >
> > Does the attached change fix the problem?
> >
> > ! 		(let ((name (mapcar (lambda (s)
> > ! 				      (setq s (downcase s))
> > ! 				      ;; If `I' is downcased to dotless-i,
> > ! 				      ;; convert it to `i'.
> > ! 				      (if (/= (downcase ?I) ?i)
> > ! 					  (subst-char-in-string
> > ! 					   (downcase ?I) ?i s t))
> > ! 				      (intern s))

> I wonder if there's a better way to do this.  Maybe we can define an
> ascii case table that doesn't get overwritten by the locale; then code
> like the above can bind to this case table temporarily (or we can
> define a downcase-ascii function that does such a thing).

> But maybe, for Emacs 22, the above hack is all we need.  Is it true
> that in practice, all we have to worry about is "i"?

In practice I think dotless-i is the only problematic
character.  But I don't know what the right thing is for
non-ASCII characters in the code around there.  For
instance, dotted-I (U+0130) is downcased to `i'.  Is that OK?

---
Kenichi Handa
handa@m17n.org


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-02 17:31 ` Volkan YAZICI
@ 2007-04-03  8:06   ` Kenichi Handa
  2007-04-03  8:28     ` Werner LEMBERG
  2007-04-03  9:24     ` Eli Zaretskii
  0 siblings, 2 replies; 45+ messages in thread
From: Kenichi Handa @ 2007-04-03  8:06 UTC (permalink / raw)
  To: Volkan YAZICI; +Cc: rms, emacs-devel

In article <87y7lahee5.fsf@ttnet.net.tr>, Volkan YAZICI <yazicivo@ttnet.net.tr> writes:

> > --- 691,704 ----
> >   			  (>= (car response-code) 400))
> >   		      (throw 'done nil)))
> >   	      (dolist (line (cdr (cdr response-code)))
> > ! 		(let ((name (mapcar (lambda (s)
> > ! 				      (setq s (downcase s))
> > ! 				      ;; If `I' is downcased to dotless-i,
> > ! 				      ;; convert it to `i'.
> > ! 				      (if (/= (downcase ?I) ?i)
> > ! 					  (subst-char-in-string
> > ! 					   (downcase ?I) ?i s t))

> Such a fix is quite unfeasible. What do you think to do for other
> problematic characters as well? Introduce a new if-else clause for
> every one?

To avoid such an ad-hoc fix, I must know the purpose of
downcasing here.  Do we need just "tr A-Z a-z"?  Or, do we
have to downcase also non-ASCII chars?  In the latter case,
what to do with conversion from dotted-I to `i' in Turkish?
Do we need such an advanced downcasing as "MASSE" -> "maße"
for German?

> I am not familiar with introducing a new macro policy of emacs team
> but it'd probably be useful (handy?) to have something similar to this
> macro:

> (with-case-table 'ascii
>   ;; Any call to DOWNCASE/UPCASE within this (dynamic?) scope will use
>   ;; the case conversion table specified in the first argument of the
>   ;; WITH-CASE-TABLE macro.
>   ...)

I also thought about such a thing at first, but the above
questions arose, and unless I know clearly what to do,
anything I do will be ad-hoc.  So I settled on this very
localized fix (also considering that the release is
near).

---
Kenichi Handa
handa@m17n.org


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03  8:06   ` Kenichi Handa
@ 2007-04-03  8:28     ` Werner LEMBERG
  2007-04-03  9:24     ` Eli Zaretskii
  1 sibling, 0 replies; 45+ messages in thread
From: Werner LEMBERG @ 2007-04-03  8:28 UTC (permalink / raw)
  To: handa; +Cc: yazicivo, rms, emacs-devel


> Do we need such an advanced downcasing as "MASSE" -> "maße"
> for German?

This is not possible without knowing the lowercase form in advance,
since different words upcase to the same string:

  Maße  -> MASSE
  Masse -> MASSE


     Werner


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03  8:06   ` Kenichi Handa
  2007-04-03  8:28     ` Werner LEMBERG
@ 2007-04-03  9:24     ` Eli Zaretskii
  2007-04-03  9:33       ` Simon Josefsson
  1 sibling, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-03  9:24 UTC (permalink / raw)
  To: Kenichi Handa, Simon Josefsson; +Cc: yazicivo, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Tue, 03 Apr 2007 17:06:07 +0900
> Cc: rms@gnu.org, emacs-devel@gnu.org
> 
> To avoid such an ad-hoc fix, I must know the purpose of
> downcasing here.  Do we need just "tr A-Z a-z"?  Or, do we
> have to downcase also non-ASCII chars?

I think the purpose is quite obvious from this fragment:

	    (smtpmail-send-command process (format "EHLO %s" (smtpmail-fqdn)))

	    (if (or (null (car (setq response-code
				     (smtpmail-read-response process))))
		    (not (integerp (car response-code)))
		    (>= (car response-code) 400))
		(progn
		  ;; HELO
		  (smtpmail-send-command
		   process (format "HELO %s" (smtpmail-fqdn)))

		  (if (or (null (car (setq response-code
					   (smtpmail-read-response process))))
			  (not (integerp (car response-code)))
			  (>= (car response-code) 400))
		      (throw 'done nil)))
	      (dolist (line (cdr (cdr response-code)))
		(let ((name (mapcar (lambda (s) (intern (downcase s)))
				    (split-string (substring line 4) "[ ]"))))
		  (and (eq (length name) 1)
		       (setq name (car name)))
		  (and name
		       (cond ((memq (if (consp name) (car name) name)
				    '(verb xvrb 8bitmime onex xone
					   expn size dsn etrn
					   enhancedstatuscodes
					   help xusr
					   auth=login auth starttls))
			      (setq supported-extensions
				    (cons name supported-extensions)))
			     (smtpmail-warn-about-unknown-extensions
			      (message "Unknown extension %s" name)))))))

My interpretation of this is that smtpmail sends an EHLO/HELO command
to the SMTP server and then examines the response, which lists the
features supported by the server as strings separated by whitespace.
For each such feature, we downcase and intern it, and then check
whether the resulting symbol is a member of the list of features known
to smtpmail; if it is, we add the feature to the supported-extensions
list.  Thus, downcasing needs to support only the words in the above
list of known extensions (verb, xvrb, 8bitmime, etc.), which are
pure-ASCII words.

IOW, "tr A-Z a-z" should be enough.

Simon, am I right?
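To make this reading concrete, here is a sketch of the extension-parsing loop quoted above with the locale-dependent `downcase' replaced by an ASCII-only mapping. `my-ascii-downcase', `my-parse-ehlo-extensions' and `my-known-extensions' are hypothetical names; the keyword list is copied from the quoted code, and the input is assumed to be the reply lines that follow the greeting line.

```elisp
;;; Sketch: smtpmail-style EHLO parsing with ASCII-only downcasing.
;;; All `my-' names are hypothetical, not smtpmail's API.

(defconst my-known-extensions
  '(verb xvrb 8bitmime onex xone expn size dsn etrn
    enhancedstatuscodes help xusr auth=login auth starttls)
  "Extension keywords copied from the smtpmail code quoted above.")

(defun my-ascii-downcase (string)
  "Return a new string: STRING with only ASCII A-Z mapped to a-z."
  (apply #'string
         (mapcar (lambda (c) (if (and (>= c ?A) (<= c ?Z)) (+ c 32) c))
                 string)))

(defun my-parse-ehlo-extensions (lines)
  "Return the known extensions advertised in the EHLO reply LINES.
Each line looks like \"250-KEYWORD PARAM...\"; as in smtpmail, the
first four characters (reply code plus separator) are skipped."
  (let (supported)
    (dolist (line lines)
      (let ((name (mapcar (lambda (s) (intern (my-ascii-downcase s)))
                          (split-string (substring line 4) "[ ]"))))
        ;; A bare keyword becomes a symbol; one with params stays a list.
        (when (= (length name) 1)
          (setq name (car name)))
        (when (memq (if (consp name) (car name) name) my-known-extensions)
          (push name supported))))
    (nreverse supported)))

(my-parse-ehlo-extensions
 '("250-8BITMIME" "250-AUTH PLAIN LOGIN" "250-PIPELINING" "250 STARTTLS"))
;; => (8bitmime (auth plain login) starttls)
```

PIPELINING is dropped only because it is not in the copied list; the result no longer depends on the language environment.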


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03  9:24     ` Eli Zaretskii
@ 2007-04-03  9:33       ` Simon Josefsson
  2007-04-03 13:44         ` Volkan YAZICI
  0 siblings, 1 reply; 45+ messages in thread
From: Simon Josefsson @ 2007-04-03  9:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yazicivo, emacs-devel, Kenichi Handa

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Kenichi Handa <handa@m17n.org>
>> Date: Tue, 03 Apr 2007 17:06:07 +0900
>> Cc: rms@gnu.org, emacs-devel@gnu.org
>> 
>> To avoid such an ad-hoc fix, I must know the purpose of
>> downcasing here.  Do we need just "tr A-Z a-z"?  Or, do we
>> have to downcase also non-ASCII chars?
>
> I think the purpose is quite obvious from this fragment:
...
> My interpretation of this is that smtpmail sends EHLO/HELO command to
> the SMTP server, and then examines the response, which specifies the
> features supported by the server as a list of strings separated by
> whitespace.  For each such feature, we downcase and intern it, and
> then check whether the resulting symbol is a member of the list of
> features known to smtpmail; if it is, we add the feature to the
> supported-extensions list.  Thus, downcasing needs to support only the
> words in the above list of known extensions (verb, xvrb, 8bitmime,
> etc.), which are pure-ASCII words.
>
> IOW, "tr A-Z a-z" should be enough.
>
> Simon, am I right?

Yes, I agree.  The reason is to allow servers to specify the verbs in
lower case, and for things to work anyway.  The relevant part from RFC
2821 is:

      ehlo-line    = ehlo-keyword *( SP ehlo-param )

      ehlo-keyword = (ALPHA / DIGIT) *(ALPHA / DIGIT / "-")
                   ; additional syntax of ehlo-params depends on
                   ; ehlo-keyword

I note that for future compatibility, we could treat this as UTF-8 but
I believe it will cause more failures than it is worth.  There are no
advantages today in doing that, and nobody can tell whether there will
be any advantages from it ever.  So the safest is likely to leave this
as ASCII-only.

/Simon
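Since the RFC 2821 grammar restricts an ehlo-keyword to ASCII letters, digits and hyphens, a keyword can be validated, and safely downcased, without any locale knowledge. A hedged sketch of such a check, using the hypothetical predicate `my-ehlo-keyword-p':

```elisp
;;; Sketch: RFC 2821 ehlo-keyword check (hypothetical helper).

(defun my-ehlo-keyword-p (string)
  "Return non-nil if STRING is a valid RFC 2821 ehlo-keyword.
The grammar is ehlo-keyword = (ALPHA / DIGIT) *(ALPHA / DIGIT / \"-\"),
where ALPHA and DIGIT are ASCII only."
  ;; Disable case folding so the match never consults a case table.
  (let ((case-fold-search nil))
    (string-match-p "\\`[A-Za-z0-9][-A-Za-z0-9]*\\'" string)))

(my-ehlo-keyword-p "8BITMIME")                          ; => 0 (a match)
(my-ehlo-keyword-p "-FOO")                              ; => nil
(my-ehlo-keyword-p (concat "PLA" (string #x131) "N"))   ; => nil
```

Note that AUTH=LOGIN, which smtpmail's list recognizes as auth=login, does not match this grammar, so a strict check would have to special-case it.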


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03  9:33       ` Simon Josefsson
@ 2007-04-03 13:44         ` Volkan YAZICI
  2007-04-03 15:29           ` Eli Zaretskii
  2007-04-03 21:16           ` Davis Herring
  0 siblings, 2 replies; 45+ messages in thread
From: Volkan YAZICI @ 2007-04-03 13:44 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: Eli Zaretskii, emacs-devel, Kenichi Handa

Simon Josefsson <simon@josefsson.org> writes:
> I note that for future compatibility, we could treat this as UTF-8 but
> I believe it will cause more failures than it is worth.  There are no
> advantages today in doing that, and nobody can tell whether there will
> be any advantages from it ever.  So the safest is likely to leave this
> as ASCII-only.

I agree.

BTW, here are some more buggy lines from lisp/gnus.  (These are just a
small fraction of the problematic lines, as far as I can see from
"grep downcase -RHn lisp/gnus".)

nndoc.el:511:   (intern (downcase (mail-header-strip encoding))))))
nndoc.el:905:   subtype (downcase (match-string 2 content-type))
rfc2047.el:674: (concat "=?" (downcase (symbol-name mime-charset))

Failing Cases:
(downcase "ISO-8859-1") ==> ıso-8859-1
(downcase "text/plain") ==> text-plaın


Regards.
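The same ASCII-only mapping would cover the Gnus call sites listed above. As a variation on a character loop, here is a sketch based on `replace-regexp-in-string'; the helper name `my-downcase-ascii-regexp' is hypothetical.

```elisp
;;; Sketch: regexp-based ASCII-only downcasing (hypothetical helper).

(defun my-downcase-ascii-regexp (string)
  "Return STRING with ASCII A-Z downcased and every other character kept."
  ;; With `case-fold-search' non-nil, "[A-Z]" would also match lowercase
  ;; letters, and shifting those by 32 would corrupt them.
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "[A-Z]"
     (lambda (match) (string (+ (aref match 0) (- ?a ?A))))
     string t t)))

;; The rfc2047.el case from the list above:
(concat "=?" (my-downcase-ascii-regexp (symbol-name 'ISO-8859-1)) "?")
;; => "=?iso-8859-1?"
```

Binding `case-fold-search' locally is the important design point: the caller's setting, and any case folding a language environment installs, cannot leak into the match.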


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03 13:44         ` Volkan YAZICI
@ 2007-04-03 15:29           ` Eli Zaretskii
  2007-04-03 15:50             ` David Kastrup
  2007-04-03 16:30             ` Chong Yidong
  2007-04-03 21:16           ` Davis Herring
  1 sibling, 2 replies; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-03 15:29 UTC (permalink / raw)
  To: Volkan YAZICI; +Cc: simon, emacs-devel, handa

> Cc: Eli Zaretskii <eliz@gnu.org>,  Kenichi Handa <handa@m17n.org>,
> 	  emacs-devel@gnu.org
> From: Volkan YAZICI <yazicivo@ttnet.net.tr>
> Date: Tue, 03 Apr 2007 16:44:40 +0300
> 
> BTW, here are some more buggy lines from lisp/gnus: (These are just a
> small minority of the problematic lines as far as I can see from "grep
> downcase -RHn lisp/gnus")
> 
> nndoc.el:511:   (intern (downcase (mail-header-strip encoding))))))
> nndoc.el:905:   subtype (downcase (match-string 2 content-type))
> rfc2047.el:674: (concat "=?" (downcase (symbol-name mime-charset))
> 
> Failing Cases:
> (downcase "ISO-8859-1") ==> ıso-8859-1
> (downcase "text/plain") ==> text-plaın

Perhaps we should have something like downcase-ascii for such
situations.


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03 15:29           ` Eli Zaretskii
@ 2007-04-03 15:50             ` David Kastrup
  2007-04-03 16:03               ` Andreas Schwab
       [not found]               ` <87k5wt5tnx.fsf@ttnet.net.tr>
  2007-04-03 16:30             ` Chong Yidong
  1 sibling, 2 replies; 45+ messages in thread
From: David Kastrup @ 2007-04-03 15:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: simon, Volkan YAZICI, handa, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: Eli Zaretskii <eliz@gnu.org>,  Kenichi Handa <handa@m17n.org>,
>> 	  emacs-devel@gnu.org
>> From: Volkan YAZICI <yazicivo@ttnet.net.tr>
>> Date: Tue, 03 Apr 2007 16:44:40 +0300
>> 
>> BTW, here are some more buggy lines from lisp/gnus: (These are just a
>> small minority of the problematic lines as far as I can see from "grep
>> downcase -RHn lisp/gnus")
>> 
>> nndoc.el:511:   (intern (downcase (mail-header-strip encoding))))))
>> nndoc.el:905:   subtype (downcase (match-string 2 content-type))
>> rfc2047.el:674: (concat "=?" (downcase (symbol-name mime-charset))
>> 
>> Failing Cases:
>> (downcase "ISO-8859-1") ==> ıso-8859-1
>> (downcase "text/plain") ==> text-plaın
>
> Perhaps we should have something like downcase-ascii for such
> situations.

(downcase "i") -> "ı" is clearly wrong even in Turkish.  And so is
(downcase "/") -> "-"

I actually have a hard time believing this.

-- 
David Kastrup


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03 15:50             ` David Kastrup
@ 2007-04-03 16:03               ` Andreas Schwab
       [not found]               ` <87k5wt5tnx.fsf@ttnet.net.tr>
  1 sibling, 0 replies; 45+ messages in thread
From: Andreas Schwab @ 2007-04-03 16:03 UTC (permalink / raw)
  To: David Kastrup; +Cc: simon, Eli Zaretskii, Volkan YAZICI, emacs-devel, handa

David Kastrup <dak@gnu.org> writes:

> (downcase "i") -> "ı" is clearly wrong even in Turkish.  And so is
> (downcase "/") -> "-"
>
> I actually have a hard time believing this.

I cannot reproduce that here.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
       [not found]               ` <87k5wt5tnx.fsf@ttnet.net.tr>
@ 2007-04-03 16:21                 ` David Kastrup
       [not found]                 ` <87slbhquuw.fsf@ttnet.net.tr>
  1 sibling, 0 replies; 45+ messages in thread
From: David Kastrup @ 2007-04-03 16:21 UTC (permalink / raw)
  To: Volkan YAZICI; +Cc: simon, Eli Zaretskii, handa, emacs-devel

Volkan YAZICI <yazicivo@ttnet.net.tr> writes:

> David Kastrup <dak@gnu.org> writes:
>>>> Failing Cases:
>>>> (downcase "ISO-8859-1") ==> ıso-8859-1
>>>> (downcase "text/plain") ==> text-plaın
>                    ^
> Sorry for the typo. That's my mistake. I just wanted to give an
> example test case for the possible failure situations.

That covers the slash.  But I also don't believe
(downcase "i") -> "ı".  Are you sure about that part of the second line?

> Here's small phrase from wikipedia:

[...]

Yes, I understood all that.  But please clarify whether you were serious
about (downcase "i"):

>> (downcase "i") -> "ı" is clearly wrong even in Turkish.
>>
>> I actually have a hard time believing this.
>
> Sorry for the mess.

So do you get this or not?  The slash you said was a typo.  What about
"i"?

-- 
David Kastrup


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03 15:29           ` Eli Zaretskii
  2007-04-03 15:50             ` David Kastrup
@ 2007-04-03 16:30             ` Chong Yidong
  2007-04-03 17:57               ` with-case-table / ascii-case-table (was: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]) Reiner Steib
  2007-04-04 14:02               ` [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
  1 sibling, 2 replies; 45+ messages in thread
From: Chong Yidong @ 2007-04-03 16:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: simon, Volkan YAZICI, handa, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> nndoc.el:511:   (intern (downcase (mail-header-strip encoding))))))
>> nndoc.el:905:   subtype (downcase (match-string 2 content-type))
>> rfc2047.el:674: (concat "=?" (downcase (symbol-name mime-charset))
>> 
>> Failing Cases:
>> (downcase "ISO-8859-1") ==> ıso-8859-1
>> (downcase "text/plain") ==> text-plaın
>
> Perhaps we should have something like downcase-ascii for such
> situations.

Yeah.

How about the following approach?  At the beginning of characters.el,
save the standard case table (which AFAICT hasn't been modified at
that point), as a variable ascii-case-table.  Then downcase-ascii can
use it.

*** emacs/lisp/international/characters.el.~1.65.~	2007-03-05 02:00:16.000000000 -0500
--- emacs/lisp/international/characters.el	2007-04-03 12:18:07.000000000 -0400
***************
*** 43,48 ****
--- 43,54 ----
  
  ;;; Predefined categories.
  
+ ;; Save ASCII case table.
+ 
+ (require 'case-table)
+ (defvar ascii-case-table (copy-case-table (standard-case-table))
+   "Case table for the ASCII character set.")
+ 
  ;; For each character set.
  
  (define-category ?a "ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0])")
*** emacs/lisp/subr.el.~1.549.~	2007-03-19 14:37:19.000000000 -0400
--- emacs/lisp/subr.el	2007-04-03 12:26:04.000000000 -0400
***************
*** 2804,2809 ****
--- 2804,2819 ----
        ;; Reconstruct a string from the pieces.
        (setq matches (cons (substring string start l) matches)) ; leftover
        (apply #'concat (nreverse matches)))))
+ 
+ (defun downcase-ascii (string)
+   "Convert ASCII argument to lower case and return that.
+ The argument may be a character or string.  The result has the same type.
+ The argument object is not altered--the value is a copy."
+   (let ((old-case-table (current-case-table)))
+     (unwind-protect
+ 	(progn (set-case-table ascii-case-table)
+ 	       (downcase string))
+       (set-case-table old-case-table))))
  \f
  ;;;; invisibility specs
  


* with-case-table / ascii-case-table (was: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail])
  2007-04-03 16:30             ` Chong Yidong
@ 2007-04-03 17:57               ` Reiner Steib
  2007-04-04 15:40                 ` with-case-table / ascii-case-table Chong Yidong
  2007-04-04 14:02               ` [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
  1 sibling, 1 reply; 45+ messages in thread
From: Reiner Steib @ 2007-04-03 17:57 UTC (permalink / raw)
  To: Chong Yidong
  Cc: simon, handa, ding, emacs-devel, Volkan YAZICI, Eli Zaretskii


On Tue, Apr 03 2007, Chong Yidong wrote:

> How about the following approach?  At the beginning of characters.el,
> save the standard case table (which AFAICT hasn't been modified at
> that point), as a variable ascii-case-table.  Then downcase-ascii can
> use it.
[...]
> + (defun downcase-ascii (string)
> +   "Convert ASCII argument to lower case and return that.
> + The argument may be a character or string.  The result has the same type.
> + The argument object is not altered--the value is a copy."
> +   (let ((old-case-table (current-case-table)))
> +     (unwind-protect
> + 	(progn (set-case-table ascii-case-table)
> + 	       (downcase string))
> +       (set-case-table old-case-table))))

Note that the problem is not limited to the `downcase' function, so
adding `with-case-table' might be a good idea.

Turkish Gnus users might still suffer from the "slow search
operations problem" [1] in `gnus/nnfolder.el'.  Based on your previous
patch, I've made a preliminary patch for `nnfolder.el' [2].  I haven't
tested it yet (I don't use the nnfolder back end).  It still has to be
adjusted to use `ascii-case-table', and it needs some compatibility
code for Emacs 21 (where the downcase problem is not present).

Bye, Reiner.

[1]
,----[ http://thread.gmane.org/gmane.emacs.gnus.general/63925/focus=63979 ]
| From: Reiner Steib
| Subject: Re: Slow operations on buffers of tens of megabytes
| Newsgroups: gmane.emacs.pretest.bugs, gmane.emacs.gnus.general
| Date: 2006-11-13 17:28:58 GMT
| 
| On Thu, Nov 09 2006, Alexandre Oliva wrote:
| 
| > Ultimately, I'm a bit concerned about messing with the case table of
| > an nnfolder buffer for the entire duration of the buffer.  It's hard
| > to tell whether there'd be any less visible fallouts.
| 
| Richard has eliminated the peculiar upcasing dotless-i to I in CVS.
| Does it fix your problem?
| 
| (IIUC, it should fix it _unless_ the user has a Turkish language
| environment.  I.e. Turkish Gnus users might still suffer from this
| problem.)
| 
| ,----
| | 2006-11-12  Richard Stallman  <rms <at> gnu.org>
| | 
| | 	* language/european.el (turkish-case-conversion-enable)
| | 	(turkish-case-conversion-disable): New functions.
| | 	("Turkish" lang env): Use them.
| | 
| | 	* international/characters.el (case table):
| | 	Do nothing special for i and I.
| `----
`----

[2] 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: rs-nnfolder-search-marker-2007-04-01.patch --]
[-- Type: text/x-patch, Size: 6354 bytes --]

--- nnfolder.el	26 Jan 2007 20:41:00 +0100	7.17
+++ nnfolder.el	01 Apr 2007 23:09:26 +0200	
@@ -104,6 +104,39 @@
 (defconst nnfolder-article-marker "X-Gnus-Article-Number: "
   "String used to demarcate what the article number for a message is.")
 
+;; Make sure we're using the standard case table.  In a Turkish locale, the
+;; "i" in "X-Gnus-Article-Number: " makes parsing large nnfolder groups very
+;; slow.
+;; ,----[ http://thread.gmane.org/gmane.emacs.gnus.general/63925/focus=63979 ]
+;; | Subject: Slow operations on buffers of tens of megabytes
+;; | Newsgroups: gmane.emacs.pretest.bugs,gmane.emacs.gnus.general
+;; | Date: 2006-11-13
+;; `----
+(defun nnfolder-search (&optional function string &rest args)
+  "Search for `nnfolder-article-marker' using the standard case table.
+FUNCTION is used for searching.  If STRING is given, it's used
+instead of `nnfolder-article-marker'.  The remaining ARGS are
+passed to the FUNCTION."
+  (let ((old-case-table (current-case-table))
+	point)
+    (unwind-protect
+	(progn
+	  (set-case-table (standard-case-table))
+	  (setq point
+		(apply (cond
+			((fboundp function)
+			 function)
+			(function
+			 'search-backward)
+			(t
+			 'search-forward))
+		       (or string
+			   (concat "\n" nnfolder-article-marker))
+		       args)))
+      (set-case-table old-case-table))
+    ;; Be sure to return what the FUNCTION returned.
+    point))
+
 (defvoo nnfolder-current-group nil)
 (defvoo nnfolder-current-buffer nil)
 (defvoo nnfolder-status-string "")
@@ -198,8 +231,7 @@
 		      ;; as caused by active file bogosity.
 		      (cond
 		       ((bobp))
-		       ((search-backward (concat "\n" nnfolder-article-marker)
-					 nil t)
+		       ((nnfolder-search 'search-backward nil nil t)
 			(goto-char (match-end 0))
 			(setq num (string-to-number
 				   (buffer-substring
@@ -208,8 +240,7 @@
 			(< num article)))
 		      ;; Check that we are before an article with a
 		      ;; higher number.
-		      (search-forward (concat "\n" nnfolder-article-marker)
-				      nil t)
+		      (nnfolder-search 'search-forward nil nil t)
 		      (progn
 			(setq num (string-to-number
 				   (buffer-substring
@@ -284,8 +315,7 @@
 	      (cons nnfolder-current-group article)
 	    (goto-char (point-min))
 	    (cons nnfolder-current-group
-		  (if (search-forward (concat "\n" nnfolder-article-marker)
-				      nil t)
+		  (if (nnfolder-search 'search-forward nil nil t)
 		      (string-to-number (buffer-substring
 				      (point) (point-at-eol)))
 		    -1))))))))
@@ -405,10 +435,9 @@
     (when nnfolder-current-buffer
       (set-buffer nnfolder-current-buffer)
       (goto-char (point-min))
-      (let ((marker (concat "\n" nnfolder-article-marker))
-	    (number "[0-9]+")
+      (let ((number "[0-9]+")
 	    numbers)
-	(while (and (search-forward marker nil t)
+	(while (and (nnfolder-search 'search-forward nil nil t)
 		    (re-search-forward number nil t))
 	  (let ((newnum (string-to-number (match-string 0))))
 	    (if (nnmail-within-headers-p)
@@ -436,8 +465,7 @@
       (while (and maybe-expirable is-old)
 	(goto-char (point-min))
 	(when (and (nnfolder-goto-article (car maybe-expirable))
-		   (search-forward (concat "\n" nnfolder-article-marker)
-				   nil t))
+		   (nnfolder-search 'search-forward nil nil t))
 	  (forward-sexp)
 	  (when (setq is-old
 		      (nnmail-expired-article-p
@@ -480,8 +508,7 @@
 	 (erase-buffer)
 	 (insert-buffer-substring nntp-server-buffer)
 	 (goto-char (point-min))
-	 (while (re-search-forward
-		 (concat "^" nnfolder-article-marker)
+	 (while (nnfolder-search 're-search-forward nil
 		 (save-excursion (and (search-forward "\n\n" nil t) (point)))
 		 t)
 	   (gnus-delete-line))
@@ -523,7 +550,9 @@
 	(if (search-forward "\n\n" nil t)
 	    (forward-line -1)
 	  (goto-char (point-max)))
-	(while (re-search-backward (concat "^" nnfolder-article-marker) nil t)
+	(while (nnfolder-search 're-search-backward
+				(concat "^" nnfolder-article-marker)
+				nil t)
 	  (delete-region (point) (progn (forward-line 1) (point))))
 	(when nnmail-cache-accepted-message-ids
 	  (nnmail-cache-insert (nnmail-fetch-field "message-id") 
@@ -642,13 +671,12 @@
 (defun nnfolder-adjust-min-active (group)
   ;; Find the lowest active article in this group.
   (let* ((active (cadr (assoc group nnfolder-group-alist)))
-	 (marker (concat "\n" nnfolder-article-marker))
 	 (number "[0-9]+")
 	 (activemin (cdr active)))
     (save-excursion
       (set-buffer nnfolder-current-buffer)
       (goto-char (point-min))
-      (while (and (search-forward marker nil t)
+      (while (and (nnfolder-search 'search-forward nil nil t)
 		  (re-search-forward number nil t))
 	(let ((newnum (string-to-number (match-string 0))))
 	  (if (nnmail-within-headers-p)
@@ -788,7 +816,7 @@
       (if (search-forward "\n\n" nil t)
 	  (forward-line -1)
 	(goto-char (point-max)))
-      (while (search-backward (concat "\n" nnfolder-article-marker) nil t)
+      (while (nnfolder-search 'search-backward nil nil t)
 	(delete-region (1+ (point)) (progn (forward-line 2) (point))))
 
       ;; Insert the new newsgroup marker.
@@ -894,7 +922,6 @@
 	(nnmail-activate 'nnfolder)
 	;; Read in the file.
 	(let ((delim "^From ")
-	      (marker (concat "\n" nnfolder-article-marker))
 	      (number "[0-9]+")
 	      (active (or (cadr (assoc group nnfolder-group-alist))
 			  (cons 1 0)))
@@ -928,7 +955,7 @@
 	  (when (or nnfolder-ignore-active-file
 		    novbuf
 		    (< maxid 2))
-	    (while (and (search-forward marker nil t)
+	    (while (and (nnfolder-search 'search-forward nil nil t)
 			(looking-at number))
 	      (setq newnum (string-to-number (match-string 0)))
 	      (when (nnmail-within-headers-p)
@@ -959,7 +986,7 @@
 	  (when (not (or nnfolder-distrust-mbox
 			 (< maxid 2)))
 	    (goto-char (point-max))
-	    (unless (re-search-backward marker nil t)
+	    (unless (nnfolder-search 're-search-backward nil nil t)
 	      (goto-char (point-min)))
 	    ;;(when (nnmail-search-unix-mail-delim)
 	    ;;  (goto-char (point-min)))
@@ -982,7 +1009,7 @@
 				(point)
 			      (point-max)))
 	    (goto-char start)
-	    (when (not (search-forward marker end t))
+	    (when (not (nnfolder-search 'search-forward nil end t))
 	      (narrow-to-region start end)
 	      (nnmail-insert-lines)
 	      (nnfolder-insert-newsgroup-line

[-- Attachment #3: Type: text/plain, Size: 100 bytes --]

-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

[-- Attachment #4: Type: text/plain, Size: 142 bytes --]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
       [not found]                 ` <87slbhquuw.fsf@ttnet.net.tr>
@ 2007-04-03 18:44                   ` David Kastrup
  0 siblings, 0 replies; 45+ messages in thread
From: David Kastrup @ 2007-04-03 18:44 UTC (permalink / raw)
  To: Volkan YAZICI; +Cc: simon, Eli Zaretskii, handa, emacs-devel

Volkan YAZICI <yazicivo@ttnet.net.tr> writes:

> David Kastrup <dak@gnu.org> writes:
>> Yes, I understood all that.  But please clear whether you were
>> serious about (downcase "i"):
>
> ;; These are results from an emacs session started with
> ;; LC_CTYPE=tr_TR.UTF-8 locale and they're correct.
> (downcase "ıI iİ") ==> "ıı ii"
> (upcase   "ıI iİ") ==> "II İİ"
>
> Rule is easy, dotted and dotless i and I are different characters,
> therefore respectively their uppercase and lowercase ones differ too.
>
>  "ı" ==> "I"
>  "i" ==> "İ"
>
> I hope this clarifies the problem.

Yes.  It means that the complete last line of your example was wrong.
Even in Turkish, (downcase "text/plain") will be "text/plain" then.
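
You can check the rule directly with a sketch that builds a Turkish-style case table by hand. It only approximates what a Turkish language environment sets up and is not the actual code in `language/european.el'. The helper name `my-turkish-case-table' is hypothetical. Under such a table only the capital I changes behaviour, which is why "text/plain" survives but "PLAIN" does not.

```elisp
;; Hypothetical helper: an approximation of a Turkish case table.
(defun my-turkish-case-table ()
  "Return a copy of the standard case table with Turkish i/I pairs.
Note: `set-case-syntax-pair' also marks the letters as word
constituents in the standard syntax table."
  (let ((table (copy-case-table (standard-case-table))))
    (set-case-syntax-pair ?I ?ı table)   ; I <-> dotless ı
    (set-case-syntax-pair ?İ ?i table)   ; dotted İ <-> i
    table))

(with-temp-buffer
  (set-case-table (my-turkish-case-table))
  (list (downcase ?I)                              ; the dotless ı character
        (downcase "text/plain")                    ; no capital I: unchanged
        ;; `string-to-multibyte' because the literal "PLAIN" is a unibyte
        ;; string, which cannot hold ı.
        (downcase (string-to-multibyte "PLAIN")))) ; the I becomes dotless ı
```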

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in  smtpmail]
  2007-04-03 13:44         ` Volkan YAZICI
  2007-04-03 15:29           ` Eli Zaretskii
@ 2007-04-03 21:16           ` Davis Herring
  1 sibling, 0 replies; 45+ messages in thread
From: Davis Herring @ 2007-04-03 21:16 UTC (permalink / raw)
  To: Volkan YAZICI; +Cc: Simon Josefsson, Eli Zaretskii, Kenichi Handa, emacs-devel

> Failing Cases:
> (downcase "ISO-8859-1") ==> ıso-8859-1
> (downcase "text/plain") ==> text-plaın

Did '/' really become '-' in the second case?  I can't imagine why it
would, but perhaps some locale thought it was a date-separator or
something.  [Also, I somewhat suspect that my mailer will mangle the
dotless letters; apologies if so.]

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-02 22:52   ` Chong Yidong
  2007-04-02 23:20     ` Volkan YAZICI
  2007-04-03  1:24     ` Kenichi Handa
@ 2007-04-03 21:40     ` Richard Stallman
  2 siblings, 0 replies; 45+ messages in thread
From: Richard Stallman @ 2007-04-03 21:40 UTC (permalink / raw)
  To: Chong Yidong; +Cc: yazicivo, emacs-devel, handa

    I wonder if there's a better way to do this.  Maybe we can define an
    ascii case table that doesn't get overwritten by the locale; then code
    like the above can bind to this case table temporarily (or we can
    define a downcase-ascii function that does such a thing).

That seems like a good approach.  It should be just as simple as the hack
that is proposed, and much faster.  (I do not know whether the speed of that
code matters.)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-03 16:30             ` Chong Yidong
  2007-04-03 17:57               ` with-case-table / ascii-case-table (was: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]) Reiner Steib
@ 2007-04-04 14:02               ` Richard Stallman
  2007-04-04 14:27                 ` Andreas Schwab
  2007-04-04 18:01                 ` Eli Zaretskii
  1 sibling, 2 replies; 45+ messages in thread
From: Richard Stallman @ 2007-04-04 14:02 UTC (permalink / raw)
  To: Chong Yidong; +Cc: simon, eliz, yazicivo, emacs-devel, handa

    + (defun downcase-ascii (string)
    +   "Convert ASCII argument to lower case and return that.

This seems to be a good solution.  The general macro would also be ok.

Using "ASCII" for the name and doc string are somewhat misleading,
since this is not limited to ASCII.  For instance, it works fine for
Latin 1 also.

What this function does is downcase in the most standard way.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-04 14:02               ` [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
@ 2007-04-04 14:27                 ` Andreas Schwab
  2007-04-05 23:11                   ` Richard Stallman
  2007-04-04 18:01                 ` Eli Zaretskii
  1 sibling, 1 reply; 45+ messages in thread
From: Andreas Schwab @ 2007-04-04 14:27 UTC (permalink / raw)
  To: rms; +Cc: simon, handa, Chong Yidong, emacs-devel, yazicivo, eliz

Richard Stallman <rms@gnu.org> writes:

> Using "ASCII" for the name and doc string are somewhat misleading,
> since this is not limited to ASCII.  For instance, it works fine for
> Latin 1 also.

Once you start using non-ASCII letters you have to think about locales.  I
don't think we should encourage sloppiness here.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: with-case-table / ascii-case-table
  2007-04-03 17:57               ` with-case-table / ascii-case-table (was: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]) Reiner Steib
@ 2007-04-04 15:40                 ` Chong Yidong
  0 siblings, 0 replies; 45+ messages in thread
From: Chong Yidong @ 2007-04-04 15:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: simon, Volkan YAZICI, handa, emacs-devel, ding

> Note that the problem is not only the `downcase' function.  So maybe
> adding `with-case-table' would be a good idea.

I have added a new `ascii-case-table' variable in mule.el, and a
`with-case-table' macro to subr.el.

I modified smtpmail to use these, but have not changed Gnus.
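
Here is a sketch of how code like smtpmail can use the two additions. It parses the mechanisms from an EHLO `AUTH' line, downcases them under `ascii-case-table', and then intersects them with the locally supported list. In the original report, that last step is the job of `smtpmail-intersection'. Both function names are hypothetical, and this is not the actual smtpmail change.

```elisp
;; Hypothetical helpers, in the spirit of the smtpmail fix.
(defun my-smtp-auth-mechanisms (auth-line)
  "Return the SASL mechanisms listed in AUTH-LINE as lowercase symbols.
AUTH-LINE is the text after the status code, e.g. \"AUTH PLAIN LOGIN\".
Downcasing uses `ascii-case-table', so the result does not depend on
the user's language environment."
  (with-case-table ascii-case-table
    ;; Drop the leading "auth" keyword, intern the rest.
    (mapcar #'intern (cdr (split-string (downcase auth-line))))))

(defun my-usable-auth-mechanisms (auth-line supported)
  "Return the members of SUPPORTED that AUTH-LINE advertises, in order."
  (let ((offered (my-smtp-auth-mechanisms auth-line)))
    (delq nil (mapcar (lambda (m) (and (memq m offered) m)) supported))))

;; Usage:
(my-usable-auth-mechanisms "AUTH PLAIN LOGIN" '(cram-md5 plain login))
```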



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-04 14:02               ` [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
  2007-04-04 14:27                 ` Andreas Schwab
@ 2007-04-04 18:01                 ` Eli Zaretskii
  2007-04-05 23:11                   ` Richard Stallman
  1 sibling, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-04 18:01 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, handa

> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, simon@josefsson.org, yazicivo@ttnet.net.tr,
> 	handa@m17n.org, emacs-devel@gnu.org
> Date: Wed, 04 Apr 2007 10:02:31 -0400
> 
> Using "ASCII" for the name and doc string are somewhat misleading,
> since this is not limited to ASCII.  For instance, it works fine for
> Latin 1 also.

downcase-A-to-Z ?
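
Taken literally, a `downcase-A-to-Z' would not need a case table at all. The following minimal sketch (hypothetical name) lowercases only the 26 ASCII capitals, which shows the trade-off. It is immune to locale changes, but it leaves Latin-1 capitals such as Ä untouched.

```elisp
;; Hypothetical function: the literal reading of `downcase-A-to-Z'.
(defun my-downcase-a-to-z (string)
  "Return a copy of STRING with only the ASCII letters A-Z lowercased.
No case table is consulted, so the result never depends on the locale."
  (let ((result (copy-sequence string)))
    (dotimes (i (length result))
      (let ((c (aref result i)))
        (when (and (>= c ?A) (<= c ?Z))
          (aset result i (+ c (- ?a ?A))))))
    result))
```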

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-04 14:27                 ` Andreas Schwab
@ 2007-04-05 23:11                   ` Richard Stallman
  2007-04-06  6:15                     ` Kenichi Handa
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2007-04-05 23:11 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: simon, handa, cyd, emacs-devel, yazicivo, eliz

    > Using "ASCII" for the name and doc string are somewhat misleading,
    > since this is not limited to ASCII.  For instance, it works fine for
    > Latin 1 also.

    Once you start using non-ASCII letters you have to think about locales.

Only for Turkish.  I think that is the only language 
which has a reason to alter the case tables.
Does anyone know of any other?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-04 18:01                 ` Eli Zaretskii
@ 2007-04-05 23:11                   ` Richard Stallman
  2007-04-06  8:31                     ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2007-04-05 23:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, emacs-devel

    > Using "ASCII" for the name and doc string are somewhat misleading,
    > since this is not limited to ASCII.  For instance, it works fine for
    > Latin 1 also.

    downcase-A-to-Z ?

My point is it is not limited to the ASCII letters.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-05 23:11                   ` Richard Stallman
@ 2007-04-06  6:15                     ` Kenichi Handa
  2007-04-06  6:49                       ` Werner LEMBERG
  2007-04-06 19:47                       ` Richard Stallman
  0 siblings, 2 replies; 45+ messages in thread
From: Kenichi Handa @ 2007-04-06  6:15 UTC (permalink / raw)
  To: rms; +Cc: simon, schwab, cyd, emacs-devel, yazicivo, eliz

In article <E1HZb6z-0002iE-4Y@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

> Using "ASCII" for the name and doc string are somewhat misleading,
> since this is not limited to ASCII.  For instance, it works fine for
> Latin 1 also.

>     Once you start using non-ASCII letters you have to think about locales.

> Only for Turkish.  I think that is the only language 
> which has a reason to alter the case tables.
> Does anyone know of any other?

Azeri is the same as Turkish in this respect.  For the current
specific case, only "I" is the problem.  But in general
case conversion there are many, many weird problems, and some
of them require locale-dependent processing.  For instance,
in Lithuanian, U+00CC (I WITH GRAVE) must be downcased into the
sequence "U+0069 U+0307 U+0300" (i with dot above and grave).

I'm attaching the file SpecialCasing.txt from the Unicode Character
Database.

By the way...

Werner LEMBERG <wl@gnu.org> writes:

> > Do we need such an advanced downcasing as "MASSE" -> "maße"
> > for German?

> This is not possible without knowing the lowercase version first:

>   Maße  -> MASSE
>   Masse -> MASSE

Ummm.  So, in case-folding search, which is better: "maße"
matches both "MASSE" and "masse"; "maße" matches neither of
them; or "maße" matches only "MASSE"?

---
Kenichi Handa
handa@m17n.org

# SpecialCasing-5.0.0.txt
# Date: 2006-03-03, 08:23:36 GMT [MD]
#
# Unicode Character Database
# Copyright (c) 1991-2006 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see UCD.html
#
# Special Casing Properties
#
# This file is a supplement to the UnicodeData file.
# It contains additional information about the casing of Unicode characters.
# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and does not have locale-specific mappings.)
# For more information, see the discussion of Case Mappings in the Unicode Standard.
#
# All code points not listed in this file that do not have a simple case mappings
# in UnicodeData.txt map to themselves.
# ================================================================================
# Format
# ================================================================================
# The entries in this file are in the following machine-readable format:
#
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
#
# <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more
# than one character, they are separated by spaces. Other than as used to separate 
# elements, spaces are to be ignored.
#
# The <condition_list> is optional. Where present, it consists of one or more locale IDs
# or contexts, separated by spaces. In these conditions:
# - A condition list overrides the normal behavior if all of the listed conditions are true.
# - The context is always the context of the characters in the original string,
#   NOT in the resulting string.
# - Case distinctions in the condition list are not significant.
# - Conditions preceded by "Not_" represent the negation of the condition.
#
# A locale ID is defined by taking any language tag as defined by
# RFC 3066 (or its successor), and replacing '-' by '_'.
#
# A context for a character C is defined by Section 3.13 Default Case 
# Operations, of The Unicode Standard, Version 5.0.
# (This is identical to the context defined by Unicode 4.1.0,
#  as specified in http://www.unicode.org/versions/Unicode4.1.0/)
#
# Parsers of this file must be prepared to deal with future additions to this format:
#  * Additional contexts
#  * Additional fields
# ================================================================================

# ================================================================================
# Unconditional mappings
# ================================================================================

# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Ligatures

FB00; FB00; 0046 0066; 0046 0046; # LATIN SMALL LIGATURE FF
FB01; FB01; 0046 0069; 0046 0049; # LATIN SMALL LIGATURE FI
FB02; FB02; 0046 006C; 0046 004C; # LATIN SMALL LIGATURE FL
FB03; FB03; 0046 0066 0069; 0046 0046 0049; # LATIN SMALL LIGATURE FFI
FB04; FB04; 0046 0066 006C; 0046 0046 004C; # LATIN SMALL LIGATURE FFL
FB05; FB05; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE LONG S T
FB06; FB06; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE ST

0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
FB13; FB13; 0544 0576; 0544 0546; # ARMENIAN SMALL LIGATURE MEN NOW
FB14; FB14; 0544 0565; 0544 0535; # ARMENIAN SMALL LIGATURE MEN ECH
FB15; FB15; 0544 056B; 0544 053B; # ARMENIAN SMALL LIGATURE MEN INI
FB16; FB16; 054E 0576; 054E 0546; # ARMENIAN SMALL LIGATURE VEW NOW
FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH

# No corresponding uppercase precomposed character

0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
0390; 0390; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
03B0; 03B0; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
01F0; 01F0; 004A 030C; 004A 030C; # LATIN SMALL LETTER J WITH CARON
1E96; 1E96; 0048 0331; 0048 0331; # LATIN SMALL LETTER H WITH LINE BELOW
1E97; 1E97; 0054 0308; 0054 0308; # LATIN SMALL LETTER T WITH DIAERESIS
1E98; 1E98; 0057 030A; 0057 030A; # LATIN SMALL LETTER W WITH RING ABOVE
1E99; 1E99; 0059 030A; 0059 030A; # LATIN SMALL LETTER Y WITH RING ABOVE
1E9A; 1E9A; 0041 02BE; 0041 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF RING
1F50; 1F50; 03A5 0313; 03A5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI
1F52; 1F52; 03A5 0313 0300; 03A5 0313 0300; # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA
1F54; 1F54; 03A5 0313 0301; 03A5 0313 0301; # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
1F56; 1F56; 03A5 0313 0342; 03A5 0313 0342; # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI
1FB6; 1FB6; 0391 0342; 0391 0342; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI
1FC6; 1FC6; 0397 0342; 0397 0342; # GREEK SMALL LETTER ETA WITH PERISPOMENI
1FD2; 1FD2; 0399 0308 0300; 0399 0308 0300; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA
1FD3; 1FD3; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
1FD6; 1FD6; 0399 0342; 0399 0342; # GREEK SMALL LETTER IOTA WITH PERISPOMENI
1FD7; 1FD7; 0399 0308 0342; 0399 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
1FE2; 1FE2; 03A5 0308 0300; 03A5 0308 0300; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA
1FE3; 1FE3; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
1FE4; 1FE4; 03A1 0313; 03A1 0313; # GREEK SMALL LETTER RHO WITH PSILI
1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI
1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI
1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI

# IMPORTANT-when capitalizing iota-subscript (0345)
#  It MUST be in normalized form--moved to the end of any sequence of combining marks.
#  This is because logically it represents a following base character!
#  E.g. <iota_subscript> (<Mn> | <Mc> | <Me>)+ => (<Mn> | <Mc> | <Me>)+ <iota_subscript>
# It should never be the first character in a word, so in titlecasing it can be left as is.

# The following cases are already in the UnicodeData file, so are only commented here.

# 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI

# All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript)
# have special uppercases.
# Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase!

1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1F83; 1F83; 1F8B; 1F0B 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1F84; 1F84; 1F8C; 1F0C 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1F85; 1F85; 1F8D; 1F0D 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI
1F86; 1F86; 1F8E; 1F0E 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
1F87; 1F87; 1F8F; 1F0F 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
1F88; 1F80; 1F88; 1F08 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
1F89; 1F81; 1F89; 1F09 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
1F8A; 1F82; 1F8A; 1F0A 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F8B; 1F83; 1F8B; 1F0B 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F8C; 1F84; 1F8C; 1F0C 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F8D; 1F85; 1F8D; 1F0D 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F8E; 1F86; 1F8E; 1F0E 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F8F; 1F87; 1F8F; 1F0F 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1F90; 1F90; 1F98; 1F28 0399; # GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI
1F91; 1F91; 1F99; 1F29 0399; # GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI
1F92; 1F92; 1F9A; 1F2A 0399; # GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1F93; 1F93; 1F9B; 1F2B 0399; # GREEK SMALL LETTER ETA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1F94; 1F94; 1F9C; 1F2C 0399; # GREEK SMALL LETTER ETA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1F95; 1F95; 1F9D; 1F2D 0399; # GREEK SMALL LETTER ETA WITH DASIA AND OXIA AND YPOGEGRAMMENI
1F96; 1F96; 1F9E; 1F2E 0399; # GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
1F97; 1F97; 1F9F; 1F2F 0399; # GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
1F98; 1F90; 1F98; 1F28 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI
1F99; 1F91; 1F99; 1F29 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI
1F9A; 1F92; 1F9A; 1F2A 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F9B; 1F93; 1F9B; 1F2B 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F9C; 1F94; 1F9C; 1F2C 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F9D; 1F95; 1F9D; 1F2D 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F9E; 1F96; 1F9E; 1F2E 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F9F; 1F97; 1F9F; 1F2F 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1FA0; 1FA0; 1FA8; 1F68 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI
1FA1; 1FA1; 1FA9; 1F69 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND YPOGEGRAMMENI
1FA2; 1FA2; 1FAA; 1F6A 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1FA3; 1FA3; 1FAB; 1F6B 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1FA4; 1FA4; 1FAC; 1F6C 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1FA5; 1FA5; 1FAD; 1F6D 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA AND YPOGEGRAMMENI
1FA6; 1FA6; 1FAE; 1F6E 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
1FA7; 1FA7; 1FAF; 1F6F 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
1FA8; 1FA0; 1FA8; 1F68 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI
1FA9; 1FA1; 1FA9; 1F69 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI
1FAA; 1FA2; 1FAA; 1F6A 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1FAB; 1FA3; 1FAB; 1F6B 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1FAC; 1FA4; 1FAC; 1F6C 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1FAD; 1FA5; 1FAD; 1F6D 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1FAE; 1FA6; 1FAE; 1F6E 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1FAF; 1FA7; 1FAF; 1F6F 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1FB3; 1FB3; 1FBC; 0391 0399; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
1FBC; 1FB3; 1FBC; 0391 0399; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FC3; 1FC3; 1FCC; 0397 0399; # GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI
1FCC; 1FC3; 1FCC; 0397 0399; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FF3; 1FF3; 1FFC; 03A9 0399; # GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI
1FFC; 1FF3; 1FFC; 03A9 0399; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI

# Some characters with YPOGEGRAMMENI also have no corresponding titlecases

1FB2; 1FB2; 1FBA 0345; 1FBA 0399; # GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI
1FB4; 1FB4; 0386 0345; 0386 0399; # GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI
1FC2; 1FC2; 1FCA 0345; 1FCA 0399; # GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI
1FC4; 1FC4; 0389 0345; 0389 0399; # GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI
1FF2; 1FF2; 1FFA 0345; 1FFA 0399; # GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI
1FF4; 1FF4; 038F 0345; 038F 0399; # GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI

1FB7; 1FB7; 0391 0342 0345; 0391 0342 0399; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI
1FC7; 1FC7; 0397 0342 0345; 0397 0342 0399; # GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI
1FF7; 1FF7; 03A9 0342 0345; 03A9 0342 0399; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI

# ================================================================================
# Conditional mappings
# ================================================================================

# Special case for final form of sigma

03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA

# Note: the following cases for non-final are already in the UnicodeData file.

# 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA
# 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA
# 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA

# Note: the following cases are not included, since they would case-fold in lowercasing

# 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA
# 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA

# ================================================================================
# Locale-sensitive mappings
# ================================================================================

# Lithuanian

# Lithuanian retains the dot in a lowercase i when followed by accents.

# Remove DOT ABOVE after "i" with upper or titlecase

0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE

# Introduce an explicit dot above when lowercasing capital I's and J's
# whenever there are more accents above.
# (of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)

0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE

# ================================================================================

# Turkish and Azeri

# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
# This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

# When lowercasing, unless an I is before a dot_above, it turns into a dotless i.

0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I

# When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

# Note: the following case is already in the UnicodeData file.

# 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I

# EOF
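The Turkish and Azeri rules above are what break smtpmail under LC_CTYPE=tr_TR: with them, "PLAIN" lowercases to "plaın" (dotless i), which no longer equals the ASCII mechanism name. Below is a minimal Python sketch of just the four tr/az rules. It assumes that U+0307 immediately follows the I. The real After_I and Before_Dot conditions also allow certain intervening combining marks. `tr_lower` and `tr_upper` are hypothetical names, not Emacs functions.

```python
DOT_ABOVE = "\u0307"          # COMBINING DOT ABOVE
CAPITAL_I_DOT = "\u0130"      # LATIN CAPITAL LETTER I WITH DOT ABOVE
SMALL_DOTLESS_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def tr_lower(s: str) -> str:
    """Lowercase s with the tr/az rules from SpecialCasing.txt.

    Simplification: I + U+0307 is only recognized when the dot follows
    the I immediately.
    """
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == CAPITAL_I_DOT:
            out.append("i")                    # 0130 -> 0069
        elif ch == "I":
            if i + 1 < len(s) and s[i + 1] == DOT_ABOVE:
                out.append("i")                # I + dot above -> i
                i += 1                         # drop the dot (After_I rule)
            else:
                out.append(SMALL_DOTLESS_I)    # Not_Before_Dot: I -> dotless i
        else:
            out.append(ch.lower())
        i += 1
    return "".join(out)


def tr_upper(s: str) -> str:
    """Uppercase s with the tr/az rules: i becomes dotted capital I."""
    return "".join(CAPITAL_I_DOT if ch == "i" else ch.upper() for ch in s)


print(tr_lower("PLAIN") == "plain")   # the comparison smtpmail effectively makes
```

Under these rules tr_lower("LOGIN") gives "log\u0131n", so intersecting the downcased server mechanisms with an ASCII list of supported mechanisms comes out empty, as reported.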

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  6:15                     ` Kenichi Handa
@ 2007-04-06  6:49                       ` Werner LEMBERG
  2007-04-06  7:15                         ` Kenichi Handa
  2007-04-06 19:47                       ` Richard Stallman
  1 sibling, 1 reply; 45+ messages in thread
From: Werner LEMBERG @ 2007-04-06  6:49 UTC (permalink / raw)
  To: handa; +Cc: simon, rms, schwab, cyd, emacs-devel, yazicivo, eliz


> > > Do we need such an advanced downcasing as "MASSE" -> "maße"
> > > for German?
> 
> > This is not possible without knowing the lowercase version first:
> 
> >   Maße  -> MASSE
> >   Masse -> MASSE
> 
> Ummm.  So, in casefolding search, which is good; "maße"
> matches with both "MASSE" and "masse", "maße" doesn't match
> with them, or "maße" matches only with "MASSE".

Sorry, I don't understand this sentence.  Please reformulate.

Ideally, searching `maße' should not match `masse', but it should
match `MASSE'.  Note that, to make a distinction in the uppercased
version between Maße (measures, metrics) and Masse (mass, matter),
some people also write `MASZE' for the uppercased version of Maße.
This use of SZ is old-fashioned and not `official' any more.


    Werner


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  6:49                       ` Werner LEMBERG
@ 2007-04-06  7:15                         ` Kenichi Handa
  2007-04-06  7:30                           ` Werner LEMBERG
  2007-04-06  8:56                           ` Eli Zaretskii
  0 siblings, 2 replies; 45+ messages in thread
From: Kenichi Handa @ 2007-04-06  7:15 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: simon, rms, schwab, cyd, emacs-devel, yazicivo, eliz

In article <20070406.084909.48807006.wl@gnu.org>, Werner LEMBERG <wl@gnu.org> writes:

> > Ummm.  So, in casefolding search, which is good; "maße"
> > matches with both "MASSE" and "masse", "maße" doesn't match
> > with them, or "maße" matches only with "MASSE".

> Sorry, I don't understand this sentence.  Please reformulate.

> Ideally, searching `maße' should not match `masse', but it should
> match `MASSE'.

That's what I wanted to know, but it seems very difficult to
implement.  Provided that we give up that ideal behaviour,
which is better: searching `maße' matches both "masse" and
"MASSE", or searching `maße' doesn't match any of "masse"
and "MASSE"?

> Note that, to make a distinction in the uppercased
> version between Maße (measures, metrics) and Masse (mass, matter),
> some people also write `MASZE' for the uppercased version of Maße.
> This use of SZ is old-fashioned and not `official' any more.

I hope people don't complain even if we don't support it.  :-p

---
Kenichi Handa
handa@m17n.org


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  7:15                         ` Kenichi Handa
@ 2007-04-06  7:30                           ` Werner LEMBERG
  2007-04-06  8:56                           ` Eli Zaretskii
  1 sibling, 0 replies; 45+ messages in thread
From: Werner LEMBERG @ 2007-04-06  7:30 UTC (permalink / raw)
  To: handa; +Cc: simon, rms, schwab, cyd, emacs-devel, yazicivo, eliz

> > Ideally, searching `maße' should not match `masse', but it should
> > match `MASSE'.
> 
> That's what I wanted to know, but it seems very difficult to
> implement.  Provided that we give up that ideal behaviour,
> which is better: searching `maße' matches both "masse" and
> "MASSE", or searching `maße' doesn't match any of "masse"
> and "MASSE"?

The former.  I think it's not worth the trouble to do anything more
complicated here, given that it is quite rare to find both `Masse' and
`Maße' at the same time -- and I'm quite sure that many native German
speakers mix up those two words anyway :-)


    Werner
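Werner's preferred fallback, where searching `maße' matches both "masse" and "MASSE", is what full Unicode case folding gives, because CaseFolding.txt folds ß (U+00DF) to "ss". Below is a minimal Python sketch using `str.casefold()`, which implements full case folding. `fold_search` is a hypothetical helper, not an Emacs function.

```python
def fold_search(needle: str, haystack: str) -> int:
    """Return the index of needle in haystack under full case folding, or -1.

    The index refers to the *folded* haystack, which can be longer than the
    original when a character such as U+00DF folds to two letters.
    """
    return haystack.casefold().find(needle.casefold())


for text in ["Masse", "MASSE", "Maße", "Mast"]:
    print(text, fold_search("maße", text) >= 0)
```

Folding both sides makes `Maße' and `Masse' indistinguishable. That is the trade-off Werner accepts. A real editor would also have to map folded offsets back to buffer positions.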


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-05 23:11                   ` Richard Stallman
@ 2007-04-06  8:31                     ` Eli Zaretskii
  2007-04-06 19:47                       ` Richard Stallman
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-06  8:31 UTC (permalink / raw)
  To: rms; +Cc: handa, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: emacs-devel@gnu.org, handa@m17n.org
> Date: Thu, 05 Apr 2007 19:11:46 -0400
> 
>     > Using "ASCII" for the name and doc string are somewhat misleading,
>     > since this is not limited to ASCII.  For instance, it works fine for
>     > Latin 1 also.
> 
>     downcase-A-to-Z ?
> 
> My point is it is not limited to the ASCII letters.

??? I'm probably missing something: what other characters would be
changed by using standard-case-table?  AFAIK, it is set up to change
case only for letters from A to Z; see casetab.c:init_casetab_once.
The letters A to Z appear in many Latin-x character sets, so A to Z
does not imply US ASCII.
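The case table Eli describes, one that changes case only for the letters A to Z, is what smtpmail needs when it compares protocol keywords. The result is then the same whatever the locale. Below is a minimal Python sketch of that idea. It is not the Emacs `ascii-case-table` itself. `ascii_downcase` and the `supported` set are hypothetical.

```python
import string

# Translation table that maps only the 26 letters A-Z to a-z.
_ASCII_DOWN = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_downcase(s: str) -> str:
    """Downcase A-Z only; every other character is left untouched."""
    return s.translate(_ASCII_DOWN)


supported = {"cram-md5", "plain", "login"}          # hypothetical list
offered = [ascii_downcase(m) for m in "PLAIN LOGIN CRAM-MD5".split()]
print(sorted(supported.intersection(offered)))
```

Because U+0130 and other non-ASCII letters are not in the table, they pass through unchanged instead of being folded by language-specific rules.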


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  7:15                         ` Kenichi Handa
  2007-04-06  7:30                           ` Werner LEMBERG
@ 2007-04-06  8:56                           ` Eli Zaretskii
  2007-04-06  9:24                             ` Werner LEMBERG
  1 sibling, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-06  8:56 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: simon, yazicivo, rms, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> CC: rms@gnu.org, simon@josefsson.org, schwab@suse.de, cyd@stupidchicken.com,
>         emacs-devel@gnu.org, yazicivo@ttnet.net.tr, eliz@gnu.org
> Date: Fri, 06 Apr 2007 16:15:50 +0900
> 
> > Ideally, searching `maße' should not match `masse', but it should
> > match `MASSE'.
> 
> That's what I wanted to know, but it seems very difficult to
> implement.

Are we talking about Emacs 22 or Emacs 23?  If the former, I don't
think we should do anything with such complicated case equivalences,
at least not now.

If you are talking about Emacs 23, I think we should first try to
design its search routines to cater to all the complications described
by the Unicode standard, no matter how difficult that is.  Only if
full compliance turns out to be unbearably hard and slow, should we
consider less strict adherence.  That's because these case equivalence
complications are just a tip of the iceberg, as far as Unicode goes,
and if we give up so early, we will never have Emacs that is compliant
with Unicode.  AFAIR, the Unicode standard has some practical advice
and even sample code that shows how to implement case-insensitive
search, so it's not like we are talking about rocket science.
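One piece of the guidance Eli mentions is the Unicode standard's definition of a canonical caseless match (Section 3.13). Two strings match when NFD(toCasefold(NFD(X))) is identical for both. Below is a minimal Python sketch using `unicodedata.normalize` and `str.casefold()`. It compares whole strings and ignores the optional Turkic folding. The function names are hypothetical.

```python
import unicodedata


def canonical_caseless_key(s: str) -> str:
    """NFD(casefold(NFD(s))), the Unicode canonical caseless form."""
    nfd = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", nfd.casefold())


def canonical_caseless_equal(a: str, b: str) -> bool:
    """True if a and b are canonical caseless matches."""
    return canonical_caseless_key(a) == canonical_caseless_key(b)


print(canonical_caseless_equal("STRASSE", "straße"))
```

With this definition, a precomposed "Ångström" also matches the decomposed "A" + U+030A spelling. Substring search over a buffer would still need position mapping on top of it.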


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  8:56                           ` Eli Zaretskii
@ 2007-04-06  9:24                             ` Werner LEMBERG
  2007-04-06 13:54                               ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Werner LEMBERG @ 2007-04-06  9:24 UTC (permalink / raw)
  To: eliz; +Cc: simon, yazicivo, emacs-devel, rms, handa


> > > Ideally, searching `maße' should not match `masse', but it should
> > > match `MASSE'.
> 
> Are we talking about Emacs 22 or Emacs 23?  If the former, I don't
> think we should do anything with such complicated case equivalences,
> at least not now.

Yep.

> If you are talking about Emacs 23, I think we should first try to
> design its search routines to cater to all the complications described
> by the Unicode standard, no matter how difficult that is.  Only if
> full compliance turns out to be unbearably hard and slow, should we
> consider less strict adherence.

Hmm.  The `Maße' vs. `Masse' issue is very special.  I doubt that this
is covered by any Unicode Technical Report.  Additionally, it's a
matter of taste.


    Werner


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  9:24                             ` Werner LEMBERG
@ 2007-04-06 13:54                               ` Eli Zaretskii
  2007-04-07  8:01                                 ` Werner LEMBERG
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-06 13:54 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: simon, yazicivo, emacs-devel, rms, handa

> Date: Fri, 06 Apr 2007 11:24:05 +0200 (CEST)
> Cc: handa@m17n.org, rms@gnu.org, simon@josefsson.org, emacs-devel@gnu.org,
>  yazicivo@ttnet.net.tr
> From: Werner LEMBERG <wl@gnu.org>
> 
> The `Maße' vs. `Masse' issue is very special.

Is it?  Isn't it true that ß should match SS, but not ss, as a general
rule in German?
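The rule Eli states can be sketched as a small matcher in which ß in the search string may consume either a literal ß or uppercase SS in the text, but never lowercase ss. Every other character is compared one to one after simple lowercasing. Below is a minimal Python sketch under those assumptions. It handles only the direction Eli describes, from a pattern containing ß to the text. `match_at` and `search` are hypothetical names.

```python
def match_at(pattern: str, text: str, pos: int) -> bool:
    """True if pattern matches text starting at pos.

    'ß' in the pattern matches 'ß' or uppercase 'SS' (but not 'ss');
    other characters are compared case-insensitively, one to one.
    """
    j = pos
    for ch in pattern:
        if ch == "ß":
            if text.startswith("ß", j):
                j += 1
            elif text.startswith("SS", j):
                j += 2
            else:
                return False
        elif j < len(text) and text[j].lower() == ch.lower():
            j += 1
        else:
            return False
    return True


def search(pattern: str, text: str) -> int:
    """Index of the first match of pattern in text, or -1."""
    for pos in range(len(text)):
        if match_at(pattern, text, pos):
            return pos
    return -1


print(search("maße", "MASSE"), search("maße", "masse"))
```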


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  8:31                     ` Eli Zaretskii
@ 2007-04-06 19:47                       ` Richard Stallman
  2007-04-07  9:18                         ` Eli Zaretskii
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2007-04-06 19:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

    ??? I'm probably missing something: what other characters would be
    changed by using standard-case-table?  AFAIK, it is set up to change
    case only for letters from A to Z; see casetab.c:init_casetab_once.
    The letters A to Z appear in many Latin-x character sets, so A to Z
    does not imply US ASCII.

During normal execution, standard-case-table handles all the alphabets
that have a case distinction.  If ascii-case-table is a copy made
after standard-case-table is initialized, it will handle them all too.


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06  6:15                     ` Kenichi Handa
  2007-04-06  6:49                       ` Werner LEMBERG
@ 2007-04-06 19:47                       ` Richard Stallman
  2007-04-07  7:30                         ` martin rudalics
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2007-04-06 19:47 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: simon, schwab, cyd, emacs-devel, yazicivo, eliz

    Azeri is the same as Turkish as for this.  For the current
    specific case, only "I" is the problem.  But, in general
    case changing, there are many many weird problems, and some
    of them require locale-dependent processing.  For instance,
    U+00CC (I WITH GRAVE) must be downcased into "U+0069 U+0307
    U+0300" sequence (i with dot-above and grave) in
    Lithuanian.

I didn't know about that one.  This means that there may be various
languages that the default case tables don't handle, and that need to
change it just as Turkish changes it.

At present, the case table feature of Emacs is incapable of handling
Lithuanian.  It can't convert one character into multiple characters.
It can't handle German quite right either.

After the release, it would be good to design a new case conversion
system which can handle the cases where one letter converts to more
than one.  It would be nice if it could even handle German.
This could be done thru the spell checker.
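A case table that maps one character to one character cannot express these rules. Below is a minimal sketch of the alternative rms describes, in which each entry maps a character to a string. It uses two mappings quoted earlier in this thread: Lithuanian U+00CC becomes U+0069 U+0307 U+0300 when downcasing, and German ß becomes SS when upcasing. Plain Python dictionaries stand in for case tables. The table and function names are hypothetical, not an Emacs API.

```python
# Hypothetical one-to-many case tables: each value is a string, not a char.
LT_DOWNCASE = {
    "\u00cc": "i\u0307\u0300",   # I WITH GRAVE -> i, dot above, grave
    "\u00cd": "i\u0307\u0301",   # I WITH ACUTE -> i, dot above, acute
    "\u0128": "i\u0307\u0303",   # I WITH TILDE -> i, dot above, tilde
}
DE_UPCASE = {"ß": "SS"}


def convert(s: str, table: dict[str, str], default) -> str:
    """Map each character through table, falling back to default(ch)."""
    return "".join(table.get(ch, default(ch)) for ch in s)


print(convert("Straße", DE_UPCASE, str.upper))
```

The result can be longer than the input, so this kind of conversion cannot be done in place in a buffer. This is the same limitation the Aspell manual notes for ß.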


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06 19:47                       ` Richard Stallman
@ 2007-04-07  7:30                         ` martin rudalics
  2007-04-07 17:31                           ` Richard Stallman
  0 siblings, 1 reply; 45+ messages in thread
From: martin rudalics @ 2007-04-07  7:30 UTC (permalink / raw)
  To: rms; +Cc: simon, Kenichi Handa, schwab, cyd, emacs-devel, yazicivo, eliz

> After the release, it would be good to design a new case conversion
> system which can handle the cases where one letter converts to more
> than one.  It would be nice if it could even handle German.
> This could be done thru the spell checker.

Compare the following excerpt from the Aspell manual (appendix C.4):

The German Sharp S or Eszett does not have an uppercase equivalent.
Instead when `ß' is converted to `SS'.  The conversion of `ß' to `SS'
requires a special rule, and increases the length of a word, thus
disallowing inplace case conversion.  Furthermore, my general rule of
converting all words to lowercase before looking them up in the
dictionary won't work because the conversion of `SS' to lowercase is
ambiguous; it can be `ss' or `ß'.  I do plan on dealing with this
eventually.


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06 13:54                               ` Eli Zaretskii
@ 2007-04-07  8:01                                 ` Werner LEMBERG
  0 siblings, 0 replies; 45+ messages in thread
From: Werner LEMBERG @ 2007-04-07  8:01 UTC (permalink / raw)
  To: eliz; +Cc: simon, yazicivo, emacs-devel, rms, handa


> > The `Maße' vs. `Masse' issue is very special.
> 
> Is it?  Isn't it true that ß should match SS, but not ss, as a
> general rule in German?

In theory, yes, but it is complicated by a number of facts.

  . In Switzerland, people no longer use `ß'.  Everything is written
    with `ss' (both uppercase and lowercase).

  . In official documents, `ß' is used even in uppercased situations.
    Assume a passport, and the name of the person is `Dreßen'.  Then
    the uppercased version written in the passport is `DREßEN'.  This
    is the only allowed usage of `ß' with uppercase letters, AFAIK.
    However...

  . Many people think that, say, `STRAßE' is the right way to
    write `Straße' uppercased.  Perhaps the problem is also related to
    simplistic computer programs which aren't able to uppercase `ß'
    correctly and leave it as-is.

  . To avoid ambiguities, it was common usage to uppercase `ß' as
    `SZ': STRASZE, PREUSZEN.  However, this is old-fashioned today.

There is a quite long `article of excellency' in the German Wikipedia
which covers all aspects: http://de.wikipedia.org/wiki/%C3%9F.


    Werner
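The conventions Werner lists mean that a word containing ß has no single uppercase form. Below is a minimal Python sketch that produces the three spellings he mentions: SS, the old-fashioned SZ, and ß kept as-is. The function name and its `style` parameter are illustrative only.

```python
def upcase_german(word: str, style: str = "ss") -> str:
    """Uppercase word, rendering ß according to style.

    style: "ss"   -> STRASSE (usual modern spelling)
           "sz"   -> STRASZE (old-fashioned, avoids ambiguity)
           "keep" -> DREßEN  (ß left as-is, as for names in passports)
    """
    replacement = {"ss": "SS", "sz": "SZ", "keep": "ß"}[style]
    return "".join(replacement if ch == "ß" else ch.upper() for ch in word)


print(upcase_german("Straße"), upcase_german("Preußen", "sz"))
```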


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-06 19:47                       ` Richard Stallman
@ 2007-04-07  9:18                         ` Eli Zaretskii
  2007-04-07 15:03                           ` Chong Yidong
  2007-04-07 17:31                           ` Richard Stallman
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-07  9:18 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, handa

> From: Richard Stallman <rms@gnu.org>
> CC: handa@m17n.org, emacs-devel@gnu.org
> Date: Fri, 06 Apr 2007 15:47:35 -0400
> 
>     ??? I'm probably missing something: what other characters would be
>     changed by using standard-case-table?  AFAIK, it is set up to change
>     case only for letters from A to Z; see casetab.c:init_casetab_once.
>     The letters A to Z appear in many Latin-x character sets, so A to Z
>     does not imply US ASCII.
> 
> During normal execution, standard-case-table handles all the alphabets
> that have a case distinction.  If ascii-case-table is a copy made
> after standard-case-table is initialized, it will handle them all too.

Sorry, I still don't understand.  casetab.c explicitly sets up
standard-case-table to convert only A-Z.  Could you please point out a
character outside this range whose case would be changed by using
standard-case-table, e.g. in the Latin-1 alphabet?


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-07  9:18                         ` Eli Zaretskii
@ 2007-04-07 15:03                           ` Chong Yidong
  2007-04-07 17:36                             ` Eli Zaretskii
  2007-04-07 17:31                           ` Richard Stallman
  1 sibling, 1 reply; 45+ messages in thread
From: Chong Yidong @ 2007-04-07 15:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, rms, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Richard Stallman <rms@gnu.org>
>> CC: handa@m17n.org, emacs-devel@gnu.org
>> Date: Fri, 06 Apr 2007 15:47:35 -0400
>> 
>>     ??? I'm probably missing something: what other characters would be
>>     changed by using standard-case-table?  AFAIK, it is set up to change
>>     case only for letters from A to Z; see casetab.c:init_casetab_once.
>>     The letters A to Z appear in many Latin-x character sets, so A to Z
>>     does not imply US ASCII.
>> 
>> During normal execution, standard-case-table handles all the alphabets
>> that have a case distinction.  If ascii-case-table is a copy made
>> after standard-case-table is initialized, it will handle them all too.
>
> Sorry, I still don't understand.  casetab.c explicitly sets up
> standard-case-table to convert only A-Z.  Could you please point out a
> character outside this range whose case would be changed by using
> standard-case-table, e.g. in the Latin-1 alphabet?

To be precise, characters.el later adds Latin and other characters to
this standard case table.  The Lisp variable ascii-case-table makes a
copy of the standard case table before all this work is done.

(BTW, Emacs' default case table is internally named Vascii_case_table
in casetab.c.  This variable name is misleading because this case
table gets updated with non-ascii information with impunity.  So we
might want to rename this C variable to something more appropriate
(after the release).)
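The disagreement comes down to when the copy is taken. Below is a minimal Python sketch in which dictionaries stand in for case tables. The names mirror the Emacs variables, but this is not Emacs code. It shows that a copy made before the non-ASCII entries are added stays limited to A to Z.

```python
import string

# Stand-in for casetab.c: the initial standard table covers only A-Z.
standard_case_table = dict(zip(string.ascii_uppercase, string.ascii_lowercase))

# Stand-in for ascii-case-table: a copy taken at this point.
ascii_case_table = dict(standard_case_table)

# Stand-in for characters.el: non-ASCII pairs added later (a few Latin-1 ones).
standard_case_table.update({"\u00c0": "\u00e0", "\u00c9": "\u00e9", "\u00d6": "\u00f6"})


def downcase(s: str, table: dict) -> str:
    """Downcase s with table; characters without an entry are unchanged."""
    return "".join(table.get(ch, ch) for ch in s)


print(downcase("\u00c9T\u00c9", standard_case_table), downcase("\u00c9T\u00c9", ascii_case_table))
```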


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-07  9:18                         ` Eli Zaretskii
  2007-04-07 15:03                           ` Chong Yidong
@ 2007-04-07 17:31                           ` Richard Stallman
  2007-04-07 17:47                             ` Eli Zaretskii
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Stallman @ 2007-04-07 17:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

    Sorry, I still don't understand.  casetab.c explicitly sets up
    standard-case-table to convert only A-Z.

Yes, but later on when Mule is loaded it changes the standard-case-table
to handle other alphabets.  standard-case-table is the one that is used
in new buffers.

I could not tell from the diffs when the code copies
standard-case-table to make ascii-case-table.  If that is done before
the Mule code changes standard-case-table, then ascii-case-table only affects
ASCII characters.


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-07  7:30                         ` martin rudalics
@ 2007-04-07 17:31                           ` Richard Stallman
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Stallman @ 2007-04-07 17:31 UTC (permalink / raw)
  To: martin rudalics
  Cc: simon, handa, schwab, cyd, emacs-devel, yazicivo, eliz, kevin

    > After the release, it would be good to design a new case conversion
    > system which can handle the cases where one letter converts to more
    > than one.  It would be nice if it could even handle German.
    > This could be done thru the spell checker.

    Compare the following excerpt from the Aspell manual (appendix C.4):

    The German Sharp S or Eszett does not have an uppercase equivalent.
    Instead when `ß' is converted to `SS'.  The conversion of `ß' to `SS'
    requires a special rule, and increases the length of a word, thus
    disallowing inplace case conversion.  Furthermore, my general rule of
    converting all words to lowercase before looking them up in the
    dictionary won't work because the conversion of `SS' to lowercase is
    ambiguous; it can be `ss' or `ß'.  I do plan on dealing with this
    eventually.

That is not a problem for the method I have in mind.  Emacs can
generate all the possible downcasings of a word containing SS,
then send each one to Aspell to see if it is the right one.
Aspell can handle lower-case words, so this will work.

Meanwhile, this suggests a way that Aspell could handle the upper case
German words: generate the various possible downcasings of it.  (If
there are N occurrences of SS, there will be 2**N possible
downcasings.)  Then see if any of them is in the dictionary.  If so,
the upper case word is valid.  Otherwise, construct the union of the
suggestion-lists from the various possible downcasings.

I cc'd the Aspell maintainer so that he will see this idea.
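rms's suggestion can be sketched directly. Each SS in an uppercase word may come from either `ss' or `ß', so a word with N occurrences has 2**N candidate downcasings, and the word is accepted if any candidate is in the dictionary. Below is a minimal Python sketch. The small `words` set stands in for a spell-checker dictionary and is not the Aspell API. Only non-overlapping SS pairs, scanned from the left, are considered.

```python
from itertools import product


def ss_downcasings(word: str) -> list[str]:
    """All downcasings of an uppercase word, reading each SS as ss or ß."""
    parts = word.lower().split("ss")      # N occurrences -> N + 1 parts
    candidates = []
    for choice in product(["ss", "ß"], repeat=len(parts) - 1):
        out = parts[0]
        for sep, part in zip(choice, parts[1:]):
            out += sep + part
        candidates.append(out)
    return candidates


def check_upper(word: str, dictionary: set[str]) -> bool:
    """Accept word if any of its candidate downcasings is in dictionary."""
    return any(c in dictionary for c in ss_downcasings(word))


words = {"masse", "maße", "straße"}       # stand-in dictionary
print(ss_downcasings("MASSE"))
```

To build suggestions for a rejected word, as rms proposes, the same candidate list would be sent to the spell checker and the resulting suggestion lists merged.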


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-07 15:03                           ` Chong Yidong
@ 2007-04-07 17:36                             ` Eli Zaretskii
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-07 17:36 UTC (permalink / raw)
  To: Chong Yidong; +Cc: handa, rms, emacs-devel

> Cc: rms@gnu.org, emacs-devel@gnu.org, handa@m17n.org
> From: Chong Yidong <cyd@stupidchicken.com>
> Date: Sat, 07 Apr 2007 11:03:25 -0400
> 
> > Sorry, I still don't understand.  casetab.c explicitly sets up
> > standard-case-table to convert only A-Z.  Could you please point out a
> > character outside this range whose case would be changed by using
> > standard-case-table, e.g. in the Latin-1 alphabet?
> 
> To be precise, characters.el later adds Latin and other characters to
> this standard case table.  The Lisp variable ascii-case-table makes a
> copy of the standard case table before all this work is done.

Yes, I know.  And that's why I don't understand what Richard is
saying.


* Re: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]
  2007-04-07 17:31                           ` Richard Stallman
@ 2007-04-07 17:47                             ` Eli Zaretskii
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Zaretskii @ 2007-04-07 17:47 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, handa

> From: Richard Stallman <rms@gnu.org>
> CC: handa@m17n.org, emacs-devel@gnu.org
> Date: Sat, 07 Apr 2007 13:31:22 -0400
> 
> I could not tell from the diffs when the code copies
> standard-case-table to make ascii-case-table.  If that is done before
> the Mule code changes standard-case-table, then ascii-case-table only affects
> ASCII characters.

Yes, it was copied before being populated with case conversions for
non-ASCII characters.


end of thread, other threads:[~2007-04-07 17:47 UTC | newest]

Thread overview: 45+ messages
2007-03-31 20:43 [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
2007-04-01 17:39 ` Chong Yidong
2007-04-02  6:51 ` Kenichi Handa
2007-04-02 22:52   ` Chong Yidong
2007-04-02 23:20     ` Volkan YAZICI
2007-04-03  1:24     ` Kenichi Handa
2007-04-03 21:40     ` Richard Stallman
2007-04-02 17:31 ` Volkan YAZICI
2007-04-03  8:06   ` Kenichi Handa
2007-04-03  8:28     ` Werner LEMBERG
2007-04-03  9:24     ` Eli Zaretskii
2007-04-03  9:33       ` Simon Josefsson
2007-04-03 13:44         ` Volkan YAZICI
2007-04-03 15:29           ` Eli Zaretskii
2007-04-03 15:50             ` David Kastrup
2007-04-03 16:03               ` Andreas Schwab
     [not found]               ` <87k5wt5tnx.fsf@ttnet.net.tr>
2007-04-03 16:21                 ` David Kastrup
     [not found]                 ` <87slbhquuw.fsf@ttnet.net.tr>
2007-04-03 18:44                   ` David Kastrup
2007-04-03 16:30             ` Chong Yidong
2007-04-03 17:57               ` with-case-table / ascii-case-table (was: [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail]) Reiner Steib
2007-04-04 15:40                 ` with-case-table / ascii-case-table Chong Yidong
2007-04-04 14:02               ` [yazicivo@ttnet.net.tr: Locale Dependent Downcasing in smtpmail] Richard Stallman
2007-04-04 14:27                 ` Andreas Schwab
2007-04-05 23:11                   ` Richard Stallman
2007-04-06  6:15                     ` Kenichi Handa
2007-04-06  6:49                       ` Werner LEMBERG
2007-04-06  7:15                         ` Kenichi Handa
2007-04-06  7:30                           ` Werner LEMBERG
2007-04-06  8:56                           ` Eli Zaretskii
2007-04-06  9:24                             ` Werner LEMBERG
2007-04-06 13:54                               ` Eli Zaretskii
2007-04-07  8:01                                 ` Werner LEMBERG
2007-04-06 19:47                       ` Richard Stallman
2007-04-07  7:30                         ` martin rudalics
2007-04-07 17:31                           ` Richard Stallman
2007-04-04 18:01                 ` Eli Zaretskii
2007-04-05 23:11                   ` Richard Stallman
2007-04-06  8:31                     ` Eli Zaretskii
2007-04-06 19:47                       ` Richard Stallman
2007-04-07  9:18                         ` Eli Zaretskii
2007-04-07 15:03                           ` Chong Yidong
2007-04-07 17:36                             ` Eli Zaretskii
2007-04-07 17:31                           ` Richard Stallman
2007-04-07 17:47                             ` Eli Zaretskii
2007-04-03 21:16           ` Davis Herring

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
