On 02/04/2016 02:35 PM, Óscar Fuentes wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> I see your point, but you are talking about accents all the time. In
>>> Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no
>>> different than `p' matching `q'.
>>
>> Unicode disagrees:
>>
>>   M-: (get-char-code-property ?ñ 'decomposition) RET
>>
>>    => (110 771)
>>
>> 110 is 'n' and 771 is U+0303 NON-SPACING TILDE, a combining accent.
> 
> AFAIK Unicode doesn't mandate what the Spanish alphabet is.
> 
> I thought that the point of the feature was to provide searching with
> support for character equivalence classes, which is very useful for the
> case of Spanish (and other languages, I'm sure). But you are saying that
> the feature is about how the characters are encoded by the computer and
> not about how they are used by people. If that is true, it should be
> disabled by default.

Why? This feature is simply folding based on the character decompositions specified by the Unicode standard. Hopefully the way it is implemented will lend itself to future extensions; using it for user-defined classes of substitutions would be nice. But I don't understand why the possibility of fancier (though less clearly defined) folding should disqualify this feature from becoming the default.
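For illustration only, here is a minimal sketch of decomposition-based folding. It uses Python's standard `unicodedata` module, not the Emacs implementation. It assumes canonical (NFD) decomposition, and it drops every character with a nonzero combining class. Under that rule ‘ñ’ folds to ‘n’ because it decomposes to n plus U+0303, matching the (110 771) result quoted above, while ‘p’ and ‘q’ stay distinct. The names `fold` and `folding_match` are hypothetical.

```python
import unicodedata

def fold(s: str) -> str:
    """Hypothetical helper: canonically decompose s (NFD), then drop
    combining marks (nonzero combining class). This approximates
    'visual' folding; it is not Emacs's char-fold code."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)

def folding_match(needle: str, haystack: str) -> bool:
    """Hypothetical search: fold both sides, then do a plain substring test.
    Folding both sides is one design choice among several."""
    return fold(needle) in fold(haystack)

# The same decomposition Emacs reports for ?ñ, as code points.
print([int(h, 16) for h in unicodedata.decomposition("ñ").split()])  # [110, 771]
print(folding_match("ano", "el año pasado"))  # True: ñ folds to n
print(folding_match("p", "q"))                # False: no decomposition links them
```

Because this sketch folds both the search string and the text, it is symmetric: searching for "año" also matches "ano". Whether a real search feature should behave that way is a separate design decision.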

Also, it's not easy (I'd guess not possible) to give any sort of precise meaning to ‘how characters are used by people’. I still find this simple character folding quite useful; I just accept that it's visual folding, not semantic folding (and this list is well aware of the difficulties that arise when one tries to assign semantic meaning to characters; cf. the ‘’ vs `' debate). The semantics of this simple folding are as uncontroversial as can be: we're following an established standard. Maybe there's a better-behaved notion of folding out there, but I'm not sure why its existence is relevant to the choice of a default, since we have neither an implementation nor a spec for that alternative.

Clément.