AIDA Shinra <shinra@j10n.org> writes:

>> Anyway, as far as a system allows users to switch locale, I
>> think, pw_gecos must adopt locale-independent encoding, thus
>> the possible encoding is one of UTF-*.  And, considering
>> backward compatibility, it should be UTF-8.  Then, how about
>> we always decode it by utf-8 (only if it contains a byte
>> with MSB set) while falling back to locale-coding-system
>> (invalid utf-8 sequence is found), and see if that works on
>> any systems?   How does GNU/Linux encode it?
>
> A site administrator might choose an encoding other than UTF-8 even if
> it is locale-dependent...

Encoding is a *big* problem in the world of GNU/Linux, mainly because
of the lack of a standard.

For example, suppose two users on the same GNU/Linux system select
different locales: userA selects zh_CN.GB2312 and userB selects
zh_CN.UTF-8.  Because all of the filename-related system calls read
and write filenames *as is*, userA's filenames are encoded in GB2312
and userB's in UTF-8.  As a result they can't read each other's
filenames; one user's filenames are a complete mess to the other.
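
To make the mismatch concrete, here is a small Python sketch.  The
filename is made up for illustration, and Python's codecs only stand in
for what each user's programs would hand to the kernel:

```python
# Hypothetical filename, for illustration only.
name = "中文.txt"

# The same name as bytes under each user's locale encoding.
gb_bytes = name.encode("gb2312")    # what userA's programs would write
utf8_bytes = name.encode("utf-8")   # what userB's programs would write

# Same name, different bytes on disk.
print(gb_bytes != utf8_bytes)

# userB's strict view of userA's file: the bytes are not valid UTF-8.
try:
    gb_bytes.decode("utf-8")
    readable = True
except UnicodeDecodeError:
    readable = False
print(readable)

# A lenient decoder substitutes U+FFFD for the bytes it cannot decode,
# which is roughly the "mess" userB ends up looking at.
garbled = gb_bytes.decode("utf-8", errors="replace")
print(garbled)
```

Only the ASCII part of the name (".txt") survives the round trip.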

The Internet makes this problem even worse, since no standard encoding
is used for information exchange: ssh, ftp, vnc, ... almost all of them
run into encoding problems every day.

Perhaps the best solution to this problem is to convert filenames to
UTF-8 at the system call level, no matter what locale a user chooses.
If that is hard to do in the kernel, we should at least do it at the
libc level.
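
As a rough sketch of what such a conversion layer might do, combined
with the "try UTF-8 first, fall back to the locale" heuristic quoted
above, consider the following Python.  The function name to_utf8_name
and its locale_encoding parameter are hypothetical; no kernel or libc
interface like this exists today:

```python
def to_utf8_name(raw: bytes, locale_encoding: str) -> bytes:
    """Return a filename as UTF-8 bytes (hypothetical helper).

    raw             -- filename bytes as the caller supplied them
    locale_encoding -- the caller's locale encoding, e.g. "gb2312"
    """
    # Pure ASCII (no byte with the MSB set) is already valid UTF-8.
    if all(b < 0x80 for b in raw):
        return raw
    try:
        # If the bytes already form valid UTF-8, keep them as they are.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume the caller's locale encoding and transcode.
        # A UnicodeDecodeError here means the name is valid in neither
        # encoding; this sketch simply lets that error propagate.
        return raw.decode(locale_encoding).encode("utf-8")


# userA (GB2312 locale) and userB (UTF-8 locale) end up with the same
# bytes on disk for the same name.
from_a = to_utf8_name("中文.txt".encode("gb2312"), "gb2312")
from_b = to_utf8_name("中文.txt".encode("utf-8"), "utf-8")
print(from_a == from_b)
```

The heuristic is not airtight: a byte sequence in a legacy encoding can
occasionally happen to be valid UTF-8 as well, in which case it would
be kept unconverted.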

-- 
GnuPG KeyID: 0x61C92BB9