AIDA Shinra writes:
>> Anyway, as far as a system allows users to switch locale, I
>> think, pw_gecos must adopt locale-independent encoding, thus
>> the possible encoding is one of UTF-*. And, considering
>> backward compatibility, it should be UTF-8. Then, how about
>> we always decode it by utf-8 (only if it contains a byte
>> with MSB set) while falling back to locale-coding-system
>> (invalid utf-8 sequence is found), and see if that works on
>> any systems? How does GNU/Linux encode it?
>
> A site administrator might choose an encoding other than UTF-8 even if
> it is locale-dependent...

Encoding is a *big* problem in the world of GNU/Linux, mainly because
of the lack of a standard. For example, suppose two users on the same
GNU/Linux system select different locales: userA selects zh_CN.GB2312
and userB selects zh_CN.UTF-8. Because all of the filename-related
system calls read and write filenames *as is*, userA's filenames are
encoded in GB2312 and userB's filenames are encoded in UTF-8. As a
result, they can't read each other's filenames; one user's filenames
are a complete mess to the other.

The Internet makes this problem even worse, since no standard encoding
is used for information exchange. ssh, ftp, vnc ...: almost every one
of them runs into encoding problems every day.

Perhaps the best solution to this problem is to convert filenames to
UTF-8 at the system call level, no matter which locale a user chooses.
If that is hard to do in the kernel, we should at least do it at the
libc level.

--
GnuPG KeyID: 0x61C92BB9
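
P.S. The fallback decoding proposed in the quoted text (try UTF-8 only
when a byte has its MSB set, and fall back to the locale's coding
system when the bytes are not valid UTF-8) can be sketched roughly as
below. This is only an illustration in Python, not how Emacs or any
libc does it; the function name decode_gecos and its locale_encoding
parameter are made up for the example. Note that the heuristic can
misfire: some GB2312 byte pairs happen to be valid UTF-8 too.

```python
def decode_gecos(raw: bytes, locale_encoding: str) -> str:
    """Hypothetical helper: decode a pw_gecos-like byte string.

    Strategy from the quoted proposal:
      1. No byte with the MSB set -> plain ASCII, nothing to guess.
      2. Otherwise try UTF-8 first.
      3. If the bytes are not valid UTF-8, fall back to the
         locale's encoding (an assumption supplied by the caller).
    """
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8 sequence found: use the locale coding system.
        return raw.decode(locale_encoding, errors="replace")


if __name__ == "__main__":
    name = "中文"
    gb_bytes = name.encode("gb2312")    # what userA's locale would write
    utf8_bytes = name.encode("utf-8")   # what userB's locale would write

    # Both byte strings decode to the same text with the fallback.
    print(decode_gecos(gb_bytes, "gb2312"))
    print(decode_gecos(utf8_bytes, "gb2312"))

    # Without the fallback, userA's bytes are a mess to a UTF-8 reader.
    print(gb_bytes.decode("utf-8", errors="replace"))
```

The same bytes-as-is behaviour is what makes userA's filenames
unreadable for userB above: the kernel stores the bytes, and each
user's locale interprets them differently.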