On 7/18/11 9:41 AM, Eli Zaretskii wrote:
> I'm afraid it isn't straightforward. I suspect there's a lot of
> supporting code that still assumes unibyte characters. But I'll
> welcome patches in that area (if we agree to drop W9X support).

I meant that we can keep using UTF-8 (or our internal multibyte encoding) until the very last moment before calling a system function, and convert only then; we know UTF-8 support works already, and the conversion preserves everything.

>> Cygwin supports UTF-8 filenames natively.
>
> I know that, but it isn't relevant to the native w32 build, because
> that needs to use UTF-16, not UTF-8.
>
>> But even if we don't --- why does it matter? You can create files using
>> the NT native API that can't be opened using Win32 calls; it doesn't
>> cause a problem in practice. Likewise, users who have strange
>> filenames might not be able to use them with all Emacs features right
>> away, but they'll be able to work with more reasonable filenames just as
>> they did before.
>
> But switching to Unicode doesn't make sense _unless_ you want to
> support "strange file names": all the non-strange file names are
> already supported under the current ``ANSI'' APIs. It's when I want
> to see file names with characters not from my system locale that I
> need Unicode.

Unicode APIs are also more modern in a sense --- new ANSI APIs aren't being created anymore.

> [snip]
>
> See above: for those, the Unicode interfaces give no advantage.
>
>> Still, if we can't do that, then as a temporary measure, we can still
>> use Unicode APIs (the 9X discussion notwithstanding), but as a temporary
>> measure, filter their results so that we reject filenames that can't be
>> used with the system codepage.
>
> But then this is just complication with no benefits, isn't it?

Not necessarily --- this approach would allow us to gradually migrate to Unicode without introducing the possibility of delivering incomprehensible error messages to users.
When we've updated all the relevant bits, we can just remove the filtering and have a pure Unicode Emacs.
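
To make the two ideas concrete, here is a rough sketch in Python rather than Emacs C, purely as an illustration. All the names in it (SYSTEM_CODEPAGE, representable, filter_names, to_wide) are hypothetical, and cp1252 is only an assumed stand-in for whatever the system codepage really is; none of this is existing Emacs code.

```python
# Assumed stand-in for the system ("ANSI") codepage; a real build would
# query the OS instead of hard-coding a value.
SYSTEM_CODEPAGE = "cp1252"


def representable(name, codepage=SYSTEM_CODEPAGE):
    """Return True if NAME survives a round trip through CODEPAGE."""
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeEncodeError:
        return False


def filter_names(names, codepage=SYSTEM_CODEPAGE):
    """Temporary filter: keep only names the legacy code paths can handle."""
    return [n for n in names if representable(n, codepage)]


def to_wide(utf8_name):
    """Convert UTF-8 bytes to UTF-16 (little-endian) right at the boundary,
    i.e. just before the name would be handed to a wide system call."""
    return utf8_name.decode("utf-8").encode("utf-16-le")
```

With this, a listing obtained through the Unicode interfaces would be passed through filter_names before the rest of the code sees it, and everything above the system-call layer keeps working on UTF-8. Dropping the filter later is a one-line change, which is the point of doing it this way.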