On Thu, 15 Sep 2011 13:52:12 -0400, Austin Clements wrote:
> On Tue, Sep 13, 2011 at 11:55 PM, Martin Owens wrote:
> > Hello Again,
> >
> > I notice in the lib code notmuch_database_open(),
> > notmuch_database_create() these functions use const char *path for the
> > directory path input. Is this unicode safe?
> >
> > The python bindings (and ctype docs) seem to suggest using something
> > called 'wchar_t *' for accepting unicode but that's for C not C++.
> >
> > Is this something that should be patched?
>
> char* is the correct type for paths on POSIX systems. The *meaning*
> of those bytes is a more complicated matter and depends on your locale
> settings. On old systems it was generally ASCII, on modern systems
> it's generally UTF-8, and it can be many other things. However, as a
> consequence of UNIX's C heritage, it is *always* terminated with a
> NULL byte and cannot contain embedded NULL's.

Right, that's what we are doing: passing UTF-8 encoded unicode strings
in as char*, which should be just fine if that is the encoding the
underlying OS uses.

> wchar_t is another matter entirely. wchar_t is the type used by C to
> represent wide strings internally, which generally (but not
> necessarily!) means it stores a Unicode code point. However, this
> isn't an encoding, and different compilers can give wchar_t different
> meanings, so wchar_t strings aren't generally appropriate for storing
> or sharing between processes or with the kernel.

Mmh, I remember I attempted to use wchar_t to pass in unicode objects
directly, and it failed miserably.

Sebastian
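
For illustration, here is a minimal sketch of the approach described above: encode the unicode path to UTF-8 bytes and hand those to a `const char *` parameter through ctypes. The helper name `to_c_path` is hypothetical and not part of the notmuch bindings, and the UTF-8 choice is an assumption about the filesystem encoding (on Python 3, `os.fsencode()` follows the locale instead).

```python
import ctypes


def to_c_path(path):
    """Return a ctypes c_char_p for a path (hypothetical helper).

    Text is encoded as UTF-8, which is an assumption about what the
    underlying OS expects. Bytes are passed through unchanged.
    """
    if isinstance(path, str):
        raw = path.encode("utf-8")
    else:
        raw = bytes(path)
    # A C string ends at the first NUL byte, so an embedded NUL would
    # silently truncate the path on the C side. Reject it up front.
    if b"\x00" in raw:
        raise ValueError("path contains an embedded NUL byte")
    return ctypes.c_char_p(raw)


if __name__ == "__main__":
    c_path = to_c_path("/home/user/Mail/caf\u00e9")
    # c_char_p.value gives back the bytes up to the terminating NUL.
    print(c_path.value)
```

The resulting object can be passed wherever a C function declares a `const char *` argument. A `c_wchar_p` would instead hand over the platform's wchar_t representation, which is not what a POSIX path parameter expects.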