On Thu, Oct 09, 2014 at 12:56:42PM +0900, Stephen J. Turnbull wrote:
> Richard Stallman writes:
>
> > If you demonstrate that this claim is valid, I will be concerned.
>
> *sigh* Be unconcerned. The world is a *lot* more hostile today than
> it was in the days when you posted your passwords on the 'net.

Agreed.

Character encoding attacks have also been exploited "in the wild". Some
examples include:

  - UTF-7 character encoding to bypass filters[0] (e.g. for XSS);
  - IIS WebDAV validation exploit (CVE-2009-1535);[1]
  - CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic;[2] and
  - Google's XSS vulnerability, related to the first item in this
    list.[3]

Note that not all of the above may be applicable to the specifics of
this discussion; the point is to convey, generally, that character
encoding poses serious threats when improperly handled, though this
discussion seems to be about what counts as "improper".

See "Secure Programming for Linux [sic] and Unix HOWTO".[4] The Unicode
Consortium also has a security report[5] that mentions, among other
important concepts, deletion of code points and handling of "illegal"
input byte sequences.

With regard to passing raw input to other systems: this isn't
necessarily Unicode related (unless an invalid sequence contains a null
byte), but it serves to illustrate the point that Mark is trying to
make. There is a well-known issue in PHP whereby passing a null byte as
a parameter to a script (e.g. via HTTP GET/POST) opens up a number of
attacks. Specifically, PHP itself handles null bytes in strings (by
storing the string length as part of the struct that holds the string).
However, it makes calls directly to libc, which treats a null byte as
the end of a string. So, if an unvalidated input $foo contains
"../../../../etc/group\000", and PHP makes a call to `fopen' with the
path "/webroot/modules/$foo/index.php", the result would be opening
"/webroot/modules/../../../../etc/group", since everything after the
null byte is ignored.
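
To make the truncation concrete, here is a minimal sketch. It is written
in Python rather than PHP, and it only simulates what a C function would
see; it does not call libc or open any file. The helper names
(as_c_string, path_seen_by_libc, reject_nul) and the constants are made
up for illustration, and the rejection check is just one possible way to
validate the input.

```python
import posixpath

# Hypothetical values taken from the example in the text above.
TEMPLATE = b"/webroot/modules/$foo/index.php"
ATTACK = b"../../../../etc/group\x00"


def as_c_string(data: bytes) -> bytes:
    """Return what a C function sees: the bytes before the first NUL."""
    return data.split(b"\x00", 1)[0]


def path_seen_by_libc(template: bytes, user_input: bytes) -> str:
    """Interpolate the input, truncate at NUL, and normalize the path."""
    full = template.replace(b"$foo", user_input)
    truncated = as_c_string(full)  # the "/index.php" suffix is lost here
    return posixpath.normpath(truncated.decode("ascii"))


def reject_nul(user_input: bytes) -> bytes:
    """One possible validation step: refuse input containing a NUL byte."""
    if b"\x00" in user_input:
        raise ValueError("NUL byte in input")
    return user_input


if __name__ == "__main__":
    print(path_seen_by_libc(TEMPLATE, b"blog"))
    print(path_seen_by_libc(TEMPLATE, ATTACK))
```

The length-counted string still contains "/index.php", but a
NUL-terminated consumer never gets that far. A NUL check alone would not
stop a plain "../" traversal, so it complements path validation rather
than replacing it.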

I have the most experience developing web applications, where character
encoding exploits are common.[6]

> So there you are. That's the best I can do.

I can dig up more examples, but hopefully some of these help to
illustrate the severity of ignoring character encoding concerns.

* * *

Aside: For those who don't know what XSS is: the issue is that, if input
from the user is not properly validated/filtered, and is at some point
output back to a user, that output could be interpreted as HTML,
JavaScript, CSS, etc. So if XSS filters are bypassed using the
aforementioned methods, perhaps the user will output `', which might
change a login form, say, to post user credentials to a remote website.

[0]: http://en.wikipedia.org/wiki/UTF-7#Security
[1]: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-1535
[2]: https://capec.mitre.org/data/definitions/80.html
[3]: http://shiflett.org/blog/2005/dec/googles-xss-vulnerability
[4]: http://www.tldp.org/HOWTO/Secure-Programs-HOWTO/character-encoding.html
[5]: http://www.unicode.org/reports/tr36/
[6]: https://www.owasp.org/index.php/OWASP_Top_Ten_Cheat_Sheet

-- 
Mike Gerwitz
Free Software Hacker | GNU Maintainer
http://mikegerwitz.com
FSF Member #5804 | GPG Key ID: 0x8EE30EAB