On Wed, Nov 3, 2021 at 4:43 PM Stefan Monnier wrote:

> > No, this summary is awful.
> > The issue is that libc, the C standard committee, linux and most others are
> > ignoring the unicode identifier security guidelines.
> > Identifiers must be identifiable, but strings should not be touched.
>
> What do those rules say about code like:
>
>     int hi = 5;
>     int שָׁלוֹם = hi;
>     int hello = 10;
>     int السّلامعليك = hello;
>     myfun(שָׁלוֹם ,السّلامعليكم)
>
> IMO this code is fundamentally valid: we should allow
> programmers to write identifiers in their native tongue.

Sure, nobody wants to forbid unicode identifiers. The rules only ensure
that identifiers stay identifiable.

I converted it to perl (because I dislike java or rust), and ran it
through cperl. The problem is that from an innocent look or code review
you won't see any problem, hence the security risk. You need to adjust
your tools.

But the very first RTL identifier שָׁלוֹם already contains non-identifier
characters. So I cannot tell you if this code doesn't violate any of the
4 unicode mixed script profiles
(http://www.unicode.org/reports/tr39/#Mixed_Script_Detection 2-5).
Or if any of the unreadable characters are of the recommended scripts:
https://www.unicode.org/reports/tr31/#Table_Recommended_Scripts
(so no exotic or antique scripts).

http://perl11.github.io/cperl/perldata.html#Identifier-parsing

$hi = 5;
$שָׁלוֹם = $hi;
$hello = 10;
$السّلامعليك = $hello;
myfun($שָׁלוֹם, $السّلامعليك);

=> od -c

0000000   $   h   i       =       5   ;  \n   $ 327 251 326 270 327 201
0000020 327 234 327 225 326 271 327 235       =       $   h   i   ;  \n
0000040   $   h   e   l   l   o       =       1   0   ;  \n   $ 330 247
0000060 331 204 330 263 331 221 331 204 330 247 331 205 330 271 331 204
0000100 331 212 331 203       =       $   h   e   l   l   o   ;  \n   m
0000120   y   f   u   n   (   $ 327 251 326 270 327 201 327 234 327 225
0000140 326 271 327 235   ,       $ 330 247 331 204 330 263 331 221 331
0000160 204 330 247 331 205 330 271 331 204 331 212 331 203   )   ;  \n

> Does the security guidelines require override chars to force the
> `, ` to be in LTR, so as to fix the ordering problem (and would the
> result be more or less clear to someone familiar with those RTL
> scripts ;-0 )?
>
>
>         Stefan

-- 
Reini Urban