From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Tomas Volf <~@wolfsden.cz> Newsgroups: gmane.lisp.guile.devel Subject: RFC: Changing the initial value of %default-port-conversion-strategy Date: Wed, 29 May 2024 22:26:51 +0200 Message-ID: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="JOkv1p4ZYmrzevuX" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="922"; mail-complaints-to="usenet@ciao.gmane.io" To: guile-devel@gnu.org Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Wed May 29 22:27:13 2024 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sCPtB-00005x-5t for guile-devel@m.gmane-mx.org; Wed, 29 May 2024 22:27:13 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sCPsy-0007AY-LM; Wed, 29 May 2024 16:27:00 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <~@wolfsden.cz>) id 1sCPsw-00079i-5e for guile-devel@gnu.org; Wed, 29 May 2024 16:26:58 -0400 Original-Received: from wolfsden.cz ([37.205.8.62]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <~@wolfsden.cz>) id 1sCPst-0005hf-N8 for guile-devel@gnu.org; Wed, 29 May 2024 16:26:57 -0400 Original-Received: by wolfsden.cz (Postfix, from userid 104) id D8AAB24B9D3; Wed, 29 May 2024 20:26:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1717014412; bh=+wQX6ShfAz+GcgTA6uT2IxO9tYhqi773Fq6KKrvRf0U=; h=Date:From:To:Subject; 
b=Fy0aAIcm3vgGqnFeiPV5Z6nPNtTlbvuzHmsanYHkqxdmYfsyxtzlfvPkPorhqgODc 7M3UewHfpQDQe8LHkkPtjhyHhrQzTbxyzuuJ/U7MvQ2hI7izZLpf/tpET4CPWrDf/Y GuTyAAh+SI02JiTLq+2XfWYVr36QpK8eROzaByPinnCJqUSgZraPCXruRduOKFfBCb pCOjHiAy3np9uMqQ/d8cZ9j9EZgDCRswgkCpGt22rin1h1uvbawx/MiVt+oSbEJVeh ZZc3JwH0hdUT7MxJL79CNMohwtnTkkp22cfNMu+aLUVgab3Hwv9pFY+OUk8YmvbCuy iMROz/VSYe+Hfa9PjU2J8kVHGU2s+45/pnUdRFkepl5CkvSoa4TRpxEZZgbcfEMF0E DcY4zuc2SLGvBQY748m2btFFlEiR1ZqGriLtsTIpctfHhaiVzI/X9KZ65KCknRndf7 rheThyehGoZhdoeqv23NoTrP9otyvB/EBKzKjkxm7e8or9sWOveJ0j95UnTuPf+00m 5fwXLgyhkurj5IVvmkXggD0Lv9KprLif7mneOKCLVkJiM23kmm7tnd0Tkv9U6Ajfst 4eyLJC3fjV0KMNZMGqFHg7AfxZ8SJxxDbh/ZAVlfzy2gApZCo84keyXYZFr2H9lrpV FZxSdb6SnXSWIkX92z/49FFs= Original-Received: from localhost (unknown [193.32.127.145]) by wolfsden.cz (Postfix) with ESMTPSA id EF12824C74D for ; Wed, 29 May 2024 20:26:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1717014412; bh=+wQX6ShfAz+GcgTA6uT2IxO9tYhqi773Fq6KKrvRf0U=; h=Date:From:To:Subject; b=Fy0aAIcm3vgGqnFeiPV5Z6nPNtTlbvuzHmsanYHkqxdmYfsyxtzlfvPkPorhqgODc 7M3UewHfpQDQe8LHkkPtjhyHhrQzTbxyzuuJ/U7MvQ2hI7izZLpf/tpET4CPWrDf/Y GuTyAAh+SI02JiTLq+2XfWYVr36QpK8eROzaByPinnCJqUSgZraPCXruRduOKFfBCb pCOjHiAy3np9uMqQ/d8cZ9j9EZgDCRswgkCpGt22rin1h1uvbawx/MiVt+oSbEJVeh ZZc3JwH0hdUT7MxJL79CNMohwtnTkkp22cfNMu+aLUVgab3Hwv9pFY+OUk8YmvbCuy iMROz/VSYe+Hfa9PjU2J8kVHGU2s+45/pnUdRFkepl5CkvSoa4TRpxEZZgbcfEMF0E DcY4zuc2SLGvBQY748m2btFFlEiR1ZqGriLtsTIpctfHhaiVzI/X9KZ65KCknRndf7 rheThyehGoZhdoeqv23NoTrP9otyvB/EBKzKjkxm7e8or9sWOveJ0j95UnTuPf+00m 5fwXLgyhkurj5IVvmkXggD0Lv9KprLif7mneOKCLVkJiM23kmm7tnd0Tkv9U6Ajfst 4eyLJC3fjV0KMNZMGqFHg7AfxZ8SJxxDbh/ZAVlfzy2gApZCo84keyXYZFr2H9lrpV FZxSdb6SnXSWIkX92z/49FFs= Content-Disposition: inline Received-SPF: pass client-ip=37.205.8.62; envelope-from=~@wolfsden.cz; helo=wolfsden.cz X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, 
 DKIM_VALID_EF=-0.1, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Developers list for Guile, the GNU extensibility library"
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.lisp.guile.devel:22422
Archived-At:

--JOkv1p4ZYmrzevuX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Greetings,

during my current quest to get more G-expressions working with UTF-8
input, I have read Guile's documentation, in particular
'(guile)Encoding', and I think a change in the default behavior is
warranted.

Currently the initial value of %default-port-conversion-strategy is
'substitute.  I would like to propose changing it to 'error, on the
grounds of preventing subtle bugs and data corruption.

Just a reminder: when 'substitute is used, any non-representable
character is replaced with #\?.  No error is signaled, and the user has
no way to detect that it even happened.  I just do not believe that to
be a reasonable default.

Let us take a look, for example, at
test-suite/standalone/test-mb-regexp.  It contains this code:

    (regexp-exec (make-regexp "(.)(.)(.)")
                 (string (integer->char 200) #\x (integer->char 202)))

That might look sensible until you realize that the following regexp
*also* matches, because characters 200 and 202 are silently replaced
with #\? when they cannot be represented:

    (make-regexp "(\\?)(.)(\\?)")

This is just asking for potential bugs (possibly security related) and
data corruption.  The 'substitute strategy should of course stay (if
someone actually needs it), but the default should really be changed to
'error.

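To make the difference between the two strategies concrete, here is a
minimal sketch.  It is not taken from the test suite: the helper name
encode-with-strategy is made up for this illustration, and an in-memory
bytevector port set to ISO-8859-1 stands in for a port opened under a
non-UTF-8 locale.

```scheme
;; Illustrative sketch only; `encode-with-strategy' is a hypothetical
;; helper, not part of Guile.
(use-modules (ice-9 binary-ports))      ; open-bytevector-output-port

(define (encode-with-strategy str strategy)
  ;; Write STR to a fresh in-memory port that can only represent
  ;; ISO-8859-1, handling unrepresentable characters per STRATEGY,
  ;; and return the resulting bytes.
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port "ISO-8859-1")
      (set-port-conversion-strategy! port strategy)
      (display str port)
      (force-output port)
      (get-bytes))))

;; Character 955 (U+03BB) has no ISO-8859-1 encoding.
(define input (string #\a (integer->char 955) #\b))

;; 'substitute: the write completes normally, and nothing in the
;; result tells the caller that a character was lost.
(write (encode-with-strategy input 'substitute))
(newline)

;; 'error: the write fails, so the caller gets a chance to react.
;; Catching #t here avoids assuming a particular error key.
(write (catch #t
         (lambda () (encode-with-strategy input 'error))
         (lambda (key . args) (list 'caught key))))
(newline)
```

The point of the sketch is only that the first call returns as if
nothing had happened, while the second one forces the problem to the
surface at the place where it occurs.
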
Work-wise it is very feasible: the change is minimal (a single line in
both ports.c and the documentation) and just a few tests break:

* test-mb-regexp: But this just demonstrates code that should not have
  worked in the first place.  IMO.

* test-bad-identifiers: Requires a setlocale call to a UTF-8 locale and
  converting one source file (guardians.c) from latin1 to UTF-8.

* ports.test: This explicitly tests the default value, so it needs to
  be adjusted.

Real-world impact should be limited, since most people are likely to
run with LANG set to *some* UTF-8 locale.  And if you do not have that,
I (and, I expect, the majority of engineers) would prefer correctness
over convenience.

I strongly believe the current default is wrong and dangerous, but I am
obviously interested in what other people think, hence this message.

Please let me know what you think.  Should I put this into an actual
patch?  Does it have a chance of being accepted and merged into master?

Thank you for reading and have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

--JOkv1p4ZYmrzevuX Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEt4NJs4wUfTYpiGikL7/ufbZ/wakFAmZXj4sACgkQL7/ufbZ/ walS8Q//Yu/rRZbwA9OFVIWk8PObek/FcTLGFr7zfGwNO0OOygPL8uBxbLJzy04J 0VaM+bltJypjJO9OxNJ+lQzQXTem8tDbYHA+R+WwMJzfHfbNjoo3dNcim9IxFI6Q AeZirISGj/qghcw7w26kFXo8SawbGlsYslm/2rIFkSV99XgpNCUNwIhkPUsrTMii FCWLUUdqaTyLMcMw37nnaDa144dy5sKFgADWHsWvkD6Kx7oTIUuAsSrk7Af0sINN HCRUtW9qrps2BOJVNVkwHPUpkO2T07DJZUKsE6mu3ZMx8gHwsOXKAdy9nG43vqVa CSZp2jsBTb1BXnrNFiu4ugaP2LZ8tkaK1FBU+v1VETTjowCUM27Z6vw8WAXRbEzh YK0zD5WjF4zy4OQ1WzJVdN5IlShYHfWuHZ8qARhBOr0Kh2jMRuQVqmWeW1JNtUOs +Rdjy7JStJBOZlgUidx6dKPrjbIdpylN0PAW0UgS0WksIsJ2ViIZaFoVo6BZAvAr 0uXFiZG5enAP/tuupIHKRz4ZIBdzcROCDOEpqYBlvZpL/FB5QmUB4Ltv1I+d4Nxp +A+uvshjRt5+cocx9YetCOXSpoNAdrezfjZhuJNBozfu8vh/L8teKt4P7Aq3ECme elM1iPEmhW2vhQ4iX/iqRyOsSYyxDnyQHfcwZLeyIuQRNfDjUz0= =ITjG -----END PGP SIGNATURE----- --JOkv1p4ZYmrzevuX--