unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Tomas Volf <~@wolfsden.cz>
To: guile-devel@gnu.org
Subject: RFC: Changing the initial value of %default-port-conversion-strategy
Date: Wed, 29 May 2024 22:26:51 +0200	[thread overview]
Message-ID: <ZlePi6B3u0dGlztN@ws> (raw)

[-- Attachment #1: Type: text/plain, Size: 2306 bytes --]

Greetings,

during my current quest to get more G-expressions working with UTF-8 input, I
have read the Guile's documentation, in particular '(guile)Encoding', and I
think change in default behavior is warranted.

Currently the initial value of %default-port-conversion-strategy is 'substitute.
I would like to propose changing it to 'error on the ground of preventing subtle
bugs and data corruption.

Just a reminder, when 'substitute is used, any non-representable character is
replaced with #\?.  No error is signaled and user has no way to detect it even
happened.  I just do not believe that to be a reasonable default.

Let us take a look for example at test-suite/standalone/test-mb-regexp.  It
contains this code:

    (regexp-exec
     (make-regexp "(.)(.)(.)")
     (string (integer->char 200) #\x (integer->char 202)))

That might look sensible until you realize that the following regexp *also*
matches:

    (make-regexp "(\\?)(.)(\\?)")

This is just asking for potential bugs (possibly security related) and data
corruption.  The 'substitute strategy should of course stay (if someone actually
needs it), but the default should really be changed to 'error.

Work-wise it is very feasible, the change is minimal (single line both in
ports.c and in documentation) and just few tests break:
* test-mb-regexp:
    But this just demonstrates code that should have not worked in the first
    place.  IMO.
* test-bad-identifiers:
    Requires setlocale to UTF-8 locale and converting one source file
    (guardians.c) from latin1 to UTF-8.
* ports.test:
    This explicitly tests the default value, so it needs to be adjusted.

Real world impact should be limited, since most people are likely to run with
LANG set to *some* UTF-8 locale.  And if you do not have that, I (and I expect
majority of engineers) would prefer correctness over convenience.

I strongly believe the current default is wrong and dangerous, but I am
obviously interested what other people think, hence this message.  Please let me
know what you think.  Should I put this into actual patch?  Does it have chance
to be accepted and merged into the master?

Thank you for reading and have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

             reply	other threads:[~2024-05-29 20:26 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-29 20:26 Tomas Volf [this message]
2024-05-30  7:54 ` RFC: Changing the initial value of %default-port-conversion-strategy Attila Lendvai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZlePi6B3u0dGlztN@ws \
    --to=~@wolfsden.cz \
    --cc=guile-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).