From: Maxime Devos <maximedevos@telenet.be>
To: 54272@debbugs.gnu.org
Subject: bug#54272: Avoid producing ? in locales with too few characters.
Date: Sun, 06 Mar 2022 16:14:16 +0100
Message-ID: <aeed150481e18446fb4e11622201e866e2c6d6d1.camel@telenet.be>
[-- Attachment #1.1: Type: text/plain, Size: 43 bytes --]
This seems difficult to test. Any ideas?
[-- Attachment #1.2: 0002-Deprecate-non-functional-bind-textdomain-codeset.patch --]
[-- Type: text/x-patch, Size: 3649 bytes --]
From 8f9a8e56cc3e8496fdfed3f889cff9fca19b3663 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Sun, 6 Mar 2022 12:51:33 +0000
Subject: [PATCH 2/2] Deprecate non-functional bind-textdomain-codeset.
TODO: this only deprecates it in the documentation; it needs
to be deprecated elsewhere as well.
* doc/ref/api-i18n.texi (bind-textdomain-codeset): Update documentation.
* doc/ref/guile.texi: Update copyright information.
---
doc/ref/api-i18n.texi | 36 ++++++++++++------------------------
doc/ref/guile.texi | 1 +
2 files changed, 13 insertions(+), 24 deletions(-)
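Note (not part of the patch): the updated documentation in this patch advises that C code expecting a non-UTF-8 encoding should use a separate text domain. A minimal sketch of what that could look like at the C level, assuming glibc or GNU libintl; the domain names and the helper `setup_domains` are hypothetical and exist only for illustration:

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical domain names, for illustration only. */
#define GUILE_DOMAIN "myprog-guile"
#define C_DOMAIN "myprog-c"

/* Bind both domains to DIRECTORY, then give each its own codeset:
   UTF-8 for the domain shared with Guile, ISO-8859-1 for the domain
   used only by C code.  Returns 0 on success, -1 on failure.  */
static int
setup_domains (const char *directory)
{
  if (bindtextdomain (GUILE_DOMAIN, directory) == NULL
      || bindtextdomain (C_DOMAIN, directory) == NULL)
    return -1;

  if (bind_textdomain_codeset (GUILE_DOMAIN, "UTF-8") == NULL
      || bind_textdomain_codeset (C_DOMAIN, "ISO-8859-1") == NULL)
    return -1;

  return 0;
}
```

Because the codeset is stored per domain, the C code's lookups in its own domain are not affected by the UTF-8 setting on the other domain.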
diff --git a/doc/ref/api-i18n.texi b/doc/ref/api-i18n.texi
index 7c49b0a23..c06b75996 100644
--- a/doc/ref/api-i18n.texi
+++ b/doc/ref/api-i18n.texi
@@ -2,6 +2,7 @@
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
@c 2009, 2010, 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2022 Maxime Devos
@c See the file guile.texi for copying conditions.
@node Internationalization
@@ -599,33 +600,20 @@ non-standard location.
@deffn {Scheme Procedure} bind-textdomain-codeset domain [encoding]
@deffnx {C Function} scm_bind_textdomain_codeset (domain, encoding)
-Get or set the text encoding to be used by @code{gettext} for messages
-from @var{domain}. @var{encoding} is a string, the name of a coding
-system, for instance @nicode{"8859_1"}. (On a Unix/POSIX system the
-@command{iconv} program can list all available encodings.)
-When called without an @var{encoding} the current setting is returned,
-or @code{#f} if none yet set. When called with an @var{encoding}, it
-is set for @var{domain} and that new setting returned. For example,
+This is a historical procedure, preserved for compatibility, that was
+used to get and set the text encoding used by @code{gettext} for
+messages from @var{domain}.
-@example
-(bind-textdomain-codeset "myprog")
-@result{} #f
-(bind-textdomain-codeset "myprog" "latin-9")
-@result{} "latin-9"
-@end example
+This procedure has been useless ever since Guile's strings began to
+consist of characters instead of individual bytes, especially because
+the @code{gettext} procedure always used the locale encoding instead
+of the encoding of the text domain.
-The encoding requested can be different from the translated data file,
-messages will be recoded as necessary. But note that when there is no
-translation, @code{gettext} returns its @var{msg} unchanged, ie.@:
-without any recoding. For that reason source message strings are best
-as plain ASCII.
-
-Currently Guile has no understanding of multi-byte characters, and
-string functions won't recognise character boundaries in multi-byte
-strings. An application will at least be able to pass such strings
-through to some output though. Perhaps this will change in the
-future.
+If you use @code{gettext} both in C and Guile code, be aware that Guile
+always assumes the UTF-8 encoding and sets this encoding when Guile's
+@code{bindtextdomain} is called. If the C code expects a different
+encoding, then it needs to operate on a separate domain.
@end deffn
@c Local Variables:
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 660b1ae90..5b56145ca 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -15,6 +15,7 @@ This manual documents Guile version @value{VERSION}.
Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
Inc.
+Copyright (C) 2022 Maxime Devos
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
--
2.30.2
[-- Attachment #1.3: 0001-Avoid-producing-in-locales-with-too-few-characters.patch --]
[-- Type: text/x-patch, Size: 2497 bytes --]
From 1477c30cdf251863ed8eb3e1f1136262a9814130 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Sun, 6 Mar 2022 12:30:17 +0000
Subject: [PATCH 1/2] Avoid producing ? in locales with too few characters.
Previously, if the locale's character encoding could not represent every
character of a translated message, 'gettext' could produce '?' characters
instead. Avoid character encoding concerns by always using UTF-8.
* libguile/gettext.c (scm_gettext): Use scm_to_utf8_string and
scm_from_utf8_string for msgids.
(scm_ngettext): Likewise.
(scm_bindtextdomain): Set the character encoding to UTF-8.
---
libguile/gettext.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
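Note (not part of the patch): a minimal standalone sketch of the libintl calls this change relies on, assuming glibc or GNU libintl (other implementations may behave differently). The helpers `bind_domain_as_utf8` and `lookup_utf8` are hypothetical names, not libguile functions:

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: optionally bind DOMAIN to DIRECTORY (skipped if
   DIRECTORY is NULL), then ask libintl to return the domain's
   translations encoded as UTF-8, independent of the locale's encoding.
   Returns 0 on success, -1 on failure.  */
static int
bind_domain_as_utf8 (const char *domain, const char *directory)
{
  if (directory != NULL && bindtextdomain (domain, directory) == NULL)
    return -1;
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return -1;
  return 0;
}

/* Hypothetical helper: look up MSGID in DOMAIN.  A translation, when one
   exists, is returned in the domain's codeset (UTF-8 here); when none
   exists, libintl returns MSGID itself, without any conversion.  */
static const char *
lookup_utf8 (const char *domain, const char *msgid)
{
  return dgettext (domain, msgid);
}
```

Since the result is then known to be UTF-8, the caller can decode it with a UTF-8 decoder rather than a locale-dependent one, which is what the switch to scm_from_utf8_string in the diff below does.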
diff --git a/libguile/gettext.c b/libguile/gettext.c
index b9af4d313..bf54def7f 100644
--- a/libguile/gettext.c
+++ b/libguile/gettext.c
@@ -1,5 +1,7 @@
/* Copyright 2004,2006,2018
Free Software Foundation, Inc.
+ Copyright 2022
+ Maxime Devos
This file is part of Guile.
@@ -100,7 +102,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
scm_dynwind_begin (0);
- c_msgid = scm_to_locale_string (msgid);
+ c_msgid = scm_to_utf8_string (msgid);
scm_dynwind_free (c_msgid);
if (SCM_UNBNDP (domain))
@@ -133,7 +135,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
if (c_result == c_msgid)
result = msgid;
else
- result = scm_from_locale_string (c_result);
+ result = scm_from_utf8_string (c_result);
scm_dynwind_end ();
return result;
@@ -158,10 +160,10 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
scm_dynwind_begin (0);
- c_msgid = scm_to_locale_string (msgid);
+ c_msgid = scm_to_utf8_string (msgid);
scm_dynwind_free (c_msgid);
- c_msgid_plural = scm_to_locale_string (msgid_plural);
+ c_msgid_plural = scm_to_utf8_string (msgid_plural);
scm_dynwind_free (c_msgid_plural);
c_n = scm_to_ulong (n);
@@ -199,7 +201,7 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
else if (c_result == c_msgid_plural)
result = msgid_plural;
else
- result = scm_from_locale_string (c_result);
+ result = scm_from_utf8_string (c_result);
scm_dynwind_end ();
return result;
@@ -272,6 +274,10 @@ SCM_DEFINE (scm_bindtextdomain, "bindtextdomain", 1, 1, 0,
else
result = SCM_BOOL_F;
+ c_result = bind_textdomain_codeset (c_domain, "UTF-8");
+ if (c_result == NULL)
+ SCM_SYSERROR;
+
scm_dynwind_end ();
return result;
}
base-commit: 24b30130ca75653bdbacea84ce0443608379d630
--
2.30.2
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]