unofficial mirror of bug-guile@gnu.org 
* bug#54272: Avoid producing ? in locales with too few characters.
From: Maxime Devos @ 2022-03-06 15:14 UTC
  To: 54272


[-- Attachment #1.1: Type: text/plain, Size: 43 bytes --]

This seems difficult to test.  Any ideas?

[-- Attachment #1.2: 0002-Deprecate-non-functional-bind-textdomain-codeset.patch --]
[-- Type: text/x-patch, Size: 3649 bytes --]

From 8f9a8e56cc3e8496fdfed3f889cff9fca19b3663 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Sun, 6 Mar 2022 12:51:33 +0000
Subject: [PATCH 2/2] Deprecate non-functional bind-textdomain-codeset.

TODO: this only deprecates it in the documentation; it needs
to be deprecated elsewhere as well.

* doc/ref/api-i18n.texi (bind-textdomain-codeset): Update documentation.
* doc/ref/guile.texi: Update copyright information.
---
 doc/ref/api-i18n.texi | 36 ++++++++++++------------------------
 doc/ref/guile.texi    |  1 +
 2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/doc/ref/api-i18n.texi b/doc/ref/api-i18n.texi
index 7c49b0a23..c06b75996 100644
--- a/doc/ref/api-i18n.texi
+++ b/doc/ref/api-i18n.texi
@@ -2,6 +2,7 @@
 @c This is part of the GNU Guile Reference Manual.
 @c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
 @c   2009, 2010, 2017 Free Software Foundation, Inc.
+@c Copyright (C)  2022 Maxime Devos
 @c See the file guile.texi for copying conditions.
 
 @node Internationalization
@@ -599,33 +600,20 @@ non-standard location.
 
 @deffn {Scheme Procedure} bind-textdomain-codeset domain [encoding]
 @deffnx {C Function} scm_bind_textdomain_codeset (domain, encoding)
-Get or set the text encoding to be used by @code{gettext} for messages
-from @var{domain}.  @var{encoding} is a string, the name of a coding
-system, for instance @nicode{"8859_1"}.  (On a Unix/POSIX system the
-@command{iconv} program can list all available encodings.)
 
-When called without an @var{encoding} the current setting is returned,
-or @code{#f} if none yet set.  When called with an @var{encoding}, it
-is set for @var{domain} and that new setting returned.  For example,
+This is a historical procedure, used for getting and setting the text
+encoding used by @code{gettext} for messages from @var{domain},
+preserved for compatibility.
 
-@example
-(bind-textdomain-codeset "myprog")
-@result{} #f
-(bind-textdomain-codeset "myprog" "latin-9")
-@result{} "latin-9"
-@end example
+This procedure has been useless ever since Guile's strings began to
+consist of characters instead of individual bytes, especially because
+the @code{gettext} procedure always used the locale encoding instead
+of the encoding of the text domain.
 
-The encoding requested can be different from the translated data file,
-messages will be recoded as necessary.  But note that when there is no
-translation, @code{gettext} returns its @var{msg} unchanged, ie.@:
-without any recoding.  For that reason source message strings are best
-as plain ASCII.
-
-Currently Guile has no understanding of multi-byte characters, and
-string functions won't recognise character boundaries in multi-byte
-strings.  An application will at least be able to pass such strings
-through to some output though.  Perhaps this will change in the
-future.
+If you use @code{gettext} both in C and Guile code, be aware that Guile
+always assumes the UTF-8 encoding and sets this encoding when Guile's
+@code{bindtextdomain} is called.  If the C code expects a different
+encoding, then it needs to operate on a separate domain.
 @end deffn
 
 @c Local Variables:
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 660b1ae90..5b56145ca 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -15,6 +15,7 @@ This manual documents Guile version @value{VERSION}.
 
 Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
 Inc.
+Copyright (C) 2022 Maxime Devos
 
 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
-- 
2.30.2
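
The documentation text added by the patch above advises C code that expects a non-UTF-8 encoding to operate on a separate domain. A minimal sketch of what that could look like on the C side follows. The helper names `bind_domain_with_codeset` and `domain_codeset_is`, the domain name "myprog-c" and the directory "/usr/share/locale" are made up for illustration; only `bindtextdomain` and `bind_textdomain_codeset` come from libintl. It assumes a libintl that records the requested codeset and reports it back, as GNU libc does; other implementations may behave differently.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bind DOMAIN to DIRECTORY and request CODESET for
   messages from that domain.  Returns 0 on success and -1 if either
   libintl call reports failure by returning NULL.  */
int
bind_domain_with_codeset (const char *domain, const char *directory,
                          const char *codeset)
{
  assert (domain != NULL && directory != NULL && codeset != NULL);

  if (bindtextdomain (domain, directory) == NULL)
    return -1;
  if (bind_textdomain_codeset (domain, codeset) == NULL)
    return -1;
  return 0;
}

/* Hypothetical helper: return 1 if DOMAIN currently has CODESET
   requested, else 0.  Passing NULL as the codeset only queries the
   current setting.  */
int
domain_codeset_is (const char *domain, const char *codeset)
{
  const char *current = bind_textdomain_codeset (domain, NULL);

  return current != NULL && strcmp (current, codeset) == 0;
}
```

A C component could call `bind_domain_with_codeset ("myprog-c", "/usr/share/locale", "ISO-8859-1")` once at start-up and leave the domain used from Scheme code alone, so that the UTF-8 setting made by Guile's `bindtextdomain` does not affect it.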


[-- Attachment #1.3: 0001-Avoid-producing-in-locales-with-too-few-characters.patch --]
[-- Type: text/x-patch, Size: 2497 bytes --]

From 1477c30cdf251863ed8eb3e1f1136262a9814130 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Sun, 6 Mar 2022 12:30:17 +0000
Subject: [PATCH 1/2] Avoid producing ? in locales with too few characters.

Previously, if the locale's character encoding could not represent
all the characters needed, 'gettext' could produce '?' characters.
Avoid character encoding concerns by always using UTF-8.

* libguile/gettext.c (scm_gettext): Use scm_to_utf8_string and
  scm_from_utf8_string for msgids.
  (scm_ngettext): Likewise.
  (scm_bindtextdomain): Set the character encoding to UTF-8.
---
 libguile/gettext.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/libguile/gettext.c b/libguile/gettext.c
index b9af4d313..bf54def7f 100644
--- a/libguile/gettext.c
+++ b/libguile/gettext.c
@@ -1,5 +1,7 @@
 /* Copyright 2004,2006,2018
      Free Software Foundation, Inc.
+   Copyright 2022
+     Maxime Devos
 
    This file is part of Guile.
 
@@ -100,7 +102,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
 
   scm_dynwind_begin (0);
 
-  c_msgid = scm_to_locale_string (msgid);
+  c_msgid = scm_to_utf8_string (msgid);
   scm_dynwind_free (c_msgid);
 
   if (SCM_UNBNDP (domain))
@@ -133,7 +135,7 @@ SCM_DEFINE (scm_gettext, "gettext", 1, 2, 0,
   if (c_result == c_msgid)
     result = msgid;
   else
-    result = scm_from_locale_string (c_result);
+    result = scm_from_utf8_string (c_result);
 
   scm_dynwind_end ();
   return result;
@@ -158,10 +160,10 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
 
   scm_dynwind_begin (0);
 
-  c_msgid = scm_to_locale_string (msgid);
+  c_msgid = scm_to_utf8_string (msgid);
   scm_dynwind_free (c_msgid);
 
-  c_msgid_plural = scm_to_locale_string (msgid_plural);
+  c_msgid_plural = scm_to_utf8_string (msgid_plural);
   scm_dynwind_free (c_msgid_plural);
 
   c_n = scm_to_ulong (n);
@@ -199,7 +201,7 @@ SCM_DEFINE (scm_ngettext, "ngettext", 3, 2, 0,
   else if (c_result == c_msgid_plural)
     result = msgid_plural;
   else
-    result = scm_from_locale_string (c_result);
+    result = scm_from_utf8_string (c_result);
   
   scm_dynwind_end ();
   return result;
@@ -272,6 +274,10 @@ SCM_DEFINE (scm_bindtextdomain, "bindtextdomain", 1, 1, 0,
   else
     result = SCM_BOOL_F;
 
+  c_result = bind_textdomain_codeset (c_domain, "UTF-8");
+  if (c_result == NULL)
+    SCM_SYSERROR;
+
   scm_dynwind_end ();
   return result;
 }

base-commit: 24b30130ca75653bdbacea84ce0443608379d630
-- 
2.30.2
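
To illustrate the failure mode that the commit message above describes, here is a toy model in C of a lossy conversion into an encoding with too few characters. It is not how gettext or Guile performs the conversion; the function name `lossy_utf8_to_ascii` and the choice of ASCII as the target encoding are assumptions made for the example. It only shows why text that cannot be represented ends up as '?', which cannot happen once both sides agree on UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: copy the UTF-8 string IN to OUT as ASCII, writing one
   '?' for every character that ASCII cannot represent.  OUT is always
   NUL-terminated.  Returns the number of '?' substitutions made.  */
size_t
lossy_utf8_to_ascii (const char *in, char *out, size_t out_size)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t replaced = 0;
  size_t o = 0;

  assert (in != NULL && out != NULL && out_size > 0);

  while (*p != '\0' && o + 1 < out_size)
    {
      if (*p < 0x80)
        out[o++] = (char) *p++;        /* ASCII byte: copy unchanged */
      else
        {
          out[o++] = '?';              /* not representable in ASCII */
          replaced++;
          p++;                         /* skip the lead byte ... */
          while ((*p & 0xC0) == 0x80)  /* ... and its continuation bytes */
            p++;
        }
    }

  out[o] = '\0';
  return replaced;
}
```

With a 16-byte buffer, passing the UTF-8 bytes of "naïve" (written `"na\xc3\xafve"` in C) leaves "na?ve" in the buffer and returns 1, while a pure ASCII input is copied unchanged and returns 0.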


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]
