unofficial mirror of guix-devel@gnu.org 
* Frequent locales problems for new users
@ 2020-03-17 20:28 Leo Famulari
  2020-03-18  7:47 ` Efraim Flashner
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Leo Famulari @ 2020-03-17 20:28 UTC (permalink / raw)
  To: guix-devel

Warning! Locales! New users seem to have trouble with Guix locales every
day.

I think we can improve the situation.

First, we can deprecate the glibc-utf8-locales package and not mention
it in the manual section Application Setup. I've seen users think they
had to install it in order to get UTF-8 support. Everyone should be
using glibc-locales. Eventually we can rename it to
'glibc-locales-for-tests', and hide the package too.

Second, we need to make sure that guix-install.sh is setting up
GUIX_LOCPATH correctly. I see that the binary tarball's store includes
glibc-utf8-locales, so it should be possible for things to "just work",
ignoring that it's the wrong locales package. Does anyone know any
particular issues with the installer that would cause trouble?


* Re: Frequent locales problems for new users
  2020-03-17 20:28 Frequent locales problems for new users Leo Famulari
@ 2020-03-18  7:47 ` Efraim Flashner
  2020-03-18  8:12 ` Thorsten Wilms
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Efraim Flashner @ 2020-03-18  7:47 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

On Tue, Mar 17, 2020 at 04:28:43PM -0400, Leo Famulari wrote:
> Warning! Locales! New users seem to have trouble with Guix locales every
> day.
> 
> I think we can improve the situation.
> 
> First, we can deprecate the glibc-utf8-locales package and not mention
> it in the manual section Application Setup. I've seen users think they
> had to install it in order to get UTF-8 support. Everyone should be
> using glibc-locales. Eventually we can rename it to
> 'glibc-locales-for-tests', and hide the package too.
> 
> Second, we need to make sure that guix-install.sh is setting up
> GUIX_LOCPATH correctly. I see that the binary tarball's store includes
> glibc-utf8-locales, so it should be possible for things to "just work",
> ignoring that it's the wrong locales package. Does anyone know any
> particular issues with the installer that would cause trouble?

I haven't set up a new install or helped people with one in a while so
bear with me. IIRC there are two times it's needed, once for the daemon,
and we already added the environment variable to the systemd unit, and
once for each user. I think making it Just Work™ with the daemon would
be really good at a minimum.

-- 
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted


* Re: Frequent locales problems for new users
  2020-03-17 20:28 Frequent locales problems for new users Leo Famulari
  2020-03-18  7:47 ` Efraim Flashner
@ 2020-03-18  8:12 ` Thorsten Wilms
  2020-03-18 16:22   ` Tobias Geerinckx-Rice
  2020-03-18 15:07 ` Ludovic Courtès
  2020-07-01 18:02 ` Vagrant Cascadian
  3 siblings, 1 reply; 12+ messages in thread
From: Thorsten Wilms @ 2020-03-18  8:12 UTC (permalink / raw)
  To: guix-devel

On Tue, 17 Mar 2020 16:28:43 -0400
Leo Famulari <leo@famulari.name> wrote:

> First, we can deprecate the glibc-utf8-locales package and not mention
> it in the manual section Application Setup. I've seen users think they
> had to install it in order to get UTF-8 support. Everyone should be
> using glibc-locales.

I seem to recall reading in the docs and/or in an example that
glibc-utf8-locales is smaller than glibc-locales, but still sufficient
for many cases. Is that wrong?


-- 
Thorsten Wilms <t_w_@freenet.de>


* Re: Frequent locales problems for new users
  2020-03-17 20:28 Frequent locales problems for new users Leo Famulari
  2020-03-18  7:47 ` Efraim Flashner
  2020-03-18  8:12 ` Thorsten Wilms
@ 2020-03-18 15:07 ` Ludovic Courtès
  2020-03-18 18:36   ` Leo Famulari
  2020-07-01 18:02 ` Vagrant Cascadian
  3 siblings, 1 reply; 12+ messages in thread
From: Ludovic Courtès @ 2020-03-18 15:07 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

Hello!

Leo Famulari <leo@famulari.name> wrote:

> Warning! Locales! New users seem to have trouble with Guix locales every
> day.
>
> I think we can improve the situation.
>
> First, we can deprecate the glibc-utf8-locales package and not mention
> it in the manual section Application Setup. I've seen users think they
> had to install it in order to get UTF-8 support. Everyone should be
> using glibc-locales. Eventually we can rename it to
> 'glibc-locales-for-tests', and hide the package too.

Well, we still need to be able to install locales somehow, right?  :-)

> Second, we need to make sure that guix-install.sh is setting up
> GUIX_LOCPATH correctly. I see that the binary tarball's store includes
> glibc-utf8-locales, so it should be possible for things to "just work",
> ignoring that it's the wrong locales package. Does anyone know any
> particular issues with the installer that would cause trouble?

‘guix-command’ in (guix self) creates a ‘guix’ binary where GUIX_LOCPATH
points to ‘glibc-utf8-locales’, always.  That means that ‘guix pull’
returns a ‘guix’ program that works fine, provided you use one of the
locales in ‘glibc-utf8-locales’ *or* you have installed ‘glibc-locales’
and set ‘GUIX_LOCPATH’.

The ‘guix’ binary of the ‘guix’ package does something similar.

These two should already eliminate most problems.  Now, we should
investigate actual problems to see why they show up precisely (for that
we need to see the output of commands, the contents of the .service
file, and so on).  That will allow us to determine the best course of
action.
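
A minimal sketch of the kind of diagnostic output that would help here,
assuming a POSIX shell.  The helper name ‘locale_available’ is
hypothetical and not part of Guix; it also assumes that compiled locales
live under DIR/VERSION/NAME for each DIR listed in GUIX_LOCPATH (for
example lib/locale/2.29/en_US).

```shell
#!/bin/sh
# Hypothetical helper, not part of Guix: print the directory holding the
# compiled data for locale NAME, searching every entry of GUIX_LOCPATH.
# Assumes the DIR/VERSION/NAME layout, e.g. lib/locale/2.29/en_US.utf8.
# The body runs in a subshell so that changing IFS does not leak out.
locale_available() (
    name=$1
    IFS=:
    for dir in ${GUIX_LOCPATH:-}; do
        for candidate in "$dir"/*/"$name"; do
            if [ -d "$candidate" ]; then
                printf '%s\n' "$candidate"
                exit 0
            fi
        done
    done
    exit 1
)

# Settings worth pasting into a bug report.
printf 'GUIX_LOCPATH=%s\n' "${GUIX_LOCPATH:-<unset>}"
printf 'LANG=%s LC_ALL=%s\n' "${LANG:-<unset>}" "${LC_ALL:-<unset>}"
```

With the setup documented in the manual (‘guix install glibc-locales’
followed by ‘export GUIX_LOCPATH=$HOME/.guix-profile/lib/locale’),
‘locale_available en_US.utf8’ should print a directory if that locale is
installed and return non-zero otherwise.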

As for ‘glibc-utf8-locales’ vs. ‘glibc-locales’: the reason for choosing
the former by default over the latter is size (14 MiB vs. 917 MiB).
Perhaps an improvement would be for ‘glibc-utf8-locales’ to be more true
to its name: to include all the UTF-8 locales glibc supports rather than
an arbitrary sample thereof.

Thoughts?

Ludo’.


* Re: Frequent locales problems for new users
  2020-03-18  8:12 ` Thorsten Wilms
@ 2020-03-18 16:22   ` Tobias Geerinckx-Rice
  0 siblings, 0 replies; 12+ messages in thread
From: Tobias Geerinckx-Rice @ 2020-03-18 16:22 UTC (permalink / raw)
  To: guix-devel; +Cc: Leo Famulari, Thorsten Wilms

Ludo', Thorsten, Leo,

Ludovic Courtès wrote:
> Well, we still need to be able to install locales somehow, 
> right?  :-)

This isn't about removing all locale packages, just the 
poorly-named -utf8- variant.

Thorsten Wilms wrote:
> smaller than glibc-locales, but still sufficient for many cases.
> Is that wrong?

Yes and no.  It's sufficient for many (but not most) humans, but 
then that's true for ar_AE as well ;-)

Here's what it contains:

  de_DE.utf8 el_GR.utf8 en_US.utf8 fr_FR.utf8 tr_TR.utf8

Offering this as our only choice of ‘sufficient’ user locales has 
some unpleasant cultural overtones to say the least.

Where it is useful, and apparently does cover the majority of use 
cases, is in test suites &c.  It's a good package for machines. 
Hiding it would make that clear, as we already do with 
tzdata-for-tests.

Ludovic Courtès wrote:
> As for ‘glibc-utf8-locales’ vs. ‘glibc-locales’: the reason for 
> choosing
> the former by default over the latter is size (14 MiB vs. 917 
> MiB).
> Perhaps an improvement would be for ‘glibc-utf8-locales’ to be 
> more true
> to its name: to include all the UTF-8 locales glibc supports 
> rather than
> an arbitrary sample thereof.

That would make it well-named, so good by me!

Kind regards,

T G-R


* Re: Frequent locales problems for new users
  2020-03-18 15:07 ` Ludovic Courtès
@ 2020-03-18 18:36   ` Leo Famulari
  2020-03-21 15:37     ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: Leo Famulari @ 2020-03-18 18:36 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Wed, Mar 18, 2020 at 04:07:22PM +0100, Ludovic Courtès wrote:
> As for ‘glibc-utf8-locales’ vs. ‘glibc-locales’: the reason for choosing
> the former by default over the latter is size (14 MiB vs. 917 MiB).

Oof! I was going by the manual, which says 110 MiB. That does change
things...


* Re: Frequent locales problems for new users
  2020-03-18 18:36   ` Leo Famulari
@ 2020-03-21 15:37     ` Ludovic Courtès
  2020-03-21 18:02       ` Gábor Boskovits
  2020-03-21 19:43       ` Leo Famulari
  0 siblings, 2 replies; 12+ messages in thread
From: Ludovic Courtès @ 2020-03-21 15:37 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

Hi Leo,

Leo Famulari <leo@famulari.name> wrote:

> On Wed, Mar 18, 2020 at 04:07:22PM +0100, Ludovic Courtès wrote:
>> As for ‘glibc-utf8-locales’ vs. ‘glibc-locales’: the reason for choosing
>> the former by default over the latter is size (14 MiB vs. 917 MiB).
>
> Oof! I was going by the manual, which says 110 MiB. That does change
> things...

Yes, I was also surprised.

The patch below produces a package that includes all the UTF-8 locales
(actually I had written that patch long ago, it feels like we’re running
in circles :-)).

It takes ages to build, and when it’s finally done:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build -e '((@@ (gnu packages base) make-glibc-utf8-locales/full))' 
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substituting /gnu/store/jdfs3xvlnj272475yja6bjrprfsgnkdd-glibc-2.29...
downloading from https://ci.guix.gnu.org/nar/lzip/jdfs3xvlnj272475yja6bjrprfsgnkdd-glibc-2.29...
 glibc-2.29  8.2MiB                                                       1.8MiB/s 00:05 [##################] 100.0%

building /gnu/store/w08zi9vnkd7bxpfvm5lgjyb30i7k7sw4-glibc-supported-utf8-locales.scm.drv...
successfully built /gnu/store/w08zi9vnkd7bxpfvm5lgjyb30i7k7sw4-glibc-supported-utf8-locales.scm.drv
building /gnu/store/ps6wh05pwjp5b0l9rh2yglv3sggpgcw4-glibc-utf8-locales-2.29.drv...
successfully built /gnu/store/ps6wh05pwjp5b0l9rh2yglv3sggpgcw4-glibc-utf8-locales-2.29.drv
/gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
ludo@ribbon ~/src/guix$ guix size /gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29 | tail -1
total: 855.7 MiB
--8<---------------cut here---------------end--------------->8---

So I think that’s when we reached the conclusion that we needed
parameterized packages to allow users to choose the locale(s) they need
or special support in ‘guix package’.

:-/

Attached is the list of supported UTF-8 locales, 312 in total.
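
For readers who do not read Scheme, the filtering step of the patch
below can be sketched in shell.  This is only an illustration, not what
Guix runs: the function name ‘utf8_locale_names’ and the ‘NAME/CHARSET’
line format of its input are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME/CHARSET" lines on stdin and print NAME
# for every entry whose charset is UTF-8, dropping a trailing ".UTF-8"
# from the name (the same two steps as the filter in the patch).
utf8_locale_names() {
    while IFS=/ read -r name charset; do
        [ "$charset" = "UTF-8" ] || continue
        printf '%s\n' "${name%.UTF-8}"
    done
}
```

Feeding it the three lines ‘en_US.UTF-8/UTF-8’, ‘en_US/ISO-8859-1’ and
‘ca_ES@valencia/UTF-8’ keeps the first and third entries and prints them
as ‘en_US’ and ‘ca_ES@valencia’.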

Thoughts?  How do other distros deal with this?  Are we missing some
trick to compress locale data?

Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 4435 bytes --]

diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index e8150708c0..98b413da13 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2014, 2019 Andreas Enge <andreas@enge.fr>
 ;;; Copyright © 2012 Nikita Karetnikov <nikita@karetnikov.org>
 ;;; Copyright © 2014, 2015, 2016, 2018 Mark H Weaver <mhw@netris.org>
@@ -52,6 +52,8 @@
   #:use-module (gnu packages python)
   #:use-module (gnu packages gettext)
   #:use-module (guix utils)
+  #:use-module (guix gexp)
+  #:use-module (guix modules)
   #:use-module (guix packages)
   #:use-module (guix download)
   #:use-module (guix git-download)
@@ -61,6 +63,8 @@
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (glibc
+            %default-utf8-locales
+            make-glibc-utf8-locales
             libiconv-if-needed))
 
 ;;; Commentary:
@@ -1076,7 +1080,12 @@ to the @code{share/locale} sub-directory of this package.")
                                         ,(version-major+minor
                                           (package-version glibc)))))))))))
 
-(define-public (make-glibc-utf8-locales glibc)
+(define %default-utf8-locales
+  '("de_DE" "el_GR" "en_US" "fr_FR" "tr_TR"))
+
+(define* (make-glibc-utf8-locales glibc #:optional
+                                  (locales %default-utf8-locales)
+                                  (locale-file #f))
   (package
     (name "glibc-utf8-locales")
     (version (package-version glibc))
@@ -1115,10 +1124,17 @@ to the @code{share/locale} sub-directory of this package.")
 
                                ;; These are the locales commonly used for
                                ;; tests---e.g., in Guile's i18n tests.
-                               '("de_DE" "el_GR" "en_US" "fr_FR" "tr_TR"))
+                               ,(if locale-file
+                                    `(call-with-input-file
+                                         (assoc-ref %build-inputs "locale-file")
+                                       read)
+                                    `',locales))
                      #t))))
     (native-inputs `(("glibc" ,glibc)
-                     ("gzip" ,gzip)))
+                     ("gzip" ,gzip)
+                     ,@(if locale-file
+                           `(("locale-file" ,locale-file))
+                           '())))
     (synopsis "Small sample of UTF-8 locales")
     (description
      "This package provides a small sample of UTF-8 locales mostly useful in
@@ -1145,6 +1161,40 @@ test environments.")
 (define-public glibc-locales-2.27
   (deprecated-package "glibc-locales-2.27" glibc-locales-2.28))
 
+(define (glibc-supported-locales libc)
+  ((module-ref (resolve-interface '(gnu system locale)) ;FIXME: hack
+               'glibc-supported-locales)
+   libc))
+
+(define* (make-glibc-utf8-locales/full #:optional (glibc glibc))
+  (define utf8-locales
+    (computed-file "glibc-supported-utf8-locales.scm"
+                   #~(begin
+                       (use-modules (srfi srfi-1)
+                                    (ice-9 match)
+                                    (ice-9 pretty-print))
+
+                       (define locales
+                         (call-with-input-file
+                             #+(glibc-supported-locales glibc)
+                           read))
+
+                       (define utf8-locales
+                         (filter-map (match-lambda
+                                       ((name . "UTF-8")
+                                        (if (string-suffix? ".UTF-8" name)
+                                            (string-drop-right name 6)
+                                            name))
+                                       (_ #f))
+                                     locales))
+
+                       (call-with-output-file #$output
+                         (lambda (port)
+                           (pretty-print utf8-locales port))))))
+
+  (make-glibc-utf8-locales glibc #:locale-file utf8-locales))
+
+\f
 (define-public which
   (package
     (name "which")

[-- Attachment #3: Type: text/plain, Size: 2962 bytes --]

("aa_DJ"
 "aa_ER"
 "aa_ER@saaho"
 "aa_ET"
 "af_ZA"
 "agr_PE"
 "ak_GH"
 "am_ET"
 "an_ES"
 "anp_IN"
 "ar_AE"
 "ar_BH"
 "ar_DZ"
 "ar_EG"
 "ar_IN"
 "ar_IQ"
 "ar_JO"
 "ar_KW"
 "ar_LB"
 "ar_LY"
 "ar_MA"
 "ar_OM"
 "ar_QA"
 "ar_SA"
 "ar_SD"
 "ar_SS"
 "ar_SY"
 "ar_TN"
 "ar_YE"
 "ayc_PE"
 "az_AZ"
 "az_IR"
 "as_IN"
 "ast_ES"
 "be_BY"
 "be_BY@latin"
 "bem_ZM"
 "ber_DZ"
 "ber_MA"
 "bg_BG"
 "bhb_IN"
 "bho_IN"
 "bho_NP"
 "bi_VU"
 "bn_BD"
 "bn_IN"
 "bo_CN"
 "bo_IN"
 "br_FR"
 "brx_IN"
 "bs_BA"
 "byn_ER"
 "ca_AD"
 "ca_ES"
 "ca_ES@valencia"
 "ca_FR"
 "ca_IT"
 "ce_RU"
 "chr_US"
 "cmn_TW"
 "crh_UA"
 "cs_CZ"
 "csb_PL"
 "cv_RU"
 "cy_GB"
 "da_DK"
 "de_AT"
 "de_BE"
 "de_CH"
 "de_DE"
 "de_IT"
 "de_LI"
 "de_LU"
 "doi_IN"
 "dsb_DE"
 "dv_MV"
 "dz_BT"
 "el_GR"
 "el_CY"
 "en_AG"
 "en_AU"
 "en_BW"
 "en_CA"
 "en_DK"
 "en_GB"
 "en_HK"
 "en_IE"
 "en_IL"
 "en_IN"
 "en_NG"
 "en_NZ"
 "en_PH"
 "en_SC"
 "en_SG"
 "en_US"
 "en_ZA"
 "en_ZM"
 "en_ZW"
 "eo"
 "es_AR"
 "es_BO"
 "es_CL"
 "es_CO"
 "es_CR"
 "es_CU"
 "es_DO"
 "es_EC"
 "es_ES"
 "es_GT"
 "es_HN"
 "es_MX"
 "es_NI"
 "es_PA"
 "es_PE"
 "es_PR"
 "es_PY"
 "es_SV"
 "es_US"
 "es_UY"
 "es_VE"
 "et_EE"
 "eu_ES"
 "fa_IR"
 "ff_SN"
 "fi_FI"
 "fil_PH"
 "fo_FO"
 "fr_BE"
 "fr_CA"
 "fr_CH"
 "fr_FR"
 "fr_LU"
 "fur_IT"
 "fy_NL"
 "fy_DE"
 "ga_IE"
 "gd_GB"
 "gez_ER"
 "gez_ER@abegede"
 "gez_ET"
 "gez_ET@abegede"
 "gl_ES"
 "gu_IN"
 "gv_GB"
 "ha_NG"
 "hak_TW"
 "he_IL"
 "hi_IN"
 "hif_FJ"
 "hne_IN"
 "hr_HR"
 "hsb_DE"
 "ht_HT"
 "hu_HU"
 "hy_AM"
 "ia_FR"
 "id_ID"
 "ig_NG"
 "ik_CA"
 "is_IS"
 "it_CH"
 "it_IT"
 "iu_CA"
 "ja_JP"
 "ka_GE"
 "kab_DZ"
 "kk_KZ"
 "kl_GL"
 "km_KH"
 "kn_IN"
 "ko_KR"
 "kok_IN"
 "ks_IN"
 "ks_IN@devanagari"
 "ku_TR"
 "kw_GB"
 "ky_KG"
 "lb_LU"
 "lg_UG"
 "li_BE"
 "li_NL"
 "lij_IT"
 "ln_CD"
 "lo_LA"
 "lt_LT"
 "lv_LV"
 "lzh_TW"
 "mag_IN"
 "mai_IN"
 "mai_NP"
 "mfe_MU"
 "mg_MG"
 "mhr_RU"
 "mi_NZ"
 "miq_NI"
 "mjw_IN"
 "mk_MK"
 "ml_IN"
 "mn_MN"
 "mni_IN"
 "mr_IN"
 "ms_MY"
 "mt_MT"
 "my_MM"
 "nan_TW"
 "nan_TW@latin"
 "nb_NO"
 "nds_DE"
 "nds_NL"
 "ne_NP"
 "nhn_MX"
 "niu_NU"
 "niu_NZ"
 "nl_AW"
 "nl_BE"
 "nl_NL"
 "nn_NO"
 "nr_ZA"
 "nso_ZA"
 "oc_FR"
 "om_ET"
 "om_KE"
 "or_IN"
 "os_RU"
 "pa_IN"
 "pa_PK"
 "pap_AW"
 "pap_CW"
 "pl_PL"
 "ps_AF"
 "pt_BR"
 "pt_PT"
 "quz_PE"
 "raj_IN"
 "ro_RO"
 "ru_RU"
 "ru_UA"
 "rw_RW"
 "sa_IN"
 "sah_RU"
 "sat_IN"
 "sc_IT"
 "sd_IN"
 "sd_IN@devanagari"
 "se_NO"
 "sgs_LT"
 "shn_MM"
 "shs_CA"
 "si_LK"
 "sid_ET"
 "sk_SK"
 "sl_SI"
 "sm_WS"
 "so_DJ"
 "so_ET"
 "so_KE"
 "so_SO"
 "sq_AL"
 "sq_MK"
 "sr_ME"
 "sr_RS"
 "sr_RS@latin"
 "ss_ZA"
 "st_ZA"
 "sv_FI"
 "sv_SE"
 "sw_KE"
 "sw_TZ"
 "szl_PL"
 "ta_IN"
 "ta_LK"
 "tcy_IN"
 "te_IN"
 "tg_TJ"
 "th_TH"
 "the_NP"
 "ti_ER"
 "ti_ET"
 "tig_ER"
 "tk_TM"
 "tl_PH"
 "tn_ZA"
 "to_TO"
 "tpi_PG"
 "tr_CY"
 "tr_TR"
 "ts_ZA"
 "tt_RU"
 "tt_RU@iqtelif"
 "ug_CN"
 "uk_UA"
 "unm_US"
 "ur_IN"
 "ur_PK"
 "uz_UZ"
 "uz_UZ@cyrillic"
 "ve_ZA"
 "vi_VN"
 "wa_BE"
 "wae_CH"
 "wal_ET"
 "wo_SN"
 "xh_ZA"
 "yi_US"
 "yo_NG"
 "yue_HK"
 "yuw_PG"
 "zh_CN"
 "zh_HK"
 "zh_SG"
 "zh_TW"
 "zu_ZA")


* Re: Frequent locales problems for new users
  2020-03-21 15:37     ` Ludovic Courtès
@ 2020-03-21 18:02       ` Gábor Boskovits
  2020-03-21 19:43       ` Leo Famulari
  1 sibling, 0 replies; 12+ messages in thread
From: Gábor Boskovits @ 2020-03-21 18:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix-devel

Hello,

Ludovic Courtès <ludo@gnu.org> wrote (on Sat, Mar 21, 2020, 16:37):

> Hi Leo,
>
> Leo Famulari <leo@famulari.name> wrote:
>
> > On Wed, Mar 18, 2020 at 04:07:22PM +0100, Ludovic Courtès wrote:
> >> As for ‘glibc-utf8-locales’ vs. ‘glibc-locales’: the reason for choosing
> >> the former by default over the latter is size (14 MiB vs. 917 MiB).
> >
> > Oof! I was going by the manual, which says 110 MiB. That does change
> > things...
>
> Yes, I was also surprised.
>
> The patch below produces a package that includes all the UTF-8 locales
> (actually I had written that patch long ago, it feels like we’re running
> in circles :-)).
>
> It takes ages to build, and when it’s finally done:
>
> --8<---------------cut here---------------start------------->8---
> $ ./pre-inst-env guix build -e '((@@ (gnu packages base) make-glibc-utf8-locales/full))'
> substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
> substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
> substituting /gnu/store/jdfs3xvlnj272475yja6bjrprfsgnkdd-glibc-2.29...
> downloading from https://ci.guix.gnu.org/nar/lzip/jdfs3xvlnj272475yja6bjrprfsgnkdd-glibc-2.29...
>  glibc-2.29  8.2MiB                                                       1.8MiB/s 00:05 [##################] 100.0%
>
> building /gnu/store/w08zi9vnkd7bxpfvm5lgjyb30i7k7sw4-glibc-supported-utf8-locales.scm.drv...
> successfully built /gnu/store/w08zi9vnkd7bxpfvm5lgjyb30i7k7sw4-glibc-supported-utf8-locales.scm.drv
> building /gnu/store/ps6wh05pwjp5b0l9rh2yglv3sggpgcw4-glibc-utf8-locales-2.29.drv...
> successfully built /gnu/store/ps6wh05pwjp5b0l9rh2yglv3sggpgcw4-glibc-utf8-locales-2.29.drv
> /gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
> ludo@ribbon ~/src/guix$ guix size /gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29 | tail -1
> total: 855.7 MiB
> --8<---------------cut here---------------end--------------->8---
>
> So I think that’s when we reached the conclusion that we needed
> parameterized packages to allow users to choose the locale(s) they need
> or special support in ‘guix package’.
>

I believe we could also ship individual locales as separate outputs. Then
we just have to make sure that they end up in LOCPATH. I believe we could
do this for the frequently used locales, and direct users to install the
default ‘out’ output only when they don't find an output for their locale.
WDYT?

>
> :-/
>
> Attached is the list of supported UTF-8 locales, 312 in total.
>
> Thoughts?  How do other distros deal with this?  Are we missing some
> trick to compress locale data?
>
> Ludo’.
>
g_bor



* Re: Frequent locales problems for new users
  2020-03-21 15:37     ` Ludovic Courtès
  2020-03-21 18:02       ` Gábor Boskovits
@ 2020-03-21 19:43       ` Leo Famulari
  2020-03-21 20:14         ` Leo Famulari
  2020-03-26 12:06         ` Ludovic Courtès
  1 sibling, 2 replies; 12+ messages in thread
From: Leo Famulari @ 2020-03-21 19:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Sat, Mar 21, 2020 at 04:37:05PM +0100, Ludovic Courtès wrote:
> Thoughts?  How do other distros deal with this?  Are we missing some
> trick to compress locale data?

I noticed that the glibc-locales download is 10.8 MiB. On disk, the
store item is ~220 MiB. I'm not sure how guix size arrives at 917 MiB.

Debian Buster's (stable) locales directory contains 357 entries, ours
contains 645. Debian's locales directory is ~13 MiB, ours ~220 MiB.

I poked around a bit. Debian achieves the smaller size by referring to
data rather than copying it around, and only including delta changes.
Hopefully we can copy their technique.

Our locales are collections of binary files in a directory, like this...

------
en_US
├── LC_ADDRESS
├── LC_COLLATE
├── LC_CTYPE
├── LC_IDENTIFICATION
├── LC_MEASUREMENT
├── LC_MESSAGES
│   └── SYS_LC_MESSAGES
├── LC_MONETARY
├── LC_NAME
├── LC_NUMERIC
├── LC_PAPER
├── LC_TELEPHONE
└── LC_TIME
------

... while Debian concatenates the files together as text.

I compared the en_US directory from both places, and it seems that
LC_CTYPE is re-used from en_GB with delta patching:

------
$ /gnu/store/03nvilh2x4z07dxv7h13gh986vvgpnsf-glibc-locales-2.29/lib/locale/2.29/en_US% du -sh *
4.0K	LC_ADDRESS
24K	LC_COLLATE
284K	LC_CTYPE                  <--- the big one
4.0K	LC_IDENTIFICATION
4.0K	LC_MEASUREMENT
8.0K	LC_MESSAGES
4.0K	LC_MONETARY
4.0K	LC_NAME
4.0K	LC_NUMERIC
4.0K	LC_PAPER
4.0K	LC_TELEPHONE
4.0K	LC_TIME
$ cat /usr/share/i18n/locales/en_US
[...]
LC_CTYPE
copy "en_GB"
END LC_CTYPE
[...]
$ cat /usr/share/i18n/locales/en_GB
[...]
LC_CTYPE
copy "i18n"

translit_start
include "translit_combining";""
translit_end
END LC_CTYPE
[...]
$ du -sh /usr/share/i18n/locales/i18n_ctype
160K	/usr/share/i18n/locales/i18n_ctype
------

Another example, more obscure:

------
$ /gnu/store/03nvilh2x4z07dxv7h13gh986vvgpnsf-glibc-locales-2.29/lib/locale/2.29/te_IN% du -sh *
4.0K	LC_ADDRESS
2.5M	LC_COLLATE              <--- Yikes
332K	LC_CTYPE                <--- Still big
4.0K	LC_IDENTIFICATION
4.0K	LC_MEASUREMENT
8.0K	LC_MESSAGES
4.0K	LC_MONETARY
4.0K	LC_NAME
4.0K	LC_NUMERIC
4.0K	LC_PAPER
4.0K	LC_TELEPHONE
8.0K	LC_TIME
$ cat /usr/share/i18n/locales/te_IN
[...]
LC_CTYPE
copy "i18n"

% Telugu uses the alternate digits U+0C66..U+0C6F
outdigit <U0C66>..<U0C6F>

% This is used in the scanf family of functions to read Telugu numbers
% using "%Id" and such.
map to_inpunct; /
  (<U0030>,<U0C66>); /
  (<U0031>,<U0C67>); /
  (<U0032>,<U0C68>); /
  (<U0033>,<U0C69>); /
  (<U0034>,<U0C6A>); /
  (<U0035>,<U0C6B>); /
  (<U0036>,<U0C6C>); /
  (<U0037>,<U0C6D>); /
  (<U0038>,<U0C6E>); /
  (<U0039>,<U0C6F>);

translit_start
include  "translit_combining";""
translit_end
END LC_CTYPE

LC_COLLATE

% Copy the template from ISO/IEC 14651
copy "iso14651_t1"

END LC_COLLATE
------


* Re: Frequent locales problems for new users
  2020-03-21 19:43       ` Leo Famulari
@ 2020-03-21 20:14         ` Leo Famulari
  2020-03-26 12:06         ` Ludovic Courtès
  1 sibling, 0 replies; 12+ messages in thread
From: Leo Famulari @ 2020-03-21 20:14 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Sat, Mar 21, 2020 at 03:43:32PM -0400, Leo Famulari wrote:
> I poked around a bit. Debian achieves the smaller size by referring to
> data rather than copying it around, and only including delta changes.
> Hopefully we can copy their technique.

We are discussing it on #guix. I think that Debian just packages the
sources and then builds only what is requested by users, so my
comparison may be moot.


* Re: Frequent locales problems for new users
  2020-03-21 19:43       ` Leo Famulari
  2020-03-21 20:14         ` Leo Famulari
@ 2020-03-26 12:06         ` Ludovic Courtès
  1 sibling, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2020-03-26 12:06 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

Hi,

Leo Famulari <leo@famulari.name> wrote:

> On Sat, Mar 21, 2020 at 04:37:05PM +0100, Ludovic Courtès wrote:
>> Thoughts?  How do other distros deal with this?  Are we missing some
>> trick to compress locale data?
>
> I noticed that downloading glibc-locales, it's 10.8 MiB. On disk, the
> store item is ~220 MiB. I'm not sure how guix size calculates 917 MiB.

Oh, this is due to hard links: nars don’t support hard links, so the
same thing is repeated several times.

--8<---------------cut here---------------start------------->8---
$ guix archive --export glibc-locales |wc -c
961328272
$ du -hsl $(guix build glibc-locales)
939M    /gnu/store/03nvilh2x4z07dxv7h13gh986vvgpnsf-glibc-locales-2.29
$ du -hs $(guix build glibc-locales)
220M    /gnu/store/03nvilh2x4z07dxv7h13gh986vvgpnsf-glibc-locales-2.29
--8<---------------cut here---------------end--------------->8---

(It does mean that we should replace hard links with symlinks, like we
do for ‘git’.)
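
The effect is easy to reproduce on a small scale.  The following is only
an illustrative sketch, assuming a ‘du’ that supports ‘-l’ (count hard
links every time they are seen), as the GNU one does; the file names are
made up for the example.

```shell
#!/bin/sh
# Illustration only: one 1 MiB file plus two extra hard links to it,
# measured with and without "du -l".
demo=$(mktemp -d)
head -c 1048576 /dev/urandom > "$demo/LC_CTYPE"
ln "$demo/LC_CTYPE" "$demo/link-1"
ln "$demo/LC_CTYPE" "$demo/link-2"

deduped=$(du -sk "$demo" | cut -f1)    # each inode counted once
counted=$(du -skl "$demo" | cut -f1)   # every hard link counted again

printf 'du -sk:  %s KiB\n' "$deduped"
printf 'du -skl: %s KiB\n' "$counted"
rm -rf "$demo"
```

The second figure should come out at roughly three times the first,
which is the same kind of gap as between 220M and 939M above: a format
that stores each link as a full copy pays the ‘-l’ price.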

Doing that with the full set of UTF-8 locales I mentioned in my previous
message, I see:

--8<---------------cut here---------------start------------->8---
$ du -hsl /gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
870M	/gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
$ du -hs /gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
193M	/gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
--8<---------------cut here---------------end--------------->8---

To compare to:

--8<---------------cut here---------------start------------->8---
$ du -hs $(guix build glibc-utf8-locales)
6.1M	/gnu/store/n79cf8bvy3k96gjk1rf18d36w40lkwlr-glibc-utf8-locales-2.29
$ du -hsl $(guix build glibc-utf8-locales)
15M	/gnu/store/n79cf8bvy3k96gjk1rf18d36w40lkwlr-glibc-utf8-locales-2.29
--8<---------------cut here---------------end--------------->8---

Thanks,
Ludo’.


* Re: Frequent locales problems for new users
  2020-03-17 20:28 Frequent locales problems for new users Leo Famulari
                   ` (2 preceding siblings ...)
  2020-03-18 15:07 ` Ludovic Courtès
@ 2020-07-01 18:02 ` Vagrant Cascadian
  3 siblings, 0 replies; 12+ messages in thread
From: Vagrant Cascadian @ 2020-07-01 18:02 UTC (permalink / raw)
  To: Leo Famulari, guix-devel

On 2020-03-17, Leo Famulari wrote:
> Warning! Locales! New users seem to have trouble with Guix locales every
> day.
>
> I think we can improve the situation.
>
> First, we can deprecate the glibc-utf8-locales package and not mention
> it in the manual section Application Setup. I've seen users think they
> had to install it in order to get UTF-8 support. Everyone should be
> using glibc-locales. Eventually we can rename it to
> 'glibc-locales-for-tests', and hide the package too.
>
> Second, we need to make sure that guix-install.sh is setting up
> GUIX_LOCPATH correctly. I see that the binary tarball's store includes
> glibc-utf8-locales, so it should be possible for things to "just work",
> ignoring that it's the wrong locales package. Does anyone know any
> particular issues with the installer that would cause trouble?

I neglected to chime in way back when, but on IRC the other day issues
around locales came up and I wondered ...

Any compelling reason not to put each locale into its own package
and/or output?

You could have meta-packages which pull in specific sets:
"glibc-locales-es", which pulls in all Spanish locales, or
"glibc-locales-all" or "glibc-locales-all-utf8", which pull in
everything. Or some other semi-logical splitting.

That way users could install exactly the locales they want. The
installer could offer the choice and install only the specific locales,
or sets of locales, that were selected.
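
The grouping behind a set such as "glibc-locales-es" could be sketched
like this.  It is only an illustration in shell: the helper name
‘locales_for_language’ is hypothetical, and it assumes that a locale
belongs to a language when its name is the language code itself or the
code followed by "_" or "@".

```shell
#!/bin/sh
# Hypothetical helper: print those locale names among the arguments that
# belong to the language code given as the first argument.
locales_for_language() {
    lang_code=$1
    shift
    for name in "$@"; do
        case $name in
            "$lang_code" | "$lang_code"_* | "$lang_code"@*)
                printf '%s\n' "$name"
                ;;
        esac
    done
}
```

For example, ‘locales_for_language es es_AR en_US es_ES eo’ selects
es_AR and es_ES and leaves en_US and eo out.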


live well,
  vagrant


Thread overview: 12+ messages
2020-03-17 20:28 Frequent locales problems for new users Leo Famulari
2020-03-18  7:47 ` Efraim Flashner
2020-03-18  8:12 ` Thorsten Wilms
2020-03-18 16:22   ` Tobias Geerinckx-Rice
2020-03-18 15:07 ` Ludovic Courtès
2020-03-18 18:36   ` Leo Famulari
2020-03-21 15:37     ` Ludovic Courtès
2020-03-21 18:02       ` Gábor Boskovits
2020-03-21 19:43       ` Leo Famulari
2020-03-21 20:14         ` Leo Famulari
2020-03-26 12:06         ` Ludovic Courtès
2020-07-01 18:02 ` Vagrant Cascadian

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
