From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ludovic Courtès
Subject: Re: Frequent locales problems for new users
Date: Sat, 21 Mar 2020 16:37:05 +0100
Message-ID: <87pnd51zz2.fsf@gnu.org>
References: <20200317202843.GA18844@jasmine.lan> <87eetp8zx1.fsf@gnu.org> <20200318183622.GA25087@jasmine.lan>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Received: from eggs.gnu.org ([2001:470:142:3::10]:40997) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jFgBe-0004N6-Ef for guix-devel@gnu.org; Sat, 21 Mar 2020 11:37:24 -0400
In-Reply-To: <20200318183622.GA25087@jasmine.lan> (Leo Famulari's message of "Wed, 18 Mar 2020 14:36:22 -0400")
List-Id: "Development of GNU Guix and the GNU System distribution."
Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane-mx.org@gnu.org
Sender: "Guix-devel"
To: Leo Famulari
Cc: guix-devel@gnu.org

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hi Leo,

Leo Famulari skribis:

> On Wed, Mar 18, 2020 at 04:07:22PM +0100, Ludovic Courtès wrote:
>> As for ‘glibc-utf8-locales’ vs. ‘glibc-locales’: the reason for choosing
>> the former by default over the latter is size (14 MiB vs. 917 MiB).
>
> Oof! I was going by the manual, which says 110 MiB. That does change
> things...

Yes, I was also surprised.

The patch below produces a package that includes all the UTF-8 locales
(actually I had written that patch long ago, it feels like we’re running
in circles :-)).  It takes ages to build, and when it’s finally done:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build -e '((@@ (gnu packages base) make-glibc-utf8-locales/full))'
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substituting /gnu/store/jdfs3xvlnj272475yja6bjrprfsgnkdd-glibc-2.29...
downloading from https://ci.guix.gnu.org/nar/lzip/jdfs3xvlnj272475yja6bjrprfsgnkdd-glibc-2.29...
 glibc-2.29  8.2MiB    1.8MiB/s 00:05 [##################] 100.0%

building /gnu/store/w08zi9vnkd7bxpfvm5lgjyb30i7k7sw4-glibc-supported-utf8-locales.scm.drv...
successfully built /gnu/store/w08zi9vnkd7bxpfvm5lgjyb30i7k7sw4-glibc-supported-utf8-locales.scm.drv
building /gnu/store/ps6wh05pwjp5b0l9rh2yglv3sggpgcw4-glibc-utf8-locales-2.29.drv...
successfully built /gnu/store/ps6wh05pwjp5b0l9rh2yglv3sggpgcw4-glibc-utf8-locales-2.29.drv
/gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29
ludo@ribbon ~/src/guix$ guix size /gnu/store/p0knl9ggxk91x87ww702g2x78jxy1vgf-glibc-utf8-locales-2.29 | tail -1
total: 855.7 MiB
--8<---------------cut here---------------end--------------->8---

So I think that’s when we reached the conclusion that we needed
parameterized packages to allow users to choose the locale(s) they need
or special support in ‘guix package’.  :-/

Attached is the list of supported UTF-8 locales, 312 in total.

Thoughts?  How do other distros deal with this?  Are we missing some
trick to compress locale data?

Ludo’.

--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index e8150708c0..98b413da13 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès
+;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès
 ;;; Copyright © 2014, 2019 Andreas Enge
 ;;; Copyright © 2012 Nikita Karetnikov
 ;;; Copyright © 2014, 2015, 2016, 2018 Mark H Weaver
@@ -52,6 +52,8 @@
   #:use-module (gnu packages python)
   #:use-module (gnu packages gettext)
   #:use-module (guix utils)
+  #:use-module (guix gexp)
+  #:use-module (guix modules)
   #:use-module (guix packages)
   #:use-module (guix download)
   #:use-module (guix git-download)
@@ -61,6 +63,8 @@
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (glibc
+            %default-utf8-locales
+            make-glibc-utf8-locales
             libiconv-if-needed))
 
 ;;; Commentary:
@@ -1076,7 +1080,12 @@ to the @code{share/locale} sub-directory of this package.")
                                 ,(version-major+minor
                                   (package-version glibc)))))))))))
 
-(define-public (make-glibc-utf8-locales glibc)
+(define %default-utf8-locales
+  '("de_DE" "el_GR" "en_US" "fr_FR" "tr_TR"))
+
+(define* (make-glibc-utf8-locales glibc #:optional
+                                  (locales %default-utf8-locales)
+                                  (locale-file #f))
   (package
     (name "glibc-utf8-locales")
     (version (package-version glibc))
@@ -1115,10 +1124,17 @@ to the @code{share/locale} sub-directory of this package.")
 
                                ;; These are the locales commonly used for
                                ;; tests---e.g., in Guile's i18n tests.
-                               '("de_DE" "el_GR" "en_US" "fr_FR" "tr_TR"))
+                               ,(if locale-file
+                                    `(call-with-input-file
+                                         (assoc-ref %build-inputs "locale-file")
+                                       read)
+                                    `',locales))
                      #t))))
     (native-inputs `(("glibc" ,glibc)
-                     ("gzip" ,gzip)))
+                     ("gzip" ,gzip)
+                     ,@(if locale-file
+                           `(("locale-file" ,locale-file))
+                           '())))
     (synopsis "Small sample of UTF-8 locales")
     (description
      "This package provides a small sample of UTF-8 locales mostly useful in
@@ -1145,6 +1161,40 @@ test environments.")
 (define-public glibc-locales-2.27
   (deprecated-package "glibc-locales-2.27" glibc-locales-2.28))
 
+(define (glibc-supported-locales libc)
+  ((module-ref (resolve-interface '(gnu system locale)) ;FIXME: hack
+               'glibc-supported-locales)
+   libc))
+
+(define* (make-glibc-utf8-locales/full #:optional (glibc glibc))
+  (define utf8-locales
+    (computed-file "glibc-supported-utf8-locales.scm"
+                   #~(begin
+                       (use-modules (srfi srfi-1)
+                                    (ice-9 match)
+                                    (ice-9 pretty-print))
+
+                       (define locales
+                         (call-with-input-file
+                             #+(glibc-supported-locales glibc)
+                           read))
+
+                       (define utf8-locales
+                         (filter-map (match-lambda
+                                       ((name . "UTF-8")
+                                        (if (string-suffix? ".UTF-8" name)
+                                            (string-drop-right name 6)
+                                            name))
+                                       (_ #f))
+                                     locales))
+
+                       (call-with-output-file #$output
+                         (lambda (port)
+                           (pretty-print utf8-locales port))))))
+
+  (make-glibc-utf8-locales glibc #:locale-file utf8-locales))
+
+
 (define-public which
   (package
     (name "which")

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment

("aa_DJ" "aa_ER" "aa_ER@saaho" "aa_ET" "af_ZA" "agr_PE" "ak_GH" "am_ET"
 "an_ES" "anp_IN" "ar_AE" "ar_BH" "ar_DZ" "ar_EG" "ar_IN" "ar_IQ"
 "ar_JO" "ar_KW" "ar_LB" "ar_LY" "ar_MA" "ar_OM" "ar_QA" "ar_SA"
 "ar_SD" "ar_SS" "ar_SY" "ar_TN" "ar_YE" "ayc_PE" "az_AZ" "az_IR"
 "as_IN" "ast_ES" "be_BY" "be_BY@latin" "bem_ZM" "ber_DZ" "ber_MA"
 "bg_BG" "bhb_IN" "bho_IN" "bho_NP" "bi_VU" "bn_BD" "bn_IN" "bo_CN"
 "bo_IN" "br_FR" "brx_IN" "bs_BA" "byn_ER" "ca_AD" "ca_ES"
 "ca_ES@valencia" "ca_FR" "ca_IT" "ce_RU" "chr_US" "cmn_TW" "crh_UA"
 "cs_CZ" "csb_PL" "cv_RU" "cy_GB" "da_DK" "de_AT" "de_BE" "de_CH"
 "de_DE" "de_IT" "de_LI" "de_LU" "doi_IN" "dsb_DE" "dv_MV" "dz_BT"
 "el_GR" "el_CY" "en_AG" "en_AU" "en_BW" "en_CA" "en_DK" "en_GB"
 "en_HK" "en_IE" "en_IL" "en_IN" "en_NG" "en_NZ" "en_PH" "en_SC"
 "en_SG" "en_US" "en_ZA" "en_ZM" "en_ZW" "eo" "es_AR" "es_BO" "es_CL"
 "es_CO" "es_CR" "es_CU" "es_DO" "es_EC" "es_ES" "es_GT" "es_HN"
 "es_MX" "es_NI" "es_PA" "es_PE" "es_PR" "es_PY" "es_SV" "es_US"
 "es_UY" "es_VE" "et_EE" "eu_ES" "fa_IR" "ff_SN" "fi_FI" "fil_PH"
 "fo_FO" "fr_BE" "fr_CA" "fr_CH" "fr_FR" "fr_LU" "fur_IT" "fy_NL"
 "fy_DE" "ga_IE" "gd_GB" "gez_ER" "gez_ER@abegede" "gez_ET"
 "gez_ET@abegede" "gl_ES" "gu_IN" "gv_GB" "ha_NG" "hak_TW" "he_IL"
 "hi_IN" "hif_FJ" "hne_IN" "hr_HR" "hsb_DE" "ht_HT" "hu_HU" "hy_AM"
 "ia_FR" "id_ID" "ig_NG" "ik_CA" "is_IS" "it_CH" "it_IT" "iu_CA"
 "ja_JP" "ka_GE" "kab_DZ" "kk_KZ" "kl_GL" "km_KH" "kn_IN" "ko_KR"
 "kok_IN" "ks_IN" "ks_IN@devanagari" "ku_TR" "kw_GB" "ky_KG" "lb_LU"
 "lg_UG" "li_BE" "li_NL" "lij_IT" "ln_CD" "lo_LA" "lt_LT" "lv_LV"
 "lzh_TW" "mag_IN" "mai_IN" "mai_NP" "mfe_MU" "mg_MG" "mhr_RU" "mi_NZ"
 "miq_NI" "mjw_IN" "mk_MK" "ml_IN" "mn_MN" "mni_IN" "mr_IN" "ms_MY"
 "mt_MT" "my_MM" "nan_TW" "nan_TW@latin" "nb_NO" "nds_DE" "nds_NL"
 "ne_NP" "nhn_MX" "niu_NU" "niu_NZ" "nl_AW" "nl_BE" "nl_NL" "nn_NO"
 "nr_ZA" "nso_ZA" "oc_FR" "om_ET" "om_KE" "or_IN" "os_RU" "pa_IN"
 "pa_PK" "pap_AW" "pap_CW" "pl_PL" "ps_AF" "pt_BR" "pt_PT" "quz_PE"
 "raj_IN" "ro_RO" "ru_RU" "ru_UA" "rw_RW" "sa_IN" "sah_RU" "sat_IN"
 "sc_IT" "sd_IN" "sd_IN@devanagari" "se_NO" "sgs_LT" "shn_MM" "shs_CA"
 "si_LK" "sid_ET" "sk_SK" "sl_SI" "sm_WS" "so_DJ" "so_ET" "so_KE"
 "so_SO" "sq_AL" "sq_MK" "sr_ME" "sr_RS" "sr_RS@latin" "ss_ZA" "st_ZA"
 "sv_FI" "sv_SE" "sw_KE" "sw_TZ" "szl_PL" "ta_IN" "ta_LK" "tcy_IN"
 "te_IN" "tg_TJ" "th_TH" "the_NP" "ti_ER" "ti_ET" "tig_ER" "tk_TM"
 "tl_PH" "tn_ZA" "to_TO" "tpi_PG" "tr_CY" "tr_TR" "ts_ZA" "tt_RU"
 "tt_RU@iqtelif" "ug_CN" "uk_UA" "unm_US" "ur_IN" "ur_PK" "uz_UZ"
 "uz_UZ@cyrillic" "ve_ZA" "vi_VN" "wa_BE" "wae_CH" "wal_ET" "wo_SN"
 "xh_ZA" "yi_US" "yo_NG" "yue_HK" "yuw_PG" "zh_CN" "zh_HK" "zh_SG"
 "zh_TW" "zu_ZA")

--=-=-=--
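
For readers who want to try the locale filtering from the patch outside
of a build, here is a minimal standalone sketch of the same
‘filter-map’ step in plain Guile.  It assumes, as the ‘match-lambda’
pattern in the patch suggests, that the supported-locales file reads as
a list of (NAME . CHARSET) pairs.  The names ‘utf8-locale-names’ and
‘sample-locales’, and the sample data itself, are made up for
illustration and are not part of the patch.

```scheme
(use-modules (srfi srfi-1)
             (ice-9 match))

;; Hypothetical helper: keep only the entries whose charset is UTF-8 and
;; strip a trailing ".UTF-8" from the locale name, as the gexp in the
;; patch does.
(define (utf8-locale-names locales)
  (filter-map (match-lambda
                ((name . "UTF-8")
                 (if (string-suffix? ".UTF-8" name)
                     (string-drop-right name 6) ;".UTF-8" is 6 characters
                     name))
                (_ #f))                         ;drop non-UTF-8 entries
              locales))

;; Made-up sample in the assumed (NAME . CHARSET) format.
(define sample-locales
  '(("en_US.UTF-8" . "UTF-8")
    ("en_US" . "ISO-8859-1")
    ("eo" . "UTF-8")
    ("aa_ER@saaho" . "UTF-8")))

(display (utf8-locale-names sample-locales))
(newline)
;; prints (en_US eo aa_ER@saaho)
```

Names that carry no ".UTF-8" suffix, such as "eo" or "aa_ER@saaho", pass
through unchanged, which matches the shape of the attached list.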