From: "pelzflorian (Florian Pelz)"
Newsgroups: gmane.lisp.guile.devel,gmane.lisp.guile.user
Subject: Re: Website translations with Haunt
Date: Sat, 16 Dec 2017 20:30:41 +0100
Message-ID: <20171216193041.s6hxyf7u2lb5ihso@floriannotebook>
References: <20171209180619.GA10254@floriannotebook.localdomain> <7e3a3109-25c5-ed60-9a7b-ee1a656de87e@zoho.com>
In-Reply-To: <7e3a3109-25c5-ed60-9a7b-ee1a656de87e@zoho.com>
To: sirgazil
Cc: Guile User, guile-devel

On Sat, Dec 16, 2017 at 10:26:12AM -0500, sirgazil wrote:
> I'm very interested on this subject because I help with Guile and Guix
> websites, and I usually work with multilingual websites. I have no idea of
> what would be the right way to do i18n of websites written in Scheme,
> though. So I will just join this conversation as a potential user of your
> solutions :)
>

:)

>
> > I did not want to use the ordinary gettext functions in order to not
> > call setlocale very often to switch languages. It seems the Gettext
> > system is not designed for rapidly changing locales, but maybe I am
> > wrong about this and very many setlocale calls would not be that bad.
>
>
> For what is worth, I use ordinary gettext and `setlocale` in my website,
> which is not Haunt-based, but it is Guile Scheme and statically generated
> too. So far, it works ok.
>

Performance is what motivated me to avoid repeated setlocale calls. I
have now measured the impact of my approach, and for my website,
repeatedly calling setlocale and gettext is actually slightly faster
than transforming a po file into an association list and assoc-ref’ing
the list. Transforming the po file only becomes faster when the same
msgid is used very many times.

So it is probably best *not* to add ffi-helper to Haunt after all and
to just use Gettext: while repeated setlocale is bad in theory, it is
faster in practice for normal websites, and it does not really matter
much anyway. Then again, for long-running applications, not using
setlocale is better.

If you want detailed timings, read on; otherwise feel free to skip to
the end of this e-mail.

For my German and English website, with the code at
https://pelzflorian.de/git/pelzfloriande-website/, this is the result
of timing my current approach. It avoids repeated setlocale and standard
gettext calls and instead uses libgettextpo to create an association
list of msgids and msgstrs from the respective po files (i.e. not from
compiled mo files).

I put

    #!/bin/sh
    GUILE_LOAD_PATH=$GUILE_LOAD_PATH:$HOME/keep/projects/pelzfloriande-website:$HOME/build/nyacc/src/nyacc/examples \
    LD_LIBRARY_PATH=/gnu/store/0jjgg2bk6qmx87sdksm7bd2b3z10yd6j-gettext-0.19.8.1/lib \
    haunt build

inside a file called launch.sh.
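
For readers who have not looked at the repository, the association-list
lookup described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the code from the website:
the catalogs variable and its German entries are made up for the
example, and in the real approach the msgid/msgstr pairs would be read
from the po files via libgettextpo rather than written inline.

```scheme
;; Hypothetical in-memory catalogs, one association list per language.
;; In the approach described above these pairs would be built from
;; the po files; here they are written inline for illustration only.
(define catalogs
  '(("de" . (("Home page" . "Startseite")
             ("Contact" . "Kontakt")))
    ("en" . ())))

(define (translated-msg msgid lingua)
  ;; Find the catalog for LINGUA, then MSGID within that catalog.
  ;; Fall back to MSGID itself when the language or the string is
  ;; missing or the translation is empty, similar to what gettext
  ;; does for untranslated strings.
  (let* ((catalog (or (assoc-ref catalogs lingua) '()))
         (msgstr (assoc-ref catalog msgid)))
    (if (and (string? msgstr) (not (string-null? msgstr)))
        msgstr
        msgid)))

(translated-msg "Home page" "de")   ; => "Startseite"
(translated-msg "Home page" "en")   ; => "Home page"
```

No setlocale call is involved, but every lookup walks the list with
assoc-ref, so the cost of a lookup grows with the number of entries in
the catalog.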

I then ran

    $ time ./launch.sh
    […]
    ./launch.sh 2.43s user 0.33s system 83% cpu 3.317 total
    ./launch.sh 2.47s user 0.33s system 103% cpu 2.703 total
    ./launch.sh 2.43s user 0.36s system 103% cpu 2.700 total
    ./launch.sh 2.56s user 0.33s system 103% cpu 2.783 total

When I instead did not load the gettext-po, ffi-help-rt and Guile’s
system foreign modules, but ran msgfmt to transform the po files to mo
files, moved them to ./de/LC_MESSAGES/pelzfloriande.mo and just used
standard Gettext and setlocale with

    (bindtextdomain "pelzfloriande"
                    "/home/florian/keep/projects/pelzfloriande-website")
    (bind-textdomain-codeset "pelzfloriande" "UTF-8")
    (textdomain "pelzfloriande")

    ;; Map a language code to the locale passed to setlocale.
    (define (locale-for-lingua lingua)
      (assoc-ref '(("de" . "de_DE.UTF-8")
                   ("en" . "en_US.UTF-8"))
                 lingua))

    ;; Switch the locale on every call, then translate.
    (define (translated-msg msgid lingua)
      (setlocale LC_ALL (locale-for-lingua lingua))
      (gettext msgid))

I got the following measurements and verified that the translation is
still working:

    $ time ./launch.sh
    building pages in 'site'...
    copying asset 'css/common.css' → '/css/common.css'
    […]
    ./launch.sh 2.01s user 0.29s system 102% cpu 2.241 total

For multiple runs:

    ./launch.sh 2.01s user 0.29s system 102% cpu 2.241 total
    ./launch.sh 2.06s user 0.31s system 102% cpu 2.302 total
    ./launch.sh 2.15s user 0.33s system 104% cpu 2.387 total
    ./launch.sh 1.99s user 0.32s system 102% cpu 2.246 total

When calling setlocale only if the lingua has changed since the last
call to _:

    (define old-lingua "")

    ;; Only call setlocale when the requested language differs from
    ;; the one used by the previous call.
    (define (translated-msg msgid lingua)
      (unless (equal? old-lingua lingua)
        (setlocale LC_ALL (locale-for-lingua lingua))
        (set! old-lingua lingua))
      (gettext msgid))

    ./launch.sh 2.10s user 0.32s system 103% cpu 2.332 total
    ./launch.sh 2.03s user 0.31s system 102% cpu 2.283 total
    ./launch.sh 2.11s user 0.36s system 102% cpu 2.408 total
    ./launch.sh 2.05s user 0.30s system 102% cpu 2.296 total

When adding the following in a div:

    ,@(let loop ((i 0))
        (if (< i 10000)
            (cons (_ "Home page") (loop (1+ i)))
            '()))

and verifying it is correctly translated, this is the result for my
implementation:

    ./launch.sh 4.48s user 0.33s system 89% cpu 5.356 total
    ./launch.sh 4.52s user 0.36s system 102% cpu 4.737 total
    ./launch.sh 4.44s user 0.38s system 104% cpu 4.619 total
    ./launch.sh 4.49s user 0.40s system 103% cpu 4.735 total

With a setlocale call for each _:

    ./launch.sh 4.46s user 0.36s system 101% cpu 4.736 total
    ./launch.sh 4.64s user 0.39s system 103% cpu 4.875 total
    ./launch.sh 4.65s user 0.33s system 103% cpu 4.838 total
    ./launch.sh 4.66s user 0.33s system 103% cpu 4.833 total

This is the result for a cached setlocale call:

    ./launch.sh 4.39s user 0.37s system 102% cpu 4.624 total
    ./launch.sh 4.17s user 0.32s system 88% cpu 5.086 total
    ./launch.sh 4.09s user 0.32s system 102% cpu 4.276 total
    ./launch.sh 4.16s user 0.35s system 103% cpu 4.345 total

When adding the following in the div instead

    ,@(let loop ((i 0))
        (if (< i 10000)
            (cons (let ((current-lingua "de")) (_ "Home page"))
                  (cons (let ((current-lingua "en")) (_ "Home page"))
                        (loop (1+ i))))
            '()))

this is my current implementation:

    ./launch.sh 6.36s user 0.36s system 99% cpu 6.733 total
    ./launch.sh 6.34s user 0.34s system 103% cpu 6.470 total
    ./launch.sh 6.00s user 0.39s system 103% cpu 6.195 total

this is without caching setlocale:

    ./launch.sh 8.74s user 0.38s system 101% cpu 8.986 total
    ./launch.sh 8.70s user 0.36s system 102% cpu 8.872 total
    ./launch.sh 8.86s user 0.40s system 99% cpu 9.300 total

this is with caching setlocale:

    ./launch.sh 8.95s user 0.37s system 93% cpu 9.979 total
    ./launch.sh 8.60s user 0.39s system 103% cpu 8.712 total
    ./launch.sh 8.81s user 0.34s system 95% cpu 9.581 total

In this contrived example, my implementation is faster. Note that my
implementation may or may not be slower when, instead of using the same
translation very often, a longer PO file is used.

> For internationalization, I know the convention is to use _, but I don't
> like that, so I use the alias l10n instead.
>

We should definitely let the user define the syntax like in the Guile
manual. If you want l10n, then use l10n, which is less confusing when
using _ for pattern matching. But I will stick to _ for my website.

> For internationalizing complex blocks that should not be translated in
> fragments, like:
>
>
> `(p "Hi! I play "
>     (a (@ (href ,sport-url)) ,(l10n "futsal"))
>     " in "
>     (a (@ (href ,place-url)) ,(l10n "Tokyo")))
>
>
> I had to write a procedure I call `interleave` that I use like this:
>
>
> `(p
>   ,@(interleave (l10n "Hi! I play ~SPORT~ in ~PLACE~.")
>                 `(a (@ (href ,sport-url)) ,(l10n "futsal"))
>                 `(a (@ (href ,place-url)) ,(l10n "Tokyo"))))
>
>
> So, in the translation catalogs, translators will see the strings:
>
>
> "Hi! I play ~SPORT~ in ~PLACE~."
> "futsal"
> "Tokyo"
>
>

This interleaving is like a format string and is common in
applications, but it separates the value of ~SPORT~ from the context in
which it should be translated. I prefer my approach with multi-part
translations with

    ,@(__ "This is a ||em_|multi-part translation||."
          `(("em_" . ,(lambda (text) `(em ,text)))))

> Currently, I use xgettext manually and Poedit for working with translation
> catalogs, but I'd like to manage translations in the future like this
> (replace `site` with `haunt`):
>
>
> # Create new translation catalogs for Finnish and Japanese.
> $ site catalog-new fi ja
>
> # Update translation catalogs with new translation strings.
> $ site catalog-update
>
> # Compile translation catalogs (generate .mo files)
> $ site catalog-compile
>
>

Yes. This is a good user interface. Maybe this should be part of the
haunt command and not require a build system after all…

> To be fully localized, I also have to pass IETF Language Tags around in the
> website code, so that I get the right content when rendering the templates
> in a given language.
>
> My 2¢
>

Yes, me too. I wonder if this should be wrapped into custom syntax,
maybe like the Guix store in G-expressions, but I’m not sure.

Regards,
Florian