unofficial mirror of bug-guix@gnu.org 
* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
@ 2021-02-27  2:18 ylc991
  2021-02-27 12:31 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: ylc991 @ 2021-02-27  2:18 UTC (permalink / raw)
  To: 46807

[-- Attachment #1: Type: text/html, Size: 522 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-02-27  2:18 bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh' ylc991
@ 2021-02-27 12:31 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
  2021-03-04 11:03   ` pelzflorian (Florian Pelz)
  2021-02-27 12:34 ` Julien Lepiller
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Tobias Geerinckx-Rice via Bug reports for GNU Guix @ 2021-02-27 12:31 UTC (permalink / raw)
  To: ylc991; +Cc: 46807

[-- Attachment #1: Type: text/plain, Size: 3495 bytes --]

Ylc991,

Thanks for the report!

My verbose notes so far; I need to (finally!) set up a local build 
of the Web site first.

ylc991 writes:
> Hello! My web browser has set ‘Accept-Language’ to 'zh-CN,zh' by 
> default, and https://guix.gnu.org returns 404.

Indeed, handling of zh-CN specifically is broken.  :-(

--8<---------------cut here---------------start------------->8---
~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]
--8<---------------cut here---------------end--------------->8---

This is because our nginx configuration 
(maintenance/hydra/nginx/berlin.scm) does:

--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
--8<---------------cut here---------------end--------------->8---

i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...

--8<---------------cut here---------------start------------->8---
nckx@berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/
--8<---------------cut here---------------end--------------->8---

...lowercase.  This questionable choice comes from 
artwork/po/ietf-tags.scm:

--8<---------------cut here---------------start------------->8---
;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages.  The language tag results from the translation
;;; team’s language code from
;;; <https://translationproject.org/team/index.html>.  The underscore
;;; in the team’s code is replaced by a hyphen.  For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
 ("zh_CN" . "zh-cn"))

Questionable only because, while a lowercase region is technically 
valid, it's so rare that it's likely to cause problems -- as we 
found out.
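
The lookup failure described above can be sketched in a few lines of Python. This is only a simplified model of the `try_files` line, not nginx itself: the function name `try_files`, the `EXISTING` set, and its contents are hypothetical, and real nginx also distinguishes files from directories, which this sketch ignores.

```python
# Hypothetical set of files below the web root; made up for illustration
# and not the real contents of /srv/guix.gnu.org.
EXISTING = {
    "/en/index.html",
    "/zh-cn/index.html",  # lowercase directory, as generated by the website
}


def try_files(uri, lang, existing=EXISTING):
    """Return the first existing candidate path, or 404.

    Models: try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
    Paths are compared case-sensitively, as on a typical Linux file system.
    """
    candidates = [uri, f"/{lang}/{uri}", f"/{lang}/{uri}/index.html"]
    for candidate in candidates:
        # Collapse repeated slashes so "/zh-CN///index.html" is comparable.
        normalized = "/" + "/".join(part for part in candidate.split("/") if part)
        if normalized in existing:
            return normalized
    return 404
```

Under these assumptions, `try_files("/", "zh-CN")` falls through to 404 because only the lowercase `zh-cn` directory exists, while `try_files("/", "en")` finds `/en/index.html`.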

> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]

These are valid, so the nginx accept-language module accepts them, 
but then looks for a subdirectory that doesn't exist and returns 
404.

> 'zh-cn' is 404

This is valid, but since we configure the accept-language module 
to use ‘zh-CN’ it normalises $lang to the latter.  Which is good, 
but it causes the same 404 as above.

> 'zh_CN' is 200.

This is bogus (‘_’ is not valid), hence ignored, and so the site 
falls back to English and returns 200.

> 'zh' [is 200]

Valid but the accept-language module is not clever; we need to add 
an explicit 'zh' entry for that to work:

--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN zh en;
--8<---------------cut here---------------end--------------->8---

I expect that adding it and changing ietf-tags.scm to use "zh-CN" 
will fix both 404s, but need to check that it doesn't break 
anything else.
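
The four cases analysed above can be reproduced with a small Python model. It is a sketch of the behaviour as described in this message, not the accept-language module's actual code: the name `pick_lang` is hypothetical, and the model assumes that tags are tried in header order, that quality values are ignored, that whole tags are compared case-insensitively, and that the first configured language is the default.

```python
def pick_lang(header, configured):
    """Choose $lang from an Accept-Language header (simplified model).

    Assumptions: header order decides, ";q=..." values are ignored,
    matching is case-insensitive on the whole tag, the configured
    spelling is returned, and configured[0] is the fallback.
    """
    by_lower = {}
    for tag in configured:
        by_lower.setdefault(tag.lower(), tag)  # keep the first spelling
    for item in header.split(","):
        tag = item.split(";")[0].strip()  # drop any quality value
        match = by_lower.get(tag.lower())
        if match is not None:
            return match
    return configured[0]


# The language list from the configuration quoted earlier.
CONFIGURED = ["en", "de", "es", "fr", "zh-CN"]
```

With this model, `zh-CN,zh` and `zh-cn` both yield `zh-CN` (hence the 404 while the directory is lowercase), whereas `zh_CN` and `zh` fall back to `en`. Adding `zh` to the list makes the model return `zh` itself, so under these assumptions a `zh` request would presumably still need to be mapped to the `zh-CN` directory somehow.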

The other untested solution is using lowercase

--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-cn zh en;
--8<---------------cut here---------------end--------------->8---

but, assuming that even works, I'm not fond of making the 
unconventional the norm.

Kind regards,

T G-R

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]


* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-02-27  2:18 bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh' ylc991
  2021-02-27 12:31 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
@ 2021-02-27 12:34 ` Julien Lepiller
  2021-03-01 10:06 ` Ludovic Courtès
  2021-03-05 10:03 ` YLC
  3 siblings, 0 replies; 10+ messages in thread
From: Julien Lepiller @ 2021-02-27 12:34 UTC (permalink / raw)
  To: 46807, ylc991

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

It might be related to translations. When you use zh-cn, we have a translation for that language, so you're redirected to it. Not sure why you get a 404 though.

On 26 February 2021 21:18:12 GMT-05:00, ylc991 <ylc991@163.com> wrote:
>Hello! My web browser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl:
>'zh-CN,zh', 'zh-CN', and 'zh-cn' give 404, while 'zh' and 'zh_CN' give 200.
>
>
>I first noticed it on 2021-02-23. It did not happen one or two months
>ago. I think there may be something wrong with
>the web server.

[-- Attachment #2: Type: text/html, Size: 934 bytes --]


* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-02-27  2:18 bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh' ylc991
  2021-02-27 12:31 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
  2021-02-27 12:34 ` Julien Lepiller
@ 2021-03-01 10:06 ` Ludovic Courtès
  2021-03-01 10:49   ` pelzflorian (Florian Pelz)
  2021-03-05 11:54   ` pelzflorian (Florian Pelz)
  2021-03-05 10:03 ` YLC
  3 siblings, 2 replies; 10+ messages in thread
From: Ludovic Courtès @ 2021-03-01 10:06 UTC (permalink / raw)
  To: ylc991; +Cc: 46807

Hello,

ylc991 <ylc991@163.com> writes:

> Hello! My web browser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN',
> 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.

Florian, could it be that we’re not normalizing language tags
appropriately?  Does that ring a bell?

Thanks for your report!

Ludo’.





* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-03-01 10:06 ` Ludovic Courtès
@ 2021-03-01 10:49   ` pelzflorian (Florian Pelz)
  2021-03-05 11:54   ` pelzflorian (Florian Pelz)
  1 sibling, 0 replies; 10+ messages in thread
From: pelzflorian (Florian Pelz) @ 2021-03-01 10:49 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: ylc991, 46807

Hello,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately?  Does that ring a bell?

Tobias’ analysis is likely correct.  I haven’t yet built a current
berlin virtual machine to test it, though.

We’re not normalizing language tags at all currently.  Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order.  The many lines

(redirect "/blog/2006/purely-functional-software-deployment-model" "/$lang/blog/2006/purely-functional-software-deployment-model/")

and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion.  I would not like one line
for each package.

Regards,
Florian





* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-02-27 12:31 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
@ 2021-03-04 11:03   ` pelzflorian (Florian Pelz)
  0 siblings, 0 replies; 10+ messages in thread
From: pelzflorian (Florian Pelz) @ 2021-03-04 11:03 UTC (permalink / raw)
  To: Tobias Geerinckx-Rice; +Cc: ylc991, 46807

On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.

I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):

diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
  ("de_DE" . "de")
  ("es_ES" . "es")
  ("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))

Note that the prior zh-cn URLs will be broken.

Later I will experiment with nginx’s map directive so that zh-cn and zh
Accept-Language settings lead to the proper URL; afterwards I
will close this bug.  zh-cn URLs remain invalid.  Links to the manual
continue to use zh-cn.

For testing I dug out the VM code
<https://lists.gnu.org/archive/html/bug-guix/2020-04/msg00195.html>
where I had removed parts of berlin that are not relevant to the
website.  The change breaks neither website nor manual.

Thanks ylc991 for the report!

Regards,
Florian





* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-02-27  2:18 bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh' ylc991
                   ` (2 preceding siblings ...)
  2021-03-01 10:06 ` Ludovic Courtès
@ 2021-03-05 10:03 ` YLC
  3 siblings, 0 replies; 10+ messages in thread
From: YLC @ 2021-03-05 10:03 UTC (permalink / raw)
  To: 46807

Thank you for your help! Everything works fine now.





* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-03-01 10:06 ` Ludovic Courtès
  2021-03-01 10:49   ` pelzflorian (Florian Pelz)
@ 2021-03-05 11:54   ` pelzflorian (Florian Pelz)
  2021-03-08 13:27     ` Ludovic Courtès
  1 sibling, 1 reply; 10+ messages in thread
From: pelzflorian (Florian Pelz) @ 2021-03-05 11:54 UTC (permalink / raw)
  To: 46807

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

Hello all,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately?  Does that ring a bell?

The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese.  (This is a minor issue.  The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)

The patch was tested on a berlin VM.

There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright.  I hereby license
the patch CC0
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>.

Shall I just push?  A reconfigure of berlin will be necessary but is
not urgent.

Regards,
Florian

[-- Attachment #2: 0001-nginx-berlin-Normalize-Accept-Language-language-code.patch --]
[-- Type: text/plain, Size: 2333 bytes --]

From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 4 Mar 2021 20:29:27 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] nginx: berlin: Normalize Accept-Language language code zh to
 zh-CN.

Now web browsers requesting any kind of Chinese get the website in
mainland Chinese.

zh, zh-Hans, zh-Hans-CN all are synonymous with zh-CN now.

* hydra/nginx/berlin.scm (accept-languages): New procedure.
(%extra-content): Normalize $lang variable with it.
---
 hydra/nginx/berlin.scm | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 85aaf38..4b9d297 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -995,12 +995,37 @@ PUBLISH-URL."
        (uri "~ /(.*)")
        (body (list "return 301 $scheme://guixwl.org/$1;"))))))))
 
+(define (accept-languages language-lists)
+  "Returns nginx configuration code to set up the $lang variable
+according to the Accept-Language header in the HTTP request.  The
+requesting user agent will be served the files at /$lang/some/url.
+Each list in LANGUAGE-LISTS starts with the $lang and is followed by
+synonymous IETF language tags that should be mapped to the same $lang."
+  (define (language-mappings language-list)
+    (define (language-mapping language)
+      (string-join (list "    "  language (car language-list) ";")))
+    (string-join (map language-mapping language-list) "\n"))
+
+  (let ((directives
+         `(,(string-join
+             `("set_from_accept_language $lang_unmapped"
+               ,@(map string-join language-lists)
+               ";"))
+           "map $lang_unmapped $lang {"
+           ,@(map language-mappings language-lists)
+           "}")))
+    (string-join directives "\n")))
+
 (define %extra-content
   (list
    "default_type  application/octet-stream;"
    "sendfile        on;"
 
-   "set_from_accept_language $lang en de es fr zh-CN;"
+   (accept-languages '(("en")
+                       ("de")
+                       ("es")
+                       ("fr")
+                       ("zh-CN" "zh" "zh-Hans" "zh-Hans-CN")))
 
    ;; Maximum chunk size to send.  Partly this is a workaround for
    ;; <http://bugs.gnu.org/19939>, but also the nginx docs mention that
-- 
2.30.1



* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-03-05 11:54   ` pelzflorian (Florian Pelz)
@ 2021-03-08 13:27     ` Ludovic Courtès
  2021-03-11  0:01       ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-03-08 13:27 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 46807

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:

> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese.  (This is a minor issue.  The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.

Yay!

> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright.  I hereby license
> the patch CC0
> <https://creativecommons.org/publicdomain/zero/1.0/legalcode>.

Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.

> Shall I just push?  A reconfigure of berlin will be necessary but is
> not urgent.

Yes, sounds good!

We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.

Thanks,
Ludo’.





* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
  2021-03-08 13:27     ` Ludovic Courtès
@ 2021-03-11  0:01       ` pelzflorian (Florian Pelz)
  0 siblings, 0 replies; 10+ messages in thread
From: pelzflorian (Florian Pelz) @ 2021-03-11  0:01 UTC (permalink / raw)
  To: 46807-done

Pushed to maintenance.git as 82b075685b6089c7f98acb0993c003936d833776.

Closing.  Thank you all!



