* bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
From: ylc991 @ 2021-02-27 2:18 UTC (permalink / raw)
To: 46807
[-- Attachment #1: Type: text/html, Size: 522 bytes --]
From: Tobias Geerinckx-Rice via Bug reports for GNU Guix @ 2021-02-27 12:31 UTC (permalink / raw)
To: ylc991; +Cc: 46807
[-- Attachment #1: Type: text/plain, Size: 3495 bytes --]
Ylc991,
Thanks for the report!
My verbose notes so far; I need to (finally!) set up a local build
of the Web site first.
ylc991 wrote:
> Hello! My web browser has set ‘Accept-Language’ to 'zh-CN,zh' by
> default, and https://guix.gnu.org returns 404.
Indeed, handling of zh-CN specifically is broken. :-(
--8<---------------cut here---------------start------------->8---
~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]
--8<---------------cut here---------------end--------------->8---
This is because our nginx configuration
(maintenance/hydra/nginx/berlin.scm) does:
--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
--8<---------------cut here---------------end--------------->8---
i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...
--8<---------------cut here---------------start------------->8---
nckx@berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/
--8<---------------cut here---------------end--------------->8---
...lowercase. This questionable choice comes from guix-artwork's
website/po/ietf-tags.scm:
--8<---------------cut here---------------start------------->8---
;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages. The language tag results from the translation
;;; team’s language code from
;;; <https://translationproject.org/team/index.html>. The underscore
;;; in the team’s code is replaced by a hyphen. For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
 ("zh_CN" . "zh-cn"))
--8<---------------cut here---------------end--------------->8---
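The rule stated in that comment (take the Translation Project team code and replace the underscore with a hyphen) would by itself yield "zh-CN", the spelling nginx is configured with; it is the "zh-cn" entry that deviates from it. A minimal Guile sketch of the rule, where `team-code->url-tag` is a hypothetical helper name that does not exist in guix-artwork:

```scheme
;; Hypothetical helper, not part of guix-artwork: apply the rule from
;; the ietf-tags.scm comment by replacing "_" with "-" and leaving the
;; case of every other character as it is.
(define (team-code->url-tag team-code)
  (string-map (lambda (c) (if (char=? c #\_) #\- c))
              team-code))

;; Excerpt of the alist entry quoted above, as it stood before the fix.
(define %ietf-tags-excerpt '(("zh_CN" . "zh-cn")))

(team-code->url-tag "az")     ; => "az"
(team-code->url-tag "zh_CN")  ; => "zh-CN"
(string=? (assoc-ref %ietf-tags-excerpt "zh_CN")
          (team-code->url-tag "zh_CN"))  ; => #f, the entry breaks the rule
```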
Questionable only because, while a lowercase region is technically
valid, it's so rare that it's likely to cause problems -- as we
found out.
> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]
These are valid, so the nginx accept-language module accepts them,
but then looks for a subdirectory that doesn't exist and returns
404.
> 'zh-cn' is 404
This is valid, but since we configure the accept-language module
to use ‘zh-CN’, it normalises $lang to the latter. That is good,
but it causes the same 404 as above.
> 'zh_CN' is 200.
This is bogus (‘_’ is not valid), hence ignored, and so the site
falls back to English (200).
> 'zh' [is 200]
Valid but the accept-language module is not clever; we need to add
an explicit 'zh' entry for that to work:
--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN zh en;
--8<---------------cut here---------------end--------------->8---
I expect that adding it and changing ietf-tags.scm to use "zh-CN"
will fix both 404s, but need to check that it doesn't break
anything else.
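To make the matching behaviour described above concrete, here is a simplified Guile model of how $lang gets chosen. It is not the accept-language module's source; `header-tags` and `pick-lang` are hypothetical names, and the model only encodes what this thread observed: tags are compared case-insensitively, the configured spelling is returned, a bare 'zh' does not match 'zh-CN', and anything unmatched falls back to the first configured language ('en'). Ignoring ";q=" weights is an additional assumption.

```scheme
(use-modules (srfi srfi-1))   ; any, find

;; Split an Accept-Language value into bare tags, dropping ";q=..."
;; weights (assumption: this model ignores weights entirely).
(define (header-tags header)
  (map (lambda (part)
         (let* ((part (string-trim-both part))
                (semi (string-index part #\;)))
           (if semi
               (string-trim-both (substring part 0 semi))
               part)))
       (string-split header #\,)))

;; Return the configured spelling of the first header tag that equals a
;; configured tag, ignoring case; otherwise the first configured tag,
;; which acts as the default.
(define (pick-lang header configured)
  (or (any (lambda (tag)
             (find (lambda (c) (string-ci=? c tag)) configured))
           (header-tags header))
      (car configured)))

(define %configured '("en" "de" "es" "fr" "zh-CN"))

(pick-lang "zh-CN,zh" %configured) ; => "zh-CN", but only zh-cn/ exists
(pick-lang "zh-cn" %configured)    ; => "zh-CN", same 404
(pick-lang "zh_CN" %configured)    ; => "en", English fallback (200)
(pick-lang "zh" %configured)       ; => "en", no exact "zh" entry
```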
The other untested solution is using lowercase:
--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-cn zh en;
--8<---------------cut here---------------end--------------->8---
but, assuming that even works, I'm not fond of making the
unconventional the norm.
Kind regards,
T G-R
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]
From: Julien Lepiller @ 2021-02-27 12:34 UTC (permalink / raw)
To: 46807, ylc991
[-- Attachment #1: Type: text/plain, Size: 616 bytes --]
It might be related to translations. When you use zh-cn, we have a translation for that language, so you're redirected to it. Not sure why you get a 404 though.
On 26 February 2021 21:18:12 GMT-05:00, ylc991 <ylc991@163.com> wrote:
>Hello! My web browser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl:
>'zh-CN,zh', 'zh-CN' and 'zh-cn' give 404 while 'zh' and 'zh_CN' give 200.
>
>
>I first noticed it on 2021-02-23. It didn't happen
>about one or two months ago. I think there may be something wrong with
>the web server.
[-- Attachment #2: Type: text/html, Size: 934 bytes --]
From: Ludovic Courtès @ 2021-03-01 10:06 UTC (permalink / raw)
To: ylc991; +Cc: 46807
Hello,
ylc991 <ylc991@163.com> wrote:
> Hello! My web browser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl: 'zh-CN,zh', 'zh-CN' and
> 'zh-cn' give 404 while 'zh' and 'zh_CN' give 200.
Florian, could it be that we’re not normalizing language tags
appropriately? Does that ring a bell?
Thanks for your report!
Ludo’.
From: pelzflorian (Florian Pelz) @ 2021-03-01 10:49 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: ylc991, 46807
Hello,
On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?
Tobias’ analysis is likely correct. I haven’t yet built a current
berlin virtual machine to test it, though.
We’re not normalizing language tags at all currently. Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order. The many lines
(redirect "/blog/2006/purely-functional-software-deployment-model" "/$lang/blog/2006/purely-functional-software-deployment-model/")
and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion. I would not like one line
for each package.
Regards,
Florian
From: pelzflorian (Florian Pelz) @ 2021-03-04 11:03 UTC (permalink / raw)
To: Tobias Geerinckx-Rice; +Cc: ylc991, 46807
On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.
I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):
diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
("de_DE" . "de")
("es_ES" . "es")
("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))
Note that the prior zh-cn URLs will be broken.
Later I will play around with nginx’s map directive to make the zh-cn
and zh Accept-Language settings direct to the proper URL; afterwards I
will close this bug. zh-cn URLs remain invalid. Links to the manual
continue to use zh-cn.
For testing I dug out the VM code
<https://lists.gnu.org/archive/html/bug-guix/2020-04/msg00195.html>
where I had removed parts of berlin that are not relevant to the
website. The change breaks neither website nor manual.
Thanks ylc991 for the report!
Regards,
Florian
From: YLC @ 2021-03-05 10:03 UTC (permalink / raw)
To: 46807
Thank you for your help! Everything goes fine now.
From: pelzflorian (Florian Pelz) @ 2021-03-05 11:54 UTC (permalink / raw)
To: 46807
[-- Attachment #1: Type: text/plain, Size: 898 bytes --]
Hello all,
On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?
The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)
The patch was tested on a berlin VM.
There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
the patch CC0
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>.
Shall I just push? A reconfigure of berlin will be necessary but is
not urgent.
Regards,
Florian
[-- Attachment #2: 0001-nginx-berlin-Normalize-Accept-Language-language-code.patch --]
[-- Type: text/plain, Size: 2333 bytes --]
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 4 Mar 2021 20:29:27 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] nginx: berlin: Normalize Accept-Language language code zh to
zh-CN.
Now web browsers requesting any kind of Chinese get the website in
mainland Chinese.
zh, zh-Hans, zh-Hans-CN all are synonymous with zh-CN now.
* hydra/nginx/berlin.scm (accept-languages): New procedure.
(%extra-content): Normalize $lang variable with it.
---
hydra/nginx/berlin.scm | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 85aaf38..4b9d297 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -995,12 +995,37 @@ PUBLISH-URL."
(uri "~ /(.*)")
(body (list "return 301 $scheme://guixwl.org/$1;"))))))))
+(define (accept-languages language-lists)
+ "Returns nginx configuration code to set up the $lang variable
+according to the Accept-Language header in the HTTP request. The
+requesting user agent will be served the files at /$lang/some/url.
+Each list in LANGUAGE-LISTS starts with the $lang and is followed by
+synonymous IETF language tags that should be mapped to the same $lang."
+ (define (language-mappings language-list)
+ (define (language-mapping language)
+ (string-join (list " " language (car language-list) ";")))
+ (string-join (map language-mapping language-list) "\n"))
+
+ (let ((directives
+ `(,(string-join
+ `("set_from_accept_language $lang_unmapped"
+ ,@(map string-join language-lists)
+ ";"))
+ "map $lang_unmapped $lang {"
+ ,@(map language-mappings language-lists)
+ "}")))
+ (string-join directives "\n")))
+
(define %extra-content
(list
"default_type application/octet-stream;"
"sendfile on;"
- "set_from_accept_language $lang en de es fr zh-CN;"
+ (accept-languages '(("en")
+ ("de")
+ ("es")
+ ("fr")
+ ("zh-CN" "zh" "zh-Hans" "zh-Hans-CN")))
;; Maximum chunk size to send. Partly this is a workaround for
;; <http://bugs.gnu.org/19939>, but also the nginx docs mention that
--
2.30.1
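For readers checking the patch, a rough Guile model of the second stage it adds: the generated `map $lang_unmapped $lang { ... }` block turns every tag in a list into the first tag of that list. `language-map` and `resolve-lang` are hypothetical names that only mirror the shape of the LANGUAGE-LISTS argument; they are not part of maintenance.git, and the value chosen by set_from_accept_language is simply passed in directly here.

```scheme
(use-modules (srfi srfi-1))   ; append-map

;; Hypothetical helper: turn LANGUAGE-LISTS (canonical $lang first,
;; then synonyms) into (tag . canonical) pairs, i.e. the entries the
;; generated nginx "map" block encodes.
(define (language-map language-lists)
  (append-map (lambda (lst)
                (map (lambda (tag) (cons tag (car lst))) lst))
              language-lists))

;; Same list as passed to accept-languages in the patch.
(define %languages
  '(("en") ("de") ("es") ("fr")
    ("zh-CN" "zh" "zh-Hans" "zh-Hans-CN")))

;; Second stage: map the unmapped tag to its canonical $lang, or #f
;; when the tag is not listed at all.
(define (resolve-lang lang-unmapped)
  (let ((entry (assoc lang-unmapped (language-map %languages))))
    (and entry (cdr entry))))

(resolve-lang "zh")          ; => "zh-CN"
(resolve-lang "zh-Hans-CN")  ; => "zh-CN"
(resolve-lang "de")          ; => "de"
```

Because the patch also lists every synonym in set_from_accept_language, a plain "zh" header is accepted in the first stage and only then rewritten to zh-CN by the map.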
From: Ludovic Courtès @ 2021-03-08 13:27 UTC (permalink / raw)
To: pelzflorian (Florian Pelz); +Cc: 46807
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> wrote:
> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.
Yay!
> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
> the patch CC0
> <https://creativecommons.org/publicdomain/zero/1.0/legalcode>.
Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.
> Shall I just push? A reconfigure of berlin will be necessary but is
> not urgent.
Yes, sounds good!
We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.
Thanks,
Ludo’.
From: pelzflorian (Florian Pelz) @ 2021-03-11 0:01 UTC (permalink / raw)
To: 46807-done
Pushed to maintenance.git as 82b075685b6089c7f98acb0993c003936d833776.
Closing. Thank you all!