From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 85BBD20281;
	Tue, 23 May 2017 18:39:40 +0000 (UTC)
Date: Tue, 23 May 2017 18:39:40 +0000
From: Eric Wong
To: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason
Cc: meta@public-inbox.org
Subject: Re: Feature R/BUG: Auto uri_unescape() & utf8 handling
Message-ID: <20170523183940.GA9543@dcvr>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
List-Id:

Ævar Arnfjörð Bjarmason wrote:
> BUG:
>
> The code is missing a utf8::decode() or equivalent somewhere, try
> searching for: https://public-inbox.org/git/?q=%C3%86var
>
> The search works, but the text in the search box is garbled, the
> second search is https://public-inbox.org/git/?q=%C3%83%E2%80%A0var
> third https://public-inbox.org/git/?q=%C3%83%C6%92%C3%A2%E2%82%AC%C2%A0var
> etc.

Thanks for the report, I'm testing the patch below on
public-inbox.org and it seems fine. I'll need to write a test
for this...

diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
index 1c2d75c..2613c8e 100644
--- a/lib/PublicInbox/MID.pm
+++ b/lib/PublicInbox/MID.pm
@@ -6,7 +6,7 @@ package PublicInbox::MID;
 use strict;
 use warnings;
 use base qw/Exporter/;
-our @EXPORT_OK = qw/mid_clean id_compress mid2path mid_mime mid_escape/;
+our @EXPORT_OK = qw/mid_clean id_compress mid2path mid_mime mid_escape MID_ESC/;
 use URI::Escape qw(uri_escape_utf8);
 use Digest::SHA qw/sha1_hex/;
 use constant MID_MAX => 40; # SHA-1 hex length
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index cec87c6..42bc648 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -222,7 +222,9 @@ sub mset_thread {

 sub ctx_prepare {
 	my ($q, $ctx) = @_;
-	my $qh = ascii_html($q->{'q'});
+	my $qh = $q->{'q'};
+	utf8::decode($qh);
+	$qh = ascii_html($qh);
 	$ctx->{-q_value_html} = $qh;
 	$ctx->{-atom} = '?'.$q->qs_html(x => 'A', r => undef);
 	$ctx->{-title_html} = "$qh - search results";
@@ -254,8 +256,9 @@ sub adump {
 package PublicInbox::SearchQuery;
 use strict;
 use warnings;
+use URI::Escape qw(uri_escape);
 use PublicInbox::Hval;
-use PublicInbox::MID qw(mid_escape);
+use PublicInbox::MID qw(MID_ESC);

 sub new {
 	my ($class, $qp) = @_;
@@ -280,7 +283,7 @@ sub qs_html {
 		$self = $tmp;
 	}

-	my $q = mid_escape($self->{'q'});
+	my $q = uri_escape($self->{'q'}, MID_ESC);
 	$q =~ s/%20/+/g; # improve URL readability

 	my $qs = "q=$q";
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 13b3921..f3c702e 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -42,6 +42,7 @@ sub call {

 	# we don't care about multi-value
 	my %qp = map {
+		utf8::decode($_);
 		my ($k, $v) = split('=', uri_unescape($_), 2);
 		$v = '' unless defined $v;
 		$v =~ tr/+/ /;
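
For anyone wanting to reproduce the garbling in the report without a running instance, here is a rough sketch. It is in Python rather than the project's Perl, and it is not part of the patch. The names garble_once and decode_once are made up for illustration. It assumes the unescaped UTF-8 bytes were echoed one character per byte and that the browser read them as Windows-1252 before resubmitting the form as UTF-8. That assumption is inferred from the three URLs quoted above, not confirmed against the code.

```python
from urllib.parse import quote, unquote_to_bytes


def garble_once(q: str) -> str:
    """Simulate one round trip through the unpatched search box (hypothetical helper).

    Assumption: each byte of the percent-decoded query is treated as one
    character, read as Windows-1252, then resubmitted as UTF-8.
    """
    raw = unquote_to_bytes(q)        # percent-decoded bytes of the query
    misread = raw.decode("cp1252")   # one character per byte
    return quote(misread, safe="")   # UTF-8 encode, then percent-escape


def decode_once(q: str) -> str:
    """What the fix aims for: treat the unescaped bytes as UTF-8 text."""
    return unquote_to_bytes(q).decode("utf-8")


first = "%C3%86var"
second = garble_once(first)
third = garble_once(second)
print(second)
print(third)
print(decode_once(first))
```

Under that assumption, second and third come out equal to the query strings of the second and third searches in the report, and decode_once(first) gives back the original name.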