From: Yuri Khan
Newsgroups: gmane.emacs.devel
Subject: Re: eww doesn't decode %AA%BB%CC URL names
Date: Fri, 25 Dec 2015 00:07:40 +0600
To: Lars Ingebrigtsen
Cc: Eli Zaretskii, Emacs developers
References: <83r3n0llkt.fsf@gnu.org> <87vb7nsq1g.fsf@gnus.org>
In-Reply-To: <87vb7nsq1g.fsf@gnus.org>

On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen wrote:

> (decode-coding-string (url-unhex-string
>                        "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
>                       'utf-8)
> => "Сердце"
>
> Right.  What charset do we choose?  I guess using the charset of the
> document we're in doesn't make much sense (because it's linking to
> something off-site which may be in a different charset)...

By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the URL
does not decode into a valid UTF-8 string, it is OK to fall back to a
heuristic, though.
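
A minimal sketch of that "UTF-8 first, heuristic second" idea, for illustration only. The function name `my-decode-percent-encoded` is hypothetical and is not part of Emacs or eww. Using the `undecided` coding system as the fallback is an assumption standing in for whatever heuristic is eventually chosen. The sketch relies on `decode-coding-string` keeping octets that are not valid UTF-8 as raw-byte characters, which `find-charset-string` reports under the `eight-bit` charset.

```elisp
(require 'url-util)  ; provides `url-unhex-string'

(defun my-decode-percent-encoded (string)
  "Percent-decode STRING, preferring UTF-8.
Hypothetical helper for illustration; not part of Emacs or eww."
  (let* ((bytes (url-unhex-string string))            ; raw octets
         (utf8 (decode-coding-string bytes 'utf-8)))  ; try UTF-8 first
    ;; Octets that are not valid UTF-8 survive decoding as raw-byte
    ;; characters, which belong to the `eight-bit' charset.
    (if (memq 'eight-bit (find-charset-string utf8))
        ;; Assumed fallback: let Emacs guess the coding system.
        (decode-coding-string bytes 'undecided)
      utf8)))
```

With this definition, evaluating (my-decode-percent-encoded "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5") should give "Сердце", the same result as the quoted snippet. For input that is not valid UTF-8, the result of the fallback depends on the current language environment.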