From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: eww doesn't decode %AA%BB%CC URL names
Date: Thu, 24 Dec 2015 20:18:47 +0100
Message-ID: <8760znslig.fsf@gnus.org>
References: <83r3n0llkt.fsf@gnu.org> <87vb7nsq1g.fsf@gnus.org> <83ziwzmzyp.fsf@gnu.org>
In-Reply-To: <83ziwzmzyp.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 24 Dec 2015 21:03:10 +0200")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, Yuri Khan
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.1.50 (gnu/linux)

Eli Zaretskii writes:

>> From: Yuri Khan
>> Date: Fri, 25 Dec 2015 00:07:40 +0600
>> Cc: Eli Zaretskii, Emacs developers
>>
>> On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen wrote:
>> > (decode-coding-string (url-unhex-string
>> >                        "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
>> >                       'utf-8)
>> > => "Сердце"
>> >
>> > Right.  What charset do we choose?  I guess using the charset of the
>> > document we're in doesn't make much sense (because it's linking to
>> > something off-site which may be in a different charset)...
>>
>> By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding.  If the
>> URL does not decode into a valid UTF-8 string, it is ok to fall back
>> to a heuristic, though.

That's basically just (car (detect-coding-string ...)), though, since
it'll return utf-8 first if that's a possible charset, won't it?

> Yes, I think this is a good policy, thanks.  Bonus points for
> implementing the command in a way that it will be able to accept user
> choice of the encoding via "C-x RET c", like file operations do.

Let's see...
that function basically just binds `coding-system-for-{read,write}' and
then calls the command interactively?  Do the commands just look at
those variables, and if they're bound, then they use that coding system
instead?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
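
For reference, the policy discussed in this message (try UTF-8 first, fall back to a heuristic, and let a coding system chosen via "C-x RET c" win) could be sketched roughly as below.  This is only an illustration, not the code that went into eww: the function name `my-decode-url-name' is made up, and checking for leftover `eight-bit' characters is just one assumed way to decide that the UTF-8 decode failed.

```elisp
;;; -*- lexical-binding: t -*-
(require 'url-util)                     ; provides `url-unhex-string'

(defun my-decode-url-name (name)
  "Decode the percent-escapes in NAME into readable text.
`my-decode-url-name' is a hypothetical name used for illustration."
  (let ((bytes (url-unhex-string name)))
    (if coding-system-for-read
        ;; An explicit choice (for instance one bound by C-x RET c
        ;; around the command) takes precedence over any guessing.
        (decode-coding-string bytes coding-system-for-read)
      (let ((utf8 (decode-coding-string bytes 'utf-8)))
        (if (memq 'eight-bit (find-charset-string utf8))
            ;; Assumption: bytes that are not valid UTF-8 survive the
            ;; decode as raw `eight-bit' characters, so fall back to
            ;; whatever Emacs guesses as the most likely coding system.
            (decode-coding-string bytes (detect-coding-string bytes t))
          utf8)))))

;; The example from the thread:
(my-decode-url-name "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
```

Since `coding-system-for-read' is a dynamically bound variable, a command written this way picks up the user's choice without taking any extra argument, which is the behaviour asked about above.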