From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: eww doesn't decode %AA%BB%CC URL names
Date: Thu, 24 Dec 2015 18:40:59 +0100
Message-ID: <87vb7nsq1g.fsf@gnus.org>
References: <83r3n0llkt.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
In-Reply-To: <83r3n0llkt.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 18 Aug 2015 17:26:10 +0300")
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.1.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:196776

Eli Zaretskii writes:

> When I visit a URL in eww and press 'd' on a link like this:
>
>   https://ru.wikipedia.org/wiki/%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5
>
> the file Emacs creates a file whose name is made of those hex-encoded
> characters as you see them in this mail.  Shouldn't we decode them?
> Firefox does.

We should.  Let's see...

(url-unhex-string "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
=> "\320\241\320\265\321\200\320\264\321\206\320\265"

Uhm...

(decode-coding-string
 (url-unhex-string "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
 'utf-8)
=> "Сердце"

Right.  What charset do we choose?  I guess using the charset of the
document we're in doesn't make much sense (because it's linking to
something off-site which may be in a different charset)...  Perhaps just
run a `detect-coding-string' on it?

Or!  We've just downloaded the file, after all, and the charset of the
file itself may tell us what the charset of the name is...  On the other
hand, probably not.  (For instance, a PDF with a Cyrillic name would
probably still just be reported by the web server as being binary.)

`detect-coding-string' it is, I guess, unless anybody has a better idea?
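
As a rough sketch of that idea (not what eww actually does), the two steps could be wrapped up like this. The function name `my-decode-url-file-name' is hypothetical, and trying UTF-8 before falling back to `detect-coding-string' is an added assumption on the grounds that most modern URLs are UTF-8; a pure `detect-coding-string' version would depend on the user's coding priorities.

```elisp
(require 'url-util)   ; provides `url-unhex-string'
(require 'cl-lib)     ; provides `cl-some'

(defun my-decode-url-file-name (name)
  "Return NAME with its %XX escapes decoded into text.
Hypothetical helper, not part of eww.  The escapes are first
turned into raw bytes.  If those bytes decode cleanly as UTF-8
\(an assumption: most modern URLs are UTF-8), use that result;
otherwise fall back to the most likely coding system according
to `detect-coding-string'."
  (let* ((bytes (url-unhex-string name))
         (utf8 (decode-coding-string bytes 'utf-8)))
    ;; Bytes that are not valid UTF-8 survive decoding as raw-byte
    ;; characters, which occupy the range #x3FFF80..#x3FFFFF.
    (if (cl-some (lambda (c) (>= c #x3fff80)) utf8)
        (decode-coding-string bytes (detect-coding-string bytes t))
      utf8)))
```

With the URL from the original report, (my-decode-url-file-name "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5") takes the UTF-8 branch and returns "Сердце".  What the fallback branch returns for non-UTF-8 names depends on the coding system priorities in effect, so it is only a guess.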
-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no