From: Eli Zaretskii <eliz@gnu.org>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: emacs-devel@gnu.org, yuri.v.khan@gmail.com
Subject: Re: eww doesn't decode %AA%BB%CC URL names
Date: Thu, 24 Dec 2015 21:34:24 +0200
Message-ID: <83twn7myin.fsf@gnu.org>
In-Reply-To: <8760znslig.fsf@gnus.org> (message from Lars Ingebrigtsen on Thu, 24 Dec 2015 20:18:47 +0100)
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Yuri Khan <yuri.v.khan@gmail.com>, emacs-devel@gnu.org
> Date: Thu, 24 Dec 2015 20:18:47 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> From: Yuri Khan <yuri.v.khan@gmail.com>
> >> Date: Fri, 25 Dec 2015 00:07:40 +0600
> >> Cc: Eli Zaretskii <eliz@gnu.org>, Emacs developers <emacs-devel@gnu.org>
> >>
> >> On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> >> > (decode-coding-string (url-unhex-string
> >> >                        "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
> >> >                       'utf-8)
> >> > => "Сердце"
> >> >
> >> > Right. What charset do we choose? I guess using the charset of the
> >> > document we're in doesn't make much sense (because it's linking to
> >> > something off-site which may be in a different charset)...
> >>
> >> By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the
> >> URL does not decode into a valid UTF-8 string, it is ok to fall back
> >> to a heuristic, though.
>
> That's basically just (car (decode-coding-string ...))

I believe you meant detect-coding-string.

> though, since it'll return utf-8 first if that's a possible charset,
> won't it?

You cannot rely on it returning UTF-8: that depends on the
coding-system priorities (which are subject to customization) and
other things.  I think you should use UTF-8 literally as the first
choice.

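A minimal sketch of that policy, under stated assumptions: the helper
name `my-url-decode-percent' is hypothetical and not part of eww or
url.el, and testing for raw `eight-bit' characters is only one possible
way to notice that the bytes were not valid UTF-8.  It tries UTF-8
literally and falls back to detection only when that fails:

```elisp
;; Hypothetical helper for illustration; not part of eww or url.el.
(require 'url-util)   ; provides `url-unhex-string'

(defun my-url-decode-percent (encoded)
  "Unhex ENCODED and decode the bytes, trying UTF-8 before guessing."
  (let* ((bytes (url-unhex-string encoded))
         (as-utf-8 (decode-coding-string bytes 'utf-8)))
    ;; Bytes that are not valid UTF-8 survive decoding as raw
    ;; `eight-bit' characters, so their presence means UTF-8 failed.
    (if (memq 'eight-bit (find-charset-string as-utf-8))
        ;; Heuristic fallback: the highest-priority detected coding
        ;; system, which depends on the user's coding priorities.
        (decode-coding-string bytes (detect-coding-string bytes t))
      as-utf-8)))
```

With the URL quoted earlier in this thread,
(my-url-decode-percent "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5") takes
the UTF-8 branch, so the result does not depend on coding priorities.
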
> > Yes, I think this is a good policy, thanks. Bonus points for
> > implementing the command in a way that it will be able to accept user
> > choice of the encoding via "C-x RET c", like file operations do.
>
> Let's see... that function basically just binds
> `coding-system-for-{read,write}' and then calls the command
> interactively?

Yes.

> Do the commands just look at those variables, and if they're bound,
> then they use that coding system instead?

Yes, they use these in preference to everything else, something like
this:

  (let ((coding (or coding-system-for-read
                    document-encoding
                    locale-coding-system
                    ...)))
    (decode-coding-string ... coding))
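
To make the precedence concrete, here is a small runnable sketch.  The
function name `my-decode-with-override' and its DOCUMENT-CODING
argument are hypothetical, and the final `utf-8' default is an
assumption taken from the first-choice suggestion above rather than a
fixed rule.  The `let' imitates what a command sees after the user has
typed "C-x RET c koi8-r RET":

```elisp
;; Hypothetical helper for illustration; not part of eww.
(defun my-decode-with-override (bytes document-coding)
  "Decode BYTES, letting `coding-system-for-read' win over DOCUMENT-CODING."
  (decode-coding-string bytes
                        (or coding-system-for-read
                            document-coding
                            'utf-8)))

;; The user's explicit choice is bound dynamically around the command,
;; so it overrides the document's own encoding.
(let ((coding-system-for-read 'koi8-r))
  (my-decode-with-override "\323\305\322\304\303\305" 'utf-8))
```

When `coding-system-for-read' is nil, which is its normal value outside
such a binding, the same call falls through to DOCUMENT-CODING.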