eww doesn't decode %AA%BB%CC URL names

unofficial mirror of emacs-devel@gnu.org 

* eww doesn't decode %AA%BB%CC URL names
@ 2015-08-18 14:26 Eli Zaretskii
  2015-12-24 17:40 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-08-18 14:26 UTC
  To: emacs-devel

When I visit a URL in eww and press 'd' on a link like this:

  https://ru.wikipedia.org/wiki/%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5

Emacs creates a file whose name is made of those hex-encoded
characters as you see them in this mail.  Shouldn't we decode them?
Firefox does.


* Re: eww doesn't decode %AA%BB%CC URL names
  2015-08-18 14:26 eww doesn't decode %AA%BB%CC URL names Eli Zaretskii
@ 2015-12-24 17:40 ` Lars Ingebrigtsen
  2015-12-24 18:07   ` Yuri Khan
  0 siblings, 1 reply; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 17:40 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> When I visit a URL in eww and press 'd' on a link like this:
>
>   https://ru.wikipedia.org/wiki/%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5
>
> Emacs creates a file whose name is made of those hex-encoded
> characters as you see them in this mail.  Shouldn't we decode them?
> Firefox does.

We should.  Let's see...

(url-unhex-string "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
=> "\320\241\320\265\321\200\320\264\321\206\320\265"

Uhm...

(decode-coding-string (url-unhex-string
"%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
'utf-8)
=> "Сердце"

Right.  What charset do we choose?  I guess using the charset of the
document we're in doesn't make much sense (because it's linking to
something off-site which may be in a different charset)...

Perhaps just run a `detect-coding-string' on it?

Or!  We've just downloaded the file, after all, and the charset of the
file itself may tell us what the charset of the name is...  On the other
hand, probably not.  (For instance, a PDF with a Cyrillic name would
probably still just be reported by the web server as being binary.)

`detect-coding-string' it is, I guess, unless anybody has a better idea?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 17:40 ` Lars Ingebrigtsen
@ 2015-12-24 18:07   ` Yuri Khan
  2015-12-24 19:03     ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Yuri Khan @ 2015-12-24 18:07 UTC
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, Emacs developers

On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> (decode-coding-string (url-unhex-string
> "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
> 'utf-8)
> => "Сердце"
>
> Right.  What charset do we choose?  I guess using the charset of the
> document we're in doesn't make much sense (because it's linking to
> something off-site which may be in a different charset)...

By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the
URL does not decode into a valid UTF-8 string, it is ok to fall back
to a heuristic, though.
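
A minimal sketch of this policy, under stated assumptions: the helper
names `my-has-raw-bytes-p' and `my-decode-url-file-name' are
hypothetical (they are not part of eww), and the validity test assumes
that bytes the UTF-8 decoder cannot interpret survive in the result as
raw-byte characters, whose character codes start at #x3FFF80.  It is
one possible check, not necessarily the one eww should use.

```elisp
;; Hypothetical helpers for illustration; these names are not part of eww.
(require 'url-util)                     ; provides `url-unhex-string'

(defun my-has-raw-bytes-p (string)
  "Return non-nil if STRING contains raw 8-bit byte characters.
Assumption: undecodable bytes are kept as raw-byte characters,
whose character codes start at #x3FFF80."
  (let ((found nil))
    ;; `string-to-multibyte' turns unibyte bytes >= 128 into raw-byte
    ;; characters and leaves multibyte strings unchanged.
    (dolist (char (string-to-list (string-to-multibyte string)) found)
      (when (>= char #x3fff80)
        (setq found t)))))

(defun my-decode-url-file-name (name)
  "Decode percent-encoded NAME, trying UTF-8 before any heuristic."
  (let* ((bytes (url-unhex-string name))
         (utf-8-text (decode-coding-string bytes 'utf-8)))
    (if (my-has-raw-bytes-p utf-8-text)
        ;; Not valid UTF-8: fall back to Emacs's own guess.
        (decode-coding-string bytes (car (detect-coding-string bytes)))
      utf-8-text)))
```

With the URL from the original report, the UTF-8 branch is taken and
the heuristic is never consulted; a name such as "caf%E9" (a lone
Latin-1 byte) goes through the fallback, and what it decodes to then
depends on the user's coding priorities.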




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 18:07   ` Yuri Khan
@ 2015-12-24 19:03     ` Eli Zaretskii
  2015-12-24 19:18       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-24 19:03 UTC
  To: Yuri Khan; +Cc: larsi, emacs-devel

> From: Yuri Khan <yuri.v.khan@gmail.com>
> Date: Fri, 25 Dec 2015 00:07:40 +0600
> Cc: Eli Zaretskii <eliz@gnu.org>, Emacs developers <emacs-devel@gnu.org>
> 
> On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> > (decode-coding-string (url-unhex-string
> > "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
> > 'utf-8)
> > => "Сердце"
> >
> > Right.  What charset do we choose?  I guess using the charset of the
> > document we're in doesn't make much sense (because it's linking to
> > something off-site which may be in a different charset)...
> 
> By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the
> URL does not decode into a valid UTF-8 string, it is ok to fall back
> to a heuristic, though.

Yes, I think this is a good policy, thanks.  Bonus points for
implementing the command in a way that it will be able to accept user
choice of the encoding via "C-x RET c", like file operations do.




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 19:03     ` Eli Zaretskii
@ 2015-12-24 19:18       ` Lars Ingebrigtsen
  2015-12-24 19:34         ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 19:18 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, Yuri Khan

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Yuri Khan <yuri.v.khan@gmail.com>
>> Date: Fri, 25 Dec 2015 00:07:40 +0600
>> Cc: Eli Zaretskii <eliz@gnu.org>, Emacs developers <emacs-devel@gnu.org>
>> 
>> On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>> > (decode-coding-string (url-unhex-string
>> > "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
>> > 'utf-8)
>> > => "Сердце"
>> >
>> > Right.  What charset do we choose?  I guess using the charset of the
>> > document we're in doesn't make much sense (because it's linking to
>> > something off-site which may be in a different charset)...
>> 
>> By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the
>> URL does not decode into a valid UTF-8 string, it is ok to fall back
>> to a heuristic, though.

That's basically just (car (decode-coding-string ...)), though, since
it'll return utf-8 first if that's a possible charset, won't it?

> Yes, I think this is a good policy, thanks.  Bonus points for
> implementing the command in a way that it will be able to accept user
> choice of the encoding via "C-x RET c", like file operations do.

Let's see...  that function basically just binds
`coding-system-for-{read,write}' and then calls the command
interactively?  Do the commands just look at those variables, and if
they're bound, then they use that coding system instead?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 19:18       ` Lars Ingebrigtsen
@ 2015-12-24 19:34         ` Eli Zaretskii
  2015-12-24 19:55           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-24 19:34 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel, yuri.v.khan

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Yuri Khan <yuri.v.khan@gmail.com>,  emacs-devel@gnu.org
> Date: Thu, 24 Dec 2015 20:18:47 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Yuri Khan <yuri.v.khan@gmail.com>
> >> Date: Fri, 25 Dec 2015 00:07:40 +0600
> >> Cc: Eli Zaretskii <eliz@gnu.org>, Emacs developers <emacs-devel@gnu.org>
> >> 
> >> On Thu, Dec 24, 2015 at 11:40 PM, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> >> > (decode-coding-string (url-unhex-string
> >> > "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
> >> > 'utf-8)
> >> > => "Сердце"
> >> >
> >> > Right.  What charset do we choose?  I guess using the charset of the
> >> > document we're in doesn't make much sense (because it's linking to
> >> > something off-site which may be in a different charset)...
> >> 
> >> By RFC 3986, percent-encoded URLs SHOULD use UTF-8 encoding. If the
> >> URL does not decode into a valid UTF-8 string, it is ok to fall back
> >> to a heuristic, though.
> 
> That's basically just (car (decode-coding-string ...))

I believe you meant detect-coding-string.

> though, since it'll return utf-8 first if that's a possible charset,
> won't it?

You cannot rely on it returning UTF-8, that depends on coding
priorities (that are subject to customizations) and other things.

I think you should use UTF-8 literally as the first choice.

> > Yes, I think this is a good policy, thanks.  Bonus points for
> > implementing the command in a way that it will be able to accept user
> > choice of the encoding via "C-x RET c", like file operations do.
> 
> Let's see...  that function basically just binds
> `coding-system-for-{read,write}' and then calls the command
> interactively?

Yes.

> Do the commands just look at those variables, and if they're bound,
> then they use that coding system instead?

Yes, they use these in preference to everything else, something like
this:

  (let ((coding (or coding-system-for-read
                    document-encoding
                    locale-coding-system
                    ...)))
    (decode-coding-string ... coding))
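
As a small, runnable illustration of this pattern (a sketch only: the
function name `my-decode-with-override' is hypothetical, and `utf-8'
stands in for the elided default choices), a non-nil
`coding-system-for-read', such as the one "C-x RET c" binds around the
next command, takes precedence over the default:

```elisp
;; Hypothetical example; the function name and the utf-8 default are
;; assumptions made for illustration.
(defun my-decode-with-override (bytes)
  "Decode BYTES, preferring a non-nil `coding-system-for-read'."
  (decode-coding-string bytes (or coding-system-for-read 'utf-8)))

;; Simulate what "C-x RET c iso-8859-1 RET" arranges for the next command.
(let ((coding-system-for-read 'iso-8859-1))
  (my-decode-with-override "caf\351"))
```

Without the `let' binding, the same call would try to read the lone
byte \351 as UTF-8 and leave it as a raw byte.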





* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 19:34         ` Eli Zaretskii
@ 2015-12-24 19:55           ` Lars Ingebrigtsen
  2015-12-24 20:40             ` Eli Zaretskii
  2015-12-24 20:43             ` Lars Ingebrigtsen
  0 siblings, 2 replies; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 19:55 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, yuri.v.khan

Eli Zaretskii <eliz@gnu.org> writes:

>> That's basically just (car (decode-coding-string ...))
>
> I believe you meant detect-coding-string.

Yup.  :-)

>> though, since it'll return utf-8 first if that's a possible charset,
>> won't it?
>
> You cannot rely on it returning UTF-8, that depends on coding
> priorities (that are subject to customizations) and other things.
>
> I think you should use UTF-8 literally as the first choice.

Right.  How do I check whether the bytes are a valid utf-8 sequence,
though?  I thought I remembered something called
`valid-something-something-p', but I can't find it now...

> Yes, they use these in preference to everything else, something like
> this:
>
>   (let ((coding (or coding-system-for-read
>                     document-encoding
>                     locale-coding-system
>                     ...)))
>     (decode-coding-string ... coding))

Okidoke.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 19:55           ` Lars Ingebrigtsen
@ 2015-12-24 20:40             ` Eli Zaretskii
  2015-12-24 20:49               ` Lars Ingebrigtsen
  2015-12-24 20:43             ` Lars Ingebrigtsen
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-24 20:40 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel, yuri.v.khan

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: yuri.v.khan@gmail.com,  emacs-devel@gnu.org
> Date: Thu, 24 Dec 2015 20:55:13 +0100
> 
> > I think you should use UTF-8 literally as the first choice.
> 
> Right.  How do I check whether the bytes are a valid utf-8 sequence,
> though?  I thought I remembered something called
> `valid-something-something-p', but I can't find it now...

I think you can run find-charset-string on the decoded string, and if
the result is just (unicode), you can be sure.




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 19:55           ` Lars Ingebrigtsen
  2015-12-24 20:40             ` Eli Zaretskii
@ 2015-12-24 20:43             ` Lars Ingebrigtsen
  2015-12-24 21:00               ` Eli Zaretskii
  2015-12-24 21:04               ` Lars Ingebrigtsen
  1 sibling, 2 replies; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 20:43 UTC
  To: emacs-devel

Hm!  I have an unexpected complication here.

If I eval the following:

(write-region (point) (point-max) "/home/larsi/Downloads/Сердце")

Then I get a file name that consists of five spaces.  That seems awfully
weird.  I may have configured something somewhere that says that Emacs
should create file names in latin-1...  Hm...

(set-language-environment "Latin-1")

Which I would guess isn't uncommon.  Making an all-blank file name here
is somewhat unacceptable, I think.  So how should this be handled?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 20:40             ` Eli Zaretskii
@ 2015-12-24 20:49               ` Lars Ingebrigtsen
  0 siblings, 0 replies; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 20:49 UTC
  To: Eli Zaretskii; +Cc: yuri.v.khan, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I think you can run find-charset-string on the decoded string, and if
> the result is just (unicode), you can be sure.

Yeah, that should do the trick.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 20:43             ` Lars Ingebrigtsen
@ 2015-12-24 21:00               ` Eli Zaretskii
  2015-12-24 21:04               ` Lars Ingebrigtsen
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-24 21:00 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Thu, 24 Dec 2015 21:43:17 +0100
> 
> If I eval the following:
> 
> (write-region (point) (point-max) "/home/larsi/Downloads/Сердце")
> 
> Then I get a file name that consists of five spaces.  That seems awfully
> weird.  I may have configured something somewhere that says that Emacs
> should create file names in latin-1...  Hm...
> 
> (set-language-environment "Latin-1")
> 
> Which I would guess isn't uncommon.

I hope not.  Those who do that completely screw up their file-name
encoding stuff.

> Making an all-blank file name here is somewhat unacceptable, I
> think.  So how should this be handled?

Not sure which problem you are trying to solve.  But my crystal ball
says you need to

  (let ((file-name-coding-system 'utf-8))
    (write-region (point) (point-max) "/home/larsi/Downloads/Сердце"))

because most GNU/Linux systems use UTF-8 codeset by default.




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 20:43             ` Lars Ingebrigtsen
  2015-12-24 21:00               ` Eli Zaretskii
@ 2015-12-24 21:04               ` Lars Ingebrigtsen
  2015-12-24 21:11                 ` Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 21:04 UTC
  To: emacs-devel

After spelunking down into `set-language-environment', it seems like
it's the setting of `default-file-name-coding-system' that's the problem
here:

(encode-coding-string
 (decode-coding-string
  (url-unhex-string "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
  'utf-8)
 default-file-name-coding-system)
=> "      "

So I guess the file name should remain those percentages if it can't be
encoded using that...  but how do I check that, then?  :-)

Charsets are hard!

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 21:04               ` Lars Ingebrigtsen
@ 2015-12-24 21:11                 ` Eli Zaretskii
  2015-12-24 21:16                   ` Eli Zaretskii
  2015-12-24 21:17                   ` Lars Ingebrigtsen
  0 siblings, 2 replies; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-24 21:11 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Thu, 24 Dec 2015 22:04:08 +0100
> 
> After spelunking down into `set-language-environment', it seems like
> it's the setting of `default-file-name-coding-system' that's the problem
> here:
> 
> (encode-coding-string
>  (decode-coding-string
>   (url-unhex-string "%D0%A1%D0%B5%D1%80%D0%B4%D1%86%D0%B5")
>   'utf-8)
>  default-file-name-coding-system)
> => "      "
> 
> So I guess the file name should remain those percentages if it can't be
> encoded using that...  but how do I check that, then?  :-)

If you want to check that STRING can be encoded in CODING, do this:

  (member CODING (find-coding-systems-string STRING))

and see if the result is non-nil.

For file names, you should do this test with file-name-coding-system,
if that's non-nil, else with default-file-name-coding-system.

> Charsets are hard!

Subtle.




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 21:11                 ` Eli Zaretskii
@ 2015-12-24 21:16                   ` Eli Zaretskii
  2015-12-24 21:17                   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-24 21:16 UTC
  To: larsi; +Cc: emacs-devel

> Date: Thu, 24 Dec 2015 23:11:20 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
>   (member CODING (find-coding-systems-string STRING))
     ^^^^^^
Sorry, memq, of course.




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 21:11                 ` Eli Zaretskii
  2015-12-24 21:16                   ` Eli Zaretskii
@ 2015-12-24 21:17                   ` Lars Ingebrigtsen
  2015-12-24 21:28                     ` Lars Ingebrigtsen
  2015-12-25  7:17                     ` Eli Zaretskii
  1 sibling, 2 replies; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 21:17 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If you want to check that STRING can be encoded in CODING, do this:
>
>   (member CODING (find-coding-systems-string STRING))
>
> and see if the result is non-nil.

Hm:

(find-coding-systems-string "a")
=> (undecided)

(find-coding-systems-string "Сердце")
=> (chinese-iso-8bit japanese-shift-jis iso-2022-jp utf-8 korean-iso-8bit euc-jis-2004 japanese-iso-8bit iso-2022-jp-2004 cp855 windows-1251 koi8-t koi8-u cp866 koi8-u cyrillic-koi8 cyrillic-iso-8bit chinese-gb18030 chinese-gbk chinese-big5-hkscs chinese-hz utf-7 iso-2022-kr iso-2022-jp-2 iso-2022-cn-ext iso-2022-cn utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le compound-text-with-extensions compound-text iso-2022-7bit utf-8-auto utf-8-with-signature emacs-mule raw-text iso-2022-8bit-ss2 iso-2022-7bit-lock eucjp-ms korean-cp949 japanese-shift-jis-2004 japanese-iso-7bit-1978-irv japanese-cp932 pt154 mik cp1125 cyrillic-alternativnyj utf-7-imap utf-8-emacs prefer-utf-8 no-conversion ctext-no-compositions iso-2022-7bit-lock-ss2 iso-2022-7bit-ss2)

Wowza.

Ok, I think I should now be able to create the function in question.
Thanks for all the help.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 21:17                   ` Lars Ingebrigtsen
@ 2015-12-24 21:28                     ` Lars Ingebrigtsen
  2015-12-25  7:24                       ` Eli Zaretskii
  2015-12-25  7:17                     ` Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-24 21:28 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> (find-coding-systems-string "Сердце")
> => (chinese-iso-8bit japanese-shift-jis iso-2022-jp utf-8 korean-iso-8bit euc-jis-2004 japanese-iso-8bit iso-2022-jp-2004 cp855 windows-1251 koi8-t koi8-u cp866 koi8-u cyrillic-koi8 cyrillic-iso-8bit chinese-gb18030 chinese-gbk chinese-big5-hkscs chinese-hz utf-7 iso-2022-kr iso-2022-jp-2 iso-2022-cn-ext iso-2022-cn utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le compound-text-with-extensions compound-text iso-2022-7bit utf-8-auto utf-8-with-signature emacs-mule raw-text iso-2022-8bit-ss2 iso-2022-7bit-lock eucjp-ms korean-cp949 japanese-shift-jis-2004 japanese-iso-7bit-1978-irv japanese-cp932 pt154 mik cp1125 cyrillic-alternativnyj utf-7-imap utf-8-emacs prefer-utf-8 no-conversion ctext-no-compositions iso-2022-7bit-lock-ss2 iso-2022-7bit-ss2)

Darn!  If I start emacs -Q, I get

default-file-name-coding-system
=> utf-8-unix

And that isn't on that monstrous list up there...  Is that a bug in
`find-coding-systems-string'?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 21:17                   ` Lars Ingebrigtsen
  2015-12-24 21:28                     ` Lars Ingebrigtsen
@ 2015-12-25  7:17                     ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-25  7:17 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 24 Dec 2015 22:17:10 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > If you want to check that STRING can be encoded in CODING, do this:
> >
> >   (member CODING (find-coding-systems-string STRING))
> >
> > and see if the result is non-nil.
> 
> Hm:
> 
> (find-coding-systems-string "a")
> => (undecided)

This is normal for pure ASCII.  If the return value is just that, then
CODING, any CODING, can do the job.

> (find-coding-systems-string "Сердце")
> => (chinese-iso-8bit japanese-shift-jis iso-2022-jp utf-8 korean-iso-8bit euc-jis-2004 japanese-iso-8bit iso-2022-jp-2004 cp855 windows-1251 koi8-t koi8-u cp866 koi8-u cyrillic-koi8 cyrillic-iso-8bit chinese-gb18030 chinese-gbk chinese-big5-hkscs chinese-hz utf-7 iso-2022-kr iso-2022-jp-2 iso-2022-cn-ext iso-2022-cn utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le compound-text-with-extensions compound-text iso-2022-7bit utf-8-auto utf-8-with-signature emacs-mule raw-text iso-2022-8bit-ss2 iso-2022-7bit-lock eucjp-ms korean-cp949 japanese-shift-jis-2004 japanese-iso-7bit-1978-irv japanese-cp932 pt154 mik cp1125 cyrillic-alternativnyj utf-7-imap utf-8-emacs prefer-utf-8 no-conversion ctext-no-compositions iso-2022-7bit-lock-ss2 iso-2022-7bit-ss2)
> 
> Wowza.

Yeah.

> Ok, I think I should now be able to create the function in question.
> Thanks for all the help.  :-)

You are welcome.




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-24 21:28                     ` Lars Ingebrigtsen
@ 2015-12-25  7:24                       ` Eli Zaretskii
  2015-12-25  7:32                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-12-25  7:24 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 24 Dec 2015 22:28:48 +0100
> 
> Darn!  If I start emacs -Q, I get
> 
> default-file-name-coding-system
> => utf-8-unix
> 
> And that isn't on that monstrous list up there...  Is that a bug in
> `find-coding-systems-string'?

No, it's another "issue" when dealing with coding systems.  To avoid
this, use

  (coding-system-base default-file-name-coding-system)

instead of just default-file-name-coding-system, and the same with
file-name-coding-system.

(The "-unix" suffix controls conversion of end-of-line, which is not
relevant for encoding the characters in the file name.)
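
Putting the advice from the last few messages together, the
encodability test could be sketched as follows.  The function name
`my-file-name-encodable-p' and its optional CODING argument are
hypothetical, added so the check can be tried against an explicit
coding system; the `(undecided)' case for pure ASCII and the `memq'
test follow what was said earlier in the thread.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-file-name-encodable-p (name &optional coding)
  "Return non-nil if NAME can be encoded for use as a file name.
CODING defaults to `file-name-coding-system', falling back to
`default-file-name-coding-system'."
  (let ((coding (or coding
                    file-name-coding-system
                    default-file-name-coding-system))
        (candidates (find-coding-systems-string name)))
    (or (equal candidates '(undecided)) ; pure ASCII: any coding system works
        (and coding
             ;; Strip the end-of-line variant, e.g. utf-8-unix -> utf-8,
             ;; because the candidate list holds base coding systems only.
             (memq (coding-system-base coding) candidates)
             t))))
```

For example, (my-file-name-encodable-p "Сердце" 'utf-8-unix) is
non-nil, while the same name checked against `iso-latin-1' is nil, in
which case the caller could keep the percent-encoded name.  If neither
file-name variable is set, this sketch conservatively reports non-ASCII
names as not encodable.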




* Re: eww doesn't decode %AA%BB%CC URL names
  2015-12-25  7:24                       ` Eli Zaretskii
@ 2015-12-25  7:32                         ` Lars Ingebrigtsen
  0 siblings, 0 replies; 19+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-25  7:32 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> No, it's another "issue" when dealing with coding systems.  To avoid
> this, use
>
>   (coding-system-base default-file-name-coding-system)
>
> instead of just default-file-name-coding-system, and the same with
> file-name-coding-system.

That did the trick.  emacs -Q now saves that Russian-looking file name
using utf-8 into the Download directory.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


