unofficial mirror of emacs-devel@gnu.org 
* Question about (gui-get-selection nil 'text/html)
From: Lars Ingebrigtsen @ 2018-04-13 19:00 UTC (permalink / raw)
  To: emacs-devel

So, we can yank HTML that we cut from Firefox like so:

(gui-get-selection nil 'text/html)

... sort of.

I've put the result into a binary file so it'll hopefully survive the
email transport.


[-- Attachment #2: selection.bin --]
[-- Type: application/octet-stream, Size: 125 bytes --]

So...  what is that?  I've tried to google, but I found nothing
promising, so my Google-fu is probably bad.

It rather looks like it's UTF-16 -- every other byte is a NUL.  But
Emacs claims that it's iso-8859-1.  And...  I've tried decoding various
instances of these things as UTF-16, and it almost works, but
not quite.

Any ideas?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: Question about (gui-get-selection nil 'text/html)
From: Lars Ingebrigtsen @ 2018-04-13 19:07 UTC (permalink / raw)
  To: emacs-devel

Oh, wow.  If I just do

(decode-coding-region (point-min) (point-max) 'utf-16-le)

instead of utf-16, I get the HTML I expect instead of a bunch of Chinese
characters.  :-)

There's a byte order mark at the start -- isn't utf-16 supposed to use
that to get the byte order? 
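
For what it's worth, the "Chinese characters" are what little-endian
UTF-16 looks like when it is read big-endian.  A minimal Python sketch
(not Emacs code; it assumes the decoder fell back to big-endian because
it did not recognize a BOM, and the sample string is made up):

```python
# Hypothetical sample; the real selection was HTML copied from Firefox.
html = "<html>"

# Encode as little-endian UTF-16 without a BOM: b'<\x00h\x00t\x00...'
raw = html.encode("utf-16-le")

# Read the same bytes with the wrong byte order.
swapped = raw.decode("utf-16-be")

# '<' is 3C 00 on the wire; read big-endian that is U+3C00, a CJK ideograph.
for ch in swapped:
    print(f"U+{ord(ch):04X}")

# Reading with the right byte order gives the text back.
print(raw.decode("utf-16-le"))
```

Every ASCII character ends up with its byte in the high half of the
code unit, which lands in the CJK ranges.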

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Question about (gui-get-selection nil 'text/html)
From: Eli Zaretskii @ 2018-04-13 20:27 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Fri, 13 Apr 2018 21:07:27 +0200
> 
> Oh, wow.  If I just do
> 
> (decode-coding-region (point-min) (point-max) 'utf-16-le)
> 
> instead of utf-16, I get the HTML I expect instead of a bunch of Chinese
> characters.  :-)
> 
> There's a byte order mark at the start -- isn't utf-16 supposed to use
> that to get the byte order? 

The file you attached has no BOM.



* Re: Question about (gui-get-selection nil 'text/html)
From: Lars Ingebrigtsen @ 2018-04-13 20:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Date: Fri, 13 Apr 2018 21:07:27 +0200
>> 
>> Oh, wow.  If I just do
>> 
>> (decode-coding-region (point-min) (point-max) 'utf-16-le)
>> 
>> instead of utf-16, I get the HTML I expect instead of a bunch of Chinese
>> characters.  :-)
>> 
>> There's a byte order mark at the start -- isn't utf-16 supposed to use
>> that to get the byte order? 
>
> The file you attached has no BOM.

The first four bytes were

\303\277\303\276

Isn't that the BOM?  Or do I misremember?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



* Re: Question about (gui-get-selection nil 'text/html)
From: Stefan Monnier @ 2018-04-13 21:42 UTC (permalink / raw)
  To: emacs-devel

> So, we can yank HTML that we cut from Firefox like so:
>
> (gui-get-selection nil 'text/html)

I've opened a bug report for that: bug#31149


        Stefan




* Re: Question about (gui-get-selection nil 'text/html)
From: Andreas Schwab @ 2018-04-13 22:07 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel

On Apr 13 2018, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> The first four bytes were
>
> \303\277\303\276
>
> Isn't that the BOM?  Or do I misremember?

It's a BOM encoded as UTF-16 encoded as UTF-8.
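
Spelled out: the BOM bytes FF FE were treated as the Latin-1 characters
U+00FF and U+00FE and then written out as UTF-8, which gives
C3 BF C3 BE, i.e. \303\277\303\276 in octal.  A minimal Python sketch of
that round trip (the sample HTML is made up, and that the whole
selection went through this path is an assumption):

```python
import codecs

# Hypothetical payload: UTF-16-LE BOM (FF FE) followed by UTF-16-LE text.
payload = codecs.BOM_UTF16_LE + "<b>hi</b>".encode("utf-16-le")

# Misread each byte as a Latin-1 character, then encode the result as UTF-8.
mangled = payload.decode("latin-1").encode("utf-8")

# The first four bytes in octal, the way Emacs displays raw bytes.
print("".join(f"\\{b:03o}" for b in mangled[:4]))  # prints \303\277\303\276

# Undo the extra layer, then let the real BOM select the byte order.
restored = mangled.decode("utf-8").encode("latin-1").decode("utf-16")
print(restored)
```

Only bytes of 0x80 and above change under that extra layer, so for
ASCII HTML just the BOM is damaged.  That would fit with utf-16-le
giving readable HTML while plain utf-16 does not.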

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



* Re: Question about (gui-get-selection nil 'text/html)
From: Lars Ingebrigtsen @ 2018-04-13 22:18 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eli Zaretskii, emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> On Apr 13 2018, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
>> The first four bytes were
>>
>> \303\277\303\276
>>
>> Isn't that the BOM?  Or do I misremember?
>
> It's a BOM encoded as UTF-16 encoded as UTF-8.

Heh heh.  Beautiful.  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



* Re: Question about (gui-get-selection nil 'text/html)
From: Eli Zaretskii @ 2018-04-14  6:42 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Fri, 13 Apr 2018 22:29:41 +0200
> Cc: emacs-devel@gnu.org
> 
> >> There's a byte order mark at the start -- isn't utf-16 supposed to use
> >> that to get the byte order? 
> >
> > The file you attached has no BOM.
> 
> The first four bytes were
> 
> \303\277\303\276
> 
> Isn't that the BOM?  Or do I misremember?

Look at it with hexl-find-file or with "od -x", and you will see it's
not a BOM (which should be either FFFE or FEFF).
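
The same check can be scripted.  A minimal Python sketch that classifies
the first bytes of some data (the function name and the labels are made
up for illustration):

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Classify the leading bytes of DATA (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):      # FF FE
        return "utf-16-le BOM"
    if data.startswith(codecs.BOM_UTF16_BE):      # FE FF
        return "utf-16-be BOM"
    if data.startswith(b"\xc3\xbf\xc3\xbe"):      # FF FE re-encoded as UTF-8
        return "utf-16-le BOM re-encoded as UTF-8"
    if data.startswith(b"\xc3\xbe\xc3\xbf"):      # FE FF re-encoded as UTF-8
        return "utf-16-be BOM re-encoded as UTF-8"
    return "no BOM"

print(sniff_bom(b"\xc3\xbf\xc3\xbe<\x00"))
print(sniff_bom(b"\xff\xfe<\x00"))
```

The sketch looks only at the first two to four bytes and does not try
to recognize UTF-32 or UTF-8 BOMs.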


