all messages for Emacs-related lists mirrored at yhetil.org
* Eww can't display Chinese characters correctly.
@ 2020-12-14  3:30 Hongyi Zhao
  2020-12-14 14:01 ` Pankaj Jangid
  0 siblings, 1 reply; 14+ messages in thread
From: Hongyi Zhao @ 2020-12-14  3:30 UTC (permalink / raw)
  To: help-gnu-emacs

On Ubuntu 20.10, I compiled and installed the git master version of
Emacs. When I try to browse the web with eww, all Chinese characters
are displayed as garbage.

Any hints for fixing/solving this problem?

Regards
-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Polytechnic Vocational and Technical University
NO. 552 North Gangtie Road, Xingtai, China




* Re: Eww can't display Chinese characters correctly.
  2020-12-14  3:30 Eww can't display Chinese characters correctly Hongyi Zhao
@ 2020-12-14 14:01 ` Pankaj Jangid
  2020-12-14 14:38   ` Hongyi Zhao
  2020-12-14 14:41   ` Lars Ingebrigtsen
  0 siblings, 2 replies; 14+ messages in thread
From: Pankaj Jangid @ 2020-12-14 14:01 UTC (permalink / raw)
  To: Hongyi Zhao; +Cc: help-gnu-emacs

Hongyi Zhao <hongyi.zhao@gmail.com> writes:

> On Ubuntu 20.10, I compiled and installed the git master version of
> emacs. When I try to browse the web with eww, I find that all Chinese
> characters will show as garbled.
>
> Any hints for fixing/solving this problem?

You need ‘set-fontset-font’ to tell Emacs which font to use for a given
(language) script. For example:

  (set-fontset-font t 'devanagari "Noto")

This sets Noto font for Devanagari script in the default fontset. ‘t’ is
for default fontset.
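
For Chinese text the analogous call would target the ‘han’ script. This is only a sketch: the font family named here is an assumption (substitute any CJK font that is actually installed), and it only helps when glyphs are missing, which is a different failure from a wrong character encoding.

```elisp
;; Sketch: map the `han' script to a CJK font in the default fontset.
;; "Noto Sans CJK SC" is an assumed family name, not taken from this thread;
;; replace it with a font present on your system.
;; Only meaningful in a graphical Emacs session.
(set-fontset-font t 'han (font-spec :family "Noto Sans CJK SC"))
```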




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:01 ` Pankaj Jangid
@ 2020-12-14 14:38   ` Hongyi Zhao
  2020-12-14 14:41   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 14+ messages in thread
From: Hongyi Zhao @ 2020-12-14 14:38 UTC (permalink / raw)
  To: Hongyi Zhao, help-gnu-emacs

On Mon, Dec 14, 2020 at 10:01 PM Pankaj Jangid <pankaj@codeisgreat.org> wrote:
>
> Hongyi Zhao <hongyi.zhao@gmail.com> writes:
>
> > On Ubuntu 20.10, I compiled and installed the git master version of
> > emacs. When I try to browse the web with eww, I find that all Chinese
> > characters will show as garbled.
> >
> > Any hints for fixing/solving this problem?
>
> You need ‘set-fontset-font’ to tell Emacs what font to use for the
> (language) script. Example,
>
>   (set-fontset-font t 'devanagari "Noto")
>
> This sets Noto font for Devanagari script in the default fontset. ‘t’ is
> for default fontset.

The eaf-open-browser command shipped in EAF
<https://github.com/manateelazycat/emacs-application-framework>
doesn't have this problem.

Based on my attempts so far, EWW is a rather poor browser.

Regards
-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Polytechnic Vocational and Technical University
NO. 552 North Gangtie Road, Xingtai, China




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:01 ` Pankaj Jangid
  2020-12-14 14:38   ` Hongyi Zhao
@ 2020-12-14 14:41   ` Lars Ingebrigtsen
  2020-12-14 14:48     ` Hongyi Zhao
  1 sibling, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-14 14:41 UTC (permalink / raw)
  To: Hongyi Zhao; +Cc: help-gnu-emacs

Pankaj Jangid <pankaj@codeisgreat.org> writes:

>> On Ubuntu 20.10, I compiled and installed the git master version of
>> emacs. When I try to browse the web with eww, I find that all Chinese
>> characters will show as garbled.
>>
>> Any hints for fixing/solving this problem?
>
> You need ‘set-fontset-font’ to tell Emacs what font to use for the
> (language) script. Example,
>
>   (set-fontset-font t 'devanagari "Noto")
>
> This sets Noto font for Devanagari script in the default fontset. ‘t’ is
> for default fontset.

Emacs should display Chinese characters just fine on its own without any
configuration needed.

But it's hard to tell what the problem is since "garbled" doesn't really
tell us much.  Garbled how?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:41   ` Lars Ingebrigtsen
@ 2020-12-14 14:48     ` Hongyi Zhao
  2020-12-14 14:56       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Hongyi Zhao @ 2020-12-14 14:48 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

On Mon, Dec 14, 2020 at 10:41 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> Pankaj Jangid <pankaj@codeisgreat.org> writes:
>
> >> On Ubuntu 20.10, I compiled and installed the git master version of
> >> emacs. When I try to browse the web with eww, I find that all Chinese
> >> characters will show as garbled.
> >>
> >> Any hints for fixing/solving this problem?
> >
> > You need ‘set-fontset-font’ to tell Emacs what font to use for the
> > (language) script. Example,
> >
> >   (set-fontset-font t 'devanagari "Noto")
> >
> > This sets Noto font for Devanagari script in the default fontset. ‘t’ is
> > for default fontset.
>
> Emacs should display Chinese characters just fine on its own without any
> configuration needed.
>
> But it's hard to tell what the problem is since "garbled" doesn't really
> tell us much.  Garbled how?

See the attached file for more information.

Regards
-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Polytechnic Vocational and Technical University
NO. 552 North Gangtie Road, Xingtai, China

[-- Attachment #2: garbled-eww.png --]
[-- Type: image/png, Size: 265208 bytes --]


* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:48     ` Hongyi Zhao
@ 2020-12-14 14:56       ` Lars Ingebrigtsen
  2020-12-14 15:19         ` Stefan Monnier
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-14 14:56 UTC (permalink / raw)
  To: Hongyi Zhao; +Cc: help-gnu-emacs

Hongyi Zhao <hongyi.zhao@gmail.com> writes:

> See the attached file for more information.

Oh, fun.  The test case is:

M-x eww RET google.com RET

type in

中文网

in the search field, and Google will then redirect you to:

http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91

Which gives you:

Content-Type: text/html; charset=ISO-8859-1

in the headers and then

<!doctype html><html lang="no"><head><meta charset="UTF-8">

in the body.  So it's another example of Google Quality.

eww heeds the Content-type header over the meta charset, which is the
same as Firefox...  and Chrome, too (just try that URL).

So it's a Google bug, not an Emacs bug.
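
The conflict described above can be made visible with a small helper that pulls both charset declarations out of a raw response. This is a minimal sketch: the function name `demo-charset-declarations' is hypothetical, the regexps are deliberately simple, and the response text is assumed to be available as a string (for instance copied from a buffer returned by `url-retrieve-synchronously').

```elisp
(defun demo-charset-declarations (response)
  "Return (HEADER-CHARSET META-CHARSET) found in RESPONSE, a string.
Either element is nil when no such declaration is found."
  (let ((case-fold-search t))
    (list
     ;; Charset announced in the HTTP Content-Type header line.
     (when (string-match "^Content-Type:.*charset=\\([^; \r\n]+\\)" response)
       (match-string 1 response))
     ;; Charset announced by a <meta charset=...> tag in the body.
     (when (string-match "<meta[^>]*charset=\"?\\([^\"> ]+\\)" response)
       (match-string 1 response)))))

;; Usage with the header and body fragments quoted in this message:
(demo-charset-declarations
 (concat "Content-Type: text/html; charset=ISO-8859-1\n\n"
         "<!doctype html><html lang=\"no\"><head><meta charset=\"UTF-8\">"))
;; => ("ISO-8859-1" "UTF-8")
```

When the two values differ, a client that trusts the header decodes the body with the wrong charset.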




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:56       ` Lars Ingebrigtsen
@ 2020-12-14 15:19         ` Stefan Monnier
  2020-12-14 15:27           ` Lars Ingebrigtsen
  2020-12-14 18:59         ` Tomas Nordin
  2020-12-14 23:22         ` Hongyi Zhao
  2 siblings, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2020-12-14 15:19 UTC (permalink / raw)
  To: help-gnu-emacs

>> See the attached file for more information.
> Oh, fun.  The test case is:
> M-x eww RET google.com RET
> type in
>
> 中文网
>
> in the search field, and Google will then redirect you to:
>
> http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91

[...]
> So it's a Google bug, not an Emacs bug.

Could be, but it could also be that the request made by Eww doesn't
correctly specify the encoding of the "中文网" string, so that Google is
then led to believe we're using iso-8859-1?


        Stefan





* Re: Eww can't display Chinese characters correctly.
  2020-12-14 15:19         ` Stefan Monnier
@ 2020-12-14 15:27           ` Lars Ingebrigtsen
  2020-12-19 11:35             ` William Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-14 15:27 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Could be, but it could also be that the request made by Eww doesn't
> correctly specify the encoding of the "中文网" string, so that Google is
> then led to believe we're using iso-8859-1?

Nope.  Google puts this in the form:

<input name="ie" value="ISO-8859-1" type="hidden">

I'm assuming they then have some JS to magically make this work.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:56       ` Lars Ingebrigtsen
  2020-12-14 15:19         ` Stefan Monnier
@ 2020-12-14 18:59         ` Tomas Nordin
  2020-12-14 23:22         ` Hongyi Zhao
  2 siblings, 0 replies; 14+ messages in thread
From: Tomas Nordin @ 2020-12-14 18:59 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Hongyi Zhao; +Cc: help-gnu-emacs

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Hongyi Zhao <hongyi.zhao@gmail.com> writes:
>
>> See the attached file for more information.
>
> Oh, fun.  The test case is:
>
> M-x eww RET google.com RET
>
> type in
>
> 中文网
>
> in the search field, and Google will then redirect you to:
>
> http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91
>
> Which gives you:
>
> Content-Type: text/html; charset=ISO-8859-1
>
> in the headers and then
>
> <!doctype html><html lang="no"><head><meta charset="UTF-8">
>
> in the body.  So it's another example of Google Quality.
>
> eww heeds the Content-type header over the meta charset, which is the
> same as Firefox...  and Chrome, too (just try that URL).

FWIW, in emacs-w3m it looks garbled too...

>
> So it's a Google bug, not an Emacs bug.




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 14:56       ` Lars Ingebrigtsen
  2020-12-14 15:19         ` Stefan Monnier
  2020-12-14 18:59         ` Tomas Nordin
@ 2020-12-14 23:22         ` Hongyi Zhao
  2 siblings, 0 replies; 14+ messages in thread
From: Hongyi Zhao @ 2020-12-14 23:22 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: help-gnu-emacs

On Mon, Dec 14, 2020 at 10:56 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> Hongyi Zhao <hongyi.zhao@gmail.com> writes:
>
> > See the attached file for more information.
>
> Oh, fun.  The test case is:
>
> M-x eww RET google.com RET
>
> type in
>
> 中文网
>
> in the search field, and Google will then redirect you to:
>
> http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91
>
> Which gives you:
>
> Content-Type: text/html; charset=ISO-8859-1
>
> in the headers and then
>
> <!doctype html><html lang="no"><head><meta charset="UTF-8">
>
> in the body.  So it's another example of Google Quality.

Wonderful analysis, but I still don’t understand how you came to this
conclusion from the above URL.

>
> eww heeds the Content-type header over the meta charset, which is the
> same as Firefox...  and Chrome, too (just try that URL).
>
> So it's a Google bug, not an Emacs bug.
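
One way to see why conflicting declarations produce garbage is to replay the decoding by hand. The sketch below (the helper name `demo-decode-utf8-bytes-as' is hypothetical) encodes the search string as UTF-8, as the page body does, and then decodes those bytes with a chosen coding system. Decoding as ISO-8859-1, which is what a client does when it follows the Content-Type header, turns each of the nine bytes into its own Latin-1 character.

```elisp
(defun demo-decode-utf8-bytes-as (text coding)
  "Encode TEXT as UTF-8, then decode the resulting bytes as CODING."
  (decode-coding-string (encode-coding-string text 'utf-8) coding))

;; Following the header: nine unrelated Latin-1 characters.
(length (demo-decode-utf8-bytes-as "中文网" 'iso-8859-1))
;; => 9

;; Following the meta tag: the original three characters.
(demo-decode-utf8-bytes-as "中文网" 'utf-8)
;; => "中文网"
```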



-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Polytechnic Vocational and Technical University
NO. 552 North Gangtie Road, Xingtai, China




* Re: Eww can't display Chinese characters correctly.
  2020-12-14 15:27           ` Lars Ingebrigtsen
@ 2020-12-19 11:35             ` William Xu
  2020-12-19 15:18               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: William Xu @ 2020-12-19 11:35 UTC (permalink / raw)
  To: help-gnu-emacs

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> Could be, but it could also be that the request made by Eww doesn't
>> correctly specify the encoding of the "中文网" string, so that Google is
>> then led to believe we're using iso-8859-1?
>
> Nope.  Google puts this in the form:
>
> <input name="ie" value="ISO-8859-1" type="hidden">
>
> I'm assuming they then have some JS to magically make this work.

I checked the page source in Chrome and Safari; it doesn't seem to
contain iso-8859-1, but instead something like this:

<input class="gNO89b" value="Google Search" aria-label="Google Search" name="btnK" type="submit" data-ved="0ahUKEwjD_5XP9NntAhWIBGMBHbV0BZUQ4dUDCAw">

Also, if I copy and paste the URL below directly into Safari or Chrome,
the page is displayed correctly.

http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91

On the other hand, when I just mouse-click the URL in Emacs, which calls
browse-url, Safari displays the same garbage, similar to
what the OP posted.

I wonder what browse-url is doing in between. 

Is browse-url also used by eww? 

-- 
William





* Re: Eww can't display Chinese characters correctly.
  2020-12-19 11:35             ` William Xu
@ 2020-12-19 15:18               ` Lars Ingebrigtsen
  2020-12-19 18:27                 ` William Xu
  2020-12-20  1:13                 ` Hongyi Zhao
  0 siblings, 2 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-19 15:18 UTC (permalink / raw)
  To: William Xu; +Cc: help-gnu-emacs

William Xu <william.xwl@gmail.com> writes:

> I checked the page source in chrome or safari, they don't seem to have
> iso-8859-1 in the page, instead something like this: 
>
> <input class="gNO89b" value="Google Search" aria-label="Google Search"
> name="btnK" type="submit"
> data-ved="0ahUKEwjD_5XP9NntAhWIBGMBHbV0BZUQ4dUDCAw">

Yes, Google serves out HTML that actually works if the User-Agent
indicates the major browsers.  If you 

(setq url-user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")

that makes Google serve out HTML that's non-buggy.

> Also, if i copy and paste below url directly in safari or chrome, it
> would display the page correctly. 
>
> http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91
>
> On the other hand, when I just mouse-click from emacs, which calls
> browse-url, then safari will display the same garbage there, similar to
> what OP posts. 

I see the same in Chrome whether I go via browse-url or paste the URL.

> I wonder what browse-url is doing in between. 
>
> Is browse-url also used by eww? 

No.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Eww can't display Chinese characters correctly.
  2020-12-19 15:18               ` Lars Ingebrigtsen
@ 2020-12-19 18:27                 ` William Xu
  2020-12-20  1:13                 ` Hongyi Zhao
  1 sibling, 0 replies; 14+ messages in thread
From: William Xu @ 2020-12-19 18:27 UTC (permalink / raw)
  To: help-gnu-emacs

Lars Ingebrigtsen <larsi@gnus.org> writes:

> William Xu <william.xwl@gmail.com> writes:
>
> Yes, Google serves out HTML that actually works if the User-Agent
> indicates the major browsers.  If you 
>
> (setq url-user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88
> Safari/537.36")
>
> that makes Google serve out HTML that's non-buggy.

Thanks, this works for me too. I guess I should add this to my .emacs. :)
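
A sketch of what that init-file entry might look like. Note that `url-user-agent' applies to every request made through the URL library, not only to eww, so this changes how Emacs identifies itself everywhere; the last form assumes the symbol `default' is accepted as a value, as in recent Emacs versions.

```elisp
;; In the init file: present a mainstream browser User-Agent so that
;; Google serves the variant of the page whose charset declarations agree.
;; The string is the one suggested earlier in this thread.
(setq url-user-agent
      (concat "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/87.0.4280.88 Safari/537.36"))

;; To go back to the computed User-Agent later (assumes Emacs 26 or newer):
;; (setq url-user-agent 'default)
```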

>> Also, if i copy and paste below url directly in safari or chrome, it
>> would display the page correctly. 
>>
>> http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91
>>
>> On the other hand, when I just mouse-click from emacs, which calls
>> browse-url, then safari will display the same garbage there, similar to
>> what OP posts. 
>
> I see the same in Chrome whether I go via browse-url or paste the URL.

I looked into the browse-url implementation. If I run the same command
line ("open URL") myself, Safari shows the same garbage.

So it is something that happens in Safari.  Somehow, when the URL is
pasted into the address bar, Safari converts it into the following:

https://www.google.com/search?client=safari&rls=en&q=%E4%B8%AD%E6%96%87%E7%BD%91&ie=UTF-8&oe=UTF-8

The same trick doesn't work for Chrome. 

-- 
William





* Re: Eww can't display Chinese characters correctly.
  2020-12-19 15:18               ` Lars Ingebrigtsen
  2020-12-19 18:27                 ` William Xu
@ 2020-12-20  1:13                 ` Hongyi Zhao
  1 sibling, 0 replies; 14+ messages in thread
From: Hongyi Zhao @ 2020-12-20  1:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: William Xu, help-gnu-emacs

On Sat, Dec 19, 2020 at 11:19 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> William Xu <william.xwl@gmail.com> writes:
>
> > I checked the page source in chrome or safari, they don't seem to have
> > iso-8859-1 in the page, instead something like this:
> >
> > <input class="gNO89b" value="Google Search" aria-label="Google Search"
> > name="btnK" type="submit"
> > data-ved="0ahUKEwjD_5XP9NntAhWIBGMBHbV0BZUQ4dUDCAw">
>
> Yes, Google serves out HTML that actually works if the User-Agent
> indicates the major browsers.  If you
>
> (setq url-user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
>
> that makes Google serve out HTML that's non-buggy.

I confirm it does the trick for me too.

>
> > Also, if i copy and paste below url directly in safari or chrome, it
> > would display the page correctly.
> >
> > http://www.google.com/search?gbv=1&iflsig=AINFCbYAAAAAX9eJ576dBCkZ_8MT30T-VWnLwzH6yNx4&bih=&biw=&source=hp&hl=no&ie=ISO-8859-1&btnG=Google-s%C3%B8k&q=+%E4%B8%AD%E6%96%87%E7%BD%91
> >
> > On the other hand, when I just mouse-click from emacs, which calls
> > browse-url, then safari will display the same garbage there, similar to
> > what OP posts.
>
> I see the same in Chrome whether I go via browse-url or paste the URL.

Me too.

>
> > I wonder what browse-url is doing in between.
> >
> > Is browse-url also used by eww?
>
> No.
>
> --
> (domestic pets only, the antidote for overdose, milk.)
>    bloggy blog: http://lars.ingebrigtsen.no
>


-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Polytechnic Vocational and Technical University
NO. 552 North Gangtie Road, Xingtai, China


