unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#68971: Innocent file renders crazy
@ 2024-02-07 14:17 Dan Jacobson
  2024-02-07 15:00 ` Eli Zaretskii
From: Dan Jacobson @ 2024-02-07 14:17 UTC (permalink / raw)
  To: 68971

[-- Attachment #1: Type: text/plain, Size: 223 bytes --]

There is something crazy about this attached file that causes emacs to
display tons of weird characters.
$ md5sum metadata.html
42c875bae87988bbbd4db481b873bc1a metadata.html
$ emacs -Q metadata.html #crazy!
GNU Emacs 29.1

[-- Attachment #2: metadata.html --]
[-- Type: text/html, Size: 20676 bytes --]

* bug#68971: Innocent file renders crazy
  2024-02-07 14:17 bug#68971: Innocent file renders crazy Dan Jacobson
@ 2024-02-07 15:00 ` Eli Zaretskii
  2024-02-07 21:46   ` Dan Jacobson
From: Eli Zaretskii @ 2024-02-07 15:00 UTC (permalink / raw)
  To: Dan Jacobson; +Cc: 68971

tags 68971 notabug
thanks

> Date: Wed, 07 Feb 2024 22:17:58 +0800
> From: Dan Jacobson <jidanni@jidanni.org>
> 
> There is something crazy about this attached file that causes emacs to
> display tons of weird characters.
> $ md5sum metadata.html
> 42c875bae87988bbbd4db481b873bc1a metadata.html
> $ emacs -Q metadata.html #crazy!
> GNU Emacs 29.1

It's this part:

  <html lang="en"><head><META http-equiv="Content-Type"
  content="text/html; charset=utf-16"><font face="calibri"><title>
                      ^^^^^^^^^^^^^^

UTF-16 encodes each character below 0x10000 with 2 bytes, so you get
this gibberish if you try to display plain ASCII text as if it were
UTF-16.
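To make the effect concrete, here is a minimal Python sketch (not Emacs code). The sample bytes are a stand-in for the start of the attachment, and little-endian byte order is an assumption; big-endian gives different but equally unreadable characters.

```python
# Sample ASCII bytes standing in for the start of the HTML file
# (illustrative only, not the real attachment).
raw = b'<html lang="en"><head>'

# What the bytes really are: one character per byte.
right = raw.decode("ascii")

# What the declared charset asks for: every two bytes form one 16-bit
# code unit. Byte order is assumed little-endian here.
wrong = raw.decode("utf-16-le")

print(len(right), len(wrong))            # half as many characters when paired up
print([hex(ord(c)) for c in wrong[:3]])  # code points far outside ASCII
print(wrong)                             # the "gibberish"
```

Each pair of ASCII bytes lands on an unrelated code point, frequently in the CJK range, which is why the display looks like random characters rather than an error message.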

This is not a bug.

* bug#68971: Innocent file renders crazy
  2024-02-07 15:00 ` Eli Zaretskii
@ 2024-02-07 21:46   ` Dan Jacobson
  2024-02-08  6:03     ` Eli Zaretskii
From: Dan Jacobson @ 2024-02-07 21:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68971

OK, you are entirely right. It is all the file's fault and not emacs's.

But on the other hand I wouldn't get far telling the Google Chrome team
they should stop overriding charset declarations just to make things
render well.

In the end it's the emacs users who end up not being able to read the
document.

Maybe have some warning "wrong charset detected, proceed? [y,n,(a)utofix...]"

Else well, all the other users in the room are proceeding with their
homework assignment, except Ralph, who uses emacs, which has gibberish
on its screen, with no warnings.

* bug#68971: Innocent file renders crazy
  2024-02-07 21:46   ` Dan Jacobson
@ 2024-02-08  6:03     ` Eli Zaretskii
From: Eli Zaretskii @ 2024-02-08  6:03 UTC (permalink / raw)
  To: Dan Jacobson; +Cc: 68971-done

> From: Dan Jacobson <jidanni@jidanni.org>
> Cc: 68971@debbugs.gnu.org
> Date: Thu, 08 Feb 2024 05:46:35 +0800
> 
> OK, you are entirely right. It is all the file's fault and not emacs's.
> 
> But on the other hand I wouldn't get far telling the Google Chrome team
> they should stop overriding charset declarations just to make things
> render well.
> 
> In the end it's the emacs users who end up not being able to read the
> document.
> 
> Maybe have some warning "wrong charset detected, proceed? [y,n,(a)utofix...]"

How can Emacs know, up front, that the charset is wrong?  In general,
when a file claims some specific charset or encoding, Emacs believes
that and obeys.  The "gibberish" is in the eyes of the beholder; Emacs
doesn't really understand human-readable text, and so doesn't know
whether what it presents is legible text or garbage caused by wrong
decoding.

> Else well, all the other users in the room are proceeding with their
> homework assignment, except Ralph, who uses emacs, which has gibberish
> on its screen, with no warnings.

What I did when I saw gibberish was to visit the file literally (as in
"M-x find-file-literally"), then, when I saw it was plain ASCII,
looked at its preamble, where I saw UTF-16, which explained why "C-x C-f"
shows gibberish.  So when something like this happens, my suggestion
is:

  . M-x find-file-literally
  . look at the literal display: if it is readable, you can just
    proceed with your homework assignment
  . alternatively, force Emacs to visit with the correct encoding, as
    in "C-x RET c utf-8 RET C-x C-f metadata.html RET"

The "utf-8" part above was a guess, based on looking at the file when
visited literally; you may need to guess again if the results are not
good enough.  See the node "Text Coding" in the Emacs user manual for
more about these facilities.
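The "look at the raw bytes, then guess" procedure above can be sketched outside Emacs as well. The following Python is only an illustration under stated assumptions: the helper names `guess_encoding` and `declared_utf16_is_suspect` are hypothetical, the candidate list is a guess just like "utf-8" above, and the NUL-byte check is a heuristic for mostly-Latin text, not how Emacs decides anything.

```python
from typing import Optional, Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("ascii", "utf-8")) -> Optional[str]:
    """Return the first candidate that decodes the bytes without error.

    A clean decode only means the guess is plausible, not that it is right.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def declared_utf16_is_suspect(data: bytes) -> bool:
    """Heuristic: UTF-16 text that is mostly Latin normally starts with a
    byte-order mark or contains NUL bytes. Finding neither is a hint that
    a 'charset=utf-16' label may be wrong."""
    has_bom = data[:2] in (b"\xff\xfe", b"\xfe\xff")
    return not has_bom and b"\x00" not in data


# Stand-in for the start of the file (illustrative, not the real attachment).
sample = (b'<html lang="en"><head><META http-equiv="Content-Type" '
          b'content="text/html; charset=utf-16">')
print(guess_encoding(sample), declared_utf16_is_suspect(sample))
```

A check like this can only raise suspicion. It cannot tell legible text from garbage, so the final judgment still rests on a human looking at the result.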

And with that, I'm closing this bug.