unofficial mirror of help-gnu-emacs@gnu.org
* How to find the character breaking the file encoding?
@ 2020-02-04 21:42 Marcin Borkowski
  2020-02-05  2:42 ` Óscar Fuentes
  2020-02-05  3:09 ` Stefan Monnier
  0 siblings, 2 replies; 4+ messages in thread
From: Marcin Borkowski @ 2020-02-04 21:42 UTC (permalink / raw)
  To: Help Gnu Emacs mailing list

Hello all,

I have a large UTF-8 file (about 1.5MB) to which I add stuff regularly.
Recently, Emacs started saving it as a binary file.  I suspect I somehow
inserted a non-UTF-8 sequence of bytes there.  Is there any way Emacs
can help me find it (other than me manually bisecting the file until
I find the offending place)?

TIA,

-- 
Marcin Borkowski
http://mbork.pl




* Re: How to find the character breaking the file encoding?
From: Óscar Fuentes @ 2020-02-05  2:42 UTC (permalink / raw)
  To: help-gnu-emacs

Marcin Borkowski <mbork@mbork.pl> writes:

> I have a large UTF-8 file (about 1.5MB) to which I add stuff regularly.
> Recently, Emacs started saving it as a binary file.  I suspect I somehow
> inserted a non-UTF-8 sequence of bytes there.  Is there any way Emacs
> can help me find it (other than me manually bisecting the file until
> I find the offending place)?

Try using M-x encode-coding-region ENTER utf-8 ENTER

With some luck this will mark the buffer as modified because it replaces
unencodable content with blanks (IIRC). Then you can use M-x
diff-buffer-with-file to see the line that contains the problematic
sequence.
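
For a cross-check outside Emacs, the same idea (finding the line that
holds the problematic sequence) can be sketched in Python.  This is only
an illustration under assumptions, not something suggested in the thread:
the function name `lines_with_invalid_utf8` is hypothetical, and the
whole file is assumed to fit in memory, which is fine at about 1.5MB.

```python
from typing import List


def lines_with_invalid_utf8(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8.

    Hypothetical helper for illustration only.
    """
    bad = []
    # Splitting on the newline byte is safe: 0x0A never appears inside
    # a valid multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")  # strict decoding raises on bad bytes
        except UnicodeDecodeError:
            bad.append(number)
    return bad
```

To use it, read the file in binary mode (for example with
`open(path, "rb").read()`, where `path` is a placeholder for your file)
and pass the bytes to the function.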





* Re: How to find the character breaking the file encoding?
From: Stefan Monnier @ 2020-02-05  3:09 UTC (permalink / raw)
  To: help-gnu-emacs

> inserted a non-UTF-8 sequence of bytes there.  Is there any way Emacs
> can help me find it (other than me manually bisecting the file until
> I find the offending place)?

I think you can do:

    C-x RET r utf-8 RET

to force reading it as utf-8.  And then

    C-x C-s

should complain about the problematic element with a buffer that shows
you the offender and lets you click on it to jump to its location.
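
If you want the exact offender rather than just its line, a strict
decode in Python reports where it fails.  The following is a minimal
sketch, assuming the file content has already been read as bytes; the
name `first_invalid_utf8` is made up for this example and is not an
Emacs or Python API.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (byte_offset, offending_bytes) for the first invalid
    UTF-8 sequence in data, or None if data decodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None
```

Note that the offset is counted in bytes from the start of the file
(starting at 0), not in characters, so it will not match a buffer
position directly if multi-byte characters come before it.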


        Stefan





* Re: How to find the character breaking the file encoding?
From: Marcin Borkowski @ 2020-02-08 11:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs


On 2020-02-05, at 04:09, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> inserted a non-UTF-8 sequence of bytes there.  Is there any way Emacs
>> can help me find it (other than me manually bisecting the file until
>> I find the offending place)?
>
> I think you can do:
>
>     C-x RET r utf-8 RET
>
> to force reading it as utf-8.  And then
>
>     C-x C-s
>
> should complain about the problematic element with a buffer that shows
> you the offender and lets you click on it to jump to its location.

Thanks Stefan and Óscar.  Before you answered me, I killed the buffer
and visited the file again, and the problem disappeared.  Go figure...

Thanks,

-- 
Marcin Borkowski
http://mbork.pl


