* How to find the character breaking the file encoding?
From: Marcin Borkowski @ 2020-02-04 21:42 UTC
To: Help Gnu Emacs mailing list
Hello all,
I have a large UTF-8 file (about 1.5MB) to which I add stuff regularly.
Recently, Emacs started saving it as a binary file. I suspect I somehow
inserted a non-UTF-8 sequence of bytes there. Is there any way Emacs
can help me find it (other than me manually bisecting the file until
I find the offending place)?
TIA,
--
Marcin Borkowski
http://mbork.pl
* Re: How to find the character breaking the file encoding?
From: Óscar Fuentes @ 2020-02-05 2:42 UTC
To: help-gnu-emacs
Marcin Borkowski <mbork@mbork.pl> writes:
> I have a large UTF-8 file (about 1.5MB) to which I add stuff regularly.
> Recently, Emacs started saving it as a binary file. I suspect I somehow
> inserted a non-UTF-8 sequence of bytes there. Is there any way Emacs
> can help me find it (other than me manually bisecting the file until
> I find the offending place)?
Select the whole buffer (C-x h) and try M-x encode-coding-region ENTER utf-8 ENTER
With some luck this will mark the buffer as modified because it replaces
unencodable content with blanks (IIRC). Then you can use M-x
diff-buffer-with-file to see the line that contains the problematic
sequence.
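The same search can also be done outside Emacs, directly on the file's bytes. Below is a minimal sketch, assuming Python 3; `find_invalid_utf8` is a hypothetical helper name. It decodes the data as UTF-8 and, on each `UnicodeDecodeError`, records the byte offset, line, byte column and offending bytes, then resumes just past them:

```python
import sys


def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return up to `limit` (offset, line, column, bad_bytes) tuples, one per
    invalid UTF-8 sequence in `data`. Offsets are 0-based; lines and columns
    are 1-based, and the column counts bytes, not characters."""
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            # Succeeds only if everything from `pos` onwards is valid UTF-8.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift them.
            start, end = pos + err.start, pos + err.end
            line = data.count(b"\n", 0, start) + 1
            line_start = data.rfind(b"\n", 0, start) + 1
            problems.append((start, line, start - line_start + 1, data[start:end]))
            # Resume scanning just past the offending bytes.
            pos = end
    return problems


# Only run as a command-line tool when a file name is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # read raw bytes, no decoding
        contents = f.read()
    for offset, line, col, bad in find_invalid_utf8(contents):
        print(f"byte {offset} (line {line}, byte column {col}): {bad!r}")
```

Saved under a hypothetical name such as `find_bad_utf8.py`, it would be run as `python3 find_bad_utf8.py FILE`; the reported line number can then be visited in Emacs with M-g g. Re-decoding the remaining slice after each error is quadratic in the worst case, but with the default limit of 10 reports it is quick enough for a file of a few megabytes.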
* Re: How to find the character breaking the file encoding?
From: Stefan Monnier @ 2020-02-05 3:09 UTC
To: help-gnu-emacs
> inserted a non-UTF-8 sequence of bytes there. Is there any way Emacs
> can help me find it (other than me manually bisecting the file until
> I find the offending place)?
I think you can do:

    C-x RET r utf-8 RET

to force reading it as utf-8.  And then

    C-x C-s

should complain about the problematic element with a buffer that shows
you the offender and lets you click on it to jump to its location.
Stefan
* Re: How to find the character breaking the file encoding?
From: Marcin Borkowski @ 2020-02-08 11:36 UTC
To: Stefan Monnier; +Cc: help-gnu-emacs
On 2020-02-05, at 04:09, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> inserted a non-UTF-8 sequence of bytes there. Is there any way Emacs
>> can help me find it (other than me manually bisecting the file until
>> I find the offending place)?
>
> I think you can do:
>
> C-x RET r utf-8 RET
>
> to force reading it as utf-8. And then
>
> C-x C-s
>
> should complain about the problematic element with a buffer that shows
> you the offender and lets you click on it to jump to its location.
Thanks, Stefan and Óscar.  Before you answered me, I killed the buffer
and visited the file again, and the problem disappeared. Go figure...
Thanks,
--
Marcin Borkowski
http://mbork.pl