From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: Encoding help
Date: Mon, 01 Jun 2009 19:05:56 -0400
Message-ID: <E1MBGZo-0002oe-PM@fencepost.gnu.org>
In-Reply-To: <Kqadnd4TsaV6m7nXnZ2dnUVZ_s-dnZ2d@sysmatrix.net> (nihil@nihilo.net)
> Date: Mon, 01 Jun 2009 11:51:13 -0500
> From: "B. T. Raven" <nihil@nihilo.net>
> Newsgroups: gnu.emacs.help
>
> I have a file created by saving a pdf as text and I want to convert the
> whole thing to utf-8 encoding. If I force the encoding for save in Emacs
> 23.0 to utf-8 I get the following in a *Warning* buffer:
>
> These default coding systems were tried to encode text
> in the buffer `span.txt':
> (utf-8-dos (122 . 4194285) (165 . 4194257) (204 . 4194285) (253
> . 4194257) (292 . 4194285) (372 . 4194289) (410 . 4194285) (418
> . 4194285) (653 . 4194217) (689 . 4194285) (731 . 4194285))
> (iso-latin-1-dos (122 . 4194285) (165 . 4194257) (204 . 4194285)
> (253 . 4194257) (292 . 4194285) (372 . 4194289) (410 . 4194285) (418
> . 4194285) (653 . 4194217) (689 . 4194285) (731 . 4194285))
> However, each of them encountered characters it couldn't encode:
>
> [Below are many dozens of \xxx octal escape sequences]
>
> utf-8-dos cannot encode these: ...
> iso-latin-1-dos cannot encode these: ...
>
> The original pdf shows many standard diacritics for Romance languages
> along with a few vowels with macrons.
It sounds like the original text file is already in UTF-8. Does it
help to visit it with "C-x RET c utf-8 RET C-x C-f" instead of just
"C-x C-f"?

If that doesn't help (i.e. if you don't see diacritics instead of
octal escapes), then can you find out how the file is encoded?

Going to one of the octal escapes and typing "C-u C-x =" might also
give important hints, so please post the result here.
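
The character codes in the warning quoted above may already say
something about the bytes involved. Emacs 23 represents an undecoded
raw byte in the range 0x80..0xFF as a character in the range
0x3FFF80..0x3FFFFF, so subtracting 0x3FFF00 recovers the byte. Here is
a minimal Python sketch of that arithmetic (Python is used only for
illustration). The names raw_byte, render and CANDIDATES are
hypothetical, and the list of candidate encodings is a guess, not a
diagnosis.

```python
# Distinct character codes copied from the *Warning* buffer quoted above.
WARNING_CODES = [4194285, 4194257, 4194289, 4194217]

# Emacs 23 and later represent an undecoded raw byte 0x80..0xFF as a
# character in the range 0x3FFF80..0x3FFFFF.
RAW_BYTE_FIRST = 0x3FFF80
RAW_BYTE_LAST = 0x3FFFFF
RAW_BYTE_OFFSET = 0x3FFF00

# Assumption: these candidates are only examples of encodings to compare.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "mac_roman"]


def raw_byte(code):
    """Return the byte value behind an Emacs raw-byte character code."""
    if not RAW_BYTE_FIRST <= code <= RAW_BYTE_LAST:
        raise ValueError(f"{code} is not a raw-byte character code")
    return code - RAW_BYTE_OFFSET


def render(byte, encoding):
    """Decode one byte in the given encoding; return None if it is invalid."""
    try:
        return bytes([byte]).decode(encoding)
    except UnicodeDecodeError:
        return None


for code in WARNING_CODES:
    byte = raw_byte(code)
    guesses = {enc: render(byte, enc) for enc in CANDIDATES}
    print(f"{code} -> 0x{byte:02X} {guesses}")
```

For the codes in the warning this gives the bytes 0xED, 0xD1, 0xF1 and
0xA9. Comparing how each candidate renders them with the diacritics
visible in the PDF may narrow down which coding system to pass to
"C-x RET c".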
> If my only option is to Search and Replace these escape sequences
> with Unicode characters, how can I get a list of all these bad
> characters (they all show in red in Emacs 23 anyway).
You can try using the functions unencodable-char-position and
find-coding-systems-region to find these characters.
> Has any of you written routines to replace things like these using a
> list of dotted pairs or something similar?
Given the wealth of encodings supported by Emacs, such replacements
should not be necessary. Instead, try to find out how the file is
encoded, and visit it by instructing Emacs to use that encoding, with
"C-x RET c".