From: Ivan Kanis <ivank@juliva.com>
Subject: Re: character encoding
Date: 26 Oct 2002 19:34:35 +0200
Message-ID: <87eladhzo4.fsf@juliva.com>
In-Reply-To: mailman.1035185597.6743.help-gnu-emacs@gnu.org
Charles> Hugh wrote:
>>
>> Sometimes when I cut and paste "it's" from a web page into an
>> emacs buffer it transfers as "it?s". Ditto for other similar
>> events.
Charles> Most likely these are "curly" apostrophes that are
Charles> inserted when people publish HTML by first writing it in
Charles> a word processor like Word or Word Perfect, which use
I agree it's a real pain. This program strips all of that nonsense. It
won't help with the copy/paste problem, but it will work on a big chunk
of text. It turns windows-1252 text into something iso-8859-1 can hold:
basically, one has to convert the crap Microsoft put between 0x80 and
0x9f into something standard, here plain ASCII stand-ins.
I know it's in C. If someone cares to turn this into Lisp, that would be neat :)
Ivan
#include <stdio.h>

/* ASCII stand-ins for windows-1252 bytes 0x80..0x9f, indexed by byte - 0x80 */
const char *table [] = {
    "euro",   /* 0x80 0x20AC #EURO SIGN */
    "",       /* 0x81        #UNDEFINED */
    "'",      /* 0x82 0x201A #SINGLE LOW-9 QUOTATION MARK */
    "f",      /* 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK */
    "\"",     /* 0x84 0x201E #DOUBLE LOW-9 QUOTATION MARK */
    "...",    /* 0x85 0x2026 #HORIZONTAL ELLIPSIS */
    "*",      /* 0x86 0x2020 #DAGGER */
    "*",      /* 0x87 0x2021 #DOUBLE DAGGER */
    "^",      /* 0x88 0x02C6 #MODIFIER LETTER CIRCUMFLEX ACCENT */
    " 0/00",  /* 0x89 0x2030 #PER MILLE SIGN */
    "S",      /* 0x8A 0x0160 #LATIN CAPITAL LETTER S WITH CARON */
    "<",      /* 0x8B 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK */
    "OE",     /* 0x8C 0x0152 #LATIN CAPITAL LIGATURE OE */
    "",       /* 0x8D        #UNDEFINED */
    "Z",      /* 0x8E 0x017D #LATIN CAPITAL LETTER Z WITH CARON */
    "",       /* 0x8F        #UNDEFINED */
    "",       /* 0x90        #UNDEFINED */
    "'",      /* 0x91 0x2018 #LEFT SINGLE QUOTATION MARK */
    "'",      /* 0x92 0x2019 #RIGHT SINGLE QUOTATION MARK */
    "\"",     /* 0x93 0x201C #LEFT DOUBLE QUOTATION MARK */
    "\"",     /* 0x94 0x201D #RIGHT DOUBLE QUOTATION MARK */
    "*",      /* 0x95 0x2022 #BULLET */
    "-",      /* 0x96 0x2013 #EN DASH */
    "-",      /* 0x97 0x2014 #EM DASH */
    "~",      /* 0x98 0x02DC #SMALL TILDE */
    "(TM)",   /* 0x99 0x2122 #TRADE MARK SIGN */
    "s",      /* 0x9A 0x0161 #LATIN SMALL LETTER S WITH CARON */
    ">",      /* 0x9B 0x203A #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK */
    "oe",     /* 0x9C 0x0153 #LATIN SMALL LIGATURE OE */
    "",       /* 0x9D        #UNDEFINED */
    "z",      /* 0x9E 0x017E #LATIN SMALL LETTER Z WITH CARON */
    "Y"       /* 0x9F 0x0178 #LATIN CAPITAL LETTER Y WITH DIAERESIS */
};
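int main (int argc, char **argv) {
    FILE *fd;
    int in;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    /* "rb": read raw bytes, with no newline translation */
    if (!(fd = fopen(argv[1], "rb"))) {
        perror(argv[1]);
        return 1;
    }
    while ((in = fgetc(fd)) != EOF) {
        if (in >= 0x80 && in < 0xa0) {
            /* windows-1252 extension byte: print its ASCII stand-in */
            fputs(table[in - 0x80], stdout);
        } else {
            /* everything else passes through unchanged */
            putchar(in);
        }
    }
    fclose(fd);
    return 0;
}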
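
For text that is already in memory rather than in a file, the same lookup can be wrapped in a function. What follows is only a sketch under stated assumptions, not part of the original program: the names cp1252_repl and strip_cp1252 are made up for illustration, the stand-ins are a compact table in the spirit of the one above, and the caller is assumed to supply the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* ASCII stand-ins for bytes 0x80..0x9f, eight per row (hypothetical name). */
static const char *const cp1252_repl[32] = {
    "euro", "",      "'",  "f",  "\"", "...", "*", "*",   /* 0x80..0x87 */
    "^",    " 0/00", "S",  "<",  "OE", "",    "Z", "",    /* 0x88..0x8f */
    "",     "'",     "'",  "\"", "\"", "*",   "-", "-",   /* 0x90..0x97 */
    "~",    "(TM)",  "s",  ">",  "oe", "",    "z", "Y"    /* 0x98..0x9f */
};

/*
 * Hypothetical helper: copy n bytes from src to dst, replacing every byte
 * in 0x80..0x9f with its stand-in. dst holds at most cap bytes, including
 * the terminating NUL. Returns the length of the result, or (size_t)-1
 * if dst is too small (dst contents are then unspecified).
 */
size_t strip_cp1252(const unsigned char *src, size_t n, char *dst, size_t cap)
{
    size_t i, out = 0;

    if (cap == 0)
        return (size_t)-1;
    for (i = 0; i < n; i++) {
        unsigned char c = src[i];
        if (c >= 0x80 && c < 0xa0) {
            const char *rep = cp1252_repl[c - 0x80];
            size_t len = strlen(rep);
            if (out + len >= cap)       /* keep one byte for the NUL */
                return (size_t)-1;
            memcpy(dst + out, rep, len);
            out += len;
        } else {
            if (out + 1 >= cap)
                return (size_t)-1;
            dst[out++] = (char)c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

The longest stand-in in this table is five characters (" 0/00"), so a buffer of 5 * n + 1 bytes is always large enough for n input bytes.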
--
/-----------------------------------------------------------------------------*
| "I shall never make a new friend in my life, | Ivan Kanis |
| though perhaps a few after I die." | ivank@juliva.com |
| (Oscar Wilde) | www.juliva.com |
*-----------------------------------------------------------------------------/