unofficial mirror of emacs-devel@gnu.org 
* \225 and so on
@ 2003-04-04 22:24 Richard Stallman
  2003-04-07 15:37 ` Andreas Schwab
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Richard Stallman @ 2003-04-04 22:24 UTC
  Cc: emacs-devel

I often get files that are basically plain ASCII but have a few of
Microsoft's special characters such as \222 and \225.  Emacs does not
seem to cope with these files very well.  One time recently it
suggested iso-8859-15 iso-8859-14 utf-8 mule-utf-16-be mule-utf-16-le.

Can you please work on more natural handling for these characters?


* Re: \225 and so on
  2003-04-04 22:24 \225 and so on Richard Stallman
@ 2003-04-07 15:37 ` Andreas Schwab
  2003-04-08  2:30   ` Richard Stallman
  2003-04-07 16:51 ` Benjamin Riefenstahl
  2003-04-08  4:13 ` Eli Zaretskii
  2 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2003-04-07 15:37 UTC
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

|> I often get files that are basically plain ASCII but have a few of
|> Microsoft's special characters such as \222 and \225.  Emacs does not
|> seem to cope with these files very well.  One time recently it
|> suggested iso-8859-15 iso-8859-14 utf-8 mule-utf-16-be mule-utf-16-le.
|> 
|> Can you please work on more natural handling for these characters?

What would you consider natural?  When I tried to edit such a file it was
read in as raw-text and saved back without any questions.  It probably
depends on the language environment, though (I'm using "German").

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: \225 and so on
  2003-04-04 22:24 \225 and so on Richard Stallman
  2003-04-07 15:37 ` Andreas Schwab
@ 2003-04-07 16:51 ` Benjamin Riefenstahl
  2003-04-08  2:31   ` Richard Stallman
  2003-04-08  4:16   ` Eli Zaretskii
  2003-04-08  4:13 ` Eli Zaretskii
  2 siblings, 2 replies; 18+ messages in thread
From: Benjamin Riefenstahl @ 2003-04-07 16:51 UTC
  Cc: emacs-devel

Hi Richard,


Richard Stallman <rms@gnu.org> writes:

> I often get files that are basically plain ASCII but have a few of
> Microsoft's special characters such as \222 and \225.

That's almost always cp1252, often mislabeled as iso-8859-1 or
us-ascii, or even unlabeled.  I use these settings in GNUS to cope:

 (setq gnus-newsgroup-ignored-charsets
       '(unknown-8bit x-unknown us-ascii iso-8859-1))
 (setq gnus-default-charset 'cp1252)

I.e. I disable using the MIME parameters for us-ascii and iso-8859-1
and set the default to cp1252 instead.  This has worked fine so far
for me.  Real us-ascii or iso-8859-1 messages don't have a problem
with this, as cp1252 matches iso-8859-1 everywhere except the range
0x80-0x9F, which iso-8859-1 reserves for rarely used control codes.

cp1252 is in lisp/international/code-pages.el in CVS Emacs. 
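
As an illustration of what the two bytes from the original report mean
under cp1252, here is a minimal sketch.  It assumes a recent Emacs (23
or later) where `unibyte-string' exists and the coding system is built
in under the name `windows-1252'; neither assumption comes from this
thread, which refers to code-pages.el in CVS Emacs of 2003.

```elisp
;; Sketch only: decode the bytes \222 and \225 as cp1252.
;; Assumes a recent Emacs with the built-in `windows-1252' coding system.
(let ((bytes (unibyte-string #o222 #o225)))
  ;; In cp1252, \222 is RIGHT SINGLE QUOTATION MARK (U+2019)
  ;; and \225 is BULLET (U+2022).
  (decode-coding-string bytes 'windows-1252))
```

Evaluating the form with C-x C-e or in M-: shows the decoded string in
the echo area.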


so long, benny


* Re: \225 and so on
  2003-04-07 15:37 ` Andreas Schwab
@ 2003-04-08  2:30   ` Richard Stallman
  2003-04-08  4:47     ` Kenichi Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Stallman @ 2003-04-08  2:30 UTC
  Cc: emacs-devel

    |> I often get files that are basically plain ASCII but have a few of
    |> Microsoft's special characters such as \222 and \225.  Emacs does not
    |> seem to cope with these files very well.  One time recently it
    |> suggested iso-8859-15 iso-8859-14 utf-8 mule-utf-16-be mule-utf-16-le.
    |> 
    |> Can you please work on more natural handling for these characters?

    What would you consider natural?

It should be treated as Latin-1, or perhaps as Latin-whatever based on
one's usual coding system preferences.


* Re: \225 and so on
  2003-04-07 16:51 ` Benjamin Riefenstahl
@ 2003-04-08  2:31   ` Richard Stallman
  2003-04-08  4:16   ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Stallman @ 2003-04-08  2:31 UTC
  Cc: emacs-devel

    That's almost always cp1252, often mislabeled as iso-8859-1 or
    us-ascii, or even unlabeled.  I use these settings in GNUS to cope:

     (setq gnus-newsgroup-ignored-charsets
           '(unknown-8bit x-unknown us-ascii iso-8859-1))
     (setq gnus-default-charset 'cp1252)

This workaround may work, but it is cumbersome.  We should change
Emacs so that this is not necessary.


* Re: \225 and so on
  2003-04-04 22:24 \225 and so on Richard Stallman
  2003-04-07 15:37 ` Andreas Schwab
  2003-04-07 16:51 ` Benjamin Riefenstahl
@ 2003-04-08  4:13 ` Eli Zaretskii
  2 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2003-04-08  4:13 UTC
  Cc: emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Fri, 04 Apr 2003 17:24:12 -0500
> 
> I often get files that are basically plain ASCII but have a few of
> Microsoft's special characters such as \222 and \225.  Emacs does not
> seem to cope with these files very well.  One time recently it
> suggested iso-8859-15 iso-8859-14 utf-8 mule-utf-16-be mule-utf-16-le.
> 
> Can you please work on more natural handling for these characters?

I think we already have that: these files should be decoded and
encoded as raw-text.
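
For readers who want to force the raw-text behavior rather than rely on
detection, a minimal sketch follows.  The command name
`my-find-file-raw-text' is hypothetical and not part of Emacs; the
sketch only assumes the standard variable `coding-system-for-read'.

```elisp
;; Hypothetical helper, not part of Emacs: visit FILE with decoding
;; forced to `raw-text', so 8-bit bytes such as \222 and \225 are kept
;; as raw bytes instead of being interpreted as characters.
(defun my-find-file-raw-text (file)
  "Visit FILE, forcing the `raw-text' coding system for reading."
  (interactive "fFind file as raw-text: ")
  (let ((coding-system-for-read 'raw-text))
    (find-file file)))
```

Interactively, C-x RET c raw-text RET C-x C-f should have the same
effect for a single visit.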


* Re: \225 and so on
  2003-04-07 16:51 ` Benjamin Riefenstahl
  2003-04-08  2:31   ` Richard Stallman
@ 2003-04-08  4:16   ` Eli Zaretskii
  2003-04-08 13:07     ` Benjamin Riefenstahl
  1 sibling, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2003-04-08  4:16 UTC
  Cc: emacs-devel

> From: Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
> Date: 07 Apr 2003 18:51:12 +0200
> 
> > I often get files that are basically plain ASCII but have a few of
> > Microsoft's special characters such as \222 and \225.
> 
> That's almost always cp1252, often mislabeled as iso-8859-1 or
> us-ascii, or even unlabeled.  I use these settings in GNUS to cope:
> 
>  (setq gnus-newsgroup-ignored-charsets
>        '(unknown-8bit x-unknown us-ascii iso-8859-1))
>  (setq gnus-default-charset 'cp1252)
> 
> I.e. I disable using the MIME parameters for us-ascii and iso-8859-1
> and set the default to cp1252 instead.  This has worked fine so far
> for me.

I don't think this will help Richard, as he doesn't use Gnus.

> cp1252 is in lisp/international/code-pages.el in CVS Emacs. 

Doesn't that map those characters into mule-unicode-* charsets?


* Re: \225 and so on
  2003-04-08  2:30   ` Richard Stallman
@ 2003-04-08  4:47     ` Kenichi Handa
  2003-04-08 11:39       ` Kenichi Handa
  2003-04-09  2:00       ` Richard Stallman
  0 siblings, 2 replies; 18+ messages in thread
From: Kenichi Handa @ 2003-04-08  4:47 UTC
  Cc: emacs-devel

In article <E192ist-0000pD-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     |> I often get files that are basically plain ASCII but have a few of
>     |> Microsoft's special characters such as \222 and \225.  Emacs does not
>     |> seem to cope with these files very well.  One time recently it
>     |> suggested iso-8859-15 iso-8859-14 utf-8 mule-utf-16-be mule-utf-16-le.
>     |> 
>     |> Can you please work on more natural handling for these characters?

>     What would you consider natural?

> It should be treated as Latin-1, or perhaps as Latin-whatever based on
> one's usual coding system preferences.

Currently, \222, \223, \224 are registered in
latin-extra-code-table.  We can add \225 to it.  And, by
paying attention to latin-extra-code-table in
find-coding-systems-region-internal, we can make Emacs
work as you wish.  I'll work on it soon.

---
Ken'ichi HANDA
handa@m17n.org


* Re: \225 and so on
  2003-04-08  4:47     ` Kenichi Handa
@ 2003-04-08 11:39       ` Kenichi Handa
  2003-04-09  1:59         ` Richard Stallman
  2003-04-09  5:32         ` Eli Zaretskii
  2003-04-09  2:00       ` Richard Stallman
  1 sibling, 2 replies; 18+ messages in thread
From: Kenichi Handa @ 2003-04-08 11:39 UTC
  Cc: emacs-devel

In article <200304080447.NAA14014@etlken.m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> Currently, \222, \223, \224 are registered in
> latin-extra-code-table.  We can add \225 to it.  And, by
> paying attention to latin-extra-code-table in
> find-coding-systems-region-internal, we can make Emacs
> work as you wish.  I'll work on it soon.

I've just installed it.  Now iso-latin-1 can read/write a
file that contains \225.

> In article <E192ist-0000pD-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>>  It should be treated as Latin-1, or perhaps as Latin-whatever based on
>>  one's usual coding system preferences.

Currently, only iso-latin-1/8/9 have t for the
accept-latin-extra-code flag.  So, only they can read/write
such a file.  I don't know why iso-latin-2/3/4/5 don't.
Perhaps those who use those coding systems don't encounter
such bytes (\222..\225) that often and prefer reading
such a file as raw-text.  I'm not sure.  At least, I don't
remember any complaints about the current definition of those
coding systems.

---
Ken'ichi HANDA
handa@m17n.org


* Re: \225 and so on
  2003-04-08  4:16   ` Eli Zaretskii
@ 2003-04-08 13:07     ` Benjamin Riefenstahl
  2003-04-09  5:33       ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Benjamin Riefenstahl @ 2003-04-08 13:07 UTC
  Cc: emacs-devel

Hi Eli,


"Eli Zaretskii" <eliz@elta.co.il> writes:
> I don't think this will help Richard, as he doesn't use Gnus.

I understand that.  I was just trying to add some experience about the
problem.

The generic algorithm (if you can call it that) seems useful to me
and applicable to other email readers.  It also directly captures the
problem as I understand it, namely that some email messages are simply
mis-labeled in a fairly predictable way.

Of course for a generic solution other coding systems would have to be
considered, for those people that do not use latin-1 primarily.  I
guess in some cases the interactions may be more complicated and the
problems less predictable.

> > cp1252 is in lisp/international/code-pages.el in CVS Emacs. 
> 
> Doesn't that map those characters into mule-unicode-* charsets?

Is that a problem?  It is not a problem for me (quite the contrary),
but if it is a problem in general, an alternative implementation for
these coding systems should be created perhaps?


so long, benny


* Re: \225 and so on
  2003-04-08 11:39       ` Kenichi Handa
@ 2003-04-09  1:59         ` Richard Stallman
  2003-04-09  5:32         ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Stallman @ 2003-04-09  1:59 UTC
  Cc: emacs-devel

    Currently, only iso-latin-1/8/9 have t for the
    accept-latin-extra-code flag.  So, only they can read/write
    such a file.  I don't know why iso-latin-2/3/4/5 don't.

Perhaps it is ok that way.


* Re: \225 and so on
  2003-04-08  4:47     ` Kenichi Handa
  2003-04-08 11:39       ` Kenichi Handa
@ 2003-04-09  2:00       ` Richard Stallman
  2003-04-09  2:40         ` Kenichi Handa
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Stallman @ 2003-04-09  2:00 UTC
  Cc: emacs-devel

    Currently, \222, \223, \224 are registered in
    latin-extra-code-table.  We can add \225 to it.

I have seen codes \221 and \226 too.  I think you should add them also.


* Re: \225 and so on
  2003-04-09  2:00       ` Richard Stallman
@ 2003-04-09  2:40         ` Kenichi Handa
  2003-04-09 10:18           ` Alex Schroeder
  0 siblings, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2003-04-09  2:40 UTC
  Cc: emacs-devel

In article <E1934sn-0005Dn-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     Currently, \222, \223, \224 are registered in
>     latin-extra-code-table.  We can add \225 to it.

> I have seen codes \221 and \226 too.  I think you should add them also.

Done.

---
Ken'ichi HANDA
handa@m17n.org


* Re: \225 and so on
  2003-04-08 11:39       ` Kenichi Handa
  2003-04-09  1:59         ` Richard Stallman
@ 2003-04-09  5:32         ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2003-04-09  5:32 UTC
  Cc: emacs-devel

> Date: Tue, 8 Apr 2003 20:39:31 +0900 (JST)
> From: Kenichi Handa <handa@m17n.org>
> 
> Currently, only iso-latin-1/8/9 have t for the
> accept-latin-extra-code flag.  So, only they can read/write
> such a file.  I don't know why iso-latin-2/3/4/5 don't.

I think all 8-bit encodings that don't have \225 et al defined should
have accept-latin-extra-code set to t.


* Re: \225 and so on
  2003-04-08 13:07     ` Benjamin Riefenstahl
@ 2003-04-09  5:33       ` Eli Zaretskii
  0 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2003-04-09  5:33 UTC
  Cc: emacs-devel

> From: Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
> Date: 08 Apr 2003 15:07:10 +0200
> 
> > > cp1252 is in lisp/international/code-pages.el in CVS Emacs. 
> > 
> > Doesn't that map those characters into mule-unicode-* charsets?
> 
> Is that a problem?

It might be, since the CVS HEAD version does not yet unify Unicode
and non-Unicode character sets.

> if it is a problem in general, an alternative implementation for
> these coding systems should be created perhaps?

I don't think it's easy to do that, since those characters are not in
the target charsets of the cpNNN encodings.


* Re: \225 and so on
  2003-04-09  2:40         ` Kenichi Handa
@ 2003-04-09 10:18           ` Alex Schroeder
  2003-04-09 11:05             ` Kenichi Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Schroeder @ 2003-04-09 10:18 UTC
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <E1934sn-0005Dn-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>>     Currently, \222, \223, \224 are registered in
>>     latin-extra-code-table.  We can add \225 to it.
>
>> I have seen codes \221 and \226 too.  I think you should add them also.
>
> Done.

In my .emacs, I use the following code.  Does that mean we should add
\227 also?

(standard-display-ascii ?\221 "`")
(standard-display-ascii ?\222 "'")
(standard-display-ascii ?\223 "\"")
(standard-display-ascii ?\224 "\"")
(standard-display-ascii ?\226 "--")
(standard-display-ascii ?\227 "--")

Alex.


* Re: \225 and so on
  2003-04-09 10:18           ` Alex Schroeder
@ 2003-04-09 11:05             ` Kenichi Handa
  2003-04-10  6:22               ` Richard Stallman
  0 siblings, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2003-04-09 11:05 UTC
  Cc: emacs-devel

In article <873ckskm8g.fsf@gnu.org>, Alex Schroeder <alex@gnu.org> writes:
> In my .emacs, I use the following code.  Does that mean we should add
> \227 also?

> (standard-display-ascii ?\221 "`")
> (standard-display-ascii ?\222 "'")
> (standard-display-ascii ?\223 "\"")
> (standard-display-ascii ?\224 "\"")
> (standard-display-ascii ?\226 "--")
> (standard-display-ascii ?\227 "--")

It's up to you.  The more bytes you register in
latin-extra-code-table, the more often Emacs detects files
as latin-1 rather than as binary data.  In general, only those
bytes that frequently appear in wrongly labeled latin-1
files should be registered.
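
Registering extra bytes locally could look like the following sketch
for an init file.  It assumes `latin-extra-code-table' is the
256-element vector described in its docstring, in which only elements
128 through 159 have a meaning; which bytes are already registered by
default depends on the Emacs version, so treat the list below as an
example, not as the set Emacs ships with.

```elisp
;; Sketch for an init file: mark the bytes \221..\227 as acceptable
;; "extra" Latin codes, so that their presence does not stop a file
;; from being detected as a coding system that has the
;; accept-latin-extra-code flag set (e.g. iso-latin-1).
(dolist (byte '(#o221 #o222 #o223 #o224 #o225 #o226 #o227))
  (aset latin-extra-code-table byte t))
```

Setting an element back to nil removes that byte from consideration
again.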

---
Ken'ichi HANDA
handa@m17n.org


* Re: \225 and so on
  2003-04-09 11:05             ` Kenichi Handa
@ 2003-04-10  6:22               ` Richard Stallman
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Stallman @ 2003-04-10  6:22 UTC
  Cc: emacs-devel

    > In my .emacs, I use the following code.  Does that mean we should add
    > \227 also?

    It's up to you.  The more bytes you register in
    latin-extra-code-table, the more often Emacs detects files
    as latin-1 rather than as binary data.  In general, only those
    bytes that frequently appear in wrongly labeled latin-1
    files should be registered.

Real binary files will probably have codes 0200-0217 and 0220-0227, so
I don't think we have to worry.  I think we should add \227 to this
list.
