From: jsbien@mimuw.edu.pl (Janusz S. Bień)
Newsgroups: gmane.emacs.devel
Subject: Re: charsets and character sets (was: Re: 21.1: list-charset-chars)
Date: 19 Feb 2002 19:42:36 +0100
References: <3C7124BB.14633.1B9F2BD@localhost> <6503-Mon18Feb2002213318+0200-eliz@is.elta.co.il> <87k7t9aivq.fsf_-_@mimuw.edu.pl>
Cc: Ulrich.Windl@rz.uni-regensburg.de, bug-gnu-emacs@gnu.org, emacs-devel@gnu.org
Reply-to: jsbien@mimuw.edu.pl
Original-To: Eli Zaretskii
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
List-Id: Emacs development discussions.

I quote my letter in full, as I intended to send it to emacs-devel as
well but forgot to add it to the addressee list.

On 19 Feb 2002 jsbien@mimuw.edu.pl (Janusz S. Bień) wrote:

> On Mon, 18 Feb 2002 "Eli Zaretskii" wrote:
>
> > > From: "Ulrich Windl"
> > > Date: Mon, 18 Feb 2002 15:58:51 +0100
> > >
> > > I found out that the result of list-charset-chars (e.g. for latin15) is
> > > contrary to the documentation: Only characters > 127 are displayed, but
> > > the name and documentation creates the impression that all characters
> > > are listed.
> >
> > What led you to believe that ASCII characters with codes below 128
> > belong to the other charsets? Whatever gave you that impression is
> > the place where the documentation should be improved, because ASCII
> > characters are a separate charset in Emacs.
>
> On Tue, 19 Feb 2002 "Ulrich Windl" wrote:
>
> [...]
>
> > "list charset chars": What else than listing the characters in the
> > charset could be expected?
> >
> > Regards,
> > Ulrich
>
> The Emacs documentation fails to make a clear distinction between Emacs
> charsets and character sets in the sense of ISO and related
> standards.
>
> The charset named e.g. latin15 *is not* the ISO/IEC Latin 15 character
> set; it is just its right-hand part, registered as such in the ISO
> International Register (available online) as ISO-IR 203. However, the
> iso-8859-15 *coding system* is equivalent to ISO/IEC Latin 15, cf.
> the output of `describe-coding-system':
>
> ------------------------------------------------------------------------------
> 0 -- iso-8859-15 (alias of iso-latin-9)
> ISO 2022 based 8-bit encoding for Latin-9 (MIME:ISO-8859-15)
> Type: 2 (variant of ISO-2022)
> Initial designations:
>   G0 -- ascii:ASCII (ISO646 IRV)
>   G1 -- latin-iso8859-15:Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203
> ------------------------------------------------------------------------------
>
> Long, long ago I proposed changing the names of the charsets
> appropriately, but my suggestion was rejected and I didn't press the
> point. I think now is the right time to come back to the
> problem, as correct terminology is important for the development
> work.
>
> My current proposal is:
>
> - make explicit in the manuals and documentation strings that
>   charsets are Emacs-specific technical terms,
>
> - add `describe-charset', analogous to `describe-coding-system', to
>   minimize the chance of user confusion,
>
> - at the first convenient occasion rename `latin-15' and related
>   charsets to something more adequate, e.g. `latin-no9-rp' (15 is the
>   number of the part of the ISO/IEC 8859 standard which contains the
>   definition of Latin alphabet number 9, while `latin-15' suggests Latin
>   alphabet number 15; `rp' stands for `right-hand part of',
>   which is an ISO/IEC technical term).
>
> Best regards
>
> Janusz
>
> --
> ,
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw University
> http://www.orient.uw.edu.pl/~jsbien/
> ---------------------------------------------------------------------
> Na tym koncie czytam i wysylam poczte i wiadomosci offline.
> On this account I read/post mail/news offline.

On Tue, 19 Feb 2002 "Eli Zaretskii" wrote:

[...]
> > I don't have a v21 Emacs at hand in the moment, but a ISO 8859 15
> > charset is a superset of US-ASCII
>
> Not in Emacs, it isn't.

Because a charset *is not* a character set.

> The full name of latin-iso8859-15 in Emacs
> is this:
>
> "Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203."
>
> See mule-conf.el for more information. The ``right-hand part'' thing
> means that characters below 128 are not included.

In other words, the charset name is not adequate.

> What I'm asking is where would you suggest to explain this
> fundamental fact so that it becomes clear.

For example, after

-------------------------------------------------------------------------
International Character Set Support
***********************************

Emacs supports a wide variety of international character sets,
including European variants of the Latin alphabet, as well as Chinese,
Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These
features have been merged from the modified version of Emacs known as
MULE (for "MULti-lingual Enhancement to GNU Emacs")
-------------------------------------------------------------------------

add

To implement the character set support, Emacs uses the notion of a
charset. For historical reasons most 8-bit character codes are
considered to consist of two separate 7-bit charsets, namely ASCII
and the so-called right-hand part of the appropriate character code,
for example...

Please note also that characters belonging to different charsets
are always different, even if they look the same: the letter o with
acute accent from Latin alphabet no 1 (charset `latin-no1-rp',
intended to be used e.g. for French) is different from the letter o
with acute accent from Latin alphabet no 2 (charset `latin-no2-rp',
intended to be used e.g. for Polish).

Best regards

Janusz

--
,
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S.
Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
---------------------------------------------------------------------
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.
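
The "two 7-bit halves" view proposed for the manual above can be
illustrated outside Emacs. What follows is only a rough sketch in
Python, not Emacs code: the helper name `split_by_part' and the labels
it returns are made up for this example. It tags each byte of
ISO-8859-15 text as ASCII or as belonging to the right-hand part, and
then shows that Python's codecs, unlike the Emacs charsets described in
this message, map the o with acute accent of Latin-1 and of Latin-2 to
one and the same character.

```python
# Illustrative sketch only: split_by_part is a hypothetical helper,
# not an Emacs function and not part of any library.

def split_by_part(data):
    """Label each byte of ISO-8859-15 text with the half of the code it comes from."""
    labelled = []
    for byte in data:
        char = bytes([byte]).decode("iso8859_15")
        if byte < 0x80:
            part = "ascii"            # left-hand part: plain ASCII
        elif byte >= 0xA0:
            part = "right-hand part"  # the 96 positions 0xA0-0xFF
        else:
            part = "c1-control"       # 0x80-0x9F: control codes, in neither set
        labelled.append((char, part))
    return labelled


# The euro sign sits at 0xA4 in ISO-8859-15, i.e. in the right-hand part.
encoded = "5 €".encode("iso8859_15")
for char, part in split_by_part(encoded):
    print(repr(char), part)

# Byte 0xF3 is o-acute in both Latin-1 and Latin-2. Python decodes both
# to the same Unicode character; this is the unification that the
# per-charset model discussed in the message does not perform.
same = b"\xf3".decode("iso8859_1") == b"\xf3".decode("iso8859_2")
print(same)
```

The split at 0x80 and 0xA0 follows the layout of the ISO 8859 codes;
how Emacs represents the two halves internally is not modelled here.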