From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 13:13:01 +0300
Message-ID: <83bncof9w2.fsf@gnu.org>
References: <20150921165211.20434.28114@vcs.savannah.gnu.org>
 <83fv27mt7r.fsf@gnu.org> <83wpvfix7i.fsf@gnu.org> <83fv23hr0z.fsf@gnu.org>
 <5605CB6B.4000102@cs.ucla.edu> <83twqhhf0g.fsf@gnu.org>
 <5606AC48.7090801@cs.ucla.edu> <83zj09fbzp.fsf@gnu.org>
 <5606C140.6090309@cs.ucla.edu> <878u7trwlb.fsf@fencepost.gnu.org>
 <5606E995.2000102@cs.ucla.edu> <83si61ezxd.fsf@gnu.org>
 <560700E1.4010403@cs.ucla.edu> <83pp14fhj5.fsf@gnu.org>
 <87io6wqpf5.fsf@fencepost.gnu.org>
Reply-To: Eli Zaretskii
To: Rustom Mody
Cc: emacs-devel@gnu.org

> From: Rustom Mody
> Date: Sun, 27 Sep 2015 14:50:48 +0530
>
> Ive been trying to understand this stuff and was looking at eg.
> lisp/language/indian.el
>
> In there I find that:
> (defconst bengali-composable-pattern
>   (let ((table
>          '(("a" . "\u0981") ; SIGN CANDRABINDU
>            ("A" . "[\u0982-\u0983]") ; SIGN ANUSVARA .. VISARGA
>            ("V" . "[\u0985-\u0994\u09E0-\u09E1]") ; independent vowel
>            ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
>            ("B" . "[\u09AC\u09AF-\u09B0\u09F0]") ; BA, YA, RA
>            ("R" . "[\u09B0\u09F0]") ; RA
>            ("n" . "\u09BC") ; NUKTA
>            ("v" . "[\u09BE-\u09CC\u09D7\u09E2-\u09E3]") ; vowel sign
>            ("H" . "\u09CD") ; HALANT
>            ("T" . "\u09CE") ; KHANDA TA
>            ("N" . "\u200C") ; ZWNJ
>            ("J" . "\u200D") ; ZWJ
>            ("X" . "[\u0980-\u09FF]")))) ; all coverage
> etc etc

This is unrelated: it specifies which character sequences should be
composed and displayed as a single grapheme cluster.

> So then I checked why the file was showing as UTF-8 encoded.
>
> Found this one non-ASCII line:
>
> (set-language-info-alist
>  "Kannada" '((charset unicode)
>              (coding-system mule-utf-8)
>              (coding-priority mule-utf-8)
>              (input-method . "kannada-itrans")
>              (sample-text . "Kannada (ಕನ್ನಡ) ನಮಸ್ಕಾರ")
>              (documentation . "\
> Kannada language and script is supported in this language
> environment."))
>  '("Indian"))
>
> It strikes me that this sample text should be there for the other
> languages also but it does not seem to be there

You cannot base encoding decisions on the language or script alone,
unless that language exists in a single locale.  Many languages and
scripts serve several different locales with several different default
encodings.

> Just for context if I can understand whats going on, I would like to
> help improve this/these docs:
>
>
> (info "(elisp)input methods")
>
> | How to define input methods is not yet documented in this manual, but here we
> | describe how to use them.

Again unrelated.  Input methods are about typing characters not
directly supported by the user's keyboard.