From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Rustom Mody <rustompmody@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding
	UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 14:50:48 +0530
Message-ID: <CAJ+TeoeCK7x2EyPVUdMvVZROcpsGx5Wv-2d-rz9EjnDckk6jDw@mail.gmail.com>
References: <20150921165211.20434.28114@vcs.savannah.gnu.org>
	<E1Ze4K3-0005KC-5U@vcs.savannah.gnu.org>
	<jwv6133mtuz.fsf-monnier+emacsdiffs@gnu.org>
	<83fv27mt7r.fsf@gnu.org> <83wpvfix7i.fsf@gnu.org>
	<jwva8sbbj7w.fsf-monnier+emacsdiffs@gnu.org>
	<83fv23hr0z.fsf@gnu.org> <jwv37y2hf6x.fsf-monnier+emacsdiffs@gnu.org>
	<5605CB6B.4000102@cs.ucla.edu> <83twqhhf0g.fsf@gnu.org>
	<5606AC48.7090801@cs.ucla.edu>
	<83zj09fbzp.fsf@gnu.org> <5606C140.6090309@cs.ucla.edu>
	<878u7trwlb.fsf@fencepost.gnu.org>
	<5606E995.2000102@cs.ucla.edu> <83si61ezxd.fsf@gnu.org>
	<560700E1.4010403@cs.ucla.edu>
	<83pp14fhj5.fsf@gnu.org> <87io6wqpf5.fsf@fencepost.gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1443345711 22989 80.91.229.3 (27 Sep 2015 09:21:51 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 27 Sep 2015 09:21:51 +0000 (UTC)
To: emacs-devel <emacs-devel@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Sep 27 11:21:35 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Zg893-00072x-Gi
	for ged-emacs-devel@m.gmane.org; Sun, 27 Sep 2015 11:21:21 +0200
Original-Received: from localhost ([::1]:56683 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Zg893-0008Q2-1A
	for ged-emacs-devel@m.gmane.org; Sun, 27 Sep 2015 05:21:21 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39494)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rustompmody@gmail.com>) id 1Zg88s-0008Pk-09
	for emacs-devel@gnu.org; Sun, 27 Sep 2015 05:21:11 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <rustompmody@gmail.com>) id 1Zg88r-0002zM-4N
	for emacs-devel@gnu.org; Sun, 27 Sep 2015 05:21:10 -0400
Original-Received: from mail-wi0-x229.google.com ([2a00:1450:400c:c05::229]:35076)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rustompmody@gmail.com>) id 1Zg88q-0002zI-TT
	for emacs-devel@gnu.org; Sun, 27 Sep 2015 05:21:09 -0400
Original-Received: by wicge5 with SMTP id ge5so68885456wic.0
	for <emacs-devel@gnu.org>; Sun, 27 Sep 2015 02:21:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:content-type:content-transfer-encoding;
	bh=ds9ENgzY5ZVk5lbmpfyqtUDVIctbkfArpLFUPmWgNSs=;
	b=Y8B7qG3vIeIdix23TyU3AAJpmmJFkNT4gy+bZ9qAvfMgVC0YTK8dXAqYFS336V8nQX
	Khl8Evxxs74tNwyphr8/7xgXBN2q/aMYnQT1CmdKBa8Aw8b7GcaQPXg4VrWNO1lFTqXr
	Kf1S8Ic/7GVy1mm8uNt6VmELAWOP7d6uljWgc3Q9+ej4LebsoUz4rvHN/RsFdI80lDd+
	3C6dC0zRoc1zS4LeMfj2oJ3LizW0mONkfTrxHO7TbND8ooniPksTpLAO6dbw64hNulCQ
	FAK9H5IPxYgF3evs1waZoow154U6wmFIUNeLKOThDVqq+1mFalZYmTcnLq+LmpooOPcZ
	ynQw==
X-Received: by 10.180.10.170 with SMTP id j10mr13044326wib.77.1443345667861;
	Sun, 27 Sep 2015 02:21:07 -0700 (PDT)
Original-Received: by 10.27.37.209 with HTTP; Sun, 27 Sep 2015 02:20:48 -0700 (PDT)
In-Reply-To: <87io6wqpf5.fsf@fencepost.gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:400c:c05::229
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:190397
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/190397>

On Sun, Sep 27, 2015 at 1:12 PM, David Kastrup <dak@gnu.org> wrote:
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I've also looked at the *.po files in the latest releases of GNU Make,
> > Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
> > such files still use non-UTF-8 encodings.
>
> Which, btw, I consider crazy.
>


I've been trying to understand this stuff and was looking at, e.g.,
lisp/language/indian.el

In there I find that:
(defconst bengali-composable-pattern
  (let ((table
     '(("a" . "\u0981")        ; SIGN CANDRABINDU
       ("A" . "[\u0982-\u0983]")    ; SIGN ANUSVARA .. VISARGA
       ("V" . "[\u0985-\u0994\u09E0-\u09E1]") ; independent vowel
       ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
       ("B" . "[\u09AC\u09AF-\u09B0\u09F0]")        ; BA, YA, RA
       ("R" . "[\u09B0\u09F0]")        ; RA
       ("n" . "\u09BC")        ; NUKTA
       ("v" . "[\u09BE-\u09CC\u09D7\u09E2-\u09E3]") ; vowel sign
       ("H" . "\u09CD")        ; HALANT
       ("T" . "\u09CE")        ; KHANDA TA
       ("N" . "\u200C")        ; ZWNJ
       ("J" . "\u200D")        ; ZWJ
       ("X" . "[\u0980-\u09FF]"))))    ; all coverage
etc etc

This is repeated with small variations for Devanagari, Tamil, Telugu, etc.
It would surely help a native speaker if the comments and the UCS hex
escapes were swapped: the actual characters in the strings, and the
code points in the comments.
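
To illustrate the swap, here is a minimal sketch using four of the Bengali
entries. The two constant names are hypothetical and not part of indian.el,
and the sketch assumes the file holding them is decoded as UTF-8 (e.g. via a
coding cookie). One caveat: combining marks such as the nukta attach
visually to the preceding quote character, which may be one reason for the
escapes.

```elisp
;; -*- coding: utf-8 -*-
;; Hypothetical names for illustration; not part of lisp/language/indian.el.

(defconst my-bengali-escaped
  '(("a" . "\u0981")   ; SIGN CANDRABINDU
    ("n" . "\u09BC")   ; NUKTA
    ("H" . "\u09CD")   ; HALANT
    ("T" . "\u09CE"))  ; KHANDA TA
  "Entries written with hex escapes, as in the current source.")

(defconst my-bengali-literal
  '(("a" . "ঁ")   ; U+0981 SIGN CANDRABINDU
    ("n" . "়")   ; U+09BC NUKTA
    ("H" . "্")   ; U+09CD HALANT
    ("T" . "ৎ"))  ; U+09CE KHANDA TA
  "The same entries with the literal characters in the strings.")

;; The Lisp reader produces the same strings for both spellings,
;; so this comparison should hold when the file is read as UTF-8.
(equal my-bengali-escaped my-bengali-literal)
```

So the change would be purely cosmetic as far as the reader is concerned;
only the readability of the source differs.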

So then I checked why the file was showing as UTF-8 encoded.

Found this one non-ASCII line:

(set-language-info-alist
 "Kannada" '((charset unicode)
         (coding-system mule-utf-8)
         (coding-priority mule-utf-8)
         (input-method . "kannada-itrans")
         (sample-text . "Kannada (ಕನ್ನಡ)    ನಮಸ್ಕಾರ")
         (documentation . "\
Kannada language and script is supported in this language
environment."))
 '("Indian"))

It strikes me that this sample text should be there for the other
languages too, but it does not seem to be.
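
For anyone who wants to repeat the check for non-ASCII lines, something
along these lines should work. It is only a sketch: the function name is
hypothetical, and it assumes the file is to be decoded as UTF-8.

```elisp
;; Hypothetical helper for illustration; not an existing Emacs function.
(defun my-non-ascii-lines (file)
  "Return a list of (LINE-NUMBER . TEXT) for non-ASCII lines of FILE.
FILE is decoded as UTF-8, which is an assumption of this sketch."
  (with-temp-buffer
    ;; Force UTF-8 decoding instead of relying on defaults.
    (let ((coding-system-for-read 'utf-8))
      (insert-file-contents file))
    (goto-char (point-min))
    (let (hits)
      ;; Find the next non-ASCII character, record its line once,
      ;; then continue the search from the following line.
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (push (cons (line-number-at-pos)
                    (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position)))
              hits)
        (forward-line 1))
      (nreverse hits))))
```

Evaluating (my-non-ascii-lines "lisp/language/indian.el") from the top of
the source tree would then list each such line with its line number.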

Just for context: if I can understand what's going on, I would like to
help improve these docs:


(info "(elisp)input methods")

  | How to define input methods is not yet documented in this manual,
  | but here we describe how to use them.