From: Rustom Mody
Newsgroups: gmane.emacs.devel
To: emacs-devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 14:50:48 +0530
In-Reply-To: <87io6wqpf5.fsf@fencepost.gnu.org>

On Sun, Sep 27, 2015 at 1:12 PM, David Kastrup wrote:
>
> Eli Zaretskii writes:
>
> > I've also looked at the *.po files in the latest releases of GNU Make,
> > Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
> > such files still use non-UTF-8 encodings.
>
> Which, btw, I consider crazy.
>

I've been trying to understand this stuff and was looking at, e.g.,
lisp/language/indian.el

In there I find that:

(defconst bengali-composable-pattern
  (let ((table
         '(("a" . "\u0981") ; SIGN CANDRABINDU
           ("A" . "[\u0982-\u0983]") ; SIGN ANUSVARA .. VISARGA
           ("V" . "[\u0985-\u0994\u09E0-\u09E1]") ; independent vowel
           ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
           ("B" . "[\u09AC\u09AF-\u09B0\u09F0]") ; BA, YA, RA
           ("R" . "[\u09B0\u09F0]") ; RA
           ("n" . "\u09BC") ; NUKTA
           ("v" . "[\u09BE-\u09CC\u09D7\u09E2-\u09E3]") ; vowel sign
           ("H" . "\u09CD") ; HALANT
           ("T" . "\u09CE") ; KHANDA TA
           ("N" . "\u200C") ; ZWNJ
           ("J" . "\u200D") ; ZWJ
           ("X" . "[\u0980-\u09FF]")))) ; all coverage
etc etc

And this is repeated with small variations for Devanagari, Tamil, Telugu, etc.

It would sure help a native speaker if the comments and the ucs-hex escapes
were interchanged, with the actual chars used in the strings instead.

So then I checked why the file was showing as UTF-8 encoded.
Found this one non-ASCII line:

(set-language-info-alist
 "Kannada" '((charset unicode)
             (coding-system mule-utf-8)
             (coding-priority mule-utf-8)
             (input-method . "kannada-itrans")
             (sample-text . "Kannada (ಕನ್ನಡ) ನಮಸ್ಕಾರ")
             (documentation . "\
Kannada language and script is supported in this language environment."))
 '("Indian"))

It strikes me that this sample text should be there for the other languages
also, but it does not seem to be there.

Just for context: if I can understand what's going on, I would like to help
improve this/these docs:

(info "(elisp)input methods")

| How to define input methods is not yet documented in this manual, but here we
| describe how to use them.