From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Ott
Newsgroups: gmane.emacs.devel
Subject: Re: Language identification
Date: Fri, 28 Aug 2009 08:45:05 +0200
Organization: Alex Ott's Consulting
Message-ID: <87r5uwl9n2.fsf@alexott.dev.webwasher.com>
References: <87skfczqc8.fsf@mail.jurta.org>
In-Reply-To: <87skfczqc8.fsf@mail.jurta.org> (Juri Linkov's message of "Fri, 28 Aug 2009 03:27:35 +0300")
To: Juri Linkov
Cc: joakim@verona.se, Emacs Development
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:114722

Hello

N-gram algorithms could be used to identify languages. They are simpler
than a Bayesian classifier and require a smaller database.

Juri Linkov at "Fri, 28 Aug 2009 03:27:35 +0300" wrote:
 >> I often wish that files would open in Emacs with correct mode
 >> more often when there is no file extension.
 JL> In `auto-mode-alist' you can see that with the exception of
 JL> `archive-mode', `doc-view-mode' and `image-mode', all remaining
 JL> modes are programming text modes. It would be more useful
 JL> to identify file types for these modes that libmagic can't do.

 JL> Do you know a library that identifies programming languages?
 JL> Such a library might be implemented using a Bayesian classifier
 JL> trained on a sufficiently large corpus of different programming
 JL> languages.

-- 
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/
http://xtalk.msk.su/~ott/
http://alexott-ru.blogspot.com/
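
A minimal sketch of the N-gram idea mentioned above, in Python. The message does not say which N-gram method is meant, so this assumes one common variant: ranked character-trigram profiles compared with an "out-of-place" rank distance. The function names, the toy training snippets and the profile size are all hypothetical, chosen only for illustration; a usable profile would need a much larger corpus per language.

```python
from collections import Counter


def ngram_profile(text, n=3, top=300):
    """Return the `top` most frequent character n-grams, most frequent first."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [gram for gram, _ in counts.most_common(top)]


def out_of_place(sample_profile, lang_profile):
    """Sum of rank differences between two profiles.

    An n-gram missing from lang_profile costs the maximum penalty,
    which is the length of lang_profile.
    """
    ranks = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - ranks[gram]) if gram in ranks else max_penalty
        for rank, gram in enumerate(sample_profile)
    )


def identify(sample, profiles, n=3, top=300):
    """Return the label whose profile is closest to the sample's profile."""
    sample_profile = ngram_profile(sample, n, top)
    return min(
        profiles,
        key=lambda label: out_of_place(sample_profile, profiles[label]),
    )


# Toy training snippets (assumption: stand-ins for a real corpus).
TRAINING = {
    "c": """#include <stdio.h>
int main(void) { puts("hello"); return 0; }
""",
    "python": """import sys
def main():
    print("hello")
""",
    "elisp": """(defun hello ()
  (interactive)
  (message "hello"))
""",
}

# The whole "database" is one short ranked list of n-grams per language.
PROFILES = {label: ngram_profile(text) for label, text in TRAINING.items()}
```

Calling `identify(some_text, PROFILES)` returns the label with the smallest distance. The stored data is only a ranked list per language, with no per-feature probabilities to estimate, which is one way to read the "smaller database" point.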