From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lars Magne Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: "Readability" feature in eww
Date: Mon, 03 Nov 2014 01:41:14 +0100
Organization: Programmerer Ingebrigtsen
To: emacs-devel@gnu.org
User-Agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/25.0.50 (gnu/linux)

A couple of hours ago it occurred to me that it would probably make
sense for eww to have a "readability" feature, so I implemented a take
on it and committed it.

The `R' command in eww will try to find the parts of the current page
where most of the text is, and only display that part.  This makes all
the menus and stuff disappear, and you don't have to page forever to
find the actual article on newspaper sites.

This is a heuristic, of course, so it can be tweaked endlessly.  The
current algorithm just gives most words a positive score, HTML markup a
negative score, and words inside <a> tags a negative score.

For such a simple algorithm, it seems to give pretty good results.  But
tweaking is necessary for it to be ... better.

If anybody has ideas for tweaks or better algorithms, please be my guest
and have at it.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
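
For anyone who wants to play with the idea outside Emacs, the scoring heuristic described in the message (words count for a node, markup counts against it, link text counts against it) can be sketched roughly as below. This is only an illustration in Python, not the eww code, which is Emacs Lisp. The exact weights, the treatment of script/style/head, and all names (`Node`, `TreeBuilder`, `score`, `best_node`, `readable_part`) are assumptions made up for the sketch.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """A minimal DOM node: children are either Node objects or text strings."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.score = 0


class TreeBuilder(HTMLParser):
    """Builds a very forgiving tree out of the parser's tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def count_words(node):
    """Number of whitespace-separated words anywhere below NODE."""
    total = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child.split())
        else:
            total += count_words(child)
    return total


def score(node):
    """Assign a score to NODE and everything below it; return NODE's score."""
    if node.tag in ("script", "style", "head"):
        node.score = -2                    # assumed flat penalty for non-content
    elif node.tag == "a":
        node.score = -count_words(node)    # link text counts against the node
    else:
        total = -1                         # assumed penalty per element of markup
        for child in node.children:
            if isinstance(child, str):
                total += len(child.split())    # each word is worth one point
            else:
                total += score(child)
        node.score = total
    return node.score


def best_node(node):
    """Return the highest-scoring node in the subtree rooted at NODE."""
    best = node
    for child in node.children:
        if isinstance(child, Node):
            candidate = best_node(child)
            if candidate.score > best.score:
                best = candidate
    return best


def readable_part(html):
    """Parse HTML, score every node, and return the best-scoring one."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    score(builder.root)
    return best_node(builder.root)
```

Because scores are summed upwards, a navigation block full of links drags its ancestors down, so the element wrapping the article text tends to come out ahead of the page as a whole. Calling `readable_part(page_source)` and rendering only the returned node is the rough equivalent of what the `R' command does.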