From: David Engster
Newsgroups: gmane.emacs.devel
Subject: Re: "Readability" feature in eww
Date: Tue, 04 Nov 2014 08:44:10 +0100
Message-ID: <87ioivbeb9.fsf@engster.org>
References: <87mw88artr.fsf@engster.org>
Cc: emacs-devel@gnu.org
To: Lars Magne Ingebrigtsen

Lars Magne Ingebrigtsen writes:
> David Engster writes:
>
>> It'd be great if you could make this extraction method flexible, similar
>> to the 'washing' feature from Gnus, so that users could hook their own
>> methods for extracting the main content into eww. The user would provide
>> an extraction function and the corresponding regexp that matches against
>> the URL, or optionally also against the source to match things like the
>> 'generator' meta-tag.
>
> Well, the best is if we find a solution that works out of the box,
> because then all users can just, like, use it.
>"?
>
> The current heuristics are probably too simple, but I've been going
> through bunches of pages, and it already seems to basically do what
> you'd expect it to do, which surprises me, sort of.

Well, users might want to even automatically load another version of
the page. This is actually quite common in emacs-w3m's shimbun library.
For instance, many sites break their articles among several pages to
generate more clicks, but also provide a printable version where
everything is on one, which the shimbun then automatically loads.

-David