unofficial mirror of emacs-devel@gnu.org 
From: "Rüdiger Sonderfeld" <ruediger@c-plusplus.de>
To: emacs-devel@gnu.org
Cc: Lars Magne Ingebrigtsen <larsi@gnus.org>
Subject: Re: "Readability" feature in eww
Date: Mon, 03 Nov 2014 10:37:47 +0100
Message-ID: <7820496.BS1QHyORAs@descartes>
In-Reply-To: <m3r3xlp13p.fsf@stories.gnus.org>

On Monday 03 November 2014 01:41:14 Lars Magne Ingebrigtsen wrote:
> This is a heuristic, of course, so it can be tweaked endlessly.  The
> current algorithm just gives most words a positive score, HTML markup a
> negative score, and words inside <a> tags a negative score.  For such a
> simple algorithm, it seems to give pretty good results.
> 
> But tweaking is necessary for it to be ... better.  If anybody has ideas
> for tweaks or better algorithms, please be my guest and have at it.
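For illustration, the scoring described above could be sketched roughly as follows. This is a minimal Python sketch, not eww's actual implementation (which is Emacs Lisp). The weights and the `Node`, `TreeBuilder`, `parse`, `best_node` and `text_of` names are hypothetical. Every word of ordinary text adds to an element's score, every element subtracts from it, and words inside `<a>` subtract. The element whose whole subtree scores highest is taken as the main content.

```python
from html.parser import HTMLParser

# Hypothetical weights: the quoted text only says "positive" and "negative".
WORD_SCORE = 1        # each word of ordinary text
LINK_WORD_SCORE = -1  # each word inside an <a> element
TAG_SCORE = -1        # each element (HTML markup)

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of str (text) and Node


class TreeBuilder(HTMLParser):
    """Build a minimal element tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def best_node(root):
    """Score every element bottom-up and return the highest-scoring one.

    Assumes the page contains at least one element.
    """
    scored = []

    def walk(node, in_link):
        in_link = in_link or node.tag == "a"
        total = 0 if node is root else TAG_SCORE
        for child in node.children:
            if isinstance(child, str):
                weight = LINK_WORD_SCORE if in_link else WORD_SCORE
                total += weight * len(child.split())
            else:
                total += walk(child, in_link)
        if node is not root:
            scored.append((total, node))
        return total

    walk(root, False)
    return max(scored, key=lambda pair: pair[0])[1]


def text_of(node):
    """Concatenate the text of a subtree, normalising whitespace."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(" ".join(parts).split())


page = """<html><body>
<div><a href="/">Home</a> <a href="/about">About us</a> <a href="/x">More links here</a></div>
<div><p>Emacs is an extensible editor.</p><p>eww renders HTML.</p></div>
</body></html>"""

print(text_of(best_node(parse(page))))
# prints: Emacs is an extensible editor. eww renders HTML.
```

In this example the link-heavy navigation `<div>` scores -10 and the content `<div>` scores 5, so the content wins. The sketch does not special-case `<script>` or `<style>`, so their contents would be counted as words. A real implementation would need to skip them.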

HTML5 has introduced elements such as <main> and <article>, which can be used to 
identify the important parts of a page.  I'm not sure how widespread their use is 
so far (I think org-mode already supports them if one enables the HTML5 export 
option).  But at least adding them to the heuristic might help.

E.g., https://developer.mozilla.org/en-US/docs/Web/HTML/Element/main
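The suggestion could be sketched as a shortcut that runs before any scoring. If the page has a `<main>` element, or failing that an `<article>`, take its text directly. This is a minimal Python sketch. `SemanticExtractor` and `semantic_main_text` are hypothetical names, and returning `None` to mean "fall back to the heuristic" is an assumption.

```python
from html.parser import HTMLParser

# HTML5 elements that mark primary content, in order of preference.
SEMANTIC_TAGS = ("main", "article")


class SemanticExtractor(HTMLParser):
    """Collect the text of the first element of each tag in SEMANTIC_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = {tag: 0 for tag in SEMANTIC_TAGS}  # current nesting level
        self.done = {tag: False for tag in SEMANTIC_TAGS}
        self.text = {tag: [] for tag in SEMANTIC_TAGS}

    def handle_starttag(self, tag, attrs):
        if tag in self.depth and not self.done[tag]:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1
            if self.depth[tag] == 0:
                self.done[tag] = True  # ignore any later elements of this tag

    def handle_data(self, data):
        for tag in SEMANTIC_TAGS:
            if self.depth[tag] > 0:
                self.text[tag].append(data)


def semantic_main_text(html):
    """Return the text of <main>, else of the first <article>, else None.

    None means "no HTML5 hint found": the caller would then fall back
    to a scoring heuristic such as the one quoted above.
    """
    parser = SemanticExtractor()
    parser.feed(html)
    parser.close()
    for tag in SEMANTIC_TAGS:
        words = " ".join(parser.text[tag]).split()
        if words:
            return " ".join(words)
    return None


page = ("<html><body><nav><a href='/'>Home</a></nav>"
        "<main><h1>Title</h1><p>Body text.</p></main></body></html>")
print(semantic_main_text(page))  # prints: Title Body text.
```

Preferring `<main>` matches the HTML specification's intent that it hold the dominant content of the document. A page may contain many `<article>` elements, such as a list of posts, so taking only the first one is a crude choice.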

Regards,
Rüdiger