emacs-orgmode@gnu.org archives
* org-board -- bookmarking and archival
From: Charles A. Roelli @ 2016-09-03 11:10 UTC
  To: emacs-orgmode

Hi,

I've written a tool called "org-board" to keep track of web bookmarks
in an org mode file.  The tool can archive bookmarks instantly using
`wget', saving them in an entry's dedicated directory (made via
`org-attach').  Pretty much any option for `wget' can be used by
org-board, and the whole system is quite flexible.  Org mode already
seems like a natural place for bookmarks (with tagging, TODO entries,
linking, etc.) and org-board is a convenient extension to help you
keep all your data on your own machine.

There is a lot of room for improvement (check out the TODO.org file
for some ideas).  I'm relatively new to elisp, so I'd also appreciate
any style or general programming tips.

The repository is here: https://github.com/scallywag/org-board

Cheers,
Charles


* Re: org-board -- bookmarking and archival
From: Alan Schmitt @ 2016-09-14  7:10 UTC
  To: Charles A. Roelli; +Cc: emacs-orgmode

Hello Charles,

On 2016-09-03 13:10, charles@aurox.ch (Charles A. Roelli) writes:

> I've written a tool called "org-board" to keep track of web bookmarks
> in an org mode file.  The tool can archive bookmarks instantly using
> `wget', saving them in an entry's dedicated directory (made via
> `org-attach').  Pretty much any option for `wget' can be used by
> org-board, and the whole system is quite flexible.  Org mode already
> seems like a natural place for bookmarks (with tagging, TODO entries,
> linking, etc.) and org-board is a convenient extension to help you
> keep all your data on your own machine.

This looks great, and I can imagine combining this with a capture
template and with http://chadok.info/firefox-org-capture/

Do you plan to release this as a package?

Thanks,

Alan

-- 
OpenPGP Key ID : 040D0A3B4ED2E5C7
Monthly Atmospheric CO₂, Mauna Loa Obs. 2016-08: 402.25, 2015-08: 398.93


* Re: org-board -- bookmarking and archival
From: Charles A. Roelli @ 2016-09-14 18:33 UTC
  To: Alan Schmitt; +Cc: emacs-orgmode

Hi Alan,

It's a neat idea to combine this with capturing and a browser extension.
I will definitely look into that.

There are still a few features I'd like to add in the next few weeks,
and then I'll think about releasing.  (See the file TODO.org for more
on this.)

Cheers,
Charles


* Re: org-board -- bookmarking and archival
From: Adam Porter @ 2016-09-15 17:07 UTC
  To: emacs-orgmode

Hi Charles,

Thanks for sharing that, I will check it out.  As was mentioned, it
seems ripe for integrating with browser capture.  On that note, have you
seen org-protocol-capture-html?  For articles that are primarily text,
I've been capturing articles directly in Org format, but your package
sounds good for capturing pages as-is.

By the way, you might want to consider integrating something like
Readability or the Python package python-readability (aka
readability-lxml) for reducing web pages to the primary content.  It's
worked out well in org-protocol-capture-html.

By the way, here's some code I've been using to read and/or capture web
pages from URLs on the clipboard:

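#+BEGIN_SRC elisp
(require 'org) ; for `org-time-stamp-formats' and `org-cycle'
(require 's)   ; for `s-trim' (third-party s.el package)

(defun url-to-org-with-readability (url)
  "Fetch URL with python-readability and display it as Org in a new buffer.
The extracted HTML is converted to Org with Pandoc."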
  (let (title content new-buffer)

    (with-temp-buffer
      (unless (= 0 (call-process "python" nil '(t t) nil "-m" "readability.readability" "-u" url))
        (error "Python readability-lxml script failed: %s" (buffer-string)))

      ;; Get title
      (goto-char (point-min))
      (setq title (buffer-substring-no-properties (search-forward "Title:") (line-end-position)))

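      ;; Convert the HTML to Org in place.  Pandoc deprecated "--no-wrap"
      ;; in version 1.16 in favor of "--wrap=none".
      (unless (= 0 (call-process-region (point-min) (point-max) "pandoc" t t nil "--wrap=none" "-f" "html" "-t" "org"))
        (error "Pandoc failed"))
      (setq content (buffer-substring (point-min) (point-max))))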

    ;; Make new buffer
    (setq new-buffer (generate-new-buffer title))
    (with-current-buffer new-buffer
      (insert (concat "* [[" url "][" title "]]\n\n"))
      (insert content)
      (org-mode)
      (goto-char (point-min))
      (org-cycle)
      (switch-to-buffer new-buffer))))

(defun read-url-with-org ()
  "Call `url-to-org-with-readability' on URL in kill ring."
  (interactive)
  ;; `car' instead of `first', which is only defined when cl is loaded
  (url-to-org-with-readability (car kill-ring)))

(defun org-capture-web-page-with-readability (&optional url)
  "Return the capture text for URL, for use in an `org-capture' template.
URL defaults to the first entry in the kill ring."
  (let ((url (or url (car kill-ring)))
        ;; From org-insert-time-stamp
        (timestamp (format-time-string (concat "[" (substring (cdr org-time-stamp-formats) 1 -1) "]")))
        title title-linked content)

    (with-temp-buffer
      (unless (= 0 (call-process "python" nil '(t t) nil "-m" "readability.readability" "-u" url))
        (error "Python readability-lxml script failed: %s" (buffer-string)))

      ;; Get title
      (goto-char (point-min))
      (setq title (buffer-substring-no-properties (search-forward "Title:") (line-end-position)))
      (setq title-linked (concat "[[" url "][" title "]]"))

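      ;; Convert the HTML to Org in place.  Pandoc deprecated "--no-wrap"
      ;; in version 1.16 in favor of "--wrap=none".
      (unless (= 0 (call-process-region (point-min) (point-max) "pandoc" t t nil "--wrap=none" "-f" "html" "-t" "org"))
        (error "Pandoc failed"))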

      ;; Demote page headings in capture buffer to below the
      ;; top-level Org heading and "Article" 2nd-level heading
      (save-excursion
        (goto-char (point-min))
        (while (re-search-forward (rx bol (1+ "*") (1+ space)) nil t)
          (beginning-of-line)
          (insert "**")
          (end-of-line)))

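      ;; Skip the first line (the title) and keep the rest as content.
      ;; `forward-line' replaces `goto-line', which is meant for
      ;; interactive use only.
      (goto-char (point-min))
      (forward-line 1)
      (setq content (s-trim (buffer-substring (point) (point-max))))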

      ;; Return capture for insertion
      (concat title-linked " :website:\n\n" timestamp "\n\n** Article\n\n" content))))

;; org-capture template (an entry for the `org-capture-templates' list)
("wr" "Capture Web site with python-readability" entry
 (file "~/org/articles.org")
 "* %(org-capture-web-page-with-readability)")
#+END_SRC


* Re: org-board -- bookmarking and archival
From: Charles A. Roelli @ 2016-09-16 18:40 UTC
  To: Adam Porter; +Cc: emacs-orgmode

Adam Porter <adam@alphapapa.net> writes:

> Hi Charles,
>
> Thanks for sharing that, I will check it out.  As was mentioned, it
> seems ripe for integrating with browser capture.  On that note, have you
> seen org-protocol-capture-html?  For articles that are primarily text,
> I've been capturing articles directly in Org format, but your package
> sounds good for capturing pages as-is.

Thanks for letting me know about org-protocol-capture-html, I had not
seen it.  Capturing text directly to an Org file sounds more
manageable.

> By the way, you might want to consider integrating something like
> Readability or the Python package python-readability (aka
> readability-lxml) for reducing web pages to the primary content.  It's
> worked out well in org-protocol-capture-html.

Great idea, maybe as part of a post-processing hook?  Then we could save
the HTML as a backup (for later web browsing) and then include the
primary text in the Org file for easy viewing straight from Emacs.
Seems your package is already well-suited to that part. :)

I also wanted to keep the design relatively abstract so that things like
this could be added later.  One other feature idea that could be
implemented as a post-processing hook is responding to "downloadable"
links (like links to YouTube videos) by running a backend program (in
this case, "youtube-dl") to go take care of fetching the appropriate
content.

> By the way, here's some code I've been using to read and/or capture web
> pages from URLs on the clipboard:
> [...]

It's helpful to see an example of org-capture in use, I still have more
to learn about it.  I'll put a little example in the README for
org-board.

Cheers,
Charles

