* elisps dom library doesn't work as I expect
From: Stefan Huchler @ 2023-05-09 3:54 UTC
To: help-gnu-emacs
I'm trying to web-scrape eBay auctions, but I can't scrape the name, for example:
(defun ebay-tracker--query-test (id)
  "Webscrape ebay auction by ID."
  (setq ebay-tracker-running 't)
  (let* ((auction-url (format "https://www.ebay.com/itm/%s" id)))
    (request auction-url
      :parser (lambda ()
                (goto-char (point-min))
                (re-search-forward "^$")
                ;; (next-line)
                (libxml-parse-html-region (point) (point-max)))
      :headers '(("Content-Type" . "application/json"))
      :success (cl-function (lambda (&key data &allow-other-keys)
                              (when data
                                (setq ebay-tracker-auction
                                      data)
                                (message "success")
                                (setq ebay-tracker-running nil))))))
  (while ebay-tracker-running
    (sleep-for 0.03))
  (print (length (dom-elements
                  (dom-by-tag ebay-tracker-auction 'meta)
                  'itemprop "name"))))
;; data that I try to scrape:
;; <meta itemprop="name" content="Sony Bravia XR-55A90J OLED TV - TOP">
(ebay-tracker--query-test "185887279856")
Also, the dom-elements function is not described on the GNU website, yet
it's the only way in this library to find elements that don't use
class or id as attributes. Shouldn't that code find this node?
* Re: elisps dom library doesn't work as I expect
From: Tim Landscheidt @ 2023-05-09 5:32 UTC
To: Stefan Huchler; +Cc: help-gnu-emacs
Stefan Huchler <stefan.huchler@mail.de> wrote:
> I'm trying to web-scrape eBay auctions, but I can't scrape the name, for example:
> […]
> (print (length (dom-elements
> (dom-by-tag ebay-tracker-auction 'meta)
> 'itemprop "name"))))
> ;; data that I try to scrape:
> ;; <meta itemprop="name" content="Sony Bravia XR-55A90J OLED TV - TOP">
> (ebay-tracker--query-test "185887279856")
> Also, the dom-elements function is not described on the GNU website, yet
> it's the only way in this library to find elements that don't use
> class or id as attributes. Shouldn't that code find this node?
dom-by-tag returns a list of DOM elements; however,
dom-elements expects a single DOM element as its first
argument (DOM). So you need to iterate over the list of DOM
elements returned by dom-by-tag and call dom-elements on each,
or use dom-search, etc.
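A minimal sketch of the first suggestion, iterating over the list from dom-by-tag and calling dom-elements on each node. The DOM below is hand-built, hypothetical data in the (TAG ATTRIBUTES . CHILDREN) shape that libxml-parse-html-region returns, and my-meta-by-itemprop is a made-up helper name. Since MATCH is a regexp, the sketch quotes and anchors it to require an exact value.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'dom)
(require 'cl-lib)

;; Hypothetical hand-built DOM standing in for the parsed eBay page.
(defvar my-sample-dom
  '(html nil
         (head nil
               (meta ((itemprop . "name")
                      (content . "Sony Bravia XR-55A90J OLED TV - TOP")))
               (meta ((itemprop . "price") (content . "999.00"))))
         (body nil (p nil "Some text"))))

(defun my-meta-by-itemprop (dom value)
  "Return the meta nodes in DOM whose itemprop attribute is exactly VALUE.
Hypothetical helper: calls `dom-elements' on each node from `dom-by-tag'."
  ;; MATCH is a regexp; anchor it so "name" does not also match "brand-name".
  (let ((re (concat "\\`" (regexp-quote value) "\\'")))
    (cl-loop for node in (dom-by-tag dom 'meta)
             append (dom-elements node 'itemprop re))))

;; Read the content attribute of the first matching meta node.
(dom-attr (car (my-meta-by-itemprop my-sample-dom "name")) 'content)
```

Meta tags have no children, so here calling dom-elements per node amounts to filtering the dom-by-tag result by attribute.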
Tim
* Re: elisps dom library doesn't work as I expect
From: Stefan Huchler @ 2023-05-10 3:56 UTC
To: help-gnu-emacs
Tim Landscheidt <tim@tim-landscheidt.de> writes:
> dom-by-tag returns a list of DOM elements; however,
> dom-elements expects a single DOM element as its first
> argument (DOM). So you need to iterate over the list of DOM
> elements returned by dom-by-tag and call dom-elements on each,
> or use dom-search, etc.
Interesting, I find the documentation of dom-elements confusing:
> Find elements matching MATCH
What are the "elements" then? Attributes, not tags/DOM entries? But yes,
that would warrant a bug report about the documentation. It also seems
strange to me that dom-elements is not even listed in the GNU
documentation.
But maybe I am just not good enough at XML lingo.
Could you explain a bit, show an example of the dom-search function, or
explain its parameters? I have no idea what "predicate" in the
docstring means or in what format it's expected. Is "predicate" a known
term for something specific?
* Re: elisps dom library doesn't work as I expect
From: Tim Landscheidt @ 2023-05-10 20:18 UTC
To: Stefan Huchler; +Cc: help-gnu-emacs
Stefan Huchler <stefan.huchler@mail.de> wrote:
>> dom-by-tag returns a list of DOM elements; however,
>> dom-elements expects a single DOM element as its first
>> argument (DOM). So you need to iterate over the list of DOM
>> elements returned by dom-by-tag and call dom-elements on each,
>> or use dom-search, etc.
> Interesting, I find the documentation of dom-elements confusing:
>> Find elements matching MATCH
> What are the "elements" then? Attributes, not tags/DOM entries? But yes,
> that would warrant a bug report about the documentation. It also seems
> strange to me that dom-elements is not even listed in the GNU
> documentation.
"Elements" in the context of dom-elements means the node passed
as DOM and all of its descendants (AFAICT); dom-elements returns
those whose ATTRIBUTE matches the regexp MATCH.
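To illustrate that reading on a small hand-built tree (hypothetical data): dom-elements checks the node passed as DOM itself as well as every descendant, and because MATCH is a regexp, "item" also matches the class "item-title". Anchoring the regexp narrows the result.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'dom)

;; Hypothetical fragment: a div with two span children.
(defvar my-fragment
  '(div ((class . "item"))
        (span ((class . "item-title")) "Sony Bravia TV")
        (span ((class . "price")) "999.00")))

;; Tags of every node (DOM itself included) whose class matches "item".
(mapcar #'dom-tag (dom-elements my-fragment 'class "item"))
;; => (div span)

;; An anchored regexp only matches the exact value "item".
(mapcar #'dom-tag (dom-elements my-fragment 'class "\\`item\\'"))
;; => (div)
```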
> But maybe I am just not good enough at XML lingo.
That's not really a problem here, as it is not /the/ problem
:-); the dom-* functions only loosely relate to XML or DOM
concepts in JavaScript & Co., so one has to refer to the
Emacs "model". As almost everything is a list in Emacs, one
does not get a meaningful error when using the DOM functions
incorrectly; instead, the code just silently does not work.
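Since nothing signals an error, one option is a small guard that checks the argument really looks like a single node before calling dom-elements. This is a hypothetical sketch: my-dom-node-p and my-dom-elements-checked are not part of dom.el, and the shape test is only a heuristic based on the (TAG ATTRIBUTES . CHILDREN) layout.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'dom)

(defun my-dom-node-p (obj)
  "Return non-nil if OBJ looks like a single DOM node (TAG ATTRIBUTES . CHILDREN).
Heuristic only: a list of nodes has a cons, not a symbol, as its car."
  (and (consp obj)
       (symbolp (car obj))
       (listp (cadr obj))))

(defun my-dom-elements-checked (dom attribute match)
  "Like `dom-elements', but signal an error unless DOM is a single node."
  (unless (my-dom-node-p dom)
    (signal 'wrong-type-argument (list 'my-dom-node-p dom)))
  (dom-elements dom attribute match))

;; Usage: a single node works; a list of nodes (e.g. the result of
;; `dom-by-tag') now signals `wrong-type-argument' instead of quietly
;; producing a meaningless result.
;; (my-dom-elements-checked '(meta ((itemprop . "name"))) 'itemprop "name")
;; (my-dom-elements-checked (list '(meta nil) '(meta nil)) 'itemprop "name")
```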
> Could you explain a bit, show an example of the dom-search function, or
> explain its parameters? I have no idea what "predicate" in the
> docstring means or in what format it's expected. Is "predicate" a known
> term for something specific?
A predicate in Emacs Lisp is typically a (possibly
anonymous) function that looks at something and then returns
non-nil (often t) for some values and nil for others. So for
your use case, you could write something à la:
| (let
| ((dom (with-temp-buffer (url-insert-file-contents
| "https://www.ebay.com/itm/185887279856")
| (libxml-parse-html-region (point-min) (point-max)))))
| (dom-attr (car
| (dom-search
| dom
| (lambda (d)
| (and (equal (dom-tag d) 'meta)
| (equal (dom-attr d 'itemprop) "name")))))
| 'content))
This will iterate over all DOM elements in the document,
return those that have a tag "meta" and an attribute
"itemprop" with the value "name", take the first (and
probably only) one, and return the value of this element's
"content" attribute.
Tim
* Re: elisps dom library doesn't work as I expect
From: Stefan Huchler @ 2023-05-20 19:40 UTC
To: help-gnu-emacs
Thanks, I didn't think of creating dom elements to compare them, and
thanks for all the other information. I will have to implement it when
I find time.