How can I process RUNE-DOM::ELEMENTs?

Lispeth
Posts: 25
Joined: Wed May 13, 2015 8:33 am

Post by Lispeth » Sat Mar 19, 2016 8:26 am

I want to retrieve a list of all <div class="box"> elements on a website in order to process their contents. It looks like I should use the css-selectors package for that, which itself expects a cxml-dom object, so I need to use make-dom-builder instead of make-xmls-builder first.

Now my application returns a list like this:
#<RUNE-DOM::ELEMENT div {1004196643}>
#<RUNE-DOM::ELEMENT div {10041BD993}>
#<RUNE-DOM::ELEMENT div {10041CFC23}>
#<RUNE-DOM::ELEMENT div {10041DD7D3}>
#<RUNE-DOM::ELEMENT div {10041E5A33}>
#<RUNE-DOM::ELEMENT div {10041EA6F3}>
Either I'm too stupid to read the documentation or the documentation isn't very exhaustive; so: how can I "work" with these elements, e.g. examine their contents? Or should I use make-xmls-builder and replace css-selectors with something different, and if so, with what?

Thank you!

David Mullen
Posts: 78
Joined: Mon Dec 01, 2014 12:29 pm

Re: How can I process RUNE-DOM::ELEMENTs?

Post by David Mullen » Sat Mar 19, 2016 12:52 pm

It's not exhaustive, I guess, in that it doesn't document the DOM interface itself. You could look at how the DOM gets used in domtest.lisp.

Lispeth
Posts: 25
Joined: Wed May 13, 2015 8:33 am

Re: How can I process RUNE-DOM::ELEMENTs?

Post by Lispeth » Sat Mar 19, 2016 1:31 pm

Which lacks documentation too. At least I found that child-elements returns a list of child DOM elements. No information on how to parse it as something readable though. :(

David Mullen
Posts: 78
Joined: Mon Dec 01, 2014 12:29 pm

Re: How can I process RUNE-DOM::ELEMENTs?

Post by David Mullen » Sat Mar 19, 2016 1:58 pm

The DOM is what it is. It isn't Lispy and it isn't exactly lightweight from a conceptual standpoint, but, hey, it's a "standard." I imagine you can use it to pick out the relevant pieces from the elements and build a more useful structure.
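
For what it's worth, a minimal sketch of "picking out the relevant pieces" might look like the following. It assumes Quicklisp and CXML are available and parses a small hard-coded XML string instead of a real page; `box-summary` and `*doc*` are made-up names for illustration. The accessors follow the W3C DOM names as CXML exposes them in the `dom` package.

```lisp
;; Assumption: Quicklisp is installed; CXML is the only dependency.
(ql:quickload :cxml :silent t)

;; A tiny stand-in document, built with the same DOM builder
;; that css-selectors expects.
(defparameter *doc*
  (cxml:parse "<div class=\"box\"><p>Hello</p></div>"
              (cxml-dom:make-dom-builder)))

(defun box-summary (document)
  "Hypothetical helper: pull a few pieces out of DOCUMENT's root element."
  (let* ((box  (dom:document-element document)) ; the <div> element
         (para (dom:first-child box))           ; its first child, the <p>
         (text (dom:first-child para)))         ; the text node inside <p>
    (list (dom:tag-name box)                    ; element name
          (dom:get-attribute box "class")       ; attribute value
          (dom:tag-name para)
          (dom:node-value text))))              ; character data

(box-summary *doc*)
```

The same accessors should apply to each element that css-selectors:query returns, since those are ordinary DOM elements.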

Lispeth
Posts: 25
Joined: Wed May 13, 2015 8:33 am

Re: How can I process RUNE-DOM::ELEMENTs?

Post by Lispeth » Sat Mar 19, 2016 2:35 pm

That's actually what I want to do: grab all the .box elements from a website and put them into a Lispy structure (probably a nested list). Is there an easier way to do that?

David Mullen
Posts: 78
Joined: Mon Dec 01, 2014 12:29 pm

Re: How can I process RUNE-DOM::ELEMENTs?

Post by David Mullen » Sat Mar 19, 2016 3:00 pm

Maybe something like (I'm going strictly from the documentation here) this:

Code:

(dom:map-document (cxml-xmls:make-xmls-builder) document)
Does that work? I don't know. I don't have CXML installed. Just taking a stab at it.
Last edited by David Mullen on Sat Mar 19, 2016 3:41 pm, edited 1 time in total.

Lispeth
Posts: 25
Joined: Wed May 13, 2015 8:33 am

Re: How can I process RUNE-DOM::ELEMENTs?

Post by Lispeth » Sat Mar 19, 2016 3:16 pm

That would require passing the document over as an argument, which leaves me without a filter over the CSS classes again.

What I currently have is, basically, this:

Code:

;; ... define the-url ...

(loop for entry in (css-selectors:query
                    ".box"
                    (chtml:parse (dex:get the-url)
                                 (cxml-dom:make-dom-builder)))
      ;; TODO: do something useful with ENTRY.
      do (print entry))
Without css-selectors, things would be easy to handle, because I could replace make-dom-builder with make-xmls-builder and have a nice Lisp list. But then I couldn't easily filter my DIVs.

David Mullen
Posts: 78
Joined: Mon Dec 01, 2014 12:29 pm

Re: How can I process RUNE-DOM::ELEMENTs?

Post by David Mullen » Sat Mar 19, 2016 3:35 pm

OK. How about this. If we have the matching box-elements, then stick them in a document and try the dom:map-document thing as I suggested previously. Something like this:

Code:

(loop with mini-document = (dom:create-document 'rune-dom:implementation nil nil nil)
      with root = (dom:append-child mini-document (dom:create-element mini-document "query"))
      for box-element in box-elements do (dom:append-child root box-element)
      finally (return (dom:map-document (cxml-xmls:make-xmls-builder) mini-document)))
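
One caveat with this approach: DOM nodes belong to the document that created them, so appending nodes taken from a different document can fail. A possible workaround, sketched here under the assumption that your CXML version provides dom:import-node (the DOM Level 2 importNode operation), is to copy each element into the new document first. `collect-into-document`, `*source*` and `*mini*` are made-up names, and the sample input is a hard-coded XML string rather than a real page.

```lisp
;; Assumption: Quicklisp is installed; CXML is the only dependency.
(ql:quickload :cxml :silent t)

(defun collect-into-document (elements)
  "Hypothetical helper: copy ELEMENTS under a fresh <query> root element."
  (let* ((mini-document
           (dom:create-document 'rune-dom:implementation nil nil nil))
         (root (dom:append-child mini-document
                                 (dom:create-element mini-document "query"))))
    (dolist (element elements mini-document)
      ;; IMPORT-NODE makes a deep copy owned by MINI-DOCUMENT;
      ;; the original node stays in its source document.
      (dom:append-child root (dom:import-node mini-document element t)))))

;; Stand-in for the list of elements a selector query would return.
(defparameter *source*
  (cxml:parse "<body><div class=\"box\">a</div><div class=\"box\">b</div></body>"
              (cxml-dom:make-dom-builder)))

(defparameter *mini*
  (collect-into-document
   (coerce (dom:child-nodes (dom:document-element *source*)) 'list)))

;; Replay the small document into the xmls builder to get list structure.
(dom:map-document (cxml-xmls:make-xmls-builder) *mini*)
```

Copying costs some memory, but it leaves the parsed page untouched, so the original elements can still be queried afterwards.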

Lispeth
Posts: 25
Joined: Wed May 13, 2015 8:33 am

Re: How can I process RUNE-DOM::ELEMENTs?

Post by Lispeth » Sat Mar 19, 2016 3:58 pm

Obviously I can't "take over" DOM elements:
cannot adopt #<RUNE-DOM::ELEMENT div {100418AD63}>, since it was created by a different document.
Thanks though. :(

David Mullen
Posts: 78
Joined: Mon Dec 01, 2014 12:29 pm

Re: How can I process RUNE-DOM::ELEMENTs?

Post by David Mullen » Sat Mar 19, 2016 8:40 pm

I guess what I'd do, then, is take the most direct route. Walk the elements DOM-wise and build a list structure recursively. Something like this ought to work:

Code:

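(defun attribute-list (attribute)
  "Return ATTRIBUTE as a (name value) list."
  (list (dom:name attribute)
        (dom:value attribute)))

(defun node-expression (node)
  "Convert NODE into a string (text node) or a (tag attributes . children) list."
  (if (dom:text-node-p node)
      (dom:node-value node)
      ;; Anything that is not a text node is treated as an element here.
      (list* (dom:tag-name node)
             (let ((attribute-map (dom:attributes node)))
               (mapcar #'attribute-list (dom:items attribute-map)))
             (map 'list #'node-expression (dom:child-nodes node)))))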
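
Real pages also contain comments and other node types that have no tag name, so node-expression as written may stumble on them. A slightly more defensive variant, again only a sketch, could dispatch on dom:node-type and drop everything that is neither text nor an element. It assumes Quicklisp and CXML are available; `node->sexp` and `*sample*` are made-up names, and the attribute loop uses the standard dom:length and dom:item accessors.

```lisp
;; Assumption: Quicklisp is installed; CXML is the only dependency.
(ql:quickload :cxml :silent t)

(defun node->sexp (node)
  "Hypothetical variant of NODE-EXPRESSION that skips non-text, non-element nodes."
  (case (dom:node-type node)
    ((:text :cdata-section)
     (dom:node-value node))
    (:element
     (let ((attributes (dom:attributes node)))
       (list* (dom:tag-name node)
              ;; Collect each attribute as a (name value) pair.
              (loop for i below (dom:length attributes)
                    for attribute = (dom:item attributes i)
                    collect (list (dom:name attribute)
                                  (dom:value attribute)))
              ;; Skipped nodes come back as NIL and are removed here.
              (remove nil (map 'list #'node->sexp
                               (dom:child-nodes node))))))
    (t nil)))                           ; comments, processing instructions, ...

;; Small hard-coded sample with a comment in it.
(defparameter *sample*
  (cxml:parse "<div class=\"box\"><!-- note --><p>Hi</p></div>"
              (cxml-dom:make-dom-builder)))

(node->sexp (dom:document-element *sample*))
```

Applied to the results of css-selectors:query, something like (mapcar #'node->sexp box-elements) would then give one nested list per box.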
