
admin


keywords:
Code
PHP
DOMDocument

PHP DOMDocument()
I've been having to do some weird things parsing HTML lately, and I usually just use a lot of regular expressions (amazing magical things). Sometimes the target HTML is so grungy that this is not easy, or even possible. Again, PHP to the rescue. It has some incredible built-in tools for dealing with HTML as if it were XML data, and this is some concept-testing code from when I first needed to do this. It's not the specific code I ended up with, but it was where I started.

Goal: Find the link text and the sub text (a span with no id/class of its own, nested inside the a href tag) and separate them out for parsing and reporting for other things.


<?php
$content = <<<EOF
This is a bogus HTML document example.
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
<span id="info">id info with no a href</span>
<span id="info"><a href="#">link text<span>inner span</span></a></span><p>
<span>this</span> More body text. and more.
EOF;

// The sample markup is deliberately sloppy; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML($content);
$doc->normalizeDocument();

// Text of a node's first child, or '' when the node is missing or empty.
function firstChildText($node) {
    return ($node && $node->firstChild) ? $node->firstChild->nodeValue : '';
}

foreach ($doc->getElementsByTagName('span') as $param) {
    // Only the outer spans carry id="info"; the inner span and the stray one are skipped.
    if ($param->getAttribute('id') == 'info') {
        print "v: " . firstChildText($param) . "<br> ";
        print "a: " . firstChildText($param->getElementsByTagName('a')->item(0)) . "<br> ";
        print "s: " . firstChildText($param->getElementsByTagName('span')->item(0)) . "<br> ";
    }
}
?>

The output looked like:

v: id info with no a href
a:
s:
v: link textinner span
a: link text
s: inner span
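
Note that the second "v:" line runs the two strings together, because nodeValue on the a element returns the text of everything inside it. As a rough sketch of another way to pull the two pieces apart (not the code I ended up with), the same lookup can be written with DOMXPath. The $html string, the $rows array and its 'link' and 'sub' keys are made up for this illustration, and it assumes the same span-inside-a-link layout as the test document above.

```php
<?php
// Trimmed-down copy of the test markup (hypothetical sample data).
$html = '<span id="info">id info with no a href</span>'
      . '<span id="info"><a href="#">link text<span>inner span</span></a></span>';

libxml_use_internal_errors(true);   // keep parser warnings about the sloppy markup quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
// Match only the links that sit directly inside a span with id="info".
foreach ($xpath->query('//span[@id="info"]/a') as $a) {
    $rows[] = [
        // string(text()) takes the first text node that is a direct child of the link.
        'link' => $xpath->evaluate('string(text())', $a),
        // string(span) takes the full text of the first span nested in the link.
        'sub'  => $xpath->evaluate('string(span)', $a),
    ];
}
print_r($rows);
```

Because the query only matches spans that actually contain a link, the first "info" span drops out on its own and no empty-result check is needed. For the sample markup, $rows should hold a single entry with 'link text' and 'inner span'.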
 