I’m trying to parse an HTML string that may contain any valid HTML tags. I’m using this code to parse it:
$doc = new DOMDocument();
$doc->loadHTML($product['description']); // comes from db
$els = $doc->getElementsByTagName('*');
foreach ($els as $node) {
    // print each element's tag name and text content
    echo $node->nodeName . ' ' . $node->nodeValue . "\n";
}
This prints my tags, but the first two elements are html and body, and I want to skip those. The string from the database contains no html or body tags; loadHTML() adds them as wrappers. Here’s an example:
<p>This is a paragraph</p>
<ul><li>This is a list</li></ul>
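A minimal, self-contained reproduction of the behaviour described above, assuming the database fragment is markup such as `<p>…</p><ul><li>…</li></ul>`. The helper name `allElementNames()` is hypothetical; it just collects the tag names that the original loop visits.

```php
<?php
// Reproduction sketch. Assumes the stored fragment looks like
// "<p>This is a paragraph</p><ul><li>This is a list</li></ul>".
// allElementNames() is a hypothetical helper, not part of DOMDocument.

function allElementNames(string $html): array
{
    $doc = new DOMDocument();
    // loadHTML() parses the fragment as a whole document, so libxml
    // inserts the implied <html> and <body> wrappers around it.
    $doc->loadHTML($html);

    $names = [];
    // '*' matches every element in the document, wrappers included.
    foreach ($doc->getElementsByTagName('*') as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

print_r(allElementNames('<p>This is a paragraph</p><ul><li>This is a list</li></ul>'));
```

The list starts with `html` and `body` even though neither appears in the input string.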
Is there a way to iterate over only the elements inside body? I tried these two approaches:
// Attempt 1
$els = $doc->getElementsByTagName('body *');

// Attempt 2
$body = $doc->getElementsByTagName('body');
$els = $body->getElementsByTagName('*');
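A sketch of how the second attempt could be made to work. The first attempt cannot work because getElementsByTagName() takes a single tag name (or `'*'`), not a CSS selector. In the second, getElementsByTagName('body') returns a DOMNodeList rather than a DOMElement, so the body element has to be taken out of the list with item(0) before searching inside it. The helper name `bodyElementNames()` is hypothetical.

```php
<?php
// Sketch: restrict the search to descendants of <body>.
// bodyElementNames() is a hypothetical helper name.

function bodyElementNames(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // getElementsByTagName() returns a DOMNodeList; item(0) gives the
    // first <body> DOMElement, or null if the list is empty.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $names = [];
    // Called on an element, getElementsByTagName() searches only its
    // descendants, so <html> and <body> themselves are not included.
    foreach ($body->getElementsByTagName('*') as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

print_r(bodyElementNames('<p>This is a paragraph</p><ul><li>This is a list</li></ul>'));
```

This stays within plain DOMDocument/DOMElement methods, with no XPath involved.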
Neither works: the first returns no elements, and the second fails with an error. I’ve seen others use XPath, but it gives me headaches. Can this be done with DOMDocument alone?
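For comparison, the XPath route mentioned above needs only one query. This is a sketch assuming the same kind of fragment; `//body//*` selects every element below body, in document order. The helper name `xpathBodyElementNames()` is hypothetical.

```php
<?php
// Sketch of the XPath alternative using DOMXPath::query().
// xpathBodyElementNames() is a hypothetical helper name.

function xpathBodyElementNames(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    // '//body//*' = any element that is a descendant of a <body> element.
    $nodes = $xpath->query('//body//*');

    $names = [];
    foreach ($nodes as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

print_r(xpathBodyElementNames('<p>This is a paragraph</p><ul><li>This is a list</li></ul>'));
```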