I have HTML like the following that represents multiple-choice questions. The usual pattern is a <p> tag (the question), followed by an <ol> with four <li>s (the answer choices). Occasionally, however, a question is more than one <p> tag long.
<ol />
<p class="Question-Stem">Which choice is best?</p>
<ol>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
</ol>
<ol />
<p class="Question-Stem">Which choice is best?</p>
<p class="Indented-Sentence">more text</p>
<p class="Question-Stem">more text</p>
<ol>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
</ol>
<ol />
<p class="Question-Stem">null</p>
<ol>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
<li class="Answer-Choice">text</li>
</ol>
I am parsing the questions into an array of hashes, one hash per question:
questions_array = []
questions_doc.css('p.Question-Stem').each do |p|
  # skip this iteration if the previous tag is not an ol tag
  # (i.e. it's not the beginning of the stem);
  # &. also skips a p that has no previous element at all
  next unless p.previous_element&.name == "ol"
  question = { :stem => "", :answer_choices => [] }
  # a stem whose content is the literal text "null" is stored as an empty string
  question[:stem] = p.inner_html == "null" ? "" : p.inner_html
  # the element right after the stem is the ol holding the answer choices
  p.next_element.element_children.each do |child|
    question[:answer_choices] << child.inner_html
  end
  questions_array << question
end
This parses exactly as I would like, except for the few cases where the question stem is three <p> tags in a row. In those cases, I want the HTML of all three tags together to be pushed into question[:stem]. Any ideas how to achieve that?
I have already read "How to parse consecutive tags with Nokogiri?" but didn't find the solutions applicable to this case.