[ad_1]
Is it possible to get character positions of each highlighted fragment? I need to match the highlighted text back to the source document and having character positions would make it possible.
For example:
curl "localhost:9200/twitter/tweet/_search?pretty=true" -d '{
"query": {
"query_string": {
"query": "foo"
}
},
"highlight": {
"fields": {
"message": {"number_of_fragments": 20}
}
}
}'
returns this highglight:
"highlight" : {
"message" : [ "some <em>foo</em> text" ]
}
If the field message in the matched document were:
"Here is some foo text"
is there a way to know that the snippet begins at char 8 and ends at char 21 of the matched field?
Knowing the start/end offset of the matched token would be good for me as well – perhaps there is a way to access that information using script_fields? (This question shows how to obtain the tokens, but not the offsets).
The field “message” has:
"term_vector" : "with_positions_offsets",
"index_options" : "positions"
[ad_2]