I am trying to open and read an .xml file, extract the fragment delimited by an opening and a closing tag (here <profile>, both tags included), locate that fragment by a keyword that I supply, and write only that fragment to a new .xml file that I generate.
My current Python script opens and reads the source .xml file, searches for the keyword I supply, and writes every complete line that contains the keyword to a new .xml file:
keyword = 'Georgia'
occurrences = []

# Collect every line of the source file that contains the keyword.
with open('test_input.xml') as lines:
    for line in lines:
        if keyword in line:
            occurrences.append(line)

# Write the matching lines to the new file.
with open('test_output.xml', 'w') as archi1:
    archi1.write(''.join(occurrences))
The result I get is a “test_output.xml” file that contains the following:
<id>Georgia-1</id>
<profile>Georgia-p1</profile>
<id>Georgia-2</id>
<profile>Georgia-p2</profile>
The problem is that I need more than the complete lines that contain the keyword (in this case ‘Georgia’). I need the entire fragment that contains those lines, delimited by the opening and closing ‘profile’ tags. That is, I need the following result:
<profile>
<id>Georgia-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p1</profile>
<showtitle>Georgia_s1</showtitle>
<ip>000.000.0.3</ip>
<port>00003</port>
<persistencePort>00033</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_3</webstart.server.name>
<codebaseProtocolServer>T3</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Georgia-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p2</profile>
<showtitle>Georgia_s2</showtitle>
<ip>000.000.0.4</ip>
<port>00004</port>
<persistencePort>00044</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_4</webstart.server.name>
<codebaseProtocolServer>T4</codebaseProtocolServer>
</properties>
</profile>
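One way to get this result while staying close to the line-by-line script above is to track how deeply nested the current line is inside <profile> elements. This matters here because each outer <profile> block also contains an inner <profile>…</profile> line, so stopping at the first closing tag would cut the block short. The following is only a sketch: the function name `extract_profiles` is made up for illustration, and it assumes that each tag sits on its own line and that <profile> carries no attributes, as in the sample file.

```python
def extract_profiles(lines, keyword, tag="profile"):
    """Return the outermost <tag>...</tag> blocks that contain keyword.

    Assumption: tags have no attributes and the file is laid out
    one element per line, as in the sample input.
    """
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    matches = []  # text of every block that contains the keyword
    block = []    # lines of the block currently being read
    depth = 0     # current nesting level of <profile> elements

    for line in lines:
        opens = line.count(open_tag)
        closes = line.count(close_tag)
        if depth == 0 and opens == 0:
            continue  # line is outside any profile block
        block.append(line)
        depth += opens - closes
        if depth == 0:
            # The outermost block has just been closed.
            if any(keyword in item for item in block):
                matches.append("".join(block))
            block = []
    return matches
```

It could be used as `blocks = extract_profiles(open('test_input.xml'), 'Georgia')`, followed by writing `''.join(blocks)` to 'test_output.xml'. Because the lines are copied unchanged, the output keeps the original indentation and line breaks.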
The full source .xml I am using is as follows:
<project>
<profile>
<id>Azerbaiyan-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Azerbaiyan-p1</profile>
<showtitle>Azerbaiyan_s1</showtitle>
<ip>000.000.0.1</ip>
<port>00001</port>
<persistencePort>00011</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_1</webstart.server.name>
<codebaseProtocolServer>T1</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Azerbaiyan-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Azerbaiyan-p2</profile>
<showtitle>Azerbaiyan_s2</showtitle>
<ip>000.000.0.2</ip>
<port>00002</port>
<persistencePort>00022</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_2</webstart.server.name>
<codebaseProtocolServer>T2</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Georgia-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p1</profile>
<showtitle>Georgia_s1</showtitle>
<ip>000.000.0.3</ip>
<port>00003</port>
<persistencePort>00033</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_3</webstart.server.name>
<codebaseProtocolServer>T3</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>Georgia-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>Georgia-p2</profile>
<showtitle>Georgia_s2</showtitle>
<ip>000.000.0.4</ip>
<port>00004</port>
<persistencePort>00044</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_4</webstart.server.name>
<codebaseProtocolServer>T4</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>USA-1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>USA-p1</profile>
<showtitle>USA1_s1</showtitle>
<ip>000.000.0.5</ip>
<port>00005</port>
<persistencePort>00055</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_5</webstart.server.name>
<codebaseProtocolServer>T5</codebaseProtocolServer>
</properties>
</profile>
<profile>
<id>USA-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<profile>USA-p2</profile>
<showtitle>USA1_s2</showtitle>
<ip>000.000.0.6</ip>
<port>00006</port>
<persistencePort>00066</persistencePort>
<defaultLocale>en_GB</defaultLocale>
<webstart.server.name>host_6</webstart.server.name>
<codebaseProtocolServer>T6</codebaseProtocolServer>
</properties>
</profile>
</project>
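Since the source is XML, another option is to parse it rather than scan lines. The sketch below uses the standard library's `xml.etree.ElementTree`; the helper names `profiles_with_keyword` and `copy_matching_profiles` are hypothetical. It assumes the input is well-formed (in particular, that <project> is closed at the end of the file) and it wraps the matching blocks in a new <project> root so the output is a well-formed document too, which differs slightly from the bare fragments shown earlier.

```python
import xml.etree.ElementTree as ET


def profiles_with_keyword(root, keyword):
    """Yield the direct <profile> children of root whose text contains keyword."""
    for profile in root.findall("profile"):  # direct children only, not the nested <profile>
        if any(keyword in text for text in profile.itertext()):
            yield profile


def copy_matching_profiles(src_path, dst_path, keyword):
    """Write the matching <profile> blocks of src_path to dst_path."""
    root = ET.parse(src_path).getroot()  # the <project> element
    out_root = ET.Element(root.tag)      # new root, so the output stays well-formed
    out_root.extend(list(profiles_with_keyword(root, keyword)))
    ET.ElementTree(out_root).write(dst_path, encoding="unicode")
```

A call such as `copy_matching_profiles('test_input.xml', 'test_output.xml', 'Georgia')` would then produce the new file. Matching on element text rather than raw lines means a keyword that appears only in a tag name does not select a block.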