Navigating and extracting XML in Python

WiseOldDabbler · February 5, 2023, 6:30pm

Hi,
I’m trying to find a way to extract certain elements from the note data in this xml document:

F
1
4

3

1
half

I now need the data from the tied element. Extracting the data from pitch was no problem, as I simply iterated through pitch as follows:
for pitch in myroot.iter("pitch"):
But now I need other elements from note, specifically the tied element. The above statement could be altered to:
for note in myroot.iter("note"):
but statements like:
pitch=note.find("step").text
caused a Python error. Similarly:
pitch=note/pitch/step.text doesn’t work.
I’d appreciate hints on this, having been googling for most of the day.
Thanks.

kinome79 · February 7, 2023, 2:47am

Just FYI, not sure if you were trying to display an outline or link to a file, but either way it didn’t come through, and we can’t see the data you were attempting to read.

WiseOldDabbler · February 7, 2023, 10:39am

Apologies.
Hi,
I’m trying to find a way to extract certain elements from the note data in this xml document:

<note>
 <pitch>
  <step>F</step>
  <alter>1</alter>
  <octave>4</octave>
 </pitch>
 <duration>3</duration>
 <tie type="start"/>
 <voice>1</voice>
 <type>half</type>
 <dot/>
 <notations>
  <tied type="start"/>
 </notations>
</note>

I now need the data from the tied element. Extracting the data from pitch was no problem, as I simply iterated through pitch as follows:
for pitch in myroot.iter("pitch"):
But now I need other elements from note, specifically the tied element. The above statement could be altered to:
for note in myroot.iter("note"):
but statements like:
pitch=note.find("step").text
caused a Python error. Similarly:
pitch=note/pitch/step.text doesn’t work.
I’d appreciate hints on this, having been googling for most of the day.
Thanks.

kinome79 · February 7, 2023, 5:27pm

I take interest because I just started playing with ElementTree a few days ago:)

Unless I’m oversimplifying, using ElementTree, as long as your variable is an Element, you should be able to just iterate through it…

So if note is your root, let’s say stored in a variable called note, then:

for value in note:
    print(value.tag)

That should print the tag of every direct child element in note. With a plain tag name, find only looks one level deep, I believe, so note.find("step") won’t work, because step is inside pitch. If you did note.find("pitch").find("step").text, that would work. iter() searches multiple levels, so note.iter("step") would get every step found anywhere below note.
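
A minimal sketch of that difference, assuming the sample note from the post above is parsed from a string (the variable names here are only for illustration):

```python
import xml.etree.ElementTree as et

# Sample note from earlier in the thread, trimmed to the relevant parts.
xml_text = """
<note>
 <pitch>
  <step>F</step>
  <alter>1</alter>
  <octave>4</octave>
 </pitch>
 <duration>3</duration>
</note>
"""
note = et.fromstring(xml_text)

# find() with a bare tag name only checks direct children, so this is None.
direct = note.find("step")

# Chaining find() calls walks down one level at a time.
chained = note.find("pitch").find("step").text

# find() also accepts a simple path, which does the same walk in one call.
by_path = note.find("pitch/step").text

# iter() searches the whole subtree below note.
steps = [s.text for s in note.iter("step")]

print(direct, chained, by_path, steps)
```

The path form is the closest working equivalent of the note/pitch/step idea from the original question.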

You can also treat it like a list, so if, say, tie is always the 3rd element (index 2), then note[2] would return the tie element.
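
A short sketch of the list-style access, assuming a cut-down note where tie happens to be the third child. Indexing depends on the child order, so looking the element up by name is probably the safer habit:

```python
import xml.etree.ElementTree as et

note = et.fromstring(
    "<note><pitch><step>F</step></pitch>"
    "<duration>3</duration><tie type='start'/></note>"
)

# Direct children can be counted and indexed like list items.
child_count = len(note)
third_child = note[2]
print(child_count, third_child.tag)

# Looking the element up by name does not depend on its position.
tie = note.find("tie")
print(tie.tag)
```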

All in the docs for ElementTree:

Hope that helps.

WiseOldDabbler · February 7, 2023, 8:38pm

Thanks so much! It really is that simple. I sometimes forget how much instruction can be crammed into a line of Python code.

WiseOldDabbler · February 7, 2023, 9:15pm

Sadly:
note.find("pitch").find("step").text doesn’t work. It gives: "NoneType" has no attribute find
Seemed a pythonesque logical solution too. I’ll read the docs again.
Thanks.

kinome79 · February 7, 2023, 9:29pm

Remember, I used note as an example variable name. If you didn’t call your variable note, or your root isn’t the <note>, you may have to modify it slightly. That error is saying that something you called .find on is None: either note itself is None, or .find('pitch') returned None, so the second .find has nothing to run on. It worked on the little sample you gave me, but the full file you’re using might be different.
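
Here is a small sketch that reproduces that kind of error. The note below is made up (it has a rest element instead of a pitch), just to show find returning None:

```python
import xml.etree.ElementTree as et

# Hypothetical note with no <pitch> child.
note = et.fromstring("<note><rest/><duration>3</duration></note>")

# find() returns None when there is no matching direct child.
pitch = note.find("pitch")
print(pitch)

# Chaining another find() onto that None raises the AttributeError.
try:
    note.find("pitch").find("step")
except AttributeError as err:
    message = str(err)
    print(message)
```

Printing the result of each find separately like this is a quick way to see which step of the chain came back empty.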

kinome79 · February 7, 2023, 9:35pm

Guess I should also make sure we’re using the same software… I’ve been working with:

import xml.etree.ElementTree as et

WiseOldDabbler · February 7, 2023, 10:17pm

Yes, we’re using the same module. The XML file consists of a number of nodes at the note level, and I start by looping through them, i.e.:
for note in myroot.iter("note"):
Thanks.

kinome79 · February 8, 2023, 4:26am

From your root you should just be able to do: for note in myroot, then for each of those notes you should be able to do note.find("pitch").find("step").text, assuming every note has a pitch and a step. If there is a possibility some notes might be missing a pitch or a kind, you would then want to use if statements to avoid getting errors, or you could use try and except and skip any values that are missing.
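
A rough sketch of both approaches, assuming a made-up measure in which the middle note has no pitch:

```python
import xml.etree.ElementTree as et

# Hypothetical data: the second note has a <rest/> instead of a <pitch>.
root = et.fromstring(
    "<measure>"
    "<note><pitch><step>F</step></pitch></note>"
    "<note><rest/></note>"
    "<note><pitch><step>A</step></pitch></note>"
    "</measure>"
)

# Option 1: test each lookup before using it.
steps_checked = []
for note in root.iter("note"):
    step = note.find("pitch/step")
    if step is not None:
        steps_checked.append(step.text)

# Option 2: attempt the chained lookup and skip notes where it fails.
steps_tried = []
for note in root.iter("note"):
    try:
        steps_tried.append(note.find("pitch").find("step").text)
    except AttributeError:
        continue

print(steps_checked, steps_tried)
```

Comparing with is not None matters here, since an element with no children is treated as false in a plain if test.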

WiseOldDabbler · February 8, 2023, 12:18pm

Not quite that simple. Note is a subnode of measure, and there are other nodes outside that structure. I’ll look at the element class to see whether that might make things easier.
Thanks.

kinome79 · February 8, 2023, 6:11pm

Well, it’s still that simple, you just need to adjust it to fit your document and dig down to where you wanna be. Good luck. Let us know if you have any additional questions.
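
As a sketch of what digging down can look like: the layout below is an assumption (only measure and note come from this thread; score, info, part and attributes are invented for the example), so adjust the names to match the real file.

```python
import xml.etree.ElementTree as et

# Hypothetical layout: notes sit inside measure, with other nodes around them.
root = et.fromstring(
    "<score>"
    "<info>not a note</info>"
    "<part>"
    "<measure number='1'>"
    "<attributes/>"
    "<note><pitch><step>F</step></pitch></note>"
    "<note><pitch><step>G</step></pitch></note>"
    "</measure>"
    "</part>"
    "</score>"
)

# Looping over the root only yields its direct children.
top_level = [child.tag for child in root]

# iter() reaches notes at any depth.
via_iter = [n.find("pitch/step").text for n in root.iter("note")]

# findall() with an explicit path spells out the route from the root.
via_path = [n.find("pitch/step").text for n in root.findall("part/measure/note")]

print(top_level, via_iter, via_path)
```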

WiseOldDabbler · February 8, 2023, 9:29pm

Hi, well, it all works now, and thankfully, has left me with a far better understanding of how this library works. I’m now able to extract what I want, and use it effectively. Very many thanks!

WiseOldDabbler · February 9, 2023, 11:27am

OK, I’ve one further question. I’m trying to access the text from tied. If I print tied.attrib, I get what looks like a dictionary, but printing tied.text gives me an error. I’ve also tried print(tied["type"]), but that didn’t work. I also tried accessing tied as a list, but that was no good either. You’ll note that in the example I sent, type is set to start, and it’s that bit I need to access.
Thanks so much for your help and patience.

kinome79 · February 9, 2023, 2:57pm

Tied doesn’t have any text, it appears. What you’re trying to read is an attribute, and you’re right, attrib does return something like a dictionary, and you can access its data as such.

tied.attrib['type']
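
A minimal sketch, assuming the tied element sits inside notations as in the sample above. Element.get is an alternative that returns a default instead of raising KeyError when the attribute is missing (the number attribute below is only there to show the default):

```python
import xml.etree.ElementTree as et

note = et.fromstring(
    "<note><notations><tied type='start'/></notations></note>"
)

tied = note.find("notations/tied")

print(tied.text)            # None: <tied/> is an empty element
print(tied.attrib)          # {'type': 'start'}
print(tied.attrib["type"])  # start
print(tied.get("type"))     # start

# get() falls back to the given default for an attribute that is not there.
print(tied.get("number", "n/a"))
```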

WiseOldDabbler · February 9, 2023, 4:19pm

Hi,
I got there just as you sent the email. The fact that there’s no text confused me, but printing attrib looked like a dictionary.
Brilliant!

