Hey, I need a regex that captures a string between tags in the following way:
<a>hello<b>world<c>whatup</c>ineed</b>help</a>
In this case I would only want whatup to be captured, because it is the only String that has its opening and closing tags adjacent to it.
<a><a>hello<b><b>world</b></b><c>whatup</a></c>yo</a>
In this case only the word world should be captured.
It should work for any possible tag and any number of tags. The string should also be allowed to contain the characters <, > and /. The capture group should end only when they are arranged like a closing tag </.*>.
I just started with regex yesterday and can’t seem to figure this one out. I have tried several things so far, and currently I am back at <(.*)>(.*)</\1>.
This only kind of separates the content the way I want it to:
<h1><h2>Sanjay has no watch</h2></h1><par>So wait for a while</par>
becomes (when grabbing only group 2)
<h2>Sanjay has no watch</h2>
and
So wait for a while
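To make this reproducible, here is a minimal sketch of the behavior described above. It assumes Python's re module, since I have not said which regex flavor I am using; the helper name group2_matches is made up for illustration.

```python
import re

# The pattern from the post. Both (.*) groups are greedy.
ATTEMPT = re.compile(r"<(.*)>(.*)</\1>")

def group2_matches(text):
    """Return what group 2 captures for every match found in text."""
    return [m.group(2) for m in ATTEMPT.finditer(text)]

sample = "<h1><h2>Sanjay has no watch</h2></h1><par>So wait for a while</par>"
print(group2_matches(sample))
# ['<h2>Sanjay has no watch</h2>', 'So wait for a while']
```

Other flavors may backtrack slightly differently, so the exact output could vary outside Python.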
I have at least two problems with this:
- It does not account for multiple tags.
  <h1><h2>Sanjay has no watch</h2></h1> becomes <h2>Sanjay has no watch</h2>
- It does not account for closing tags if there is no appropriate opening tag.
  <h1>had<h1>public</h1515></h1> becomes had<h1>public</h1515>
  (it should not return anything in this case)
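The two problem cases can be checked side by side with a small script. This is only a sketch, again assuming Python's re module; the names captured and CASES are hypothetical. The "wanted" value for the first case is my reading of the rule (only text whose opening and closing tags are directly adjacent), and the second is the one stated above.

```python
import re

# The current attempt from the post.
ATTEMPT = re.compile(r"<(.*)>(.*)</\1>")

def captured(text):
    """Return group 2 of every match of the current attempt in text."""
    return [m.group(2) for m in ATTEMPT.finditer(text)]

# (input, what I would like to get back)
CASES = [
    ("<h1><h2>Sanjay has no watch</h2></h1>", ["Sanjay has no watch"]),
    ("<h1>had<h1>public</h1515></h1>", []),
]

for text, wanted in CASES:
    got = captured(text)
    status = "ok" if got == wanted else "MISMATCH"
    print(f"{status}: {text} -> got {got}, wanted {wanted}")
```

Both cases report a mismatch with the current pattern, which is exactly the two problems listed above.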
I tried solving this with negative lookaheads and lookbehinds, but struggled with that, especially because I don’t know how to account for tags of any length when the * quantifier is not allowed in lookbehinds. They also didn’t quite behave the way I expected them to, and I don’t know if I am even going in the right direction.
I would really love to solve this on my own, so ideally I don’t want a full solution, just someone to point me in the right direction.