Regex issue with gcsplit

I have a directory with the following files:

page1.html
page2.html
page3.html

For example, say my page1.html looked like this:

<strong>Hello world</strong>

<p>ABC, Page (1 whatever).</p>
<p>Some text</p>

<p>DEF, Page (1 ummm what).</p>
<p>Some text</p>

<p>THE<em><strong><span class="underline">GHI</span></strong></em>JK <em><strong><span class="underline">the</span></strong></em>LMNOP<em><strong><span class="underline">Q</span></strong></em>RS.<p> ABC, Page (1).</p>

I want to split page1.html to:

page1_0.html

<strong>Hello world</strong>

page1_1.html

<p>ABC, Page (1 whatever).</p>
<p>Some text</p>

page1_2.html

<p>DEF, Page (1 ummm what).</p>
<p>Some text</p>

<p>THE<em><strong><span class="underline">GHI</span></strong></em>JK <em><strong><span class="underline">the</span></strong></em>LMNOP<em><strong><span class="underline">Q</span></strong></em>RS.<p> ABC, Page (1).</p>

I want code that identifies the line with the following pattern:

[0 to 10 characters in the beginning] , Page (1 [0 to 10 characters here]). </p>

I currently have the following code:

for filename in *.html; gcsplit -z -f "${filename%.*}_" --suffix-format="%d.html" $filename /'Page (1'/ '{*}'

But this is creating a page1_3.html containing the following text:

<p>THE<em><strong><span class="underline">GHI</span></strong></em>JK <em><strong><span class="underline">the</span></strong></em>LMNOP<em><strong><span class="underline">Q</span></strong></em>RS.<p> ABC, Page (1).</p>

But when I run this:

for filename in *.html; gcsplit -z -f "${filename%.*}_" --suffix-format="%d.html" $filename /'^.{0,10}, Page \(1.{0,10}\).\<\/p\>'/ '{*}'

This just outputs the file page1_0.html.

What is the issue with my regex? Are there any alternative ways to achieve what I’m trying to do?

The utility splits at lines that match a pattern marking the start of each section, but you don’t seem to be looking for something that is common to the start of each section. Each of your sections (apart from the first) starts with two newlines followed by a left angle bracket.

For example, the source I’m looking at is using it to split Markdown files, and the author is looking for level two headings (they start with ## ).
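
To make that concrete, here is a minimal sketch of the Markdown case. It assumes GNU csplit is installed (as gcsplit under Homebrew, or as csplit on most Linux systems). The file name notes.md, its contents, and the section_ prefix are made up for illustration, so run it in an empty scratch directory.

```shell
# Use gcsplit if present (Homebrew), otherwise fall back to csplit (assumed to be GNU).
CSPLIT=$(command -v gcsplit || command -v csplit)

# Hypothetical sample file: an intro followed by two level-two headings.
printf '%s\n' 'Intro text' '' '## First' 'Body one' '' '## Second' 'Body two' > notes.md

# Every line starting with "## " begins a new output file.
# {*} repeats the pattern as often as it matches, -z drops empty pieces,
# and -s silences the byte counts csplit normally prints.
"$CSPLIT" -s -z -f section_ --suffix-format='%d.md' notes.md '/^## /' '{*}'

# List the pieces that were written.
ls section_*.md
```

The point is that the pattern only has to recognise the first line of each section; everything up to the next match goes into the same output file.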

Also minor, but the utility is called csplit; gcsplit is just the command name you’re using (the g prefix most likely stands for GNU, as in Homebrew’s coreutils package, rather than "Global"). Googling gcsplit brings back a lot of results for both genomics and gas chromatography.

I was able to split it with the following RegEx:

^.\{0,50\}Page (1
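
The likely reason this works where the original pattern did not: csplit treats its patterns as basic regular expressions. In that syntax an interval has to be written \{0,10\}, a bare ( is a literal parenthesis while \( opens a group, and in GNU regex \< and \> are word-boundary anchors rather than literal angle brackets. Below is a sketch of the full loop under that assumption. It recreates the sample page1.html from the question, so run it in an empty scratch directory. It uses explicit do/done so it also runs in bash, and it assumes GNU csplit is available as gcsplit or csplit. The stricter pattern is my reading of the description in the question and has only been checked against the sample; the trailing </p> is left off to keep slashes out of the /.../ delimiters.

```shell
# Use gcsplit if present (Homebrew), otherwise fall back to csplit (assumed to be GNU).
CSPLIT=$(command -v gcsplit || command -v csplit)

# Recreate the sample page1.html from the question.
cat > page1.html <<'EOF'
<strong>Hello world</strong>

<p>ABC, Page (1 whatever).</p>
<p>Some text</p>

<p>DEF, Page (1 ummm what).</p>
<p>Some text</p>

<p>THE<em><strong><span class="underline">GHI</span></strong></em>JK <em><strong><span class="underline">the</span></strong></em>LMNOP<em><strong><span class="underline">Q</span></strong></em>RS.<p> ABC, Page (1).</p>
EOF

# Basic-regex version of: 0-10 characters, then ", Page (1",
# then 0-10 characters, then ")." (parentheses are literal here).
pattern='^.\{0,10\}, Page (1.\{0,10\})\.'

# Split every .html file in the current directory at lines matching the pattern.
for filename in *.html; do
    "$CSPLIT" -s -z -f "${filename%.*}_" --suffix-format='%d.html' \
        "$filename" "/$pattern/" '{*}'
done
```

On the sample this should leave page1_0.html, page1_1.html and page1_2.html, with the long last line staying in page1_2.html because its "Page (1" sits far more than 10 characters from the start of the line.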