Need help with Regex match

Hi all, I have some code that works fine except for two things. It finds every hashtag word (e.g. #example) and turns it into a link like:

https://myurl/?q=#example

here my code:

export default function () {

  const regex = /#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

  const p = this.$('.Post-body');
  const baseurl = app.forum.attribute('baseUrl');

  // Call .html() as a setter; assigning the result back to p.html would overwrite the jQuery method
  p.html(p.html().replace(regex, match => `<a href="${baseurl}/?q=${match}" class="hashlink" title="Find more post with this hashtag">${match}</a>`));

}

Now I can't find a way to do these two things:

  • ignore a “#” that is part of a URL in the HTML
  • remove the “#” from the generated link

Your post is quite confusing to read. Would you mind rephrasing or elaborating, preferably with more examples?

Hi gaac,
sorry, but I can't provide an example.

This code finds #words and replaces it with
<a href="http://mylink/?q=#words">#words</a>

But it also matches the anchor part of URLs in HTML links (I want to skip all HTML links), and the generated link must be

https://mylink/?q=words and not https://mylink/?q=#words

Can you give an example of a string returned by p.html(), which you are chaining .replace(...) onto? Can you also show your desired string after the replacement, as well as what your replace code currently gives you?

Exactly this: <a href="http://mylink/?q=#words">#words</a>

This is the final output. I want to remove the # from the URL and avoid parsing URLs that are already in the HTML.

So the returned string must be

<a href="http://mylink/?q=words">#words</a>

Hey there justoverclockl,

The easiest solution that I can think of is using slice(1) in your replace callback.

const regex = /#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;
"These are some sample #words".replace(regex, match => match.slice(1))
// output: "These are some sample words"

So using some combo of match.slice(1) and match should be what you need.

If I am understanding the question correctly, that should get the job done for ya. I do agree with @gaac510 in that the question as posted is a little confusing.
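To make that combo concrete, here is a minimal sketch: match.slice(1) goes into the URL and the untouched match stays as the link text. The linkifyHashtags name and the https://forum.example base URL are made up for illustration; in the original code the base URL comes from app.forum.attribute('baseUrl').

```javascript
const regex = /#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

// Hypothetical helper: wraps each hashtag in a link whose query omits the "#".
function linkifyHashtags(text, baseUrl) {
  return text.replace(
    regex,
    // match is e.g. "#words"; match.slice(1) is "words"
    (match) => `<a href="${baseUrl}/?q=${match.slice(1)}" class="hashlink">${match}</a>`
  );
}

console.log(linkifyHashtags('These are some sample #words', 'https://forum.example'));
// These are some sample <a href="https://forum.example/?q=words" class="hashlink">#words</a>
```

This only handles the "remove the # from the link" part; it still touches any "#" that sits inside a URL.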

Your regex has the “g” flag, which makes it replace every match, so every “#” gets removed. You could just use something like:

let input='<a href="http://mylink/?q=#words">#words</a>'

console.log(input.replace('#', ''))
console.log(input.replace(/#/, ''))

// '<a href="http://mylink/?q=words">#words</a>'

That seems to achieve what you asked for in your last post.
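For comparison, a small sketch of how the two forms behave: a string pattern (or a regex without “g”) replaces only the first occurrence, while a regex with “g” replaces all of them. The twoLinks string is a made-up example showing that the first-occurrence trick only fixes the first link when a post contains more than one hashtag.

```javascript
const input = '<a href="http://mylink/?q=#words">#words</a>';

// String pattern: only the first "#" (the one in the href) is removed.
const firstOnly = input.replace('#', '');
console.log(firstOnly); // <a href="http://mylink/?q=words">#words</a>

// Regex with the "g" flag: every "#" is removed, including the link text.
const all = input.replace(/#/g, '');
console.log(all); // <a href="http://mylink/?q=words">words</a>

// Made-up input with two generated links: only the first href is fixed.
const twoLinks = '<a href="/?q=#a">#a</a> <a href="/?q=#b">#b</a>';
const twoFixed = twoLinks.replace('#', '');
console.log(twoFixed); // <a href="/?q=a">#a</a> <a href="/?q=#b">#b</a>
```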

Thanks, this fixes the first problem, so now only one remains: skipping all HTML URLs in the regex!

Right now, if a URL contains a hashtag, it gets rewritten as well, so the only way is to have a regex that skips all HTML tags.

Instinctually, I think you should just be able to add a ‘^’ to your regex. That will force it to match on words that start with ‘#’.

const regex = /^#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

Should work. But I would test it a bunch to make sure there aren’t any side effects that you aren’t intending

Unfortunately this doesn't work; it no longer recognizes #words.

Humm… I see that. Now it is only searching the beginning of the string. My bad.
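A quick sketch of what happens with the ‘^’ version, using two made-up test strings: the hashtag is only found when it is the very first thing in the string.

```javascript
const anchored = /^#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

// "^" pins the match to the start of the whole string.
const atStart = '#words at the start'.match(anchored);
const inMiddle = 'some #words in the middle'.match(anchored);

console.log(atStart);  // [ '#words' ]
console.log(inMiddle); // null
```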

What about a negative lookbehind?
Something like:

const regex = /(?<!html)#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;
"some #words with a link https://docs.flarum.org/composer.html#regex-are-complicated".replace(regex, match => match.slice(1))
// output: "some words with a link https://docs.flarum.org/composer.html#regex-are-complicated"

Does that work for ya?

This pretty much works :) Thank you!

After a few tries, I found that it breaks links that contain an anchor.

https://docs.flarum.org/#goals becomes:
#goals" rel=" nofollow ugc">https://docs.flarum.org/#goals

I’m still not 100% sure what you are trying to do. Are you saying it’s something like:

// The complete string returned by `p.html()` before replacement:
"These are some sample #words"; 

// The complete string you want to get after replacement:
`These are some sample <a href="${baseurl}/?q=words" class="hashlink" title="Find more post with this hashtag">#words</a>`

As well as ignoring strings similar to the below:

// The complete string returned by `p.html()` before replacement:
"Here's a url: https://docs.flarum.org/#goals" // Part of a url, so ignore.

If the above sounds about right, here's a modified version of your regex that you can try in combination with @codyjamesbrooks’ suggested usage of slice():

const regexModified = /(?<!https?:\/\/\S*)#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

Do note about regexModified that:

  • Its logic is similar to that of @codyjamesbrooks’ final suggestion; his would disregard a match if "#" is immediately preceded by "html"; mine would disregard a match if "#" is preceded by "http://" (or "https://") with any number of non-whitespace characters separating the two.

  • regexModified relies on JS’ implementation of regex, where variable-length lookbehind assertions are allowed. regexModified may not work in other languages (e.g. Python, PHP, Java).

  • regexModified may not cover all situations, similar to what you have experienced with @codyjamesbrooks’ final suggestion. It’s definitely possible to make it more robust, but you need to provide us with better-explained context (what your project is like and what role the replacement routine in question plays in it) and examples of edge cases. Until we are given this information we’d just be guessing at your needs and not getting anywhere.
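One possible way to make this more robust, sketched under the assumption that p.html() returns markup in which URLs have already been turned into <a> elements (as the broken #goals output suggests): copy every tag and every existing <a>...</a> element through untouched, and run the hashtag replacement only on the text in between. The helper names linkifyText and linkifyHtml and the https://forum.example base URL are hypothetical, and regex-based HTML handling like this is only approximate (for instance, a numeric entity such as &#39; in the text would still be picked up).

```javascript
// Same pattern as regexModified: skip a "#" that is part of a bare URL.
const hashtag = /(?<!https?:\/\/\S*)#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

// Matches a whole existing <a>...</a> element, or any other single tag.
const tagOrAnchor = /<a\b[^>]*>[\s\S]*?<\/a>|<[^>]+>/gi;

// Hypothetical helper: linkify hashtags in a piece of plain text.
function linkifyText(text, baseUrl) {
  return text.replace(
    hashtag,
    (match) => `<a href="${baseUrl}/?q=${match.slice(1)}" class="hashlink">${match}</a>`
  );
}

// Hypothetical helper: walk the HTML string, copy tags and existing links
// through unchanged, and linkify only the text between them.
function linkifyHtml(html, baseUrl) {
  let out = '';
  let last = 0;
  for (const m of html.matchAll(tagOrAnchor)) {
    out += linkifyText(html.slice(last, m.index), baseUrl); // text before the tag
    out += m[0];                                            // the tag or link itself
    last = m.index + m[0].length;
  }
  return out + linkifyText(html.slice(last), baseUrl);      // trailing text
}

const sample =
  '<p>See <a href="https://docs.flarum.org/#goals" rel="nofollow ugc">https://docs.flarum.org/#goals</a> and #words</p>';
console.log(linkifyHtml(sample, 'https://forum.example'));
```

Because generated links are themselves <a> elements, running linkifyHtml a second time on its own output leaves it unchanged, which may matter if the routine is called again when a post re-renders.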

So, this code will be used in a forum. It simply finds any word with a hashtag (#example) and turns it into a link that searches the forum for that word:

http://mylink/?q=example

That is my project.