Search and Replace - Better Advanced Solution

@r1chard5mith @Marmiz Instead of trolling, why don’t you make an attempt to provide useful feedback?

If you enjoy writing incomprehensible code, go join a code golf competition.

I did provide you useful feedback. I showed you a shorter way to solve the problem without using RegEx (which I find easier to read than the examples you originally gave).

Then you told me it was using Regex anyway and I asked you to explain. I also asked, “What does a ‘non-regex’ solution look like?” Do you have a better solution?

Be civil, please. No one is trolling here.

Nowhere did I mention in the initial proposal that my intent was to produce unreadable/incomprehensible code. I thought it was pretty clear that the intent was to:

  1. Provide a concise solution
  2. Use advanced (i.e. less well-known) features of JS to do so

@r1chard5mith Your first suggestion missed an important edge case. If the before is lowercase, and the after is uppercase, the after should be modified to also be lowercase.

Demonstrating Regex usage is one of the main points of this proposal. Providing a non-regex solution doesn’t fit into the scope of the question.

I ‘get’ that Regex isn’t everybody’s cup of tea and that’s OK. Personally, I avoided using it as much as possible for years before I finally made an honest effort to understand it. The point is, for devs working on these problems who are interested in learning Regex, it’s useful to see working examples.

In the same vein, the ternary operator is another less-well-known feature of JS that can be useful in situations where you need a simple either/or check. Once again, it’s not everybody’s cup of tea, but when used intentionally it can cut down code size.

For instance, using a ternary to define parameter defaults was a very common pattern before default parameters were included in the ES6 standard.

For example:

function someFunction(param) {
  var someVar = (param) ? param : 'some default value';
}

… can now be done in ES6 with parameter defaults:

function someFunction(someParam='some default value') {

}
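One thing worth knowing when swapping the ternary pattern for ES6 defaults: the two are not exactly equivalent. The ternary falls back for any falsy argument, while a default parameter only applies when the argument is `undefined` (or omitted). A small sketch; the function names `withTernary` and `withDefault` are made up for illustration:

```javascript
// Ternary-based default: falls back for ANY falsy value (undefined, null, 0, '', false, NaN)
function withTernary(param) {
  var someVar = (param) ? param : 'some default value';
  return someVar;
}

// ES6 default parameter: only applies when the argument is undefined or omitted
function withDefault(someParam = 'some default value') {
  return someParam;
}

console.log(withTernary(0));    // 'some default value'
console.log(withDefault(0));    // 0
console.log(withDefault());     // 'some default value'
console.log(withDefault(null)); // null
```

So if `0`, `''` or `null` are meaningful inputs, the ES6 form is usually what you want.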

@Marmiz The point isn’t whether or not it uses regex. I was just pointing out that String.replace uses regex.

Also, did you read the initial proposal? The intent is not to make the solution impossible to read. Ternary statements aren’t ‘impossible to read’ when used conservatively. Providing a solution that is more difficult to read – while creative – is not constructive to the conversation.

@r1chard5mith This proposal isn’t specifically targeted to ‘everyone’. It’s marked ‘Advanced’ specifically because it uses some of the more complex and less-well-understood features of JS to produce concise and efficient code.

@PortableStick I posted this seeking constructive feedback/suggestions. Maybe I have a more liberal interpretation of the term ‘trolling’ than most. I consider intentionally providing non-constructive feedback a form of trolling.

Code golf competitions exist specifically as an outlet for devs who are bored and/or looking for a challenge that involves creatively solving problems with overly-complex and hard-to-read solutions.

I proposed an alternative solution that may be a good learning utility for others. If this is the wrong place for that type of feedback, feel free to flag the question so a Moderator can delete it.

Learn to relax, man. All input is constructive if you know how to handle it.

String.replace itself isn’t a RegExp method. If a regex is used as the search expression, it hands off to the built-in RegExp machinery (which uses the pattern to do the search), and that in turn allows the replacement expression to use patterns such as $1 to refer to captured substrings. But if a string is passed, it just uses that string directly. replace itself just does string substitution; it’s not inherently to do with regex.

Edit: @evanplaice see line 135. String.replace is not the same as RegExp’s replace method (RegExp.prototype[Symbol.replace]) at all. It uses a different algorithm (as defined in the ECMAScript standard) and different code, so that it can actively avoid delegating to the RegExp machinery if at all possible [and ideally drop into fast C++ native string replace methods? That would be engine-specific though]:

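  1. Check that the value it was called on is not null or undefined, and work with it as a string.
  2. If the search value is not a plain string (more precisely: it is not null or undefined and has a Symbol.replace method, which RegExp objects do), call out to that method, i.e. the RegExp search/replace logic.
  3. Otherwise, convert the search value to a string, find its first occurrence, and replace it with whatever is specified as the replacement.
  4. Return the new string.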

At step 2, the function checks whether it should delegate to RegExp (the JS RegExp modules and the C++/Rust code that powers them), but the algorithm is set up to defensively avoid using the [slow] RegExp methods.
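To make those four steps concrete, here is a rough model written in JS. It is a simplified sketch, not the engine’s actual code: `simplifiedReplace` is a hypothetical name, and it skips details such as `$&`/`$1` substitution patterns in string replacements.

```javascript
// Hypothetical, simplified model of how String.prototype.replace dispatches.
function simplifiedReplace(str, searchValue, replaceValue) {
  // Step 1: the receiver must not be null/undefined
  if (str === null || str === undefined) {
    throw new TypeError('simplifiedReplace called on null or undefined');
  }
  // Step 2: delegate if the search value knows how to replace itself
  // (RegExp objects do, via Symbol.replace; plain strings do not)
  if (searchValue !== null && searchValue !== undefined) {
    const replacer = searchValue[Symbol.replace];
    if (replacer !== undefined) {
      return replacer.call(searchValue, String(str), replaceValue);
    }
  }
  // Step 3: plain string search; only the FIRST occurrence is replaced
  const s = String(str);
  const search = String(searchValue);
  const pos = s.indexOf(search);
  if (pos === -1) return s;
  const replacement = typeof replaceValue === 'function'
    ? String(replaceValue(search, pos, s))
    : String(replaceValue); // simplification: "$" patterns are not expanded
  // Step 4: return the new string
  return s.slice(0, pos) + replacement + s.slice(pos + search.length);
}

console.log(simplifiedReplace('a-b-c', '-', '+'));  // "a+b-c" (string path, first match only)
console.log(simplifiedReplace('a-b-c', /-/g, '+')); // "a+b+c" (delegated to RegExp)
```

The point of the model is the branch in step 2: a plain string search never touches the RegExp code path.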


Stupidly, I hadn’t realised that the ‘hints’ page is actually a comment thread itself and lots of other people have posted solutions there already. There are a couple of solutions there that don’t use replace. It’s very interesting to look at the different ways it can be done. My favourite is this one, freeCodeCamp Algorithm Challenge Guide: Search and Replace, which splits on the word to be matched and then uses the resulting pieces as the input to join. Very clever.
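For anyone who hasn’t seen the split/join trick, here is a minimal sketch of the idea. It is not copied from the linked guide; the first-letter case handling follows the rules discussed in this thread. One behavioural difference from `str.replace(before, ...)`: split/join replaces every occurrence, not just the first.

```javascript
// Sketch of the split/join approach (not the linked guide's exact code)
function myReplace(str, before, after) {
  // Adjust the first letter of `after` to match the first letter of `before`
  let fixed = after;
  if (/^[A-Z]/.test(before)) {
    fixed = after[0].toUpperCase() + after.slice(1);
  } else if (/^[a-z]/.test(before)) {
    fixed = after[0].toLowerCase() + after.slice(1);
  }
  // Split on every occurrence of `before`, then glue the pieces back together with `fixed`
  return str.split(before).join(fixed);
}

console.log(myReplace('His name is Tom', 'Tom', 'john')); // "His name is John"
```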

@DanCouper TIL. Thanks, I didn’t know the JS source was accessible online.

So, you’re saying that the JS version of RegEx is used as a sort of polyfill when the native C implementation of str::replace is missing?

I’ve never heard of Rust being used for browser dev, did you add that for completeness or are there actually browsers written in Rust?

@r1chard5mith I would have posted this in Hints, but replies have been locked there.

If you like that example, it works almost identically to how I built the output string in the first example.

Remove the split/join parts to simplify things a bit

(before[0] == before[0].toUpperCase()) 
    ? after.charAt(0).toUpperCase() + after.slice(1)
    : after;

Then inline the regex test so the ternary test is the same

  (/^[A-Z]/.test(before))
    ? after[0].toUpperCase() + after.slice(1)
    : after[0].toLowerCase() + after.slice(1);

They have a lot in common. My version uses a different result when the ternary test is false, because it also converts the after string to lowercase if the before string is lowercase.

The first character test works the same:

// for an ASCII letter at the start of the string, this check is the same
before[0] == before[0].toUpperCase()

// as this
/^[A-Z]/

// the caret anchors the match to the beginning of the string
^ 
// the character class then checks whether that first character is in the range of uppercase letters A-Z
[A-Z] 

// forward slashes delimit a regex literal the way quotes delimit a string literal
/some regex expression/

// RegExp.prototype.test() takes a string as input, checks it against the pattern, and returns a boolean result
var booleanResult = /some regex/.test(input);

// so the following returns true only if the first character of 'before' is an uppercase letter
/^[A-Z]/.test(before)
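The two checks only agree when the first character is a letter. A quick side-by-side sketch (the helper names `byCompare` and `byRegex` are made up for this comparison):

```javascript
// Two ways to ask "does s start with an uppercase letter?"
const byCompare = s => s[0] === s[0].toUpperCase(); // also true for digits and symbols, which have no case
const byRegex = s => /^[A-Z]/.test(s);              // true only for ASCII A-Z

for (const word of ['Tom', 'tom', '1st', '#tag']) {
  console.log(word, byCompare(word), byRegex(word));
}
// Tom true true
// tom false false
// 1st true false
// #tag true false
```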

There’s actually an edge case this doesn’t cover that wasn’t defined in the question. If the first character of before is not a letter (e.g. a symbol or digit), this will still convert after to lowercase. To make this solution more robust, it should do two comparisons plus a fallback:

  1. If the first char of before is uppercase, the first letter of after should be uppercase.
  2. If the first char of before is lowercase, the first letter of after should be lowercase.
  3. If neither is true, either throw an error or return after unchanged.

So this would be a better solution:

function myReplace(str, before, after) {
  after = (() => {
    if (/^[A-Z]/.test(before))
      return after[0].toUpperCase() + after.slice(1);
    if (/^[a-z]/.test(before))
      return after[0].toLowerCase() + after.slice(1);
    return after;
  })();
  return str.replace(before, after);
}

Note: the => (fat arrow) syntax is just a quick way to define an unnamed function, and the trailing () invokes it immediately. if statements work without {} (braces) if the body is a single statement. It may look weird, but the arrow function body is just an if/else if/else evaluation.

Re Rust: yes, parts of Firefox.

My mistake, I was still in the process of editing the last part.

The sad thing is that JavaScript has limited regex support (in this case, for modifiers), so you can’t do the whole replace with a single regex call.

Here’s a one liner (because I like one liners)

function myReplace(str, before, after) {
  return str.replace(
    new RegExp(before, 'ig'),
    match => after.replace(
      /^./,
      first => /^[A-Z]/.test(match) ? first.toUpperCase() :
               /^[a-z]/.test(match) ? first.toLowerCase() : first
    )
  );
}

This solution is probably 1 or 2 years old, but it still holds its weight; maybe the only thing you could change is checking the casing with a RegExp instead of charCode, but hey, whatever.

function myReplace(str, before, after) {
  return str.replace(new RegExp(before,"ig"), (toReplace) =>
    toReplace.charCodeAt(0) < 97
      ? after[0].toUpperCase() + after.slice(1)
      : after
  )
}
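Both RegExp-based one-liners above pass before straight into new RegExp, so a search word containing regex metacharacters (say "c++") is treated as a pattern rather than literal text; new RegExp("c++") even throws a SyntaxError. A minimal sketch of one common fix, escaping before first. The escapeRegExp helper is an assumption (the character class is the one suggested in MDN’s regular expressions guide), not part of either solution. Note that the 'ig' flags also mean every occurrence is replaced, case-insensitively.

```javascript
// Escape regex metacharacters so `text` is matched literally
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function myReplace(str, before, after) {
  return str.replace(
    new RegExp(escapeRegExp(before), 'ig'),
    // Match the first-letter case of each actual match
    match => after.replace(
      /^./,
      first => /^[A-Z]/.test(match) ? first.toUpperCase() :
               /^[a-z]/.test(match) ? first.toLowerCase() : first
    )
  );
}

console.log(myReplace('I write C++ daily', 'c++', 'rust')); // "I write Rust daily"
```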

@lynxlynxlynx What do you mean? JS supports RegEx modifiers, and dotAll is on track to be added to the spec soon too. Not sure how that relates to doing an inline replace.

Did you mean matching groups? Because JS RegEx supports those too.

Here’s a (relatively) one-liner regex solution that uses some of the more advanced, and less known, features of JS regex.

@KittenHero Nice, when you said one-liner I was thinking the same thing. Here’s an alternative method to insert variables into the regex source. If you are building a regex that conditionally includes multiple variables, this can be a lot cleaner than calling new RegExp() with string concatenation.

This version covers the missing edge case mentioned in my last comment. Also, you don’t need the g modifier because you’re only doing a single match.

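function myReplace(str, before, after) {
  // .source returns a plain string without flags, so wrap it in new RegExp to get a case-insensitive pattern
  const pattern = new RegExp(/X/.source.replace(/X/, before), 'i');
  return str.replace(pattern, (m) => {
    if (/^[A-Z]/.test(m))
      return after[0].toUpperCase() + after.slice(1);
    if (/^[a-z]/.test(m))
      return after[0].toLowerCase() + after.slice(1);
    return after;
  });
}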
  • /X/.source - returns the regex’s pattern text as a plain string (flags such as i are not included)
  • replace(/X/, before) - replaces X in the source with the value of before.
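A small sketch of what .source actually gives you: a plain string with the flags stripped, so the spliced result has to be turned back into a RegExp (with the flags re-attached) before it behaves like one. The variable names here are just for illustration.

```javascript
const template = /X/i;
console.log(template.source); // "X" (flags are not part of .source)
console.log(template.flags);  // "i"

// Splicing a value into the source yields a plain string, not a RegExp
const spliced = template.source.replace(/X/, 'Tom');
console.log(typeof spliced);  // "string"

// Passed to replace as-is, it is a plain, case-sensitive string search
console.log('my name is tom'.replace(spliced, 'John')); // "my name is tom"

// Rebuild a real RegExp, re-attaching the original flags
const rebuilt = new RegExp(spliced, template.flags);
console.log('my name is tom'.replace(rebuilt, 'John')); // "my name is John"
```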

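The next part leverages a feature of String.prototype.replace: its second parameter can be a function. The function receives the full match, then one argument for every capture group (i.e. each part surrounded by parentheses), then the match offset and the whole input string.

// m is the full match; g1..g3 are the capture groups (undefined if a group didn't take part),
// followed by the match offset and the whole input string
const str = 'xx group2 yy';
const result = str.replace(/(group1)|(group2)|(group3)/, (m, g1, g2, g3, offset, string) => {
  console.log(m);  // "group2"
  console.log(g1); // undefined
  console.log(g2); // "group2"
  console.log(g3); // undefined
  return m;        // whatever you return becomes the replacement
});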

This is fully documented at MDN - String.prototype.replace()

If you need to do a complex replace without additional logic, you can use $&, $1, $2, $3 in the replacement string to reference the whole match, group 1, group 2, group 3, etc.

str.replace(/(group1)|(group2)|(group3)/, "$& $1 $2 $3");
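A concrete version of those replacement patterns, using a made-up input string:

```javascript
const fullName = 'John Smith';
// $1 and $2 refer to the first and second capture groups
console.log(fullName.replace(/(\w+)\s(\w+)/, '$2, $1')); // "Smith, John"
// $& refers to the whole match
console.log(fullName.replace(/Smith/, '[$&]'));          // "John [Smith]"
```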

When the second parameter is a function, the replacement string is whatever you return from it. By inlining the after construction logic into this function, we can capture before in the text, construct after, and return the correct version of after.

Even for an advanced example, this goes pretty far above and beyond the typical usage of RegEx. I only know about this because I use RegExp as the lexer in jquery-csv lib.

I suppose capture groups could be a bit advanced, but they do make your life easier when you search and replace in vim 🙂

That’s just Firefox, btw. Chromium’s engine (V8) will differ, as will the other browsers’ engines. However, the core JS stuff (i.e. not the browser APIs) has to follow the ECMAScript spec, so you can tell what it’s doing from the spec anyway, without reading the source.

OK, string.replace has nothing to do with regex; it’s basic string substitution. However, the method does a check to see if a regex is used; if so, it delegates to the RegExp module to find what to look for and build the replacement. The browser is written in C++ (plus some Rust, if it’s Firefox), so if using the native methods is a better option than the JS ones, it makes sense to use them, but that’s decided case by case. It’s like the array method sort: if it’s all integers, a fast C++ integer sort from the standard library is generally (or at least used to commonly be) used.

Firefox is having parts of itself rebuilt in Rust; it’s Mozilla’s language and it was funded for that reason afaik

It supports very few of them. If it had case-affecting modifiers (eg. \L), it could make things cleaner here. I do rescind the one-regex possibility though, as I don’t see a way to do it with less than two, except maybe if it also supported recursion.

@luishendrix92: yes, it would need to be fixed to work on non-ascii locales.
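One way to make the casing check work beyond ASCII, assuming an ES2018+ engine (a current browser or Node 10+): Unicode property escapes. \p{Lu} matches any uppercase letter and \p{Ll} any lowercase letter; they require the u flag. The helper names are hypothetical. The charCode test (< 97) misclassifies É, whose code is 201.

```javascript
const startsUpper = s => /^\p{Lu}/u.test(s);
const startsLower = s => /^\p{Ll}/u.test(s);

const emile = '\u00C9mile'; // "Émile"
const uber = '\u00FCber';   // "über"

console.log(startsUpper(emile), /^[A-Z]/.test(emile)); // true false
console.log(emile.charCodeAt(0) < 97);                 // false: the charCode check calls it lowercase
console.log(startsLower(uber));                        // true
console.log(startsUpper('1st'), startsLower('1st'));   // false false
```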

@DanCouper I could see where it fell back to the non-optimized version if the native C version doesn’t exist. I couldn’t see where it delegated to RegEx, so I assumed the RegEx approach was the fallback alternative.

I should probably take a moment to sit and study the code more closely but I’ll take your word for it. Thanks for the feedback. It’s inspiring to see the work Mozilla is doing on Rust.

@lynxlynxlynx I’m no PREG guru, I didn’t really pick up RegEx until I transitioned over to mostly writing code in JS.

  • \L does make a lot of sense for this

Maybe more modifiers will be added in the future. I was honestly surprised to see the dotAll flag proposal on TC39’s list, and even more so that it was fast-tracked to Stage 4 within a year. We can likely look forward to more being added in the near future.
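For anyone who hasn’t used it yet, a quick sketch of what the dotAll (s) flag changes (ES2018+; the sample text is made up):

```javascript
const text = 'line one\nline two';
console.log(/one.line/.test(text));  // false: without s, "." does not match "\n"
console.log(/one.line/s.test(text)); // true: with s (dotAll), "." matches "\n" too
console.log(/one.line/s.dotAll);     // true
```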

It looks like a lot more are coming soon:
https://mathiasbynens.be/notes/es-regexp-proposals

My functional programming solution (updated for the edge case where before is lowercase and after is uppercase):

const isUpperCase = str => (/^[A-Z]/).test(str)

const preserve = (before, after) => {
  // before starts with a capital: capitalize after
  if (isUpperCase(before)) {
    return after.charAt(0).toUpperCase() + after.slice(1)
  }
  // before doesn't start with a capital but after does: lowercase after
  if (isUpperCase(after)) {
    return after.charAt(0).toLowerCase() + after.slice(1)
  }
  return after
}

const myReplace = (str, before, after) => {
  const checkCase = preserve(before, after)
  return str.replace(before, checkCase)
}

console.log(
  myReplace("His name is Tom", "Tom", "john"),
  myReplace("He is Sleeping on the couch", "Sleeping", "sitting"),
  myReplace("Let us go to the store", "store", "mall"),
  myReplace("This has a spellngi error", "spellngi", "spelling"),
  myReplace("Let us get back to more Coding", "Coding", "algorithms"),
  myReplace("This is not the wrong caes", "caes", "Case")
)