Issues / Progress in URL Shortener Microservice (Beta FCC)

Okay, I couldn’t find anyone else who’d done the beta version, where the URL is submitted via form input. My issue: in the FCC example (https://thread-paper.glitch.me/), URLs submitted without http(s) aren’t accepted as valid, which is fine because it follows the format wanted for the project.

However, in my version (https://mirror-brow.glitch.me/), while testing with dns.lookup, it throws an error for URLs that include http(s) (e.g., https://www.freecodecamp.com), but not for bare www-style URLs (e.g., www.freecodecamp.com).

Any ideas why? Do I just look for http:// at the beginning of the req.body.url string, slice it out and feed to dns lookup?

Keep in mind I’m just testing so it returns json for error and success, no db saves just yet.

My sample testing code:

// our url shortener API endpoint
app.post('/api/shorturl/new', function(req, res) {

  dns.lookup(req.body.url, function(err, addr, fam) {
    if(err) { res.json({ error: 'invalid URL' }); }
    else {
      res.json({ original_url: req.body.url });
    }
  });
});

well alrighty, think we’re moving in the right direction. suggestions, improvements highly welcome :slight_smile:

// our url shortener API endpoint
app.post('/api/shorturl/new', function(req, res) {

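  var protocolReg = /^https?:\/\//i; // require an explicit http:// or https:// prefix
  var url = req.body.url || ''; // avoid a crash when the form field is missing

  // if we have a protocol great
  if(url.match(protocolReg)){
    var result = url.replace(protocolReg, '');// strip the protocol first
    var domain = result.split(/[\/?#:]/)[0];// remove any port, route, query or hash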

    dns.lookup(domain, function(err, addr, fam) {
      // if the domain isn't valid
      if(err) { res.json({ error: 'invalid URL' }); }
      else {
        res.json({ original_url: url });
      }
    });
  } else { // if not, fail
    res.json({ error: 'invalid URL' });
  }

});
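As an alternative to stripping the protocol with a regex, here is a minimal sketch that lets Node's built-in `URL` class do the parsing and hands only the hostname to `dns.lookup`, since `dns.lookup` expects a bare hostname rather than a full URL. The names `extractHostname` and `isReachableUrl` are made up for illustration and are not part of the project or the FCC example.

```javascript
const dns = require('dns');

// Returns the hostname of an http(s) URL, or null if the input is not one.
// (Hypothetical helper, not from the original post.)
function extractHostname(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws for strings without a scheme, e.g. "www.example.com"
  } catch (err) {
    return null;
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null; // only http:// and https:// are accepted
  }
  return parsed.hostname; // no protocol, port, path or query
}

// Hypothetical helper: calls back with true if the hostname resolves via DNS.
function isReachableUrl(input, callback) {
  const hostname = extractHostname(input);
  if (!hostname) return callback(false);
  dns.lookup(hostname, function (err) {
    callback(!err);
  });
}
```

Inside the route this could be used as `isReachableUrl(req.body.url, function (ok) { ... })`, returning the `invalid URL` JSON when `ok` is false.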

The next issue I had was how to handle the short_url field. Technically it just increments, and in MySQL it would be easy to add a secondary index that auto-increments. However, from what I’ve read, while you can do something similar in MongoDB during development, performance is reportedly horrible in production. Not that this is going to production lol, but I like to learn how the pros do it.

I also didn’t want to do something like sort the collection by short_url to get the max and just +1 it, though maybe that’s okay? So I found the mongoose-sequence package. It really just adds another collection called counters, keeps track of the field you want incremented, and automatically handles that field’s value on save. Not sure if it’s better or worse than the approach described above, but it’s much simpler and seems to be faster when creating a new short URL than the FCC example.

edit: I believe the FCC example does something similar; it just creates the counter collection and handles it itself. It’s definitely more robust than my URL checking as well. I also didn’t know IP format was acceptable.

shortid npm package is even cooler for this

https://www.npmjs.com/package/shortid
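For anyone curious what a random (rather than incrementing) short ID looks like without a package, here is a rough sketch using only Node's built-in `crypto` module. It is not how shortid works internally; `randomId` and `ALPHABET` are names invented for this example, and since random IDs can collide you would still want a uniqueness check or a unique index on the short_url field.

```javascript
const crypto = require('crypto');

// 64 URL-safe characters, so each random byte maps evenly onto the alphabet.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-';

// Hypothetical helper: returns a random URL-safe string of the given length.
function randomId(length = 8) {
  const bytes = crypto.randomBytes(length);
  let id = '';
  for (let i = 0; i < length; i++) {
    id += ALPHABET[bytes[i] & 63]; // keep the low 6 bits: a value from 0 to 63
  }
  return id;
}
```

The trade-off against the counter approach is that no extra round trip to a counters collection is needed, at the cost of longer and non-sequential short URLs.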

Hey @rudolphh, yep I think I must have sunk just about a day on wrapping my head around the best way to generate a unique short identifier to act as the short_url. I similarly ended up opting for an auto-incrementing number, which took quite a bit of time to get up and running.

I think you’re right, it’s probably not the best solution, but my frame of mind was to avoid using too much abstraction (because, “All magic comes at a price!”).

In any case, here is the piece of my code that handles that:

// Saves a new URL to the database returning the new extension on this domain (shortened URL)
function saveURL(url, callback) {
  
  getNextSequence('urlid', function(err, seq) {
    if (err) return console.error(err); // without a sequence number there is nothing to save
    
    
    var newUrl = {
      "original_url": url,
      "short_url": DOMAIN + '/' + seq
    };
    
    db.collection(DB_COLLECTION_NAME).insert(newUrl, function(err, result) {
        if (err) console.error(err);
        
        callback(newUrl);
    });
    
  });
}

function getNextSequence(name, callback){
    db.collection("counters").findAndModify(
        { "_id": name },
        [],
        { "$inc": { "seq": 1 }},
        { "new": true, "upsert": true },
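        function(err, result) {
          // on failure result is undefined, so pass the error on instead of reading result.value
          if (err) return callback(err);
          callback( null, result.value.seq );
        }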
    );
}

I mean to comment this in the near future (ideally today), but to fill you in in the meantime:

getNextSequence() looks into a separate collection called ‘counters’ which I have manually initialised with the following document:

{
    "_id": "urlid",
    "seq": 0
}

Every time getNextSequence() is called, it increments and updates the ‘seq’ (sequence) value and returns it to its callback.

Its callback in this case is more or less the meat of my saveURL() function, which runs whenever a user requests to add a new URL. The code block that saves the new URL to my database needs to be within the callback of getNextSequence() because it requires the new sequence number, which is the short URL extension associated with that new URL.

You can see why this is a performance issue: saving a new URL to the database becomes dependent on first writing and receiving the updated sequence number. If you were doing this at scale I imagine it would be a nightmare!
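The same two-step flow (bump the counter, then save) can be sketched with async/await, which makes the dependency between the two writes easier to see. This is only a sketch under assumptions: it takes `db` and `domain` as parameters instead of using globals, assumes a collection named `urls` in place of `DB_COLLECTION_NAME`, and assumes a MongoDB Node driver recent enough to offer `findOneAndUpdate` with the `returnDocument` option and `insertOne`.

```javascript
const DB_COLLECTION_NAME = 'urls'; // assumed name, the original value isn't shown

// Atomically increments the named counter and resolves with the new value.
async function getNextSequence(db, name) {
  const result = await db.collection('counters').findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnDocument: 'after' }
  );
  // Some driver versions wrap the document in { value: ... }, others return it directly.
  const doc = result && result.value ? result.value : result;
  return doc.seq;
}

// Saves a new URL; the insert cannot start until the counter write has finished.
async function saveURL(db, domain, url) {
  const seq = await getNextSequence(db, 'urlid');
  const newUrl = { original_url: url, short_url: domain + '/' + seq };
  await db.collection(DB_COLLECTION_NAME).insertOne(newUrl);
  return newUrl;
}
```

Because of `upsert: true`, the counter document is created on first use. Each save still costs two sequential round trips to the database, which is the dependency described above.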

Hope this makes sense. Godspeed Rudus!

Allan

Makes perfect sense @AllanPooley, and though I used a package I’m so glad you posted that to show essentially what the package I’m using is doing. With mongoose it’s just using a pre-save hook to do what you’re describing: incrementing and updating the seq value, then returning it so it gets saved in the short_url field. Thank you!

I don’t believe the performance issue had to do with this approach, because this seems like the smarter way to do it; it had more to do with using MongoDB’s built-in indexing features. I’m not exactly sure how they work or how to describe them.