Database Population Help Needed for AWESOME Project

I’m excited to be getting started on a personal project of mine - it involves finding businesses, storing useful information on these businesses, and allowing users to see/edit/update the information.

But I’m trying to think of how to start this whole thing off, back-end wise. For example, do I just start a blank database and grab a team of people to start populating it? Do I try to program some kind of crawler that will start finding businesses and logging information on them? Do I talk to a bunch of businesses manually and have them populate my database?

Other questions that come to mind are: how do I safeguard my data from faulty entries, and how do I implement a system to ensure the data is up-to-date? Also, when I fire up my app, how do I not slam the heck out of Google’s servers when I’m pulling data about businesses?

If anyone has advice on launching a project like this, can give info on the above questions, or would like to join me in this project, please let me know. I could definitely use a hand and it is going to be a really cool product.

Whew. That’s quite the endeavor. If this project is truly personal, with no intention of making any money at all, then I would look into using an already robust source of data, like Google or Yelp. You could do a sort of lazy-loading into the database, so your app only gets the relevant data for businesses that people actually search for, and stores that alongside whatever information users can supply. That would take care of validation of the most important entries and keep you from punishing their servers with requests.
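To make the lazy-loading idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name `business_cache`, the `fetch_remote` callback, and the one-day freshness window are all assumptions for illustration; `fetch_remote` stands in for whatever Google or Yelp call you end up using and is not a real API.

```python
import json
import sqlite3
import time


def make_cache(conn):
    # Hypothetical table: one row per business, keyed by the provider's ID.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS business_cache ("
        "external_id TEXT PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "fetched_at REAL NOT NULL)"
    )


def get_business(conn, external_id, fetch_remote, max_age_seconds=86400, now=None):
    """Return cached data if it is fresh enough, otherwise fetch and store it."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, fetched_at FROM business_cache WHERE external_id = ?",
        (external_id,),
    ).fetchone()
    if row is not None and now - row[1] < max_age_seconds:
        return json.loads(row[0])  # cache hit: no outside request is made

    # Cache miss or stale entry: call the (assumed) remote lookup once.
    data = fetch_remote(external_id)
    conn.execute(
        "INSERT OR REPLACE INTO business_cache (external_id, payload, fetched_at) "
        "VALUES (?, ?, ?)",
        (external_id, json.dumps(data), now),
    )
    conn.commit()
    return data
```

With this shape, only businesses that someone actually looks up ever reach your database, and repeat lookups inside the freshness window never touch the outside service.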

In order to develop your app, you’re going to need sample data, so you’re unlikely to be starting from a completely blank slate. Since I doubt you’re starting off with a round of venture capital funds, you’re going to want to automate as much as possible. In other words, the crawler sounds like an approach you should be investigating. I wouldn’t worry about overtaxing Google’s servers, they can handle it ;). There are already plenty of limits in the search API that prevent abuse.

I might be able to help if you have some more information. What sort of data are you looking for? Is it publicly available? Is there an API? I’m working on an app that scrapes box score data from nba.com, and it has taken a while to build. There are ways to automate your database population, but it depends on the data you want and where you find it (most public servers have an API that makes it quite easy to do things like that).

@jemagee

Yes, it is public data and it can be obtained from Google or Yelp. Essentially, I want to grab location and business listing info for specific types of businesses near the user, and then output it to them but with extra info attached. I may not necessarily need to store the business info itself in my DB because the site could use API calls for each request from the user to something like Google Maps… However, the data that my project will hold about the business is something that is not available on Google or Yelp (thus making my project unique) and that data will need to be stored on our DB.

So essentially, the code is going to have to get directory info, check our database for associated information from the given businesses returned by Google API, and then if that information exists, display it to the user. If not, allow the user (or business owner) to enter new information about the business.
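A rough sketch of that flow in Python, assuming the directory lookup has already returned a list of dicts that each carry a `place_id`. The table `business_extra`, its columns, and the `needs_input` flag are made-up names for illustration, not anything Google or Yelp provides.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table for the data that only this project holds.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS business_extra ("
        "place_id TEXT PRIMARY KEY, "
        "extra_info TEXT NOT NULL, "
        "updated_by TEXT NOT NULL)"
    )


def attach_extra_info(conn, listings):
    """Merge our stored info into each directory listing, flagging the gaps."""
    results = []
    for listing in listings:
        row = conn.execute(
            "SELECT extra_info FROM business_extra WHERE place_id = ?",
            (listing["place_id"],),
        ).fetchone()
        results.append(
            {
                **listing,
                "extra_info": row[0] if row else None,
                # True means the UI should offer a form to the user or owner.
                "needs_input": row is None,
            }
        )
    return results


def save_extra_info(conn, place_id, extra_info, updated_by):
    # Insert new info, or overwrite what an earlier user or owner entered.
    conn.execute(
        "INSERT OR REPLACE INTO business_extra (place_id, extra_info, updated_by) "
        "VALUES (?, ?, ?)",
        (place_id, extra_info, updated_by),
    )
    conn.commit()
```

The page would render `extra_info` where it exists and show an entry form wherever `needs_input` is true, with the form handler calling `save_extra_info`.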

Kind of like a review site, but it is not at all a review site. The questions I am currently facing are how best to go about getting basic data, how much I want to store on our database vs. make dynamic calls out for, and obviously I’ll have to write code that checks our database for information on a business by some unique criteria (perhaps address, or an ID number if there is one).
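One possible way to build that unique key, sketched in Python: use the provider's ID when a listing has one, and fall back to a normalized name plus address otherwise. The field names `place_id`, `name`, and `address` are assumptions about what the listing dict looks like, not a documented response format.

```python
import re


def _normalize(text):
    # Lowercase and collapse punctuation/whitespace so small formatting
    # differences ("St." vs "st") do not produce different keys.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def business_key(listing):
    """Build a lookup key: provider ID if present, else name plus address."""
    if listing.get("place_id"):
        return "id:" + listing["place_id"]
    name = _normalize(listing.get("name", ""))
    address = _normalize(listing.get("address", ""))
    return "addr:" + name + "|" + address
```

The name is included in the fallback because several businesses can share one street address. An ID from the data source is the safer choice whenever it is available, since addresses get reformatted over time.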

The good thing is that when the project is launched, the plan is to also have help from business owners, as it will highly benefit their business to give us updated data.

If you’re going to build an application like this, you first need to know your data source. You need to see how they format the data you want and decide how you’re going to parse it into your site.

Your first step should be writing out the business idea of the site. That will lead to the user stories, and the user stories will tell you what you need to store and what you don’t.
