Starting new project, need ideas

Longtime lurker, sorry if this is the wrong subforum.

There is a dataset stored in a ~20 MB CSV file that gets updated daily (and grows). I have no control over this, but it's freely available.

I have an idea for a project that uses this data. This will be a milestone project for me, learning-wise.
I've never made anything like this, so I could use a few pointers.

My plan is to somehow automatically pull this data into a MongoDB database, maybe once a week.

Then make a REST API (or maybe GraphQL) with Express, and a frontend with React.

OR

Should the React frontend get this data directly from the CSV, fetching a 20 MB file?

Any advice on how to go about this?

Welcome to the forums as a poster @beinnor, we don’t bite :smiley:

So 20 MB of data in a CSV file is a good amount of data. Parsing that much data (regardless of format) takes a decent amount of computation time on top of the network latency to download it, so you shouldn't load that much data on the client side if you want a good user experience.

Say, for example, you do zero calculations besides getting the data and rendering it as a list in the UI: it would probably take a few moments to download the file, then several seconds to parse it and render the UI. After that you'd have thousands of DOM nodes, which will make React pretty slow from there on out. Just to be clear, when I say several seconds I mean several seconds where the app is basically frozen, because JS can't do anything else while it's parsing the data.

There are ways around the DOM problem, like virtual scrolling, where the DOM is only as large as the visible UI, but the data parsing will always be slow. So parsing that much data on the client side is more or less out of the question. You could use something like web workers to perform the parsing and calculations "off to the side" of the main JS thread, but I'd only take this route if you don't want or need a DB.
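If you did go the worker route, a minimal sketch might look like this. The `/data.csv` URL is made up, and the comma splitting is naive (it breaks on quoted fields, so a real CSV parser library would be safer):

```js
// worker.js: fetch and parse the CSV off the main thread
// (naive comma splitting, fine only if fields never contain
// commas or quoted newlines)
self.onmessage = async (event) => {
  const text = await (await fetch(event.data.url)).text();
  const [header, ...lines] = text.trim().split("\n");
  const columns = header.split(",");
  const rows = lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(columns.map((col, i) => [col, values[i]]));
  });
  self.postMessage(rows); // the UI thread stays responsive meanwhile
};
```

```js
// main.js: the UI thread just starts the worker and waits
const worker = new Worker("worker.js");
worker.onmessage = (event) => {
  console.log(`parsed ${event.data.length} rows without freezing the page`);
};
worker.postMessage({ url: "/data.csv" });
```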

So, how you handle this data really should depend on what you plan on showing in the UI. If we go back to the data-table example, reading a few rows from the DB at a time would increase performance dramatically, because the pagination would be done at the database level (which is designed to be fast and to handle this amount of data), so you only send and show a few rows on the client. This might require a lot more work depending on what features you want to support: things like searching and filtering will need controls in the UI, whose parameters are then sent to the backend and turned into database queries.
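To make that concrete, here's a rough sketch of a paginated endpoint with Express and the official `mongodb` driver. The database name, collection name, and page size are all placeholders, not anything from your actual dataset:

```js
// server.js: a hypothetical paginated endpoint; "mydb" and "records"
// are placeholders for whatever the real dataset looks like
const express = require("express");
const { MongoClient } = require("mongodb");

const app = express();
const client = new MongoClient("mongodb://localhost:27017");

app.get("/api/records", async (req, res) => {
  const page = Math.max(parseInt(req.query.page, 10) || 1, 1);
  const pageSize = 25;
  const records = await client
    .db("mydb")
    .collection("records")
    .find()
    .skip((page - 1) * pageSize) // pagination happens in the database...
    .limit(pageSize)             // ...so only 25 rows ever reach the client
    .toArray();
  res.json(records);
});

client.connect().then(() => app.listen(3000));
```

The frontend then requests `/api/records?page=3` and only ever receives 25 rows, no matter how big the collection grows.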

Regardless, getting the data from the CSV into the database weekly should be as automated as possible, so you can perform calculations, queries, and optimizations on the CSV data using a real DB. And 20 MB of data is basically nothing for a DB :wink:
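For the weekly import itself, something like this sketch could work. The CSV URL is made up, and I'm assuming the `node-cron` and `csv-parse` packages plus Node 18+ for the built-in `fetch`:

```js
// ingest.js: sketch of a weekly CSV-to-MongoDB import
const cron = require("node-cron");
const { parse } = require("csv-parse/sync");
const { MongoClient } = require("mongodb");

const CSV_URL = "https://example.com/dataset.csv"; // placeholder URL

async function importCsv() {
  const text = await (await fetch(CSV_URL)).text(); // Node 18+ global fetch
  const rows = parse(text, { columns: true }); // objects keyed by header row
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const collection = client.db("mydb").collection("records");
  await collection.deleteMany({}); // simplest approach: replace the whole set
  await collection.insertMany(rows);
  await client.close();
  console.log(`imported ${rows.length} rows`);
}

// "0 3 * * 0" = every Sunday at 03:00
cron.schedule("0 3 * * 0", () => importCsv().catch(console.error));
```

Dropping and re-inserting the whole collection is the simplest thing at 20 MB; if the dataset only ever appends rows, upserting on some unique column would be cheaper.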

This sounds like a fun and interesting project with a lot of different approaches available, but again, it all depends on the requirements of the project itself. No need to go overboard and make more work if you need something simple, but you also don't want to make it too simple and take the wrong approach initially :smiley:

Thank you bradtaniguchi. I'll definitely use a DB on the backend side :).

Though I'll have to postpone this project for about a week. I applied for a job on Friday, and they sent me a take-home test project :grin: