If you’re getting the data from an API, it’s not web scraping. Web scraping is when you extract the data from the HTML code of a website. If you get the data from an API the data is already formatted in a standard format, e.g. JSON or XML, and you can use it directly.
I’m just wondering how much work replacing Excel is going to be. If the formulas are simple, it shouldn’t be too hard. However it’s possible to get Excel to do really complex calculations on lots of data - that’s what it’s designed to do -, so if that’s your case, doing all those calculations yourself, accurately, might be difficult. I am unfamiliar with baseball so I have no idea how complex those calculations are likely to be.
I’ve spent a lot of time trying to answer this question. I find it very hard to answer because it’s not clear why you want Excel out of the picture. Are you trying to produce a file that you can upload to OneDrive, just without opening Excel, for your own needs, or are you going one step further and publishing it on the web? Or maybe you are already publishing the data on the web, updating the data manually, and you want to streamline the process of updating the data? Your technical choices will be different and so will the time you will need to invest into learning and producing your solution.
I see three main options, depending on different assumptions:
Option 1: a script that runs in the terminal and produces a file (no web development)
Assumptions: you want a program that either runs all the time on your computer or is launched automatically at a given interval. All it does is grab the data from the web API, performs calculations and produces a file that maybe gets uploaded to OneDrive or some other place. The point is to always have the calculated results available for the latest raw data, without needing to have Excel running all the time. You don’t care about publishing on the web.
Here what you need is a language that allows you to do calculations easily. I wouldn’t recommend Javascript for this as it has weird quirks when dealing with numbers, leading to bugs. Python is friendly to beginners and has the NumPy library which is good for calculations and widely used by scientists. Just make sure you choose between Python 2 and Python 3 and stick to it, as they are not compatible (see https://wiki.python.org/moin/Python2orPython3/).
Option 2: a purely front-end web app
Assumptions: The reason you want to get rid of Excel is to make your calculated data available to other people through a web page. The data needs to be always up to date. The calculations are not very time-consuming and you don’t mind making the calculation code public. You don’t mind if the data is only available to browsers running Javascript.
I suppose, since you asked here, that you plan to publish your results on a web page.
If at all possible I’d design this as a front-end project. What you talked about (a script running on the back-end) requires a lot of programming experience and it will be a long time before it’s up and running and reasonably solid.
Doing this as a front-end project means you write a web page that uses Javascript on the user’s computer to query the API, do the calculations and display the results. It might not be viable if the calculations are very long or require a lot of processing power. If it can all be done within 10 or even 20 seconds on the average computer, it might be worth it, provided you show the user that progress is happening while you calculate.
The main hurdle I can see here is implementing the calculations efficiently and bug-free. Note that Excel is provided in binary code specific to your processor and is optimised for fast calculations. Javascript would be at least 10 times slower and has weird quirks when you need precision in calculations. So you’d need to read up on that. But it’s the only language you can use in the browser.
Option 3: do the calculations on the back-end
Assumptions: You want to make your calculated data available on the web. You either can’t or won’t perform the calculations in the browser.
Now you’re stuck learning both front-end for the web page and back-end for the data.
If you have your own server machine, you can choose to implement your calculation script in any language, but you have to handle the DevOps responsibilities. If instead you host your web server on someone else’s machines (AWS, Digital Ocean, Heroku…), you will be limited by what they allow you to do. Check carefully the limits and the prices. As far as I can tell, the more freedom you have, the more expensive it gets. Also the price goes up the more data you have and the more pages you serve. It’s possible that different languages are priced differently, I’m not sure, but some languages will give you more choice than others in terms of hosting, e.g. PHP seems available everywhere but other languages are only supported by some providers.
You need to implement:
-
a script that will get your data, similar to the one in option 1. It can write to a file or to a database (a database is more work to maintain and you need to pay extra to host it, updating a file will cause access conflicts which you’ll need to deal with).
-
a web server program that will serve pages in response to a browser request. It will access the data from the file/database and produce HTML that gets sent back to the client.
Possibly the web server could include the code that gets the data from the API and run it asynchronously. I’m not sure what’s best.
Note that, if storing data in a file, there are likely to be annoying issues of conflicting access when the web server wants to read it while the update script is writing to it. If you have a database, you don’t have to deal with this but it’s yet another thing you’ll have to learn to use correctly.
There are other options too. For instance you could implement your own API and have your web page query it. But honestly that’s just complicating the matter. Going the server route is not an easy option and if you’re just starting web development, it’s really a long term goal.
If you hesitate between options 2 and 3, you can start with option 2 knowing that you can always implement your server with Node and reuse the calculation code.