What should I learn?

Hi, I recently graduated in sociology. I don’t want to spend more time at university; I’ve learned a lot, but now it’s time to grow on my own. I’ve done some HTML, CSS and a little JavaScript, but now I’m focused on Python.

My first objective is to automate some work that a lot of sociologists currently do manually: collecting data from multiple websites and then organizing it into a file that non-programmers can understand and work with (Excel or similar). My question is: which skills and technologies are essential to achieve this? I think SQL might be a good place to start, along with some Python libraries like urllib, a deeper knowledge of XML and JSON, and so on.

And if I may take advantage of your generosity, where can I learn about it? I’m finishing the course “Scientific Computing with Python”, and I’m planning to do the next course on freeCodeCamp. But maybe there are other good resources out there for learning more about this.

Thanks for your time. If anyone is interested in the same topic, I encourage you to contact me so we can help each other. I really hope this topic helps more people.

Maybe something like Automate the Boring Stuff with Python would be useful for you.

Awesome! I will definitely go that way.

I’m also looking to go deeper and write more sophisticated programs that do more than automate data collection. Sometimes the data is spread across multiple websites, and I need to collect it all and process it. But as a first step, Automate the Boring Stuff with Python will be a great path.

If your aim is to grab data and shove it into (for example) Excel, then yes to focusing on Python, and yes to SQL. I’d look at Postgres: get it installed and get familiar with it (the psql command line is good, and it has good free GUIs). Look at how you interact with it via Python (not difficult, the libraries are great, and you’re going to be scripting most of the time, which is much easier than writing applications). Python has a few really good libraries for outputting data as Excel, and it also has excellent web scraping libraries. JSON is extremely simple; it and XML are data interchange formats, and again there are very robust libraries for dealing with them. I’m not sure how much you’d need to look at XML (older datasets yes, but it’s fallen out of favour more recently), but it depends on your domain.
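
To give a rough idea of that pipeline, here is a minimal sketch using only the standard library. It makes a few assumptions that are mine, not requirements: `sqlite3` stands in for Postgres, CSV stands in for a real Excel file (Excel opens CSV fine), and the JSON data, table name and column names are all made up for illustration.

```python
import csv
import io
import json
import sqlite3

# Made-up sample data, standing in for JSON collected from a website.
raw_json = '[{"town": "Town A", "respondents": 120}, {"town": "Town B", "respondents": 450}]'
records = json.loads(raw_json)

# sqlite3 ships with Python; it is used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (town TEXT, respondents INTEGER)")
conn.executemany(
    "INSERT INTO survey (town, respondents) VALUES (:town, :respondents)",
    records,
)
conn.commit()

# Query with SQL, then write the result as CSV, which Excel can open.
rows = conn.execute(
    "SELECT town, respondents FROM survey ORDER BY respondents DESC"
).fetchall()

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open("survey.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["town", "respondents"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With Postgres the shape would be much the same, just with a Postgres driver in place of `sqlite3` and a connection to a running server.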

Thanks Dan! I will get used to Postgres. One question: what’s the difference between writing apps and writing scripts? Maybe it’s a stupid question.

No, not at all. With what you’re describing, you write a list of commands in a file on your computer in a programming language (Python in this case), then you tell the Python interpreter to execute those commands. So for example: import a DB library and a web scraping library, use them to connect to the database and scrape some information from example.com, then take that information and save it into the database. You are just automating a task you’d otherwise do by hand, and you can run the script again anytime you want.
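
As a very small sketch of what such a script could look like (assumptions: the page content is hard-coded so it runs offline, the standard library’s `html.parser` stands in for a dedicated scraping library, `sqlite3` stands in for whatever database you use, and the `HeadlineParser` class and `headlines` table are names I made up):

```python
import sqlite3
from html.parser import HTMLParser

# In a real script the HTML would come from the network, for example:
#   from urllib.request import urlopen
#   html = urlopen(url).read().decode("utf-8")
# A hard-coded sample page is used here so the sketch runs offline.
html = """
<html><body>
  <h2>First headline</h2>
  <p>Some text</p>
  <h2>Second headline</h2>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects the text inside every <h2> tag."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headlines.append(data.strip())


# Step 1: scrape the information out of the page.
parser = HeadlineParser()
parser.feed(html)

# Step 2: save it into the database (in memory here; pass a file path to keep it).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headlines (title TEXT)")
conn.executemany(
    "INSERT INTO headlines (title) VALUES (?)",
    [(title,) for title in parser.headlines],
)
conn.commit()

saved = [row[0] for row in conn.execute("SELECT title FROM headlines ORDER BY rowid")]
print(saved)
```

Save that in a file and run it with the Python interpreter whenever you want the task done again.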

An application, on the other hand: say you want to do the above, but it isn’t just a file on your computer that you have to run manually. It’s something you install, and it maybe has a GUI or some other interface where you give it URLs, and it scrapes those sites and then saves the data.
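
A first step in that direction, short of a full GUI, might be a command-line interface that accepts the URLs as arguments. This is only a sketch: `scrape` is a hypothetical placeholder that does no real downloading, and the `--output` option and its default file name are my own invention.

```python
import argparse


def scrape(url):
    """Hypothetical placeholder: a real version would download and parse the page."""
    return {"url": url, "status": "not implemented"}


def build_parser():
    parser = argparse.ArgumentParser(description="Scrape one or more URLs.")
    parser.add_argument("urls", nargs="+", help="pages to scrape")
    parser.add_argument("--output", default="results.csv", help="file to save results to")
    return parser


def main(argv=None):
    # With argv=None, argparse reads the arguments typed on the command line.
    args = build_parser().parse_args(argv)
    results = [scrape(url) for url in args.urls]
    print(f"Scraped {len(results)} page(s); would save to {args.output}")
    return results


# Arguments are passed explicitly here so the sketch runs as-is.
results = main(["https://example.com/a", "https://example.com/b", "--output", "out.csv"])
```

The point is that the user supplies the input through an interface instead of editing the file, which is roughly where a script starts turning into an application.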

Great explanation, thanks! One last question: would it be possible to evolve a script into an application? I mean, if I work on various scripts for different data types on various websites, could I eventually unite them in one app?
Again, thanks Dan. I appreciate your patience.

Sure. More likely you would start your application from scratch and then pilfer code that you wrote for your individual scripts.

Thanks! I will apply all your comments.

I’d say most of the apps I wrote for my previous job started out as simple scripts.

All right! I will start writing “simple” scripts, thanks for your answer.