22 Jul 2019
I recently completed a four-month immersive bootcamp in Data Science and Machine Learning with no prior background in computer science or coding. Below are four lessons I learned.
The learning environment of a bootcamp is designed to be practical. Students presented with a challenge would experiment with different machine learning algorithms and tinker with parameters to improve results. Whilst my classmates were coding by hook or by crook, I was reading papers, adamant that I wanted to better understand machine learning before trying my hand at it. This was a flawed strategy.
Domingos (2012) states that "...developing successful machine learning applications requires a substantial amount of black art that is hard to find in textbooks." Am I discouraging reading? No. You will be expected to read widely to remain relevant in this fast-paced sector. However, I found reading prompted by challenges encountered during practical experimentation to be infinitely more valuable than reading in pursuit of a theoretical understanding. Read enough to ensure that you can motivate your approach and explain your results, but don't get lost in the theory before gaining practical experience.
Machine learning algorithms can perform numerical gymnastics far beyond the capabilities of our human brains. They can do amazing things with data, but they cannot distinguish between good data and bad data: they simply work with what they are given. Algorithms trained on biased or non-representative data won't yield the desired results. This makes addressing the shortcomings in data incredibly important.
Whilst this might seem obvious to researchers, it is worth noting that datasets used in machine learning are enormous (convolutional networks used in image recognition, for example, are trained on large public image repositories such as ImageNet containing 1.3 million labelled images). Data scientists also work with data from different domains, sometimes requiring collaboration with domain experts to glean further insights into the appropriateness of the data for a given application. A 2016 survey found that data scientists reported spending about 80% of their time on preparing and managing data for analysis and only 4% on refining algorithms.
But what about the growing field of unsupervised machine learning, which uses raw data? Usama et al. (2017) state that some unsupervised algorithms operate as a black box, making them unsuitable for applications in which interpretability is important. Our inability to defend the process informing the outcome of unsupervised machine learning algorithms could raise legal and ethical questions, especially when such an algorithm is used as the sole determinant in decision-making. This may very well mean that there will always be demand for supervised machine learning approaches requiring processed and labelled data.
Coding is a high-stakes game where you don't get rewarded for getting something 99% correct: things either work or they don't. When your code does not work, you will receive error messages. Sometimes it is because a comma is missing. Other times it is because your logic is fundamentally flawed. Error messages range from explicitly telling you what to fix to cryptic statements about what may be wrong. Out of sheer frustration, I started googling my error messages. The ease with which I could find solutions online made me realize how common error messages are. I was not an idiot, I was just a beginner!
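To make the range of failures concrete, here is a minimal Python sketch. The helper `describe_failure` and the three snippets are hypothetical, made up purely for illustration, and the exact wording of each message depends on your Python version. It shows a missing comma, a type mistake, and a logic flaw that (as can happen) produces no error message at all, only a wrong answer.

```python
def describe_failure(source: str) -> str:
    """Run a snippet and return its error type and message, or 'ok' if none is raised."""
    try:
        # compile() catches syntax problems; exec() catches problems at run time.
        exec(compile(source, "<snippet>", "exec"), {})
    except Exception as err:
        return f"{type(err).__name__}: {err}"
    return "ok"


# A missing comma between list items: Python refuses to even parse the snippet.
missing_comma = "values = [1 2, 3]"

# A type mistake: the snippet parses fine and fails only when the line runs.
wrong_type = "total = 'items: ' + 3"

# A logic flaw: no error is raised, but the result is 4.0 rather than
# the intended mean of 2 and 4, which is 3.0 (the parentheses are missing).
logic_flaw = "mean = 2 + 4 / 2"

for snippet in (missing_comma, wrong_type, logic_flaw):
    print(f"{snippet!r} -> {describe_failure(snippet)}")
```

The first two messages are exactly the kind of text worth pasting into a search engine; the third case is why a fresh pair of eyes still matters.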
Online resources are plentiful, and the online community encourages collaboration. For instance, all user contributions on Stack Overflow (a popular question and answer site for programmers) are licensed under Creative Commons Attribution-ShareAlike. Bootcamps also create environments that foster learning through helping. If you've been staring at your code for hours, it is often the fresh pair of eyes of a classmate that can spot a mistake you've been overlooking.
The World Economic Forum's Future of Jobs Report (2018) estimates that 54% of all employees will require significant reskilling and upskilling by 2022. Although I have been required to upskill numerous times throughout my career as an urban planner, the bootcamp challenged me to reskill in a new sector where my previous education and experience were not of immediate value.
Though humbling at times, I found venturing far outside of my intellectual comfort zone a rewarding experience. Whilst my reskilling exercise was voluntary, I believe that it has served me well in preparing for any future reskilling that may be necessary to remain relevant in a fast-changing job market.