One of my biggest “aha” moments when I was just starting out in machine learning came when I realized how many everyday phenomena can be turned into numbers, and thus made amenable to all the tools of mathematical analysis and optimization that I was familiar with. I come from a “hard science” background, and in my mind only “fundamental” physical quantities could, or even should, be represented numerically. All attempts to bring this approach to the “messiness” of the phenomena we deal with in our daily routines seemed naive at best, and utterly misguided at worst. But people did this sort of thing anyway, and most amazingly it really worked. One-hot encoding one’s employment status helped get a better credit score. Representing shopping lists as vectors led to more useful product recommendations.
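Here is a minimal sketch of what that kind of vectorization looks like in practice. The feature names, categories, and product catalogue below are made up purely for illustration, and scikit-learn 1.2+ is assumed for the `sparse_output` argument.

```python
# Turning "everyday" categorical data into vectors with scikit-learn.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Employment status of three hypothetical applicants.
employment = np.array([["employed"], ["self-employed"], ["unemployed"]])

encoder = OneHotEncoder(sparse_output=False)
vectors = encoder.fit_transform(employment)
print(vectors)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# A shopping list becomes a vector over the product catalogue in the same spirit.
catalogue = ["apples", "bread", "coffee", "milk"]
basket = {"bread", "milk"}
basket_vector = np.array([1.0 if item in basket else 0.0 for item in catalogue])
print(basket_vector)  # [0. 1. 0. 1.]
```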
This approach was even more astonishing when brought to the kinds of data we have the most immediate and visceral relationship with: text, images, sound. Models built on “vectorizing” those sorts of datasets gained remarkable power and robustness: gradually at first, then ever more rapidly with the advent of deep learning, and eventually explosively.
All of the recent advances in AI are “just” downstream consequences of what we get when we vectorize our data and unleash highly optimized algorithms and enormous computational power on it. According to the insights from people in the field, there is no end in sight to how much we can squeeze out of this approach: more data will get us bigger and better vectors, which we’ll train with ever more compute to get even better models. And these bigger models are not just incrementally better; they are meaningfully and qualitatively different.
IMHO, getting better vector representations of the blind spots of these models will also be key to taking them beyond their current limitations. For instance, we could find out where these models make factual errors, and then either gather more and better datasets for those areas, or look for better features associated with those areas. There is no reason to generate more data in areas where the outcome can be represented as a simple deterministic process: encoding a simple decision tree as a vector of features is straightforward and easy to implement, as in the sketch below. Even in the era of deep learning, feature engineering still has a lot of value.
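To make the decision-tree remark concrete, here is a minimal sketch of one common way to do it: fit a shallow tree, then one-hot encode the leaf each sample lands in, so the tree itself becomes a vector of engineered features for a downstream model. The dataset and the deterministic rule are synthetic and purely illustrative; scikit-learn 1.2+ is assumed for the `sparse_output` argument.

```python
# Encoding a simple decision tree as a feature vector.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))         # three numeric input features
y = (X[:, 0] > 0).astype(int)         # a simple deterministic rule

# Fit a shallow tree that captures the rule.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# tree.apply() returns, for each sample, the index of the leaf it falls into.
leaf_ids = tree.apply(X).reshape(-1, 1)

# One-hot encoding those leaf indices turns the tree into a vector of features
# that any downstream model can consume.
leaf_features = OneHotEncoder(sparse_output=False).fit_transform(leaf_ids)
print(leaf_features.shape)  # (200, number_of_leaves)
```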