“[Intelligent] autonomous agents are the natural endpoint of automation in general. In principle, an agent could be used to automate any other process. Once these agents become highly sophisticated and reliable, it is easy to imagine an exponential growth in automation across fields and industries.” - Bojan Tunguz, Ph.D.
That was the quote I gave in an interview for Matt Schlicht's blog back in April. Very shortly after the launch of ChatGPT and similar AI chatbot assistants at the end of last year, it became obvious that the killer feature for this technology would be getting various domain- and task-specific chatbots to interact with each other. Soon thereafter, a trend of "self-prompting" or "auto-prompting" took off: chaining and combining outputs from several apps and APIs to create even more combinatorially powerful and useful applications and systems. Several of those efforts were released as open-source repos on GitHub (JARVIS, Auto-GPT, babyagi), and for a while they led that platform in popularity.
For a while it seemed that AI agents were about to take off in a massive way and disrupt many professions and segments of the knowledge economy. Unfortunately (or fortunately, depending on whom you ask), we are now close to the end of 2023, and the agent revolution never quite materialized. In hindsight, the initial expectations were overly optimistic. In large part, the agents were stymied by the same issues that hinder all LLM-based AI systems: hallucinations, knowledge cutoffs, lack of access to external resources (the web in particular), lack of context awareness, etc.
An even bigger issue is that people, especially highly technical people, often underestimate how much of every job and task is consumed by nontechnical work. That work is often hard to even describe and articulate, let alone digitize and automate. This was the gist of one of my recent tweets:
Nonetheless, the promise of automating vast areas of work with digital agents is so alluring, and the benefits so outstanding, that we will eventually solve many (if not most) of the issues still standing in our way, probably sooner rather than later. It's very likely that we'll get more advanced agents even before the next major generation of advanced Large Language Models arrives. For instance, GPT-4 is reportedly already a Mixture of Experts, meaning that several task-specific models sit behind it, each trained or fine-tuned on a particular subdomain or task (the "experts"). We could probably get an even more useful system with an even higher granularity of experts, many of which don't necessarily need to be ML-based at all. For instance, you don't need an LLM to do basic math, and even many areas of advanced math have been largely handled by automated tools for at least a generation (Wolfram's Mathematica comes to mind). If my reading is correct, learning how to accurately and effectively offload tasks to non-AI processes will be key.
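To make the offloading idea concrete, here is a minimal sketch of a dispatcher that routes basic arithmetic to an exact, non-AI evaluator and only falls back to a language model for everything else. This is purely illustrative: `route_query` and `eval_arithmetic` are hypothetical names, not part of any library, and the LLM branch is stubbed out.

```python
import ast
import operator

# Map AST operator nodes to exact arithmetic functions.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_arithmetic(expr: str) -> float:
    """Safely evaluate a basic arithmetic expression without any LLM."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("not basic arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def route_query(query: str) -> str:
    """Send arithmetic to the exact tool; everything else to the LLM (stubbed)."""
    try:
        return f"tool: {eval_arithmetic(query)}"
    except (ValueError, SyntaxError):
        return "llm: (would be forwarded to a language model)"

print(route_query("17 * 3 + 2"))            # answered exactly by the tool
print(route_query("Summarize this article"))  # falls through to the LLM stub
```

The point of the design is that the expensive, fallible model is the fallback, not the front door: anything a deterministic process can answer never reaches it.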
Finally, I wanted to share another open-source automation tool that I recently came across. AutoGen is a Python library that can automate interactions among many different LLM applications. It's an offshoot of the FLAML project.
From the project description: “AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.”
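The "agents conversing with each other" pattern from that description can be sketched in plain Python. To keep this runnable without API keys, the sketch below is not AutoGen's actual API: the `ConversableAgent` class, the `initiate_chat` function, and the canned replies are illustrative stand-ins for LLM-backed agents.

```python
class ConversableAgent:
    """Toy agent: a name plus a reply function standing in for an LLM,
    a human, or a tool."""
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn

    def generate_reply(self, message):
        return self.reply_fn(message)

def initiate_chat(sender, receiver, message, max_turns=4):
    """Alternate messages between two agents until one says TERMINATE
    or the turn budget runs out. Returns the transcript."""
    transcript = [(sender.name, message)]
    for _ in range(max_turns):
        reply = receiver.generate_reply(message)
        transcript.append((receiver.name, reply))
        if "TERMINATE" in reply:
            break
        # Swap roles so the reply becomes the next prompt.
        sender, receiver, message = receiver, sender, reply
    return transcript

# Canned stand-in behaviors: a "coder" proposes, a "reviewer" approves.
coder = ConversableAgent("coder", lambda m: "def add(a, b): return a + b")
reviewer = ConversableAgent("reviewer", lambda m: "Looks correct. TERMINATE")

for name, msg in initiate_chat(reviewer, coder, "Write an add function."):
    print(f"{name}: {msg}")
```

In the real library, the reply functions would be backed by LLM calls, human input, or code execution, which is exactly the mix of "LLMs, human inputs, and tools" the project description mentions.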
It looks extremely promising and interesting, and I intend to play with it in the upcoming days.
Thanks for the writeup. Agree that ppl (esp. with tech backgrounds) underestimate how much nontechnical work is involved in day-to-day jobs/tasks.
Do you use any agents regularly?