Sora and my brief take on AI-generated video content
AI video is becoming really impressive, but many challenges remain
This past week OpenAI announced Sora, a new text-to-video model that, as is often the case with OpenAI’s products, impressed the world with its abilities. And make no mistake: many of the showcased videos were indeed just stunning. They far surpass anything that I’ve seen thus far. For the past half year or so I’ve been dabbling in AI-generated video, primarily using Runway ML and Pika Labs. The short few-second clips that those tools generate now seem puny and extremely primitive in comparison to Sora. However, the real test of Sora will be when it becomes widely available to the public. Almost any tool can be made to look cool and incredible in a carefully choreographed PR announcement.
From the Sora press release we don’t know much about how the model was trained, and almost nothing about the datasets that were used for training. Based on several tweets from experts in the field, it seems that the actual model architecture is probably very simple, as we’ve learned over the past few years that simplicity is “all you need” and that it scales very well to huge and heterogeneous datasets. Beyond that, it has been made abundantly clear that in this case, as with many other recent cutting-edge AI models, a better model was in fact a direct consequence of vastly more compute. Thus we have more evidence that scaling laws are, indeed, still going strong, with no end in sight.
As already mentioned above, my first impression of Sora is that it looks absolutely stunning. It definitely moves the needle forward in a major way for text-to-video generation. Nonetheless, a closer look at the videos reveals that there are still many issues with them, including all the same ones that have been plaguing image generation for years. I'd be happy to sacrifice almost all of that video quality for an AI video generation tool that:
1. Doesn't hallucinate
2. Is generally free of weird artifacts
3. Gives me more fine-grained control
4. Allows for a consistent and repeatable output and style in general
5. Allows for compositionality and chaining of video clips
Some of these issues can be handled well enough already, and have been more or less solved for certain image generation systems. However, as has been pointed out by several people already, an ideal system would be a midpoint between the “classic” CGI tools and these new AI systems: something that doesn’t require a lot of technical skill or years and years of training and expertise, and yet gives you enough fine control and reliability over your output to make it a professional tool. I believe that we are still a few years away from realizing that desideratum.
And then there is the whole question of whether these tools will “replace Hollywood”. And here I am actually fairly skeptical, at least for the foreseeable future. And the reason is simple: these sorts of predictions have been around for decades, ever since Jurassic Park stunned the world with its realistic-looking CGI dinosaurs. And sure enough, the use of CGI has grown exponentially over this time frame, but has Hollywood been disrupted by it? Not in the least. It has been far more disrupted by the advent of the streaming services. And although many, if not most, of the highest-grossing movies over the past few decades have relied heavily on CGI, I myself find those visually overstimulating spectacles extremely vacuous and boring. I can’t even bring myself to watch any of the latest ones, even when I can get them for “free” when they come to the streaming services. And I consider myself an absolute Sci-Fi addict. Over the years I’ve gotten to the point where I appreciate a good story, good storytelling, high-quality acting, superb cinematography, and similar considerations far more than yet another unrelatable CGI character. And I feel this will be the case even more so with uncanny AI-generated content.
Sora should spin out as an entirely different company. It should specialize in video search and also offer a YouTube competitor.
Unfortunately, you can't expect OpenAI to think very strategically. Sam Altman is too busy playing the venture capital game of thrones.
I wouldn’t be so quick to count disruption of Hollywood out. Hollywood won’t disappear but movie productions will require many fewer people in the future. The moviemaking business will change, significantly, and possibly for the better, as costs are wrung out of the system.