With Sora, AI Video Gets Ready for its Close Up

Video is a Solvable Problem

Doug Shapiro
3 min read · Mar 4, 2024

Starting April 2025, all full posts, including archived posts, will be available on my Substack, The Mediator.

Still of Sora video, prompt: “Extreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm, depth of field, vivid colors, cinematic”

In December 2022, I wrote The Four Horsemen of the TV Apocalypse. It argued that GenAI, among other technologies, could prove disruptive to Hollywood by blurring the quality distinction between independent/creator content and professionally produced content “over the next 5–10 years, resulting in ‘infinite’ quality video content.” (For more about this topic, see How Will the “Disruption” of Hollywood Play Out?, AI Use Cases in Hollywood and Is GenAI a Sustaining or Disruptive Innovation in Hollywood?)

A year ago, this was an abstraction and a theory. With OpenAI’s release of Sora last week, all of a sudden it is a lot less theoretical. It also seems like “5–10 years” was an overestimation. Things are moving wicked fast.

In this post, I discuss why Sora is such a big deal and what’s at stake.

Tl;dr:

  • When discussing the potential for AI to disrupt Hollywood, it’s important to be clear about what both “disrupt” and “AI” mean.
  • By disrupt, I mean Clay Christensen’s low-end disruption, which provides a precise framework to explore how it might occur and the implications.
  • Not all “AI” is the same. Some AI tools are being used as sustaining innovations to improve the efficiency of existing workflows. AI video generators (or “X2V” models), like Runway Gen-2, Pika 1.0 and Sora, have far more disruptive potential because they represent an entirely new way to make video.
  • A lot’s at stake. Christensen didn’t specify the determinants of the speed and extent of disruption, but for Hollywood it could be fast and substantial.
  • That’s because creator content is already disrupting Hollywood from the low end (YouTube is the most popular streaming service on TVs in the U.S., and CoComelon and MrBeast are the most popular shows in the world in their genres), so X2V models may only throw gas on the fire; the sheer volume of creator content is overwhelming, so only an infinitesimally small percentage needs to be competitive with Hollywood to upend the supply-demand dynamic; and the technical hurdles to consumer adoption are non-existent.
  • While X2V models have improved dramatically in the past year and have the potential to be disruptive, today they are not. They lack temporal consistency, motion is often janky, the output is very short, they don’t capture human emotion, they don’t offer creators fine control and you can’t sync dialog with mouths.
  • Sora was such a shock to the system because it solved many of these problems in one fell swoop. My layman’s reading of the technical paper is that the key innovations are the combination of transformer and diffusion models (using video “patches”), compression and ChatGPT’s nuanced understanding of language.
  • Sora isn’t perfect (and, importantly, not yet commercially available). But its main lesson is that video is a solvable problem. Owing to the prevalence of open source research, composability and the apparently limitless benefits of additional scale (in datasets and compute), these models will only get better.
  • For Hollywood, it was rightly a wakeup call. For everyone in the value chain, it’s critical to understand these tools, embrace them and figure out what will still be scarce as quality becomes abundant.

Click here to continue reading the full post on my Substack, The Mediator.


Written by Doug Shapiro

Looking for the frontier. Writes The Mediator: (https://bit.ly/3R0z7vq). Site: dougshapiro.media. Ind. Consultant; Sr Advisor BCG; X: TWX; Wall Street analyst
