The Four Horsemen of the TV Apocalypse
In the grand tradition of windowing in media, starting December 2023, all my writing will be posted first on my Substack, The Mediator, and posted on Medium one week later.
For a revised and shorter version of this essay, please see Forget Peak TV, Here Comes Infinite TV.
“Everybody’s a dreamer and everybody’s a star
And everybody’s in movies, it doesn’t matter who you are” — The Kinks, Celluloid Heroes
Apologies, this is a long one. I’ll get right to the point.
- There is a lot of hand-wringing in the entertainment business right now as it grapples with the decline of the traditional pay TV business and the realization that streaming video will not be as profitable as hoped. Disney CEO Bob Iger recently called it “an age of great anxiety.”
- Every corner of the TV and film value chain is affected: talent, agencies, advertisers, networks, stations, distributors, theaters, you name it.
- One notable thing about all this angst: it has been caused primarily by disruption of the way TV and films are distributed and, to a lesser extent, changes in how they are consumed. The way they are created, however, has not changed much.
- In fact, while the Internet caused the costs to distribute content to plummet over the last decade or so, the cost to produce TV series and films has risen dramatically. It’s expensive and risky and consequently is still dominated by only a handful of big companies.
- So, what happens if the barriers to produce “high quality” content fall too?
- In this essay, I discuss four trends that, collectively, could disrupt video content creation over the next 5-10 years. Several are early, but they are not theoretical. They are all happening now.
- The explosion of short form may change consumers’ definition of quality, lowering the bar for production values; the hand-in-glove technologies virtual production and AI are on a path to dramatically reduce the labor and time required — and therefore the costs — to produce high production value content; and web3 has the potential to lessen the risk of financing production.
- I am not making a value judgment about these trends, especially AI, which is deeply unsettling to many. They are progressing whether one thinks they are good or bad.
- It is difficult to foresee all the implications now, but they could have a more profound effect on the entertainment business in the next decade than what occurred over the prior one.
Why TV (and Film) Has Been Disrupted: Changes in Distribution and Consumption, Not Creation
Before we get into what happens next, it’s helpful to understand how we got here. TV has been disrupted over the past decade. “Disruption” is an overused word, but I mean Clayton Christensen’s precise definition.
Disruption Follows a Consistent Pattern
According to Christensen, here’s how “low-end” disruption works:
- A new entrant targets a market, usually one that overserves a significant portion of its customers with a product that overshoots their needs and costs too much. Enabled by a new technology and equipped with a new business model (which are individually or collectively a “disruptive innovation”), it offers an inferior product at a lower cost.
- The incumbents dismiss the threat or compete the only way they know how — by making their product even better along traditional performance measure(s) (referred to as a “sustaining innovation”) — and cede the low end of the market.
- The upstart’s product continues to improve along the traditional measures of performance, picking off a progressively larger percentage of the incumbents’ customer base.
- The upstart also often introduces a new set of features, which, if successful, changes consumers’ definition of quality (i.e., introduces new measures of performance) and the bases of competition.
- After watching their core business erode, the incumbents finally throw in the towel and try to mimic the insurgent’s business model, if they can. Usually, they are ill-suited to do so and it’s too late anyway.
The speed and extent of this process may vary, but the pattern plays out over and over again, from mini-mills and integrated steel mills; to angioplasty and stents; to digital and film photography; to Airbnb and hotels; to Waze and standalone GPS devices; to Uber and car services; to mobile phones and PCs; to Netflix and pay TV.
TV is a Textbook Example
In the Appendix, I walk through a brief recent history of the TV business and show how well it fits this pattern.
The short version is that when Netflix launched streaming in 2007, pay TV was overserving its customers with too many networks at too high a price; Netflix offered an inferior product that was, at least initially, not just cheap, but free; the incumbents not only dismissed the threat, they empowered Netflix by licensing it their content; they also tried “sustaining innovations” that failed, like TV Everywhere and increasing the quality and quantity of original programming; Netflix got progressively better (adding more content genres, original programming, etc.), picking off an ever-larger proportion of traditional pay TV subs; Netflix also introduced new features that changed consumers’ expectations for a subscription video service (like no ads, everything on demand and a clean, intuitive UI); and, after years of suffering both viewership and subscriber declines, eventually the big media companies all reluctantly launched their own streaming services, which seem likely to be neither as large nor as profitable as the eroding traditional pay TV business that they are trying to replace. And that brings us to the sticky situation in which the industry finds itself today.
The Enabling Change: The Falling Cost of Distribution
Fun history lesson or walk down memory lane, depending on your vantage point. But I glossed over something: what was the “enabling technology” that precipitated all this change? The Internet, of course. Just as occurred in countless other industries, the combination of digitization and networking unbundled information from infrastructure. This dramatically reduced the cost to distribute information goods, including video content. While Comcast needed to spend billions on hybrid fiber-coaxial infrastructure and DirecTV had to build uplink facilities and launch satellites, Netflix didn’t need to do any of that.
What Followed: Changes in Consumer Behavior
Lower barriers to entry in TV distribution were the trigger. Changes in consumption behavior followed.
As mentioned above — and consistent with Christensen’s framework — once entry barriers fell, Netflix didn’t only offer the exact same product at a lower price, it introduced new features. It offered a far more intuitive user interface than cable or satellite, the ability to watch on any connected device, no ads and all content on demand, among other advanced features. It also made it much easier to sign up and disconnect service than was possible in traditional pay TV.
These new features changed consumer expectations and behaviors. Consumers started to watch most programming on demand, with the exception of sports and news, and “binge” series; expect a simple, easily navigable UI with personalized recommendations; demand high-quality streams with minimal load times and no buffering; actively seek out content without ads; watch on new form factors, like mobile phones and tablets; download content on their mobile devices; expect to pick up on any show or movie where they left off, even on a different device; and churn much more frequently on and off streaming services to manage their spending.
Figure 1. The TV and Film Value Chain
No Corner of the Long Form Supply Chain is Unaffected
An important observation is that the entire TV and film value chain has been affected even though it was primarily distribution that was disrupted.
Figure 1 shows the value chain divided into content creation, packaging, distribution and consumption. No corner has gone untouched. The decline in pay TV subscribers obviously reduced subscription revenue for the distributors (Comcast, Charter, DirecTV, Verizon, etc.). But the ripple effects are everywhere. Lower subscriber and viewership levels also reduced affiliate fees and advertising revenue, respectively, that are remitted to network groups (Disney, Warner Bros. Discovery, NBCU, Paramount, etc.). These network groups were also compelled to roll out their own direct-to-consumer services, incurring massive costs, and forgo licensing revenue. TV stations suffered as viewership shifted to streaming options, hurting ratings for their local programming, and powerful broadcast networks responded to the growing pressure on their businesses by shifting from paying affiliated stations to requiring them to pay. New deep-pocketed competitors entered (like Apple and Amazon) and new digital distributors/aggregators (like Roku) emerged. Retailers (like Walmart and Amazon) are selling far fewer DVDs (and DVD players) than they once did. Movie theaters are struggling to get consumers back in theaters post-pandemic, largely because of the vast at-home streaming options. Advertisers and agencies are left grappling with less TV ad inventory to buy. Studios were forced to contend with a new paradigm for licensing content, as most streamers seek to license all rights in all windows (and, in the case of Netflix, all territories) for one “cost-plus” fee and studios no longer retain backend rights. International stations and networks have less U.S. programming to license as media conglomerates seek to retain rights for their own streaming services. Talent and agents have also been forced to confront the absence of profit participations.
The sports leagues and teams find themselves negotiating with new digital distributors and contemplating the implications of a shrinking traditional TV ecosystem on the value of their rights, the reach of their franchises and the future of their own networks (such as RSNs).
What if Barriers to Create Quality Long Form Content Fall Too?
So, to repeat: the entire TV and movie business is in disarray because of disruption to the distribution model and consequent changes in consumer behavior. What hasn’t changed nearly as much is the way TV and movies are created.
For instance, despite upending nearly every aspect of the TV business, Netflix still makes TV shows and movies the same way as everyone else: reading scripts, negotiating with CAA, WME and UTA, shooting in far-flung locales and paying for craft services. Netflix may claim that it uses its vast first-party consumer data to inform content creation, but there is no evidence its hit rate is better than anyone else’s.
In fact, even with the entrance of Netflix and 800-lb gorillas like Amazon and Apple, TV and film production spending is still dominated by just a handful of companies. Figure 2 shows Morgan Stanley’s estimates for 2022 content spend from the largest spenders. Although the estimates are from May and may be somewhat dated, the point is that this list looks little changed from five or even 10 years ago, other than the addition of Amazon and Netflix and a couple of mergers. Disney, Comcast (NBCU), Warner Bros. Discovery and Paramount are still at the top of the list. Figure 3 shows my estimate for share of U.S. TV revenue last year, both traditional networks and streaming (SVOD, AVOD and FAST), which underscores the point.
Figure 2. Seven Companies Still Dominate Global Video Content Spend
Source: Morgan Stanley Technology, Media and Telecom Teach In, May 2022
Figure 3. Estimated Total TV Revenue Last Year Makes the Same Point
Source: Company reports, author estimates
This handful of companies is still dominant because while the costs to distribute content have fallen, the costs to create high production value content have gone up. The primary costs are talent, both behind and in front of the camera, special/visual effects and marketing.
With so many companies vying to make great content, those costs have increased, both because of increased bidding to attract a finite pool of talent and an arms race to put ever-higher quality on screen. Ten years ago, production costs for the average hour-long cable drama were about $3–4 million. Today it is common to see dramas exceed $15 million per episode (Figure 4). Any guess how many people it takes to make a big, special/visual effects-laden movie? As shown in this great analysis by Stephen Follows of IMDb credits from 2000–2018, Avengers: Infinity War had the most, almost 4,500 people (Figure 5).
Figure 4. Many TV Series Now Exceed $15 million Per Episode in Production Costs
Figure 5. The Most Labor Intensive Movies Employ Thousands of People
Source: Stephen Follows
Producing content is also very risky, because returns are highly variable and almost all expenses are front-loaded. As a result, only large companies with strong balance sheets and a large portfolio of projects are able to manage this risk. The high cost and high risk of content create a moat.
All of this raises an obvious question: what happens if the barriers to entry to create TV and movies start to fall too? What happens to the big content producers and, for an industry already struggling to adjust to changes in distribution, what are the ripple effects?
Below, I discuss four trends that could dramatically change the barriers to entry to make high quality content. These are not concepts or theories; they are all happening today. Individually, none of them may seem very transformative (except AI, especially generative AI, which is startling and scary). And some are earlier than others. But, as you read through them, think about how they stack, and what effect they may have collectively.
Also, think about how they will improve. My former colleague Bill Gurley liked to say that “backhoes don’t obey Moore’s Law.” His broader point was that some technologies are subject to rapidly declining cost-per-unit-of-performance curves and some aren’t (like those gated by the limitations of physics, the need to deploy labor-intensive infrastructure or the need for multiple parties with conflicting interests to adopt common standards). For the most part, the technologies I discuss below are gated by the sophistication of algorithms, the size of datasets and compute power — all things that have the potential to progress not just fast, but super-linearly.
The effects of these trends could be more profound than what’s happened over the prior decade. I discuss them in order of immediacy.
TikTok, YouTube and the Changing Consumer Definition of Content Quality
Let’s start with the most present threat: short form.
The absence of any discussion of short form (by which I mean the videos on TikTok, YouTube, Instagram and Facebook Reels, etc.) above is conspicuous. (I’m treating short form as synonymous with “user-generated content” and “social video.”) For while it may be accurate that the cost to make TV series and movies (what I’ll call “long form,” for convenience) has climbed over the past decade, short form video content creation is more accessible than ever. Everyone is carrying a near-cinematic quality video camera in their pockets and has access to high-quality, often free, editing tools. TikTok, in particular, makes it extraordinarily easy to create and remix content.
It’s also clearly very big. YouTube has 2.6 billion users globally and TikTok is expected to reach 1.8 billion by the end of the year. YouTube users upload more than 500 hours of video per minute (or 30,000 hours per hour), equivalent to all the content on Netflix in the U.S. By the end of last year, kids 4–18 were spending an estimated 91 minutes per day on TikTok and 56 minutes per day on YouTube globally. The average adult user in the U.S. of each of TikTok and YouTube currently spends more than 45 minutes per day (Figure 6).
Globally, YouTube generated $29 billion in ad revenue last year, and is tracking to hit $30 billion this year. eMarketer estimates TikTok will hit about $12 billion in revenue this year. Assuming both generate about half their ad revenue in the U.S., that would be over $20 billion combined, or almost equivalent to half of the entire traditional national TV advertising market in the U.S.
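The back-of-envelope figures in the two paragraphs above can be checked with a few lines of Python. This is only a sketch: the variable names are invented for the example, and the 50% U.S. share is the same assumption stated above, not a reported figure.

```python
# Back-of-envelope check of the short form figures cited above.
# All inputs come from the text; the 50% U.S. share is an assumption.

upload_hours_per_minute = 500      # YouTube uploads, hours of video per minute
upload_hours_per_hour = upload_hours_per_minute * 60

youtube_revenue_bn = 30            # global, tracking for this year ($ billions)
tiktok_revenue_bn = 12             # global, eMarketer estimate ($ billions)
us_share = 0.5                     # assumed: about half of revenue is U.S.

us_combined_bn = (youtube_revenue_bn + tiktok_revenue_bn) * us_share

print(f"Hours uploaded per hour: {upload_hours_per_hour:,}")
print(f"Estimated combined U.S. ad revenue: ${us_combined_bn:.0f} billion")
```

Changing `us_share` shows how sensitive the "over $20 billion" comparison is to that single assumption.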
Figure 6. Adult Users Spend >45 Min/Day on TikTok and YouTube
Note: ages 18+; * Includes YouTube TV. Source: eMarketer, April 2022
Short form platforms also have a very different, and arguably much lower risk, business model. According to Ampere Analysis, global video content spend by Comcast, Disney, et al., is $230 billion this year. By contrast, YouTube and TikTok compete with practically infinite, algorithmically-curated content that has virtually no cost. Sure, YouTube remits 55% of advertising revenue to creators, so it pays out billions of dollars per year, but this is all success-based and carries no risk. TikTok has a small creator fund that might as well be $0.
Yet, most big media companies aren’t very focused on short form because it isn’t clear how much it impinges on traditional TV and movies. Short form is thought of as a “different thing” than TV, with a different use case, initiated when people don’t want (or intend) to commit to a 30 minute-or-longer show (like when procrastinating, on the train, waiting in line or just in need of a quick dopamine hit). This is evident when examining usage data. Consulting firm Activate estimates that TV viewing (defined as traditional plus streaming of professionally-produced, long form content) by adults 18+ hasn’t changed much over the last few years despite the growth of short form (what it refers to as “social video”). It also forecasts long form viewing won’t change much in the next few even as short form continues to grow (Figures 7 and 8).
Figure 7. Viewing of Long Form Video Has Remained Flat…
Notes: 1. Figures do not sum due to rounding. 2. “Digital video” is defined as video watched on a mobile phone, tablet, desktop/laptop, or Connected TV. Connected TVs are TV sets that can connect to the internet through built-in internet capabilities (i.e. Smart TVs) or through another device such as a streaming device (e.g. Amazon Fire TV, Apple TV, Google Chromecast, Roku), game console, or Blu-ray player. Does not include social video. 3. “Television” is defined as traditional live and time shifted (e.g. DVR) television viewing. Sources: Activate analysis, eMarketer, GWI, Nielsen, Pew Research Center, U.S. Bureau of Labor Statistics
Figure 8. …Even as Short Form Continues to Grow
Sources: Activate analysis, eMarketer, GWI, Nielsen, Pew Research Center, U.S. Bureau of Labor Statistics
If you’re a big media company, should short form concern you? Yes, for a few reasons:
The ad money must come from somewhere. Many advertisers on social media don’t advertise on TV. But short form video is clearly competing for some brand dollars that would otherwise end up on TV.
TikTok could become a new gatekeeper for content discovery. Younger TikTok users are increasingly turning to TikTok for search. It is the ideal platform for video content discovery, both because of its focus on video and because it is perceived as authentic due to its very high participation rates (according to AdWeek, 83% of TikTok users also post videos; by contrast, YouTube has over 2 billion MAUs but only ~50 million channels, or < 3%). At the least, it is a new platform on which media companies must spend money to reach potential viewers. At worst, TikTok will end up with massive influence over what young people choose to watch.
The time spent on social may unintentionally reduce time spent on traditional media. People may open up Instagram or TikTok with the intention of only spending a few minutes, but the addictive nature of these apps makes it easy to scroll a lot longer than they planned. Just like the ad money, the time has to come from somewhere.
Short form may change the consumer definition of quality, lowering the competitive bar. This is the main point of this section, so let’s dwell on it for a moment. The premise of this essay is that the barriers to entry to create quality video content may fall. Will TikTok make it cheaper to make a movie? No, it probably will not. The chief risk for traditional media companies is that TikTok and YouTube change consumers’ definition of quality in a way that lowers the bar.
Figure 9. TikTok Research Suggests Consumers are Consciously Substituting it for TV
What the heck is “quality,” anyway? Some might define it as craftsmanship, cost of raw materials, reliability and durability, but I think a more useful definition is the combination of attributes that one considers when choosing between similar goods or services for an intended use.
Under this definition, revealed preference is, by definition, a statement about perceived quality. Let’s say someone is choosing between two identically priced Gucci and Louis Vuitton purses. If they say “I think the Louis Vuitton is better made, but I’m buying the Gucci because it’s trendier,” that means they actually think the Gucci is higher quality because their internal quality algorithm values trendiness more than craftsmanship. Importantly, this doesn’t mean that craftsmanship doesn’t matter at all; it just means that its relative importance is lower. Revealed preference is what matters to LVMH; knowing that the purse it didn’t sell is perceived to have better craftsmanship is cold comfort.
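One way to make the “internal quality algorithm” concrete is as a weighted sum of attribute scores. The sketch below is purely illustrative: the attribute names, scores and weights are invented for the example, not measured, and real preferences are surely messier than a linear model.

```python
# Toy model: perceived quality as a weighted sum of attribute scores.
# Scores (0-10) and weights are hypothetical, chosen only to illustrate.

def perceived_quality(scores, weights):
    """Return the weighted sum of attribute scores for one product."""
    return sum(weights[attr] * scores[attr] for attr in weights)

# The shopper concedes the Louis Vuitton is better made...
louis_vuitton = {"craftsmanship": 9, "trendiness": 6}
gucci = {"craftsmanship": 7, "trendiness": 9}

# ...but weights trendiness more heavily than craftsmanship.
shopper_weights = {"craftsmanship": 0.3, "trendiness": 0.7}

lv_score = perceived_quality(louis_vuitton, shopper_weights)
gucci_score = perceived_quality(gucci, shopper_weights)

choice = "Gucci" if gucci_score > lv_score else "Louis Vuitton"
print(choice)
```

With these made-up numbers the shopper picks the Gucci. Swap the two weights and the same scores favor the Louis Vuitton, which is the sense in which a shift in weighting, not in the products themselves, changes what counts as higher quality.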
As described above, one of the consequences of Christensen’s disruption process is that insurgents often introduce new features that consumers value and, in the process, change their definition of quality. This change can sometimes be subtle. It doesn’t necessarily mean that the old attributes of quality are no longer relevant, but that the weighting of these attributes is falling. We may not be conscious of it, but consumers change their definition of quality all the time. Think about how Airbnb has changed the definition of quality in lodging. Cleanliness, location and customer service are all still important attributes of “quality,” but for some people there are now new attributes, like a full kitchen, much more space or a quiet neighborhood. In TV, Netflix ingrained new measures of quality too. The emotional effect of the content is still important (surprising, exciting, dramatic, funny, etc.), but now new attributes are also important, like having all the episodes available or being ad-free, among other things.
So, the simple question is whether short form is changing the definition of quality again in a way that lowers the competitive bar. Most studio executives equate TV and movie quality with very high-cost attributes: high production values; established, well-known IP; brand name directors, show-runners, actors and screenwriters; and expensive effects. Short form obviously falls, well, short on these attributes. But it ranks much higher on other attributes, like virality, surprise, digestibility, relevance to my community and personalization. Turns out that these attributes cost a lot less. To the extent that consumers consciously substitute short form for traditional TV, this reveals that their definition of quality is shifting toward lower-cost attributes, and, in the process, lowering the bar. It seems like this is what’s starting to happen. As shown in Figure 9, according to TikTok, as of March 2021, 35% of users were consciously (and therefore intentionally) watching less TV since they started using TikTok.
As mentioned above, most media companies don’t think of short form as a threat. To the extent that this content doesn’t really compete with TV and movies, it isn’t. But if short form is reducing the importance of the traditional, expensive markers of content quality and the quality on platforms like TikTok also goes up, then it is.
How will the quality on TikTok and YouTube go up? Let’s keep moving.
Virtual Production and Falling Production Costs
Virtual production is an emerging film and TV production process that promises to greatly increase efficiency and flexibility. But it is a double-edged sword: it may lower both production costs for incumbent studios and the entry barriers to creating quality video content.
The Traditional Production Process is Linear
To understand the significance of virtual production, you have to start with the traditional TV or film production process. Simplistically, it proceeds in distinct, linear phases: from pre-production (storyboarding, casting, refining the script, scouting locations) to production (principal photography) and finally to post-production (editing and visual effects (VFX)). VFX involves adding elements to the film that weren’t there during shooting, most of which today are computer-generated imagery (CGI or often just CG). Below is one of those fun clips showing how foolish actors look emoting in front of a green screen, contrasted against the final cut.
Virtual Production is Continuous and Iterative
Virtual production (VP) uses technology to enable greater collaboration and iteration between the traditional phases of production (and blurs the boundaries between them). The idea is that every visual element within a frame — characters, objects and backgrounds — is a digital asset that can be adjusted in real time. The key enabling technologies are massive increases in computing power and real-time 3D rendering engines, namely Epic’s Unreal Engine (UE), Unity and Nvidia Omniverse, which have quickly emerged as industry standards.
In VP, VFX artists (modelers, designers, animators, etc.) are involved from the beginning. They develop 3D models before shooting (as opposed to compositing in CG afterwards), which allows directors and directors of photography (DPs) to visualize the shooting environment better than a storyboard allows. Rather than discard this previsualization work, those digital assets can be used in the actual production. In addition, the use of real-time rendering makes it possible to see and manipulate these assets “in camera” during the shoot, in real time. This allows the director and DP to see the shots and adjust sets, shooting angles and lighting on the fly. It also enables actors to better visualize the final scene than when shooting in front of a blank screen. Importantly, the digital assets created during this process can be repurposed in sequels, prequels or other productions. They can also be easily ported to “non-linear” experiences, like gaming, VR/AR or (presumably one day) the metaverse.
Use Cases: Progressing From Hybrid Live Action to Fully Digital
Right now, VP is being used primarily to augment the live action production process, but the arc is toward all-digital productions over time.
Hybrid digital/live action. The current state-of-the-art is the use of LED screens that wrap around a soundstage, including the ceiling, called a “volume,” which depicts the set as it will look on screen. As opposed to green screens, this enables the entire cast and crew to see the set during shooting. It also obviates the need to travel to different locations, worry about weather or squeeze in a shoot during fleeting lighting conditions. In this case, a video is worth a million words; watch this explanation of the use of VP during the shooting of The Mandalorian.
The upfront cost of building a volume is still very high, the workflows are still new and bumpy and filmmakers/showrunners have to embrace it, but VP promises to reduce production costs for a number of reasons: more efficient shooting schedules (i.e., the ability to get through more pages per day and reduce the time required of actors); no location and travel costs; the ability to re-use assets and sets on other productions and sequels; elimination of re-shoots, which can sometimes account for 5–10% in cost overruns; and less time in post-production.
It’s hard to get at the potential cost savings from VP, but some estimates peg them at 30–40% of production cost, or more. Some of these savings may end up on the screen, as directors use the technology to expand the scope of their productions. But more bang for the buck is good either way.
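To put that range in dollar terms, here is a rough sketch that applies it to the $15 million per episode figure cited earlier. Pairing those two numbers is an assumption made purely for illustration; the estimates do not say the savings apply evenly to every type of production, and the function name is hypothetical.

```python
# Illustrative only: apply the estimated 30-40% VP savings range to a
# $15 million-per-episode drama. Combining these figures is an assumption.

def cost_after_savings(cost, savings_pct):
    """Return the remaining cost after removing savings_pct percent."""
    return cost * (100 - savings_pct) // 100

episode_cost = 15_000_000          # dollars per episode, from the text
savings_range_pct = (30, 40)       # estimated savings, percent of cost

for pct in savings_range_pct:
    remaining = cost_after_savings(episode_cost, pct)
    saved = episode_cost - remaining
    print(f"{pct}% savings: ${saved:,} saved, ${remaining:,} per episode")
```

Under these assumptions, the episode would cost between $9 million and $10.5 million, before any of the savings are spent back on a bigger production.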
Sounds pretty good. But turning our attention next to fully digital productions gives a sense of where the technology is headed.
Fully digital. The frontier in VP is productions that are fully digital, meaning there is no set at all. In this case, all the assets and even people are created digitally and the entire production occurs within the engine. (Although the characters’ movement and facial expressions may be mapped to motion capture hardware worn by real actors and their voices are also likely real, at least for now.)
This behind-the-scenes description of a Netflix short produced using real-time rendering is, again, worth a lot of words.
Importantly, all of the people in this short are actually MetaHumans, Unreal Engine’s photorealistic digital humans. Creators can use (and alter) dozens of pre-stocked MetaHumans or create custom MetaHumans using scans, as was done for this short. Unity’s digital humans are even more impressive (see below).
Keep in mind that the quality of rendering is gated by compute power. As GPUs get more powerful (and/or UE and Unity support multiple simultaneous GPUs, as Omniverse already does), these digital humans will become progressively indistinguishable from real people.
Here’s another video, The Matrix Awakens demo created by Warner Bros. and Epic. The video is long, but worth watching. The keys here are severalfold: 1) this video was rendered in real time in UE5 on a PS5 and Xbox Series X; 2) it is very difficult to distinguish which of these characters are real and which aren’t, but everything from about the 2-minute mark on was created in the engine, including every car, building, street, lamppost, mailbox and person, even Keanu Reeves and Carrie-Anne Moss (albeit mapped to motion capture output); and 3) the transition between the linear story and the gameplay is seamless.
Real-time rendering is a very powerful tool that may fundamentally change the cost structure of making high-quality filmed entertainment. But to get a real sense of the potential, it’s helpful to layer on the next piece, AI.
AI and Even Faster Falling Costs
AI is clearly having its Cambrian moment and generative AI, in particular, is rightfully getting a lot of attention. The prospect of art created with little or no human involvement is deeply unsettling to a lot of people, including me. The point of this section, however, is that you do not need to believe we are headed for some soulless, artless dystopia to realize that AI, especially in concert with VP, could meaningfully lower production costs in the relatively near term.
The Generative AI Dystopia
In just the last two years, there has been an explosion of activity in text-to-image, including the introduction of OpenAI’s DALL-E and subsequent release of DALL-E 2; the open source Stable Diffusion and more recent launch of Stable Diffusion 2; and Midjourney. The creepy image at the very beginning of this essay was created by Stable Diffusion 2 in response to the prompt “The Four Horsemen of the Apocalypse.” It’s not quite what I envisioned — note that it interpreted “horsemen” as combined man-horses, not as men riding horses — but it is suitably foreboding and apocalyptic.
Only two months ago, Facebook announced Make-A-Video, an AI-based text-to-video generator that turns a text string into a short video clip, and a week later Google announced the similar Imagen Video. Neither is available to the public yet, but given the startling pace of development, they give a sense of what’s coming.
And just a few weeks ago, OpenAI released ChatGPT, a shockingly convincing and agile chatbot, which is causing everyone to freak out, for lack of a better phrase. My Twitter feed is full of creative prompts and more creative responses. Here’s my favorite:
Let’s acknowledge, and quickly dispense with, the dystopian, generative-AI doom-loop end state: ChatGPT-X, trained to generate, evaluate and iterate storylines and scripts; then hooked into Imagen Video vX, which generates the corresponding video content; which is then published to TikTok (or its future equivalent), where content is tested among billions of daily users, who surface the most viral programming; which is then fed back into ChatGPT-X for further development. (H/t to my brilliant former colleague Thomas Gewecke for this depressing scenario.) New worlds, characters, TV series, movies and even games spun up ad infinitum, with no or minimal human involvement. It’s akin to the proverbial infinite monkey theorem.
Whether this is possible or desirable and all the implications are beyond the scope of anything I want to think about. It’s also probably (hopefully?) very far away. We’ll leave the singularity for another time.
Here and Now
The near-term relevance of AI (including generative AI) is not how it will replace human creativity, but how it will speed and enhance the production process.
In parallel with all the excitement about generative AI, there has also been a quieter wave of AI content production technologies and tools over the last year or two (some of which you would also call “generative”). These include:
- RunwayML, which uses AI to erase objects in video, isolate different elements in the video (rotoscoping) and even generate backgrounds with a simple text prompt. Again, a video is better than a description.
- DreamFusion from Google and Magic3D from Nvidia, which are text-to-3D-model models (say that five times fast). Type in “a blue poison-dart frog sitting on a water lily” and Magic3D produces a 3D mesh model that can be used in other modeling software or rendering engines like UE, Unity and Omniverse.
- Neural Radiance Field (NeRF) technology, which enables the creation of photorealistic 3D environments from 2D images. See the short demo of Nvidia’s Instant NeRF below or check out Luma AI.
- AI-based motion capture software, such as DeepMotion and OpenPose, which convert 2D video into 3D animation without traditional motion capture hardware.
- There has been academic research on AI-based auto-rigging, which would automatically determine how digital characters move based on their anatomy.
- There are also several enterprise applications, like Synthesia.io, which provide AI avatars that will speak whatever text is provided and even offer customized avatars. Send in a few facial scans, and it will send back an avatar of the subject that can then be used to deliver any written text, in any language.
- Deepdub.ai, which uses AI to dub audio into any language, using the original actor’s voice.
- Lastly, do yourself a favor and go to thispersondoesnotexist.com and hit refresh a few times. None of these very real looking people are real.
The Near Future
Many of these tools are clearly imperfect. The avatar from Synthesia definitely falls into that off-putting uncanny valley. Perhaps the 2D motion capture doesn’t seem that crisp. But, here’s the thing: all of this will keep getting better, very quickly. As mentioned above, the gating factors for improvement in all these tools are the size of datasets, the sophistication of algorithms and compute power, all of which are advancing fast.
Real-time rendering engines and AI-enhanced tools make it plausible that very small teams can create very high quality productions
The trajectory here is clear: combining real-time rendering engines and these kinds of AI tools will make it possible for smaller teams, working with relatively small budgets, to create very high quality output. The average TV show requires ~100–200 cast and crew in a season and some a lot more than that. In its first season, for instance, House of the Dragon lists 1,875 people in the cast and crew, including over 600 in visual effects. What if eventually comparable quality could be achieved with half, or one-third or one-fifth as many people?
The timing for different content genres to shift a larger proportion of production into VP will likely depend on consumers’ expectations for video fidelity and photorealism. Real-time rendering engines have been used to develop games for years because gameplay is generally more important than visual quality.
In long form video, one of the earliest candidates for widespread adoption is animation. Traditionally, the workflow in animation is also sequential, similar to live action: storyboarding; 3D modeling; rigging (determining how characters move); layouts; animation; shading and texturing; lighting; and finally, rendering (pulling all of that work together by setting the color of each individual pixel in each individual frame). Rendering is especially time consuming and expensive. Consider a 90-minute movie. With 24 frames per second, that’s ~130,000 frames, each of which takes many hours to render. (Every frame in this scene from Luca took 50 hours to render.) This is performed in render farms and even though many frames are rendered simultaneously, it can take days or weeks to come back. Any adjustments will need to be rendered again. Taking the entire process into account, most Pixar films take 4–7 years to complete. By contrast, using VP, teams can be smaller, since artists can wear more hats, and it becomes relatively trivial to make adjustments, including lighting, colors and perspective, on the fly. (To be clear, 3D engines are not producing photorealistic renders in real time today, so the final frames will still likely need to go out for offline rendering. But the key is that real-time rendering allows experimentation and iteration on the fly. And real-time rendering will continue to improve.) Spire, a new animation studio co-founded by Brad Lewis, producer of Ratatouille, is currently working on a full-length feature created entirely in UE, called Trouble.
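To put the rendering arithmetic above in concrete terms, here is a rough Python sketch. The 90-minute runtime, 24 frames per second and the 50-hours-per-frame figure (quoted for one Luca scene) come from the text. The farm size of 5,000 machines is a purely hypothetical assumption, and applying 50 hours to every frame is a deliberate worst case, not a claim about any real production.

```python
# Back-of-the-envelope math for offline rendering of an animated feature.
RUNTIME_MINUTES = 90       # from the text
FRAMES_PER_SECOND = 24     # from the text
HOURS_PER_FRAME = 50       # Luca scene figure from the text, used as a worst case
FARM_MACHINES = 5_000      # HYPOTHETICAL farm size, for illustration only


def total_frames(runtime_minutes, fps):
    """Number of frames in a film of the given length."""
    return runtime_minutes * 60 * fps


def render_days(frames, hours_per_frame, machines):
    """Wall-clock days for one full pass if frames are spread evenly across machines."""
    machine_hours = frames * hours_per_frame
    return machine_hours / machines / 24


frames = total_frames(RUNTIME_MINUTES, FRAMES_PER_SECOND)
machine_hours = frames * HOURS_PER_FRAME
days = render_days(frames, HOURS_PER_FRAME, FARM_MACHINES)

print(f"Frames: {frames:,}")
print(f"Machine-hours for one full pass: {machine_hours:,}")
print(f"Wall-clock days on {FARM_MACHINES:,} machines: {days:.0f}")
```

Under these assumptions a single full pass ties up the farm for about 54 days, which is why every re-render hurts and why being able to iterate in real time before the final render matters so much.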
The next likely candidate will be CG-intensive live action films, where humans are heavily doctored. As you can see in the behind-the-scenes video I embedded above about The Mandalorian, there are still a lot of actors walking around the volume, including the Mandalorian himself, albeit not saying much. Over time, a growing proportion of the footage in these kinds of series and films will likely be produced without actors, other than motion capture. Eventually, even that may be unnecessary. When you watch the Mandalorian walk around in his helmet, Thanos snap his fingers or the Na’vi swim with whales, it raises the question of whether you will need humans in these kinds of series and films at all in five years.
What about a drama or romance with a lot of nuanced emoting? It might take a while before you could or would even want to supplant Meryl Streep with a MetaHuman. The savings might not be worth it. But will it eventually be technically possible to do a series of facial scans of an actor, then have him voice the entire script and have his corresponding MetaHuman do all the “acting,” where the director could manipulate his gestures and facial expressions to get the precise take she wants? For that matter, will it be possible to train an AI on the footage of every Angelina Jolie movie ever, including her voice and facial expressions, license her likeness, and then create a new film starring a 28-year-old Angelina Jolie opposite a 32-year-old Paul Newman, all in the Unreal Engine? The way things are headed, it probably will.
Web3 and a New Financing Model
This is the last piece of the puzzle: financing.
As mentioned before, producing TV and movies has a high barrier to entry not just because it is expensive, but because it is risky. Arguably, anything expensive is risky, but there are a couple of additional risk factors with content:
- 1) Returns exhibit power law dynamics, meaning they are highly variable. A rule of thumb in Hollywood is that 80% of films lose money (although maybe it’s more like half) and historically about 60% of broadcast network TV series fail to make it past the first season, which is a good proxy for “failing.” Most projects that don’t lose money will still fail to earn their cost of capital or eke out meager returns.
- 2) The investment is front loaded. You need to spend a lot of money to create an entertainment asset and then a lot of money to market it before you find out if an audience will even show up.
Contrary to popular belief (and with all due respect to the development people that have the vision to option the right projects), movie studios don’t make movies; they attract the talent that makes movies. And they attract this talent in large part by absorbing risk. But web3 may reduce the need for studios to absorb risk.
Movie studios don’t make movies, they attract the talent that makes movies — in large part by absorbing risk
Crowdfunding on Steroids
It’s a tough time to be a crypto bull. Between the decline in cryptocurrency prices and a related spate of bankruptcies, scandals and meltdowns (FTX, Celsius, Three Arrows Capital, Terra/LUNA, etc.), public and institutional interest and trust in crypto have (rightfully) fallen.
There is, however, a big difference between “money crypto” and “tech crypto.” As the former has imploded, the latter keeps marching along, with continued technological advancements (the Merge, anyone?), venture capital infusions and project launches. Undoubtedly, the current crypto winter and high levels of disillusionment reduce the momentum of any of the initiatives described in this section. But whether you are a firm believer in the unique utility, and inevitability, of the decentralized Internet or a complete skeptic, here’s the concept: web3, by which I simply mean applications that are facilitated by the combination of public blockchains and tokens, enables what you could call “crowdfunding on steroids.”
Crowdfunding content isn’t new. It’s been done for years on Kickstarter and Indiegogo. The highest profile example is the reboot of Veronica Mars, which raised $5.7 million on Kickstarter from 90,000 fans for a new film, seven years after the series went off the air. For the most part, these campaigns only work for established IP with a large pre-existing fan base. They also usually are positioned as donations, not investments, or offer trivial incentives, like merchandise, autographs, movie tickets or DVDs, not profit participation or any governance rights. The JOBS Act, signed into law in 2012, paved the way for several regulations about online crowdfunding, but these options either: 1) are limited to accredited investors; 2) have relatively onerous reporting requirements; or 3) are restricted to raising no more than $1 million within a 12-month time frame. They are also illiquid.
The combination of tokens and public blockchains provides several benefits:
- Governance and other perks. Tokens can be structured such that token holders (or holders of specific classes of tokens) can vote on significant decisions (including the direction of the storyline itself, sort of a communal “choose-your-own-adventure”). They can also provide token-gated perks, such as member-only Discord servers, or early or exclusive access to content and merchandise.
- Distributed creation. Tokens can also enable decentralized content creation, where holders have both the right to contribute to the creative process and receive tokens as compensation for contributing. It has yet to be seen whether this model will produce radically new (and compelling) stories or will suffer from too many cooks in the kitchen. (Ultimately, there still likely needs to be someone, or a small group, in charge.)
- Graduated financing. As mentioned above, the typical model for many traditional content projects is to invest tens of millions in production and tens of millions more in marketing before finding out if anyone’s going to show up. Web3 projects enable creators to build community first (such as through initial NFT projects) and use subsequent NFT sales to fund additional content projects.
Web3 inverts the traditional risk profile of content production; rather than spend heavily to build IP and then try to find an audience, it builds the community first and then develops the IP
- Social signaling. The tokens themselves, which can be showcased publicly, may provide social currency. For instance, the early backers of a project can display their tokens as proof-of-fandom.
- Economic participation with liquidity. People are fans because they are passionate about something. Tokens can supercharge that fandom by providing something new: an economic incentive. Tokens can be structured with direct profit participation rights or fractionalized IP ownership. Or tokens may simply be limited collectibles that will likely rise in value if the associated IP succeeds. And they are liquid. An economic incentive will likely turn fans into even more ardent evangelizers.
A Few Examples
There are enough examples of blockchain-based, community-driven film and TV development that it has earned its own moniker, Film3. A lot of these are just experiments at this point, but they give a sense of the potential. Here are a couple of the highest-profile examples:
Aku World revolves around Aku, a young Black boy who wants to be an astronaut. It originated as a series of NFTs that told Aku’s story, which raised $10 million, and was followed by a series of Aku profile picture NFTs (PFPs) that raised an additional $34 million. (Unfortunately, the latter funds were lost due to a flawed smart contract, but that’s beside the point. Web3, amiright!?) Aku was the first NFT project that was optioned for a film and TV project and the founder reportedly intends to give the community input into the future development of the IP.
Jenkins the Valet is the name and persona that the owner of a Bored Ape Yacht Club (BAYC) NFT assigned to his ape, which he developed by writing stories about Jenkins’ exploits. The company subsequently formed behind Jenkins, Tally Labs, has issued a series of governance and utility tokens. The tokenomics are somewhat complicated, but the basic idea is that the NFTs enable holders to influence the direction of the IP, which so far has included voting on the plot of a novel, and license their own apes to be included in IP, in exchange for royalties. These tokens can also be used to claim other tokens, which can then be staked or claimed for additional NFTs and governance rights (like I said, complicated). Jenkins has signed with CAA, with the intention to develop other media properties, including film and TV.
Shibuya is a platform for creating and publishing video content, which enables creators to provide governance rights and direct IP ownership to fans. Its first project is White Rabbit; fans can vote on the plot development of each chapter and, when completed, ownership will be converted into a fractionalized NFT. Last week it raised $7 million, led by a16z and Variant.
A Rough Cut of the Implications of Falling Production Costs
In 2005, I wrote a research note arguing that “IPTV” (the terms “over-the-top” and “streaming video” weren’t in use yet) wasn’t a threat to the traditional pay TV business (Figure 10), mostly because it wouldn’t be possible to offer a competitive product at a comparable price for a very long time. We all know how that went. I was wrong for two reasons: 1) I didn’t appreciate how fast cost curves could fall; and 2) I didn’t understand the disruption process, and how an inferior product can get a foothold at the low end of the market and completely upend an industry.
Figure 10. A Bad Prediction
As I described at the beginning, disruption happens when new entrants, enabled by new technologies and business models, enter a market with an inferior product that gets progressively better. When you pull together the trends I described above, the seeds of the disruption of content production are clear:
- Short form/user-generated content is changing the consumer definition of quality to include lower-cost attributes, blurring the traditional distinction between “low” and “high” quality, and increasingly competing with long form content, especially for younger viewers;
- The hand-in-glove technologies of virtual production and AI are making it increasingly feasible for small teams or even individual creators to make quality content with relatively small budgets;
- There is a possibility that web3 will break Hollywood’s historical stranglehold on financing.
If you went back 15 years and tried to predict the implications of the disruption of video distribution, you probably wouldn’t have pieced together what’s happened since. It’s mind-boggling to think about what may happen if content production follows a similar path. But here are some first-order (and obvious) effects:
Every aspect of the TV and film business will be affected. As noted above, the disruption of TV distribution has had ripple effects everywhere. The disruption of TV and film creation would undoubtedly also affect every part of the supply chain.
There will be a lot more “high quality” content and hits will emerge from the tail. As mentioned above, every hour YouTube creators upload 30,000 hours of video, the equivalent of Netflix’s entire domestic library. Who knows the comparable number on TikTok? There is not only an insatiable demand to consume content, there is apparently also an insatiable demand to create it. The vast majority of this is crap. If the average quality of this tonnage lifts, however, and even a tiny percentage breaks through, it could meaningfully increase the supply of what we currently consider quality video content.
Think about it this way: today, there are relatively few companies in Hollywood that make the vast majority of TV series and films and there are relatively few people at these companies that work in development and even fewer that make greenlight decisions. How many? Maybe 100, 200 max. Is it likely that this small group of people collectively has greater creative intuition than an untold number of potential creators?
Does the small number of people with greenlight authority across Hollywood have better collective creative intuition than a vast number of independent creators?
This is already what occurs in music. It was recently announced that 100,000 tracks are uploaded to streaming music services each day, the overwhelming majority of which get no traction. But almost all of the new breakout acts of the last few years — like The Weeknd, Billie Eilish, Lil Uzi Vert, XXXTentacion, Bad Bunny, Post Malone, Migos and many more — emerged from the tail of self-distributed content.
There will be far more diverse content. If it sometimes feels like every TV show and movie is a reboot, prequel, sequel, spinoff or adaptation of established IP, that’s because a growing proportion are. This article shows the data for TV and movies; Ampere Analysis also recently reported that 64% of new SVOD originals in the first half of 2022 were based on existing IP. This reliance on established IP is an understandable risk mitigation tactic by the studios, especially as the costs of content and the stakes for delivering hits rise. If the trends I described above continue to play out, studios may become more risk averse and lean even more heavily on established IP. The collective tail will be much more willing to take creative risk and experiment with new stories, formats and experiences. It will also, by definition, have much more diverse creators.
Curation will become even more important. As I wrote about here, value flows toward scarce resources and truly disruptive technologies tend to change which resources are scarce and which are abundant. Prior to the advent of the Internet, content was relatively scarce because there were high barriers to entry to distribute it (such as the need to lay fiber and coax, own scarce local spectrum licenses or build printing facilities). There wasn’t much to curate, so curation — like local TV listings, TV Guide or Reader’s Digest — was “abundant” and extracted little value. The Internet flipped this dynamic, making content abundant and curation scarce and valuable.
There is no better example than the news business, where the barriers to entry to create content were always low. Once distribution barriers also fell, there was an explosion of “news” content (from bloggers, independent journalists, the Twitterati, local and regional newspapers distributing globally and digital native news organizations) and the bulk of the value created by news content is actually extracted by the curators/aggregators of news (Google, Meta, Apple News, Twitter, etc.), not news organizations.
In long form video, this value shift hasn’t occurred because even after distribution barriers fell, content creation barriers remained high. (The outer wall of the fortress fell, but the inner one still stands.) A similar explosion of quality video content would cause value to shift to curation, as consumers find it exponentially harder to wade through all their choices and become less reliant on only a handful of big content creators.
A new way of creating content may enable (and necessitate) a new way to monetize it. Of course, the degree to which costs will fall is both critically important and unknowable. If it becomes possible to create a Pixar-quality film with half the team, half the budget and half the time, what happens then? Maybe not that much changes. It probably gets financed independently, picked up by Netflix and distributed (and monetized) like everything else. What if costs fall 75%? 90%? What if you could make a high quality TV series for $500,000 an episode, not $5 million? $50,000? As costs fall, new monetization models become possible. Maybe ad revenue is enough? Perhaps single sponsors (as we head back to the days of soap operas) or product placements? Perhaps microtransactions? Maybe fractionalized NFTs, where the creators get paid by retaining a significant portion of the tokens? Maybe abundant, low-priced video content becomes top-of-funnel for some other forms of monetization for the most committed fans?
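The cost scenarios above can be laid out as a quick Python sketch. The $5 million per-episode baseline and the 75%, 90% and (implied by $50,000) 99% declines come from the text. The ad revenue of 2 cents per view is a hypothetical number chosen only to show how the break-even audience shrinks as costs fall; it is not an estimate of real ad rates.

```python
# Sketch: per-episode cost under different cost declines, and the audience
# needed to break even on advertising alone.
BASELINE_COST_USD = 5_000_000    # "$5 million" per episode, from the text
AD_REVENUE_PER_VIEW_CENTS = 2    # HYPOTHETICAL: not a figure from the essay


def cost_after_decline(baseline_usd, decline_pct):
    """Cost in dollars after costs fall by decline_pct percent (integer math)."""
    return baseline_usd * (100 - decline_pct) // 100


def break_even_views(cost_usd, revenue_per_view_cents):
    """Smallest number of views whose ad revenue covers cost_usd."""
    cost_cents = cost_usd * 100
    return -(-cost_cents // revenue_per_view_cents)  # ceiling division


scenarios = {}
for decline_pct in (0, 50, 75, 90, 99):
    cost = cost_after_decline(BASELINE_COST_USD, decline_pct)
    views = break_even_views(cost, AD_REVENUE_PER_VIEW_CENTS)
    scenarios[decline_pct] = (cost, views)
    print(f"{decline_pct:>2}% cheaper: ${cost:>9,} per episode, "
          f"{views:>11,} views to break even")
```

At the assumed rate, a $5 million episode needs 250 million views to pay for itself through ads, while a $50,000 episode needs 2.5 million. Whatever the real rate turns out to be, the break-even audience scales down in step with cost, which is why monetization models that are unthinkable today could become viable.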
Counterintuitively, the most expensive content may be affected soonest. As mentioned above, one of the content genres that will benefit soonest from the combination of VP and AI is CG-heavy live action films and series. These are also the most expensive productions (look again at Figure 4). The good news for studios is that these tools could meaningfully reduce production costs for these kinds of projects. The bad news is that they may also lower entry barriers for the highest-value content.
The most valuable franchises may become even more valuable. With new tools and lower costs, many creators will want to dream up entirely new stories. A lot will also probably want to expand on their favorite fictional worlds, whether Harry Potter, the MCU or Game of Thrones — or create mash-ups between them. Historically, Hollywood has guarded its IP closely and has been more inclined to view fanfiction as copyright infringement than enhancement. But progressive rights owners would be wise to harness all the potential creative energy, not stifle it.
Last embed, I promise. This video shows a small team (actually, it is mostly one guy) using AI tools to create their own version of the animated Spider-Man: Into the Spider-Verse, incorporating other live action footage from MCU films. The video is long, but if you watch the first few minutes and then the movie he put together (which starts at about the 19:45 mark), you get the point. It exemplifies a lot of what I’ve discussed above.
The Good News? It’s Early
Given all the dislocation that has occurred from the disruption of the distribution model, disruption of the content creation model would probably result in an industry that looks almost nothing like it does today.
What should studios do? That probably requires another essay, but a few things come to mind:
The big media companies’ current predicament could be summarized this way: the tech companies became media companies before the media companies could become tech companies
Embrace the technology. The big media companies’ current predicament could be summarized this way: the tech companies became media companies before the media companies could become tech companies. Hollywood has a very spotty record with new technologies. It doesn’t embrace them; it goes through something like the five stages of grief: denial, dismissal, resistance (often through legal means), “innovation theater” (as they go through the motions of embracing a new technology, but really don’t) and capitulation. Hollywood should embrace VP and AI to capitalize both on the greater cost efficiency and the optionality of having every visual element warehoused as a reusable, extensible digital asset.
Put a different way, the trends I described above may be inevitable, but disruption is not. Disruption describes a process by which incumbents ignore a threat until it is too late. That doesn’t mean the incumbents have to repeat this pattern.
Lean into fanfiction. As mentioned above, with a democratization of high quality production tools, many independent creators will want to expand on their favorite IP, especially those with rich, well-developed worlds. Rather than resist, IP holders should think of their IP the way the music industry does. Perhaps a framework will emerge, similar to “publishing rights,” that enables video IP rights owners to monetize third-party exploitation of their work?
Look to the labels. Historically, the music labels controlled every aspect of the business, including A&R, artist development, production, distribution and marketing. Today, many of those roles have been supplanted by technology. Anyone can set up a recording studio in their bedroom; anyone can self-distribute on streaming services; and artists market through their social followings. But labels have maintained their primacy, in large part by helping artists negotiate the incredible complexity of the business and leveraging the bargaining power of their artist rosters and deep libraries. The analogy is imperfect (for instance, library is a lot more important in music than video, giving the labels a lot of bargaining leverage) and streamers’ business models would probably need to change, but the labels provide a hopeful model for how to pivot.
The good news is that it’s still early.
Appendix: A Brief Recent History of Pay TV
The recent history of the pay TV business is a textbook case of disruption.
- In 2007, Netflix first offered its DVD-by-mail subscribers the ability to stream some library movies and TV shows. Initially, the product was so bad it didn’t even warrant a separate price. The incumbents, both networks and pay TV distributors, mostly ignored it. (The exception is that in 2007 NBC and News Corp. launched Hulu — and Disney soon joined — to offer their broadcast content online and on demand, but its growth was ultimately hamstrung by misaligned interests between Hulu and its corporate owners and unwieldy governance.)
- In 2008, Netflix somehow convinced Starz! to license its full catalog of movies and series, on demand, including Disney and Sony pay-one (1st pay TV window) movie rights, for less than $30 million per year (even though it was generating more than $300 million per year licensing the same content to pay TV distributors). Netflix was starting to look like a real threat.
- In 2009, Time Warner CEO Jeff Bewkes and Comcast CEO Brian Roberts jointly announced TV Everywhere, an effort to enable pay TV subscribers to access on demand content online at no extra charge (a “sustaining innovation”). (Full disclosure: At the time, I believed that improving the price/value equation of traditional pay TV would slow Netflix’s advance. I was wrong.) Although widely deployed, it ultimately failed to address the underlying problem: consumers were paying too much for too much pay TV.
- Not only did the incumbents fail to move down market to compete, they gave Netflix a big leg up. In 2010, NBC struck an output deal to license off net dramas and comedies as well as Saturday Night Live episodes one day after airing. Later that year, Disney struck a comprehensive deal to provide Netflix with its (critically important) kids programming, including some content only 15 days post broadcast. Disney would later license its pay-one movie rights to Netflix. Viacom, CBS and Time Warner would eventually jump on the licensing bandwagon too.
- After saying that it was “quite unlikely” Netflix would step outside its “circle of competence” and produce original content, in 2012 Netflix launched its first original series, Lilyhammer, and the following year, released its first bona fide original hit, House of Cards. A few years later it began producing original films too, including its first film released in theaters, Beasts of No Nation.
- Traditional TV networks pursued another sustaining innovation, dramatically increasing investment in original programming. According to FX Networks research, between 2010 and 2017, original scripted series on basic and pay cable and broadcast rose by roughly 75%, from 212 to 370 (Figure A1).
Figure A1. Traditional Networks Dramatically Increased Original Programming
Note: Culled from Nielsen, Online Services, Futon Critic, Wikipedia, Epguides, et al. Online Services = Amazon Prime, Crackle, Facebook Watch, Hulu, LouisCK.net, Netflix, Playstation, Seeso, Sundance Now, Vimeo, Yahoo, and YouTube Red. Excludes library, daytime dramas, one-episode specials, non-English language/English-dubbed, children’s programs, and short-form content (< 15 mins). Source: FX Networks Research.
- All the while, there were growing signs that pay TV subscribers were shifting viewership over to SVOD services and “cutting the cord” to fully supplant pay TV with SVOD, especially Netflix. Pay TV subscribers started declining for the first time in 2013.
- As Christensen’s framework predicted, Netflix also changed consumer expectations for the experience of watching TV. All content was offered on demand, with no ads (at least until recently). Netflix continuously invested in, and improved, its digital product, which was vastly better than the kludgy electronic program guide on cable and satellite set tops. Consumers now consider this type of UX to be table stakes.
- As the pressure on pay TV mounted, eventually the media conglomerates capitulated, launching their own streaming services: CBS All Access (the predecessor to Paramount+) in 2014, HBO Now (the predecessor to HBO Max) in 2015, Disney+ in 2019, etc.
In the U.S., Netflix ended 2021 with an estimated 67 million subscribers and generated an estimated $12 billion in revenue. By contrast, pay TV subs (including virtual MVPDs) ended last year at about 83 million, down from a peak of 100 million in 2013.
Figure A2. Over Time, Netflix Moved Upmarket and Successively Picked Off More Customer Segments