Will AI Video Cannibalize the Entertainment Industry?
With many people speculating that the gold rush of generative AI tools might be another hype bubble like NFTs or the Metaverse, AI video generation models provide evidence to the contrary. AI video generators such as Runway and Luma’s Dream Machine have made remarkable technical headway in a short time, particularly in the quality and nuance of the videos they produce – just look at videos like this or this. It’s clear that, as a society, we have more pressing concerns than debating whether this new technology is just another AI bubble. The real issue is whether these videos will entirely replace human creators of culture and the countless workers in the entertainment industry (IATSE, the union representing people who work in theaters and on film sets, counts 200,000 members between the US and Canada) who might be out of a job because of it.
AI video generators work by having the user describe, in a typed text prompt, what they would like to see generated. The generator interprets that text and draws on similar material from its training data. It uses NLP (natural language processing) to extract the sentiment and context of the prompt and determine what visuals should be generated. During the generation process, it considers all manner of imagery associated with the text, from animations to the stock images included in the dataset the model was trained on. The visuals are only half of the job, though: there is also the audio portion, which can range from voiceovers to atmospheric sound, all depending on the context of the visuals. Another task that has proven difficult for many of these video generators is syncing dialogue or voice with the mouth movements in the video, but even that is improving with software like Live Preview.
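The prompt-interpretation step described above can be illustrated with a toy sketch. Real generators use learned language models; here a hypothetical `parse_prompt` function stands in for that stage, reducing free text to the kind of structured scene attributes a generator would condition on. Every name and keyword list below is an illustrative assumption, not any real API.

```python
# Toy illustration of the prompt-parsing stage of a video generator.
# Real systems use learned NLP models; this keyword matcher only shows
# the *shape* of the step: free text in, structured scene spec out.

SUBJECTS = {"dog", "astronaut", "city", "ocean"}
MOODS = {"calm": ["serene", "gentle", "calm"],
         "tense": ["storm", "dark", "tense"]}

def parse_prompt(prompt: str) -> dict:
    """Reduce a text prompt to scene attributes (subject, mood, motion)."""
    words = prompt.lower().split()
    subject = next((w for w in words if w in SUBJECTS), "unknown")
    mood = next((name for name, cues in MOODS.items()
                 if any(c in words for c in cues)), "neutral")
    motion = "pan" if "panning" in words else "static"
    return {"subject": subject, "mood": mood, "motion": motion}

spec = parse_prompt("A serene ocean at sunset, slow panning shot")
print(spec)  # {'subject': 'ocean', 'mood': 'calm', 'motion': 'pan'}
```

The structured output is what the downstream model would use to decide which visuals, and which accompanying audio, to generate.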
The last step of the video generation process is rendering the video and making certain that the visual output is correct and in the proper format and orientation (landscape or portrait). The whole pipeline runs in a matter of minutes rather than days; an editor and graphic artist would likely need many times as long. The instant value proposition for anyone who needs creative work done is obvious, and it is why AI video generation is poised to cause a huge disruption in the entertainment industry.
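That final format-and-orientation check can be sketched in a few lines. This is a minimal toy validator, not any generator's actual rendering code; the function name and parameters are assumptions for illustration.

```python
# Toy sketch of a final render check: confirm the output frame matches
# the requested orientation before delivery. Names are illustrative.

def check_render(width: int, height: int, requested: str) -> bool:
    """Return True if the frame's orientation matches the request."""
    orientation = "landscape" if width > height else "portrait"
    return orientation == requested

print(check_render(1920, 1080, "landscape"))  # True
print(check_render(1080, 1920, "landscape"))  # False
```

In a real pipeline this stage would also verify codec, frame rate, and resolution against the delivery spec.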
In 2023, the Screen Actors Guild (SAG-AFTRA) staged an extended strike against the Alliance of Motion Picture and Television Producers (AMPTP) over a number of issues, including better compensation for SAG-AFTRA workers, disclosure of viewership statistics for the streaming shows they work on, and the most existential threat the movie industry has faced in some time: AI. Video generation tools have gotten so good that this was one of the central reasons the SAG-AFTRA/AMPTP agreement took so long to finalize. The AI issues addressed in the guidelines are digital replicas, digital alteration, synthetic performers, and employment-based digital replicas. Each issue has its own nuance, but many of them are tied to video generators, because more and more VFX studios are incorporating tools such as Runway and Midjourney into their everyday workflow to produce visuals for movie and TV productions.
While these AI video generators are driving a shift in the entertainment industry, the software still has drawbacks. First, the narrative aspect is rudimentary: it is hard to stitch together scenes created on these platforms without regenerating the same portion multiple times. Then there are the minute details, such as hands and fingers sometimes being out of place or missing altogether, and generating distinctive facial features at scale remains difficult. But while all of these are current problems, these generators are the worst they will ever be. When Runway and Stable Diffusion first released their models, there was a crude AI-generated video of Will Smith eating spaghetti; now, in 2024, these models are producing far more sophisticated video of Will Smith eating spaghetti. If that improvement in just two years says anything about the trajectory of AI video, for the entertainment industry and for society at large, we may see a world where human and AI actors are indistinguishable from each other. The scarier question is: will audiences even care to make that distinction?