Last week, the tech industry was flooded with hype and anticipation from two of the biggest companies in the AI space, Google and OpenAI. In November 2022, OpenAI released ChatGPT, a watershed moment for the new era of AI that made the technology accessible to the general public. Google, for its part, created many of the core technologies behind these AI models and large language models (LLMs), such as the transformer architecture, but it hadn’t been able to replicate, with a killer app of its own, the success of ChatGPT, which attracted over 100 million registered users. In other words, Google, a company with a reputation for dominating the tech space, suddenly found itself trailing in the realm of AI.

Last Monday, the OpenAI Spring Update was livestreamed via YouTube, accompanied by interesting yet sparse typography as over 300,000 viewers waited for the keynote to start. Many people were wondering what would be announced. Would it be GPT-5, the highly anticipated update to GPT-4? Or would it be a new search engine powered by AI results from ChatGPT? The latter turned out to be a honeypot, an okie-doke rumor planted by OpenAI leadership in an attempt to find employees who may have been leaking information to popular OpenAI rumor accounts Jimmy Apples and Flowers from the Future. But the keynote wouldn’t reveal GPT-5 either; instead, it announced a new multimodal model called GPT-4o. A multimodal model is a machine learning model that can process and integrate data from multiple sources, including visual (images, videos) and linguistic (text) inputs, enabling it to learn and make predictions from diverse forms of data. That’s why it’s called the 4o model: the “o” stands for “omni,” meaning it handles all of these modalities.
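
For a rough sense of what “multimodal” looks like from a developer’s seat, here is a minimal sketch using the OpenAI Python SDK that sends text and an image to GPT-4o in a single request. The prompt and image URL are placeholders I made up, not anything from the keynote:

```python
# Minimal sketch of a multimodal request: one prompt that mixes text and an image.
# Assumes the OpenAI Python SDK (openai >= 1.0) is installed and OPENAI_API_KEY is set.
# The prompt and image URL below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/whiteboard.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```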

ChatGPT previously had a pretty good voice app associated with it, but the 4o model is like Popeye after eating the spinach. The first thing that makes it so much better is the improvement in latency: the original ChatGPT voice mode on the GPT-4 model had a latency of around 5 seconds, while the 4o model responds in roughly 300 milliseconds, cutting that latency by more than an order of magnitude. It’s also a more natural-sounding and capable model than the current GPT-4. OpenAI CTO Mira Murati, who is highly regarded in the AI realm but somewhat notorious since her response to an interview question about whether OpenAI’s models were trained on YouTube data became a viral meme, directed the keynote, which was accented by warm lighting and post-modern architecture, a far cry from the huge conference or stadium-style keynotes you might see from larger tech companies such as Nvidia or Microsoft. The keynote was highlighted by a casual, engaging, and at some points awkward demo in which two OpenAI employees had the 4o model’s iPhone app perform a variety of tasks, such as looking at a math problem on a board and solving it, and viewing Python code on a separate computer and offering suggestions on how to fix it. The most exhilarating (or, depending on where you are on the AI spectrum, terrifying) moment of the demo came when they were simply speaking to the model, asking it to tell stories about robots, and then had it instantly switch the voice it was using to that of an actual robot just by prompting it through speech. The entire time there had been references to the 2013 movie Her, even from Sam Altman on Twitter before the keynote, and the voice used for the 4o model sounded an awful lot like Scarlett Johansson’s sentient operating system from that very same movie. This would lead to a lot of post-keynote controversy that painted Altman as a shady and overbearing tech CEO after Johansson released this scathing letter recounting when Sam reached out to her about being the voice of the 4o model:

This was met with vitriol from the general public and many people outside the AI world, and acted as a microcosm of the ethical debates between AI and the creative community. Afterward, OpenAI would release this article on how it chose the voices for the 4o model and would be vindicated by a recent Washington Post article outlining how there wasn’t any wrongdoing by OpenAI or Altman regarding the voice. After the keynote, it was clear how much impact even the 4o model will have on society, ranging from real-time live translation to virtual tutors, plunging educational support company Chegg’s stock price into a nosedive. OpenAI leads the pack in setting the tone for how the general public perceives and interprets AI and what it can do. To think all of these existential questions came from a 26-minute keynote.

Two days after the OpenAI keynote, Google held its annual developer conference, Google I/O, where it unveiled the latest updates and innovations across its various platforms and products. Many thought that OpenAI hosting its keynote right before Google I/O was a way to step on the momentum of whatever Google would announce, but, to OpenAI’s chagrin, Google made a lot of impactful announcements that were in line with its status as the incumbent powerhouse.

The first thing that stood out at Google I/O was the keynote opener, DJ Marc Rebillet, who literally appeared out of a giant Google cup wearing a Google kimono. While I thought the performance was a lighthearted and fun way for Google to get the 7,000 attendees at the keynote engaged, many others thought it was cringey and out of place. Yes, it looked and felt like a scene from an episode of “Silicon Valley”; it’s no coincidence that show featured a Google-esque company called “Hooli” that served as a stand-in for the stereotypical big tech company with wild and outlandish perks. Google has been reeling from a number of missteps in the last year when it comes to AI, most notably its Gemini model’s image generator, which inserted people of color into prompts calling for historically white figures, producing historically inaccurate images for many prompts. But Google I/O was going to be the catalyst to rectify those previous issues and establish its dominance within the AI space.

The announcement that stood out most in the presentation was that Google’s AI model, Gemini, is expanding to a two-million-token context window, which allows it to ingest and comprehend larger pieces of data such as legal documents and codebases. To give a sense of how large a two-million-token context window is, it would allow the model to hold and analyze an entire human genome sequence. The larger window also lets conversations with Gemini carry much more context, so the model doesn’t forget pertinent details over the course of a long exchange. Just as important, Gemini can analyze multimodal input, just like the GPT-4o model. There was a great demo in the keynote that showed a participant putting on AI-assisted glasses that let them analyze a whiteboard system-design problem just by looking at it, and even identify the objects, and a dog, in the same office as the whiteboard. This demo was part of Project Astra, Google’s AI agents program that lets you give Gemini tasks it will complete autonomously, such as figuring out a piece of code or fetching a receipt from your email to create a return order for a pair of shoes you bought.
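
For a rough sense of what a window that big means in practice, here is a small sketch using Google’s google-generativeai Python library that counts the tokens in a large document and, if it fits, asks Gemini about all of it in one shot. The model name and file path are my own illustrative assumptions, not details from the keynote:

```python
# Rough sketch: check whether a large document fits in Gemini's context window,
# then ask a question about it in a single request.
# Assumes the google-generativeai package is installed and GOOGLE_API_KEY is set;
# the model name and file path are illustrative assumptions.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

with open("entire_codebase.txt", "r", encoding="utf-8") as f:
    document = f.read()

# A two-million-token window means even very large inputs can go in whole.
token_count = model.count_tokens(document).total_tokens
print(f"Document is ~{token_count:,} tokens")

if token_count < 2_000_000:
    response = model.generate_content(
        [document, "Summarize the key modules and how they depend on each other."]
    )
    print(response.text)
```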

One of the more impactful aspects of the keynote, and something that will have ripple effects throughout the creative community, was the announcement of Google’s AI-generated music software, Music AI Sandbox, and its AI-generated video program, Veo. Music AI Sandbox lets users generate loops and sounds just by typing a prompt naming an instrument or even just the style of a particular sound or loop. In this portion of the keynote, there were pre-recorded segments of Wyclef Jean and Marc Rebillet talking about how much of a game changer this is. Who could disagree with that notion? It will make the process of making music even more accessible than it already is, and it has the potential to put a lot of instrumentalists who rely on playing backing parts on tracks out of business. Marc Rebillet had an interesting quote in the segment that speaks volumes about the Music AI Sandbox: “it makes it sound, ironically, at the end of the day more human.”

On the other side of the creative spectrum, Veo allows users to create video clips with high-resolution graphics and even photorealism from a couple of text prompts. A direct competitor to OpenAI’s own AI-generated video platform, Sora (which hasn’t been released to the general public), Veo was on par with the amazing clips shown in the Sora demo. In the Veo demo, Google showed examples of creating a blooming flower for b-roll material and a clip of a high-res racing video game that looked indistinguishable from a AAA title on the PlayStation 5. Donald Glover appeared in one of the pre-recorded segments about Veo, where he gave a quote that is representative of the coming liberation and disruption of creativity in the age of AI: “Everybody’s gonna become a director, and everybody should be a director, because at the heart of all this is just storytelling. The closer we are to being able to tell each other our stories… the more we’ll understand each other.” As with many of these AI tools, there is a constant tension between making the creative industry more accessible and destroying the livelihoods of the creatives who rely on it.

In 2014, Google bought AI company DeepMind, which has become the de facto AI development team for Google and has been responsible for many of the major AI breakthroughs and studies published under Google’s banner. Demis Hassabis, the CEO of DeepMind, talked about their newly released AI model, AlphaFold 3. This cutting-edge model can accurately predict the intricate structures of proteins, DNA, RNA, and ligands, and even how they will interact with each other. While many of the other announcements were in the realm of consumer technology or aimed specifically at developers, this one will have the largest ripple effect on a scientific and commercial level, especially when it comes to drug discovery, a process that typically takes 10 to 15 years and costs an average of 300 million dollars, and that AlphaFold 3 is projected to cut by at least 65%. That means drugs could be developed and released to market faster and be significantly more effective, since discovery time would be shortened. Many in the AI space have also said that AlphaFold 3 and tools like it will let individual patients combat their diseases and health problems more effectively through gene therapy. So, instead of a doctor prescribing a variety of medicines to see what might or might not work, clinicians will be able to analyze your gene pattern and know exactly how to treat whatever disease you’re afflicted with. Because of tools like AlphaFold 3, 20 years from now, death might become optional. Toward the end of the keynote, it was great to see Google CEO Sundar Pichai poke fun at himself by bringing up a counter tallying how many times the word AI had been said, a nod to the viral video meme of all the times AI was said at last year’s Google I/O.
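
Taking those figures at face value, here is a quick back-of-the-envelope sketch of what a 65% cut to a 10-to-15-year, 300-million-dollar process would imply:

```python
# Back-of-the-envelope arithmetic using the figures cited above.
# The 65% reduction is the claim discussed in the text, applied here purely for illustration.
timeline_years = (10, 15)          # typical drug-discovery timeline
average_cost_usd = 300_000_000     # average cost cited above
reduction = 0.65                   # claimed cut of at least 65%

new_timeline = tuple(round(y * (1 - reduction), 1) for y in timeline_years)
new_cost = average_cost_usd * (1 - reduction)

print(f"Timeline: {timeline_years[0]}-{timeline_years[1]} years -> "
      f"{new_timeline[0]}-{new_timeline[1]} years")
print(f"Average cost: ${average_cost_usd:,.0f} -> ${new_cost:,.0f}")
```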

While Google was able to make all of these announcements, it is telling that the GPT-4o model captured more of the general public’s attention, since the brand recognition of OpenAI and its GPT products is what the average person associates with the word AI. Google is a significantly bigger company, with roughly 190,000 employees to OpenAI’s 1,200, yet OpenAI holds 34% of the AI market share to Google’s 10%. The competition between Google and OpenAI is not just a fight over who will have a stranglehold on the tech industry but, because of how powerful AI is, over who will shape the societal and geopolitical forces that drive humanity. OpenAI’s rapid ascent has left the behemoth that is Google scrambling to keep up.