Google’s Gemini 3 Outshines Competitors in AI Benchmark Tests


MOUNTAIN VIEW, California – Google has officially launched its Gemini 3 large language model, marking a significant advancement in the company’s artificial intelligence capabilities. Released this week, Gemini 3 surpassed ChatGPT and other AI competitors on industry-standard benchmark tests, positioning it as the most capable AI chatbot released to date.

During extensive internal testing, Google employees praised the model’s performance. Tulsee Doshi, senior director of product management for Gemini, noted an impressive demonstration of the model’s ability to write in Gujarati, a language not commonly found online. “I call it signs of life, right? People were coming back and saying, ‘I feel it, I think we’ve hit on something,’” said Doshi.

Early tests were conducted by companies like Box, whose CEO Aaron Levie reported that their evaluations showed Gemini 3 outperforming earlier models by substantial margins. “We kind of had to squint and be like, ‘OK, did we do something wrong in our eval?’ because the jump was so big,” Levie stated.

The launch of Gemini 3 gives Google a crucial victory in the ongoing AI race, particularly against competitors such as OpenAI and Anthropic. “They are AI winners, that’s pretty clear,” commented Michael Nathanson, an analyst at MoffettNathanson.

Google’s efforts to revitalize its AI strategy have been apparent since the emergence of ChatGPT three years ago. The company aimed to stay relevant in a landscape where many fear that traditional search engines might lose traffic to more conversational AI interfaces.

During a recent developers conference, Google outlined its commitment to AI, unveiling a range of sophisticated products and a revamped search engine that integrates features of Gemini 3. Robby Stein, vice president of product for search, shared the impact of the new model through an interactive simulation that visually explained concepts to users, enhancing the learning experience.

“I was like, ‘Wow, this actually can be capable of presenting information in the best way given the question,’” Stein reflected.

As Google prepares to offer Gemini 3 to subscribers, the company anticipates a surge in user engagement. Assessments of the model’s performance included Vending-Bench, an evaluation that tests logical reasoning and planning abilities in a simulated vending machine business. Doshi highlighted this test as a pivotal demonstration of Gemini 3’s strengths.

With the potential to redefine online interactions, Gemini 3 not only showcases advanced reasoning and analysis but also raises the competitive stakes in the fast-evolving landscape of AI technology.