    New version of Gemini beats other AIs at math, science, and reasoning

    Google’s new Gemini 2.5 Pro is smarter than rival AI models at reasoning, science, and coding.

    This is according to a series of benchmark results Google posted on Thursday. In short, Gemini 2.5 Pro beats its chief competitors at nearly everything — though we’re sure the companies behind those competitors would disagree.

    According to Google’s data, Gemini 2.5 Pro has a healthy lead over OpenAI o3, Claude Opus 4, Grok 3 Beta, and DeepSeek R1 on Humanity’s Last Exam, a benchmark that evaluates a model’s math, science, knowledge, and reasoning. It also leads at code editing (per the Aider Polyglot benchmark) and beats all competitors on several factuality benchmarks, including FACTS Grounding, meaning it’s less likely to produce factually inaccurate text.

    The only benchmark in which Gemini 2.5 Pro isn’t a clear winner is the mathematics-focused AIME 2025, and even there the differences between results are pretty small.

    Thanks to these improvements, Gemini 2.5 Pro now sits atop the LMArena leaderboard with a score of 1470.

    There’s a catch, though: The final version of Gemini 2.5 Pro isn’t widely available yet. Google calls this latest version an “upgraded preview,” with a stable version coming “in a couple of weeks.” The preview should now be available in the Gemini app, though.
