Google Launches Gemini 2.0 with Autonomous Tool Linking

Google is embracing “agentic experiences” in the rollout of Gemini 2.0, its new flagship family of generative AI models, which is expected to compete with OpenAI’s ChatGPT with o1, GitHub Copilot, and Amazon Nova.

The tech giant released the first model, Gemini 2.0 Flash, on Dec. 11 for global developers through the Gemini API in Google AI Studio and Vertex AI. Consumers can expect Gemini 2.0 to impact Google Search and AI Overviews, with limited testing beginning next week. A public rollout is set for early 2025.

Through Gemini 2.0, developers can access multimodal input and text output, while early access partners can test text-to-speech and native image generation. The Gemini app will be updated with Gemini 2.0 Flash “soon,” Google said in a press release.

General availability, and additional model sizes such as the base model Gemini 2.0, are expected to follow in January.

What is Gemini 2.0?

Gemini 2.0 is a multimodal generative AI model running on Google’s Trillium hardware. It is designed to make online tasks easier and more intuitive by assisting with summarizing information, performing web searches, and even interacting with tools or apps more naturally.

Google noted that Gemini 2.0 Flash is twice as fast as its predecessor, Gemini 1.5 Pro, and outperforms that model on AI benchmarks such as MMLU-Pro and LiveCodeBench.

“If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful,” Google CEO Sundar Pichai said in a statement.

What sets Gemini 2.0 apart is its agentic capabilities. Pichai described these capabilities as enabling the model to “understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”

Google further emphasized that Gemini 2.0 distinguishes itself through:

  • Multimodal processing.
  • The ability to understand long books or wide swaths of the web.
  • Function calling.
  • “Native tool use.”
  • “Complex instruction following and planning.”

Native tool use allows the AI to incorporate tools like Google Search and code execution to perform autonomous actions. In practical terms, that sometimes looks like Google’s Project Astra — an Android app now in testing that uses the phone’s camera and Gemini’s reasoning to answer questions about the world in real time. Project Astra can analyze up to 10 minutes of video at a time.
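To give a rough sense of what function calling means in practice, here is a minimal, generic sketch in Python. It does not use the Gemini API. The tool names (`web_search`, `add_numbers`), the JSON call format, and the `dispatch` helper are all hypothetical and chosen only for illustration. The idea is that a model emits a structured request naming a tool and its arguments, and the host application runs the matching function and returns the result.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry. These names are illustrative assumptions,
# not part of Google's API.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Register a function so a model-issued call can reach it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def web_search(query: str) -> str:
    # Stand-in for a real search backend; returns a canned string.
    return f"(stub) top results for: {query}"


@tool
def add_numbers(a: float, b: float) -> float:
    # Stand-in for a code-execution style tool.
    return a + b


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call (assumed format) and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("args", {})
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return str(TOOLS[name](**args))


if __name__ == "__main__":
    # Simulated model output; a real model would generate this string.
    fake_call = json.dumps({"name": "add_numbers", "args": {"a": 2, "b": 3}})
    print(dispatch(fake_call))
```

In a real agentic system the result would be passed back to the model so it can decide on its next step. This sketch covers only the dispatch half of that loop.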

Google also announces additional projects, prototypes

Project Mariner

Another proof of concept is Project Mariner, an experimental Chrome extension showcasing Google’s effort to enable Gemini to read browser screens. Users can ask it to summarize web pages or make a purchase.

“It’s still early, but Project Mariner shows it’s becoming technically possible to navigate within a browser, even though it’s not always accurate and slow to complete tasks today, which will improve rapidly over time,” Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind, wrote in the press release.

SEE: Google revealed specialized image and video generation AI models in early December, too.

Deep Research

Deep Research, available with a Gemini Advanced subscription, is an experimental model connected to the web. It is designed to create research plans and outlines for grad students, scientists, or entrepreneurs. The tool searches the web for the topic of your choice, presents a research plan to approve or change, and then analyzes the existing body of work.

Jules developer assistant

Google also announced a new developer tool called Jules, a coding assistant powered by Gemini 2.0 Flash. Jules sits within GitHub and can write code, fix bugs, and create and execute multi-step plans. Jules is available to a limited pool of testers today. Google expects expanded availability in early 2025.

Google is preparing for cyber threats

Google also noted that it is aware that Project Mariner, in particular, might be a rich hunting ground for prompt injection attacks. The company said it is working on guardrails against phishing and fraud attempts in which attackers sneak AI instructions into emails, websites, or documents.


