Google’s Gemini Robotics AI Model Reaches Into the Physical World

In sci-fi tales, artificial intelligence often powers all sorts of clever, capable, and occasionally homicidal robots. A revealing limitation of today’s best AI is that, for now, it remains squarely trapped inside the chat window.

Google DeepMind signaled a plan to change that today—presumably minus the homicidal part—by announcing a new version of its AI model Gemini that fuses language, vision, and physical action together to power a range of more capable, adaptive, and potentially useful robots.

In a series of demonstration videos, the company showed several robots equipped with the new model, called Gemini Robotics, manipulating items in response to spoken commands: Robot arms fold paper, hand over vegetables, gently put a pair of glasses into a case, and complete other tasks. The robots rely on the new model to connect items that are visible with possible actions in order to do what they’re told. The model is trained in a way that allows behavior to be generalized across very different hardware.

Google DeepMind also announced a version of its model called Gemini Robotics-ER (for embodied reasoning), which is limited to visual and spatial understanding. The idea is for other robot researchers to use this model to train their own models for controlling robots’ actions.

In a video demonstration, Google DeepMind’s researchers used the model to control a humanoid robot called Apollo, from the startup Apptronik. The robot converses with a human and moves letters around a tabletop when instructed to.

“We’ve been able to bring the world-understanding—the general-concept understanding—of Gemini 2.0 to robotics,” said Kanishka Rao, a robotics researcher at Google DeepMind who led the work, at a briefing ahead of today’s announcement.

Google DeepMind says the new model is able to control different robots successfully in hundreds of specific scenarios not previously included in its training. “Once the robot model has general-concept understanding, it becomes much more general and useful,” Rao said.

The breakthroughs that gave rise to powerful chatbots, including OpenAI’s ChatGPT and Google’s Gemini, have in recent years raised hope of a similar revolution in robotics, but big hurdles remain.


