
Google Demos Multimodal AI That Turns Pictures/Video Into Data

  • Writer: Niv Nissenson
  • Sep 11
  • 2 min read

Google put on quite a show at its recent AI event. The keynote hall was packed to capacity, and people were standing in the aisles. Confidence in Google seemed abundant, and at one point the crowd cheered when one of Google's "Customer Engineers" presented "Nano Banana." Apparently every Google presenter held the title "Customer Engineer" (likely a kind of external-facing product manager).


The Multimodal Table

One of the centerpiece demos was what Google called the "multimodal table." The scenario: a customer success executive reviewing a dashboard that showed revenue inching up but satisfaction scores sliding down. The table held the usual columns: delivery data, customer notes, and feedback text. Some cells, though, were blank.


Then the demo took a turn. The presenter explained that customers increasingly send pictures or even videos of their packages instead of text comments. Google showed how its AI can analyze those images or clips and automatically generate textual insights that populate the table directly, filling in the blank cells.


The practicality of this particular scenario is debatable: if a customer cared enough to upload a photo, they likely wrote a complaint too. Still, the underlying capability is impressive. It means non-text inputs can be transformed into structured text that existing systems can store, query, and analyze.
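To make the data flow concrete, here is a minimal Python sketch of the general pattern the demo suggests. It is not Google's implementation: rows whose text cell is blank but which carry an attached photo or video get that cell filled with text derived from the media. The `FeedbackRow` type, `fill_blank_feedback`, and the stub `fake_describe_media` are all hypothetical names; in a real pipeline the stub would be replaced by a call to a multimodal model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FeedbackRow:
    order_id: str
    feedback_text: Optional[str]  # None when the customer sent only media
    media_path: Optional[str]     # photo or video the customer uploaded, if any


def fill_blank_feedback(rows: List[FeedbackRow],
                        describe_media: Callable[[str], str]) -> int:
    """Fill empty feedback cells using text generated from attached media.

    `describe_media` is a hypothetical stand-in for a multimodal model call.
    Rows that already have text, or have no media, are left untouched.
    Returns the number of cells that were filled.
    """
    filled = 0
    for row in rows:
        if not row.feedback_text and row.media_path:
            row.feedback_text = describe_media(row.media_path)
            filled += 1
    return filled


def fake_describe_media(path: str) -> str:
    # Hypothetical stub: a real system would send the file to a multimodal model.
    return f"[auto-generated from {path}]"


if __name__ == "__main__":
    rows = [
        FeedbackRow("A1", "Arrived late", None),
        FeedbackRow("A2", None, "box_crushed.jpg"),
        FeedbackRow("A3", None, None),
    ]
    print(fill_blank_feedback(rows, fake_describe_media))
    for row in rows:
        print(row)
```

The key design point is that the generated text lands in the same column as typed feedback, so downstream dashboards and queries need no changes to use it.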

The potential use cases extend far beyond e-commerce complaints:

  • Small bakeries could run video QA on cake quality before delivery.

  • Hospitals could monitor cleanliness via video feeds and log automated reports.

  • Retailers could scan shelf photos for stock and merchandising compliance.


Multimodal AI is gaining serious traction. We recently covered Qualified's multimodal agent as well as the expected tripling of the AI image market. Google's multimodal table demo reinforces just how powerful the approach is.


The ability to turn images, sound, and video into searchable, comparable, and actionable text opens up use cases that were previously impractical. This isn't just about customer complaints; it's about bridging the gap between the physical world and structured digital data. And that's a step forward with enormous implications.

