
[AI]Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks


Baidu’s latest ERNIE model, a super-efficient multimodal AI, is beating GPT and Gemini on key benchmarks and targets enterprise data often ignored by text-focused models.

For many businesses, valuable insights are locked in engineering schematics, factory-floor video feeds, medical scans, and logistics dashboards.

Baidu’s new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to fill this gap.

What’s interesting to enterprise architects is not just its multimodal capability, but its architecture. It’s described as a “lightweight” model, activating only three billion parameters during operation. This approach targets the high inference costs that often stall AI-scaling projects.

Baidu is betting on efficiency as a path to adoption, training the system as a foundation for “multimodal agents” that can reason and act, not just perceive.

Complex visual data analysis capabilities supported by AI benchmarks

Baidu’s multimodal ERNIE AI model excels at handling dense, non-text data. For example, it can interpret a “Peak Time Reminder” chart to find optimal visiting hours, a task that reflects the resource-scheduling challenges in logistics or retail.

ERNIE 4.5 also shows capability in technical domains, like solving a bridge circuit diagram by applying Ohm’s and Kirchhoff’s laws. For R&D and engineering arms, a future assistant could validate designs or explain complex schematics to new hires.

This capability is supported by Baidu’s benchmarks, which show ERNIE-4.5-VL-28B-A3B-Thinking outperforming competitors like GPT-5-High and Gemini 2.5 Pro on some key tests:

  • MathVista: ERNIE (82.5) vs Gemini (82.3) and GPT (81.3)
  • ChartQA: ERNIE (87.1) vs Gemini (76.3) and GPT (78.2)
  • VLMs Are Blind: ERNIE (77.3) vs Gemini (76.5) and GPT (69.6)
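To put those figures in perspective, here is a minimal Python sketch that computes ERNIE's lead over its strongest rival on each test. The scores are the ones quoted above; the function and variable names are illustrative only.

```python
# Scores as quoted in the article (vendor-reported benchmark results).
SCORES = {
    "MathVista": {"ERNIE": 82.5, "Gemini": 82.3, "GPT": 81.3},
    "ChartQA": {"ERNIE": 87.1, "Gemini": 76.3, "GPT": 78.2},
    "VLMs Are Blind": {"ERNIE": 77.3, "Gemini": 76.5, "GPT": 69.6},
}


def margin_over_best_rival(scores: dict[str, float], model: str = "ERNIE") -> float:
    """Return the model's lead, in points, over its strongest rival."""
    best_rival = max(value for name, value in scores.items() if name != model)
    return round(scores[model] - best_rival, 1)


MARGINS = {test: margin_over_best_rival(s) for test, s in SCORES.items()}

for test, lead in MARGINS.items():
    print(f"{test}: ERNIE leads the best rival by {lead} points")
```

On these numbers the lead is under one point on two of the three tests, and only ChartQA shows a wide gap.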

It’s worth noting, of course, that AI benchmarks provide a guide but […]. Always perform internal tests for your needs before deploying any AI model for mission-critical applications.

Baidu shifts from perception to automation with its latest ERNIE AI model

The primary hurdle for enterprise AI is moving from perception (“what is this?”) to automation (“what now?”). ERNIE 4.5 claims to address this by integrating visual grounding with tool use.

When asked to find all people wearing suits in an image and return their coordinates in JSON format, the model generates the structured data. That function transfers easily to a production line for visual inspection or to a system auditing site images for safety compliance.
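A downstream system would normally validate that kind of output before acting on it. The Python sketch below shows one way to do so. The reply format (a JSON list of objects with `label` and `bbox` fields, pixel coordinates as `[x1, y1, x2, y2]`) is an assumption for illustration; the article does not specify the schema the model returns.

```python
import json
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


def parse_boxes(raw: str, width: int, height: int) -> list[Box]:
    """Parse a JSON reply and keep only boxes that fit inside the image."""
    boxes = []
    for item in json.loads(raw):
        x1, y1, x2, y2 = item["bbox"]
        # Reject inverted or out-of-frame boxes rather than passing them on.
        if 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height:
            boxes.append(Box(item["label"], x1, y1, x2, y2))
    return boxes


# Hypothetical reply; the second box is malformed (x2 < x1) and is dropped.
reply = (
    '[{"label": "person in suit", "bbox": [40, 30, 180, 400]},'
    ' {"label": "person in suit", "bbox": [500, 20, 450, 380]}]'
)
valid = parse_boxes(reply, width=640, height=480)
print(valid)
```

Checking bounds at this point keeps a single malformed detection from reaching an inspection or compliance pipeline.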

The model also manages external tools and can autonomously zoom in on a photograph to read small text. If it faces an unknown object, it can trigger an image search to identify it. This represents a less passive form of AI that could power an agent to not only flag a data centre error, but also zoom in on the code, search the internal knowledge base, and suggest the fix.
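On the application side, tool use of this kind usually means routing a model-requested call to a function the host system controls. The following is a minimal sketch of that routing in Python. The tool names, the call format, and the stub implementations are all hypothetical; they are not taken from ERNIE's actual tool interface.

```python
from typing import Callable


# Hypothetical stand-ins: real tools would crop an image or query a search service.
def zoom(region: str) -> str:
    return f"zoomed into {region}"


def image_search(query: str) -> str:
    return f"search results for {query}"


TOOLS: dict[str, Callable[[str], str]] = {
    "zoom": zoom,
    "image_search": image_search,
}


def run_tool_call(call: dict[str, str]) -> str:
    """Dispatch one model-requested tool call, rejecting unknown tools."""
    name = call.get("tool", "")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](call.get("argument", ""))


print(run_tool_call({"tool": "zoom", "argument": "top-left corner"}))
```

An explicit allow-list like `TOOLS` is one simple way to keep an agent from invoking anything the operator has not approved.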

Unlocking business intelligence with multimodal AI

Baidu’s latest ERNIE AI model also targets corporate video archives, from training sessions and meetings to security footage. It can extract all on-screen subtitles and map them to their precise timestamps.

It also demonstrates temporal awareness, finding specific scenes (like those “filmed on a bridge”) by analysing visual cues. The clear end-goal is making vast video libraries searchable, allowing an employee to find the exact moment a specific topic was discussed in a two-hour webinar they may have dozed off a couple of times during.
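Once subtitles are mapped to timestamps, the search step itself is straightforward. This Python sketch assumes the extraction has already produced a list of (start time, text) pairs; the `Subtitle` structure and the sample data are illustrative, not output from the model.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start_seconds: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a second count as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_mentions(subtitles: list[Subtitle], keyword: str) -> list[str]:
    """Return timestamps of every subtitle containing the keyword."""
    needle = keyword.lower()
    return [
        format_timestamp(s.start_seconds)
        for s in subtitles
        if needle in s.text.lower()
    ]


# Hypothetical subtitles extracted from a two-hour webinar.
webinar = [
    Subtitle(65.0, "Welcome to the webinar"),
    Subtitle(3725.5, "Now let's discuss the budget forecast"),
    Subtitle(5400.0, "Budget questions from the audience"),
]
mentions = find_mentions(webinar, "budget")
print(mentions)
```

A production system would more likely use a full-text or semantic index than a substring match, but the timestamp mapping is what makes either approach possible.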

Baidu provides deployment guidance for several paths, including transformers, vLLM, and FastDeploy. However, the hardware requirements are a major barrier. A single-card deployment needs 80GB of GPU memory. This is not a tool for casual experimentation, but for organisations with existing and high-performance AI infrastructure.

For those with the hardware, Baidu’s ERNIEKit toolkit allows fine-tuning on proprietary data, a necessity for most high-value use cases. Baidu is providing its latest ERNIE AI model under an Apache 2.0 licence that permits commercial use, which is essential for adoption.

The market is finally moving toward multimodal AI that can see, read, and act within a specific business context, and the benchmarks suggest it’s doing so with impressive capability. The immediate task is to identify high-value visual reasoning jobs within your own operation and weigh them against the substantial hardware and governance costs.

