
[AI] Baidu restricts Google and Bing from scraping content for AI training



Chinese internet search provider Baidu has moved to prevent Google and Microsoft Bing from scraping its content.

This change was observed in the latest update to the Baidu Baike robots.txt file, which denies access to the Googlebot and Bingbot crawlers.
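For context, robots.txt is a plain-text file of per-crawler rules that well-behaved crawlers read before fetching pages. It is a convention rather than an access control, so it only stops crawlers that choose to honour it. The sketch below uses Python's standard-library `urllib.robotparser` to show how a file that disallows Googlebot and Bingbot evaluates for different user agents. The rules, the `baike.example.com` URL and the helper names are hypothetical illustrations, not a copy of Baike's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the spirit of the change described above:
# Googlebot and Bingbot are denied everything, every other crawler is allowed.
# This is NOT Baike's real file; it only illustrates the mechanism.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_permissions(parser: RobotFileParser, url: str, agents: list[str]) -> dict[str, bool]:
    """Return {user_agent: True if allowed to fetch url, else False}."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://baike.example.com/item/some-entry"  # hypothetical URL
    perms = crawl_permissions(parser, url, ["Googlebot", "Bingbot", "SomeOtherBot"])
    for agent, allowed in perms.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it prints `Googlebot: blocked`, `Bingbot: blocked` and `SomeOtherBot: allowed`. Rules are matched on the user-agent token, so any crawler not named in the file falls through to the `*` group.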

According to the Wayback Machine, the change took place on August 8. Previously, the Google and Bing search engines were allowed to index Baidu Baike's central repository, which includes almost 30 million entries, although some subdomains of the website were already restricted.

This action by Baidu comes amid increasing demand for the large datasets used to train artificial intelligence models and applications. It follows similar moves by other companies to protect their online content. In July, […] blocked various search engines, except […], from indexing its posts and discussions. […], […], has a financial agreement with […] for data access to train its AI services.

According to sources, in the past year […] considered restricting access to internet-search data for rival search engine operators, particularly those using the data for chatbots and generative AI services.

Meanwhile, the […], with its 1.43 million entries, remains available to search engine crawlers. A survey conducted by the South China Morning Post found that Baidu Baike entries still appear in both Google and Bing search results, possibly because the search engines are still serving older cached content.

The move comes as generative AI developers around the world increasingly work with content publishers to gain access to the highest-quality content for their projects. For instance, OpenAI recently signed an agreement with Time magazine for access to its entire archive, dating back to the magazine's first issue more than a century ago. OpenAI struck a similar partnership with the Financial Times in April.

Baidu's decision to restrict major search engines' access to its Baidu Baike content highlights the growing importance of data in the AI era. As companies invest heavily in AI development, the value of large, curated datasets has increased significantly. This has prompted a shift in how online platforms manage access to their content, with many choosing to limit or monetise access to their data.

As the AI industry continues to evolve, it’s likely that more companies will reassess their data-sharing policies, potentially leading to further changes in how information is indexed and accessed across the internet.

