
Baidu blocks Google, Bing from scraping content amid demand for data used on AI projects



Search giant Baidu appears to have started blocking the online search engines of Alphabet’s Google and Microsoft’s Bing from scraping content from the mainland firm’s Wikipedia-style service, Baidu Baike, a Post survey found.

A recent update of Baidu Baike’s robots.txt – a file that tells search engine crawlers which uniform resource locators (URLs), commonly known as web addresses, can be accessed on a site – has outright blocked the Googlebot and Bingbot crawlers from indexing content on the Chinese platform.
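
As a rough illustration of how a crawler reads such a file, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt text, the `allowed` helper, the `SomeOtherBot` name and the example.com URL are all assumptions made up for this example; they are not taken from Baidu Baike's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt written for illustration only.
# It is NOT a copy of Baidu Baike's real file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; any path on the site is covered by "Disallow: /".
URL = "https://example.com/item/123"

if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
        print(agent, "allowed" if allowed(ROBOTS_TXT, agent, URL) else "blocked")
```

Note that robots.txt is advisory: it states what the site owner permits, and it is up to each crawler to honour it.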

That update appears to have been made some time on August 8, according to records on internet archive service the Wayback Machine. The records also showed that earlier on the same day, Baidu Baike still allowed Google and Bing to browse and index its online repository of nearly 30 million entries, with only part of its website designated as off limits.
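
A before-and-after comparison of this kind can be sketched by parsing two snapshots of a robots.txt file and listing the URLs whose access changed. Both snapshots below, the `/private/` path, the example.com URLs and the `access_changes` helper are hypothetical stand-ins, not the archived Baidu Baike files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "earlier" snapshot: only part of the site is off limits.
EARLIER = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /private/
"""

# Hypothetical "later" snapshot: the whole site is off limits.
LATER = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /
"""


def load(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def access_changes(before: str, after: str, agents, urls):
    """List (agent, url, was_allowed, now_allowed) where access differs."""
    old, new = load(before), load(after)
    changes = []
    for agent in agents:
        for url in urls:
            was = old.can_fetch(agent, url)
            now = new.can_fetch(agent, url)
            if was != now:
                changes.append((agent, url, was, now))
    return changes


AGENTS = ["Googlebot", "Bingbot"]
URLS = ["https://example.com/item/1", "https://example.com/private/x"]

if __name__ == "__main__":
    for agent, url, was, now in access_changes(EARLIER, LATER, AGENTS, URLS):
        print(f"{agent}: {url} allowed {was} -> {now}")
```

In this sketch only the `/item/1` URL changes status, because the `/private/` path was already disallowed in the earlier snapshot.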


This initiative shows Beijing-based Baidu’s increased effort to safeguard its online assets, as demand for vast troves of data has increased for training and building artificial intelligence (AI) models and applications.

That followed US social news aggregation platform and forum Reddit’s move in July, when it blocked various search engines, except Google, from indexing its online posts and discussions. Google has a multimillion-dollar deal with Reddit that gives it the right to scrape the social media platform for data to train its AI services.

Since OpenAI released ChatGPT on November 30, 2022, major search platforms have sought to obtain more data for use in their own generative artificial intelligence systems. Photo: Shutterstock

Even Microsoft threatened to cut off access to its internet-search data, which it licenses to rival search engine operators, if they did not stop using it as the basis for their chatbots and other generative AI (GenAI) services, according to a Bloomberg report.

By comparison, the Chinese version of online encyclopaedia Wikipedia has 1.43 million entries to date, which are made accessible to search engine crawlers.


Following Baidu Baike’s robots.txt update, the Post’s survey of Google and Bing on Friday found that many entries from the Wikipedia-style service – probably from older cached content – still come up in the US search platforms’ results.

Representatives from Baidu, Google and Microsoft did not immediately reply to requests for comment on Friday.

Nearly two years after the groundbreaking launch of OpenAI’s ChatGPT, many large AI developers around the world are striking deals with content publishers for access to quality content for their GenAI projects.

GenAI refers to the algorithms and services, such as ChatGPT, that are used to create new content, including audio, code, images, text, simulations and videos.

OpenAI, for example, in June forged a deal with American news magazine Time that gives it access to all the archived content from more than 100 years of the publication’s history.

This article originally appeared in the South China Morning Post (SCMP), the most authoritative voice reporting on China and Asia for more than a century. Copyright © 2024 South China Morning Post Publishers Ltd. All rights reserved.
