
AI companies are reportedly still scraping websites despite protocols meant to block them



Perplexity, a company that describes its product as “a free AI search engine,” has been under fire over the past few days. Shortly after one publication accused it of stealing its story and republishing it across multiple platforms, Wired reported that Perplexity has been ignoring the Robots Exclusion Protocol, or robots.txt, and has been scraping its website and other Condé Nast publications. Another technology website also accused the company of scraping its articles. Now, Reuters has reported that Perplexity isn’t the only AI company that’s bypassing robots.txt files and scraping websites to get content that’s then used to train their technologies.

Reuters said it saw a letter addressed to publishers from TollBit, a startup that pairs them up with AI firms so they can reach licensing deals, warning them that “AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites.” The robots.txt file contains instructions for web crawlers on which pages they can and can’t access. Web developers have been using the protocol since 1994, but compliance is completely voluntary.
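
To illustrate how the protocol works and why compliance is voluntary, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleAIBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and only serve the illustration. They do not describe any real crawler or site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler everywhere,
# and block every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in this check is enforced by the website: the crawler itself has to run it and choose to honor the answer, so a crawler that skips the check can still request the page.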

TollBit’s letter didn’t name any company, but another outlet says it has learned that OpenAI and Anthropic, the creators of the ChatGPT and Claude chatbots, respectively, are also bypassing robots.txt signals. Both companies previously proclaimed that they respect “do not crawl” instructions websites put in their robots.txt files.

During its investigation, Wired discovered that a machine on a server “certainly operated by Perplexity” was bypassing its website’s robots.txt instructions. To confirm whether Perplexity was scraping its content, Wired provided the company’s tool with headlines from its articles or short prompts describing its stories. The tool reportedly came up with results that closely paraphrased its articles “with minimal attribution.” At times, it even generated inaccurate summaries of its stories: in one instance, Wired says, the chatbot falsely claimed that it had reported on a specific California cop committing a crime.

In an interview with Fast Company, Perplexity CEO Aravind Srinivas told the publication that his company “is not ignoring the Robot Exclusions Protocol and then lying about it.” That doesn’t mean, however, that it isn’t benefiting from crawlers that do ignore the protocol. Srinivas explained that the company uses third-party web crawlers on top of its own, and that the crawler Wired identified was one of them. When Fast Company asked if Perplexity told the crawler provider to stop scraping Wired’s website, he only replied that “it’s complicated.”

Srinivas defended his company’s practices, telling the publication that the Robots Exclusion Protocol is “not a legal framework” and suggesting that publishers and companies like his may have to establish a new kind of relationship. He also reportedly insinuated that Wired deliberately used prompts to make Perplexity’s chatbot behave the way it did, so ordinary users will not get the same results. As for the inaccurate summaries that the tool had generated, Srinivas said: “We have never said that we have never hallucinated.”







