Apple, Anthropic and Other AI Firms Have Reportedly Trained AI Models on Thousands of YouTube Videos


Apple, Anthropic, and other major artificial intelligence (AI) firms have reportedly trained AI models on data from hundreds of thousands of YouTube videos. A new report claims that multiple AI companies used a publicly available dataset called the Pile, which contained the plain text of the videos’ subtitles without any video imagery. The data was collected from popular YouTube creators such as MrBeast, Marques Brownlee, and PewDiePie, as well as Indian YouTube creators such as CarryMinati, BB ki Vines, and Ashish Chanchlani.

Multiple AI Models Reportedly Trained on YouTube Videos

Proof News conducted an investigation and found that subtitle data from as many as 173,536 YouTube videos was taken from more than 48,000 channels. As per the report, EleutherAI, a non-profit AI research lab, curated this dataset. Later, it was used by companies such as Apple, Anthropic, Nvidia, Salesforce, and more. Notably, the AI lab published a research paper highlighting the details of the dataset.

EleutherAI created an 800GB data repository dubbed the Pile and made it publicly available for those who wanted to train AI models but could not afford large datasets. The majority of the dataset was taken from publicly available sources such as English Wikipedia, e-books, and more. However, it also contained the subtitles from all the videos compiled in a dataset called YouTube Subtitles.

Based on the description in its research paper, the report claimed that the Pile was used to train Apple’s OpenELM AI model. Research papers for AI models from Salesforce, Nvidia, and Anthropic also reportedly mention using the dataset.

Anthropic spokesperson Jennifer Martinez told the publication in a statement, “The Pile includes a very small subset of YouTube subtitles. YouTube’s terms cover direct use of its platform, which is distinct from use of the Pile dataset. On the point about potential violations of YouTube’s terms of service, we’d have to refer you to the Pile authors.”

Notably, YouTube’s terms of service prohibit anyone from accessing the videos on the platform using automated means such as robots, botnets or scrapers. The YouTube Subtitles dataset would fall under the scraping category. A company spokesperson told Proof News in an email response that the tech giant has taken “action over the years to prevent abusive, unauthorised scraping.” However, no comments were made about AI firms’ usage of the data.

In a post on X (formerly known as Twitter), Marques Brownlee called out Apple for sourcing data from companies that included his videos’ transcripts, but he also highlighted that it was not the iPhone maker’s fault since they did not collect the data.

While this dataset was collected and distributed publicly, there could be other instances of data scraping on platforms such as YouTube. With AI firms scrambling to find more data to train their large language models (LLMs), data procurement might continue to enter similar legally grey areas.



