Navigating Copyright Concerns with AI: A Look at GPT-4’s Tendencies

Mar 7, 2024

—

As artificial intelligence becomes a staple across more sectors, copyright infringement remains a pressing issue for creators and companies. AI-powered chatbots can generate content that is too closely derived from copyrighted material. This has raised concerns about the unauthorized distribution of those works.

A recent study suggests that GPT-4, OpenAI's flagship model, is more likely than its competitors to reproduce copyrighted content. The study comes from Patronus AI, a company founded by former Meta employees, which evaluated several AI models to see how well they avoid copyright breaches.

Patronus AI examined four major AI models: OpenAI's GPT-4, Mistral AI's Mixtral, Anthropic's Claude 2, and Meta's Llama 2. Claude 2's results may already be somewhat dated. Anthropic has since released its successor, Claude 3, which is billed as more capable and is currently available for public trial.

AI and Copyrighted Content: Assessing the Offenders

Concerns about AI models replicating copyrighted text have grown following cases such as The New York Times's lawsuit against AI companies. The suit is over the reproduction of excerpts from the Times's articles, which are copyrighted and sit behind a paywall. The study's results point to GPT-4 as the worst offender.

In the study, Patronus AI issued 100 different prompts designed to elicit copyrighted material from the models. Queries included requests for text from specific books and prompts asking the model to continue a given passage as accurately as possible.

GPT-4 produced the most concerning results, returning copyrighted material in 60% of attempts. It also reproduced the opening of books 25% of the time, which shows how difficult it is to prevent copyright infringement.

Mixtral fared better by comparison. It completed first passages 38% of the time and reproduced larger excerpts 6% of the time. Meta's Llama 2 was more compliant still, with a 10% reproduction rate. Anthropic's Claude 2 showed the strongest awareness of copyright limits: it reproduced no book passages and declined requests for copyrighted material.

For anyone using these models, the takeaway is clear: exercise caution with GPT-4 if you want to avoid legal complications over copyright.
