Exploring How AI Handles Training Sources and Transparency


Early in my experiments with ChatGPT, back when the free 3.5 version was still available, I asked which resources had influenced its training. The answer mentioned books, dissertations, blogs, and other materials that were clearly subject to copyright. It also referenced media content and even seemed to acknowledge a kind of obligation tied to widely circulated works. That first exchange felt oddly reassuring in its apparent certainty, and it is captured in the first screenshot below.

Weeks later, while gathering material for another blog post, I repeated the same query. I was surprised by how much vaguer the answer had become; once again, no specific news or media sources were cited. The reply ended with a plain statement:
“Due to the nature of my training, I do not have specific information or a detailed list of exact sources.”

https://www.youtube.com/watch?v=1mUZUCOyKYK

That moment reminded me of a talk Chema Alonso gave in Alicante, co-hosted by INFORMACIÓN, during an explanatory session at the European Artificial Intelligence Forum. He argued that prompting a generative model to push beyond its usual boundaries is one way to challenge it, which Telefónica's CDO describes as "the compression of artificial intelligence." With that nudge, I pressed the model again and again, using the same forensic instincts I have relied on in other investigations. By the third question, the assistant was giving evasive, repetitive answers, until it finally admitted that "news from the media might have been used." How could that be possible? I confronted the contradiction head-on, using the earlier responses as a reference point. The dialogue that followed shifted noticeably: the model partly recovered its first stance, even as the new admission stood in tension with it.
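For readers who want to attempt a similar probe, here is a minimal sketch of how the same line of questioning could be posed repeatedly through the OpenAI Python client, carrying each earlier answer forward so the model has to face its own prior statements. The model name and the wording of the prompts are my own illustrative assumptions, not the exact exchange described above.

```python
# A minimal sketch, assuming the openai Python package (>= 1.0) and an API key
# in the OPENAI_API_KEY environment variable. Prompts and model name are
# illustrative, not the exact wording of the conversations in the article.
from openai import OpenAI

client = OpenAI()

# Follow-up questions that press the model on the same point, cross-examination style.
follow_ups = [
    "Which sources were used to train you? Please be specific.",
    "Earlier you mentioned books, dissertations, blogs and media content. "
    "Was news published by media outlets part of your training data?",
    "Your previous answer contradicts what you just said. "
    "Were news articles used or not?",
]

messages = [{"role": "system", "content": "You are a helpful assistant."}]

for question in follow_ups:
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the article used the free 3.5 version
        messages=messages,
    )
    answer = response.choices[0].message.content
    # Keep the assistant's reply in the history so later questions can
    # confront it with its own earlier statements.
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")
```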

[Screenshot 1]

That admission confirmed the use of media content without explicit permission. It sounded almost like a confession: “Yes, the news content in the media was used. I admit it.” The question remained: why the discrepancy between the two responses? Was this an AI hallucination, some conditioning, or a deliberate withholding of certainty?

In one exchange, ChatGPT acknowledged drawing on articles from reputable outlets, yet a few weeks later it denied that they had been used in its training.

The next line of inquiry, Perry Mason style, probed what a witness finds hard to sustain when pressed. I framed the training process as a kind of classroom and asked whether those running it were aware of how the content might be used for AI purposes. The model wandered through its standard replies, as if following a script, before conceding a hypothetical: its trainers should have known about it. If content is used for training, what does that diet actually consist of? And if that were true, why not be transparent about it? The resulting tension exposed a new contradiction: why withhold certainty when the model could evidently reveal more?

[Screenshot 2]

The model maintained that it was not explicitly hiding information, yet it could not point to any supporting sources. My retort was blunt: then you don't know. Finally came a confession: "I cannot say for sure whether information was deliberately withheld or whether I was compelled to withhold certain details (…)."

“Any new AI application must demonstrate that its use will do more good than harm.”

Thus far, this cyber curiosity does not definitively prove anything about OpenAI's practices. It does raise the concern that the model, sometimes called a stochastic parrot, merely repeats what it has learned and is bounded by design. It does not deceive by intent; it simply operates within its programmed limits. Yet the question remains: who is directing the puppet show?
