Anthropic secretly scanned millions of books to train Claude AI


Anthropic ran a secret project in early 2024 to scan millions of books for training its Claude AI system. Internal documents unsealed in legal filings last week revealed the effort, which the company wanted to keep quiet. According to The Washington Post, planning documents described the initiative as "Project Panama" and stated the company aimed to "destructively scan all the books in the world."

Company Sought to Hide Scanning Operation

The internal planning document explicitly said Anthropic did not want the project to become public knowledge. The company bought physical copies of books and scanned them to extract text data. After scanning, the company discarded the physical books.

The destructive scanning method allowed Anthropic to gather training data from a vast library of written works. The project operated while AI companies faced growing scrutiny over how they acquire training data. Many authors and publishers have sued AI firms for using copyrighted materials without permission.

Court documents filed last week brought Project Panama to light. The unsealed materials showed how the company planned and executed the book scanning effort. The filings did not specify the exact number of books scanned or the total cost of the operation.

Silicon Valley’s Data Collection Methods

Anthropic’s approach reflects broader patterns in how AI companies build their systems. Firms often buy or collect massive amounts of data to train language models. The practice has sparked debates about copyright, fair use, and intellectual property rights.

The company develops Claude, a chatbot that competes with OpenAI’s ChatGPT and Google’s Gemini. Training such systems requires enormous amounts of text data. Books provide high-quality writing samples that help AI models learn language patterns and factual knowledge.

AI companies argue that training on published works falls under fair use. Authors and publishers counter that such use violates copyright law and deprives creators of compensation. Multiple lawsuits against major AI firms remain pending in U.S. courts.
