In the rapidly evolving landscape of artificial intelligence, major tech companies are using published books to train their AI models. The practice is controversial: it raises ethical questions about the appropriation of authors’ work, the legality of relying on pirated content, and the overall impact on the literary and technological realms.
Unveiling a Disturbing Trend
Recent revelations have shed light on the practices of AI giants such as OpenAI and Meta, which have reportedly used pirated books from shadow libraries to train their large language models.
This practice not only bypasses the need to obtain authorization from authors but also denies those authors their sales royalties. An investigation published by The Atlantic exposed the extent to which companies like OpenAI and Meta engage in this practice, exploiting pirated content without compensating the creators.
AI’s Literary Training Ground
OpenAI uses two extensive internet-based collections of books, known as Books1 and Books2. Together they account for approximately 15% of the training data for its flagship GPT-3 model. This method of acquiring content is controversial. Authors have sued OpenAI, alleging in court filings that its datasets incorporate pirated books from shadow libraries such as Library Genesis (LibGen), Z-Library (formerly B-ok), Sci-Hub, and Bibliotik.
Similarly, Meta uses a dataset named Books3, which contains over 170,000 books, most of them published within the last two decades. This corpus has also served as a resource for training other language models. The implications of these practices are far-reaching, because they change the way we consume and interact with written content. The future of AI is being shaped by these “stolen words,” in the phrase of Alex Reisner, a writer for The Atlantic.
The Conundrum of Compensation
The heart of the issue lies in the gap between the enormous value captured by these tech giants and the absence of compensation for the authors whose books they use. OpenAI, with a valuation that has reached $29 billion, employs software engineers who can earn up to $370,000 a year. The same cannot be said for authors, many of whom struggle to earn a fraction of that income from their literary creations.
The glaring dichotomy between the financial prosperity enjoyed by tech employees and the compensation withheld from authors is a source of unease within the industry. This raises questions about the moral responsibility of these companies to ensure that the creators of the content fueling their innovations are justly rewarded for their contributions.
Ethical Lapses and Labor Exploitation
The pursuit of AI advancement seems to come at the expense of ethical considerations. OpenAI’s controversial labor practices, including the alleged underpayment of Kenyan workers who helped refine ChatGPT, spotlight the lengths to which some companies will go to minimize costs. According to reports, these workers earned just $1.32 to $2 per hour, a far cry from the minimum wage in California, where OpenAI is based.
Similarly, Meta’s ambitious investments in AI have drawn attention to the labor conditions of subcontracted employees. Accusations of poor working conditions and the stifling of union organizing efforts have raised alarms about the ethical foundation of the company’s practices. The tension between AI’s potential to revolutionize industries and the treatment of the workers enabling this transformation underscores the multifaceted challenges facing the tech sector.
A Glimpse of the AI Landscape
As AI continues its relentless march forward, the intersections of technology, literature, and ethics become increasingly intricate. The reliance on pirated content from shadow libraries raises fundamental questions about intellectual property, fair compensation, and the future of creativity. The narrative being woven by AI models is complex, entwining both innovation and moral obligations.
In conclusion, the issue of AI companies resorting to pirated books for training underscores the delicate balance between technological progress and ethical responsibilities. The time has come for the industry to grapple with these concerns, crafting a future where creativity is valued, compensation is equitable, and the promise of AI is written with words that are rightfully obtained and acknowledged.