AI companies are too cheap to pay for legit books
– Big tech companies like OpenAI and Meta are using pirated books from shadow libraries without authorization to train their artificial intelligence models.
– OpenAI trains on Books1 and Books2, two collections of books gathered from the internet, which together make up around 15% of GPT-3's training set.
– Authors who sued OpenAI claim that Books2 contains pirated content from shadow libraries like Library Genesis (LibGen), Z-Library (B-ok), Sci-Hub, and Bibliotik.
– Meta also utilizes a dataset named Books3, containing over 170,000 books from the last two decades, to train its language models.
– The practice of using pirated books as inputs for AI programs is raising concerns about its ethical implications and about the future of reading and communication.
– Copyright lawsuits have been initiated against OpenAI for using authors' content without consent and compensation.
– The use of pirated content highlights the disparity between tech companies' profits and authors' compensation, with some authors earning significantly less than tech employees.
– OpenAI's valuation reached $29 billion in June, yet the company has also faced allegations that Kenyan workers who worked on ChatGPT were underpaid.
– Meta, despite announcing significant investments in AI, has faced accusations of exploiting subcontracted workers and suppressing union efforts.
– Google has invested in Anthropic, a company founded by ex-OpenAI employees, and is working on AI chatbots, but reports suggest that workers hired to train these models are overworked and underpaid.
– The unethical use of pirated content for AI training raises questions about the practices of these tech companies and their impact on various stakeholders.