Is the Internet To Blame for GenAI Copyright Sweep?
- By David Stephen
- January 02, 2024
Roughly a century of universal education has produced such an abundance of knowledge that all of it seems ripe for capture by generative AI. Much of it is hidden away, so that only something like generative AI, scraping resources from everywhere, could connect the dots.
There are different ways to value work, and workers differ in what they feel makes wages commensurate.
Though mostly monetary, remuneration may include other elements that matter more to some than to others. Even with digital development, those differences might have stayed limited had the internet, a platform for scale that is crowded with people filling it with content, not altered the usual terms of labor.
The internet altered product and service dynamics, such as where payment is expected and for what. It is also where content is sought or found for free, sometimes delivered by those who obtained it illegally.
The internet also boosted anonymity, and the freedom it brought felt like a reward of its own. The internet, especially social media, is a destination for many content suppliers because there is a lot to gain there: views, likes, reposts and more.
There is also the situation where many providers let people use their services in exchange for viewing ads. Many people on the internet already survive or earn in other ways, so posting does not seem like work that requires pay. Some post leisurely from work, others from their parents' home or from school, and so forth.
For some works on the internet, the pay came from elsewhere, as with many research papers whose studies were funded by grants. Others use the internet for different objectives and accept not being paid directly.
It is against these backdrops, and more, that generative AI was trained and became excellent. LLMs were trained on the works of people who, having already accepted the one-sided user agreements of some social platforms, did not expect an extra layer of collection.
The problem is not that cases for copyright infringement cannot be made; the problem is that the internet can be one humongous piracy sinkhole, where policing is easier for a few prominent or standard works, and the rest are parasitized.
The internet arrived as an alternative to the real world, and within it, a continuous supply of alternatives is one of its strengths.
Generative AI may have scooped up several otherwise lost works. However, without the internet, it would have been difficult to build LLMs whose owners did not pay, in some form, for training data, even for a localized wiki.
The future could be different, with users making new demands on platforms, especially for concessions if their content is to be used for training. That could take the form of credit toward something on the platform, depending on the volume or quality of the individual's posts.
Though artists may need bigger platforms to promote their work, they should deliberately steer some of their best works away from those platforms, especially if the platforms use them to train generative AI that seeks to replace artists.
The internet broke a lot of traditional gatekeeping but has now exposed many independents to having their work scraped.
If these independents find an approach that works for small teams of artists, they may choose to form groups elsewhere, where their works would be shielded from the web crawlers that gather LLM training data.
The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends. Image credit: iStockphoto/Diki Prayogo
David Stephen
David Stephen's research in conceptual brain science is centered on the electrical and chemical impulses of neurons for a model of the human mind applicable to mental health, consciousness, neurotechnology, and artificial intelligence. He was a visiting scholar in medical entomology at the University of Illinois Urbana-Champaign, UIUC, IL. He did computer vision research at Rovira i Virgili University, URV, Tarragona.