The intellectual property saga: The age of AI-generated content | Part 1
Patent authorities globally are grappling with the challenge of redefining their approach to inventions generated not by human ingenuity but by AI, a development that has sparked considerable debate within the intellectual property community. This analysis opens a three-part series on the influence of AI on intellectual property rights.
As AI advances rapidly, machines are acquiring increasingly human-like skills, blurring the distinction between human and machine. Traditionally, computers were tools that assisted human creativity, with a clear division: humans held sole ownership and authorship. Recent AI developments, however, enable machines to perform creative tasks independently – from complex functions such as software development to artistic endeavours like composing music, generating artwork, and even writing novels.
This has sparked debate about whether creations produced by machines should be protected by copyright and patent laws. The questions of ownership and authorship are equally complex: should credit go to the machine itself, to the humans who created the AI, to the authors of the works the AI draws on – or to none of the above?
This essay, the first in the series, elucidates the relationship between AI-generated content and copyright. The following essays will assess the ramifications of AI for trademarks and patents, as well as the strategies employed to safeguard intellectual property (IP) in the age of AI.
Understanding IP and the impact of AI
In essence, IP encompasses a range of rights aimed at protecting human innovation and creativity. These rights include patents, copyrights, trademarks, and trade secrets. They serve as incentives for people and organisations to invest their time, resources, and intelligence in developing new ideas and inventions. Current intellectual property rules and laws focus on safeguarding the products of human intellectual effort.
Google recently provided financial support for an AI project designed to generate local news articles. Back in 2016, a consortium of museums and researchers based in the Netherlands unveiled a portrait named ‘The Next Rembrandt’, created by a computer that had meticulously analysed numerous works by the 17th-century Dutch artist Rembrandt Harmenszoon van Rijn. In principle, this artwork could be seen as ineligible for copyright protection due to the absence of a human creator, meaning it could be used and reused without limitation by anyone. This presents a major obstacle for companies selling such creations: because the art is not protected by copyright, anyone worldwide can use it without paying for it.
Hence, when it comes to creations with little to no human involvement, the situation becomes more complex and blurred. Recent copyright rulings have taken two distinct approaches.
One approach is to deny copyright protection to works generated by AI, potentially allowing them to fall into the public domain. Most countries have adopted this stance. In 2022, the US Copyright Office reinforced it by refusing to register an AI-generated image, stating that AI lacks the human authorship necessary for a copyright claim. Patent offices worldwide have reached comparable conclusions in the parallel DABUS cases, which concern patent applications naming an AI system as the inventor – except in South Africa, where the AI machine Device for Autonomous Bootstrapping of Unified Sentience (DABUS) was recognised as the inventor and the machine’s owner acknowledged as the patent holder.
In Europe, the Court of Justice of the European Union (CJEU) has emphasised, notably in the influential Infopaq case (C-5/08 Infopaq International A/S v Danske Dagblades Forening), that copyright applies exclusively to original works, and that originality requires the work to be the author’s own intellectual creation. In practice, this means an original work must reflect the author’s personal input – underlining the need for a human author for copyright eligibility.
The second approach attributes authorship to human individuals, often the programmers or developers, and is followed in countries such as the UK, India, Ireland, and New Zealand. UK copyright law, specifically section 9(3) of the Copyright, Designs and Patents Act (CDPA), embodies this approach, stating:
‘In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.’
AI-generated content and copyright
This illustrates that the laws in many countries are not equipped to handle copyright for non-human creations. One of the primary difficulties is determining authorship and ownership of AI-generated content. Many argue that it is improbable for a copyrighted work to come into existence entirely devoid of human input: a human typically plays a role in training an AI, the system may learn from copyrighted works created by humans, and a human may guide the AI in determining the kind of work it generates, such as selecting the genre of a song and setting its tempo. Nonetheless, as AI becomes more independent in producing art, music, and literature, traditional notions of authorship become unclear. Additionally, concerns have arisen about AI inadvertently replicating copyrighted material, raising questions about liability and accountability. The proliferation of open-source AI models also raises concerns about the boundaries of intellectual property.
In a recent case, US District Judge Beryl Howell ruled that art generated solely by AI cannot be granted copyright protection, underscoring that human authorship is required to qualify. The case stemmed from Stephen Thaler’s attempt to secure copyright protection for an AI-generated artwork; Thaler, the founder of Imagination Engines, has been seeking legal recognition of AI-generated creations since 2018. The US Copyright Office has also launched a formal inquiry, called a notice of inquiry (NOI), to examine questions of copyright law and policy raised by AI technology. Meanwhile, Microsoft is offering legal protection to users of its Copilot AI services who face copyright infringement lawsuits: under the Copilot Copyright Commitment, announced by Microsoft’s Vice Chair and President Brad Smith, the company commits to assuming the legal liabilities associated with copyright infringement claims arising from use of its Copilot services.
On the other hand, Google has submitted a report to the Australian government, highlighting the legal uncertainty and copyright challenges that hinder the development of AI research in the country. Google suggests that there is a need for clarity regarding potential liability for the misuse or abuse of AI systems, as well as the establishment of a new copyright system to enable fair use of copyright-protected content. Google compares Australia unfavourably to other countries with more innovation-friendly legal environments, such as the USA and Singapore.
Training AI models with protected content
Clarifying the legal framework of AI and copyright also requires further guidelines on the training data of AI systems. To train AI systems like ChatGPT, a vast amount of data – text, images, and other content – is indispensable. During training, AI models identify patterns in this data, which they use to make assessments and generate predictions, enabling them to respond to user queries. However, this training procedure may involve infringements of IPR, as it often relies on data collected from the internet, which may include copyrighted content.
In the AI industry, it is common practice to construct datasets for AI models by indiscriminately extracting content from websites using automated software, a process known as web scraping. Scraping is typically considered lawful, although it comes with restrictions: legal action over violations of a website’s terms of service offers limited remedies, and existing laws have largely proven inadequate in dealing with the practice. In AI development, the prevailing belief is that more training data is better – OpenAI’s GPT-3 model, for instance, was trained on roughly 570 GB of filtered text. These methods, combined with the sheer size of the datasets, mean that tech companies often do not have a complete understanding of the data used to train their models.
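At its simplest, the scraping step described above amounts to fetching pages and stripping out their visible text. A minimal sketch using only Python’s standard library illustrates the idea; the class name and HTML snippet are purely illustrative, not any company’s actual pipeline:

```python
from html.parser import HTMLParser


class ParagraphScraper(HTMLParser):
    """Collects the text inside <p> tags - the kind of bulk extraction
    a web scraper performs when building a training corpus."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data  # accumulate text within <p>


page = "<html><body><h1>Title</h1><p>First paragraph.</p><p>Second one.</p></body></html>"
scraper = ParagraphScraper()
scraper.feed(page)
print(scraper.paragraphs)  # -> ['First paragraph.', 'Second one.']
```

A real crawler would add page fetching, rate limiting, and robots.txt handling, but the core act – harvesting text wholesale, regardless of its copyright status – is as simple as this.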
An investigation by the online magazine The Atlantic uncovered that popular generative AI models – including Meta’s open-source LLaMA, Bloomberg’s BloombergGPT, and GPT-J from the nonprofit EleutherAI – were partially trained on unauthorised copies of books by well-known authors. The pirated books, totalling around 170,000 titles published mostly in the last two decades, were part of a larger dataset called the Pile, which was freely available online until recently.
In specific situations, reproducing copyrighted materials may still be permissible without the consent of the copyright holder. In Europe, limited and specific exemptions allow this for purposes such as quotation and parody. Despite growing concerns about the use of machine learning (ML) in the EU, it is only recently that EU member states have started implementing copyright exceptions for training purposes. The UK’s 2017 independent AI review, ‘Growing the artificial intelligence industry in the UK’, recommended allowing text and data mining by AI through appropriate copyright laws. In the USA, access to copyrighted training data seems somewhat more permissive: although US law contains no specific provisions addressing ML, it benefits from a comprehensive and adaptable fair use doctrine that has proven favourable for technological applications involving copyrighted materials.
The indiscriminate scraping of data and the unclear legal framework surrounding AI training datasets and the use of copyrighted materials without proper authorisation have prompted legal actions by content creators and authors. Comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey have filed lawsuits against OpenAI and Meta, alleging that their works were used without permission to train AI models. The lawsuits contend that OpenAI’s ChatGPT and Meta’s LLaMA were trained on datasets obtained from ‘shadow library’ websites containing copyrighted books authored by them.
Why does it matter?
In conclusion, as AI rapidly advances, it blurs the lines between human and machine creativity, raising complex questions regarding IPR. Legislators are facing a challenging decision – whether to grant IP protection or not. As AI continues to advance, it poses significant legal and ethical questions by challenging traditional ideas of authorship and ownership. While navigating this new digital frontier, it’s evident that finding a balance between encouraging AI innovation and protecting IPRs is crucial.
If the stance is maintained that IP protection only applies to human-created works, it could have adverse implications for AI development. This would place AI-generated creations in the public domain, allowing anyone to use them without paying royalties or receiving financial benefits. Conversely, if lawmakers take a different approach, it could profoundly impact human creators and their creativity.
Another approach would require AI developers to ensure compliance with data acquisition rules, for example by obtaining licences for, or compensating the owners of, IP used during the training process.
One thing is certain: effectively dealing with IP concerns in the AI domain necessitates cooperation among diverse parties, including policymakers, developers, content creators, and enterprises.