Separate from copyright concerns, several vendors whose databases Oxy licenses, such as EBSCO and ICPSR, prohibit using their content in generative AI tools. This prohibition means, for example, that you cannot download an article from Academic Search Elite or a dataset from ICPSR and then upload it into a generative AI tool like Gemini or Perplexity.
Other vendors, such as Wiley, Springer, Elsevier, and Liebert, explicitly allow text and data mining for non-commercial research purposes.
Before inputting PDFs into AI tools or using AI to scrape licensed content, check the vendor's text and data mining policy as well as any copyright restrictions.
Both generative AI and copyright law are complicated and nebulous, involving many different stakeholders at many different points of intersection. Lee, Cooper and Grimmelmann propose a supply chain framing of how AI generates content to help understand where copyright law impacts AI technology.
One way generative AI tools can be used to infringe on a copyright owner’s exclusive rights is by producing derivative works. Before entering any copyrighted material into a generative AI tool as part of a prompt, you may need to obtain permission from the copyright owner.
Emory University law professor Matthew Sag has pointed out that generative AI tools can also be used to infringe on copyright even when a work is never directly uploaded to them. In what he has dubbed the ‘Snoopy Problem,’ copyrighted works, like fictional characters, can be ‘memorized’ and later generated by AI based on a user prompt, and "the more abstractly a copyrighted work is protected, the more likely it is that a generative AI model will 'copy' it" (Sag, 2024).
Generative AI tools are trained on collections of material gathered from many places. Some AI image and text generation tools have been trained on material scraped from web pages without the consent or knowledge of the web page owners.
Several lawsuits have been brought against AI image and text generation platforms that have used visual and text content created or owned by others as training material. These lawsuits claim that using artists’ or writers’ content without permission to train generative AI is an infringement of copyright. Many of these cases are still ongoing as of July 2025.
Several experts have pointed to earlier fair use cases to argue that using various kinds of training data for AI image generation tools is fair use. However, the fair use analysis for training a Large Language Model (LLM) will differ from the analysis for using a small collection of resources to train a custom agent built on an existing LLM, or for loading works into a generative AI tool for analysis. Loading copyright-protected material into an existing LLM or generative AI tool may be an infringement of copyright.