Challenges in removing data from AI models
Removing a user’s data from a trained AI model is remarkably difficult, often requiring the model to be retrained from scratch and forfeiting the considerable resources invested in training it.
Researchers have run into a stubborn challenge in AI: making a model forget the information it learns from private user data. The predicament is commonly referred to as the AI unlearning problem. It came to the forefront when James Zou, a professor of biomedical data science at Stanford University, received an email from the UK Biobank requesting that its data be removed from an AI model he had previously trained.
Complying with such a request is harder than it sounds. A user’s data cannot simply be excised from a trained model, so honoring a deletion request often means discarding the model and retraining it from scratch, forfeiting the considerable resources invested in training. The challenge ranks among the most stubbornly unresolved issues of the AI era, alongside concerns like AI hallucinations and the difficulty of explaining certain AI outputs. According to numerous experts, the unlearning problem intersects with inadequate regulations on privacy and misinformation to pose a looming dilemma: as AI models grow larger and ingest ever vaster amounts of data, the absence of any effective way to remove data from a model, short of deleting the model itself, emerges as a pressing concern for all stakeholders.
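To see why deletion usually means retraining, consider the brute-force baseline the research literature calls exact unlearning: throw the model away and retrain on everything except the data to be forgotten. The sketch below illustrates the idea with a small scikit-learn classifier; the dataset, user IDs, and deletion request are hypothetical, and the point is that no cheaper weight-level operation reliably achieves the same guarantee.

```python
# A minimal sketch of "exact unlearning": the only guaranteed way to remove
# a user's influence from a trained model is to retrain on the retained data.
# The dataset, labels, and user IDs below are hypothetical illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training set: features, labels, and the user each row came from.
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
user_ids = rng.integers(0, 100, size=1000)

model = LogisticRegression(max_iter=1000).fit(X, y)

# A deletion request arrives for user 42. The fitted coefficients in
# model.coef_ blend contributions from every training example, so there is
# no component we can simply subtract to erase one user's data.
keep = user_ids != 42

# Exact unlearning: discard the old model and retrain on what remains.
retrained = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
```

For a small linear model this retraining step is trivial; for a large foundation model it means repeating a training run that may have cost months of compute and millions of dollars, which is precisely why requests like the UK Biobank’s are so hard to honor.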