A recent Canadian analysis has revealed that AI models were trained on datasets containing child sexual abuse imagery. The finding has renewed concerns about the ethics of how training data is collected for AI development.
Lloyd Richardson, director of technology at a Winnipeg-based child protection centre, said that many AI models used in research and commercial applications are trained on data collected without proper ethical oversight. He emphasized that this lack of oversight has allowed known child exploitation material into datasets, a problem that due diligence could have prevented.
The child protection centre’s investigation focused on a dataset called Nudenet, which includes tens of thousands of images collected from social media and adult sites. The dataset is often used by researchers to build AI tools for detecting nudity.
The centre identified about 680 images linked to child sexual abuse and exploitation material, more than 120 of which involved minors from Canada and the U.S. The images included explicit depictions of children engaged in sexual acts.
Following the discovery, the centre issued a removal request to Academic Torrents, a site where researchers download datasets. The flagged images were subsequently removed from the platform.
The centre called for stronger regulations to keep child abuse images out of datasets used for AI research. The finding echoes a 2023 Stanford University report, which identified similar material in datasets used to train text-to-image AI models.
Prime Minister Mark Carney has emphasized the importance of AI development in Canada’s digital policy. However, AI Minister Evan Solomon has indicated that the government will focus on privacy and data protection rather than strict regulation of AI technologies.
