AI and Data Privacy
Axel Spies
German attorney-at-law (Rechtsanwalt)
Dr. Axel Spies is a German attorney (Rechtsanwalt) in Washington, DC, and co-publisher of the German journals Multi-Media-Recht (MMR) and Zeitschrift für Datenschutz (ZD).
No good fit
The nexus of the EU’s AI Act and the GDPR is a timely topic. The AI Act has not yet been finally adopted in Europe, while the General Data Protection Regulation (GDPR) has applied in the EU since 2018. How AI must comply with the GDPR remains unclear. A lack of clarity on the GDPR and AI could jeopardize U.S.-EU cooperation on data processing or conflict with federal agencies’ rulemaking processes for AI uses.
Major compliance problems for AI
The new EU AI Act, if it is adopted (which is at present not assured), will presumably become applicable in 2025 or even later and will sit on top of the already-applicable GDPR. It will neither modify nor overrule the GDPR; the AI Act’s most recent draft (e.g., its Recital 41) states that clearly. This means that all developers and users of AI (companies and institutions) must each comply with the GDPR, now and in the future, to the extent they control or process personal data. As they know very well, the GDPR’s potential penalties for violations by data controllers or processors are high.
The problem is that there are almost no guidelines from the data protection agencies (DPAs) that specifically focus on AI. AI model contracts are still being debated. The EU-level joint coordination body, the European Data Protection Board (EDPB), has set up a task force, but it has not yet handed down its guidance. The only tangible case so far on GDPR compliance stems from the action of the proactive Italian DPA, “Il Garante,” against ChatGPT in March/April 2023. Il Garante temporarily banned the chatbot for Italy only and launched a probe into suspected GDPR breaches, but the chatbot was quickly reactivated in Italy on April 28 after its maker addressed the alleged privacy issues. The Italian decision is not very specific; it was criticized as a solo run by Il Garante and can hardly serve as a general precedent for the EU.
GDPR and AI do not mingle well
The GDPR was debated and adopted at a time (2016) when AI was not a big issue; its drafts reach back to 2012. It only applies if there is a “processing of personal data,” which its Article 4 defines as “any operation or set of operations which is performed on personal data or sets of personal data, whether or not by automated means, such as collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.” This definition is not a good fit for the training of large language models (LLMs). The aim of the training is to enable an LLM to supplement text input with words or sentences that, based on the distributions in the training data, are highly likely to occur in the context of the text input. In simple terms, an LLM’s objective is not to reproduce or categorize its training data like a search engine; rather, it generates output on the basis of probabilities and the input the user provides through “prompts.” In other words, AI training does not require personal data to be stored somewhere by the AI provider, e.g., in a cache on an AI provider’s server: its goal is the best possible generation of highly probable text structures, not data processing in the way a search engine processes data. The fact that a large generative AI system may contain personal data in its output explains neither how the data sets enter the system (e.g., via prompts) nor who “controls” these data sets in a neural network.
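To make the distinction concrete, consider a deliberately simplified sketch in Python: a toy bigram model whose “training” retains only word-pair frequencies from its training text, not the text itself, and which generates output by sampling from those frequencies. The function and variable names here are illustrative only; real LLMs are vastly more complex, but the principle is the same in that the model stores statistical parameters rather than retrievable copies of its training documents.

    from collections import Counter, defaultdict
    import random

    def train(corpus: str):
        # Count bigram (word-pair) frequencies; the training text itself is not retained.
        counts = defaultdict(Counter)
        words = corpus.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
        return counts

    def generate(counts, prompt: str, length: int = 5) -> str:
        # Continue a prompt by sampling each next word from the observed frequencies.
        word = prompt.split()[-1]
        out = [word]
        for _ in range(length):
            followers = counts.get(word)
            if not followers:
                break
            candidates, weights = zip(*followers.items())
            word = random.choices(candidates, weights=weights)[0]
            out.append(word)
        return " ".join(out)

    model = train("the model predicts the next word and the model samples the next word")
    print(generate(model, "the"))  # probabilistic continuation, e.g., "the next word and the model"

Even in this toy case, it is debatable whether the frequency table “stores” personal data from the corpus or merely statistics derived from it, which is precisely the interpretive gap in Article 4 described above.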
There are almost no guidelines from the data protection agencies that specifically focus on AI training.
There is not much guidance from Germany on Art. 4 of the GDPR. Some of the German DPAs have issued papers on AI informing controllers and data subjects that there are multiple legal risks, but without clear statements on how industry and public institutions can avoid them. For instance, the “Checklist for the use of LLM-based chatbots” that the Hamburg DPA published last November remains very vague on the processing issue: “If the terms and conditions allow the AI provider to process data for its own purposes, you should not transmit any personal data to the AI.” The guidance of the Bavarian DPA (BayLfD) does not address the training issue mentioned above but focuses on the pseudonymization of data sets. The AI discussion paper the Baden-Württemberg DPA issued last November is also not very clear on training: “A further processing step can be the production or development of an AI.” Finally, the “Berlin Group” of DPAs (Internationale Arbeitsgruppe für Datenschutz in der Technologie, IWGDPT) has also been silent on the data processing issue.
AI training is not the only problem. Further compliance problems with applying the GDPR to AI arise with the “right to be forgotten” (Art. 17 GDPR), the principle of data minimization (Art. 5 (1) (c) GDPR), the disclosure of the purposes of the processing, and the question of how to describe AI in a privacy policy. Another area is data protection by design (Art. 25 GDPR). Behind this principle is the idea that data protection in products and services is best ensured when it is integrated into the technology from the moment it is created. What can be stated is that in many respects AI is a black box, and erasing personal data from the neural structure may not be technically possible. A data controller may not be able to describe how an AI reaches a decision and why certain personal data sets are part of the output. To sum up, the GDPR is not a good fit for AI. There are (too) many unresolved issues, much compliance work depends on the individual AI use case, and the general question remains why the GDPR is needed for AI at all if the EU adopts a comprehensive AI Act.
What can Germany do? Anything?
Germany is in a special position to provide more clarity on the data privacy issues created by the GDPR. The first data protection law in the world came from the State of Hesse in 1970. Germany is the only country in the EU where each state has an independent DPA; the DPAs cooperate at the national level (Düsseldorf Circle). But despite their expertise, the DPAs take a wait-and-see attitude. The federal and state governments cannot tell the DPAs what to do because the DPAs must remain fully independent and “free from external influence, whether direct or indirect, and shall neither seek nor take instructions from anybody” (Art. 52 GDPR). At the federal level, there are general reservations about the AI Act, and indirectly about the GDPR, blocking generative AI and other new AI-based products and services. Volker Wissing, Germany’s Minister of Digital Affairs and Transport, stated as early as last November that there should be no “competition to see who can regulate the fastest and most strongly,” preferring voluntary codes of conduct for foundation models instead. There is no desire to touch the data privacy issue.
However, this does not mean that there is nothing the German government can do. The data privacy issues have not been resolved in the United States either, and there is ample room for discussion. There is no counterpart to the GDPR in the United States, and its fourteen state data privacy laws do not focus on AI. As federal lawmakers struggle to find their footing on AI, lawmakers at the state level (particularly in California) will likely be the primary authors of AI-specific laws in 2024. The California Privacy Protection Agency (CPPA) released its first draft of proposed Automated Decisionmaking Technology (ADMT) regulations in December. The dialogue between the CPPA and the DPAs in Germany should be intensified.
European guidelines on the GDPR and AI should create more pressure on the U.S. government to legislate after the U.S. elections. Germany wields significant influence within the European Commission, the EDPB, and the EU Council. Potential international forums for raising AI and data privacy issues are the EU-U.S. Trade and Technology Council and the separate, ongoing EU-U.S. talks on the review of the adequacy of the EU-U.S. Data Privacy Framework (DPF). The European Commission and the U.S. government have agreed upon the DPF, which indicates that the U.S. government is receptive to EU privacy regulations and to decisions of the European Court of Justice. Another forum is the Council of Europe, an international body with forty-six member countries, which set up the Committee on Artificial Intelligence to develop the Framework Convention on Artificial Intelligence, Human Rights, Democracy and the Rule of Law. The current plan is to finalize negotiations on a binding AI Convention by March. The United States is participating as an observer country and is pushing for exempting companies by default; Germany and other countries are currently resisting that pressure. The most promising forum for Germany and other EU member states to influence the debate is the ongoing discussion on an “AI Pact” promoted by the EU Commission, which would convene AI developers who voluntarily commit to implementing key obligations of the AI Act ahead of the legal deadlines. However, a successful outcome is unlikely without a major push from governments.