AI Buyers Guide for HR: Navigating Data Privacy and Security in AI (#4 in Series)

by Frank P. Ginac, CTO at TalentGuard

Welcome to the fourth installment of my series, AI Buyers Guide for Human Resources (HR) Professionals. My objective for this series is to arm HR professionals responsible for selecting, deploying, and managing AI-based HR Tech solutions in the enterprise with the knowledge they need to perform those tasks confidently. The information shared here is valuable not just to HR professionals but to any buyer of AI-based software. I hope you find it helpful, welcome your feedback and comments, and would greatly appreciate it if you’d share this article and the others in the series with your network.

HR is increasingly responsible for safeguarding employee data. This responsibility extends beyond traditional confidentiality concerns: HR teams must collaborate with IT and data security experts to implement robust systems and processes that protect sensitive employee information across the organization. Compliance with stringent government regulations covering a broad range of data security and privacy requirements is paramount. Regulations such as the EU AI Act and GDPR highlight the complexity of this responsibility, introducing nuanced requirements and challenges, especially where AI technology is used in the enterprise.

Regulations like the EU AI Act, together with data protection and privacy laws like GDPR, present unique challenges to both vendors of AI-based HR software and the HR professionals who want to deploy it in their enterprises. The EU AI Act, proposed in April 2021, aims to regulate AI in the EU to ensure safety, transparency, and fairness. It classifies AI systems by the risk they pose, with the level of regulation scaled to each tier: systems are divided into unacceptable-risk, high-risk, and general-purpose categories, each with specific obligations for providers and users.[1] The GDPR, effective May 25, 2018, is a comprehensive data protection law that imposes obligations on organizations worldwide if they target or collect data relating to people in the EU. It carries strict penalties for violations and defines key terms such as personal data, data processing, and data controllers. The GDPR emphasizes principles like lawfulness, fairness, transparency, data minimization, and accountability in data processing.[2]

Artificial intelligence creates new challenges for HR teams. Large language models (LLMs) fine-tuned on sensitive employee data can memorize that information, posing a risk of leaking private data such as personally identifiable information (PII). Attackers can exploit this leakage in a number of ways. One often-cited technique is the model inversion attack, in which the model’s output is used to infer sensitive information. In a model inversion scenario, an attacker systematically queries the model and uses its output predictions to infer characteristics of the training dataset. This is particularly concerning if the model has been trained on sensitive data such as PII or medical records. The attacker leverages patterns in the model’s responses to reconstruct aspects of the input data, potentially breaching privacy without any direct access to the training dataset.
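To make the attack concrete, here is a minimal, self-contained sketch in Python (NumPy only). A toy logistic-regression classifier stands in for a deployed model, and the attacker, using nothing but black-box queries of its confidence scores, performs gradient ascent on the input to reconstruct a class-representative record. All data and names here are invented for illustration; real attacks against LLMs are more elaborate, but the core idea of inferring training-data characteristics from outputs alone is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sensitive" training set: two classes with distinct feature means
# (a hypothetical stand-in for, e.g., employee attribute vectors).
n, d = 200, 5
X0 = rng.normal(loc=-1.0, scale=0.5, size=(n, d))  # class 0
X1 = rng.normal(loc=+1.0, scale=0.5, size=(n, d))  # class 1
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Train a simple logistic-regression model the attacker will query.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

def query(x):
    """Black-box access: the model's confidence that x belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Inversion: gradient ascent on the *input*, using only query outputs, to
# synthesize a record the model scores as maximally class-1, which
# approximates what class-1 training records look like.
x_hat, step, eps = np.zeros(d), 0.5, 1e-4
for _ in range(200):
    grad = np.zeros(d)
    for i in range(d):  # numerical gradient estimated purely from queries
        e = np.zeros(d)
        e[i] = eps
        grad[i] = (query(x_hat + e) - query(x_hat - e)) / (2 * eps)
    x_hat += step * grad

true_mean = X1.mean(axis=0)
print("reconstructed direction:", np.round(x_hat / np.linalg.norm(x_hat), 2))
print("true class-1 direction: ", np.round(true_mean / np.linalg.norm(true_mean), 2))
```

Running this, the reconstructed direction closely matches the average class-1 training record, even though the attacker never touched the training data.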

Techniques like data masking, pseudonymization, and anonymization are used both in data privacy applications and during the data preparation phase prior to machine learning model training. Data masking creates structurally equivalent but inauthentic representations of real data; pseudonymization replaces sensitive employee data using an encoding scheme; and anonymization erases or encrypts the identifiers that connect an individual to stored data.

For example, data masking can hide the last four digits of a US Social Security number (SSN), protecting the employee’s full SSN while still allowing the model to potentially learn something from the prefix, which encodes the region and year in which the number was issued. Pseudonymization can replace an employee’s name with a unique code, allowing the model to learn something about a particular employee without knowing their name. Anonymization erases or encrypts sensitive data outright, which prevents the model from learning anything from that data. These methods have limitations, however: threat actors with access to the training data can use de-anonymization methods to retrace the process and reveal personal information, even when the data has been anonymized. In short, these techniques alone are insufficient for comprehensive data protection, but they remain effective at reducing the probability that the model memorizes sensitive data during training.
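As a rough illustration, here is a minimal Python sketch of all three techniques applied to a single hypothetical employee record. The record, field names, and keyed-hash scheme are assumptions chosen for the example; a production system would use vetted tooling and proper key management.

```python
import hashlib
import secrets

# A hypothetical employee record (all values invented for illustration).
record = {"name": "Jane Doe", "ssn": "123-45-6789", "salary": 95000}

# Data masking: hide the last four SSN digits while keeping the prefix,
# which encodes where and roughly when the number was issued.
def mask_ssn(ssn: str) -> str:
    return ssn[:-4] + "XXXX"

# Pseudonymization: replace the name with a keyed hash. Only whoever holds
# the secret key (stored separately, e.g. in a vault) can rebuild the
# mapping from code back to person.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize(name: str) -> str:
    return hashlib.blake2b(name.encode(), key=SECRET_KEY, digest_size=8).hexdigest()

# Anonymization: drop identifying fields outright; nothing in the
# remaining record links back to the individual.
def anonymize(rec: dict) -> dict:
    return {k: v for k, v in rec.items() if k not in ("name", "ssn")}

masked = {**record, "ssn": mask_ssn(record["ssn"])}
pseudonymized = {**masked, "name": pseudonymize(masked["name"])}
anonymized = anonymize(record)

print(masked)         # {'name': 'Jane Doe', 'ssn': '123-45-XXXX', 'salary': 95000}
print(pseudonymized)  # name replaced by an opaque hex code
print(anonymized)     # {'salary': 95000}
```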

In addition to techniques used to prepare sensitive data for model training, a technique called Differentially Private Stochastic Gradient Descent (DP-SGD) can be used to prevent models from memorizing private data during training. DP-SGD interferes with a machine learning model’s ability to memorize private data by clipping each example’s gradient and introducing calibrated noise into the training process, making individual contributions indistinguishable. Its effectiveness rests on the mathematical guarantees of differential privacy, which ensure that the algorithm’s output is approximately invariant to the inclusion or exclusion of any individual’s data. Combining data preparation and differential privacy provides a robust strategy for protecting sensitive data.
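The following is a minimal NumPy sketch of the DP-SGD recipe on a toy logistic-regression model, showing the two steps that distinguish it from ordinary SGD: per-example gradient clipping and calibrated Gaussian noise. The dataset, hyperparameters, and clipping/noise values are illustrative assumptions; in practice you would use a maintained library such as Opacus or TensorFlow Privacy, which also tracks the resulting privacy budget.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set (a stand-in for sensitive employee records).
n, d = 256, 5
X = rng.normal(size=(n, d))
y = (X.sum(axis=1) > 0).astype(float)

w = np.zeros(d)
lr, batch_size, steps = 0.1, 32, 300
clip_norm = 1.0    # C: per-example gradient clipping bound
noise_mult = 1.1   # sigma: noise scale relative to C

for _ in range(steps):
    idx = rng.choice(n, size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-(xb @ w)))

    # Per-example gradients of the logistic loss, shape (batch_size, d).
    per_ex = (p - yb)[:, None] * xb

    # Step 1: clip each example's gradient to L2 norm <= clip_norm so that
    # no single record can dominate the update.
    norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
    per_ex = per_ex / np.maximum(1.0, norms / clip_norm)

    # Step 2: add Gaussian noise calibrated to the clipping bound, making
    # the update statistically similar with or without any one record.
    noise = rng.normal(0.0, noise_mult * clip_norm, size=d)
    w -= lr * (per_ex.sum(axis=0) + noise) / batch_size

print("weights learned under DP-SGD:", np.round(w, 2))
```

Because each example’s influence on every update is bounded and then drowned in noise of comparable scale, the final weights reveal almost nothing about any single record.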

HR professionals are at the forefront of navigating the intricate interplay between data protection laws and the challenges posed by AI in the workplace. The EU AI Act and GDPR have set a new precedent for stringent AI and data processing guidelines. While LLMs and other AI technologies offer significant advantages, they also present risks, notably the potential to memorize sensitive data. Combining data preparation techniques like masking, pseudonymization, and anonymization with advanced model training methods like DP-SGD can effectively mitigate that risk. HR professionals must remain vigilant and proactive, ensuring that their organizational practices not only comply with current regulations but also integrate cutting-edge techniques to safeguard privacy and maintain the trust of their employees.

[1] European Parliament. (n.d.). EU AI Act: first regulation on artificial intelligence. European Parliament News. Retrieved from https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence

[2] GDPR.eu. (n.d.). What is the GDPR, the EU’s new data protection law? Retrieved from https://gdpr.eu/what-is-gdpr/
