Study Suggests HIPAA Data De-identification Improvements Required
Under HIPAA Rules, healthcare providers and other covered entities (CEs) are permitted to use the Protected Health Information (PHI) of patients – and share this information with others – provided that the data has been de-identified. It must not be possible to tie the information to any individual.
CEs are permitted to share the data if it can be demonstrated that the risk of the data being associated with a particular patient is small. They have two options for de-identifying healthcare data before sharing it with a Business Associate:
They can de-identify data using a model such as k-anonymity, or they can set a rule-based policy – the Safe Harbor model – that changes data values; for example, changing dates of birth to the following or preceding year, or stripping out days and dates to provide only a patient’s age. However, while the latter method is often used, it is far from perfect.
According to a recent study published in the Journal of the American Medical Informatics Association (JAMIA), the Safe Harbor procedure does not tailor protections to the capabilities of the recipient. The study also says that “Rule-based policies can be mapped to a utility (U) and re-identification risk (R) space, which can be searched for a collection, or frontier, of policies that systematically tradeoff between these goals.”
Under the HIPAA Safe Harbor model, 18 rules are used to de-identify data by removing explicit identifiers such as patient names. The rules also cover quasi-identifiers such as dates of birth and appointment or treatment dates. In these cases, dates are replaced with years and ages are changed or grouped (18-24, 25-30, 90+). The problem, as the researchers point out, is that these rules are inflexible: the same transformation is applied regardless of who will receive the data.
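As an illustration of what a fixed, rule-based policy looks like in practice, the following Python sketch applies three Safe Harbor-style rules to a single record: the name is removed, the date of birth is reduced to an age that is top-coded at 90+, and the visit date is reduced to its year. The function name, the field names and the sample record are hypothetical, and the sketch covers only a small subset of the 18 rules.

```python
from datetime import date


def safe_harbor_style(record, as_of):
    """Apply a few Safe Harbor-style rules to one record (illustrative subset only)."""
    # Rule 1: drop the explicit identifier (hypothetical field name "name").
    generalized = {k: v for k, v in record.items() if k != "name"}

    # Rule 2: replace the date of birth with an age, top-coded at 90+.
    birth = generalized.pop("birth_date")
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    age = as_of.year - birth.year - (0 if had_birthday else 1)
    generalized["age"] = "90+" if age >= 90 else age

    # Rule 3: keep only the year of the visit date.
    generalized["visit_year"] = generalized.pop("visit_date").year
    return generalized


# Hypothetical sample record.
patient = {
    "name": "Jane Doe",
    "birth_date": date(1930, 5, 1),
    "visit_date": date(2023, 11, 20),
    "diagnosis": "asthma",
}
shared = safe_harbor_style(patient, as_of=date(2024, 1, 1))
```

Every recipient receives the output of the same transformation, whatever their capabilities, which is the inflexibility the researchers highlight.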
To tackle this issue, the researchers have suggested an alternative model for protecting PHI. The paper explains that the Sublattice Heuristic Search (SHS) algorithm could be well suited to the healthcare industry for use in data de-identification policies.
Researchers showed that an efficient and effective mechanism can be utilized to discover alternatives to rule-based de-identification policies for patient-level datasets. This was achieved by developing an “algorithm designed to search a collection of de-identification policies that compose a frontier that optimally balances risk (R) and utility (U).” The researchers determined that “this approach allows for guidance, interpretation, and justification of rule-based policies, as opposed to relying on a predefined standard in terms of the re-identification risk and data utility or formal models.”
“Formally, a frontier is a set of policies that are not strictly dominated by other policies,” the paper said. “Intuitively, a policy pA strictly dominates a policy pB when both risk and utility loss values of pA are no greater than the corresponding values of pB and at least one value is strictly less [than] that of pB.”
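The dominance definition quoted above can be sketched in a few lines of Python. Each policy is represented here as a hypothetical (risk, utility loss) pair where lower values are better, and the sample values are invented for illustration. This is a brute-force filter over a small list, not the SHS search described in the paper.

```python
def strictly_dominates(p_a, p_b):
    """True if p_a is no worse than p_b on both values and strictly better on at least one."""
    risk_a, loss_a = p_a
    risk_b, loss_b = p_b
    no_worse = risk_a <= risk_b and loss_a <= loss_b
    strictly_better = risk_a < risk_b or loss_a < loss_b
    return no_worse and strictly_better


def frontier(policies):
    """Return the policies that no other policy strictly dominates, sorted by risk."""
    return sorted(
        p for p in policies
        if not any(strictly_dominates(q, p) for q in policies)
    )


# Hypothetical (risk, utility loss) pairs.
policies = [(0.1, 0.9), (0.3, 0.5), (0.4, 0.6), (0.8, 0.2), (0.8, 0.3)]
front = frontier(policies)
# front == [(0.1, 0.9), (0.3, 0.5), (0.8, 0.2)]
```

In this example (0.4, 0.6) is dropped because (0.3, 0.5) is better on both values, and (0.8, 0.3) is dropped because (0.8, 0.2) has the same risk with a lower utility loss.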
The researchers assessed frontier initialization and improvement strategies by comparing the resulting frontiers after searching the same number of policies. The paper states that the researchers “used the area under the frontier in the R-U space, denoted as AU, as the criteria of the frontier given the orientation of risk and utility loss.” The authors added: “We reported the results after every 1000 policies while searching the first 5000 policies.”
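The article does not give the paper's exact formula for AU, so the following Python sketch shows only one plausible way to compute an area under a frontier. It assumes that risk and utility loss are both normalized to the range 0 to 1, that the frontier is treated as a staircase, and that utility loss is taken as 1 (the worst case) at risk levels below the lowest-risk policy. Under these assumptions a smaller area indicates a better frontier. The function name and sample values are hypothetical.

```python
def area_under_frontier(frontier_points):
    """Staircase area under a frontier of (risk, utility loss) pairs in the unit square.

    Assumption: both values lie in [0, 1] and lower is better, so a smaller
    area means a better frontier. This is not necessarily the paper's definition.
    """
    area = 0.0
    prev_risk, prev_loss = 0.0, 1.0  # worst-case utility loss before the first policy
    for risk, loss in sorted(frontier_points):
        area += (risk - prev_risk) * prev_loss
        prev_risk, prev_loss = risk, loss
    # Extend the last step to the right edge of the unit square.
    area += (1.0 - prev_risk) * prev_loss
    return area


# Hypothetical frontier of (risk, utility loss) pairs.
au = area_under_frontier([(0.1, 0.9), (0.3, 0.5), (0.8, 0.2)])
```

Two search strategies could then be compared by computing this value for the frontier each one has found after the same number of searched policies.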
The R-U tradeoffs of the frontier discovered by the best SHS configuration were then compared with those of a popular k-anonymity method and the HIPAA Safe Harbor method. The researchers determined that k-anonymization strategies and fixed rule-based policies were inferior to SHS.
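For readers unfamiliar with k-anonymity, a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The Python sketch below measures k for a small dataset; the function name, field names and records are hypothetical, and it only checks the property rather than implementing the anonymization method used in the study.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical records; "birth_year" and "zip3" are the assumed quasi-identifiers.
records = [
    {"birth_year": 1980, "zip3": "372", "diagnosis": "asthma"},
    {"birth_year": 1980, "zip3": "372", "diagnosis": "diabetes"},
    {"birth_year": 1975, "zip3": "372", "diagnosis": "asthma"},
]
k = k_anonymity(records, ["birth_year", "zip3"])
# k == 1: the 1975 record is unique on its quasi-identifiers
```

A k-anonymization method would generalize or suppress values until this number reaches the chosen threshold.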
The researchers concluded that the SHS strategy “has the potential to be a method that overcomes the limitations of a single fixed rule-based policy while being interpretable to health data managers.”
The study also concluded that “R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to anticipated needs and trustworthiness of recipients.”