Healthcare Data Leaks on GitHub: Credentials, Corporate Data and the PHI of 150,000+ Patients Exposed
A new report has revealed that the personal and protected health information of patients, along with other sensitive data, is being exposed online through public GitHub repositories without the knowledge of covered entities and business associates.
Jelle Ursem, a security researcher from the Netherlands, discovered at least 9 entities in the United States – including HIPAA-covered entities and business associates – have been leaking sensitive data via GitHub. The 9 leaks – which involve between 150,000 and 200,000 patient records – may just be the tip of the iceberg. The search for exposed data was halted to ensure the entities concerned could be contacted and to produce the report to highlight the risks to the healthcare community.
Even if your organization does not use GitHub, that does not necessarily mean that you will not be affected. The actions of a single employee or third-party contracted developer may have opened the door and allowed unauthorized individuals to gain access to sensitive data.
Exposed PII and PHI in Public GitHub Repositories
Jelle Ursem is an ethical security researcher who has previously identified many data leaks on GitHub, including by Fortune 500 firms, publicly traded companies, and government organizations. Ursem decided to conduct a search to find out if any medical data had been leaked on GitHub. It took just 10 minutes to confirm that it had, but it soon became clear that this was far from an isolated case.
Ursem conducted searches such as “companyname password” and “medicaid password FTP” and discovered that several hard-coded usernames and passwords could be found in code uploaded to GitHub. Those usernames and passwords allowed him to log in to Microsoft Office 365 and Google G Suite accounts and gain access to a wide range of sensitive information such as user data, contracts, agendas, internal documents, team chats, and the protected health information of patients.
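The searches above only work because the credentials were written directly into source code. As a minimal sketch of the usual alternative, the Python snippet below reads credentials from the environment at runtime so that nothing secret is ever committed. The variable names `FTP_USER` and `FTP_PASSWORD` and the helper `load_ftp_credentials` are hypothetical and are not taken from the report.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def load_ftp_credentials(env=None):
    # Hypothetical helper: FTP_USER and FTP_PASSWORD are illustrative names.
    # Reading from the environment keeps the values out of the repository.
    env = os.environ if env is None else env
    try:
        return {"user": env["FTP_USER"], "password": env["FTP_PASSWORD"]}
    except KeyError as exc:
        # Fail loudly instead of falling back to a default baked into the code.
        raise MissingCredentialError(
            f"environment variable {exc.args[0]} is not set"
        ) from exc
```

With this pattern, a repository that is accidentally made public exposes only the names of the variables, not their values.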
“GitHub search is the most dangerous hacking tool out there,” said Ursem. Why go to the trouble of hacking a company when it is leaking data that can be found with a simple search on GitHub?
Ursem attempted to alert the companies concerned to the exposure of their data so the information could be secured, but reaching those organizations and getting the data secured proved problematic, so Ursem contacted DataBreaches.net for assistance.
Dissent Doe of DataBreaches.net and Ursem worked together to contact the organizations concerned and get the data secured. In some cases, they succeeded – with considerable effort – but even after several months of attempts at contacting the companies concerned, explaining the severity of the situation, and offering help to address the problems that led to the exposure of data, some of that data is still accessible.
9 Leaks Identified but There are Likely to be Others
The report details 9 leaks that affected U.S. entities: Xybion, MedPro Billing, Texas Physician House Calls, VirMedica, MaineCare, Waystar, Shields Health Care Group, AccQData, and one entity that has not been named because its data is still accessible.
The most common causes of GitHub data leaks were developers who had embedded hard-coded credentials into code that had been uploaded into public GitHub repositories, the use of public repositories instead of private repositories, and developers who had abandoned repositories when they were no longer required, rather than securely deleting them.
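Hard-coded credentials of the kind described above can often be caught before code is pushed. The following is a rough sketch of such a check in Python, assuming a simple regular expression that looks for quoted string literals assigned to credential-like names. The pattern and the function name `find_hardcoded_credentials` are illustrative only; they are not from the report, and a dedicated secret scanner will catch far more cases.

```python
import re

# Illustrative pattern: a credential-like name, then = or :, then a quoted
# literal of at least four characters. Real secret scanners use far richer rules.
CREDENTIAL_PATTERN = re.compile(
    r"""(?ix)
    (?:password|passwd|pwd|secret|api_?key|token)\w*
    \s*[:=]\s*
    (["'])(?P<value>[^"']{4,})\1
    """
)


def find_hardcoded_credentials(source):
    """Return (line_number, stripped_line) for each suspicious line in source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CREDENTIAL_PATTERN.search(line):
            findings.append((lineno, line.strip()))
    return findings


sample = (
    'user = "svc"\n'
    'db_password = "hunter2!"\n'
    'password = os.environ["DB_PASSWORD"]\n'
)
print(find_hardcoded_credentials(sample))
```

Only the second line of the sample is flagged: the first has no credential-like name, and the third reads its value from the environment rather than from a literal. A check like this could be run locally or in a pre-commit hook, but it does not replace reviewing whether a repository should be public in the first place.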
For example, Ursem found that a developer at Xybion – a software, services and consulting company with a presence in workplace health issues – had left code in a public GitHub repository in February 2020. The code included hard-coded credentials for a system user that, in connection with other code, allowed Ursem to access billing back-office systems that contained the PHI of 7,000 patients, together with more than 11,000 insurance claims dating back to October 31, 2018.
It was a similar story with MaineCare – a state- and federally-funded program that provides healthcare coverage to Maine residents. In that case, hard-coded credentials gave Ursem administrative access to the entire website, access to the internal server infrastructure of MaineCare / Molina Health, MaineCare SQL data sources, and the PHI of 75,000 individuals.
The Typhoid Mary of Data Leaks
The report highlights one developer, who has worked with a large number of healthcare organizations, whose GitHub practices have led to the exposure of many credentials and the PHI of an estimated 200,000 clients. That individual has been called the “Typhoid Mary of Data Leaks”.
The developer made many mistakes that allowed client data to be exposed, including leaking the credentials of 5 employers on GitHub and leaving repositories fully accessible after work had been completed. In one case, the actions of that developer allowed access to the central telephone system of a large debt collection entity, and in another, leaked credentials allowed access to highly sensitive records of people with a history of substance abuse.
While it was not possible to contact that individual directly, it appears that the work of DataBreaches.net and Ursem has gotten the message through to the developer. The repositories have now been removed or made private, but not before the data was cloned by at least one third party.
This was just one of several outsourced or contracted developers used by HIPAA-covered entities and business associates whose practices exposed data without the knowledge of those covered entities and business associates.
“No matter how big or small you are, there’s a real chance that one of your employees has thrown the front door key under the doormat and has forgotten that the doormat is transparent,” explained Dissent Doe of DataBreaches.net. Regardless of whether your organization uses GitHub, HIPAA Journal believes the report to be essential reading.
The collaborative report from Jelle Ursem and DataBreaches.net explains how the leaks occurred and why they have gone undetected for so long, and makes several recommendations on how data breaches on GitHub can be prevented – and detected and addressed quickly in the event that mistakes are made. The full report is available to download as a PDF.
Many thanks to Dissent Doe for notifying HIPAA Journal, to Jelle Ursem for discovering the data leaks, and to both parties for their hard work investigating the leaks, contacting the entities concerned, and highlighting the problem to help HIPAA-covered entities and their business associates take steps to prevent GitHub data breaches moving forward.