As companies collect more and more information about their customers, good data hygiene is essential to minimizing the risk of inadvertent disclosure or of exposure in a data breach. Data minimization, or limiting the amount of sensitive data collected and stored, can help organizations be responsible stewards of customer data while reducing their risk of financial and reputational damage and strengthening their cybersecurity program.
The legal definition of sensitive data varies by jurisdiction, depending on the applicable state, federal, or other laws. Generally, sensitive data is data that, if exposed, could lead to identity theft, fraud, or financial and reputational harm, or that could harm national security. Sensitive data includes personally identifiable information such as Social Security numbers, credit card details, and driver’s license numbers. It also includes personal health information, such as medical history, diagnoses, mental health treatment, test results, and blood and tissue samples. Criminal justice data, intellectual property, trade secrets, and information regarding national security also constitute sensitive data.
One of the first steps in building a mature data management program is to identify what data the organization holds and what characteristics are associated with it. A complete inventory of data is as important to cybersecurity as an accurate inventory of hardware and software. Automated tools can track what data an organization has as well as its level of sensitivity. Organizations can identify and track critical information by asking the following questions:
In the data life cycle, the acquisition phase is the first opportunity to practice good hygiene. Organizations should assess whether they are collecting the minimally necessary information. The process of acquisition should be examined against several questions, including:
By reducing the amount of data they collect, organizations can lower their exposure as well as their costs to maintain the data over its life cycle. Data minimization also helps reduce the regulatory burden of notification and reporting and limit discovery burdens during litigation.
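One way to make the acquisition-phase question of whether data is minimally necessary enforceable in software is a per-purpose allowlist: any field not on the list for a declared purpose is dropped before anything is stored. The Python sketch below assumes hypothetical purposes (`shipping`, `newsletter`) and field names; the allowlist itself would have to come from the organization's own review, and `minimize` is an illustrative helper, not a standard API.

```python
# Hypothetical allowlist: the only fields each declared purpose may keep.
ALLOWED_FIELDS: dict[str, set[str]] = {
    "shipping": {"name", "street", "city", "postal_code"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return a copy of `record` containing only the fields allowed for `purpose`.

    Raises KeyError for an undeclared purpose, so nothing is kept
    without a documented justification.
    """
    allowed = ALLOWED_FIELDS[purpose]
    return {field: value for field, value in record.items() if field in allowed}


if __name__ == "__main__":
    submitted = {
        "name": "A. Customer",
        "email": "a@example.com",
        "ssn": "000-00-0000",  # not needed for shipping, so it is dropped
        "street": "1 Main St",
        "city": "Springfield",
        "postal_code": "12345",
    }
    print(minimize(submitted, "shipping"))
```

Raising `KeyError` for an undeclared purpose is a deliberate choice in this sketch: collection fails closed unless someone has documented why the data is needed.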
If they haven’t already, organizations can establish a practice of tagging data, which can help improve data hygiene. Organizations also should verify that they have properly classified the data and that the data has been secured appropriately according to the classification level. Even if organizations do not retroactively tag previously collected data, they can be better data stewards going forward.
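A minimal sketch of tagging in Python, assuming each record carries a category and one of the confidential, restricted, and classified levels this article uses. The category-to-minimum-level mapping in `MIN_SENSITIVITY` is a hypothetical placeholder, and `TaggedRecord` and `audit` are illustrative names. The audit checks the two things the paragraph calls for: that the tag is at least as strict as the category requires, and that the data is secured accordingly (reduced here to a single encryption-at-rest flag).

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Ordered so that a higher value means more sensitive.
    CONFIDENTIAL = 1
    RESTRICTED = 2
    CLASSIFIED = 3


@dataclass
class TaggedRecord:
    record_id: str
    category: str  # e.g. "hr", "health", "contract", "financial"
    sensitivity: Sensitivity
    encrypted_at_rest: bool


# Hypothetical minimum classification per category; a real mapping would
# come from the organization's own data classification policy.
MIN_SENSITIVITY = {
    "hr": Sensitivity.CONFIDENTIAL,
    "contract": Sensitivity.CONFIDENTIAL,
    "financial": Sensitivity.RESTRICTED,
    "health": Sensitivity.RESTRICTED,
}


def audit(record: TaggedRecord) -> list[str]:
    """Return findings for one record; an empty list means tag and controls look right."""
    findings = []
    minimum = MIN_SENSITIVITY.get(record.category)
    if minimum is None:
        findings.append(f"{record.record_id}: unknown category {record.category!r}")
    elif record.sensitivity < minimum:
        findings.append(
            f"{record.record_id}: tagged {record.sensitivity.name}, "
            f"category requires at least {minimum.name}"
        )
    if not record.encrypted_at_rest:
        findings.append(f"{record.record_id}: not encrypted at rest")
    return findings


if __name__ == "__main__":
    for finding in audit(TaggedRecord("r1", "health", Sensitivity.CONFIDENTIAL, False)):
        print(finding)
```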
Eventually, untagged data will age out if organizations properly follow a retention schedule. This schedule should identify each category of data, such as human resources data, health information, contract information, and financial statements. For each category, it should specify the date after which that type of data can be deleted, based on the regulatory framework that governs the collection, processing, and storage of the data.
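A retention schedule is essentially a lookup from data category to retention period, from which each record's deletion date follows. The sketch below uses the four categories named above with placeholder periods that are NOT regulatory values; real periods depend on the regulations governing each category. `RETENTION_YEARS`, `deletion_date`, and `is_expired` are hypothetical names.

```python
from datetime import date

# Placeholder retention periods in years. These numbers are illustrative
# only; real periods must come from the regulations governing each category.
RETENTION_YEARS = {
    "human_resources": 7,
    "health_information": 6,
    "contract_information": 10,
    "financial_statements": 7,
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def deletion_date(category: str, created: date) -> date:
    """Date from which a record of this category is eligible for deletion."""
    return add_years(created, RETENTION_YEARS[category])


def is_expired(category: str, created: date, today: date) -> bool:
    """True once the retention period for the record's category has run out."""
    return today >= deletion_date(category, created)


if __name__ == "__main__":
    print(deletion_date("human_resources", date(2020, 3, 15)))
```

Eligibility under the schedule is only the first gate; as discussed later in this article, legal holds and sign-off from counsel still apply before anything is actually deleted.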
A data protection schema based on the sensitivity of the data is an excellent tool to put in place. Storage and handling requirements for sensitive data should specify that the data be encrypted at rest and in transit, that access be granted on a least-privilege basis, and that multifactor authentication and single sign-on tools be deployed to enforce access restrictions. Data access should be logged and reviewed at a frequency commensurate with the sensitivity of the data: quarterly for confidential data, monthly for restricted data, and weekly for classified data. Finally, the most powerful data management tool is deleting data once there is no longer a business or regulatory justification to retain it.
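The review cadence in this schema lends itself to mechanical tracking. This Python sketch assumes each dataset in the inventory records its classification and the date of its last access review; the dataset names are hypothetical, `overdue_reviews` is an illustrative helper, and quarterly and monthly are approximated as 91 and 30 days.

```python
from datetime import date, timedelta
from enum import Enum


class Classification(Enum):
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"
    CLASSIFIED = "classified"


# Review cadence from the schema, expressed in days.
# Quarterly and monthly are approximated as 91 and 30 days.
REVIEW_INTERVAL = {
    Classification.CONFIDENTIAL: timedelta(days=91),  # quarterly
    Classification.RESTRICTED: timedelta(days=30),    # monthly
    Classification.CLASSIFIED: timedelta(days=7),     # weekly
}


def next_review(level: Classification, last_review: date) -> date:
    """Date by which the next access review is due."""
    return last_review + REVIEW_INTERVAL[level]


def overdue_reviews(
    datasets: dict[str, tuple[Classification, date]], today: date
) -> list[str]:
    """Return names of datasets whose access review is past due, sorted by name."""
    return sorted(
        name
        for name, (level, last) in datasets.items()
        if today > next_review(level, last)
    )


if __name__ == "__main__":
    # Hypothetical inventory: dataset name -> (classification, last review date)
    inventory = {
        "payroll_db": (Classification.RESTRICTED, date(2024, 1, 1)),
        "case_files": (Classification.CLASSIFIED, date(2024, 1, 25)),
        "vendor_contracts": (Classification.CONFIDENTIAL, date(2024, 1, 1)),
    }
    print(overdue_reviews(inventory, date(2024, 2, 5)))
```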
Organizations also might need to comply with specific regulatory regimes, such as the Health Insurance Portability and Accountability Act (HIPAA) rules at 45 CFR Part 164, the requirements of the FBI’s Criminal Justice Information Services (CJIS) Division, or the EU General Data Protection Regulation (GDPR) if they have operations in, or offer goods or services to people in, the European Union. Data protection controls should satisfy these obligations, because failure to do so can result in significant fines and penalties. Data privacy regulation in the U.S. is challenging for organizations because it is not uniform: each state promulgates its own rules. Organizations can reduce costs by designing their data protection schema to meet the strictest privacy requirements among the states where they operate and applying that schema to all their sensitive data. Organizations must also monitor each of those jurisdictions for regulatory changes that might require modifications to the schema.
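The strictest-requirements approach can be sketched as merging each jurisdiction's requirements field by field and keeping the most demanding value. The requirement fields, the `Requirements` type, and the `state_a`/`state_b` values below are hypothetical placeholders, not actual statutory figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    notify_within_days: int   # breach notification deadline; smaller is stricter
    encrypt_at_rest: bool     # True is stricter
    access_review_days: int   # maximum days between access reviews; smaller is stricter


def strictest(reqs: list[Requirements]) -> Requirements:
    """Combine per-jurisdiction requirements into one schema that satisfies all of them."""
    if not reqs:
        raise ValueError("need at least one jurisdiction")
    return Requirements(
        notify_within_days=min(r.notify_within_days for r in reqs),
        encrypt_at_rest=any(r.encrypt_at_rest for r in reqs),
        access_review_days=min(r.access_review_days for r in reqs),
    )


# Hypothetical jurisdictions with placeholder values, not real statutory figures.
JURISDICTIONS = {
    "state_a": Requirements(notify_within_days=60, encrypt_at_rest=False, access_review_days=90),
    "state_b": Requirements(notify_within_days=30, encrypt_at_rest=True, access_review_days=180),
}

if __name__ == "__main__":
    print(strictest(list(JURISDICTIONS.values())))
```

This simplification only works when each requirement can be ordered from lenient to strict; requirements that differ in kind, such as differing definitions of personal data, call for legal analysis rather than a `min` or `any`.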
Once organizations have gathered data and tagged, encrypted, and restricted access to it, they should define which retention period applies. As the end of that period approaches, general counsel might object to deleting the data. Some of it might be subject to a litigation hold, in which case it must be preserved and cannot be deleted. Business units should coordinate with the legal department to verify that data that has reached its retention date can be deleted. If an organization has a board-approved retention policy in place (or, for a government agency, a statutorily required policy) and follows that policy, it is permitted to delete data that is not under a legal preservation hold. Data deleted under a valid policy does not have to be produced in discovery and cannot be exposed in a future data breach.
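The deletion decision described here combines three conditions: the retention period has ended, no litigation hold applies, and legal has confirmed that deletion is permitted. A minimal Python sketch, assuming hypothetical fields `legal_hold` and `legal_approved` that counsel maintains on each stored item; `StoredItem` and `deletion_batch` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredItem:
    item_id: str
    retention_ends: date  # from the retention schedule
    legal_hold: bool      # set by counsel; a held item must be preserved
    legal_approved: bool  # counsel has confirmed deletion is permitted


def deletable(item: StoredItem, today: date) -> bool:
    """All three conditions must hold before an item may be deleted."""
    return (
        today >= item.retention_ends
        and not item.legal_hold
        and item.legal_approved
    )


def deletion_batch(items: list[StoredItem], today: date) -> list[str]:
    """IDs of items that can be deleted today; everything else is kept."""
    return [item.item_id for item in items if deletable(item, today)]
```

The check is conservative by design: if any one condition is missing, the item is kept.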
Once an organization has determined that a data breach has occurred, four groups of external stakeholders come into view. The first group is threat actors. In ransomware attacks, they demand payment and threaten to release the data if they don’t receive it. In other cases, particularly with advanced persistent threat actors, the goal is espionage, including corporate espionage, or gathering as much intelligence as possible.
The second group is regulators. In many jurisdictions, laws require organizations to report a data breach to regulators when the number of affected data subjects exceeds a statutory threshold.
Data subjects are the third group. After many breaches, data subjects must be notified that their data has been compromised, and this notification often must happen promptly. Notification is frequently required even when the breach affects only a single data subject.
The fourth group is investors and the public. Publicly traded companies must disclose to investors any material impact of a cybersecurity event. An event is material if a reasonable investor would consider the event and its impacts important when making an investment decision. Government entities that are breached are required to notify affected parties, as well as the general public, by publishing a press release or a notice in newspapers or on the agency’s website.
The cornerstone of good data hygiene is minimizing data as much as is practical. Once data is no longer needed, it should be deleted. Data that is retained should be tagged and tracked. If a customer wishes to use the data for analysis, reporting, or other purposes, the data can also be anonymized to limit the damage a breach would cause.
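Strictly speaking, the technique sketched below is pseudonymization rather than anonymization: direct identifiers are dropped, the customer ID is replaced with a keyed HMAC-SHA256 token so that records can still be joined, and the date of birth is generalized to a year. Anyone holding the key can re-link tokens to customers, and under the GDPR pseudonymized data is still personal data, so the key has to be stored separately from the analysis copy. The field names, the identifier list, and `pseudonymize` itself are hypothetical.

```python
import hashlib
import hmac

# Fields dropped outright before data is shared for analysis (illustrative list).
DIRECT_IDENTIFIERS = {"name", "email", "ssn", "phone"}


def pseudonymize(record: dict, key: bytes) -> dict:
    """Return an analysis copy of `record`; the input dict is not modified.

    - Direct identifiers are removed.
    - customer_id is replaced by an HMAC-SHA256 token, so rows for the same
      customer can still be joined without exposing the real ID.
    - date_of_birth ("YYYY-MM-DD") is generalized to its birth year.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "customer_id" in out:
        token = hmac.new(key, str(out.pop("customer_id")).encode(), hashlib.sha256)
        out["customer_token"] = token.hexdigest()
    if "date_of_birth" in out:
        out["birth_year"] = int(out.pop("date_of_birth")[:4])
    return out
```

A keyed HMAC is used instead of a plain hash because customer IDs are often guessable, and an unkeyed hash of a guessable value can be reversed by simply hashing candidates.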
Data minimization is not just a best practice; it is a necessary approach to reducing the risks associated with data breaches. By carefully assessing what data they collect, handling it securely according to its sensitivity, and deleting data they no longer need, organizations can significantly strengthen their security posture and cyber resilience. Through this kind of responsible data stewardship, they also maintain their customers’ trust and reduce their exposure to financial and reputational damage.