About the detection, search and management of personal data

The European General Data Protection Regulation (EU GDPR) is inexorably approaching. The more you look into the policies, the more confusing and dense the data protection jungle seems to get.

On 25 May 2018, however, the regulation will take effect for all organizations that collect or process personal data of EU citizens, regardless of whether the organization has a presence in the EU.

The following sections offer a few pointers to help you find your way through the jungle.

The EU GDPR contains a wide range of requirements in relation to personal data. These include, among others:

  • Providing information to data subjects regarding the collection of data
  • Access to and copies of data subjects' data in a commonly used format
  • Correcting inaccurate or incomplete personal data
  • Restricting the processing of data and/or erasing it, in certain circumstances, at the request of data subjects
  • Securing personal data at rest and in motion, and protecting it for ongoing confidentiality, integrity, availability, and resilience

In order to comply with any of the above GDPR requirements at all, the first step is to identify the data that an organization has collected, stored or processed. This means identifying what data, defined as personal data in the regulation, is affected and where it is stored – including copies of it. In addition, there must be the ability to export this sensitive data in a common file format while keeping it safe and secure.

As you can see, this is not as simple as it seems at first glance. The key is a good data identification and classification system, supported by the security tools that are already available to help with implementation.

What is sensitive personal data?

Before proceeding with data classification, one must first understand which types of data fall within the scope of the GDPR. According to Article 4 of the Regulation, personal data means any information relating to an identified or identifiable natural person (data subject); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

As can be seen here, the phrase “directly or indirectly” very quickly leads to a very broad definition. Some of the things that fall under this definition consist of information that we don't normally think of as personal data, such as an IP address, a mobile device ID, or a web cookie.

But that is not all. To complicate matters further, not all types of personal data are treated the same. Article 9 of the GDPR deals with special categories of personal data, commonly referred to as sensitive personal data. This type of data requires additional protection and includes data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs and trade union membership, as well as genetic data, biometric data used to identify a person, and data concerning health, sex life or sexual orientation. Data relating to criminal convictions and offences is addressed separately in Article 10.

What is not personal data?

One buzzword that appears in the GDPR is pseudonymization. This refers to the use of, for example, encryption, hashing or other technical means so that personal data can no longer be attributed to a particular individual without additional information that is kept separately. The GDPR names pseudonymization, along with encryption, as a recommended measure for protecting personal data.

It is very important to understand that pseudonymized data is still classified as personal data under the GDPR, which means that it is still subject to most of the above requirements. However, there are some exceptions or relaxed requirements for pseudonymized data, in terms of data breach notification requirements and greater flexibility to profile data without the explicit consent of the data subject.
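As a rough illustration of pseudonymization by hashing, the following Python sketch replaces an identifier with a keyed SHA-256 digest. The function name `pseudonymize` and the example key are hypothetical, and keyed hashing (HMAC) is only one possible technique, not one the regulation prescribes. The sketch assumes the key is stored separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    # Normalize first so that the same person always maps to the same token.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be kept apart from the data set.
key = b"example-key-kept-separately"
token = pseudonymize("Jane.Doe@example.com", key)
print(token)  # a 64-character hexadecimal string
```

Anyone holding the key can recompute the token for a known identifier, which is exactly why the result still counts as personal data rather than anonymous data.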

The only data that is not considered personal data under the regulation is data that cannot be linked in any way to a specific individual. For the purposes of the General Data Protection Regulation, companies and organizations are not considered “persons”. Although such entities may have this status under certain laws, information about such organizations is not likely to fall within the definition of “personal data”. But beware: this only applies if the organization is so large that the information cannot be traced to any individual within it. In practice, that is rarely realistic.

Where is the personal data?

Once you have a clear picture, in the jungle of data definitions, of what is personal data (and what is not), you can start putting it into practice. However, this is easier said than done. Digitized personal data can be found in many different storage locations across one or more systems. Databases are probably the most obvious place for such data, but personal information can also be found in documents, spreadsheets, email messages and other file types.

How do you find that data when it's hidden in a sea of other, non-personal data? It is certainly not feasible for a human to read every file to check for personal data. You need a way to automate the process with algorithms that are able to recognize personal data and distinguish it from other data.

How to identify personal data?

Fortunately, many types of personal data follow an identifiable pattern. For example, passport numbers, phone numbers, credit card numbers and the like have a certain number of digits or characters. In the U.S., Social Security numbers follow the pattern of three digits, a dash, two digits, another dash and four digits. However, we are talking about the European Union, and that means we are dealing with many different countries, and the patterns for some of these identifiers differ from country to country.
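The U.S. Social Security number pattern described above translates directly into a regular expression. The following Python sketch is a minimal example; the function name `find_ssn_candidates` and the sample text are made up for illustration. It only checks the shape of the number, so every hit is a candidate that still needs review.

```python
import re

# Three digits, a dash, two digits, another dash and four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_candidates(text: str) -> list[str]:
    """Return all substrings that have the shape of a U.S. SSN."""
    return SSN_PATTERN.findall(text)


sample = "Employee record: SSN 123-45-6789, phone 555-0100."
print(find_ssn_candidates(sample))  # ['123-45-6789']
```

The word boundaries (`\b`) keep the pattern from matching inside longer digit runs such as order numbers.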

These identifiers can include driver's licenses, license plates, VAT codes, health identification numbers and various other national ID numbers. Searching for all these different patterns can be very challenging.

Powerful and comprehensive search technology is required to track down all personal data collected, stored and processed by an organization. Multiple search types are needed. Regular expressions can be used to find sequences that follow the known patterns of different types of personal data (e.g. numbers that match the number of digits for EU passports). Other algorithms can search for keywords that, when combined with the sequence of digits, indicate a specific type of personal data, such as a country code or name.
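Combining a pattern search with a keyword search could look roughly like the Python sketch below. The nine-digit rule, the keyword list and the 40-character window are assumptions chosen purely for illustration. They do not describe the real passport format of any particular country.

```python
import re

# Hypothetical rule: nine consecutive digits are a candidate document number.
CANDIDATE = re.compile(r"\b\d{9}\b")
# Hypothetical keywords that make a passport number more likely.
KEYWORDS = ("passport", "reisepass", "passeport")
WINDOW = 40  # characters of context inspected on each side of a match


def find_passport_candidates(text: str) -> list[tuple[str, bool]]:
    """Return (number, keyword_nearby) for every nine-digit sequence."""
    results = []
    for match in CANDIDATE.finditer(text):
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        context = text[start:end].lower()
        keyword_nearby = any(word in context for word in KEYWORDS)
        results.append((match.group(), keyword_nearby))
    return results


print(find_passport_candidates("Passport no. 123456789"))
print(find_passport_candidates("Invoice total 987654321"))
```

A hit with a keyword nearby can be ranked higher for review, while a bare digit sequence is more likely to be a false positive.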

Seek and you shall find: use integrated software tools and functions

The software you use to store and process your data most likely also includes some tools to search for personal data. For example, Microsoft's Azure, Office 365, Enterprise Mobility + Security (EMS), Dynamics 365, Azure SQL Database, and SharePoint cloud services include several search features that could be used to search for personal data.

With Azure Active Directory, Azure Data Catalog, Power Query (for Hadoop clusters in Azure HDInsight), Azure Search and other related tools, personal data can be found in Microsoft Azure environments.

Once you have found the personal data, you can use other tools like Azure Information Protection to implement data classification and apply persistent labels to the personal data. You can also annotate registered personal data via the REST API or with Azure Data Catalog.

Office 365 customers could use content search in Advanced eDiscovery to find and identify personal data in Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business. With Office 365 Labels it is possible to classify personal data and apply encryption and access restrictions. The Advanced Data Governance tool can additionally be used to automate policies and the retention and deletion of data.

To find personal data on on-premises computers and servers, Windows Search, PowerShell, and other operating system features are a good place to start.
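The same idea can be scripted in any language that can walk a file system. The sketch below uses Python rather than PowerShell and is only an illustration: the function name `scan_directory`, the list of file extensions and the reuse of the U.S. Social Security number pattern are all assumptions. It reads plain-text files only, so office documents or mail stores would need additional parsers.

```python
import os
import re

# Example pattern only: the U.S. SSN shape (3 digits, 2 digits, 4 digits).
PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumed list of plain-text file types worth reading.
TEXT_EXTENSIONS = {".txt", ".csv", ".log"}


def scan_directory(root: str) -> dict[str, int]:
    """Map each text file under root to its number of pattern matches."""
    hits = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    count = len(PATTERN.findall(handle.read()))
            except OSError:
                continue  # unreadable file: skip it and keep scanning
            if count:
                hits[path] = count
    return hits
```

Calling `scan_directory` with a folder path returns only the files that contain at least one match, which gives a first inventory of where to look more closely.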

Personal data in SQL databases can be found with SQL queries and with custom tools or services built on top of them. Microsoft Compliance Manager can help cloud service customers manage the necessary compliance tasks from one interface.
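A custom tool for databases can be quite small. The following Python sketch uses an in-memory SQLite database as a stand-in for a real SQL server, which is an assumption made to keep the example self-contained. The function name `find_columns_with_matches`, the `customers` table and its contents are invented for illustration. It lists the tables and columns through SQL and checks every text value against the example pattern.

```python
import re
import sqlite3

# Example pattern only: the U.S. SSN shape.
PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_columns_with_matches(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Return (table, column, matching_rows) for columns containing matches."""
    findings = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # The second field of each table_info row is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(f'SELECT "{column}" FROM "{table}"')
            count = sum(1 for (value,) in rows
                        if isinstance(value, str) and PATTERN.search(value))
            if count:
                findings.append((table, column, count))
    return findings


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, note TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'A. Example', 'SSN 123-45-6789 on file')")
conn.execute("INSERT INTO customers VALUES (2, 'B. Example', 'no identifiers')")
print(find_columns_with_matches(conn))  # [('customers', 'note', 1)]
```

Reading every row is fine for a sketch, but on large tables a sampled or server-side search would be the more practical choice.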

Personal data management and protection

Once personal data has been identified and classified, it must be managed and protected in accordance with the GDPR requirements. This is where monitoring, management and security software, as well as various security services, come into play. For example, GFI offers various security solutions to keep the network secure, apply patches, prevent data breaches through vulnerability exploits, reduce the risk of data leaks – including those related to BYOD systems and portable storage devices – and detect suspicious activity that could signal a personal data breach.

Conclusion

In order to comply with the GDPR, which is aimed at protecting personal data, the data must first be found and then classified. But this is only the first step. Once the data has been identified, appropriate security controls must be implemented to ensure the continued confidentiality and integrity of the data. Beyond that, you should also think about the notification requirement that applies in the event of a data breach.

The functions integrated into operating systems and cloud services help to find the data to be protected. The combination of appropriately configured integrated security functions with advanced management and monitoring solutions supports the next step in achieving the necessary compliance.