Data Classification Redefined: Unleashing the Power of Archetypes

July 15, 2023
Cyrus Tehrani
5 min read

As the shift towards digital and cloud-based solutions accelerates, data has become the lifeblood of any organization. However, as the volume of data we store and process grows exponentially, the challenge of classifying and protecting it escalates at the same pace.

Making matters more difficult, data comes in myriad types, including intellectual property, financial, business confidential, and regulated PII/PCI/PHI data, all within increasingly intricate environments. Much of this data is also unstructured, which adds to the complexity and makes it harder to protect.

This is where data classification comes into play. By systematically categorizing their data based on its sensitivity, organizations can protect their most critical information from unauthorized access, disclosure, or alteration.

What is data classification? A brief primer

Data classification helps you identify high-value and sensitive data by sorting it into a specific set of meaningful categories.

Data classification drives multiple use cases, such as data labeling, sensitive data identification, automated protection, compliance, security, access control, and data retention.

A basic definition: data classification is about labeling your data.

The limitations of traditional data classification

Unfortunately, traditional data classification methods often fall short, unable to provide the level of granularity and precision needed to effectively manage and secure sensitive data.

These methods tend to rely on broad categories and manual processes. They struggle to keep up with the sheer volume and complexity of modern data, leading to inaccuracies, inefficiencies, and increased risk.

On top of that, these methods often fail to capture the nuances that separate one type of sensitive information from another, such as a legal contract versus a financial tax form. This lack of precision can leave organizations vulnerable to data breaches and non-compliance penalties.

Enter the power of archetypes – a game-changing approach to data classification that can redefine the landscape of data security.

Introducing Archetypes: data classification redefined

An archetype, in the context of data classification, is a specific type of data or file that contains sensitive or confidential information. This could be a contract in the legal industry, a tax form in finance, a workers’ compensation claim in insurance, or a sales quote. The ability to identify and classify these archetypes provides a level of granularity and precision that traditional methods simply can’t match.

Concentric AI: leading the way in archetype-based data classification

The power of archetypes lies in their specificity and level of granularity. Concentric Semantic Intelligence identifies the exact type of form or document, such as a business insurance claim or an auto insurance policy. This level of detail allows for more precise risk assessments and data management strategies. It’s a significant advancement beyond traditional Personally Identifiable Information (PII) detection.

The term “archetype” is unique to Concentric AI, setting it apart from competitors who lack this level of granularity.

This differentiation is a testament to Concentric’s innovative approach to data security. Our models, now more than 600 and growing, are unique and automatically match different categories of information.

Concentric’s solution also allows customers to define their own categories of information and associate them with specific types of privacy identifiers. This feature is particularly useful for organizations dealing with unique forms or documents not covered by existing archetypes. It’s a proactive approach to data security, allowing organizations to identify potential risks without having to search for them.

Drilling down: unveiling the layers of data with Concentric

One of the standout features of Concentric’s Semantic Intelligence solution is its ability to drill down into the data after identifying the archetype. This isn’t just about identifying a document as a contract or a tax form; it’s about understanding the layers of information within that document and categorizing them accordingly.

Once an archetype is identified, Concentric can further classify the data into subcategories. For example, a contract could be subcategorized based on its specific type, such as an employment contract, a non-disclosure agreement, or a lease agreement. This level of detail provides organizations with a more nuanced understanding of their data, enabling more precise risk assessments and data management strategies.

Concentric also identifies the sensitive information contained within the document. This could include personal information such as names, addresses, dates of birth, email addresses, phone numbers, or Social Insurance Numbers. By identifying and classifying this sensitive information, Concentric’s solution provides organizations with a comprehensive overview of their data landscape, highlighting potential areas of risk and ensuring that appropriate data protection measures can be put in place.

This ability to drill down into the data is what Concentric is all about: it’s not just about classifying data; it’s about understanding it. And in the world of data security, understanding is the first step towards protection.
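To make the layering described in this section concrete, here is a minimal Python sketch of what a classified document record might look like: an archetype, a subcategory beneath it, and the kinds of sensitive information found inside. This is an illustration only, not Concentric's data model or API. Every name in it (ClassifiedDocument, SensitiveEntity, entity_kinds_by_type), the file paths, and the confidence values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class SensitiveEntity:
    """One kind of sensitive information found in a document (hypothetical)."""
    kind: str   # e.g. "name", "email_address", "date_of_birth"
    count: int  # number of occurrences found in the document


@dataclass
class ClassifiedDocument:
    """Hypothetical record: archetype, then subcategory, then entities."""
    path: str
    archetype: str      # e.g. "Contract"
    subcategory: str    # e.g. "Employment contract"
    confidence: float   # made-up score between 0 and 1
    entities: List[SensitiveEntity] = field(default_factory=list)


def entity_kinds_by_type(
    docs: List[ClassifiedDocument],
) -> Dict[Tuple[str, str], Set[str]]:
    """Group the kinds of sensitive information by (archetype, subcategory)."""
    summary: Dict[Tuple[str, str], Set[str]] = {}
    for doc in docs:
        key = (doc.archetype, doc.subcategory)
        summary.setdefault(key, set()).update(e.kind for e in doc.entities)
    return summary


# Illustrative sample data only; none of these files or scores are real.
docs = [
    ClassifiedDocument(
        "hr/offer_letter.docx", "Contract", "Employment contract", 0.97,
        [SensitiveEntity("name", 2), SensitiveEntity("address", 1)],
    ),
    ClassifiedDocument(
        "hr/contract_2023.pdf", "Contract", "Employment contract", 0.93,
        [SensitiveEntity("name", 1), SensitiveEntity("date_of_birth", 1)],
    ),
    ClassifiedDocument(
        "legal/vendor_nda.pdf", "Contract", "Non-disclosure agreement", 0.95,
        [SensitiveEntity("name", 2), SensitiveEntity("email_address", 1)],
    ),
]

summary = entity_kinds_by_type(docs)
for (archetype, subcategory), kinds in sorted(summary.items()):
    print(f"{archetype} / {subcategory}: {', '.join(sorted(kinds))}")
```

Grouping by archetype and subcategory is what would let a reviewer ask a question like "which kinds of personal information show up in our employment contracts?" rather than simply "which files contain PII?".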

A brief example of archetypes in the Concentric Semantic Intelligence platform

Let’s check our demo company’s Finance category from the Content Explorer on the left-hand navigation menu.

Notice that Concentric has broken down risk by Category, Confidence, Subcategory, and Archetype.

When we click on More under Archetype, we get a comprehensive list of very specific data types.

Let’s click on Tax Form 990.

When we click on the Files tab in the lower section of the window, we can see the actual filename that contains sensitive data on a Tax Form 990.

Now, when we click on the Directory tab in that lower section, we can see all the different locations the company is storing this file.

In this case, it’s located in four different locations even though it should only be stored in one place.
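The check behind the Directory tab can be approximated in a few lines of Python: given a list of classified files, group them by archetype and filename, then flag any file that appears in more than one directory. This is a rough sketch under stated assumptions, not how the Concentric platform works internally. The function names and sample paths are hypothetical, and matching copies by filename is a simplification; a real system would more likely compare file content.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple

FileKey = Tuple[str, str]  # (archetype, filename)


def locations_by_file(records: Iterable[Tuple[str, str]]) -> Dict[FileKey, List[str]]:
    """Map each (archetype, filename) to the sorted directories that hold it."""
    locations = defaultdict(set)
    for archetype, path in records:
        p = PurePosixPath(path)
        locations[(archetype, p.name)].add(str(p.parent))
    return {key: sorted(dirs) for key, dirs in locations.items()}


def stored_in_multiple_places(records: Iterable[Tuple[str, str]]) -> Dict[FileKey, List[str]]:
    """Keep only the files that live in more than one directory."""
    return {
        key: dirs
        for key, dirs in locations_by_file(records).items()
        if len(dirs) > 1
    }


# Hypothetical (archetype, path) pairs; the paths are invented for illustration.
records = [
    ("Tax Form 990", "finance/2022/form_990.pdf"),
    ("Tax Form 990", "shared/board/form_990.pdf"),
    ("Tax Form 990", "users/alex/downloads/form_990.pdf"),
    ("Tax Form 990", "archive/old/form_990.pdf"),
    ("Sales quote", "sales/quotes/q3_quote.xlsx"),
]

duplicates = stored_in_multiple_places(records)
for (archetype, name), dirs in duplicates.items():
    print(f"{archetype}: {name} is stored in {len(dirs)} locations")
    for directory in dirs:
        print(f"  {directory}")
```

With the sample data above, only the Tax Form 990 file is flagged, because the sales quote lives in a single directory.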

See Archetypes in action with your own data

Ultimately, Concentric autonomously identifies data, classifies that data, learns how it’s used, and determines whether it’s at risk. Our solution empowers you to know where your data is across unstructured or structured data repositories and email/messaging applications, whether in the cloud or on-premises, all with semantic context and with more granularity than any other solution on the market.

To see firsthand, with your own data, how you can quickly and easily deploy Concentric AI’s solution to classify your data without rules, regex, or end-user involvement, book a demo today.


