March 12, 2025

A Guide to Data and Access Sprawl


Massive cloud migration and digital transformation are inarguably great for business, but managing all that data is like fighting a rising tide. 

This phenomenon, called data sprawl, presents both challenges and opportunities for organizations of all sizes.

What is data sprawl?

Data sprawl is the proliferation of data across various locations, formats, and systems, both within and outside an organization’s control. This data is not confined to structured databases; it also includes unstructured data such as emails, documents, social media posts, and more.

The growth of cloud services, remote work, and connected devices such as IoT hardware further exacerbates data sprawl, leaving data scattered across different systems, servers, and geographical locations.

What are the key data sprawl challenges?

While data sprawl can lead to innovative insights and improved decision-making when managed effectively, it often opens the door to significant risks and complexities. These include increased costs for data storage and management, potential security vulnerabilities, and difficulties in maintaining data compliance and governance.

The main challenge with data sprawl is not just its volume but also its complexity and diversity. Data can exist in various formats and be stored in different systems, each with its own set of access controls and security measures. This sprawl makes it harder for organizations to achieve a comprehensive view of their data, let alone manage and secure it effectively.

Why should companies care about data sprawl?

There are numerous reasons companies should care about data sprawl, and we’ll cover some compelling statistics later in this guide, but here are three.

  1. Uncontrolled data sprawl brings about significant security risks. When data is dispersed across various locations and systems, it becomes more challenging to secure and monitor, increasing the risk of data breaches.
  2. Data sprawl can complicate compliance with data protection regulations, as it becomes harder to track where personal data is stored and who has access to it.
  3. Data sprawl can lead to inefficiencies and increased costs, as redundant data storage and the need for additional management resources can drive up expenses.

Concentric AI is easy to deploy — sign up in ten minutes and see value in days.

Book a demo today

Data sprawl examples

Now, let’s look at a few real-world examples of cybersecurity incidents in which data sprawl played a key role. 

Equifax Data Breach (2017): One of the most infamous examples of data sprawl contributing to a massive data breach, this incident affected 143 million individuals. Vital personal information was compromised due to poor data governance and data scattered across multiple systems.

Marriott International Data Breach (2018): This breach, affecting up to 500 million guests, can be attributed to many factors, but one issue was decentralized data storage and management following multiple acquisitions by Marriott. The attackers had unauthorized access to a scattered array of personal data for years before they were discovered.

Accenture Cloud Misconfiguration (2017): A misconfigured AWS S3 storage bucket led to the exposure of sensitive data from a number of their clients. The company was likely using the AWS servers to migrate data from development to production.
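As a rough illustration of how this kind of misconfiguration might be caught, the sketch below checks a bucket's "Block Public Access" settings and reports any that are missing or turned off. The four setting names are the ones AWS uses for S3 Block Public Access; the helper function and the sample configurations are hypothetical, and nothing here reflects how the Accenture buckets were actually configured.

```python
# Hypothetical audit helper. The four keys are the settings in S3's
# "Block Public Access" configuration; everything else is illustrative.
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list[str]:
    """Return the Block Public Access settings that are absent or disabled."""
    return [name for name in REQUIRED_SETTINGS if not config.get(name, False)]


# Made-up configurations for illustration, not taken from any real bucket.
locked_down = {name: True for name in REQUIRED_SETTINGS}
risky = {"BlockPublicAcls": True, "IgnorePublicAcls": True}

print(missing_public_access_blocks(locked_down))  # []
print(missing_public_access_blocks(risky))  # ['BlockPublicPolicy', 'RestrictPublicBuckets']
```

In practice the configuration dictionaries would come from the cloud provider's API for each bucket; a check like this is only one signal, since bucket policies and ACLs also need review.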

Data sprawl: hypotheticals 

While these next examples are hypothetical, they emphasize how crucial it is to manage data sprawl in healthcare, finance and education. 

  • A hospital system uses multiple SaaS platforms to handle patient records, billing, and scheduling. What if a data breach were to occur where attackers exploit weak integrations between these platforms, gaining access to sensitive patient data scattered across different services?
  • A financial firm experiences a data leak when an employee accidentally shares a spreadsheet containing client investment details on a public cloud storage service. The data is used company-wide but without proper oversight and security measures.
  • A university’s decentralized data storage practices lead to a phishing attack where student records are compromised. What if the attack was made possible because of inconsistent security protocols across various departments’ data storage systems? 

Ultimately, managing data sprawl is crucial for maintaining data security, ensuring regulatory compliance, and optimizing resource usage.

Data sprawl by the numbers: statistics from Concentric AI Data Risk Report

Concentric AI’s Q1 2023 Data Risk Report provides a comprehensive analysis of the state of unstructured data within organizations — a key driver of data sprawl.

Unstructured data, which makes up over 80% of an organization’s data, is embedded in millions of financial reports, corporate strategy documents, source code files, and contracts. To IT security professionals, however, this data is akin to a shapeless lump of clay: unseen and insecure. The report emphasizes the lack of visibility into where sensitive data resides, much less where the risk lies from entitlements, sharing, permissions, and activity.

In this context, data sprawl is a growing concern for businesses, underscored by the fact that the average organization has more than 251 business-critical data categories hidden in its unstructured data. These categories range from human resources and sales to partner, product, financial, and legal documents.

The sprawl and diversity of unstructured data make it challenging to determine which documents should be a priority for security measures.

The report also presents alarming statistics on data at risk due to oversharing. On average, each organization had 802,000 data files at risk due to oversharing, up from 598,000 in the first half of 2022. That the number keeps climbing despite increasing cybersecurity investments suggests oversharing is a key indicator of growing data sprawl.
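To put the two oversharing figures in perspective, the relative increase can be worked out directly. The short calculation below uses only the two numbers quoted from the report; the variable and function names are ours.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


files_h1_2022 = 598_000  # average overshared files per organization, first half of 2022
files_q1_2023 = 802_000  # the same measure in the Q1 2023 report

increase = percent_increase(files_h1_2022, files_q1_2023)
print(f"Overshared files grew by about {increase:.1f}%")  # about 34.1%
```

In other words, the average number of overshared files rose by roughly a third between the two reports.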

Also alarming: according to the report, 90% of business-critical documents are shared outside the C-suite, and over 15% of all business-critical files are at risk from oversharing, erroneous access permissions, and inappropriate classification. This can lead to internal or external users gaining access to sensitive information they should not have.
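A minimal sketch of what an oversharing check could look like, assuming each file already carries a business-critical flag and a list of audiences it is shared with. The record type, the audience labels, and the sample files are all hypothetical; this is not how Concentric AI or the report measures oversharing, only an illustration of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    business_critical: bool
    shared_with: frozenset  # audience labels, e.g. {"finance-team"}


# Hypothetical labels for audiences considered too broad for critical files.
BROAD_AUDIENCES = frozenset({"anyone-with-link", "external", "entire-org"})


def overshared(files):
    """Return business-critical files shared with any overly broad audience."""
    return [
        f for f in files
        if f.business_critical and f.shared_with & BROAD_AUDIENCES
    ]


# Made-up inventory for illustration.
inventory = [
    FileRecord("contracts/acme-msa.docx", True, frozenset({"legal-team"})),
    FileRecord("finance/q3-forecast.xlsx", True, frozenset({"anyone-with-link"})),
    FileRecord("hr/salaries.csv", True, frozenset({"hr-team", "entire-org"})),
    FileRecord("marketing/logo.png", False, frozenset({"external"})),
]

flagged = overshared(inventory)
for f in flagged:
    print(f.path)
```

Here the forecast and salary files are flagged, while the logo is ignored because it is not business-critical. The hard part in a real environment is producing the business-critical flag in the first place, which is the classification problem discussed below.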

The report underscores the need for advanced AI capabilities, like those provided by Concentric AI, to process and categorize unstructured data, evaluate its business criticality, and accurately assess risk. This approach can help organizations gain visibility into their data sprawl, manage their unstructured data more effectively, and mitigate the risks associated with oversharing and inappropriate access.

Is AI Sprawl a thing?

When it comes to using Gen AI tools like Copilot and Gemini, today’s organizations are fully bought in. As enterprises adopt these artificial intelligence technologies, a new form of sprawl is popping up: AI sprawl. Like data sprawl, AI sprawl refers to the proliferation of AI models, applications, and workflows scattered across multiple teams, departments, and infrastructure.

Just as unmanaged data sprawl creates security risks, inefficiencies, and compliance challenges, AI sprawl is bringing unwanted complexity to the data security table.

Organizations must now grapple with:

  • Untracked AI models and applications: AI projects often run simultaneously, which could lead to fragmented oversight, redundant efforts, and inconsistent governance.
  • Security and privacy issues: Without centralized management and proper data classification and access control, AI models and datasets could (and often do) inadvertently expose sensitive data.
  • Compliance risks: Uncontrolled AI deployments make regulatory compliance difficult, especially when handling personally identifiable information (PII), protected health information (PHI), or sensitive financial data.

When Gen AI and data sprawl come together, these issues only multiply, which emphasizes the critical need for cohesive governance across both data and AI assets.

How Concentric AI can help overcome data sprawl challenges

Concentric AI addresses the data sprawl challenge through its data discovery and classification solution. It leverages advanced machine learning technologies to autonomously scan and categorize data, from financial data to personally identifiable information (PII), protected health information (PHI), payment card information (PCI), intellectual property, and business confidential information, regardless of where it is stored.

With Concentric AI, organizations gain visibility into their sensitive data across unstructured or structured data repositories, email/messaging applications, and cloud or on-premises storage, all within a semantic context.

Our solution also provides centralized data classification, eliminating the need for complex rule writing or reliance on end-users. Because data can easily be shared, copied, duplicated, modified, and shared again in the era of cloud transformation, data classification has become a challenging exercise for enterprises. Concentric AI’s Semantic Intelligence allows security teams to identify their sensitive data with semantic context and label data centrally, making data classification a less daunting task.

Plus, Concentric AI seamlessly integrates with existing classification frameworks, enhancing the effectiveness of defense-in-depth, a pillar of modern data security planning. For instance, it integrates with Microsoft’s Information Protection (MIP) solution for data classification and management.

By providing autonomous data discovery, centralized data classification, and seamless integration with existing frameworks, Concentric AI offers a comprehensive solution to tackle the data sprawl issue — enhancing data security, ensuring regulatory compliance, and optimizing resource usage.

Book a demo today to see firsthand — with your own data — how Concentric AI can quickly and easily be deployed to manage data sprawl in your organization.

