Webinar with Heidi Shey, Principal Analyst, Forrester – Key Learnings

We hosted our first webinar on June 23, 2020, and were glad to have featured speaker Heidi Shey from Forrester Research join us. The webinar was titled “Are you on top of it? How organizations are protecting business critical data”.

The topics that we covered included:

  • What is business-critical data? It’s not just PCI/PII anymore!
  • Why data security, especially for unstructured data, is so hard
  • Why data discovery is foundational for an effective data strategy
  • How discovery and classification strategies vary across use cases
  • Emerging technologies that can help with data discovery and risk monitoring
  • Use cases and considerations for deployment

Heidi made the following key points:

  • While PII gets the most attention as data worthy of protection, enterprises should also think about intellectual property, source code, algorithms, scripts, and communications such as emails, Zoom chats, etc.
  • The format of the data is also important: structured and unstructured data, as well as other types like audio and video
  • According to Forrester research, 33% of data breaches were attributable to external attacks, 25% to internal incidents and 21% to third party attacks
  • Data discovery is foundational to an effective data strategy to support security, privacy, compliance, data governance and monetization strategy
    • Core to a data centric security and privacy strategy
    • It is necessary for multiple frameworks and standards such as Zero Trust, FFIEC, NIST 800-53, COBIT, HIPAA, SOC1 and SOC2, ISO 27000, PCI
    • Provides additional benefits beyond security by enabling an ethical digital transformation, accelerating the business use of cloud, improving employee productivity and experience and supporting data driven business initiatives
  • You cannot protect what you don’t know
    • Data discovery is an add-on feature in many tools, but in most cases you get what you get if you treat it as just a feature
    • The following are key capabilities to consider as you think about discovery
      • Finding sensitive data
      • Mapping data flows
      • Identifying data patterns or outliers
      • Identifying data relationships
      • Preparing data for business users
  • Data classification entails
    • Identifying the data
    • Labeling or tagging the file to ensure its appropriate identification
  • In order to set yourself up for success
    • Make sure you have your use cases for data discovery and classification defined
    • Inventory the tools you already have, along with their capabilities and limitations
    • If bringing in new technologies, understand how they perform data discovery and classification, e.g. pattern matching or something else
    • Remember that it is never about just the technology but also the process and people


I addressed why data security, especially around unstructured data, is hard, and covered some promising new technologies that help with data discovery and risk mitigation.

  • Data discovery is hard along a couple of dimensions
    • Unstructured data comes in many forms: business-confidential data such as contracts and legal agreements; IP data such as source code, patents, and research docs; and financial data such as trading data, income, and bookings, not to mention PII/PCI data strewn across these data elements. This data is complex in nature, context often decides sensitivity, and there is no predefined pattern in which this data manifests itself within an enterprise
    • This data is also sprawled across enterprise data stores, from on-premises data stores to cloud repositories. The challenge here is not just data volume but also that a contract may be on SharePoint, a modified variant in S3, and a third version in Box, all with inconsistent access permissions. Finding these inconsistencies is hard
  • Our customer deployments have yielded the following key findings
    • We have detected more than 90 thematic categories of data, of which more than 40 were of a business-sensitive nature
    • Over 20% of the data was deemed to be business critical
    • Over 11% of this data was overshared. Overshared data is data that is accessible to users, groups, or third parties that should not have access to it
    • Oversharing dramatically increases the risk surface to the data within an enterprise. One business-critical document that’s overshared with even a small percentage of your workforce is far more likely to be lost. A document erroneously placed in a broadly shared folder, for example, faces the cumulative risk of credential theft from any of that folder’s members.
  • Traditional data discovery techniques fail because they rely on word patterns, regex and rule writing. The limitations are
    • They require IT staff to be content experts, which does not work in the face of such myriad content
    • Word-based patterns lack the context necessary to derive the true meaning of a lot of this content
    • Therefore, existing techniques lack completeness and are riddled with false positives
  • So what can one do about it?
    • Deep learning offers great promise as a technology to help develop rich context into data and provide thematic, category-oriented views into the data without regex rules or training data
    • Deep learning can also understand not just the various thematic groups but also, in aggregate, where they are located, who they have been shared with, and who has access to them, in order to autonomously identify oversharing
    • Risk from oversharing can be remediated using auto classification, fixing entitlements and so on
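
To make two of the points above concrete, here is a minimal Python sketch. It was not part of the webinar, and everything in it is a made-up assumption for illustration: the pattern, the sample lines, and the 1% per-account compromise rate. The first function shows how a word-pattern rule flags anything that looks like a Social Security number, whatever the context. The second shows how the risk to an overshared document accumulates with the number of people who can reach it, assuming each account is compromised independently.

```python
import re

# Hypothetical rule a rule writer might use to find US Social Security numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Made-up sample lines; only the first one actually contains an SSN.
SAMPLES = [
    "Employee SSN: 123-45-6789",
    "Part number 987-65-4321 ships next week",
    "Build tag 555-12-0001 passed regression",
]


def regex_hits(lines):
    """Return the lines the pattern flags as sensitive, ignoring context."""
    return [line for line in lines if SSN_PATTERN.search(line)]


def exposure_probability(per_user_risk, members):
    """Chance that at least one of `members` accounts is compromised.

    Assumes each account is compromised independently with probability
    `per_user_risk`, which is a simplification.
    """
    return 1 - (1 - per_user_risk) ** members


if __name__ == "__main__":
    flagged = regex_hits(SAMPLES)
    print(f"{len(flagged)} of {len(SAMPLES)} lines flagged; only 1 is a real SSN")

    # Illustrative 1% per-account risk across growing folder memberships.
    for members in (1, 10, 100):
        print(members, round(exposure_probability(0.01, members), 3))
```

The pattern flags the part number and the build tag along with the real SSN, because digits alone carry no context. Under the same assumptions, the exposure figure climbs quickly as folder membership grows, which is why a single overshared folder matters.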


Our mission at Concentric is to help enterprises discover and protect their business critical data. We use deep learning to help understand their data, autonomously mine for risk, and remediate to mitigate the risk of data loss. Interested in a demo? Please drop us a note at https://landing.devinclucas.com/demo-request or [email protected]

