Delphix & the EU General Data Protection Regulation (GDPR)


GDPR Observations

Last week I was reading a CMS Wire article by Brian Wallace, titled Who’s Ready for the GDPR? [Infographic], and found a few of the data points cited to be eye-catching.

On May 25, 2018, the GDPR goes into effect, and according to the embedded infographic…

Note: direct quotes from the infographic appear in italics.
  1. The GDPR requires that all EU Citizen data [i.e. Sensitive & Personal] be protected as stipulated in the final text of the regulation, even if the data lives outside of the EU.
    • Sensitive data: Name, location, identification numbers, IP address, cookies, RFID info
    • Sensitive personal data: Health data, genetic data, biometric data, racial or ethnic data, political opinions, sexual orientation
  2. 92% of U.S. businesses list GDPR as a top data protection priority
    • 77% of U.S. businesses have started preparing for GDPR, but only 6% are GDPR-ready
      • The low readiness percentage is consistent with my experience working alongside data owners at major U.S. corporations
  3. In addition to protecting EU citizens’ data, there are other services a custodian of their data must provide. Some of these include:
    • EU citizens have the right to access their data as well as information about how it is being used
    • EU citizens can take their data to a different agency upon request
    • EU citizens have the right to data erasure
    • Certain companies and governmental organizations must appoint a Data Protection Officer
    • Companies must implement reasonable data protection measures
    • Companies must assess for threats
  4. Noncompliance with the GDPR will be costly. Top tier fines are set at €20 million or 4 percent of global annual turnover, whichever is greater

What are the challenges specific to a Data Protection Officer?

The challenges are the same ones faced by CIOs and CDOs in major corporations today: to secure sensitive and personal data while delivering copies to developers, testers, and analysts in an effort to compete at the speed required in the Digital Age. The corporate metrics used to measure success also remain the same – increase revenue, reduce costs, and stay compliant. That sounds reasonable until you evaluate the application services your organization provides and realize your data is heavy and is the anchor on which every other task in your process workflow waits.

What I consistently hear from clients includes, but is not limited to:

  1. Slow, complex masking process workflows that require teams of programmers to maintain the code
    • Integration testing (i.e. maintaining referential integrity across multiple, disparate databases) adds substantial complexity
    • No concept of a masked master data set where copies can be quickly created
  2. More than one masking toolset and/or process workflow, which requires multiple skill sets and teams
  3. Masked data does not have realistic values substituted into the fields
  4. Too few copies of data sets for developers, testers, and analysts means sharing
    • A corruption introduced by one individual stops everyone from working
  5. Too many copies of the data sets require a significant amount of storage and time to refresh and manage
    • Physically impossible to accommodate due to limited capital resources
  6. Teams that subset data to deliver copies faster and reduce storage are simply pushing the problem downstream
    • Developers cannot test end-to-end processes
      • Too often issues are only exposed in production
    • Testers are limited to a small set of test cases
      • Too often defects are found later in QA
    • QA, as every CIO I speak to tells me, bears the brunt of testing, meaning
      • issues and defects that should be found in Dev and Test are instead found in QA, typically because Dev and Test receive stale and/or subset data
      • the goal for these CIOs is to shift their testing process workflows left so QA can focus on a finite set of product-quality testing

The Delphix Masking Engine

Addressing #1-3 (above)

Unlike most masking solutions in the industry today, which are complex, require programmers, and are difficult to manage when the data sets change, Delphix Masking is a GUI-based software solution. Three powerful, easy-to-use components deliver the core capabilities of an enterprise-class masking tool.

  1. Profile – scan the selected data sets, identify sensitive data, and return a report of elements found along with recommended masking algorithms.
  2. Secure (Mask) – Apply the assigned masking algorithms to their respective elements while maintaining referential integrity; no programming required. Elements are masked with fictitious but realistic data substitutions, and once the algorithms are assigned the masking is consistent and repeatable (see the sketch after this list).
  3. Audit – To ease the demands of maintaining compliance, Delphix provides a report that identifies which sensitive data elements have been protected, simplifying delivery to auditors. Audit will also alert admins if new data fields are added that introduce new vulnerabilities.
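
To make the Profile and Secure steps more concrete, below is a minimal, hypothetical sketch in Python (not the Delphix implementation; the patterns, key handling, and substitution list are illustrative assumptions). It shows how a profiler might flag sensitive columns and how deterministic masking can substitute realistic, fictitious values while preserving referential integrity, because the same input always produces the same masked output.

```python
# Hypothetical sketch of profiling and deterministic masking (not the Delphix implementation).
# Deterministic substitution means the same cleartext always maps to the same fictitious
# value, so joins on masked columns across disparate databases remain valid.
import hashlib
import hmac
import re

SECRET_KEY = b"rotate-me-outside-source-control"   # assumption: real key management is external
REALISTIC_NAMES = ["Alice Moreau", "Rajesh Patel", "Mei Chen", "Samuel Okoye"]

SENSITIVE_PATTERNS = {                              # crude stand-in for the "Profile" step
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "ssn":   re.compile(r"\d{3}-\d{2}-\d{4}"),
}

def profile(sample_values: list[str]) -> list[str]:
    """Return the sensitive-data categories detected in a column sample."""
    return [label for label, pattern in SENSITIVE_PATTERNS.items()
            if any(pattern.fullmatch(v) for v in sample_values)]

def mask_name(value: str) -> str:
    """Deterministically substitute a realistic, fictitious name for the real one."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    return REALISTIC_NAMES[digest[0] % len(REALISTIC_NAMES)]

# The same input yields the same masked output on every run and in every database,
# which is what keeps foreign-key relationships intact after masking.
assert mask_name("Jane Smith") == mask_name("Jane Smith")
print(profile(["jane@example.com", "555-01-2345"]))   # -> ['email', 'ssn']
print(mask_name("Jane Smith"))
```

In practice the Profile step scans schemas and sample data and the Secure step applies vetted algorithms per element; the point of the sketch is only the determinism that keeps masked copies consistent and repeatable.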


The Delphix Data Virtualization Engine

Addressing #4-6 (above)

Data virtualization is the complementary capability to masking. Protecting data at rest and in use with masking satisfies regulatory requirements but does nothing to enable your business to ‘go faster’. Why? Because the data is still heavy and slow. Delphix Data Virtualization addresses the demands of your business by making data lightweight. What if…

  • you could have a full-size, secure (masked), read-writable copy of any size database in minutes?
  • you could keep as many copies of that database as you want or need without additional storage costs?
  • you could provide developers, testers, and analysts with self-service access to their database (or files), including the ability to:
    • reset, rewind, or refresh their database without opening a ticket
    • bookmark copies of their database for future reference or to share with other teams
    • version control data the way teams do for source code

Well, those are not ‘what if’ scenarios but real capabilities found in the Delphix Data Virtualization Engine. The three areas that define how Delphix manages the data virtualization workflow are:

  1. Collect – Delphix attaches to data sources (databases and applications) using protocols native to the platform.
  2. Control – By maintaining a unique set of common blocks in Delphix, users experience a 90% savings in non-production storage. Leveraging the TimeFlow retention log, users can provision copies from any point in time; masked master copies can be created, from which all other copies are derived in minutes, with certainty about where and how the data was protected and distributed (see the sketch below).
  3. Consume – Developers, testers, and analysts can refresh, rewind, restore, bookmark, and share their database(s) and application(s) from any point in time in a matter of minutes versus the hours, days, and weeks required today.
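
As an illustration of the Control step, here is a hypothetical copy-on-write model in Python (a conceptual illustration, not Delphix internals) showing why virtual copies are lightweight: every copy reads from the shared set of common blocks and stores only the blocks it changes.

```python
# Hypothetical copy-on-write model: virtual copies share common blocks and store
# only their own changes (a conceptual illustration, not Delphix internals).

class VirtualCopy:
    def __init__(self, shared_blocks: dict[int, bytes]):
        self.shared = shared_blocks      # blocks common to every virtual copy
        self.delta = {}                  # only the blocks this copy has modified

    def read(self, block_id: int) -> bytes:
        # Prefer this copy's own version of a block; fall back to the shared block.
        return self.delta.get(block_id, self.shared[block_id])

    def write(self, block_id: int, data: bytes) -> None:
        self.delta[block_id] = data      # copy-on-write: the shared blocks never change

# One "physical" source of 1,000 blocks, presented as three full read/writable copies.
source = {i: bytes(8192) for i in range(1_000)}
copies = [VirtualCopy(source) for _ in range(3)]
copies[0].write(42, b"\xff" * 8192)      # only the one changed block is stored extra

stored = len(source) + sum(len(c.delta) for c in copies)
print(f"{stored} blocks stored instead of {len(source) * 3} for three full copies")
```

Provisioning a masked master first and deriving virtual copies from it gives every downstream copy the same protection, which is the certainty about where and how the data was protected described above.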

Delphix Virtualization & Masking Engine(s)

Summary

The GDPR takes effect on May 25, 2018 and brings with it hefty penalties for non-compliance. The level of effort and the impact on every organization is massive in scope and a distraction from the day-to-day development and maintenance of your business services. Delphix provides an enterprise-class solution for protecting sensitive and personal data through an easy-to-use but very powerful masking solution. Combining masking with data virtualization enables businesses to continue to work securely on business services while adopting new workflow processes to address GDPR.

To learn whether Delphix is the right solution for you, please Contact Us.

Finding a Data Virtualization Solution

proprietary eponym
a brand name or trademark of a successful product that has come into general use to refer to the generic class of objects rather than the specific brand, without the parent company losing exclusive rights to the product. For example, Kleenex is used to describe many types of facial tissue.

DELPHIX may become the next proprietary eponym for Data Virtualization.

What is Data Virtualization?

In order to find a data virtualization solution, we first need to understand its meaning. In doing so we quickly realize that data virtualization is a very broad term that’s loosely defined. As such, it’s a solution space that’s still in its infancy, changing with the advent of new technologies. So before we can find a solution, we need to define the criteria by which we are searching.

Data are the quantities, characters, or symbols stored for computer processing and calculation. Virtualization makes a single physical instance of something appear as several instances in the same footprint (space) as the original physical instance. Data Virtualization takes a single physical instance of data (e.g., a 5 TB Oracle 11g database) and presents several, say 10, usable instances in the space of the original physical instance (in this example, 50 TB presented in the space of 5 TB), each of which appears as a full physical instance to the end user.
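
As a quick worked example of that arithmetic (assuming, for illustration, that the ten instances share nearly all of their blocks with the original):

```python
# Illustrative storage arithmetic for the example above (assumes near-complete block sharing).
physical_tb = 5                          # one 5 TB physical source
instances = 10                           # usable instances presented to end users
apparent_tb = physical_tb * instances    # capacity the users appear to have
print(f"{apparent_tb} TB presented to users in roughly the {physical_tb} TB footprint of the source")
# -> 50 TB presented to users in roughly the 5 TB footprint of the source
```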

However, the explanation above is only one interpretation.

Wikipedia defines this term as any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.


Search results include references to data integration, data federation, and simplified/consolidated or integrated views. A simplified view is an abstraction layer that sits in front of multiple data sources and presents them to the consumer as a single source. This is indeed a data virtualization technique, with several products on the market, including Cisco Data Virtualization.

In addition to Data Virtualization, other common Google search terms include Data on Demand, Data as a Service (DaaS), and Copy Data Management. These terms are certainly types of data virtualization that provide value to an organization, but some fall short in addressing your data. A bigger problem is that each of these data virtualization types implies the need for a disparate solution set. Ideally, executives seek to rationalize their solutions, not expand them.

Data on Demand

A search returns many resources offering historical and statistical data from data repositories across many public and private industry verticals. One example is NASDAQ, which offers a service providing for-fee data sets in different consumable formats. These services do provide a virtual-like service, but none of them (from what I observed) offer a solution that provides your data on demand.

Data as a Service (DaaS)

As the name implies, anything ‘as-a-service’ is associated with compute clouds. Dataversity explains … DaaS model is all about offloading the risks and burdens of Data Management to a third-party Cloud-based provider. But wouldn’t it be ideal to have your data securely delivered as a service to your developers, testers, and analysts without also being required to employ a cloud solution?

Many of the sites returned in this keyword search look more like external data feed services than providers of your intellectual capital as a service. That’s not to say this isn’t classified as an implementation of data virtualization; it is. Oracle is one example of this type of service, providing not only data but advanced services as well.

Copy Data Management

Many of the CIOs I’ve spoken to are quick to point out that they want fewer copies of data to manage, so this phrase initially conjures up the wrong vision. That said, an article written by Brien Posey on TechTarget notes – Copy data management seeks to reduce the number of copies to two — the primary data and the backup copy. When additional data copies are required, an underlying snapshot mechanism is used to create a virtual copy of the data – which is close to what executives desire.

Search results point to backup, DR, data protection, and snapshot copies for use with dev, test, sandboxing, etc. The good news is that it’s your data. The shortcoming is the reliance on a snapshot capability. A snapshot is equivalent to a photograph: a single point-in-time image of the data set. In some cases this is perfectly fine. For many others, like agile development, aggressive iteration testing, comparative testing, or Test Data Management (TDM), one would need to take frequent snapshots to provide the granularity users request. Servicing each such request requires a skilled technologist and introduces a delay before the new data arrives.

Mapping search results to capabilities… 

Capabilities that executives identify as valuable when searching for a data virtualization solution include:

  • Dramatically reducing the storage footprint of multiple non-production copies without the complexities, limitations, or resources required to manage the data sets (data virtualization)
  • Delivering the right data, to the right teams, at the right time in minutes (data on demand)
  • Delivering your data across the enterprise to the teams that need it most whether on premise or via cloud services through a self-service access point (data as a service)
  • Simplified management of fresh, masked, full sets of data that are protected and available at specific points in time (copy data management)

Combine these capabilities with self-service delivery of masked data to the consumer and we’ve just defined the criteria for Finding a Data Virtualization Solution.

Delphix Data Virtualization

The Delphix Virtualization Engine is a software solution designed from a clean sheet of paper to address the massive shortcoming in delivering your data quickly, efficiently, and securely to consumers on your existing platforms.

Delphix provides Data Virtualization by delivering instant, secure, full read/writable copies of databases and files to the people who need them, when they need them. For applications that rely on multiple heterogeneous databases, Delphix can deliver those databases at the exact point in time necessary to achieve integration testing. Simplified views are achieved by providing end users with their own sets of data, which they can change, bookmark, refresh, reset, and share with other teams, rather than receiving an access-only version.

Delphix provides Data on Demand for your data through an easy-to-use self-service interface. Users can manage their copies of databases much the same way that developers manage source code via version control. Imagine providing unlimited versions of data sets to your developers, testers, and analysts, who can switch between copies in a matter of minutes with no additional storage overhead.
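
Purely to illustrate the ‘manage data like source code’ idea, here is a hypothetical sketch in Python; the class and method names are invented for the example and are not the Delphix interface.

```python
# Hypothetical illustration of version-control-style operations on a data copy.
# Names are invented for the example; this is not the Delphix interface.
from datetime import datetime, timezone

class DataCopy:
    """A developer's personal, virtual copy of a database."""

    def __init__(self, name: str):
        self.name = name
        self.bookmarks: dict[str, datetime] = {}

    def bookmark(self, label: str) -> None:
        """Tag the current point in time, much like tagging a commit."""
        self.bookmarks[label] = datetime.now(timezone.utc)

    def rewind(self, label: str) -> datetime:
        """Return to a bookmarked point in time, much like checking out a tag."""
        return self.bookmarks[label]

    def refresh(self) -> None:
        """Bring the copy up to date with the latest source data, much like pulling."""
        print(f"{self.name} refreshed at {datetime.now(timezone.utc).isoformat()}")

dev_db = DataCopy("dev_orders")
dev_db.bookmark("before-schema-change")
dev_db.refresh()
print(dev_db.rewind("before-schema-change"))
```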

Delphix provides Data as a Service by eliminating your existing process of opening a ticket to request new data, awaiting management approval, and coordinating backup admins, DBAs, and storage admins to fulfill the request. Data sets can be easily accessed by the consumer from their desktop, either on premise or in the cloud.

Delphix provides Copy Data Management through its inherent continuous data protection capability, staying in sync with all changes to both the production source and the virtual copies. Unlike snapshots (which are similar to photographs), the Delphix Virtualization Engine is analogous to streaming video: all points in time are captured and accessible, so an exact replica of the data set can be created from any point in time, securely, in a matter of minutes.
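
The photograph-versus-streaming-video contrast can be sketched as follows (a conceptual model only, not Delphix internals): discrete snapshots expose only the instants at which they were taken, while a continuous change log lets any instant be reconstructed.

```python
# Conceptual contrast between discrete snapshots and a continuous change log
# (a model of the idea, not Delphix internals). Times are in seconds.

# Snapshots: only the captured instants are recoverable.
snapshots = {0: {"balance": 100}, 60: {"balance": 130}}

# Continuous log: every change is recorded, so any instant can be rebuilt.
change_log = [(0, "balance", 100), (12, "balance", 105),
              (37, "balance", 90), (60, "balance", 130)]

def state_at(t: int) -> dict:
    """Replay the change log up to time t to reconstruct the exact state."""
    state = {}
    for ts, key, value in change_log:
        if ts <= t:
            state[key] = value
    return state

print(state_at(40))        # -> {'balance': 90}: recoverable from the continuous log
print(snapshots.get(40))   # -> None: no snapshot was taken at t=40
```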

The Delphix Virtualization Engine is the single solution available today that addresses your data virtualization needs by removing the barriers to accessing your data sets quickly and securely.


Data First Strategy

 If data are the jewels of the company, then companies are handcuffed to their treasure.

The industry is long overdue for a disruptive correction to the biggest problem facing companies today – being data constrained. Specifically, the long ‘wait’ times for delivering databases and datasets to your application developers, DBAs, testers, and analysts that are current, consistent, and secure. Data management in the pre-production space can no longer be treated by CIOs as just one of the components in their IT strategy if they expect to remain relevant competing on business agility, customer affinity, and operational excellence. Data must be considered the first priority around which your enterprise IT strategy is built.

Today’s brittle IT infrastructures are incapable of handling the current data demands of the business, and this has a tremendous impact on cost. How much cost? Howard Rubin, in his paper titled Technology Economics: The “Cost of Data”, asks … did you know that overall 92% of the cost of business — the financial services business — is ‘data’? According to Rubin, The next breakthroughs in the cost structure of the banking and financial services technology economy will likely come about through a focus on the efficiencies of data. So it’s not surprising that, according to a recent news report by Gartner, By 2015, 25% of Large Global Organizations Will Have Appointed Chief Data Officers (CDO). But how will CDOs address the data constraint head on? The correction required by organizations will be a Data First Strategy.

Root of the problem?

The industry has provided solutions to virtualize and automate everything in the data center. Well, almost everything, except virtualizing the data (data = databases and files). Are you engaged in some of the latest trends like Agile Development, Cloud, or DevOps? Responsible for assuring data governance and compliance? If so, are you able to deliver datasets, regardless of size, securely and in a matter of minutes to these teams? Can you provide unlimited copies for comparison or regression testing? Offer inherent continuous data protection so datasets can be reset or rewound to a previous point in time? The answer is typically no, and that is why a Data First Strategy has been impossible, until now.

What’s needed?

The postal service has matured over time from the pony express to locomotives, automobiles, and airplanes to expedite package delivery. In much the same way that postal depots were established to optimize delivery routes, organizations need to plan where corporate data is needed in order to expedite not only the delivery of data but also the services that depend on it. A Data First Strategy is a paradigm shift in the way an organization’s services are created, built, and managed to deliver the right data, to the right teams, at the right time. This strategy relies heavily on the ability to deliver full datasets as fast as other virtualization technologies deliver their services, typically in minutes. Two tenets define a Data First Strategy:

  1. Prioritizing your data first in your architectural design (business, hardware, and software)
    1. Focus on consumers, data center location, services, SLAs, security
  2. Prioritizing your data first in value to the company
    1. Focus on monetization, management, governance, compliance, collaboration, and acquisition of Corporate data

The subtlety here is that if you could draw a box in your architecture design to depict immediate delivery of and access to your data, then you could remove the ‘wait’ times that impede time to market, provide immediate access to audit data, and give everyone instant access to virtually unlimited copies.

Achieving a Data First Strategy

Possible with Delphix Agile Data Management

Delphix Agile Data Management unlocks your data by virtualizing your corporate datasets and expediting their delivery to the teams who need them. The datasets are stored as a highly optimized, single set of common blocks, which can be used to securely create, refresh, and rewind copies of any size in a matter of minutes. To the end user, the virtualized copies look and respond like full-size datasets and can be managed through an easy-to-use self-service GUI, further eliminating ‘wait’ times. Some of the immediate capabilities that can be realized include, but are not limited to:

  • Immediate access to point-in-time datasets, delivered in minutes versus days/weeks; consider:
    • audit data access
    • unlimited ‘what if’ testing scenarios
    • root cause analysis of production data issues without impact to production resources
    • unlimited copies for regression testing
    • test data management
    • training facilities
  • Automated data masking to ensure immediate protection of customer data; onshore and offshore
  • Eliminate storage vendor lock-in to manage your pre-production environment
  • Enable seamless data movement across heterogeneous storage arrays
    • a mandatory requirement for Cloud and data center migrations
  • Prepare for Cloud enablement by providing a hardware agnostic medium for transferring data efficiently and securely to and from a cloud service
    • includes public, private, and hybrid cloud environments.
  • Leverage a virtualized data copy for immediate recovery in Disaster Recovery scenarios
  • Continuous Data Protection (CDP) for Source and Virtual Data Sets
    • tracks all changes as they occur

Customers have realized the following benefits: reduced time to market for customer-facing services by 60%, provided a new or refreshed data copy 99% faster, and increased the number of pre-production data copies while reducing the storage required by 97%.

Closing observation

Data is the constraint in every organization. CIOs and CDOs will need to adopt and adapt a new approach to data delivery within their organizations if they expect to achieve the full benefits of initiatives like Agile Development, DevOps, and Cloud. A Data First Strategy is the next step, but it can only be achieved through data virtualization. Without the ability to deliver the right data, to the right teams, at the right time, organizations will continue to struggle with compliance, application and service quality, and escalating project costs.