Delphix & the EU General Data Protection Regulation (GDPR)


GDPR Observations

Last week I was reading a CMS Wire article by Brian Wallace, titled Who’s Ready for the GDPR? [Infographic], and found a few of the data points it cites eye-catching.

On May 25, 2018, the GDPR goes into effect and, according to the embedded infographic…

Note: direct quotes from the infographic are cited in italic.
  1. The GDPR requires that all EU Citizen data [i.e. Sensitive & Personal] be protected as stipulated in the final text of the regulation, even if the data lives outside of the EU.
    • Sensitive data: Name, location, identification numbers, IP address, cookies, RFID info
    • Sensitive personal data: Health data, genetic data, biometric data, racial or ethnic data, political opinions, sexual orientation
  2. 92% of U.S. businesses list GDPR as a top data protection priority
    • 77% of U.S. businesses have started preparing for GDPR, but only 6% are GDPR-ready
      • The low readiness percentage is consistent with my experience working alongside data owners at major U.S. corporations
  3. In addition to protecting EU citizens’ data, there are other services a custodian of their data must provide. Some of these include:
    • EU citizens have the right to access their data as well as information about how it is being used
    • EU citizens can take their data to a different agency upon request
    • EU citizens have the right to data erasure
    • Certain companies and governmental organizations must appoint a Data Protection Officer
    • Companies must implement reasonable data protection measures
    • Companies must assess for threats
  4. Noncompliance with the GDPR will be costly. Top tier fines are set at €20 million or 4 percent of global annual turnover, whichever is greater
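
To make the "whichever is greater" rule in point 4 concrete, here is a minimal Python sketch. The function name and the sample turnover figures are made up for illustration; only the €20 million figure and the 4 percent rate come from the infographic, and the result is the top-tier ceiling, not a prediction of what a regulator would actually impose.

```python
def top_tier_fine_ceiling_eur(global_annual_turnover_eur: int) -> int:
    """Return the greater of EUR 20 million or 4 percent of global annual turnover."""
    flat_amount_eur = 20_000_000
    # Integer arithmetic keeps the 4 percent calculation exact for whole-euro inputs.
    four_percent_eur = global_annual_turnover_eur * 4 // 100
    return max(flat_amount_eur, four_percent_eur)


# Hypothetical turnover figures, chosen to sit on either side of the break-even point.
for turnover in (100_000_000, 500_000_000, 2_000_000_000):
    ceiling = top_tier_fine_ceiling_eur(turnover)
    print(f"turnover EUR {turnover:,} -> top-tier ceiling EUR {ceiling:,}")
```

The two amounts meet at a turnover of €500 million. Below that the flat €20 million applies, and above it the 4 percent figure takes over.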

What are the challenges specific to a Data Protection Officer?

The challenges are the same ones faced by CIOs and CDOs in major corporations today: to secure sensitive and personal data while delivering copies to developers, testers, and analysts in an effort to compete at the speed required in the Digital Age. Corporate metrics used to measure success also remain the same – increase revenue, reduce costs, and stay compliant. That sounds reasonable until you evaluate the application services your organization provides and realize that your data is heavy: it is the anchor on which every other task in your process workflow waits.

What I consistently hear from clients includes, but is not limited to:

  1. Slow, complex, masking process workflows which require teams of programmers to maintain the code
    • Integration testing (i.e. maintaining referential integrity across multiple, disparate, databases) adds substantial complexity
    • No concept of a masked master data set where copies can be quickly created
  2. More than one masking toolset and/or process workflow; requires multiple skillsets and teams
  3. Masked data does not have realistic values substituted into the fields
  4. Too few copies of data sets to developers, testers, and analysts means sharing
    • a corruption introduced by one individual stops everyone from working
  5. Too many copies of the data sets requires a significant amount of storage and time to refresh and manage multiple copies
    • physically impossible to accommodate due to limited capital resources
  6. Teams that subset data to deliver copies faster and reduce storage are simply pushing the problem downstream
    • Developers cannot test end-to-end processes
      • Too often issues are only exposed in production
    • Testers are limited to a small set of test cases
      • Too often defects are found later in QA
    • QA, as told to me by every CIO I speak to, bears the brunt of performing the lion’s share of testing, meaning
      • issues and defects which should be found in Dev and Test are found in QA, typically due to providing stale and/or subset data to Dev and Test
      • the goal for these CIOs is to shift left their testing process workflows so QA can focus on a finite set of product quality testing

The Delphix Masking Engine

Addressing #1-3 (above)

Contrary to most of the masking solutions in the industry today, which are complex, require programmers, and are difficult to manage when changes occur to the data sets, Delphix Masking provides a GUI-based software solution. There are three powerful and easy-to-use components which simplify the core capabilities of an enterprise-class masking tool.

  1. Profile – scan the selected data sets, identify sensitive data, and return a report of elements found along with recommended masking algorithms.
  2. Secure (Mask) – Apply the assigned masking algorithms to their respective elements while maintaining referential integrity; no programming required. Elements will be masked with fictitious, but realistic data substitutions. Once the algorithms are assigned the masking will be consistent and repeatable.
  3. Audit – To ease the demands of maintaining compliance, Delphix provides a report that identifies which sensitive data elements have been protected, thus simplifying delivery to auditors. Audit will also alert admins if new data fields are added which introduce new vulnerabilities.
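
The "consistent and repeatable" property in the Secure (Mask) step is what keeps referential integrity intact: the same real value must always map to the same fictitious value, in every table. The Python sketch below shows one generic way to get that behavior, using a keyed hash to pick a replacement from a list of fake names. It is an illustration only and is not the Delphix algorithm; the key, the name list, the table layouts, and the function name are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would manage and protect its own key.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical pool of realistic-looking replacement values.
FAKE_FIRST_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley", "Harper", "Jordan"]


def mask_first_name(real_value: str) -> str:
    """Deterministically map a real name to a fictitious one from the pool."""
    digest = hmac.new(SECRET_KEY, real_value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_FIRST_NAMES)
    return FAKE_FIRST_NAMES[index]


# Two hypothetical tables that both carry the same sensitive value.
customers = [{"id": 1, "first_name": "Margaret"}, {"id": 2, "first_name": "Tobias"}]
orders = [{"order_id": 10, "customer_first_name": "Margaret"}]

masked_customers = [{**row, "first_name": mask_first_name(row["first_name"])} for row in customers]
masked_orders = [
    {**row, "customer_first_name": mask_first_name(row["customer_first_name"])} for row in orders
]

# The masked order still matches the masked customer, so joins keep working.
print(masked_customers[0]["first_name"] == masked_orders[0]["customer_first_name"])
```

With a pool this small, two different real names can land on the same fake name. A production tool would need a far larger pool or a collision-free mapping.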


The Delphix Data Virtualization Engine

Addressing #4-6 (above)

Data virtualization is the complementary capability to masking. Protecting data at rest and in use with masking accommodates regulatory requirements but does nothing to enable your business to ‘go faster’. Why? Because data is still heavy and slow. Delphix Data Virtualization addresses the demands of your business by making data lightweight. What if…

  • you could have a full-size, secure (masked), read-writable copy of any size database in minutes?
  • you could have as many copies of that database as you want or need without additional storage costs?
  • you could provide developers, testers, and analysts with self-service access to their database (or files), including the ability to:
    • reset, rewind, or refresh their database without opening a ticket
    • bookmark copies of their database for future reference or share them with other teams
    • version control data like teams do for source code

Well, those are not ‘what if’ scenarios but real capabilities found in the Delphix Data Virtualization Engine. The three areas that define how Delphix manages the data virtualization workflow are:

  1. Collect – Delphix attaches to data sources (Databases and Applications) using native protocols to the platform.
  2. Control – By maintaining a unique set of common blocks in Delphix, users experience a 90% savings in non-prod storage. Leveraging the TimeFlow retention log, users can provision copies from any point in time; masked master copies can be created, from which all other copies can be created in minutes, with certainty about where and how the data was protected and distributed.
  3. Consume – Developers, testers, and analysts can refresh, rewind, restore, bookmark, and share their database(s) and application(s) from any point in time in a matter of minutes versus the hours, days, and weeks required today.
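
The idea of many read-writable copies sharing "a unique set of common blocks" can be pictured as copy-on-write. The Python sketch below is a toy model under that assumption, not a description of how the Delphix engine is built; the class and method names are hypothetical.

```python
class VirtualCopy:
    """A read-writable view over shared source blocks (toy copy-on-write model)."""

    def __init__(self, shared_blocks):
        self._shared = shared_blocks  # common blocks, never modified through a copy
        self._private = {}            # block index -> data written by this copy only

    def read(self, index):
        # Prefer this copy's own version of the block, else use the shared one.
        return self._private.get(index, self._shared[index])

    def write(self, index, data):
        # A write is stored privately, so other copies and the source are unaffected.
        self._private[index] = data

    def reset(self):
        # Throw away private changes and return to the shared starting point.
        self._private.clear()

    def private_block_count(self):
        return len(self._private)


# One hypothetical source of 1,000 blocks, shared by two virtual copies.
source_blocks = tuple(f"block-{i}" for i in range(1000))
dev_copy = VirtualCopy(source_blocks)
qa_copy = VirtualCopy(source_blocks)

dev_copy.write(7, "dev-change")
print(dev_copy.read(7), qa_copy.read(7), dev_copy.private_block_count())
```

A change (or a corruption) made in one copy stays in that copy, and a reset is just a matter of discarding the private blocks. In this model, that is why a tester can break their data without stopping everyone else.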

Delphix Virtualization & Masking Engine(s)


GDPR will commence on May 25, 2018, bringing with it hefty penalties for non-compliance. The level of effort and the impact on every organization is massive in scope and a distraction from day-to-day development and maintenance of your business services. Delphix provides an enterprise-class solution to accommodate the protection of sensitive and personal data through an easy-to-use, but very powerful, masking solution. Combining masking with data virtualization enables businesses to continue to work securely on business services while adapting new workflow processes to address GDPR.

To learn whether Delphix is the right solution for you, please Contact Us.


Finding a Data Virtualization Solution

proprietary eponym
A brand name or trademark of a successful product that has come into general use to refer to the generic class of objects rather than the specific brand type, without the exclusive rights to said product being lost by the parent company. For example, kleenex is used to describe many types of facial tissue.
DELPHIX may become the next proprietary eponym for Data Virtualization.

What is Data Virtualization?

In order to find a data virtualization solution we need to understand its meaning. In doing so we quickly realize that data virtualization is a very broad term that’s loosely defined. As such, it’s a solution space that’s still in its infancy, changing with the advent of new technologies. So before we can find a solution, we need to define the criteria by which we are searching.

Data are the quantities, characters, or symbols stored for computer processing and calculation. Virtualization makes a single physical instance of something appear as several instances in the same footprint (space) as the original physical instance. Data Virtualization takes a single physical instance of data (e.g. a 5 TB Oracle 11g DB) and presents several, say 10, usable instances in the space of the original physical instance (in this example, 50 TB in the space of 5 TB), each of which appears as a full physical instance to the end user.
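
The arithmetic behind that example can be sketched in a few lines of Python. The 5 TB source and the 10 copies come from the example above; the assumption that each copy goes on to change 2 percent of its data is mine, added only to show that copies which are written to need some room of their own.

```python
def physical_storage_gb(source_gb: int, copies: int) -> int:
    """Storage needed when every copy is a full physical copy."""
    return source_gb * copies


def virtual_storage_gb(source_gb: int, copies: int, changed_percent: int) -> int:
    """Storage needed when copies share one source and keep only their own changes."""
    shared_gb = source_gb
    private_gb = copies * source_gb * changed_percent // 100
    return shared_gb + private_gb


SOURCE_GB = 5_000     # the 5 TB database from the example
COPIES = 10           # the 10 usable instances from the example
CHANGED_PERCENT = 2   # assumption: each copy modifies 2 percent of its data

print("physical copies:", physical_storage_gb(SOURCE_GB, COPIES), "GB")
print("virtual copies: ", virtual_storage_gb(SOURCE_GB, COPIES, CHANGED_PERCENT), "GB")
```

With no changes at all, the virtual figure is exactly the 5 TB of the original. Under the 2 percent assumption it grows to 6 TB, still far short of the 50 TB that ten physical copies would need.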

However, the explanation above is only one interpretation.

Wikipedia defines this term as any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.


Search results include references to data integration, data federation, and simplified/consolidated or integrated views. Simplified views represent an abstract layer that sits in front of multiple data sources, which are presented to the consumer as a single source. This is indeed a data virtualization technique, with several products on the market, including Cisco Data Virtualization.

In addition to Data Virtualization, other common Google search terms include Data on Demand, Data as a Service (DaaS), and Copy Data Management. These terms are certainly types of data virtualization which provide value to an organization, but some fall short in addressing your data. A bigger problem is that each of these data virtualization types implies the need for disparate solution sets. Ideally, executives seek to rationalize their solutions, not expand them.

Data on Demand

A search returns many resources offering historical and statistical data from data repositories across many public and private industry verticals. One example is NASDAQ, which offers a service that provides for-fee data sets in different consumable formats. These services do provide a virtual-like service, but none of them (from what I observed) offer a solution to provide your data on demand.

Data as a Service (DaaS)

As the name implies, *as-a-service is associated with compute clouds. Dataversity explains … DaaS model is all about offloading the risks and burdens of Data Management to a third-party Cloud-based provider. But wouldn’t it be ideal to have your data securely delivered as a service to your developers, testers, and analysts without the requirement to employ a cloud solution too?

Many of the sites returned in this keyword search appear more like external data feed services than services that provide your intellectual capital. That’s not to say this isn’t classified as an implementation of data virtualization; it is. Oracle is one example of this type of service, where they not only provide data but advanced services as well.

Copy Data Management

Many of the CIOs I’ve spoken to are quick to point out that they want fewer copies of data to manage, and this phrase, initially, conjures up the wrong vision. That said, an article written by Brien Posey on TechTarget states: “Copy data management seeks to reduce the number of copies to two – the primary data and the backup copy. When additional data copies are required, an underlying snapshot mechanism is used to create a virtual copy of the data”, which is close to what executives desire.

Search results point to backup, DR, data protection, and snapshot copies for use with dev, test, sandboxing, etc. The good news is, it’s your data. The shortcoming is employing a snapshot capability. Snapshots are equivalent to a photograph, a single point-in-time image of the data set. In some cases, this is perfectly fine. For many others like agile development, aggressive iteration testing, comparative testing, or Test Data Management (TDM) one would need to take frequent snapshots to accommodate the granularity the users frequently request. To service this request would require a skilled technologist and a delay in receiving the new data.

Mapping search results to capabilities… 

Capabilities that executives identify as value in finding a data virtualization solution include:

  • Dramatically reducing the storage footprint of multiple non-production copies without the complexities, limitations, or resources required to manage the data sets (data virtualization)
  • Delivering the right data, to the right teams, at the right time in minutes (data on demand)
  • Delivering your data across the enterprise to the teams that need it most whether on premise or via cloud services through a self-service access point (data as a service)
  • Simplified management of fresh, masked, full sets of data that are protected and available at specific points in time (copy data management)

Combine these capabilities with self-service delivery of masked data to the consumer and we’ve just defined the criteria for Finding a Data Virtualization Solution.

Delphix Data Virtualization

The Delphix Virtualization Engine is a software solution that was designed from a clean sheet of paper to address the massive shortcoming in delivering your data quickly, efficiently, and securely to consumers using your existing platforms.

Delphix provides Data Virtualization by delivering instant, secure, full read/writable copies of databases and files to the people who need them, when they need them. For applications that rely on multiple heterogeneous databases, Delphix can deliver those databases at the exact point in time necessary to achieve integration testing. Simplified views are achieved by providing end users with their own sets of data which they can change, bookmark, refresh, reset, and share with other teams versus receiving an access-only version.

Delphix provides Data on Demand for your data through an easy to use self-service interface. Users can manage their copies of databases much the same way that developers manage source code via version control. Imagine the ability to provide unlimited versions of data sets to your developers, testers, and analysts with the ability to switch between the copies in a matter of minutes while requiring no overhead in additional storage.

Delphix provides data as a service by eliminating your existing process of opening a ticket to request new data, awaiting approval by management, and coordination of backup admins, DBAs, and storage admins to facilitate the request. Data sets can be easily accessed by the consumer either on premise, or in the cloud, from their desktop.

Delphix provides copy data management by its inherent continuous data protection capability of staying in-sync with all changes to both the production source and virtual copies. Contrary to snapshots (which are similar to photographs) the Delphix Virtualization Engine is analogous to streaming video whereby all points in time are captured and accessible to create an exact replica of the data set from any point in time, securely, in a matter of minutes.
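
The photograph-versus-video contrast can be shown with a toy model in Python: a base image plus an ordered change log can rebuild the data as of any moment, while periodic snapshots can only return the last image taken before that moment. This is a generic sketch of the idea, not the Delphix implementation; the data, timestamps, and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical starting state and an ordered log of (time, key, new_value) changes.
base_state = {"balance": 100}
change_log = [
    (1, "balance", 120),
    (4, "balance", 90),
    (6, "status", "frozen"),
    (9, "balance", 75),
]


def state_at(point_in_time):
    """Rebuild the data as of any moment by replaying the log up to that moment."""
    state = dict(base_state)
    for timestamp, key, value in change_log:
        if timestamp > point_in_time:
            break
        state[key] = value
    return state


# Snapshots exist only at the moments they were taken.
snapshot_times = [0, 5, 10]


def latest_snapshot_before(point_in_time):
    """Return the most recent snapshot taken at or before the requested moment."""
    position = bisect_right(snapshot_times, point_in_time) - 1
    return state_at(snapshot_times[position])


print("continuous log at t=7:", state_at(7))
print("snapshot for t=7:     ", latest_snapshot_before(7))
```

Asking for time 7 from the log includes the change made at time 6. The snapshot answer falls back to the image taken at time 5 and misses it.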

The Delphix Virtualization Engine is the single solution available today to address your data virtualization needs by removing the barriers to accessing your data sets quickly and securely.

Below is a 2-minute overview which highlights the power of the Delphix solution.

Why I left Oracle for Delphix?

I recently bumped into a colleague of mine from Sun Microsystems in New York City. Although it’s been 18 months since I left Oracle, post-Sun acquisition in 2010, he was shocked. We had met when I started at Sun in 1997 and, since the financial meltdown in 2008, we assumed roles and trained as Enterprise Architects (EA); think TOGAF, ITIL, OEA, Business Value, Operations Management Capabilities Model, etc. versus the Java definition of EA. We were focused on critical business problems at the executive level. My management was very supportive in providing all necessary resources to ensure success, I was frequently asked to facilitate industry talks (Engineered Systems, Cloud, Data Center consolidation, and Virtualization, to name a few), and the compensation plan was excellent. I wasn’t looking for a new career.

So he asked me, Why the [expletive] would you walk away from a secure job to chase a startup?

So I asked him, What’s the biggest, unsolved problem in IT today? – Being data constrained, right? And isn’t data the ‘jewels’ of every company? It’s great we have all these industry driven buzzword-initiatives like Virtualization, DevOps, Agile Development, Cloud (Public, Private, Hybrid, *aaS), Big Data, and even Engineered Systems. But at the end of the day, no matter how fast you can stand-up infrastructure, you still can’t provide the developers, analysts, DBAs, and testers with a copy, or multiple copies, of a gigabyte/terabyte database in minutes for pre-production environments; in most cases it takes days, weeks, etc. to provision or refresh. And, if you can deliver a fast copy, it’s typically a snapshot which begins to go stale the moment the end-user receives it.

Delphix’ Agile Data Management solves this data constraint for databases and application data by providing an optimized copy of the Source. The Delphix Engine efficiently manages all changes from the Source, and those applied to the full read-writable virtual copies, to provide continuous data protection (CDP) from which copies can be created, refreshed, and reset to any point in time in minutes.

So Delphix solves the biggest problem in the industry, which I found not only exciting but also a significant change agent to help customers realize the potential benefits of Cloud, Big Data, DevOps, etc. by eliminating the ‘wait time’ for data delivery and management. But what ultimately convinced me to leave a position of 16 years was the executive management team and rock star list of engineers who are at the core of this company. A cool product is one aspect; having a team that can execute makes all the difference. I was offered an opportunity to be part of this team and have no regrets.