Insight on data masking tools

Private and non-public data must not be compromised during testing. Learn about data masking and the vendors that offer data masking tools.

Testers today face challenges from disruptive technologies such as cloud, mobile devices and big data, which have taken testing to a new level of complexity. At the same time, testers are challenged by the "need for speed" as Agile methodologies evolve into continuous delivery and continuous deployment. It can be tempting to use production data to speed up the test process. Ironically, those very same technologies make this practice increasingly dangerous. If production data is also privacy-protected data, using it in testing may be illegal. At the very least, it exposes the data to compromise.

Testers must collaborate with security professionals to develop a test data privacy approach. Usually, such an approach is based on data masking, which involves changing or obfuscating personal and non-public information. Data masking does not prevent access to the data; it only makes private data unrecognizable. Data masking can be accomplished by several methods, depending upon the complexity required. These range from simply blanking out the data, to replacing it with more generic data, to using algorithms to scramble the data. The challenge of data masking is that the data must not only be unrecognizable, but also remain useful for testing.
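To make the three techniques above concrete, here is a minimal Python sketch of blanking, substitution and scrambling. The function names, the pool of generic names and the choice to preserve separators are illustrative assumptions, not features of any particular vendor's tool.

```python
import random

# Hypothetical pool of generic replacement values (an assumption for illustration).
GENERIC_NAMES = ["Alex Smith", "Jordan Lee", "Sam Taylor"]


def blank_out(value: str) -> str:
    """Blanking: replace every character, keeping only the original length."""
    return "X" * len(value)


def substitute(value: str, rng: random.Random) -> str:
    """Substitution: swap the real value for a generic one from a fixed pool."""
    return rng.choice(GENERIC_NAMES)


def scramble_digits(value: str, rng: random.Random) -> str:
    """Scrambling: replace each digit with a random digit.

    Non-digit characters such as dashes are left in place so the masked
    value keeps the format that the application under test expects.
    """
    return "".join(
        rng.choice("0123456789") if ch.isdigit() else ch for ch in value
    )
```

For example, `blank_out("Jane Doe")` returns `"XXXXXXXX"`, which is unrecognizable but of little use for testing name handling. The scrambled value keeps the shape of the original, which illustrates the trade-off between hiding the data and keeping it useful.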

Data masking comes in two main forms: static and dynamic. For testing purposes, the usual approach is static data masking, where the data is masked before it's loaded into the test environment. With this approach, a new database is created, which is especially important when testing is outsourced. However, the new database may not contain the same data, or data in the same state, as the production database, and that fidelity is very important in testing.
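The static approach can be sketched in a few lines of Python using the standard `sqlite3` module. The `customers` table, its columns and the masking rules are hypothetical; the point is only that rows are masked on their way into a separate test database, so unmasked values never reach the test environment.

```python
import sqlite3


def mask_email(customer_id: int) -> str:
    """Build a generic address on a reserved example domain."""
    return f"user{customer_id}@example.com"


def build_masked_copy(prod: sqlite3.Connection) -> sqlite3.Connection:
    """Create a new in-memory test database holding masked customer rows.

    Assumes a hypothetical table: customers(id, name, email).
    """
    test = sqlite3.connect(":memory:")
    test.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    rows = prod.execute("SELECT id, name, email FROM customers")
    for customer_id, _name, _email in rows:
        # Keep the key so relationships survive; replace the private fields.
        test.execute(
            "INSERT INTO customers VALUES (?, ?, ?)",
            (customer_id, f"Customer {customer_id}", mask_email(customer_id)),
        )
    test.commit()
    return test
```

The production connection is only read from, and the copy reflects the data as it was at extraction time, which is why a statically masked database can drift from the state of the real one.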

With dynamic data masking, production data is masked in real time as users request the data. The main advantage of this approach is that even users authorized to access the production database never see the private or non-public data. Furthermore, dynamic data masking can be user role-specific; the data that is masked depends upon the entitlements of the user requesting the data.
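As a rough illustration of role-specific dynamic masking, the following Python sketch masks fields at read time according to the requester's role. The roles, the field names and the `VISIBLE_FIELDS` policy are assumptions made up for this example; real products typically apply such rules in a proxy or in the database itself.

```python
from typing import Dict, Set

# Hypothetical entitlement policy: fields each role may see unmasked.
VISIBLE_FIELDS: Dict[str, Set[str]] = {
    "support": {"name"},
    "auditor": {"name", "ssn"},
}


def mask_value(value: str) -> str:
    """Hide a value completely while preserving its length."""
    return "*" * len(value)


def read_record(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return a view of the record masked for the given role.

    The stored record is never modified; unknown roles see nothing unmasked.
    """
    allowed = VISIBLE_FIELDS.get(role, set())
    return {
        field: value if field in allowed else mask_value(value)
        for field, value in record.items()
    }
```

With a record such as `{"name": "Jane Doe", "ssn": "123-45-6789"}`, the `support` role sees the name but gets `"***********"` for the `ssn` field, while the `auditor` role sees both values unmasked.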

A review of automated data masking tools

Automated software tools are required to mask data efficiently and effectively. When evaluating data masking tools, it is important to consider the following attributes. Most importantly, the tool should mask the data so the masking cannot be reversed; however, the data must remain realistic enough for testing. Ideally, the tool should provide both static and dynamic data masking functionality, and possibly, data redaction, a technique used for data masking in PDFs, spreadsheets and documents. The extract, transform, load (ETL) process used for static data masking must be secure because the data will be extracted in its unmasked state. Also, the tool should mask data for distributed platforms, including cloud.
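One common way to approach the first requirement, masking that cannot be reversed yet still looks realistic, is keyed pseudonymization. The Python sketch below is an illustration under stated assumptions rather than a description of how any listed product works: the `FAKE_NAMES` pool and the `pseudonymize` function are hypothetical, and the secret key is assumed to be stored outside the test environment.

```python
import hashlib
import hmac

# Hypothetical pool of realistic-looking replacement names.
FAKE_NAMES = ["Alex Smith", "Jordan Lee", "Sam Taylor", "Casey Brown"]


def pseudonymize(value: str, key: bytes) -> str:
    """Map a real value to a fake one using a keyed one-way hash.

    The same input and key always produce the same fake name, so joins
    across tables stay consistent. Many inputs share each output, so the
    fake name alone does not reveal the original value.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]
```

Because the mapping is deterministic, a customer who appears in several tables receives the same masked name everywhere. Anyone holding the key could still test guesses against the output, so the key needs the same protection as the unmasked data.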

Here is a brief look at a variety of vendors in this arena. As with any tool evaluation, organizations must consider their own specific needs when choosing a vendor.

According to Gartner's Magic Quadrant, IBM, Oracle and Informatica are the market leaders in data masking for privacy purposes. Each offers static and dynamic data masking, as well as data redaction. IBM offers integration with its Rational Suite. Oracle offers an API tool for data redaction and provides templates for Oracle E-Business Suite and Oracle Fusion. Both IBM and Oracle products are priced relatively high compared with those of other vendors.

Informatica offers data redaction for many types of files and is a top player in dynamic data masking for big data. It offers dynamic data masking for Hadoop, Cloudera, Hortonworks and MapR. Informatica's product is integrated with PowerCenter and its application information lifecycle management (ILM), which makes it a good choice for organizations that use those products.

Mentis offers a suite of products for static and dynamic data masking and data redaction, as well as data access monitoring and data intrusion prevention, at a reasonable cost. One of the most exciting features of these products is usability; not only are templates available for several vendor packages, including Oracle E-Business Suite and PeopleSoft, but the user interface is also designed for use by the business as well as IT. Mentis was rated as a "challenger" by Gartner in 2013.

Net 2000, one of the least expensive products on the market, offers usability as its main feature. It provides only static data masking, for Oracle and SQL Server databases. Net 2000 was rated as a "niche" player by Gartner in 2013. This tool is a good choice for a small organization with a simple environment.

Data privacy is one of the most important issues facing test managers and testers today. Private and non-public data must not be compromised during testing; therefore, an understanding of data masking methodologies, approaches and tools is critical to effective testing and test management.
