Test Data Management. Part 1: goals and planning

Test data management is a set of procedures that helps a company build better solutions with the help of test data sets. By "solutions" we mean desktop or client-server software and database systems, as well as web services or any other solution that deals with large-scale data arrays.

The first and most important step of the process is identifying the goals. The company should decide why test data sets or databases matter to it. Developers, QA engineers, and administrators should also describe how test data helps make software or hardware solutions better.

Possible goals include:

  • Testing software with realistic data helps find more defects in the code.
  • Huge data sets allow developers to analyze the server-side performance of the solution.
  • The data generation process itself can reveal bottlenecks in data loading.
  • Running the software against large-scale data arrays helps analyze hardware load: communication channels, storage system, CPU, memory usage, etc.
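
The data loading goal can be made concrete by timing how quickly generated rows are loaded at different batch sizes. The sketch below is a minimal illustration, assuming an in-memory SQLite database (Python's standard `sqlite3` module) stands in for the real target system; the table name `test_rows` and the function name `load_rows` are hypothetical. A large gap between batch sizes usually suggests that per-transaction overhead, rather than raw write speed, is the bottleneck.

```python
import sqlite3
import time


def load_rows(conn: sqlite3.Connection, rows: list[tuple[int, str]], batch_size: int) -> float:
    """Insert rows in batches of batch_size and return the achieved rows per second."""
    # Recreate the (hypothetical) target table so every run starts empty.
    conn.execute("DROP TABLE IF EXISTS test_rows")
    conn.execute("CREATE TABLE test_rows (id INTEGER PRIMARY KEY, payload TEXT)")
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        # "with conn" commits each batch as one transaction, so small batches pay more commit overhead.
        with conn:
            conn.executemany("INSERT INTO test_rows VALUES (?, ?)", rows[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [(n, f"value-{n}") for n in range(20_000)]
    for batch_size in (1, 100, 10_000):
        rate = load_rows(conn, rows, batch_size)
        print(f"batch_size={batch_size:>6}: {rate:,.0f} rows/s")
    conn.close()
```

On a real server the same measurement would be taken against the production-like database, where network round trips and indexes make the differences between batch sizes more pronounced.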

As you can see, these goals fall into three groups:

  • Testing the software under development.
  • Compatibility testing of the operating system and third-party tools or components.
  • Hardware trials.

Depending on the identified goals, we can define a set of requirements for our test databases or arrays. For example, if we need to test standard components or server hardware, data realism is not so important, but data generation performance is critical for driving the server into an overloaded state. On the other hand, when testing a custom solution, data realism is a critical property.

For complex cases, the team should create several different data sets for the same object, such as a table, an array, or even a whole database. For example, we can generate:

  • A large table filled with the same kind of data, to test performance.
  • A table with maximum data variance, to assess test coverage.
  • A table with incorrect or "broken" values, to test exception handling in algorithms.

[Figure: DTM Data Generator, test data set types]
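
The three kinds of data sets listed above could be produced by a small generator script. The following is a minimal Python sketch for illustration only; it does not reflect how DTM Data Generator or any other tool works, and the customer-like columns (`id`, `name`, `age`, `email`) and function names are hypothetical.

```python
import random
import string

Row = dict[str, object]


def uniform_rows(count: int) -> list[Row]:
    """Large, homogeneous data for performance tests: every row has the same shape and similar values."""
    return [{"id": i, "name": "customer", "age": 30, "email": f"user{i}@example.com"} for i in range(count)]


def varied_rows(count: int, seed: int = 42) -> list[Row]:
    """High-variance data for coverage tests: random name lengths and characters, boundary ages."""
    rng = random.Random(seed)  # fixed seed so a failing test can be reproduced
    edge_ages = [0, 1, 17, 18, 64, 65, 120]
    rows = []
    for i in range(count):
        name = "".join(rng.choice(string.ascii_letters + " '-") for _ in range(rng.randint(1, 40)))
        # Roughly 30% of rows use a boundary value, the rest any age in the valid range.
        age = rng.choice(edge_ages) if rng.random() < 0.3 else rng.randint(0, 120)
        rows.append({"id": i, "name": name, "age": age, "email": f"user{i}@example.com"})
    return rows


def broken_rows() -> list[Row]:
    """Deliberately invalid values for exception-handling tests."""
    return [
        {"id": 0, "name": "", "age": 30, "email": "user0@example.com"},            # empty name
        {"id": 1, "name": "x" * 10_000, "age": 30, "email": "user1@example.com"},  # oversized value
        {"id": 2, "name": "Bob", "age": -5, "email": "user2@example.com"},         # negative age
        {"id": 3, "name": "Eve", "age": None, "email": "user3@example.com"},       # missing value
        {"id": 4, "name": "Ann", "age": 40, "email": "not-an-email"},              # malformed email
    ]


if __name__ == "__main__":
    print(len(uniform_rows(100_000)), len(varied_rows(1_000)), len(broken_rows()))
```

Keeping the three generators separate lets each data set be regenerated independently when only one kind of test needs fresh data.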

Data set creation time is an important constraint on the test data generation process. The test data management team must keep in mind that creating data with complex internal relationships can take days or even weeks when billions of rows are needed. The team has to make a trade-off between data complexity, generation time, and array size.
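
A rough estimate makes this trade-off explicit. Assuming a constant throughput measured on a sample run, generation time is simply the row count divided by the rate. Real rates usually drop as indexes and relationships grow, so the result is best treated as a lower bound. The rate of 5,000 rows per second used below is a hypothetical figure, not a benchmark, and the function names are illustrative.

```python
def estimate_generation_hours(target_rows: int, rows_per_second: float) -> float:
    """Estimated hours to generate target_rows at a constant rows_per_second."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return target_rows / rows_per_second / 3600


def rows_within_budget(hours: float, rows_per_second: float) -> int:
    """How many rows fit into a time budget of `hours` at a constant rate."""
    return int(hours * 3600 * rows_per_second)


if __name__ == "__main__":
    rate = 5_000  # hypothetical rows per second from a sample run
    print(f"1 billion rows: {estimate_generation_hours(1_000_000_000, rate):.1f} hours")
    print(f"8-hour budget: {rows_within_budget(8, rate):,} rows")
```

At the assumed rate, one billion rows take 200,000 seconds, about 55.6 hours or roughly 2.3 days, while an 8-hour overnight window fits 144,000,000 rows. If that is not enough, the team has to reduce data complexity or array size, or find a faster generation method.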

Another limitation is the protection of sensitive data. In most cases, the company can't use real customer data in the test set. It is important to prevent personal data such as SSNs, card numbers, or account balances from being compromised or disclosed. If customer data is used as a base, a data masking (or scrambling) operation must be included in the test data generation process.
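
One common masking approach is deterministic pseudonymization: the same real value always maps to the same fake value, so relationships between tables survive, but the original cannot be recovered without the secret key. The sketch below uses HMAC-SHA256 from Python's standard library; the hard-coded key, field formats, and function names are assumptions for illustration, not a complete anonymization scheme.

```python
import hashlib
import hmac

# Hypothetical key for the example; a real key would be kept outside the test environment.
MASKING_KEY = b"replace-with-a-secret-key"


def _digits(value: str, length: int) -> str:
    """Derive `length` pseudo-random digits from value with HMAC-SHA256 (same input, same output)."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    return str(int.from_bytes(digest, "big") % 10**length).zfill(length)


def mask_ssn(ssn: str) -> str:
    """Replace an SSN such as 123-45-6789 with a fake value in the same NNN-NN-NNNN format."""
    d = _digits(ssn, 9)
    return f"{d[:3]}-{d[3:5]}-{d[5:]}"


def mask_card(card: str) -> str:
    """Replace all but the last four digits of a card number, keeping its length.

    Note: the result does not preserve the Luhn check digit.
    """
    digits = "".join(ch for ch in card if ch.isdigit())
    if len(digits) <= 4:
        raise ValueError("card number is too short to mask")
    return _digits(digits, len(digits) - 4) + digits[-4:]


if __name__ == "__main__":
    print(mask_ssn("123-45-6789"))
    print(mask_card("4111 1111 1111 1111"))
```

Because the mapping is deterministic, a customer's SSN masked in one table matches the same masked SSN in another table, which keeps foreign-key style joins working in the test database.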

The second step of test data management is planning. The team creates a plan based on the goals, available resources, and limitations.
The test data management plan should contain:

  • A detailed definition of the goals.
  • The data sets to be created.
  • The test data creation strategy: will the company generate test data sets automatically or compile them manually? The company can also use a third-party tool or develop custom test data generation scripts or applications in-house.
  • The people or company roles who will create, maintain, and manage the data sets.
  • Where the data sets will be stored.
  • The test data security policy: who can access or modify the test data sets.
  • The test data set lifecycle: when, and by whom, the test data must be removed once it is no longer needed.

[Figure: DTM Data Generator, data generation process aspects]
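
The plan items above can also be kept in a machine-readable form, so that access rules and the data set lifecycle can be checked automatically. The following Python sketch shows one possible structure; all field names, roles, and the retention rule are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DataSetPlan:
    name: str
    purpose: str                  # which goal this data set serves
    creation: str                 # strategy, e.g. "generated" or "manual"
    storage: str                  # where the data set is stored
    owner: str                    # role responsible for creating and maintaining it
    readers: set[str] = field(default_factory=set)   # roles with read-only access
    writers: set[str] = field(default_factory=set)   # roles allowed to modify it
    created: date = field(default_factory=date.today)
    retention_days: int = 90      # lifecycle: remove the data this many days after creation

    def can_read(self, role: str) -> bool:
        return role == self.owner or role in self.readers or role in self.writers

    def can_write(self, role: str) -> bool:
        return role == self.owner or role in self.writers

    def expires_on(self) -> date:
        return self.created + timedelta(days=self.retention_days)

    def is_expired(self, today: date | None = None) -> bool:
        today = today if today is not None else date.today()
        return today >= self.expires_on()


if __name__ == "__main__":
    plan = DataSetPlan(
        name="orders_perf", purpose="server-side performance", creation="generated",
        storage="test-db-01", owner="qa_lead", readers={"developer"}, writers={"qa_engineer"},
        created=date(2024, 1, 1), retention_days=30,
    )
    print(plan.can_write("developer"))  # False
    print(plan.expires_on())            # 2024-01-31
```

A scheduled job could iterate over such records and flag expired data sets for removal by their owner, which turns the lifecycle item of the plan into something that can be enforced.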

The next part of the article will share a few ideas about the test data management process.