Test data and file generation: desktop software or online service?

Today any QA engineer or software developer has access to many data generation and data management tools. There are two common ways to generate test data with a third-party tool: install the software on your own infrastructure (desktop or server) or use the SaaS (software as a service) model.

An online service has a few advantages when you need to quickly generate data with a simple structure:

  • You do not have to install anything.
  • It is ready to use. You do not need to prepare or tune it.
  • The interface is easy to use.
  • Most of these services are free.

It is a perfect way to create a small data set without complex relationships or dependencies. You can do it in just a few mouse clicks.

However, online services have some critical limitations by design. The most important one is access to your local data sources: database schemas, files, spreadsheets, etc. For security reasons, you cannot give a third-party web service access to your infrastructure.

Therefore, a SaaS solution cannot analyze your database structure unless your database is cloud-based as well. You have to define the structure of the target data set manually, or export it from your database and import it into the service. Also, you cannot use your Excel spreadsheet or Microsoft Access database as a source of values for a target column. Instead, you have to use random data, which makes the data sets less realistic.
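The difference between a column filled from a local source and a column filled with random data can be sketched as follows. This is a minimal illustration, not the behavior of any particular tool: the inline CSV text stands in for a local spreadsheet export, and the names `LOCAL_CITIES_CSV`, `load_dictionary`, `random_string` and `generate_rows` are hypothetical.

```python
import csv
import io
import random
import string

# Hypothetical stand-in for a local spreadsheet exported to CSV.
# In a real setup this would be a file that only you can read.
LOCAL_CITIES_CSV = """city
Berlin
Lyon
Porto
"""


def load_dictionary(csv_text, column):
    """Read one column of CSV text into a list of values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def random_string(rng, length=8):
    """Fallback when no local source is available: random letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_rows(n, dictionary, seed=0):
    """Generate n rows with a dictionary-based and a random column."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [
        {
            "id": i + 1,
            "city_from_source": rng.choice(dictionary),
            "city_random": random_string(rng),
        }
        for i in range(n)
    ]


rows = generate_rows(5, load_dictionary(LOCAL_CITIES_CSV, "city"))
for row in rows:
    print(row)
```

The `city_from_source` column always contains a value that exists in the local source, while `city_random` contains meaningless strings of the right type only.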

The second limitation is the disclosure of internal structures and sources. The third-party service gains access to your database schema, your dictionary files, etc. This is critical for companies with strict security policies and can rule out the SaaS model altogether.

The last important problem is channel bandwidth. It does not matter if you need to create 1M simple rows with 5-7 columns. But if you need 10 billion rows with a complex structure, the channel becomes a bottleneck.
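A rough back-of-the-envelope estimate shows the scale of the difference. The row sizes (100 and 500 bytes) and the 100 Mbit/s channel below are illustrative assumptions, not measurements, and the function name `transfer_time_seconds` is hypothetical.

```python
def transfer_time_seconds(rows, bytes_per_row, bandwidth_mbit_s):
    """Estimate the time to download a generated data set.

    Ignores compression, protocol overhead and server-side
    generation time, so treat the result as a rough estimate only.
    """
    total_bits = rows * bytes_per_row * 8
    return total_bits / (bandwidth_mbit_s * 1_000_000)


# Assumed: 1M simple rows of about 100 bytes each over 100 Mbit/s.
small = transfer_time_seconds(1_000_000, 100, 100)

# Assumed: 10 billion complex rows of about 500 bytes each, same channel.
large = transfer_time_seconds(10_000_000_000, 500, 100)

print(f"small data set: {small:.0f} seconds")
print(f"large data set: {large / 3600:.0f} hours")
```

Under these assumptions the small data set downloads in seconds, while the large one takes several days of continuous transfer.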

Because they are distributed for free, most online data generation services do not provide quality technical support. On the other hand, all of them are easy to use for the most common cases.

Desktop solutions do not have the limitations mentioned above. However, all modern tools have a complex GUI. You need some time to understand the generation model and learn the tool. For example, some tools are built around the list of database tables and do not allow you to create several rules for the same table.

Second, you need some time to tune the tool for your environment: database connections, access rights to shared data sources and files, etc.

Most powerful data generators are neither free nor open source. A typical license costs from 50-70 USD for a single-database tool to 400-500 USD per user for a universal, full-featured tool.

So, we recommend starting with free online services and switching to a desktop product if the limitations of the SaaS model are critical for your projects.