As mentioned in the previous article, DTM Flat File Generator offers two ways to import the structure of a file: a structure definition file and a sample data file. This article deals with the second way. This import method is applicable when the user already has a small data file and wants to create a new data set with the same file structure or to extend the existing file.
Let's discuss the information that can be extracted from a sample file. By default, the first row of the file is used as the source of the column names. The user can disable this behavior; in that case the column names F1, F2, ... Fn are assigned.
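The header rule can be illustrated with a short Python sketch. It is not the product's code: the function name read_sample, the has_header flag and the comma delimiter are assumptions made for illustration only.

```python
import csv
import io


def read_sample(text, has_header=True):
    """Return (column_names, data_rows) for a small delimited sample.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    if has_header:
        # The first row supplies the column names.
        return rows[0], rows[1:]
    # Header disabled: every row is data, names become F1, F2, ... Fn.
    names = [f"F{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


sample = "id,city\n1,Boston\n2,Austin\n"
names, rows = read_sample(sample)
unnamed, all_rows = read_sample(sample, has_header=False)
```

With the header disabled, the first row is treated as ordinary data, so the sample above yields three data rows instead of two.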
Structure Definition Import
The first kind of information the program extracts is the data type. It can detect numeric (integer and non-integer), date, and time columns and marks all others with the "string" data type. The program can also detect and use the GUID data type.
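One plausible way to perform this kind of detection is to try progressively looser parsers on every value of a column. The sketch below is an assumption about how such inference could work, not the generator's actual algorithm; the function names, the order of checks and the two date/time formats are all hypothetical.

```python
import uuid
from datetime import datetime


def _all_parse(values, parse):
    """True if every value is accepted by the given parser."""
    try:
        for v in values:
            parse(v)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess a column type from string samples (illustrative only)."""
    if not values:
        return "string"
    if _all_parse(values, int):
        return "integer"
    if _all_parse(values, float):
        return "float"
    # Assumed formats; a real tool would probably try many more.
    if _all_parse(values, lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    if _all_parse(values, lambda v: datetime.strptime(v, "%H:%M:%S")):
        return "time"
    if _all_parse(values, uuid.UUID):
        return "guid"
    # Everything else falls back to the "string" type.
    return "string"
```

The order matters: integers are checked before non-integers because every integer also parses as a float, and "string" is the fallback when nothing stricter fits.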
The next kind of information is the value range. For numeric, date, and time values, it is the minimum and maximum found in the column. The generator will produce values within that range. Of course, the user can extend or restrict it manually if necessary.
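As a rough illustration of range detection, the following Python sketch finds the minimum and maximum of a sample column and then draws a new value inside that range. The helper names and the ISO date format are assumptions, not part of the product.

```python
import random
from datetime import date, timedelta


def numeric_range(values):
    """Minimum and maximum of a numeric sample column."""
    nums = [float(v) for v in values]
    return min(nums), max(nums)


def date_range(values):
    """Minimum and maximum of a column of ISO dates (YYYY-MM-DD assumed)."""
    days = [date.fromisoformat(v) for v in values]
    return min(days), max(days)


def random_date(lo, hi, rng=random):
    """Pick a date between lo and hi, both inclusive."""
    return lo + timedelta(days=rng.randint(0, (hi - lo).days))


lo, hi = date_range(["2021-03-01", "2020-12-15", "2021-01-10"])
generated = random_date(lo, hi)
```

Widening or narrowing the range manually corresponds to passing different lo and hi values to the generating function.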
Well-known Column Name Import
Let's return to the column name. For string columns, DTM Flat File Generator tries to recognize more than 50 well-known column names, such as 'URL', 'City', 'Industry', or 'e-mail', and assigns custom data generators to the columns it recognizes.
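Name recognition of this kind can be sketched as a lookup on a normalized column name. The table below holds only the four names quoted above; the normalization rule and the generator labels are assumptions, and the product's real list and matching logic are not documented here.

```python
# Hypothetical lookup table: normalized column name -> generator label.
WELL_KNOWN = {
    "url": "url",
    "city": "city",
    "industry": "industry",
    "email": "email",
}


def normalize(name):
    """Lowercase the name and drop spaces, hyphens and other punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def pick_generator(column_name):
    """Return a generator label for a well-known name, or None."""
    return WELL_KNOWN.get(normalize(column_name))
```

Normalizing first lets spellings such as 'e-mail', 'E-Mail' and 'Email' map to the same generator.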
The next option is dictionary recognition. A dictionary is a small set of values that covers all possible cases. Examples are "gender" ('M' or 'F' only) and marital status ('married' or 'single').
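A simple heuristic for dictionary recognition is to count distinct values and check that they repeat often enough. The sketch below is one possible approach; the thresholds max_distinct and min_repeat are invented for illustration and are not taken from the product.

```python
def detect_dictionary(values, max_distinct=10, min_repeat=2.0):
    """Return the sorted distinct values if the column looks like a
    dictionary, otherwise None. Thresholds are illustrative assumptions."""
    distinct = sorted(set(values))
    if not distinct or len(distinct) > max_distinct:
        return None
    # Require each value to appear min_repeat times on average, so a short
    # column of all-different strings is not mistaken for a dictionary.
    if len(values) / len(distinct) < min_repeat:
        return None
    return distinct


gender = detect_dictionary(["M", "F", "F", "M", "M"])
```

A generator for such a column would then pick values only from the returned list.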
Data Value Properties Recognition
For string values, the program detects the minimum and maximum length of the sample data and uses them. It can also analyze typical capitalization patterns: all lowercase, all uppercase, or starting with an uppercase letter.
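Both properties are easy to derive from a sample column. The following Python sketch shows one way to do it; the function names and the pattern labels ("lower", "upper", "capitalized", "mixed") are assumptions chosen for this example.

```python
def length_range(values):
    """Shortest and longest string length in the sample."""
    lengths = [len(v) for v in values]
    return min(lengths), max(lengths)


def capitalization(values):
    """Classify the capitalization pattern shared by all values."""
    if not values:
        return "mixed"
    if all(v.islower() for v in values):
        return "lower"
    if all(v.isupper() for v in values):
        return "upper"
    # First letter uppercase, the rest contains no uppercase letters.
    if all(v[:1].isupper() and v[1:] == v[1:].lower() for v in values):
        return "capitalized"
    return "mixed"
```

A column is given a pattern only when every sample value follows it; otherwise it is reported as "mixed" and no capitalization rule would be applied.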
The program can identify that the values in a column are unique and instruct the generator for that column accordingly.
Automatically incremented values are a very popular way to create unique identifiers for database objects. The software can detect this type of value sequence, recognizing the start value and the step. Currently, only incrementing sequences are recognized; decrementing sequences will be supported in future releases.
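Sequence detection of this kind can be sketched as a check that consecutive values differ by a constant positive step. The function below is an illustrative assumption, not the generator's implementation; like the behavior described above, it rejects decrementing sequences.

```python
def detect_increment(values):
    """Return (start, step) if the values form an incrementing integer
    sequence with a constant step, otherwise None."""
    try:
        nums = [int(v) for v in values]
    except ValueError:
        return None
    if len(nums) < 2:
        return None
    step = nums[1] - nums[0]
    if step <= 0:
        # Constant or decrementing sequences are not recognized.
        return None
    if any(b - a != step for a, b in zip(nums, nums[1:])):
        return None
    return nums[0], step


start, step = detect_increment(["100", "105", "110"])
```

Given the detected start and step, a generator can continue the sequence for any number of new rows.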
Together, the described methods of data property recognition allow users to create a data generation project with realistic data within a few seconds.