Microsoft Business Intelligence

Load pre staging data from .CSV, .TXT or any other flat file type

Version - SSIS 2013
Client based in - USA

The Challenge

Our customer basically receives the zip file(s) for the health care data from various medications units and organisations (who are their clients) to be loaded to their Pre staging databases. Each zip file has one or more CSV, TXT, or any other flat file type of source with very bulky data to be loaded to Pre staging database.

The Solution

We have developed a load package to accomplish this task.

The first 4 rows of each of this file contain the meta data details like the table structure, column name, data type, and size.
In some cases, only the column names are available and other details like data type and size are missing. The load package handles such cases by using default varchar(255) columns.

First they manually run the Build Pre Staging schema package that reads each of this file and creates SQL table in the specified database server, it’s a one time job.

Then, the load package runs on daily basis to load the data to these tables dynamically, it performs Upsert operation based on the given unique primary key.

Overall, it performs each following step,

  • It extracts the given zip file.
  • Process the extracted files whether it’s TXT or CSV or any other flat file source one by one in For each loop container, the file extension is configurable parameter.
  • It performs Upsert as the data is incremental.
  • After loading the data of all files, it archives the processed files.
  • The customer is using their own ETL framework, so during the package the details about the Batch, Batch set, task, package step, source count, destination count, updated count, inserted count, deleted count, etc. managed for the internal ETL Framework database.

It’s really a big package and many ifs and buts were elegantly handled in this package.
We also implemented parallelism into it to load the data very quickly into the Pre staging database.