As Azure Data Factory (ADF) does not offer Excel as a sink option, an alternative method is required to copy Excel files using ADF pipelines.
Step 1: Utilize the “Get Metadata” activity to retrieve the list of files stored in the blob storage. Select the “Child Items” option from the field list to include all items within the specified folder.
Dataset: In this scenario, a binary type of dataset was utilized. However, it’s also feasible to use a delimited text dataset to fetch the list of files. The choice between datasets does not impact the functionality.
Step 2: Once the list of files is obtained, it needs to be filtered to include only Excel format files for copying. To achieve this, employ the Filter activity to Select the output of the “Get List of Files” activity (Get Metadata) as the input in the “Items” section, as illustrated in the image below. Then, specify the condition to check whether each file contains the “.xlsx” extension.
Condition: @contains(item().Name,'.xlsx')
Step 3: Once the list of filtered files is obtained, iterate through them using a For Each loop activity, and copy each file individually using a single Copy Data activity. Add “value” at the end of the expression, as the output of the filter is stored under the array named “value”.
Step 4: Inside the For Each loop, utilize a single Copy Data activity to copy files from the source to the destination. In the image below, take the output of the For Each loop to iterate through file names and pass each name to the source dataset to identify the file from the blob storage.
Source:
Source Dataset:
Sink: For the sink side, there’s no need to add parameters as files can be copied without specifying their names. The files will be automatically copied to the destination with the same names. However, if needed, the file name can be passed through parameters to the sink dataset, allowing for dynamic expression to change the name of the file.
Sink Dataset:
Pipeline Run: After completing the above steps, execute the pipeline. This process will retrieve a list of all file types but will specifically proceed with copying only the Excel files, maintaining their original format from the source.
Azure Blob Storage: In this case, there are only two files in “XLSX” format. As depicted in the image above, the copy data activity is executed twice to transfer these files.
Input Folder:
Output Folder:
To address the absence of Excel as a sink option in Azure Data Factory (ADF), we have demonstrated an alternative method for copying Excel files with any number of sheets while preserving their original format in the destination.
© All Rights Reserved. Inkey IT Solutions Pvt. Ltd. 2024
This article is great and is what I need. however, it seems the picture quality is not so good, not clear enough to figure out the exact process. If there are any reference to this topic? thanks!
Hello Chenghai,
Apologies for the image quality. We will take care of it in the future and also try to update this blog with better quality images. Will also get back to you with references as you requested.
Hello Chenghai,
I would like to inform you that we actually had done a Proof Of Concept(POC) about this so we don’t have any reference.