
Why Microsoft Fabric Dataflow Loads Partial Data When Triggered from Pipeline (But Works on Rerun)


Introduction

If you are working with Microsoft Fabric Dataflow Gen2 triggered through a Pipeline, you might encounter a confusing scenario:

  • The first pipeline run loads only partial data
  • The Dataflow shows success
  • There are no errors
  • When you rerun the Dataflow (manually or via pipeline), it loads complete data correctly

This behavior can be frustrating because everything appears successful, yet the results are inconsistent.

In this blog, we’ll break down:

  • Why this happens
  • The role of Metadata Sync (MD Sync)
  • How Fabric architecture contributes to this behavior
  • Recommended best practices to avoid it

The Real Root Cause: Metadata Synchronization (MD Sync)

In Microsoft Fabric, when a Dataflow writes data to a Lakehouse table, multiple internal components must synchronize:

  • OneLake storage (Delta files)
  • Lakehouse metadata catalog
  • SQL Analytics Endpoint
  • Dataflow and Pipeline runtime engines

This synchronization process is called Metadata Sync (MD Sync).

The key point is:

Metadata synchronization is not always instantaneous.

So when your pipeline immediately performs another operation (like reading from the table or moving data forward), it may access an older metadata snapshot.

Why the First Run Loads Partial Data

Here is what typically happens behind the scenes:

Step-by-Step Timeline

  1. The Dataflow activity writes data into the Lakehouse tables
  2. The pipeline proceeds to the next step immediately
  3. Metadata sync is still in progress
  4. The next activity reads the incomplete table state
  5. The pipeline finishes successfully, with partial data

Later…

  6. Metadata sync completes in the background
  7. A rerun happens
  8. The full data becomes visible

This is why reruns magically “fix” the problem.

Why There Are No Errors

Fabric considers the operation successful because:

  • Data was written correctly
  • Pipeline executed successfully
  • No system failures occurred

The issue is timing, not correctness.

This is an example of eventual consistency behavior in distributed data platforms.

How to Confirm This Is Your Issue

You are likely facing an MD Sync delay if:

✅ First pipeline run loads partial rows
✅ Rerun loads full rows
✅ Manual Dataflow execution works fine
✅ No failures appear in logs
✅ Source data volume is correct
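A quick way to gather evidence is to inspect the Delta table's commit history and compare the write timestamp against the time the downstream activity ran. Here is a minimal sketch for a Fabric Notebook (the table name sales is a placeholder, and spark is the session the notebook runtime provides):

```python
# Inspect the Delta commit history of the target table ("sales" is a
# placeholder). If the write committed after the downstream activity
# started reading, the partial load is a timing issue, not a data issue.
history = spark.sql("DESCRIBE HISTORY sales")
history.select("version", "timestamp", "operation", "operationMetrics") \
       .show(5, truncate=False)
```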

Recommended Solutions (Best Practices)

i. Add a Wait Activity After Dataflow

The simplest and most effective approach:

Add a Wait activity (30–120 seconds) after the Dataflow activity, before any downstream steps run.

This allows metadata sync to complete.
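If you prefer to keep the pause inside a Notebook activity rather than add a dedicated Wait activity, a plain sleep does the same job. A minimal sketch (the 60-second value is an arbitrary starting point, not a guaranteed bound):

```python
import time

# Give metadata sync time to settle before any downstream reads.
# 60 seconds is an arbitrary starting point; tune it for your workload.
time.sleep(60)
```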

ii. Validate Row Count Before Continuing

A more robust enterprise approach:

  • Run a Notebook or SQL script after the Dataflow completes
  • Compare the actual row count against the expected count
  • Proceed only when the data is complete

This creates a data validation gate.
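Here is a minimal sketch of such a gate as a Fabric Notebook activity, assuming the expected count can be computed up front (the table name and threshold below are placeholders, and spark is the session the notebook runtime provides). If the count never catches up, the raised error fails the Notebook activity, and with it the pipeline run:

```python
import time

TABLE_NAME = "sales"        # placeholder: your Lakehouse table
EXPECTED_ROWS = 1_000_000   # placeholder: the count you expect from the source
MAX_ATTEMPTS = 10
POLL_SECONDS = 30

for attempt in range(1, MAX_ATTEMPTS + 1):
    actual = spark.read.table(TABLE_NAME).count()
    print(f"Attempt {attempt}: {actual:,} of {EXPECTED_ROWS:,} rows visible")
    if actual >= EXPECTED_ROWS:
        print("Row count complete; safe to continue.")
        break
    time.sleep(POLL_SECONDS)
else:
    # Failing the Notebook activity fails the pipeline, which is the point:
    # a loud failure beats a silent partial load downstream.
    raise RuntimeError(
        f"{TABLE_NAME} still has {actual:,} rows after {MAX_ATTEMPTS} checks; "
        f"expected {EXPECTED_ROWS:,}."
    )
```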

iii. Avoid Immediate Read-After-Write

If possible, design pipelines so that the data write and the data consumption do not happen in the same immediate execution chain.

Instead, trigger downstream pipelines separately.
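In a decoupled design, the downstream pipeline can run on its own schedule, or be started on demand through the Fabric REST API. Here is a rough sketch using the "run on demand item job" endpoint (the workspace ID, pipeline ID, and token are placeholders; verify the exact contract against the current Fabric REST API documentation):

```python
import requests

WORKSPACE_ID = "<workspace-guid>"     # placeholder
PIPELINE_ID = "<pipeline-item-guid>"  # placeholder
TOKEN = "<entra-access-token>"        # placeholder: a valid Microsoft Entra token

# Fabric "run on demand item job" endpoint; check the current docs
# for the exact path and required scopes.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)

resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()  # a 202 Accepted means the job was queued
print("Downstream pipeline triggered:", resp.status_code)
```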

iv. Force Metadata Refresh (Advanced)

In some scenarios, running a SQL endpoint query or table refresh step can help ensure metadata propagation before consumption.
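Here is a sketch of what that lightweight query might look like from a Notebook using pyodbc. The server, database, table name, and authentication mode below are all placeholders; copy the real connection details from your Lakehouse's SQL analytics endpoint settings:

```python
import pyodbc

# Placeholders: copy the server and database names from your Lakehouse's
# SQL analytics endpoint settings; pick the authentication mode you use.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<your-lakehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

conn = pyodbc.connect(conn_str)
try:
    # A lightweight query through the SQL endpoint. If metadata is still
    # syncing, this count may lag behind what Spark sees in OneLake.
    row = conn.cursor().execute("SELECT COUNT(*) FROM sales").fetchone()
    print(f"Rows visible through the SQL endpoint: {row[0]:,}")
finally:
    conn.close()
```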

Architecture Insight: Why Fabric Behaves This Way

Microsoft Fabric is built on distributed cloud components designed for scalability and performance.

Because of this:

  • Storage layer and compute layer operate independently
  • Metadata propagation is asynchronous
  • Short delays are expected under heavy workloads

This is normal behavior in modern lakehouse architectures.

Production Design Recommendation

For enterprise-grade reliability:

Always include a stabilization step after Dataflow writes before consuming Lakehouse tables.

This prevents intermittent partial data issues and improves pipeline predictability.

Key Takeaway

If your Fabric Dataflow:

  • Loads partial data only on the first pipeline run
  • Works perfectly on rerun

The most likely cause is a Metadata Sync (MD Sync) delay.

It is not a bug in your logic; it is a timing behavior of the platform.

Final Thoughts

Understanding Fabric’s internal behavior helps avoid confusion and build more reliable data pipelines.

Small architectural adjustments, such as adding wait or validation steps, can all but eliminate these intermittent issues.

If you found this helpful, consider sharing it with your team or community to help others troubleshoot similar Fabric scenarios.