Last updated: June 11, 2025
This series of tutorials is based on the Flow Matching Guide and Code
- arXiv: 2412.06264
- Thank you, META
Flow Matching Problem
Instead of learning the likelihood of the target as in (normalizing) flow models, we directly learn the ground-truth (GT) velocity field $u_t$; that is, we find $u_t^\theta$ generating $p_t$, with $p_0 = p$ and $p_1 = q$. In the Flow Matching loss, we directly minimize the discrepancy between the learned velocity field and the GT velocity field:

$$\mathcal{L}(\theta) = \mathbb{E}_{X_t \sim p_t}\, D\big(u_t(X_t), u_t^\theta(X_t)\big)$$

where $D$ is a dissimilarity measure between vectors, such as the squared $\ell_2$-norm $D(u, v) = \|u - v\|^2$.
Data Dependencies
We know that Flow Matching actually learns a mapping from a source distribution to a target distribution. The source and the target may be independent, or they may originate from a joint distribution

$$(X_0, X_1) \sim \pi_{0,1}(X_0, X_1)$$

known as a coupling. If they are independent, we write $\pi_{0,1}(X_0, X_1) = p(X_0)\, q(X_1)$. For example, in diffusion models we generate images from standard Gaussian noise, so source and target are independent. But if we want to generate high-resolution images from their low-resolution counterparts (not via conditioning), then the two are coupled.
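To make the two cases concrete, here is a minimal NumPy sketch (my own, not from the guide's codebase) of sampling from an independent coupling versus a dependent coupling built from paired data; the arrays `targets` and `pairs` are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_independent_coupling(targets, batch_size):
    """Independent coupling pi_{0,1}(x0, x1) = p(x0) q(x1):
    the source X0 is standard Gaussian noise, drawn independently of X1."""
    x1 = targets[rng.integers(len(targets), size=batch_size)]
    x0 = rng.standard_normal(x1.shape)
    return x0, x1

def sample_dependent_coupling(pairs, batch_size):
    """Dependent coupling: (X0, X1) are drawn jointly, e.g. a low-resolution
    image together with its high-resolution counterpart."""
    idx = rng.integers(len(pairs), size=batch_size)
    return pairs[idx, 0], pairs[idx, 1]

# toy placeholder data in R^2: 1000 target samples, and 1000 (x0, x1) pairs
targets = rng.normal(loc=3.0, scale=0.5, size=(1000, 2))
pairs = np.stack([targets + rng.normal(scale=0.2, size=targets.shape), targets], axis=1)

x0_ind, x1_ind = sample_independent_coupling(targets, batch_size=8)
x0_dep, x1_dep = sample_dependent_coupling(pairs, batch_size=8)
```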
Designing Probability Path and the GT Velocity Field
In the FM problem we want to learn the GT velocity field, but we do not have access to a tractable one. Here we start by building a conditional probability path, to show that the problem simplifies through a conditioning strategy.
Consider a probability path $p_{t|1}$ conditioned on a single target sample $X_1 = x_1$; the marginal probability path is then

$$p_t(x) = \int p_{t|1}(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1 \tag{1}$$

Remember that we must satisfy $p_0 = p$ and $p_1 = q$, which means we start at the source distribution and end at the target distribution. These boundary conditions can be enforced by requiring the conditional probability paths to satisfy

$$p_{0|1}(x \mid x_1) = \pi_{0|1}(x \mid x_1), \qquad p_{1|1}(x \mid x_1) = \delta_{x_1}(x)$$

where $\pi_{0|1}(x_0 \mid x_1) = \pi_{0,1}(x_0, x_1) / q(x_1)$ and $\delta_{x_1}$ is the delta measure centered at $x_1$.
proof. From Equation (1), let t=0
$$p_0(x) = \int p_{0|1}(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1 = \int \pi_{0|1}(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1 = \pi_0(x)$$

which is just our source distribution; similarly, let $t = 1$:

$$p_1(x) = \int p_{1|1}(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1 = \int \delta_{x_1}(x)\, q(x_1)\, \mathrm{d}x_1 = q(x)$$
Q.E.D.
If the source and the target are independent, then $p_{0|1}(x \mid x_1) = p(x)$. Since the delta measure $\delta_{x_1}(x)$ does not have a density, we should read the second boundary condition as

$$\int p_{t|1}(x \mid y)\, f(y)\, \mathrm{d}y \xrightarrow{t \to 1} f(x)$$

for continuous functions $f$. An example of a conditional path satisfying the boundary conditions is

$$\mathcal{N}\big(\cdot \mid t x_1, (1 - t)^2 I\big) \xrightarrow{t \to 1} \delta_{x_1}(\cdot)$$

As you can see, at $t = 0$ we have a standard Gaussian distribution and at $t = 1$ we obtain the target sample.
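Here is a small sketch of sampling from this conditional Gaussian path for a toy 2-D target sample, using the reparameterization $X_t = t x_1 + (1 - t) X_0$ with $X_0 \sim \mathcal{N}(0, I)$; the specific numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_conditional_path(x1, t, n_samples):
    """Sample X_t ~ N(t * x1, (1 - t)^2 I) via reparameterization:
    X_t = t * x1 + (1 - t) * X_0 with X_0 ~ N(0, I)."""
    x0 = rng.standard_normal((n_samples, x1.shape[-1]))
    return t * x1 + (1.0 - t) * x0

x1 = np.array([2.0, -1.0])          # a single target sample in R^2
for t in [0.0, 0.5, 0.99]:
    xt = sample_conditional_path(x1, t, n_samples=5000)
    # the mean approaches x1 and the spread shrinks like (1 - t) as t -> 1
    print(t, xt.mean(axis=0).round(2), xt.std(axis=0).round(2))
```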
We now derive the GT velocity field from the conditional probability path.
The Continuity Equation tells us that, if the conditional velocity field generates the conditional probability path, then

$$\frac{\mathrm{d}}{\mathrm{d}t}\, p_{t|1}(x \mid x_1) = -\nabla \cdot \big(p_{t|1}(x \mid x_1)\, u_t(x \mid x_1)\big)$$

Then, from Equation (1),

$$\frac{\mathrm{d}}{\mathrm{d}t}\, p_t(x) = \int q(x_1)\, \frac{\mathrm{d}}{\mathrm{d}t}\, p_{t|1}(x \mid x_1)\, \mathrm{d}x_1 = -\nabla \cdot \int q(x_1)\, p_{t|1}(x \mid x_1)\, u_t(x \mid x_1)\, \mathrm{d}x_1$$

Now, if the marginal velocity field generates the marginal probability path, the above equation must also be a Continuity Equation. Comparing it with the Continuity Equation, we can say

$$p_t(x)\, u_t(x) = \int q(x_1)\, p_{t|1}(x \mid x_1)\, u_t(x \mid x_1)\, \mathrm{d}x_1$$

Dividing both sides by $p_t(x)$ yields

$$u_t(x) = \int u_t(x \mid x_1)\, \frac{p_{t|1}(x \mid x_1)\, q(x_1)}{p_t(x)}\, \mathrm{d}x_1 \tag{2}$$
Recall the Bayes rule

$$p(x \mid y) = \frac{p(x, y)}{p(y)} = \frac{p(y \mid x)\, p(x)}{p(y)}$$

with which we can transform Equation (2) into

$$u_t(x) = \int u_t(x \mid x_1)\, p_{1|t}(x_1 \mid x)\, \mathrm{d}x_1 \tag{3}$$

So the GT velocity field is a weighted average of the conditional velocities $u_t(x \mid x_1)$, with the weights $p_{1|t}(x_1 \mid x)$ representing the posterior probability of the target sample $x_1$ given the current sample $x$. We can also write Equation (3) in expectation form:

$$u_t(x) = \mathbb{E}\big[u_t(X_t \mid X_1) \mid X_t = x\big]$$
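As a toy illustration of Equation (3), the sketch below (my own construction, not from the guide) uses a discrete 1-D target distribution together with the Gaussian conditional path from above, whose conditional velocity is $u_t(x \mid x_1) = (x_1 - x)/(1 - t)$ (this follows from Equation (8) with the linear flow derived later), and computes the marginal velocity as a posterior-weighted average of conditional velocities.

```python
import numpy as np

# discrete toy target q: three 1-D data points with equal probability
x1_points = np.array([-2.0, 0.5, 3.0])
q_weights = np.ones(3) / 3.0

def conditional_velocity(x, x1, t):
    # u_t(x | x1) = (x1 - x) / (1 - t) for the path N(t * x1, (1 - t)^2 I)
    return (x1 - x) / (1.0 - t)

def marginal_velocity(x, t):
    """Equation (3): u_t(x) = sum_i u_t(x | x1_i) * p_{1|t}(x1_i | x),
    with posterior weights p_{1|t}(x1 | x) proportional to p_{t|1}(x | x1) q(x1)."""
    # Gaussian likelihoods p_{t|1}(x | x1_i); the shared normalizer cancels in the posterior
    likelihood = np.exp(-0.5 * ((x - t * x1_points) / (1.0 - t)) ** 2)
    posterior = likelihood * q_weights
    posterior /= posterior.sum()
    return np.sum(conditional_velocity(x, x1_points, t) * posterior)

print(marginal_velocity(x=0.3, t=0.5))   # a weighted average of the three conditional velocities
```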
General Conditioning
So far we have conditioned on $X_1$, a sample from the target distribution; however, we can actually condition on an arbitrary $Z \in \mathbb{R}^m$ with PDF $p_Z$, and the marginal probability path becomes

$$p_t(x) = \int p_{t|Z}(x \mid z)\, p_Z(z)\, \mathrm{d}z$$

which is generated by the following marginal velocity field:

$$u_t(x) = \int u_t(x \mid z)\, p_{Z|t}(z \mid x)\, \mathrm{d}z = \mathbb{E}\big[u_t(X_t \mid Z) \mid X_t = x\big] \tag{4}$$
We have an important theorem called the marginalization trick, but before we introduce it, we need some assumptions.
Assumption.
- $p_{t|Z}(x \mid z)$ is $C^1([0,1) \times \mathbb{R}^d)$ and $u_t(x \mid z)$ is $C^1([0,1) \times \mathbb{R}^d, \mathbb{R}^d)$ as a function of $(t, x)$
- $p_Z$ has bounded support, i.e. $p_Z(z) = 0$ outside some bounded set in $\mathbb{R}^m$
- $p_t(x) > 0$ for all $x \in \mathbb{R}^d$ and $t \in [0,1)$
We have the following theorem.
Theorem. Under the above assumptions, if $u_t(x \mid z)$ is conditionally integrable and generates the conditional probability path $p_t(\cdot \mid z)$, then the marginal velocity field $u_t$ generates the marginal probability path $p_t$, for all $t \in [0,1)$,
where conditionally integrable means

$$\int_0^1 \int \int \|u_t(x \mid z)\|\, p_{t|Z}(x \mid z)\, p_Z(z)\, \mathrm{d}z\, \mathrm{d}x\, \mathrm{d}t < \infty$$
proof. To prove that the marginal velocity field generates the marginal probability path, we can utilize the Continuity Equation
$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\, p_t(x)
&= \frac{\mathrm{d}}{\mathrm{d}t} \int p_{t|Z}(x \mid z)\, p_Z(z)\, \mathrm{d}z
= \int \frac{\mathrm{d}}{\mathrm{d}t}\, p_{t|Z}(x \mid z)\, p_Z(z)\, \mathrm{d}z \\
&= \int -\nabla \cdot \big(u_t(x \mid z)\, p_{t|Z}(x \mid z)\big)\, p_Z(z)\, \mathrm{d}z
= -\nabla \cdot \int u_t(x \mid z)\, p_{t|Z}(x \mid z)\, p_Z(z)\, \mathrm{d}z \\
&= -\nabla \cdot \int u_t(x \mid z)\, \frac{p_{t|Z}(x \mid z)\, p_Z(z)}{p_t(x)}\, p_t(x)\, \mathrm{d}z
= -\nabla \cdot \int u_t(x \mid z)\, p_{Z|t}(z \mid x)\, p_t(x)\, \mathrm{d}z \\
&= -\nabla \cdot \big(u_t(x)\, p_t(x)\big)
\end{aligned}
$$

We can further show that $u_t$ is integrable:

$$
\begin{aligned}
\int_0^1 \int \|u_t(x)\|\, p_t(x)\, \mathrm{d}x\, \mathrm{d}t
&= \int_0^1 \int \left\| \int u_t(x \mid z)\, p_{Z|t}(z \mid x)\, \mathrm{d}z \right\|\, p_t(x)\, \mathrm{d}x\, \mathrm{d}t \\
&\le \int_0^1 \int \int \|u_t(x \mid z)\|\, p_{Z|t}(z \mid x)\, p_t(x)\, \mathrm{d}z\, \mathrm{d}x\, \mathrm{d}t \\
&= \int_0^1 \int \int \|u_t(x \mid z)\|\, \frac{p_{t|Z}(x \mid z)\, p_Z(z)}{p_t(x)}\, p_t(x)\, \mathrm{d}z\, \mathrm{d}x\, \mathrm{d}t \\
&= \int_0^1 \int \int \|u_t(x \mid z)\|\, p_{t|Z}(x \mid z)\, p_Z(z)\, \mathrm{d}z\, \mathrm{d}x\, \mathrm{d}t < \infty
\end{aligned}
$$
Q.E.D.
Note that the argument may look circular: we originally obtained $u_t$ from the Continuity Equation, and here we prove that it satisfies the Continuity Equation. In fact, the marginal velocity field $u_t$ is simply defined by the construction in Equation (3), and the theorem shows that this construction does satisfy the Continuity Equation.
So far we have designed the probability path as a conditional probability path and the velocity field as a conditional velocity field, and we have established the relation between the conditional path/field and the marginal path/field.
However, we still do not have a tractable GT velocity field to learn, so let us continue by taking a closer look at the loss function.
Flow Matching Loss
We want a tractable loss function, but the GT velocity field $u_t$ from Equation (3) is still intractable because we have to marginalize over the entire training set. We now introduce a family of loss functions known as Bregman divergences, and we will show that using them provides unbiased gradients for $u_t^\theta(x)$ to learn $u_t(x)$ by regressing onto the conditional velocity field $u_t(x \mid z)$; hence we do not need the marginal field $u_t(x)$ anymore.
The Bregman divergence induced by a strictly convex, differentiable function $\Phi$ is defined as

$$D(u, v) := \Phi(u) - \big[\Phi(v) + \langle u - v, \nabla \Phi(v) \rangle\big]$$

The squared Euclidean distance $D(u, v) = \|u - v\|^2$ is in fact a Bregman divergence: let $\Phi(u) = \|u\|^2$, then

$$
\begin{aligned}
D(u, v) &= \|u\|^2 - \big[\|v\|^2 + \langle u - v, 2v \rangle\big] \\
&= \|u\|^2 - \|v\|^2 - 2\langle u, v \rangle + 2\|v\|^2 \\
&= u^\top u - 2 u^\top v + v^\top v \\
&= (u - v)^\top (u - v) = \|u - v\|^2
\end{aligned}
$$
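A quick numerical sanity check of this identity (a minimal NumPy sketch):

```python
import numpy as np

def bregman(u, v, phi, grad_phi):
    """D(u, v) = Phi(u) - [Phi(v) + <u - v, grad Phi(v)>]."""
    return phi(u) - (phi(v) + np.dot(u - v, grad_phi(v)))

phi = lambda u: np.dot(u, u)   # Phi(u) = ||u||^2
grad_phi = lambda v: 2.0 * v   # grad Phi(v) = 2 v

rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)

# the Bregman divergence of Phi(u) = ||u||^2 is the squared Euclidean distance
assert np.allclose(bregman(u, v, phi, grad_phi), np.sum((u - v) ** 2))
```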
A key property of Bregman divergences is that their gradient with respect to the second argument is invariant under affine combinations of the first argument:

$$\nabla_v D(a u_1 + b u_2, v) = a\, \nabla_v D(u_1, v) + b\, \nabla_v D(u_2, v), \qquad a + b = 1$$

This property allows us to swap expectations and gradients as follows:

$$\nabla_v D\big(\mathbb{E}[Y], v\big) = \mathbb{E}\big[\nabla_v D(Y, v)\big] \tag{5}$$

similar to the affine invariance above, where the linear operation can be pulled out of the gradient.
We now restate our two objectives. The first is to learn the marginal velocity field:

$$\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t, X_t \sim p_t}\, D\big(u_t(X_t), u_t^\theta(X_t)\big)$$

The second is to learn the conditional velocity field:

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t, Z, X_t \sim p_{t|Z}(\cdot \mid Z)}\, D\big(u_t(X_t \mid Z), u_t^\theta(X_t)\big)$$
We will show the following theorem
Theorem
$$\nabla_\theta \mathcal{L}_{\mathrm{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\mathrm{CFM}}(\theta)$$

So the minimizer of the conditional Flow Matching loss is the marginal velocity field $u_t$, because the two losses have the same gradients.
proof.
$$
\begin{aligned}
\nabla_\theta \mathcal{L}_{\mathrm{FM}}(\theta)
&= \nabla_\theta\, \mathbb{E}_{t, X_t \sim p_t}\, D\big(u_t(X_t), u_t^\theta(X_t)\big) \\
&= \mathbb{E}_{t, X_t \sim p_t}\, \nabla_\theta D\big(u_t(X_t), u_t^\theta(X_t)\big) \\
&= \mathbb{E}_{t, X_t \sim p_t}\, \nabla_v D\big(u_t(X_t), u_t^\theta(X_t)\big)\, \nabla_\theta u_t^\theta(X_t) \\
&\overset{(4)}{=} \mathbb{E}_{t, X_t \sim p_t}\, \nabla_v D\Big(\mathbb{E}_{Z \sim p_{Z|t}(\cdot \mid X_t)}\big[u_t(X_t \mid Z)\big],\, u_t^\theta(X_t)\Big)\, \nabla_\theta u_t^\theta(X_t) \\
&\overset{(5)}{=} \mathbb{E}_{t, X_t \sim p_t}\, \mathbb{E}_{Z \sim p_{Z|t}(\cdot \mid X_t)}\, \nabla_v D\big(u_t(X_t \mid Z), u_t^\theta(X_t)\big)\, \nabla_\theta u_t^\theta(X_t) \\
&= \mathbb{E}_{t, X_t \sim p_t}\, \mathbb{E}_{Z \sim p_{Z|t}(\cdot \mid X_t)}\, \nabla_\theta D\big(u_t(X_t \mid Z), u_t^\theta(X_t)\big) \\
&= \nabla_\theta\, \mathbb{E}_{t, Z \sim p_Z, X_t \sim p_{t|Z}(\cdot \mid Z)}\, D\big(u_t(X_t \mid Z), u_t^\theta(X_t)\big) \\
&= \nabla_\theta \mathcal{L}_{\mathrm{CFM}}(\theta)
\end{aligned}
$$
Q.E.D.
The above theorem actually has a more general form.
Theorem. Let $X \in \mathcal{S}_X$ and $Y \in \mathcal{S}_Y$ be random variables over state spaces $\mathcal{S}_X$ and $\mathcal{S}_Y$, and let $g^\theta(x) : \mathbb{R}^p \times \mathcal{S}_X \to \mathbb{R}^n$ be a function, where $\theta \in \mathbb{R}^p$ are the learnable parameters.
Let $D_x(u, v)$, $x \in \mathcal{S}_X$, be a Bregman divergence over a convex set $\Omega \subset \mathbb{R}^n$ that contains the image of $g^\theta(x)$. Then

$$\nabla_\theta\, \mathbb{E}_{X, Y}\, D_X\big(Y, g^\theta(X)\big) = \nabla_\theta\, \mathbb{E}_X\, D_X\big(\mathbb{E}[Y \mid X], g^\theta(X)\big)$$

and the global minimizer $g^\theta(x)$ of the above satisfies

$$g^\theta(x) = \mathbb{E}[Y \mid X = x]$$
I don't fully understand the subscript $x$ in $D_x$ here; presumably it means we may choose a divergence that depends on $x$ and varies with $x \sim X$.
proof.
$$
\begin{aligned}
\nabla_\theta\, \mathbb{E}_{X, Y}\, D_X\big(Y, g^\theta(X)\big)
&= \mathbb{E}_{X, Y}\, \nabla_v D_X\big(Y, g^\theta(X)\big)\, \nabla_\theta g^\theta(X) \\
&= \mathbb{E}_X\Big[\mathbb{E}\big[\nabla_v D_X\big(Y, g^\theta(X)\big)\, \nabla_\theta g^\theta(X) \mid X\big]\Big] \\
&\overset{(5)}{=} \mathbb{E}_X\Big[\nabla_v D_X\big(\mathbb{E}[Y \mid X], g^\theta(X)\big)\, \nabla_\theta g^\theta(X)\Big] \\
&= \mathbb{E}_X\Big[\nabla_\theta D_X\big(\mathbb{E}[Y \mid X], g^\theta(X)\big)\Big] \\
&= \nabla_\theta\, \mathbb{E}_X\, D_X\big(\mathbb{E}[Y \mid X], g^\theta(X)\big)
\end{aligned}
$$
Therefore, choosing $g^\theta(x) = \mathbb{E}[Y \mid X = x]$ attains the global minimum.
Q.E.D.
Solving Conditional Generation
We have now designed the conditional probability path, the conditional velocity field, the marginal probability path, and the marginal velocity field, and we have proved that we can learn the conditional velocity field instead of the intractable marginal one. We now consider how to realize these objects.
Here, we design the conditional path and velocity field via conditional flows: we define a flow model $X_{t|1}$ satisfying the boundary conditions. Once we have defined the flow, we can recover the velocity field via the flow ODE.
If we define

$$X_{t|1} = \psi_t(X_0 \mid x_1), \qquad X_0 \sim \pi_{0|1}(\cdot \mid x_1)$$

where $\psi : [0,1) \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ is a conditional flow satisfying

$$\psi_t(x \mid x_1) = \begin{cases} x & t = 0 \\ x_1 & t = 1 \end{cases} \tag{6}$$

we then obtain the conditional probability path via the push-forward formula

$$p_{t|1}(x \mid x_1) := \big[\psi_t(\cdot \mid x_1)_\sharp\, \pi_{0|1}(\cdot \mid x_1)\big](x) \tag{7}$$

which means we push forward the source distribution through the conditional flow we defined. This probability path satisfies the boundary conditions: at $t = 0$ the flow is the identity map, so we recover the source distribution directly; at $t = 1$ the flow is a constant map, so all mass concentrates on the target sample.
Recall the flow ODE

$$\frac{\mathrm{d}}{\mathrm{d}t}\, \psi_t(x) = u_t\big(\psi_t(x)\big)$$

Let $x = \psi_t^{-1}(x')$; then

$$\frac{\mathrm{d}}{\mathrm{d}t}\, \psi_t\big(\psi_t^{-1}(x')\big) = u_t(x')$$

which gives us a way to extract the velocity field. Denoting $\dot\psi_t = \frac{\mathrm{d}}{\mathrm{d}t}\psi_t$, we can write the conditional version as

$$u_t(x \mid x_1) = \dot\psi_t\big(\psi_t^{-1}(x \mid x_1) \mid x_1\big) \tag{8}$$
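Below is a minimal sketch of Equation (8), using the linear conditional flow $\psi_t(x \mid x_1) = t x_1 + (1 - t) x$ (derived formally in the OT section below) as a concrete example, with $\dot\psi_t$ approximated by a finite difference in $t$:

```python
import numpy as np

def psi(t, x, x1):
    """Conditional flow psi_t(x | x1) = t * x1 + (1 - t) * x (linear example)."""
    return t * x1 + (1.0 - t) * x

def psi_inv(t, x, x1):
    """Inverse in the spatial argument: solve x = psi_t(x0 | x1) for x0."""
    return (x - t * x1) / (1.0 - t)

def velocity_from_flow(t, x, x1, eps=1e-5):
    """Equation (8): u_t(x | x1) = psi_dot_t(psi_t^{-1}(x | x1) | x1),
    with psi_dot_t approximated by a central finite difference in t."""
    x0 = psi_inv(t, x, x1)
    return (psi(t + eps, x0, x1) - psi(t - eps, x0, x1)) / (2.0 * eps)

x1, x, t = np.array([2.0, -1.0]), np.array([0.3, 0.7]), 0.4
u_numeric = velocity_from_flow(t, x, x1)
u_closed = (x1 - x) / (1.0 - t)   # closed form for the linear flow
assert np.allclose(u_numeric, u_closed, atol=1e-6)
```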
We have now shown that the conditional flow gives us both the conditional probability path and the conditional velocity field, so we only need to build a conditional flow satisfying Equation (6).
Recalibrate our Flow Matching Loss
Recall that we defined our loss as learning the conditional velocity field:

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t, X_1, X_t \sim p_{t|1}(\cdot \mid X_1)}\, D\big(u_t(X_t \mid X_1), u_t^\theta(X_t)\big)$$

From Equation (8) we have

$$u_t(X_t \mid X_1) = \dot\psi_t\big(\psi_t^{-1}(X_t \mid X_1) \mid X_1\big)$$

By the definition of the flow we have

$$X_t = \psi_t(X_0 \mid X_1)$$

then

$$u_t(X_t \mid X_1) = \dot\psi_t\big(\psi_t^{-1}(X_t \mid X_1) \mid X_1\big) = \dot\psi_t\big(\psi_t^{-1}(\psi_t(X_0 \mid X_1) \mid X_1) \mid X_1\big) = \dot\psi_t(X_0 \mid X_1) \tag{9}$$

Hence our loss function becomes

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t, (X_0, X_1) \sim \pi_{0,1}}\, D\big(\dot\psi_t(X_0 \mid X_1), u_t^\theta(X_t)\big), \qquad X_t = \psi_t(X_0 \mid X_1)$$

with the minimizer

$$u_t^\theta(x) = \mathbb{E}\big[\dot\psi_t(X_0 \mid X_1) \mid X_t = x\big] \tag{10}$$
Recalibrate Marginalization Trick
We must restate the marginalization trick for conditional flows. Assume $u_t(x \mid x_1)$ is conditionally integrable, which here means

$$\mathbb{E}_{t, (X_0, X_1) \sim \pi_{0,1}}\, \|\dot\psi_t(X_0 \mid X_1)\| < \infty$$

We then have the following corollary to the marginalization trick.
Corollary. Assume that $q$ has bounded support, that $\pi_{0|1}(\cdot \mid x_1)$ is $C^1(\mathbb{R}^d)$ and strictly positive for some $x_1$ with $q(x_1) > 0$, and that $\psi_t(x \mid x_1)$ is a conditional flow satisfying the boundary conditions and the integrability condition above.
Then $p_{t|1}(x \mid x_1)$ and $u_t(x \mid x_1)$, defined in (7) and (8), define a marginal velocity field $u_t(x)$ generating the marginal probability path $p_t(x)$ interpolating $p$ and $q$.
From the marginalization trick stated earlier, if the conditional field is conditionally integrable and generates the conditional probability path, then the marginal field generates the marginal path. We already know that the conditional field generates the conditional path (they are linked by the flow), so we only need to prove that the conditional field is integrable.
proof.
$$
\begin{aligned}
\int_0^1 \int \int \|u_t(x \mid x_1)\|\, p_{t|1}(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1\, \mathrm{d}x\, \mathrm{d}t
&= \mathbb{E}_{t, X_1 \sim q, X_t \sim p_{t|1}(\cdot \mid X_1)}\, \|u_t(X_t \mid X_1)\| \\
&\overset{(9)}{=} \mathbb{E}_{t, (X_0, X_1) \sim \pi_{0,1}}\, \|\dot\psi_t(X_0 \mid X_1)\| < \infty
\end{aligned}
$$
This proves that the conditional field is conditionally integrable, so the marginal field $u_t$ generates the marginal path $p_t$.
Q.E.D.
Optimal Transport and Linear Conditional Flow
We have shown in the previous section that the conditional path and field can be designed via a conditional flow. One last question remains: how do we find a useful conditional flow?
Here, we introduce the flow derived from the dynamic Optimal Transport (OT) problem:

$$(p_t^*, u_t^*) = \underset{p_t, u_t}{\arg\min} \int_0^1 \int \|u_t(x)\|^2\, p_t(x)\, \mathrm{d}x\, \mathrm{d}t$$

subject to

$$p_0 = p, \qquad p_1 = q, \qquad \frac{\mathrm{d}}{\mathrm{d}t}\, p_t + \nabla \cdot (p_t u_t) = 0$$
Together with the flow ODE, solving the OT problem gives us a flow of the form

$$\psi_t^*(x) = t\, \phi(x) + (1 - t)\, x$$

called the OT displacement interpolant, where $\phi$ denotes the (static) OT map from $p$ to $q$.
This form also solves the Flow Matching problem:

$$X_t = \psi_t^*(X_0) \sim p_t^*, \qquad X_0 \sim p$$

We can see that the OT formulation promotes straight sample trajectories:

$$X_t = \psi_t^*(X_0) = t\, \phi(X_0) + (1 - t)\, X_0 = X_0 + t\,\big(\phi(X_0) - X_0\big)$$

so the sample at time $t$ is just $X_0$ displaced by the constant velocity $\phi(X_0) - X_0$ times $t$. Such straight trajectories are also much friendlier to ODE solvers.
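To make "friendly to ODE solvers" concrete, here is a minimal PyTorch sketch of sampling with a plain Euler integrator; `velocity_model` is a hypothetical trained network $u_t^\theta$ taking `(x, t)`:

```python
import torch

@torch.no_grad()
def euler_sample(velocity_model, x0, n_steps=100):
    """Integrate dX_t/dt = u_t^theta(X_t) from t = 0 to t = 1 with forward Euler.
    The straighter the learned field, the fewer steps are needed."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_model(x, t)
    return x

# usage sketch: x0 ~ p = N(0, I); velocity_model is a trained u_t^theta(x, t)
# x1_hat = euler_sample(velocity_model, torch.randn(64, 2), n_steps=8)
```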
To find a specific $\phi$, and hence a specific $\psi_t(x \mid x_1)$, let us bound the OT objective using the conditional velocity field in Equation (10):

$$
\begin{aligned}
\int_0^1 \int \|u_t(x)\|^2\, p_t(x)\, \mathrm{d}x\, \mathrm{d}t
&= \int_0^1 \mathbb{E}_{X_t \sim p_t}\, \|u_t(X_t)\|^2\, \mathrm{d}t \\
&\overset{(10)}{=} \int_0^1 \mathbb{E}_{X_t \sim p_t}\, \big\|\mathbb{E}\big[\dot\psi_t(X_0 \mid X_1) \mid X_t\big]\big\|^2\, \mathrm{d}t \\
&\le \int_0^1 \mathbb{E}_{X_t \sim p_t}\, \mathbb{E}\big[\|\dot\psi_t(X_0 \mid X_1)\|^2 \mid X_t\big]\, \mathrm{d}t \\
&= \mathbb{E}_{(X_0, X_1) \sim \pi_{0,1}} \int_0^1 \|\dot\psi_t(X_0 \mid X_1)\|^2\, \mathrm{d}t
\end{aligned}
$$

where the inequality is Jensen's inequality applied to $\|\cdot\|^2$. The bound is an expectation over $(X_0, X_1)$, so we can minimize it individually for each $x$ and $x_1$. Letting $\gamma_t = \psi_t(x \mid x_1)$, we focus on the inner integral:

$$\min_\gamma \int_0^1 \|\dot\gamma_t\|^2\, \mathrm{d}t \quad \text{s.t.} \quad \gamma_0 = x, \ \gamma_1 = x_1$$

This is a variational problem and can be solved with the Euler-Lagrange equations; we directly state the result. The conditional flow arising from the OT problem is

$$\psi_t(x \mid x_1) = t\, x_1 + (1 - t)\, x \tag{11}$$

which is the minimizer of the above variational problem.
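As a quick numerical illustration (my own sketch, not part of the derivation), we can discretize the kinetic energy $\int_0^1 \|\dot\gamma_t\|^2\, \mathrm{d}t$ and verify that the straight line beats a perturbed path with the same endpoints:

```python
import numpy as np

def kinetic_energy(path, dt):
    """Discretize int_0^1 ||gamma_dot_t||^2 dt with finite differences
    along a sampled path of shape (T, d)."""
    velocities = np.diff(path, axis=0) / dt
    return float(np.sum(velocities ** 2) * dt)

x, x1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
ts = np.linspace(0.0, 1.0, 201)[:, None]
dt = float(ts[1, 0] - ts[0, 0])

straight = ts * x1 + (1 - ts) * x                                      # gamma_t = t x1 + (1 - t) x
wiggly = straight + 0.3 * np.sin(np.pi * ts) * np.array([1.0, -1.0])   # same endpoints, curved

# the straight line attains the smaller (discretized) kinetic energy
assert kinetic_energy(straight, dt) < kinetic_energy(wiggly, dt)
```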
A couple of remarks. First, the linear conditional flow in (11) minimizes an upper bound on the kinetic energy among all conditional flows.
Second, note that Equation (11) now gives us

$$X_t = \psi_t(X_0 \mid X_1) = t\, X_1 + (1 - t)\, X_0$$

and rearranging yields

$$X_0 = \frac{X_t - t\, X_1}{1 - t}$$

If the target $q$ consists of a single data point, $q = \delta_{x_1}$, the above equation becomes

$$X_0 = \frac{X_t - t\, x_1}{1 - t}$$

which means that if you know $X_t$, then $X_0$ is deterministic, because there is no randomness in $x_1$ for a Dirac measure. We then find

$$\mathbb{E}\big[\dot\psi_t(X_0 \mid x_1) \mid X_t\big] = \dot\psi_t(X_0 \mid x_1) = x_1 - X_0$$

Since $X_t$ is given, $x_1$ is fixed by the Dirac target, and $X_0$ is deterministic given $X_t$, we can drop the expectation.
Theorem. If $q = \delta_{x_1}$, then the dynamic OT problem has an analytic solution given by the OT displacement interpolant in (11).
We can use this linear conditional flow to construct our conditional velocity field and the conditional probability path, and hence our Flow Matching loss. Refer back to 01-Overview of Flow Matching.
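Putting the pieces together, here is a minimal, self-contained PyTorch training sketch of the CFM loss with the linear conditional flow (11) and an independent coupling; the small MLP and the toy Gaussian target are placeholders, not the guide's reference implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy target q: a Gaussian blob around (3, 3); replace with real data samples
def sample_target(batch_size):
    return torch.randn(batch_size, 2) * 0.5 + 3.0

# u_t^theta(x): a small placeholder MLP taking (x, t) -> velocity in R^2
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(),
                      nn.Linear(64, 64), nn.SiLU(),
                      nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x1 = sample_target(256)                 # X_1 ~ q
    x0 = torch.randn_like(x1)               # X_0 ~ p = N(0, I), independent coupling
    t = torch.rand(x1.shape[0], 1)          # t ~ U[0, 1]
    xt = t * x1 + (1 - t) * x0              # X_t = psi_t(X_0 | X_1), Equation (11)
    target = x1 - x0                        # psi_dot_t(X_0 | X_1) for the linear flow
    pred = model(torch.cat([xt, t], dim=1))          # u_t^theta(X_t)
    loss = ((pred - target) ** 2).sum(dim=1).mean()  # squared-Euclidean Bregman divergence
    opt.zero_grad(); loss.backward(); opt.step()

# samples can then be drawn with the Euler integrator sketched earlier
```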