03-Flow Matching and Conditional Flow Matchings

Last updated: June 11, 2025

This tutorial series is based on the Flow Matching Guide and Code

  • arXiv: 2412.06264
  • Thank you, META

Flow Matching Problem

Instead of learning the likelihood of the target as flow models do, we directly learn the ground-truth velocity field $u_t$; that is, we find a $u_t^\theta$ generating $p_t$, with $p_0=p$ and $p_1=q$. The Flow Matching loss directly minimizes the difference between the learned velocity field and the GT velocity field

$$\mathcal{L}(\theta)= \mathbb{E}_{X_t\sim p_t}D(u_t(X_t), u_t^\theta(X_t))$$

where $D$ is a dissimilarity measure between vectors, such as the squared $\ell_2$-norm $D(u,v)=\|u-v\|^2$.

Data Dependencies

Flow Matching actually learns a mapping from a source distribution to a target distribution, so the source and the target could be independent, or they could originate from a joint distribution

(X0,X1)π0,1(X0,X1)(X_0, X_1) \sim \pi_{0,1}(X_0,X_1)

known as a coupling. If they are independent, we write $\pi_{0,1}(X_0,X_1)=p(X_0)q(X_1)$. For example, diffusion models generate images from standard Gaussian noise, so source and target are independent. But if we want to generate high-resolution images from their low-resolution counterparts (not via conditioning), then the two are coupled. The sketch below contrasts the two cases.
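
A minimal sketch of the two sampling schemes. The helpers `sample_q` and `coarsen` are toy stand-ins invented for illustration (not from the paper's code): `sample_q` plays the role of the data distribution, and `coarsen` of a degradation such as downsampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_q(n, d):
    """Toy stand-in for the data distribution q."""
    return rng.standard_normal((n, d)) + 3.0

def coarsen(x):
    """Crude 'low-res' version: average adjacent pairs, then repeat them (d must be even)."""
    n, d = x.shape
    low = x.reshape(n, d // 2, 2).mean(-1)
    return np.repeat(low, 2, axis=-1)

def sample_independent(n, d):
    """Independent coupling: pi_{0,1}(x0, x1) = p(x0) q(x1)."""
    return rng.standard_normal((n, d)), sample_q(n, d)

def sample_coupled(n, d):
    """Dependent coupling: x0 is a degraded version of x1."""
    x1 = sample_q(n, d)
    x0 = coarsen(x1) + 0.1 * rng.standard_normal((n, d))
    return x0, x1
```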

Designing Probability Path and the GT Velocity Field

In the FM problem we want to learn the GT velocity field, but we don't have access to a tractable velocity field. Here we start by building a conditional probability path, to show that the problem simplifies through a conditional strategy.

Consider a probability path $p_{t|1}$ conditioned on a single target sample $X_1=x_1$; the marginal probability path is then

$$p_t(x) = \int p_{t|1}(x|x_1)q(x_1)dx_1 \tag{1}$$

Remember that we must satisfy $p_0=p$ and $p_1=q$, which means we start at the source distribution and end at the target distribution. These boundary conditions can be enforced by requiring the conditional probability paths to satisfy:

$$p_{0|1}(x|x_1)=\pi_{0|1}(x|x_1)\quad p_{1|1}(x|x_1)=\delta_{x_1}(x)$$

where $\pi_{0|1}(x_0|x_1)=\pi_{0,1}(x_0,x_1)/q(x_1)$ and $\delta_{x_1}$ is the delta measure centered at $x_1$.

proof. From Equation (1), let $t=0$:

$$\begin{aligned}p_0(x) &= \int p_{0|1}(x|x_1)q(x_1)dx_1 \\&=\int \pi_{0|1}(x|x_1)q(x_1)dx_1\\&=\pi_0(x)\end{aligned}$$

which is just our source distribution. Similarly, let $t=1$:

$$\begin{aligned}p_1(x) &= \int p_{1|1}(x|x_1)q(x_1)dx_1 \\&=\int \delta_{x_1}(x)q(x_1)dx_1\\&=q(x)\end{aligned}$$

Q.E.D.

If the source and the target are independent, then $p_{0|1}(x|x_1)=p(x)$. Since the delta measure $\delta_{x_1}(x)$ does not have a density, we should read it as

$$\int p_{t|1}(x|y)f(y)dy \rightarrow f(x)\quad t\rightarrow 1$$

for continuous functions $f$. An example of a conditional path satisfying the boundary conditions is

$$\mathcal{N}(\cdot\,|\,tx_1,(1-t)^2I)\rightarrow \delta_{x_1}(\cdot)\quad t\rightarrow 1$$

As you can see, at $t=0$ we have the standard Gaussian distribution, and at $t=1$ we obtain our target samples.
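
A quick numerical illustration of this path (a toy sketch, not from the paper's code): sampling $X_t = tx_1 + (1-t)X_0$ with $X_0\sim\mathcal{N}(0,I)$ gives exactly the law $\mathcal{N}(tx_1,(1-t)^2I)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_conditional_path(x1, t, n=10_000):
    """Draw X_t ~ N(t * x1, (1 - t)^2 I) by reparameterization."""
    eps = rng.standard_normal((n, x1.shape[-1]))  # X_0 ~ N(0, I)
    return t * x1 + (1.0 - t) * eps

x1 = np.array([2.0, -1.0])
for t in (0.0, 0.5, 0.99):
    xt = sample_conditional_path(x1, t)
    print(t, xt.mean(0).round(2), xt.std(0).round(2))
# t = 0.0  -> mean ~ [0, 0],        std ~ 1     (the source N(0, I))
# t = 0.99 -> mean ~ [1.98, -0.99], std ~ 0.01  (collapsing onto x1)
```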

We now find our GT velocity field according to the conditional probability path.

The Continuity Equation tells us that if the conditional velocity field generates the conditional probability path, then

$$\frac{d}{dt}p_{t|1}(x|x_1)=-\nabla\cdot (p_{t|1}(x|x_1)u_t(x|x_1))$$

then from Equation (1)

$$\begin{aligned} \frac{d}{dt}p_t(x) &= \int q(x_1)\frac{d}{dt}p_{t|1}(x|x_1)dx_1\\ &=-\nabla\cdot \int q(x_1)p_{t|1}(x|x_1)u_t(x|x_1) dx_1 \end{aligned}$$

Now, if the marginal velocity field is to generate the marginal probability path, the equation above must itself be a Continuity Equation. Comparing it with the Continuity Equation, we can say

$$p_t(x)u_t(x) = \int q(x_1)p_{t|1}(x|x_1)u_t(x|x_1) dx_1$$

Dividing both sides by $p_t(x)$ yields

$$u_t(x) = \int u_t(x|x_1)\frac{p_{t|1}(x|x_1)q(x_1)}{p_t(x)}dx_1 \tag{2}$$

Recall Bayes' rule

$$p(x|y) = \frac{p(x,y)}{p(y)} =\frac{p(y|x)p(x)}{p(y)}$$

We can then transform Equation (2) into

$$u_t(x) = \int u_t(x|x_1)p_{1|t}(x_1|x)dx_1 \tag{3}$$

So our GT velocity field is a weighted average of the conditional velocities $u_t(x|x_1)$, with weights $p_{1|t}(x_1|x)$ representing the posterior probability of the target sample $x_1$ given the current sample $x$. We can also write Equation (3) in expectation form

$$u_t(x) = \mathbb{E}[u_t(X_t|X_1)|X_t=x]$$
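
A 1-D toy sketch of Equation (3), with $q$ uniform over three points and the Gaussian path above. It assumes the conditional velocity $u_t(x|x_1)=(x_1-x)/(1-t)$ for this path, which is derived later via Equation (8); everything here is illustrative, not the paper's code.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

x1s = np.array([-2.0, 0.5, 3.0])   # toy target: q uniform over three points

def marginal_velocity(x, t):
    """u_t(x) = sum_k u_t(x | x1_k) p_{1|t}(x1_k | x), i.e. Equation (3)."""
    cond_vel = (x1s - x) / (1.0 - t)        # u_t(x | x1_k) for this path
    w = gaussian_pdf(x, t * x1s, 1.0 - t)   # p_{t|1}(x | x1_k) q(x1_k), q uniform
    w = w / w.sum()                         # normalize -> posterior p_{1|t}(x1_k | x)
    return (w * cond_vel).sum()

print(marginal_velocity(x=0.0, t=0.5))      # posterior-weighted average velocity
```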

General Conditioning

So far we have been conditioning on $X_1$, a sample from the target distribution; however, we can actually condition on any arbitrary $Z\in \mathbb{R}^m$ with PDF $p_Z$, and the marginal probability path becomes

$$p_t(x) = \int p_{t|Z}(x|z)p_Z(z)dz$$

which is generated by the following marginal velocity field

$$u_t(x) = \int u_t(x|z)p_{Z|t}(z|x)dz = \mathbb{E}[u_t(X_t|Z)|X_t=x]\tag{4}$$

We have an important theorem called the marginalization trick, but before we introduce it, we have to make some assumptions.

Assumption.

  1. $p_{t|Z}(x|z)$ is $C^1([0,1)\times \mathbb{R}^d)$ and $u_t(x|z)$ is $C^1([0,1)\times \mathbb{R}^d, \mathbb{R}^d)$ as a function of $t,x$
  2. $p_Z$ has bounded support, which means $p_Z(z)=0$ outside some bounded set in $\mathbb{R}^m$
  3. $p_t(x)>0$ for all $x\in \mathbb{R}^d$ and $t\in [0,1)$

We have the following theorem

Theorem. Under the above assumptions, if $u_t(x|z)$ is conditionally integrable and generates the conditional probability path $p_t(\cdot|z)$, then the marginal velocity field $u_t$ generates the marginal probability path $p_t$, for all $t\in[0,1)$.

where conditionally integrable means

$$\int_0^1 \int \int \|u_t(x|z)\|p_{t|Z}(x|z)p_Z(z)\,dz\,dx\,dt <\infty$$

proof. To prove that the marginal velocity field generates the marginal probability path, we can utilize the Continuity Equation

$$\begin{aligned}\frac{d}{dt}p_t(x) &= \frac{d}{dt}\int p_{t|Z}(x|z)p_Z(z)dz \\&=\int \frac{d}{dt}p_{t|Z}(x|z)p_Z(z)dz \\&=-\int \nabla\cdot \big(u_t(x|z)p_{t|Z}(x|z)\big)p_Z(z)dz \\&=-\nabla\cdot \int u_t(x|z)p_{t|Z}(x|z)p_Z(z)dz \\&=-\nabla\cdot \int u_t(x|z)p_t(x)\frac{p_{t|Z}(x|z)p_Z(z)}{p_t(x)}dz\\&=-\nabla\cdot \int u_t(x|z)p_{Z|t}(z|x)p_t(x)dz\\&=-\nabla\cdot (u_t(x)p_t(x))\end{aligned}$$

We can further prove that $u_t$ is integrable:

$$\begin{aligned}\int_0^1\int \|u_t(x)\|p_t(x)dxdt&=\int_0^1\int \left\|\int u_t(x|z)p_{Z|t}(z|x)dz \right\|p_t(x)dxdt\\&\leq \int_0^1\int \int \|u_t(x|z)\|p_{Z|t}(z|x)p_t(x)dzdxdt\\&=\int_0^1\int \int \|u_t(x|z)\|\frac{p_{t|Z}(x|z)p_Z(z)}{p_t(x)}p_t(x)dzdxdt\\&=\int_0^1\int \int \|u_t(x|z)\|p_{t|Z}(x|z)p_Z(z)dzdxdt\\&<\infty\end{aligned}$$

Q.E.D.

Note that the proof may look circular: we obtained $u_t$ from the Continuity Equation, and here we prove that it satisfies the Continuity Equation. More precisely, the marginal velocity field $u_t$ is constructed as in Equation (3), and we then verify that this construction satisfies the Continuity Equation.

So far we have designed the probability path as a conditional probability path and the velocity field as a conditional velocity field, and we have established the relation between the conditional path/field and the marginal path/field.

However, we still don't have a tractable GT velocity field to learn, so let's continue by taking a closer look at the loss function.

Flow Matching Loss

We want a tractable loss function, but the GT velocity field $u_t$ from Equation (3) is still intractable because it marginalizes over the entire training set. We now introduce a family of loss functions known as Bregman divergences, and we will show that they provide unbiased gradients for $u_t^\theta(x)$ to learn $u_t(x)$ by learning the conditional counterpart $u_t^\theta(x|z)$; hence we no longer need the marginal field $u_t(x)$.

The Bregman divergence of a strictly convex function $\Phi$ is defined as follows:

$$D(u,v):=\Phi(u) - [\Phi(v)+\langle u-v, \nabla\Phi(v)\rangle]$$

In fact, the squared Euclidean distance $D(u,v)=\|u-v\|^2$ is a Bregman divergence: let $\Phi(u)=\|u\|^2$, then

$$\begin{aligned} D(u,v) &=\|u\|^2-[\|v\|^2+\langle u-v, 2v\rangle]\\ &=\|u\|^2 - \|v\|^2 - 2\langle u,v\rangle+2\|v\|^2\\ &=u^Tu-2u^Tv+v^Tv\\ &=(u-v)^T(u-v)\\ &=\|u-v\|^2 \end{aligned}$$

A key property of Bregman divergences is that their gradient with respect to the second argument is affine invariant:

$$\nabla_v D(au_1+bu_2,v) = a\nabla_v D(u_1,v)+b\nabla_v D(u_2,v)\quad a+b=1$$

This property allows us to swap expectations and gradients as follows

$$\nabla_vD(\mathbb{E}[Y],v)=\mathbb{E}[\nabla_v D(Y,v)]\tag{5}$$

similar to the affine invariance above, where we pull the linear operation (here, the expectation) out of the gradient.
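
A quick numerical check of Equation (5) for the squared Euclidean case (a toy sketch): here $\nabla_v D(y,v)=2(v-y)$ is linear in $y$ and therefore commutes with the expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((10_000, 3))   # Monte Carlo samples of a random vector Y
v = np.array([0.3, -1.2, 0.7])

# For D(u, v) = ||u - v||^2, grad_v D(y, v) = 2 * (v - y), which is linear in y.
lhs = 2 * (v - Y.mean(axis=0))         # grad_v D(E[Y], v)
rhs = (2 * (v - Y)).mean(axis=0)       # E[grad_v D(Y, v)]
print(np.allclose(lhs, rhs))           # True, exactly, by linearity
```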

We now restate our two objectives. The first is to learn the marginal velocity field

$$\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t,X_t\sim p_t}D(u_t(X_t), u_t^\theta (X_t))$$

the second is to learn the conditional velocity field

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t,Z,X_t\sim p_{t|Z}(\cdot|Z)}D(u_t(X_t|Z),u_t^\theta (X_t))$$

We will show the following theorem

Theorem

$$\nabla_\theta \mathcal{L}_{FM}(\theta) = \nabla_\theta \mathcal{L}_{CFM}(\theta)$$

So the minimizer of the conditional Flow Matching loss is the marginal velocity field $u_t$, because the two losses have the same gradient.

proof.

$$\begin{aligned}\nabla_\theta\mathcal{L}_{FM}(\theta) &= \nabla_\theta \mathbb{E}_{t,X_t\sim p_t}D(u_t(X_t), u_t^\theta(X_t))\\&=\mathbb{E}_{t,X_t\sim p_t}\nabla_\theta D(u_t(X_t), u_t^\theta(X_t))\\&=\mathbb{E}_{t,X_t\sim p_t}\nabla_{\color{green}{v}} D(u_t(X_t), u_t^\theta(X_t))\nabla_{\color{green}{\theta}} u_t^\theta (X_t)\\&\overset{(4)}{=}\mathbb{E}_{t,X_t\sim p_t}\nabla_{v} D(\mathbb{E}_{Z\sim p_{Z|t}(\cdot |X_t)}[u_t(X_t|Z)], u_t^\theta(X_t))\nabla_\theta u_t^\theta (X_t)\\&=\mathbb{E}_{t,X_t\sim p_t}\mathbb{E}_{Z\sim p_{Z|t}(\cdot |X_t)}\nabla_{v} D(u_t(X_t|Z), u_t^\theta(X_t))\nabla_\theta u_t^\theta (X_t)\\&=\mathbb{E}_{t,X_t\sim p_t}\mathbb{E}_{Z\sim p_{Z|t}(\cdot |X_t)}\nabla_{\color{green}{\theta}} D(u_t(X_t|Z), u_t^\theta(X_t))\\&=\nabla_{\theta}{\color{green}{\mathbb{E}_{t,Z\sim p_Z,X_t\sim p_{t|Z}(\cdot|Z)}}} D(u_t(X_t|Z), u_t^\theta(X_t))\\&=\nabla_\theta \mathcal{L}_{CFM}(\theta)\end{aligned}$$

Q.E.D.

The above theorem actually has a more general form

Theorem. Let $X\in S_X$, $Y\in S_Y$ be RVs over state spaces $S_X$ and $S_Y$, and let $g^\theta(x):\mathbb{R}^p\times S_X\rightarrow \mathbb{R}^n$ be a function, where $\theta\in \mathbb{R}^p$ are the learnable parameters.

Let $D_x(u,v)$, $x\in S_X$, be a Bregman divergence over a convex set $\Omega \subset \mathbb{R}^n$ that contains the image of $g^\theta(x)$; then

$$\nabla_\theta \mathbb{E}_{X,Y}D_X(Y,g^\theta (X)) = \nabla_\theta \mathbb{E}_XD_X(\mathbb{E}[Y|X], g^\theta (X))$$

and the global minimizer $g^\theta(x)$ of the above satisfies

$$g^\theta(x) = \mathbb{E}[Y|X=x]$$

I don't fully understand the subscript $x$ in $D_x$ here; presumably it means we can choose a divergence that depends on $x$ and varies with $x\sim X$ (for example, a time-dependent weighting when $x$ contains $t$), but I'm not certain.

proof.

$$\begin{aligned}\nabla_\theta \mathbb{E}_{X,Y}D_X(Y,g^\theta (X)) &= \mathbb{E}_{X,Y}\nabla_{\color{green}{v}}D_X(Y,g^\theta (X))\nabla_{\color{green}{\theta}}g^\theta(X)\\&=\mathbb{E}_{X}[\mathbb{E}[\nabla_{v}D_X(Y,g^\theta(X))\nabla_{\theta}g^\theta(X)|X]]\\&\overset{(5)}{=}\mathbb{E}_{X}[\nabla_{v}D_X(\mathbb{E}[Y|X],g^\theta(X))\nabla_{\theta}g^\theta(X)]\\&=\mathbb{E}_{X}[\nabla_{\color{green}{\theta}}D_X(\mathbb{E}[Y|X],g^\theta(X))]\\&=\nabla_{\theta}\mathbb{E}_{X}D_X(\mathbb{E}[Y|X],g^\theta(X))\end{aligned}$$

Therefore, we can choose $g^\theta(x)=\mathbb{E}[Y|X=x]$ to obtain the global minimum.

Q.E.D.

Solving Conditional Generation

We have now designed the conditional probability path, the conditional velocity field, the marginal probability path, and the marginal velocity field, and we proved that we can learn the conditional velocity field instead of the intractable marginal one. We now consider how to realize these objects.

Here we design conditional paths and velocity fields via conditional flows: we define a flow model $X_{t|1}$ satisfying the boundary conditions. Once the flow is defined, we can obtain the velocity field via the flow ODE.

If we define

$$X_{t|1} = \psi_t(X_0|x_1)\quad X_0\sim \pi_{0|1}(\cdot|x_1)$$

where $\psi:[0,1)\times \mathbb{R}^d\times \mathbb{R}^d\rightarrow \mathbb{R}^d$ is a conditional flow satisfying

$$\psi_t(x|x_1) = \begin{cases} x & t=0\\ x_1 & t=1 \end{cases}\tag{6}$$

We can then obtain the conditional probability path via the push-forward formula

$$p_{t|1}(x|x_1):=[\psi_t(\cdot|x_1)_\sharp \pi_{0|1}(\cdot|x_1)](x)\tag{7}$$

which means we use the conditional flow to push forward the source distribution. You can check that this probability path satisfies the boundary conditions: at $t=0$ the flow is the identity map, so we get the source distribution directly; at $t=1$ the flow is a constant map, so it yields exactly the target sample.

Recall the flow ODE

$$\frac{d}{dt}\psi_t(x) = u_t(\psi_t(x))$$

Evaluate it at $x=\psi_t^{-1}(x')$:

$$\frac{d}{dt}\psi_t(x)\Big|_{x=\psi_t^{-1}(x')} = u_t(x')$$

which gives us a way to extract the velocity field. Denoting $\dot \psi_t = \frac{d}{dt}\psi_t$, we can write the conditional version as

$$u_t(x|x_1)=\dot \psi_t(\psi_t^{-1}(x|x_1)|x_1) \tag{8}$$

We have now shown that the conditional flow gives us both the conditional probability path and the conditional velocity field, so we only need to build a conditional flow satisfying Equation (6).

Recalibrate our Flow Matching Loss

Recall that we defined our loss as learning the conditional velocity field

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t,X_1,X_t\sim p_{t|1}(\cdot|X_1)}D(u_t(X_t|X_1), u^\theta_t(X_t))$$

From Equation (8) we have

$$u_t(X_t|X_1)=\dot\psi_t(\psi_t^{-1}(X_t|X_1)|X_1)$$

By the definition of the flow we have

$$X_t = \psi_t(X_0|X_1)$$

then

$$\begin{aligned} u_t(X_t|X_1)&=\dot\psi_t(\psi_t^{-1}(X_t|X_1)|X_1)\\ &=\dot\psi_t(\psi_t^{-1}(\psi_t(X_0|X_1)|X_1)|X_1)\\ &=\dot\psi_t(X_0|X_1) \end{aligned}\tag{9}$$

Hence our loss function becomes

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t,X_1,X_t\sim p_{t|1}(\cdot|X_1)}D(\dot\psi_t(X_0|X_1), u^\theta_t(X_t))$$

with the minimizer

$$u_t^\theta(x) = \mathbb{E}[\dot\psi_t(X_0|X_1)|X_t=x]\tag{10}$$
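
This loss is straightforward to implement for any conditional flow. Below is a minimal PyTorch sketch, assuming user-supplied callables `psi` and `psi_dot` for $\psi_t(x_0|x_1)$ and $\dot\psi_t(x_0|x_1)$ (placeholder names, not the paper's API), with $D$ the squared Euclidean distance.

```python
import torch

def cfm_loss(model, x0, x1, psi, psi_dot):
    """One Monte Carlo estimate of L_CFM with D(u, v) = ||u - v||^2.

    psi(t, x0, x1)     -> X_t = psi_t(x0 | x1)            (conditional flow)
    psi_dot(t, x0, x1) -> d/dt psi_t(x0 | x1), the target (Equation (9))
    """
    t = torch.rand(x0.shape[0], 1)   # t ~ U[0, 1]
    xt = psi(t, x0, x1)              # a sample on the conditional path
    target = psi_dot(t, x0, x1)      # GT conditional velocity at X_t
    pred = model(xt, t)              # u_t^theta(X_t)
    return ((pred - target) ** 2).sum(dim=-1).mean()
```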

Recalibrate Marginalization Trick

We must restate the marginalization trick for conditional flows. Assume $u_t(x|x_1)$ is conditionally integrable, which means

$$\mathbb{E}_{t,(X_0,X_1)\sim \pi_{0,1}}\|\dot \psi_t(X_0|X_1)\|<\infty$$

we give the following corollary to the marginalization trick

Corollary. Assume that $q$ has bounded support, $\pi_{0|1}(\cdot|x_1)$ is $C^1(\mathbb{R}^d)$ and strictly positive for some $x_1$ with $q(x_1)>0$, and $\psi_t(x|x_1)$ is a conditional flow satisfying the boundary conditions and the integrability condition above.

Then $p_{t|1}(x|x_1)$ and $u_t(x|x_1)$ defined by (7) and (8) define a marginal velocity field $u_t(x)$ generating the marginal probability path $p_t(x)$ interpolating $p$ and $q$.

We know from the earlier marginalization trick that if the conditional field is conditionally integrable and generates the conditional probability path, then the marginal field generates the marginal path. We already know that the conditional field generates the conditional path (they are linked by the flow), so we only need to prove that the conditional field is integrable.

proof.

$$\begin{aligned}\int_0^1\int\int \|u_t(x|x_1)\|p_{t|1}(x|x_1)q(x_1)dx_1dxdt&=\mathbb{E}_{t,X_1\sim q,X_t\sim p_{t|1}(\cdot|X_1)}\|u_t(X_t|X_1)\| \\&\overset{(9)}{=}\mathbb{E}_{t,(X_0,X_1)\sim \pi_{0,1}} \|\dot \psi_t(X_0|X_1)\| \\&<\infty\end{aligned}$$

This proves that the field is conditionally integrable; hence the marginal field $u_t$ generates the marginal path $p_t$.

Q.E.D.

Optimal Transport and Linear Conditional Flow

We have shown in the previous section that the conditional path and field can be designed via a conditional flow; one last question remains: how do we find a useful conditional flow?

Here, we introduce the flow derived from the dynamic Optimal Transport (OT) problem:

$$(p_t^*, u_t^*) = \argmin_{p_t,u_t}\int_0^1\int \|u_t(x)\|^2p_t(x)dxdt$$

s.t.

$$p_0=p,\quad p_1=q,\qquad \frac{d}{dt}p_t+\nabla\cdot(p_tu_t)=0$$

Together with the flow ODE, solving the OT problem gives a flow of the form

$$\psi_t^*(x) = t\phi(x) + (1-t)x$$

called the OT displacement interpolant.

The above form also solves the Flow Matching problem

$$X_t = \psi_t^*(X_0)\sim p_t^*\quad X_0\sim p$$

we can see that the OT formulation promotes straight sample trajectories

$$X_t = \psi_t^*(X_0) = t\phi(X_0)+(1-t)X_0 = X_0+t(\phi(X_0)-X_0)$$

so the sample at time $t$ is just the initial sample displaced by the constant velocity $\phi(X_0)-X_0$ times $t$. This form is also much friendlier to ODE solvers, as the short check below illustrates.
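
A tiny sanity check of why straight trajectories suit ODE solvers: for constant-velocity motion, a single Euler step over the whole interval is already exact (the `phi_x0` value below is a made-up stand-in for an OT map output).

```python
import numpy as np

# Straight trajectory X_t = X_0 + t * v with constant velocity v = phi(X_0) - X_0.
x0 = np.array([1.0, -2.0])
phi_x0 = np.array([0.5, 3.0])          # hypothetical OT map output phi(x0)
v = phi_x0 - x0

one_step = x0 + 1.0 * v                # a single Euler step of size 1 over [0, 1]
print(np.allclose(one_step, phi_x0))   # True: no discretization error at all
```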

To find a specific $\phi$, and hence a specific $\psi_t(x|x_1)$, let's solve the OT problem with the conditional velocity field in Equation (10):

$$\begin{aligned} \int_0^1\int \|u_t(x)\|^2p_t(x)dxdt&=\int_0^1 \mathbb{E}_{X_t\sim p_t}\|u_t(X_t)\|^2dt \\ &\overset{(10)}{=}\int_0^1 \mathbb{E}_{X_t\sim p_t}\|\mathbb{E}[\dot \psi_t(X_0|X_1)|X_t]\|^2dt\\ &\leq \int_0^1 \mathbb{E}_{X_t\sim p_t}\mathbb{E}[\|\dot \psi_t(X_0|X_1)\|^2|X_t]dt\\ &=\mathbb{E}_{(X_0,X_1)\sim \pi_{0,1}}\int_0^1 \|\dot\psi_t(X_0|X_1)\|^2dt \end{aligned}$$

The inequality is Jensen's inequality applied to $\|\cdot\|^2$. Instead of minimizing the expectation directly, we can minimize the upper bound individually for each pair $(X_0, X_1)$: let $\gamma_t=\psi_t(x|x_1)$ and focus on the inner integral

$$\min_{\gamma}\int_0^1\|\dot \gamma_t\|^2dt\quad \text{s.t.} \quad \gamma_0=x,\ \gamma_1=x_1$$

This is a variational problem and can be solved using the Euler–Lagrange equations; we directly write the final answer. The conditional flow that comes from the OT problem is

$$\psi_t(x|x_1) = tx_1+(1-t)x\tag{11}$$

this is the minimizer of the above variational problem.
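
It is worth plugging (11) into Equation (8) to get the conditional velocity explicitly (a quick derivation following the definitions above):

$$\psi_t^{-1}(x|x_1)=\frac{x-tx_1}{1-t},\qquad \dot\psi_t(x|x_1)=x_1-x$$

$$u_t(x|x_1)=\dot\psi_t\big(\psi_t^{-1}(x|x_1)\,\big|\,x_1\big)=x_1-\frac{x-tx_1}{1-t}=\frac{x_1-x}{1-t}$$

In particular, $\dot\psi_t(X_0|X_1)=X_1-X_0$, so the regression target in the loss is simply $X_1-X_0$.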

We can note some facts. First, the linear conditional flow in (11) minimizes an upper bound on the kinetic energy among all conditional flows.

Second, note that Equation (11) now gives us

$$X_t = \psi_t(X_0|X_1) = tX_1+(1-t)X_0$$

and rearranging yields

$$X_0 = \frac{X_t-tX_1}{1-t}$$

If the target $q$ consists of a single data point, $q=\delta_{x_1}$, then the above equation becomes

$$X_0 = \frac{X_t-tx_1}{1-t}$$

which means that if you know $X_t$, then $X_0$ is deterministic, because there is no randomness in $x_1$ for a Dirac measure. We then find that

$$\mathbb{E}[\dot\psi_t(X_0|x_1)|X_t] = \dot\psi_t(X_0|x_1) = x_1-X_0$$

since $x_1$ is fixed and $X_0$ is deterministic given $X_t$, we can drop the expectation. In this case the Jensen inequality above is tight, so the linear flow does not merely bound the kinetic energy but actually minimizes it.

Theorem. If $q=\delta_{x_1}$, then the dynamic OT problem has an analytic solution given by the OT displacement interpolant in (11).

We can use this linear conditional flow to construct our conditional velocity field and conditional probability path, and from them our Flow Matching loss; refer to 01-Overview of Flow Matching. A complete end-to-end sketch follows.
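
Putting everything together, here is a minimal PyTorch sketch of linear-flow CFM training plus Euler sampling on a 2-D toy problem. `VelocityNet` and `sample_target` are hypothetical stand-ins invented for this sketch, not the paper's code.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """A small MLP u_t^theta(x): input (x, t), output a velocity in R^dim."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def sample_target(n):
    """Toy target q: a mixture of two Gaussians (stand-in for real data)."""
    centers = torch.tensor([[2.0, 2.0], [-2.0, -2.0]])
    return centers[torch.randint(0, 2, (n,))] + 0.3 * torch.randn(n, 2)

model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x1 = sample_target(256)
    x0 = torch.randn_like(x1)                # independent source N(0, I)
    t = torch.rand(x1.shape[0], 1)
    xt = t * x1 + (1 - t) * x0               # X_t = psi_t(X_0 | X_1), Equation (11)
    loss = ((model(xt, t) - (x1 - x0)) ** 2).mean()  # target: dot psi = X_1 - X_0
    opt.zero_grad()
    loss.backward()
    opt.step()

@torch.no_grad()
def sample(n=256, steps=100, dim=2):
    """Integrate dx/dt = u_t^theta(x) from t=0 to t=1 with Euler steps."""
    x = torch.randn(n, dim)
    for i in range(steps):
        t = torch.full((n, 1), i / steps)
        x = x + model(x, t) / steps
    return x
```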

