This tutorial series is based on the paper Flow Matching Guide and Code
arXiv: 2412.06264
Thank you, Meta
1 The Definition of the Velocity ODE
Let's consider a particle moving in space. We use a function $\psi_t(x)$ to describe its position at time $t \in [0,1]$, starting from $x$. That is, given the initial position $x$ and the time $t$, the function $\psi : [0,1] \times \mathbb{R}^d \to \mathbb{R}^d$ tells us the current position.
Now we define another function $u_t(x)$, a time-dependent vector field $u : [0,1] \times \mathbb{R}^d \to \mathbb{R}^d$, which describes the velocity of the particle: given the current position $x$ and the time $t$, it tells us the particle's velocity.
We can now define an ODE describing the movement of the particle:
$$\frac{d}{dt}\psi_t(x) = u_t(\psi_t(x)) \tag{1}$$
Basically, this equation says that the derivative of the position function $\psi_t$ equals the velocity of the particle given by the vector field $u_t$. We define $\psi_0(x) = x$, since at $t = 0$ the particle is at its initial position.
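To make equation (1) concrete, here is a minimal sketch that integrates the ODE with the forward Euler method. The field $u_t(x) = -x$ is an illustrative choice of ours (not from the guide); its flow has the closed form $\psi_t(x) = x e^{-t}$, so we can check the numerical solution against it.

```python
import numpy as np

def u(t, x):
    # Illustrative velocity field (our choice, not from the guide): u_t(x) = -x.
    return -x

def euler_flow(x0, u, n_steps=1000):
    """Approximate psi_1(x0) by Euler-integrating d/dt psi_t(x) = u_t(psi_t(x))."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * u(i * dt, x)   # Euler step: x <- x + dt * u_t(x)
    return x

x0 = np.array([1.0, -2.0])
print(euler_flow(x0, u))            # ~ [0.3677, -0.7354]
print(x0 * np.exp(-1.0))            # exact flow psi_1(x) = x * e^{-1}
```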
Now, if we know the velocity field $u_t$, solving the ODE generates a path flowing from $\psi_0$ to $\psi_1$. We say the velocity field $u_t$ generates the probability path $p_t$ if its flow $\psi_t$ satisfies
$$X_t := \psi_t(X_0) \sim p_t \quad \text{for} \quad X_0 \sim p_0$$
Therefore, if we have the velocity field, solving the ODE gives us $X_1 = \psi_1(X_0)$. Let $X_1$ come from the target distribution $q$ and $X_0$ from the source distribution $p$, so that $p_0 = p$ and $p_1 = q$. Flow Matching tries to learn a field $u_t^\theta$ such that its flow $\psi_t$ generates such a probability path.
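Once such a field $u_t^\theta$ is learned, sampling amounts to solving the ODE from $t = 0$ to $t = 1$. A minimal PyTorch sketch, assuming a network `velocity_net(x, t)` that takes a batch of positions and times (the name and interface are placeholders, not the guide's API):

```python
import torch

@torch.no_grad()
def sample(velocity_net, n_samples, dim, n_steps=100):
    """Push X_0 ~ N(0, I) through the learned flow to get X_1 = psi_1(X_0)."""
    x = torch.randn(n_samples, dim)        # X_0 ~ p = N(0, I)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((n_samples, 1), i * dt)
        x = x + dt * velocity_net(x, t)    # Euler step along u_t^theta
    return x                               # approximately a sample from q
```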
2 Flow Matching Objective
Let's start with a standard example: let the source distribution be a standard Gaussian, $p := p_0 = \mathcal{N}(x \mid 0, I)$, and construct the probability path $p_t$ as the aggregation of conditional probability paths $p_{t|1}(x \mid x_1)$, where $x_1$ is drawn from the training dataset:
$$p_t(x) = \int p_{t|1}(x \mid x_1)\, q(x_1)\, dx_1$$
This is just the marginal probability calculation. The path built this way is also known as the conditional optimal-transport, or linear, path: using it, $X_t \sim p_t$ is the linear combination $X_t = t X_1 + (1-t) X_0$ of $X_0 \sim p$ and $X_1 \sim q$. Note that $X_0$ and $X_1$ here are random variables drawn from the source and target distributions.
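Sampling $X_t$ along this linear path is a one-liner: interpolate between a Gaussian draw and a data point. A small sketch (the function name is ours):

```python
import torch

def sample_xt(x1, t):
    """Draw X_t on the linear path: X_t = t * X_1 + (1 - t) * X_0, X_0 ~ N(0, I)."""
    x0 = torch.randn_like(x1)              # X_0 ~ p
    xt = t * x1 + (1.0 - t) * x0           # X_t ~ p_{t|1}(. | x1)
    return xt, x0                          # x0 is kept: it defines the target velocity later
```

Here `t` is expected to broadcast against `x1`, e.g. a tensor of shape `(batch, 1)`.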
The Flow Matching objective regresses the learnable field $u_t^\theta$ onto the ground-truth field $u_t$ that generates $p_t$:
$$\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t,\, X_t \sim p_t} \left[ \left\lVert u_t^\theta(X_t) - u_t(X_t) \right\rVert^2 \right]$$
This loss is intractable, because the marginal velocity $u_t$ is unknown, but the objective simplifies drastically if we condition the loss on a single target example $X_1 = x_1$ picked randomly from the training set.
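Conditioning on $x_1$ gives the Conditional Flow Matching loss. For the linear path, $X_t = t X_1 + (1-t) X_0$, so the conditional velocity is simply $\frac{d}{dt} X_t = X_1 - X_0$, and the regression target becomes $x_1 - x_0$. A training-step sketch under the same assumptions as above (`velocity_net` is a placeholder network):

```python
import torch

def cfm_loss(velocity_net, x1):
    """One Conditional Flow Matching step for the linear (conditional-OT) path."""
    t = torch.rand(x1.shape[0], 1)         # t ~ U[0, 1]
    x0 = torch.randn_like(x1)              # X_0 ~ N(0, I)
    xt = t * x1 + (1.0 - t) * x0           # X_t on the linear path
    target = x1 - x0                       # conditional velocity u_t(X_t | x1)
    return ((velocity_net(xt, t) - target) ** 2).mean()
```

In a training loop, `x1` is a minibatch from the dataset and the loss is backpropagated into `velocity_net` as usual.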