ProjFlow: Projection Sampling with Flow Matching for Zero-Shot Exact Spatial Motion Control

¹Waseda University, ²LY Corporation
CVPR 2026

We introduce a zero-shot flow-matching sampler that generates 3D human motion that exactly matches given spatial targets without additional training or optimization.

(A) Follow a joint path

Exact trajectory following for a selected joint.

(B) Lift 2D cues to 3D

Exact 2D-to-3D lifting under a known camera.

(C) Keep a fixed offset

Maintain relative position between joints (e.g., wrists).

(D) Seamless loop closure

Match start/end poses to generate a clean loop.

Abstract

Generating human motion with precise spatial control is a challenging problem. Existing approaches often require task-specific training or slow optimization, and enforcing hard constraints frequently disrupts motion naturalness. Building on the observation that many animation tasks can be formulated as a linear inverse problem, we introduce ProjFlow, a training-free sampler that achieves zero-shot, exact satisfaction of linear spatial constraints while preserving motion realism. Our key advance is a novel kinematics-aware metric that encodes skeletal topology. This metric allows the sampler to enforce hard constraints by distributing corrections coherently across the entire skeleton, avoiding the unnatural artifacts of naive projection. Furthermore, for sparse inputs, such as filling in long gaps between a few keyframes, we introduce a time-varying formulation using pseudo-observations that fade during sampling. Extensive experiments on two representative applications, motion inpainting and 2D-to-3D lifting, demonstrate that ProjFlow achieves exact constraint satisfaction and matches or improves realism over zero-shot baselines, while remaining competitive with training-based controllers.
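To make the idea of a metric-dependent projection concrete, here is a minimal NumPy sketch of the textbook weighted projection onto a linear constraint A x = y, which minimizes (x' - x)ᵀ M (x' - x). It is an illustration only, not the ProjFlow sampler: the function names are hypothetical, the skeleton is a toy four-joint chain with 1D coordinates, and the chain-Laplacian metric is an assumed stand-in for the paper's kinematics-aware metric, whose actual definition is not given on this page.

```python
import numpy as np

def chain_laplacian_metric(num_joints, eps=1e-3):
    """Hypothetical stand-in metric: Laplacian of a chain skeleton plus eps * I (SPD)."""
    L = np.zeros((num_joints, num_joints))
    for i in range(num_joints - 1):      # one bone between joint i and joint i + 1
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L + eps * np.eye(num_joints)

def project_onto_constraints(x, A, y, M=None):
    """Return the point closest to x in the M-norm that satisfies A @ x == y."""
    M = np.eye(x.shape[0]) if M is None else M
    Minv_At = np.linalg.solve(M, A.T)             # M^{-1} A^T, shape (n, m)
    residual = A @ x - y                          # current constraint violation
    lam = np.linalg.solve(A @ Minv_At, residual)  # Lagrange multipliers
    return x - Minv_At @ lam

# Toy example: four joints on a line, pin the last joint to position 2.0.
x = np.array([0.0, 1.0, 2.0, 3.0])
A = np.array([[0.0, 0.0, 0.0, 1.0]])  # selects the last joint
y = np.array([2.0])

x_naive = project_onto_constraints(x, A, y)  # identity metric
x_coherent = project_onto_constraints(x, A, y, chain_laplacian_metric(4))
```

Both results satisfy the constraint exactly. With the identity metric only the constrained joint moves, while with the assumed Laplacian metric the correction is shared with the joints connected to it.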

Results: Trajectory Control (Qualitative Comparisons)

Qualitative results for trajectory control. Each example fixes a target trajectory for one or more joints. We show three methods under the same prompt and control signal: OmniControl, MaskControl, and ProjFlow (ours).

Anchor

Pelvis tracking

Prompt: "a person runs forward in an S path"

OmniControl
MaskControl
ProjFlow (Ours)
Anchor

Head tracking

Prompt: "a person jumps and kicks a football in the air with their head"

OmniControl
MaskControl
ProjFlow (Ours)
Anchor

Left-hand tracking

Prompt: "the person is boxing with their left hand and throws multiple punches"

OmniControl
MaskControl
ProjFlow (Ours)
Anchor

Right-hand tracking

Prompt: "a person puts hands on the armrest"

OmniControl
MaskControl
ProjFlow (Ours)
Anchor

Left-foot tracking

Prompt: "a person stands with both feet on the ground, kicks once with his left foot"

OmniControl
MaskControl
ProjFlow (Ours)
Anchor

Right-foot tracking

Prompt: "a person side steps left and right"

OmniControl
MaskControl
ProjFlow (Ours)
Anchors

Multi-joint tracking (arc)

Prompt: "a person crosses their arms for chest fly"

OmniControl
MaskControl
ProjFlow (Ours)

Control signal: circular-arc trajectories for both hands and the pelvis.

Anchors

Multi-joint tracking (S-shaped)

Prompt: "a person snakes forward while the right hand waves up and down"

OmniControl
MaskControl
ProjFlow (Ours)

Control signal: S-shaped trajectories for the pelvis and right hand.

2D-to-3D reconstruction

Input: an initial 2D pose plus a 2D joint trajectory under a known orthographic camera. ProjFlow enforces the projection constraints exactly while producing a coherent 3D motion consistent with the text prompt.
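The reason a known orthographic camera fits the linear-constraint setting can be sketched in a few lines of NumPy: the 2D observation of each joint is a fixed 2x3 matrix applied to its 3D position. The snippet below is a hedged, single-frame illustration with hypothetical function names and a plain Euclidean (identity-metric) projection. It is not the ProjFlow sampler and ignores the text prompt and the temporal dimension.

```python
import numpy as np

def orthographic_constraint(R, num_joints):
    """Stack the 2x3 orthographic map P @ R for every joint into one matrix."""
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])   # drop the depth axis in camera space
    return np.kron(np.eye(num_joints), P @ R)   # shape (2J, 3J)

def lift_to_constraint(x3d, A, y2d):
    """Euclidean projection of a 3D pose (J, 3) onto {x : A @ x == y2d}."""
    residual = A @ x3d.ravel() - y2d.ravel()
    correction = A.T @ np.linalg.solve(A @ A.T, residual)
    return (x3d.ravel() - correction).reshape(x3d.shape)

# Assumed camera: rotation of 30 degrees about the vertical (y) axis.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])

rng = np.random.default_rng(0)
pose_guess = rng.normal(size=(5, 3))   # placeholder 3D pose with 5 joints
target_2d = rng.normal(size=(5, 2))    # placeholder 2D observations

A = orthographic_constraint(R, num_joints=5)
pose_lifted = lift_to_constraint(pose_guess, A, target_2d)
reprojected = (pose_lifted @ R.T)[:, :2]   # matches target_2d up to round-off
```

In this sketch the correction only moves joints within the image plane, so the depth of each joint along the camera axis is left unchanged from the initial guess.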

Sketch2Anim
ProjFlow (Ours)

Citation

@article{watanabe2026projflow,
  title     = {ProjFlow: Projection Sampling with Flow Matching for Zero-Shot Exact Spatial Motion Control},
  author    = {Akihisa Watanabe and Qing Yu and Edgar Simo-Serra and Kent Fujiwara},
  journal   = {arXiv preprint arXiv:2602.22742},
  year      = {2026},
  url       = {https://arxiv.org/abs/2602.22742}
}