First-person capture that records what the demonstrator sees and how they move — RGB, depth, and motion — on the same clock as touch.
What it captures
First-person RGB aligned to the demonstrator's view, with stable exposure for long runs.
Per-frame depth (RGB-D) for geometry, reach, and clutter around the hands.
IMU dynamics: acceleration and angular rate, which give movement cues for inertia, rhythm, and effort.
[SPEC: camera resolution · frame rate · depth range — drop in real numbers]
One clock, every modality
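The rig shares a single timebase with the glove and skin, so first-person video, depth, motion, and touch line up frame by frame. That alignment is what makes episodes trainable: each frame can be paired with the depth, motion, and touch samples recorded at the same instant.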
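As a rough illustration of what a shared timebase makes possible, the sketch below pairs each video frame with the nearest sample from another stream (IMU or touch, say) by timestamp. It is a minimal sketch and not the rig's actual synchronization mechanism. The function names, the use of seconds as the timestamp unit, and the `max_gap_s` tolerance are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Return the index of the entry in sorted `timestamps` closest to `t`.

    Ties go to the earlier sample. `timestamps` must be non-empty.
    """
    if not timestamps:
        raise ValueError("timestamps must be non-empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Compare the neighbours on either side of the insertion point.
    before, after = timestamps[i - 1], timestamps[i]
    return i - 1 if (t - before) <= (after - t) else i


def align_to_frames(
    frame_ts: Sequence[float],
    stream_ts: Sequence[float],
    max_gap_s: float,
) -> List[Optional[int]]:
    """For each frame timestamp, find the nearest sample in another stream.

    Both sequences are assumed to be sorted and expressed on the same clock
    (hypothetically, in seconds). A frame maps to None when the nearest
    sample is farther away than `max_gap_s`, so gaps stay visible instead
    of being silently filled.
    """
    if not stream_ts:
        return [None] * len(frame_ts)
    matches: List[Optional[int]] = []
    for t in frame_ts:
        j = nearest_index(stream_ts, t)
        matches.append(j if abs(stream_ts[j] - t) <= max_gap_s else None)
    return matches


if __name__ == "__main__":
    # Made-up timestamps for illustration only, not real rig data.
    frames = [0.0, 0.5, 1.0]
    imu = [0.0, 0.25, 0.5, 0.75, 2.0]
    print(align_to_frames(frames, imu, max_gap_s=0.125))
```

Nearest-sample matching only makes sense because every stream is stamped on one clock. With separate clocks, an offset and drift correction would have to come first.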