Saturday, September 27, 2025

Apple teaches robots with Apple Vision Pro video

In a new study titled “Humanoid Policy ∼ Human Policy,” an Apple team, in collaboration with MIT, Carnegie Mellon, the University of Washington and the University of California, San Diego, presents an approach to training humanoid robots on first-person video of people performing everyday activities. An Apple Vision Pro headset was used for filming.

A robot learns by watching a human do it

The idea is simple: a person performs an action while it is recorded from a first-person point of view, and the recording is then handed over to a robot, which repeats what it saw. During the study, the researchers collected more than 25,000 demonstrations of human actions and 1,500 examples of robot actions, creating the large-scale PH2D dataset. This dataset was used to train a single control model for a real humanoid robot.
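To make the mixing of human and robot demonstrations concrete, here is a minimal sketch of how such a pool might be assembled. The field names and record layout are illustrative assumptions, not the actual PH2D schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record layout: the real PH2D format is not described in
# this article, so the fields below are illustrative assumptions.
@dataclass
class Demonstration:
    source: str          # "human" or "robot"
    frames: List[list]   # per-frame head/hand poses (e.g. xyz + quaternion)
    actions: List[list]  # per-frame target actions in a shared space

def build_dataset(human_demos, robot_demos):
    """Merge human and robot demonstrations into one training pool."""
    return ([Demonstration("human", f, a) for f, a in human_demos] +
            [Demonstration("robot", f, a) for f, a in robot_demos])

# Toy example echoing the article's imbalance (many human, few robot demos):
dataset = build_dataset([([[0.0] * 7], [[0.0] * 7])] * 5,
                        [([[0.0] * 7], [[0.0] * 7])] * 3)
print(len(dataset))  # 8 demonstrations in a single pool
```

The key design point the study relies on is that both sources end up in the same format, so one model can be trained on all of them at once.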

How the data was collected quickly and economically

To collect video data, the team created a custom app for Apple Vision Pro that used the headset’s lower-left camera and ARKit to track head and arm movements. To keep costs down, the researchers also 3D-printed a mount that allowed a ZED Mini stereo camera to be attached to other devices, such as the Meta Quest 3. This provided similar tracking quality at a much lower cost.

This approach allowed high-quality demonstrations to be produced in seconds – much faster and cheaper than traditional manual robot control.

Human motion in slow motion for robots

Because robots move slower than humans, the researchers slowed down the video of human actions by a factor of 4. This allowed robots to learn at their own pace without the need for additional processing.
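The retiming step described above can be sketched as simple linear interpolation between consecutive pose samples, stretching the timeline by the chosen factor. This is a minimal illustration of the idea, not the paper's actual preprocessing code.

```python
def slow_down(trajectory, factor=4):
    """Retime a pose trajectory so playback takes `factor` times longer,
    inserting linearly interpolated poses between original samples.
    `trajectory` is a list of equal-length pose vectors."""
    if len(trajectory) < 2:
        return list(trajectory)
    out = []
    for a, b in zip(trajectory, trajectory[1:]):
        for step in range(factor):
            t = step / factor
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    out.append(trajectory[-1])
    return out

poses = [[0.0, 0.0], [4.0, 8.0]]
print(slow_down(poses))  # 5 poses instead of 2: the same motion, 4x slower
```

Slowing the human demonstrations this way means the robot sees motion at a speed it can physically reproduce, with no other processing of the data.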

The HAT model: a universal action policy

Central to the study was the Human Action Transformer (HAT) model, which was trained on mixed data from both humans and robots in a single format. Rather than separating ‘human’ and ‘robot’ actions, HAT learns a common policy suitable for either ‘body’ type. This provides flexibility and high efficiency.

In tests, the approach showed excellent results: robots successfully performed even tasks they had not encountered before, outperforming traditional training methods.


PH2D: the new standard in robotics

PH2D has become one of the largest and most versatile datasets in the field of robot learning. The research by Apple and its partners demonstrates how headset video and advanced AI models can revolutionise the training of humanoid robots, making it fast, affordable and scalable.
