We present a novel dataset for humanoid robot pose estimation from images, addressing the critical need for accurate pose estimation to enhance human–robot interaction in extended reality (XR) applications. Despite the importance of this task, large-scale pose datasets covering diverse humanoid robots remain scarce. To overcome this limitation, we collected sparse pose datasets for commercially available humanoid robots and augmented them with several synthetic data generation techniques, including AI-assisted image synthesis, foreground removal, and 3D character simulation. Our dataset is the first to provide full-body pose annotations for a wide range of humanoid robots performing diverse motions, including side and back movements, in real-world scenarios. We also introduce a new benchmark method for real-time full-body 2D keypoint estimation from a single image. Extensive experiments show that training on our extended dataset improves pose estimation accuracy by more than 33.9% compared with training on the sparse datasets alone. Additionally, our method runs in real time at 42 frames per second (FPS) and maintains consistent full-body pose estimation during side and back motions across 11 differently shaped humanoid robots, using approximately 350 training images per robot.
We are making the annotations, the corresponding images, and the list of video URLs freely available for educational purposes under a non-commercial license. The authors do not own the copyrights for the images. Users of the images accept full responsibility for the use of the dataset, including but not limited to the creation of any copies of copyrighted images from the dataset. We make no representations or warranties regarding the license status of each image, and you should verify the license for each image yourself.
@article{cha2024diverseobot,
title={Diverse Humanoid Robot Pose Estimation from Images Using Only Sparse Datasets},
author={Heo, Seokhyeon and Cho, Youngdae and Park, Jeongwoo and Cho, Seokhyun and Tsoy, Ziya and Lim, Hwasup and Cha, Youngwoon},
journal={Applied Sciences},
volume={14},
number={19},
articleno={9042},
numpages = {24},
year={2024},
issue_date = {October 2024},
month = {oct},
publisher={MDPI},
address = {Basel, Switzerland},
keywords = {computer vision; robotics; deep learning},
doi = {10.3390/app14199042},
url = {https://www.mdpi.com/2076-3417/14/19/9042},
}