Human Pose Estimation (HPE) is a computer vision task in which machine learning models detect, track, and annotate human movement in images and videos, approximating how the human eye and brain interpret body posture. As computational power and algorithms have advanced, HPE has become easier to implement, enabling applications in healthcare, sports, security, and other sectors. It works by identifying keypoints on the human body, such as joints, in either 2D or 3D, and supports motion capture, augmented reality, and other AI-powered applications. HPE remains challenging because human movement is dynamic, clothing varies, lighting conditions change, and a single video may contain multiple subjects. Models and frameworks such as OpenPose, MediaPipe, and HRNet have been developed to address these challenges, with varying support for real-time and multi-person tracking and annotation. Annotation tools like Encord let users define object primitives and skeleton templates, which reduces manual workload and can improve model accuracy. Overall, HPE is an important tool across many fields, providing data-driven insights and making video annotation projects more efficient.
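
To make the ideas of keypoints and a skeleton template concrete, here is a minimal sketch in Python. It is an illustration only: the `Keypoint` class, the three-joint arm template, the `joint_angle` and `visible_edges` helpers, and the 0.5 confidence threshold are all hypothetical names and values chosen for this example. They do not reflect the output format of OpenPose, MediaPipe, or HRNet, nor the template format used by Encord.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Keypoint:
    x: float           # horizontal pixel coordinate
    y: float           # vertical pixel coordinate
    confidence: float  # detector score, assumed to lie in [0, 1]


# Hypothetical skeleton template for one arm: the edges say which
# named keypoints are connected when the pose is drawn or annotated.
SKELETON_EDGES = [("shoulder", "elbow"), ("elbow", "wrist")]


def joint_angle(a: Keypoint, b: Keypoint, c: Keypoint) -> float:
    """Return the angle in degrees at b between segments b->a and b->c."""
    v1 = (a.x - b.x, a.y - b.y)
    v2 = (c.x - b.x, c.y - b.y)
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints coincide; angle is undefined")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def visible_edges(pose: dict, min_confidence: float = 0.5) -> list:
    """Keep only edges whose two endpoints were both detected confidently."""
    return [
        (start, end)
        for start, end in SKELETON_EDGES
        if pose[start].confidence >= min_confidence
        and pose[end].confidence >= min_confidence
    ]


# A made-up 2D pose for a single person in a single frame.
pose = {
    "shoulder": Keypoint(100.0, 100.0, 0.95),
    "elbow": Keypoint(100.0, 200.0, 0.90),
    "wrist": Keypoint(200.0, 200.0, 0.40),
}

elbow_angle = joint_angle(pose["shoulder"], pose["elbow"], pose["wrist"])
drawable = visible_edges(pose)
```

In this made-up pose the upper arm and forearm are perpendicular, so `elbow_angle` comes out as a right angle, and the low-confidence wrist means only the shoulder-to-elbow edge survives the filter. Real pipelines typically use many more keypoints per person and repeat this per subject and per frame.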