Researchers at Google DeepMind, University College London, and the University of Oxford have developed TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement), a model for tracking points through video sequences. Unlike traditional object-tracking methods, which struggle with point-level correspondence, occlusion, and long time horizons, and which must cope with scarce real-world ground truth data, TAPIR tracks specific points of interest with high accuracy and robustness, following them across occlusions and over long videos.

TAPIR combines the strengths of two earlier architectures, TAP-Net and Persistent Independent Particles (PIPs), in a coarse-to-fine pipeline: a per-frame matching stage produces an initial estimate of each point's location, which a temporal refinement stage then iteratively improves. Because the architecture is fully convolutional, it maps efficiently onto modern GPU and TPU hardware. TAPIR also estimates its own uncertainty in its position estimates through self-supervised learning, which improves benchmark scores and benefits downstream algorithms that depend on precise tracks.
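To make the coarse-to-fine idea concrete, here is a minimal NumPy sketch of the two stages, assuming a precomputed per-frame feature volume. The function names (`per_frame_initialization`, `temporal_refinement`) and the moving-average smoother standing in for the learned refinement are illustrative simplifications, not TAPIR's actual layers.

```python
import numpy as np

def per_frame_initialization(query_feat, frame_feats):
    """Coarse stage: independently match the query feature against every
    frame's feature map and take the best-scoring location per frame.

    query_feat:  (C,) feature vector sampled at the query point.
    frame_feats: (T, H, W, C) dense per-frame feature maps.
    Returns initial positions (T, 2) as (y, x) and per-frame match scores (T,).
    """
    T, H, W, C = frame_feats.shape
    # Cost volume: dot product between the query feature and every pixel.
    scores = frame_feats.reshape(T, H * W, C) @ query_feat   # (T, H*W)
    best = scores.argmax(axis=1)
    positions = np.stack([best // W, best % W], axis=1).astype(float)
    confidence = scores.max(axis=1)  # crude stand-in for an uncertainty head
    return positions, confidence

def temporal_refinement(positions, num_iters=4):
    """Fine stage (heavily simplified): TAPIR refines trajectories with a
    learned, temporally local update; a fixed moving-average smoother here
    just illustrates the iterative, local structure of that stage."""
    traj = positions.copy()
    kernel = np.array([0.25, 0.5, 0.25])
    for _ in range(num_iters):
        padded = np.pad(traj, ((1, 1), (0, 0)), mode="edge")
        traj = (kernel[0] * padded[:-2] + kernel[1] * padded[1:-1]
                + kernel[2] * padded[2:])
    return traj

# Toy usage: random features stand in for a learned backbone's output.
rng = np.random.default_rng(0)
frame_feats = rng.standard_normal((8, 32, 32, 16)).astype(np.float32)
query_feat = frame_feats[0, 10, 12]        # feature at the query point
init, conf = per_frame_initialization(query_feat, frame_feats)
tracks = temporal_refinement(init)
print(tracks.shape, conf.shape)            # (8, 2) (8,)
```

In the actual model, the refinement stage is a learned convolutional update driven by local score maps around the current track estimate, rather than a fixed smoother, which is what lets the whole pipeline stay fully convolutional and hardware-friendly.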