Encord's video annotation platform has to keep video frames and their labels in sync so that the annotated data can be used to train computer vision models. The platform initially produced misaligned labels because the HTML <video> element cannot seek to a specific frame. It seeks by timestamp instead, and mapping a timestamp to a frame number is unreliable for videos with variable frame rates or when media players handle the same file inconsistently. Encord now uses FFmpeg to analyze video metadata, which lets it detect variable frame rates, ghost frames, and audio-induced frame stretching. Affected videos are re-encoded to a consistent frame rate and problematic frames are removed, which preserves data integrity and keeps each annotation on the intended frame. Beyond resolving the synchronization issues, the work gave Encord's developers a deeper understanding of video encoding, and clients get a smoother experience because potential problems are identified and addressed preemptively.
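
To make the timestamp-versus-frame problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Encord's actual implementation: the function names (`ffprobe_command`, `is_variable_frame_rate`, `frame_at`, `naive_frame_at`) are hypothetical, the 1 ms tolerance is an arbitrary choice, and the per-frame timestamps are assumed to have been obtained already, for example by running the `ffprobe` command that the first function builds.

```python
from bisect import bisect_right
from typing import List, Sequence


def ffprobe_command(path: str) -> List[str]:
    """Build an ffprobe call that prints one presentation timestamp per video frame.

    The command is only constructed here, not executed.
    """
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",          # first video stream only
        "-show_entries", "frame=pts_time",  # presentation time of each frame
        "-of", "csv=p=0",                   # one bare value per line
        path,
    ]


def is_variable_frame_rate(pts: Sequence[float], tolerance: float = 1e-3) -> bool:
    """Return True if the gaps between consecutive frames differ by more than `tolerance` seconds."""
    durations = [later - earlier for earlier, later in zip(pts, pts[1:])]
    if not durations:
        return False
    return max(durations) - min(durations) > tolerance


def frame_at(pts: Sequence[float], t: float) -> int:
    """Index of the frame on screen at time t: the last frame whose timestamp is <= t."""
    return max(bisect_right(pts, t) - 1, 0)


def naive_frame_at(t: float, fps: float) -> int:
    """Frame index assuming every frame lasts exactly 1/fps seconds."""
    return int(t * fps)


if __name__ == "__main__":
    # Hypothetical timestamps: nominally 25 fps, but the third frame is held for 0.12 s.
    pts = [0.0, 0.04, 0.08, 0.20, 0.24]
    print(is_variable_frame_rate(pts))
    print(frame_at(pts, 0.17), naive_frame_at(0.17, 25))
```

With these made-up timestamps, the lookup based on real frame times and the constant-rate calculation give different frame indices for the same playback time, which is the kind of mismatch that attaches a label to the wrong frame. Re-encoding to a consistent frame rate, as described above, is what makes the simple `t * fps` arithmetic valid again.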