Company
Date Published
Author
Mike Coutermarsh
Word count
856
Language
English
Hacker News points
12

Summary

When building PlanetScale, the team set two hard requirements for their background job system: losing queued job data must not affect functionality, and any single failed job must be re-run automatically. They use Sidekiq as the background queueing system. The core design decision was to pair each job with a second job whose only responsibility is to schedule the original job to run, which lets the system self-heal when jobs are lost or fail. State is stored in the database, so that even if the job triggered by a user action is lost or fails, it can still be re-run automatically. They also added middleware that lets them disable scheduled jobs at any time, and implemented bulk scheduling to improve performance when enqueueing large numbers of jobs. Because the same job may end up scheduled more than once, the team handles uniqueness by storing state in the database, using database locks, and using Sidekiq's unique job feature.
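
The scheduler pattern described above can be sketched roughly as follows. This is a minimal illustration, not PlanetScale's implementation: it uses Python with an in-memory SQLite database and a `deque` standing in for Sidekiq's queue, and the `backups` table and the functions `request_backup`, `schedule_pending`, and `run_backup` are hypothetical names chosen for the example. The only state that matters lives in the database, so the queue can be emptied at any point and the next scheduler run rebuilds it.

```python
import sqlite3
from collections import deque


def setup_db():
    """Create an in-memory database holding the source-of-truth state."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE backups (id INTEGER PRIMARY KEY, state TEXT NOT NULL)"
    )
    return db


def request_backup(db):
    """User action: record the intent in the database and nothing else.

    No job is enqueued here, so there is nothing to lose if the queue is down.
    """
    cur = db.execute("INSERT INTO backups (state) VALUES ('pending')")
    db.commit()
    return cur.lastrowid


def schedule_pending(db, queue):
    """Scheduler job: enqueue one job per row that still needs work.

    Run periodically; all jobs are pushed in a single bulk operation.
    """
    rows = db.execute(
        "SELECT id FROM backups WHERE state = 'pending' ORDER BY id"
    )
    ids = [row[0] for row in rows]
    queue.extend(ids)
    return ids


def run_backup(db, backup_id, work=lambda backup_id: None):
    """Worker job: claim the row, do the work, record the result.

    The conditional UPDATE acts as the uniqueness guard: a duplicate job
    for the same row claims nothing and returns False.
    Assumption: a worker that crashes hard would leave the row in
    'running'; a real system would need a timeout to reset such rows.
    """
    claimed = db.execute(
        "UPDATE backups SET state = 'running' WHERE id = ? AND state = 'pending'",
        (backup_id,),
    ).rowcount
    db.commit()
    if claimed == 0:
        return False
    try:
        work(backup_id)
    except Exception:
        # Put the row back so the next scheduler run retries it.
        db.execute(
            "UPDATE backups SET state = 'pending' WHERE id = ?", (backup_id,)
        )
        db.commit()
        return False
    db.execute("UPDATE backups SET state = 'done' WHERE id = ?", (backup_id,))
    db.commit()
    return True


if __name__ == "__main__":
    db = setup_db()
    queue = deque()
    request_backup(db)
    schedule_pending(db, queue)
    queue.clear()                 # simulate losing every queued job
    schedule_pending(db, queue)   # the next scheduler run restores them
    while queue:
        print(run_backup(db, queue.popleft()))
```

In this sketch a lost job and a failed job are handled the same way: the row stays (or returns to) `pending`, and the next scheduler run enqueues it again. The conditional `UPDATE` plays the role that database state and locks play in the summary; Sidekiq's unique job feature would be an additional guard at the queue level.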