To reduce the latency of ML-based validators, the models were hosted on EC2 instances with T4 GPUs, which significantly improved performance over local M3 MacBooks (which lack Nvidia CUDA support). The ToxicLanguage and CompetitorCheck validators were tested first, with the goal of measuring how local and remote inference latency varies with text length. Remote inference on the T4s turned out to be faster than inference on the local machines, despite the additional network latency of a remote call, even when the cloud side ran on CPUs. Guardrails also offers publicly available inference endpoints to signed-in users; these avoid lengthy model downloads, and usage statistics are anonymized to preserve data privacy.

Benchmark data was generated with gpt-4o-mini, which produced sentences of varying character lengths for each validator. Guards were configured for both the local and remote setups, and the benchmark code is available in the validator repositories.
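As a rough illustration of the data-generation step, the sketch below asks gpt-4o-mini for sentences of approximately the requested character lengths via the OpenAI Python SDK. The prompt wording, the `generate_sentence` helper, and the target lengths are illustrative assumptions, not the exact values used for the published benchmarks.

```python
# Sketch: generate benchmark sentences of varying lengths with gpt-4o-mini.
# Prompt wording and target lengths are placeholder choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_sentence(target_chars: int) -> str:
    """Ask gpt-4o-mini for a sentence of roughly `target_chars` characters."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": (
                    f"Write a single English sentence of roughly {target_chars} "
                    "characters about an everyday topic."
                ),
            }
        ],
    )
    return response.choices[0].message.content.strip()

# Build a small corpus spanning short to long inputs.
benchmark_texts = [generate_sentence(n) for n in (50, 200, 500, 1000)]
```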
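The guards themselves can be set up along these lines with the Guardrails Python SDK, assuming the two validators have been installed from the Guardrails Hub. The competitor names, threshold, and `on_fail` policy are placeholder values, and whether a validator runs its model locally or against the hosted Guardrails inference endpoint is governed by the account-level remote-inferencing setting (e.g. via `guardrails configure`) rather than by this code.

```python
# Sketch: a guard combining the two validators under test.
from guardrails import Guard
from guardrails.hub import CompetitorCheck, ToxicLanguage

guard = Guard().use_many(
    ToxicLanguage(threshold=0.5, validation_method="sentence", on_fail="noop"),
    CompetitorCheck(["Acme Corp", "Globex"], on_fail="noop"),
)

result = guard.validate("This is a perfectly friendly sentence about our own product.")
print(result.validation_passed)
```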
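Finally, a minimal timing loop of the following shape conveys how latency can be compared across text lengths; the repetition count and the use of a mean over wall-clock samples are illustrative choices, not a description of the exact measurement harness in the validator repositories.

```python
# Sketch: time guard.validate() over texts of different lengths.
import time
from statistics import mean

def time_guard(guard, texts, repeats: int = 5):
    """Return {text length in chars: mean validation latency in seconds}."""
    results = {}
    for text in texts:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            guard.validate(text)
            samples.append(time.perf_counter() - start)
        results[len(text)] = mean(samples)
    return results

# Example usage with the objects from the previous sketches:
# latencies = time_guard(guard, benchmark_texts)
# for length, seconds in sorted(latencies.items()):
#     print(f"{length:>5} chars -> {seconds * 1000:.1f} ms")
```

Running the same loop once with remote inferencing enabled and once with it disabled gives the local-versus-remote comparison described above.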