Lakera has developed an implementation of OpenAI's CLIP model that eliminates the dependency on PyTorch, making it easier to deploy on production and edge devices. CLIP, known for matching images and text by embedding both in a shared space, typically relies on PyTorch for its three main components: the text tokenizer, the image preprocessor, and the model itself, which produces the image and text embeddings whose cosine similarities are compared. Lakera rewrote the text tokenizer in NumPy, built a custom image preprocessor, and exported the CLIP model to the .onnx format so that PyTorch can be replaced at inference time by the much lighter onnxruntime. The result is a leaner CLIP stack that can run in environments where installing PyTorch is impractical.
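
To illustrate what a PyTorch-free inference path can look like, here is a minimal sketch using only NumPy, Pillow, and onnxruntime. It assumes a CLIP model already exported to a file named clip.onnx with inputs "image" and "text" and with the image and text embeddings as outputs; these names, the simplified preprocessing, and the placeholder token array are illustrative assumptions, not Lakera's actual interface.

```python
import numpy as np
import onnxruntime as ort
from PIL import Image


def preprocess(image_path: str, size: int = 224) -> np.ndarray:
    """Simplified NumPy/PIL stand-in for CLIP's image preprocessing:
    resize, scale to [0, 1], normalize with CLIP's published mean/std."""
    img = Image.open(image_path).convert("RGB").resize((size, size), Image.BICUBIC)
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
    std = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None]  # NCHW layout, batch of 1


# Hypothetical exported model; input/output names are assumptions.
session = ort.InferenceSession("clip.onnx")

image = preprocess("cat.jpg")
# Placeholder for the output of a NumPy-based tokenizer
# (CLIP uses a 77-token context window).
tokens = np.zeros((2, 77), dtype=np.int64)

image_emb, text_emb = session.run(None, {"image": image, "text": tokens})

# Cosine similarity = dot product of L2-normalized embeddings.
image_emb /= np.linalg.norm(image_emb, axis=-1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=-1, keepdims=True)
print(image_emb @ text_emb.T)
```

The heavy lifting (the transformer weights) lives entirely in the .onnx file, so the deployment environment only needs onnxruntime and NumPy rather than a full PyTorch installation.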