The relationship between Apache Kafka and machine learning (ML) is crucial for building scalable ML infrastructure, especially when deploying analytic models for real-time predictions. Model training and model deployment can be two separate processes, but they share common steps such as data integration, preprocessing, and aggregation.

Two options exist for model deployment: calling a model server via remote procedure calls (RPCs), or natively embedding the model into the Kafka client application. TensorFlow Serving is a popular model server, offering features such as model versioning, A/B testing, and canary releases out of the box (see the RPC sketch below). Embedding the model directly into the application eliminates the RPC round trip to a model server, which reduces latency and removes an external dependency, but it means those lifecycle features must be implemented manually (see the Kafka Streams sketch below). The choice between a model server and an embedded model therefore depends on the existing infrastructure, latency and operational requirements, and team capabilities.

Kubernetes provides a cloud-native environment in which both approaches can be leveraged, adding the scalability and robustness of an orchestrated platform. Edge deployment is also possible: lightweight models can be deployed at the edge of the network where ultra-low latency is required.

Finally, Kafka provides a strong foundation for building ML monitoring infrastructure, covering both technical monitoring (latency, throughput, errors) and model-specific monitoring such as prediction performance and model accuracy (see the metrics sketch below). Understanding the pros and cons of both deployment approaches will help developers make an informed decision for their project.
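To make the RPC option concrete, here is a minimal sketch of a Kafka consumer that sends each event to TensorFlow Serving's documented REST endpoint (`/v1/models/<name>:predict`, default port 8501) and logs the prediction. The broker address, topic name, model name, and the assumption that each record value is already a JSON feature array are illustrative, not prescribed by the article:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RpcScoringConsumer {

    // TensorFlow Serving's REST endpoint: /v1/models/<name>:predict.
    // Host, port, and model name are assumptions for this sketch.
    private static final String SERVING_URL =
        "http://localhost:8501/v1/models/my_model:predict";

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rpc-scoring");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("input-events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Assumes the record value is already a JSON array of
                    // features, e.g. "[1.0, 2.0, 3.0]".
                    String payload = "{\"instances\": [" + record.value() + "]}";
                    HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(SERVING_URL))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(payload))
                        .build();
                    // One remote call per event: this network hop is the main
                    // latency cost of the model-server approach.
                    HttpResponse<String> response =
                        http.send(request, HttpResponse.BodyHandlers.ofString());
                    System.out.printf("key=%s prediction=%s%n",
                        record.key(), response.body());
                }
            }
        }
    }
}
```

Every event pays for one network round trip here, which is exactly the cost the embedded approach avoids; in exchange, the model server handles versioning and rollout centrally.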
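The embedded option can be sketched with Kafka Streams and the TensorFlow Java API: the model is loaded once at startup and every record is scored in-process, with no remote call. This sketch assumes the legacy `org.tensorflow:tensorflow` (1.x) artifact, a SavedModel exported with the standard `serve` tag, a comma-separated feature encoding, and input/output tensor names that are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

public class EmbeddedModelStream {

    public static void main(String[] args) {
        // Load the model once; it is co-located with the stream processor,
        // so scoring needs no RPC. Path and "serve" tag follow TensorFlow's
        // SavedModel layout; the path itself is an assumption.
        SavedModelBundle model = SavedModelBundle.load("/models/my_model", "serve");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "embedded-scoring");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
            Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
            Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("input-events");

        events.mapValues(value -> {
                // Assumes a comma-separated feature vector and a graph that
                // exposes tensors named "input" and "output"; both names are
                // assumptions for this sketch.
                String[] parts = value.split(",");
                float[][] features = new float[1][parts.length];
                for (int i = 0; i < parts.length; i++) {
                    features[0][i] = Float.parseFloat(parts[i].trim());
                }
                try (Tensor<?> in = Tensor.create(features);
                     Tensor<?> out = model.session().runner()
                         .feed("input", in)
                         .fetch("output")
                         .run()
                         .get(0)) {
                    float[][] prediction = new float[1][1];
                    out.copyTo(prediction);
                    return String.valueOf(prediction[0][0]);
                }
            })
            .to("predictions");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note what is missing compared to a model server: deploying a new model version means redeploying (or hot-swapping state in) the Streams application itself, and A/B or canary logic would have to be written into the topology.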
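For the monitoring side, Kafka itself can carry the signal: below is a small sketch of a producer that publishes per-prediction metrics to a dedicated topic, from which technical and model-specific dashboards can be fed. The `model-metrics` topic name and the JSON field names are assumptions; in practice the schema would be managed, for example via a schema registry:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ModelMetricsReporter {

    private final KafkaProducer<String, String> producer;

    public ModelMetricsReporter() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Emits one metric event per prediction, keyed by model version so that
    // all metrics for a version land in the same partition.
    public void report(String modelVersion, double latencyMs, double score) {
        String event = String.format(
            "{\"model\":\"%s\",\"latency_ms\":%.2f,\"score\":%.4f,\"ts\":%d}",
            modelVersion, latencyMs, score, System.currentTimeMillis());
        producer.send(new ProducerRecord<>("model-metrics", modelVersion, event));
    }

    public void close() {
        producer.close();
    }
}
```

Downstream, a stream processor can aggregate latency and throughput for technical monitoring, and join predictions against ground-truth labels as they arrive to track model accuracy over time.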