Pipeline Tuning with Engine Variables in SingleStore
Blog post from SingleStore
SingleStore pipelines ingest data from sources such as Kafka, S3, and GCS into SingleStore tables, and are widely used for ETL because data can be transformed by a stored procedure before it is loaded. The platform offers tuning options at both the global and the individual pipeline level, so ingestion can be optimized for the data source, the data type, and the interval at which data arrives. Engine variables such as advanced_hdfs_pipelines, enable_eks_irsa, and pipelines_stored_proc_exactly_once enable, respectively, Kerberos authentication for HDFS pipelines, EKS IAM roles for credentials, and exactly-once delivery for pipelines that load into stored procedures. Per-pipeline settings can be specified when a pipeline is created or changed later with the ALTER PIPELINE command, and they take precedence over the global settings. The post also discusses how variables such as pipelines_max_offsets_per_batch_partition and max_partitions_per_batch affect the parallelism and stability of ingestion, and emphasizes balancing the number of Kafka partitions against the number of SingleStore partitions to avoid data skew. Fine-tuning these variables is essential for getting the best ingestion performance.
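To make the precedence rule and the partition-balance point more concrete, here is a small Python sketch. It is a simplified model, not SingleStore's actual scheduler: it assumes that in each batch one SingleStore partition reads from at most one Kafka partition, and that max_partitions_per_batch caps how many partitions take part. The function names (effective_setting, batch_plan) and all numeric values are hypothetical and chosen only for illustration; they are not SingleStore defaults.

```python
from math import ceil


def effective_setting(name, pipeline_overrides, global_settings):
    """Return the per-pipeline value if one is set, otherwise the global value."""
    if name in pipeline_overrides:
        return pipeline_overrides[name]
    return global_settings[name]


def batch_plan(source_partitions, db_partitions,
               max_partitions_per_batch, max_offsets_per_partition):
    """Estimate batch shape under the simplified one-to-one model described above.

    Assumption: each batch pairs one source (Kafka) partition with one
    database partition, limited by max_partitions_per_batch.
    """
    values = (source_partitions, db_partitions,
              max_partitions_per_batch, max_offsets_per_partition)
    if any(v <= 0 for v in values):
        raise ValueError("all arguments must be positive integers")

    # Number of database partitions that can read in parallel in one batch.
    slots = min(db_partitions, max_partitions_per_batch)
    # Batches needed to visit every source partition once.
    rounds = ceil(source_partitions / slots)
    # Slots left without a source partition in the final round; a nonzero
    # value hints at uneven work across database partitions.
    idle_slots_last_round = rounds * slots - source_partitions
    # Upper bound on offsets pulled in a single batch.
    max_offsets_per_batch = min(slots, source_partitions) * max_offsets_per_partition

    return {
        "slots": slots,
        "rounds": rounds,
        "idle_slots_last_round": idle_slots_last_round,
        "max_offsets_per_batch": max_offsets_per_batch,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    global_settings = {
        "pipelines_max_offsets_per_batch_partition": 1_000_000,
        "max_partitions_per_batch": 8,
    }
    pipeline_overrides = {"max_partitions_per_batch": 4}

    cap = effective_setting("max_partitions_per_batch",
                            pipeline_overrides, global_settings)
    offsets = effective_setting("pipelines_max_offsets_per_batch_partition",
                                pipeline_overrides, global_settings)
    print(batch_plan(source_partitions=12, db_partitions=8,
                     max_partitions_per_batch=cap,
                     max_offsets_per_partition=offsets))
```

In this model, a Kafka partition count that is a multiple of the number of participating SingleStore partitions leaves no idle slots, which is one way to picture the balance the post recommends.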