Data Processing with Conductor
Blog post from Orkes
Orkes uses a Conductor workflow to automate its ETL process: the workflow extracts data from GitHub, transforms it, and loads it into Orbit so that community interactions and data can be managed efficiently. It calls GitHub's API to extract user information, uses JQ and JavaScript tasks to transform that data into a JSON format suitable for Orbit, and then uploads the result to Orbit, authenticating with API keys. The workflow is scheduled to run every 24 hours using Orkes’ new scheduler tool, so each run captures new stars and forks in the Conductor repository and the data stays current. Besides saving time, this approach can be extended to analyze multiple repositories concurrently, contributing to a more comprehensive understanding of user engagement across various platforms.
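The transform step described above can be sketched in a few lines. This is only an illustration in Python, not the JQ or JavaScript tasks used in the actual workflow. The input follows the user objects returned by GitHub's stargazers endpoint (`login`, `html_url`, and, when timestamps are requested, a wrapping `starred_at`/`user` pair). The output shape and the function names `to_orbit_payload` and `transform` are assumptions made for this example, not Orbit's documented schema.

```python
from typing import Any, Dict, Iterable, List


def to_orbit_payload(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one GitHub stargazer record to a hypothetical Orbit-style payload."""
    # GitHub nests the user under "user" (next to "starred_at") when star
    # timestamps are requested; otherwise the record is the user itself.
    user = record.get("user", record)
    return {
        # Assumed payload shape, for illustration only.
        "identity": {"source": "github", "username": user["login"]},
        "member": {"url": user.get("html_url")},
        "occurred_at": record.get("starred_at"),
    }


def transform(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Transform all records, keeping the first payload seen per username."""
    seen = set()
    payloads = []
    for record in records:
        payload = to_orbit_payload(record)
        username = payload["identity"]["username"]
        if username in seen:
            continue  # skip duplicates so repeated runs do not double-count
        seen.add(username)
        payloads.append(payload)
    return payloads
```

Deduplicating by username is a design choice for this sketch: a workflow that runs every 24 hours is likely to see the same users again, so the load step should only receive each user once per run.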