We are playing with different ways of using GitHub to stream data, working with both current and legacy data using our event-driven technology. We’ve recently explored how you can push legacy data to GitHub using Git, or the GitHub API, at regularly scheduled intervals, then proxy the JSON data using Streamdata.io to recreate historical streams of data, pushing forward the conversation around just what streaming data is. We want to continue this work with some new ideas we are looking to develop.
One concept we are experimenting with is taking a handful of stock market ticker symbols, pulling the prices for their stocks over the course of a 4 hour period from some day in the past, and publishing the results to a GitHub repository in regularly scheduled commits. This recreates a specific 4 hour period from a specific day in the past, using GitHub as a static, yet streaming, representation of that period in time. It allows us to take any time period from the past and publish it as a stream that anyone can consume as part of an application, or use to train or test machine learning models.
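To make the idea a little more concrete, here is a minimal sketch of how the historical prices could be turned into a commit schedule. It assumes each price record is a dictionary with `symbol`, `time`, and `price` keys; that shape, the `build_replay_schedule` name, the ticker symbols, and the sample values are all made up for illustration and do not come from any particular market data API.

```python
from datetime import datetime, timedelta


def build_replay_schedule(ticks, window_start, window_hours=4):
    """Return (offset_seconds, snapshot) pairs for ticks inside the window.

    Each snapshot is the latest known price per symbol at that moment,
    which is what we would write to the JSON file for that commit.
    """
    window_end = window_start + timedelta(hours=window_hours)
    in_window = sorted(
        (t for t in ticks if window_start <= t["time"] < window_end),
        key=lambda t: t["time"],
    )
    schedule = []
    latest = {}
    for tick in in_window:
        latest[tick["symbol"]] = tick["price"]
        offset = (tick["time"] - window_start).total_seconds()
        schedule.append((offset, dict(latest)))  # copy so snapshots stay independent
    return schedule


# Hypothetical sample data: invented symbols and prices, not real quotes.
start = datetime(2018, 3, 1, 10, 0)
ticks = [
    {"symbol": "AAA", "time": start + timedelta(minutes=30), "price": 175.1},
    {"symbol": "BBB", "time": start, "price": 93.0},
    {"symbol": "AAA", "time": start + timedelta(hours=5), "price": 176.0},  # outside the 4 hour window
]
schedule = build_replay_schedule(ticks, start)
```

Each entry in `schedule` says how many seconds after the start of the window a commit should happen, and what the JSON file should contain at that point. Keeping the offsets relative to the window start, rather than as absolute timestamps, is what lets the same 4 hour period be replayed on any day.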
Streaming data is often something we assume represents the most up to date information. While this represents the majority of streaming data use cases today, there are numerous ways in which you can adjust, tweak, and distort the time variable to stream any data from the past, expanding the popular opinion about what streaming data is all about. A single JSON file stored on GitHub can become a real-time stream of data when you push updates to it using Git, or the GitHub API, in the same sequence and time frame as they existed in the past. This leverages the tools we are already putting to work (GitHub), as we expand our event-driven approach to delivering data where it needs to be, exactly when it is needed.
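As a rough sketch of what pushing updates in the original sequence and time frame might look like, the example below replays a list of `(offset_seconds, snapshot)` pairs against GitHub's REST endpoint for creating or updating file contents (`PUT /repos/{owner}/{repo}/contents/{path}`). The function names, the `speed` parameter, and the demo schedule are our own assumptions for illustration. It also assumes the JSON file does not exist in the repository when the replay starts, and it ignores the time each request takes, so a long replay would drift slightly.

```python
import base64
import json
import time
import urllib.request


def contents_payload(snapshot, message, sha=None):
    """Build the request body for GitHub's create-or-update-file endpoint."""
    encoded = base64.b64encode(json.dumps(snapshot).encode("utf-8")).decode("ascii")
    body = {"message": message, "content": encoded}
    if sha is not None:
        body["sha"] = sha  # GitHub requires the current blob sha when updating a file
    return body


def make_github_put(owner, repo, path, token):
    """Return a function that PUTs a payload to the file and returns the new blob sha."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"

    def put_file(body):
        request = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            method="PUT",
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["content"]["sha"]

    return put_file


def replay(schedule, put_file, sleep=time.sleep, speed=1.0):
    """Commit each snapshot at its original offset; speed > 1 compresses time."""
    sha = None
    elapsed = 0.0
    for offset, snapshot in schedule:
        wait = (offset - elapsed) / speed
        if wait > 0:
            sleep(wait)
        elapsed = offset
        sha = put_file(contents_payload(snapshot, f"replay t+{int(offset)}s", sha))
    return sha


# Dry run with stand-ins for the network call and the clock (hypothetical values).
sent, waits = [], []


def fake_put(body):
    sent.append(body)
    return f"sha{len(sent)}"


demo_schedule = [(0.0, {"AAA": 10.0}), (60.0, {"AAA": 10.5})]
final_sha = replay(demo_schedule, fake_put, sleep=waits.append, speed=60.0)
```

To run this against a real repository, `fake_put` would be swapped for `make_github_put(owner, repo, path, token)` with your own values. Passing the network call and the sleep function in as arguments keeps the timing logic testable without touching GitHub, and the `speed` setting is one simple way to distort the time variable, for example replaying a 4 hour window in a few minutes.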
We are just getting going with this approach to using GitHub. We have some more work to do on our streaming subscription prototypes, and GitHub data lake prototypes, using 3rd party APIs. We’ll keep publishing these examples of streaming data here on the blog, and experimenting with a serverless approach to streaming data, and client-side approaches on GitHub, until we find the solution that resonates with our customers. Our goal is to help them develop low-cost approaches to gathering the data they need, and to make sure it is pushed and streamed wherever they need it, using the tools and services they already depend on.