We are beginning to roll out a number of AWS Lambda functions that connect to a variety of APIs and stream the results into your existing AWS infrastructure. Next on our list of serverless streaming connectors is one for streaming Stack Exchange questions into an AWS data lake, so you can train machine learning models on any Stack Exchange search or use the results in web, mobile, and other applications. It provides a plug-and-play, scalable way to stream data available via existing APIs into your organization’s data lake.
The Streamdata.io AWS Lambda streaming connector for Stack Exchange searches is available in the AWS Serverless Application Repository. It allows you to deploy the serverless function within your existing AWS infrastructure, proxy Stack Exchange API searches using Streamdata.io, and publish the results in real time to an Amazon S3 bucket. This turns on a scalable faucet of data from as few or as many Stack Exchange searches as you’d like to conduct. You can orchestrate the streams based on events or a schedule using Amazon CloudWatch Events, which lets you manage each function and pay only while the streams are running.
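To make the proxying step concrete, here is a minimal sketch of how a Stack Exchange search URL might be wrapped in a Streamdata.io proxy URL. The proxy host, the `X-Sd-Token` parameter name, the API version in the path, and both function names are assumptions made for illustration and should be checked against the current Streamdata.io and Stack Exchange documentation. How the Stack Exchange token is attached to the request is deliberately left out.

```python
from urllib.parse import urlencode

# Assumed values (not taken from the connector itself): verify the proxy
# host, the token parameter name, and the API version before relying on them.
STREAMDATA_PROXY = "https://streamdata.motwin.net/"
STACK_EXCHANGE_SEARCH = "https://api.stackexchange.com/2.3/search"


def build_search_url(intitle, site="stackoverflow"):
    """Build a Stack Exchange search URL for questions whose title matches."""
    query = urlencode({
        "order": "desc",
        "sort": "activity",
        "intitle": intitle,
        "site": site,
    })
    return f"{STACK_EXCHANGE_SEARCH}?{query}"


def build_proxied_url(api_url, sd_token):
    """Prefix the target API URL with the proxy and append the app token."""
    separator = "&" if "?" in api_url else "?"
    token_param = urlencode({"X-Sd-Token": sd_token})
    return f"{STREAMDATA_PROXY}{api_url}{separator}{token_param}"
```

Each search you want to stream becomes one proxied URL, so adding another search is a matter of calling `build_search_url` with different terms.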
To run the functions you’ll need a Streamdata.io account and application key, which takes a minute to set up. You’ll also need an AWS account to deploy the Lambda function into, with S3 storage activated to establish your data lake. Finally, you need a Stack Exchange account and token to make ongoing calls to the Stack Exchange API. Once set up, all you do is add your Streamdata.io key and Stack Exchange token to the Lambda function and execute the script, and it begins streaming into your S3 bucket. Streamdata.io proxies and caches the Stack Exchange API, sending only updates to your AWS Lambda function, which then publishes the incremental updates to your designated S3 bucket based on the schedule you set up using CloudWatch Events.
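The flow described above, where updates arrive from the proxy and each one is published to the bucket, could look roughly like the sketch below. It is not the connector's actual code. It assumes the proxy delivers Server-Sent Events whose `data:` field carries JSON, with an initial snapshot event followed by patch events. The function names, the key layout, and the `stack-exchange/` prefix are hypothetical, and the S3 write is passed in as a callable so the logic can be tried without AWS credentials.

```python
import json
import time


def parse_sse(lines):
    """Yield (event_type, payload) pairs from an iterable of SSE text lines."""
    event_type, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_type, json.loads("\n".join(data_parts))
            event_type, data_parts = "message", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
        # Comment lines and other fields such as "id:" are ignored here.


def publish_events(lines, put_object, bucket, prefix="stack-exchange/"):
    """Write each streamed event to the bucket as its own JSON object."""
    keys = []
    for index, (event_type, payload) in enumerate(parse_sse(lines)):
        # Hypothetical key layout: <prefix><unix time>-<sequence>-<type>.json
        key = f"{prefix}{int(time.time())}-{index:06d}-{event_type}.json"
        put_object(Bucket=bucket, Key=key,
                   Body=json.dumps(payload).encode("utf-8"))
        keys.append(key)
    return keys
```

Inside a Lambda function, `put_object` would typically be `boto3.client("s3").put_object`, and `lines` would come from a streaming HTTP response against the proxied URL. Storing every event as a separate object keeps the incremental updates intact for later processing, at the cost of many small files.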
The next version of our functions will abstract away the accounts and keys needed for Streamdata.io and Stack Exchange, making your AWS account the only thing you need. In the meantime, this function should get you started streaming Stack Exchange searches into your data lake, allowing you to monitor conversations on the popular Q&A network across any topic you choose. It enriches your data lake with relevant signals that can be used to train machine learning models and to drive dashboards, web and mobile apps, and any other application you need. The result is an efficient way to tap into valuable third-party data sources like Stack Exchange, find the relevant questions and answers, and make them available across your existing infrastructure in the AWS cloud.