CloudWatch Sumo Logger: Shippo’s Logging and Instrumentation Tools
Logging and instrumentation are two critical tools for monitoring the health of any system. At Shippo, we rely heavily on Sumo Logic for application and load balancer logs, and on CloudWatch metrics for instrumentation. Keeping all logs in a centralized location makes them easier to maintain and access.
At Shippo, we run most of our apps on AWS EKS, a managed Kubernetes service that handles container orchestration by automating the deployment, scaling, and management of our systems. We are still in the process of migrating our deployments from AWS Elastic Beanstalk to AWS EKS, since the migration is not trivial and spans several months. Logs generated by the apps running on EC2 instances in Beanstalk are collected by a Sumo Logic collector agent running on the VMs and published to Sumo Logic, whereas logs generated in EKS are collected by Fluentd and synced to Sumo Logic via a Fluentd DaemonSet.

We also run some of our application logic on AWS Lambda, a serverless computing platform provided by AWS that provisions servers on demand and spins them down once processing is complete. You can think of Lambdas as stateless functions executing code; the model is usually referred to as FaaS (function as a service). This is great for several reasons:
- Reduces the overhead of provisioning servers and maintaining them
- Reduces cost since we are charged only for what we use
- Highly scalable — Lambda can spin up as many copies of the function as needed to match demand
- Has integrations with several other AWS services like S3, CloudWatch, API Gateway and so on
There are some limitations of using AWS Lambda as well, which are:
- Cold starts — since servers are provisioned ad hoc, start-up usually takes some time. There are ways to overcome this, but if you are building low-latency applications, it is better to go with the alternative: provisioning an actual server.
- Memory limitations
- Though logs generated by Lambdas are streamed to CloudWatch Logs, CloudWatch doesn't integrate well with third-party services like Sumo Logic
In this article, we will look at how to publish Lambda logs to Sumo Logic. This was implemented during a hack day at Shippo whose theme was logging — anything from cleaning up existing logs to reduce noise to any other hack that makes logging better at Shippo.
And the incentive to clean up or improve logging was LEGO! Based on the log cleanup (difficulty and noise reduction) or the hack, people were awarded an appropriate number of LEGO bricks.
Logs generated by Lambdas can be published to Sumo Logic using another Lambda. We first started with a quick proof of concept to have something working end to end. The hack was straightforward and entailed the following:
Configure an HTTP source collector in Sumo Logic. This is done in the Sumo Logic UI, where we create a new collector and configure the following:
- source — as HTTP
- collector name
- source category — this is the path to check for logs when querying in Sumo Logic, e.g. prod/core/lambda
- rules to process incoming logs, e.g. multiline processing and so on
- after setting this up, Sumo Logic generates a unique HTTPS URL which we will use for publishing logs
Create an AWS Lambda function that receives CloudWatch application log events, cleans them, and publishes them to Sumo Logic via HTTPS.
Setting up the source collector was straightforward, since it just required a few clicks in the UI. The Lambda function to publish logs to Sumo Logic wasn't too difficult to implement either: Sumo Logic already provides a Node.js cloudwatch-to-sumo logger function, which we were able to reuse for the most part with a few tweaks.
The logs generated by the application Lambda running business logic are streamed to a CloudWatch log group. When creating the new sumo logger Lambda in the AWS Lambda UI, we configured this log group as the trigger for the Node.js function. As a final step, we set the HTTPS URL generated during the Sumo Logic collector creation step as an environment variable, and we were done with the setup.
We triggered the application Lambda function and were able to confirm that the logs were flowing into Sumo Logic.
This proof of concept works great for one CloudWatch log subscription. But in real life, we would want the sumo logger Lambda to subscribe to several different CloudWatch log groups for different application Lambdas, and configuring this in the AWS console for each one of them — and repeating it for different environments (dev, prod) — is painstaking.
Terraforming the setup
At Shippo, we rely heavily on Terraform for provisioning our cloud infrastructure. Terraform is an IaC (infrastructure as code) tool that lets us define and provision infrastructure as configuration. This greatly reduces provisioning time through automation and helps keep all our environments in sync. If we are building Lambdas frequently, we should automate the steps above that were done manually in the AWS Lambda console. This can be achieved by doing the following:
- Terraform the cloudwatch-to-sumo logs publisher Lambda so that we have the same setup in all environments
- Version control the cloudwatch-to-sumo logs publisher lambda function
- Build and deploy the above lambda function using CI/CD
Once the above steps are done, any application Lambda that we deploy in the future just needs a bit of Terraform configuration so that its log group has permission to invoke the sumo logger.
Terraform the CloudWatch log publisher
This step requires us to do the following:
- Create the lambda function resource
- Configure necessary permissions for ingress and egress
- Attach IAM policies
This is a one-time setup, and we should be able to use the same sumo logger Lambda for publishing all CloudWatch app logs to Sumo Logic.
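A sketch of what that one-time Terraform setup could look like — the resource names, artifact bucket, and paths below are illustrative assumptions, not our exact configuration:

```hcl
data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sumo_logger" {
  name               = "cloudwatch-to-sumo"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}

# Basic execution role so the logger can write its own logs.
resource "aws_iam_role_policy_attachment" "sumo_logger_basic" {
  role       = aws_iam_role.sumo_logger.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "cloudwatch_to_sumo" {
  function_name = "cloudwatch-to-sumo"
  s3_bucket     = "our-lambda-artifacts"          # hypothetical artifact bucket
  s3_key        = "cloudwatch-to-sumo/latest.zip"
  handler       = "main"
  runtime       = "go1.x"
  role          = aws_iam_role.sumo_logger.arn

  environment {
    variables = {
      SUMO_ENDPOINT = var.sumo_https_url          # HTTPS URL from the collector setup
    }
  }
}
```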
And if we are creating a new Lambda to run business logic, we just need to explicitly create a CloudWatch log group, grant it permission to invoke the sumo logger Lambda, and add a subscription filter as shown below.
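A sketch of that per-application wiring — the log group name, region, and account ID are placeholders:

```hcl
resource "aws_cloudwatch_log_group" "app" {
  name              = "/aws/lambda/my-app"   # illustrative application Lambda
  retention_in_days = 14
}

# Allow CloudWatch Logs to invoke the sumo logger Lambda.
resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = "cloudwatch-to-sumo"
  principal     = "logs.amazonaws.com"
  source_arn    = "${aws_cloudwatch_log_group.app.arn}:*"
}

# Stream every event in the group to the logger (empty filter pattern = all logs).
resource "aws_cloudwatch_log_subscription_filter" "app_to_sumo" {
  name            = "app-to-sumo"
  log_group_name  = aws_cloudwatch_log_group.app.name
  filter_pattern  = ""
  destination_arn = "arn:aws:lambda:us-east-1:123456789012:function:cloudwatch-to-sumo" # placeholder ARN
  depends_on      = [aws_lambda_permission.allow_cloudwatch]
}
```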
The Lambda function
When working on the POC, we leveraged the Node.js function provided by Sumo Logic and assumed we would be able to reuse it when automating the entire setup. But sadly, when we started working on this, we realized AWS Lambda only supported Node.js 10.x and above, and the Terraform AWS provider version we were using at that point did not support that version of Node.js.
So we decided to rewrite the logger Lambda in Go. Go has pretty good support for AWS client libraries, so we were able to use those for building the Lambda handler. The handler receives events, filters out unwanted logs, normalizes the rest, and publishes them to Sumo Logic over HTTPS.
Build and deploy Lambda
Once the code is merged, our CircleCI pipeline builds the binary and publishes it to AWS S3, which is our source for Lambda deployments. After the automated setup, we were able to test the end-to-end flow in our test environment, and it publishes logs to Sumo Logic as expected!
Note: the CloudWatch log events coming from the application Lambda are already buffered, so we don't need any buffering logic on our end for now. Like any third-party service, Sumo Logic also applies rate limiting based on account subscription, but retries with exponential backoff should help to some extent.