How Triumph Tech helped Shield Legal, a lead generation company for the legal industry, use DevOps and serverless to automate routine daily tasks and cut operational costs.
Problem Statement / Definition
Shield Legal was spending thousands of dollars on routine daily tasks that could be automated for pennies on the dollar. As the practice grew, the time staff spent generating reports by hand grew with it. Triumph Technology Solutions (Triumph) recognized that Shield Legal needed a fully automated solution to reclaim valuable productivity time and reduce operational expenses.
Proposed Solution and Architecture
As a team of legal lead generation and legal marketing experts, Shield Legal did not have the staff or resources on hand to optimize its business processes for the 21st century.
Triumph chose AWS to provide cost-efficient resources to fully automate the generation of lead reports that were previously prepared by hand.
· We used a Python Lambda function to fetch data from the client’s CRM, transform it into a report, and post the report to Slack (a minimal sketch appears after this list).
· We chose Lambda, in conjunction with CodePipeline and Systems Manager, to pay only for what is used and to rapidly deploy modifications and updates to the serverless function.
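A minimal sketch of the reporting function, assuming a hypothetical CRM reporting endpoint and a Slack incoming-webhook URL supplied through environment variables (both names are illustrative, not the client’s actual integration):

```python
import json
import os
import urllib.request
from collections import Counter

# Illustrative environment variables; the real function reads its CRM
# credentials and Slack webhook from Lambda configuration / Systems Manager.
CRM_REPORT_URL = os.environ["CRM_REPORT_URL"]
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def fetch_leads():
    """Pull raw lead records from the CRM's reporting endpoint."""
    with urllib.request.urlopen(CRM_REPORT_URL) as resp:
        return json.loads(resp.read())


def build_report(leads):
    """Transform raw CRM records into a short plain-text summary."""
    by_source = Counter(lead.get("source", "unknown") for lead in leads)
    lines = [f"Daily lead report: {len(leads)} new leads"]
    lines += [f"- {source}: {count}" for source, count in sorted(by_source.items())]
    return "\n".join(lines)


def post_to_slack(text):
    """Post the finished report to the team's Slack channel."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)


def lambda_handler(event, context):
    post_to_slack(build_report(fetch_leads()))
    return {"statusCode": 200}
```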
Outcomes of Project & Success Metrics
During the initial discovery phase, we did a deep dive with the client to understand their process so that Triumph could build a solution to automate it and save valuable operational costs.
We discovered that valuable marketing staff were spending 4 hours a day generating reports when they could have been focused on their core business: lead-generation marketing. Not only was the client losing money on operational costs, they were also losing out on revenue-generating activity.
We found that Shield Legal was losing $3,000 per day in operational expenses as a result of manually preparing these reports. That adds up to losses of $1,095,000 per year.
With development costs of $10,000 for this microservice, that represents over 108x in cost savings, while freeing the client’s time for core business priorities.
Describe TCO Analysis Performed
TCO was calculated by comparing the time required to manually pull data and generate reports against the time required by our solution.
Lessons Learned
Automating routine tasks is the most effective way to reduce operational costs and allow Shield Legal to focus on its core business.
Summary Of Customer Environment
The environment is cloud native. The entire stack runs on Amazon Web Services and is deployed in the us-east-1 region.
AWS Account Configuration
The root user is secured and MFA is required. An IAM password policy is enforced.
Operations, Billing, and Security contact email addresses are set and all account contact information, including the root user email address, is set to a corporate email address or phone number.
AWS CloudTrail is enabled in all regions and logs are stored in S3.
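A minimal sketch of how such a multi-region trail can be provisioned with boto3 (the trail and bucket names are placeholders; the bucket must already exist with a bucket policy that allows CloudTrail to write to it):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Placeholder names for the trail and the pre-existing log bucket.
TRAIL_NAME = "org-audit-trail"
LOG_BUCKET = "example-cloudtrail-logs"

# A multi-region trail captures management events from every region
# and delivers the log files to a single S3 bucket.
cloudtrail.create_trail(
    Name=TRAIL_NAME,
    S3BucketName=LOG_BUCKET,
    IsMultiRegionTrail=True,
    EnableLogFileValidation=True,
)
cloudtrail.start_logging(Name=TRAIL_NAME)
```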
Operational Excellence
Metric Definitions
CodePipeline Health Metrics
If any stage within the pipeline fails, notifications are sent to the DevOps channel in Slack. This is achieved through an SNS topic and the AWS Chatbot integration with Slack.
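A sketch of that wiring, assuming an existing SNS topic that AWS Chatbot relays to Slack (the pipeline name and topic ARN are placeholders, and the topic policy must allow events.amazonaws.com to publish):

```python
import json
import boto3

events = boto3.client("events")

# Placeholders for the real pipeline and the SNS topic Chatbot relays to Slack.
PIPELINE_NAME = "example-pipeline"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:devops-alerts"

# Fire whenever a pipeline execution ends in the FAILED state.
events.put_rule(
    Name="pipeline-failure-to-slack",
    EventPattern=json.dumps({
        "source": ["aws.codepipeline"],
        "detail-type": ["CodePipeline Pipeline Execution State Change"],
        "detail": {"pipeline": [PIPELINE_NAME], "state": ["FAILED"]},
    }),
)

# Route matching events to the SNS topic; Chatbot forwards them to Slack.
events.put_targets(
    Rule="pipeline-failure-to-slack",
    Targets=[{"Id": "sns-devops-alerts", "Arn": TOPIC_ARN}],
)
```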
Container Metrics
Container health is handled at the orchestration level. We set up health checks that monitor specific ports and endpoints within our containers and check for a 200 or other predetermined response code.
Lambda Health Metrics
Lambda health is determined by the success or failure of the function’s invocations. The most important metrics are error count and success rate (%).
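A sketch of an error-count alarm on the reporting function, assuming an existing SNS alerting topic (both names are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder names for the function being monitored and the alerting topic.
FUNCTION_NAME = "lead-report-generator"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:devops-alerts"

# Alarm as soon as a single invocation errors within a five-minute window.
cloudwatch.put_metric_alarm(
    AlarmName=f"{FUNCTION_NAME}-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": FUNCTION_NAME}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[TOPIC_ARN],
)
```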
ELB Target Group Metrics
Unhealthy targets are those that are failing ELB health checks.
Metric Collection and Analytics
We consult clients on best practices for log and metric collection. For application-related logs we prefer an ELK stack, which takes advantage of Amazon Elasticsearch Service, Logstash running on EC2, and Kibana. This allows for complete security and granular control over log collection and visualization.
To automate alerting on unhealthy targets behind an Application, Network, or Classic Load Balancer, we consult our clients on the use of CloudWatch alarms, SNS notifications, and AWS Lambda. The Lambda function calls the describe_target_health API (or describe_instance_health for a Classic Load Balancer) to identify the failed target and the cause of the failure, then triggers an email notification via SNS with the discovered unhealthy host details.
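A condensed sketch of such a function, assuming a single Application Load Balancer target group and an existing alerting topic (both ARNs are placeholders):

```python
import boto3

elbv2 = boto3.client("elbv2")
sns = boto3.client("sns")

# Placeholders for the monitored target group and the alerting topic.
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app/abc123"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:devops-alerts"


def lambda_handler(event, context):
    """Triggered by the alarm; reports which targets are unhealthy and why."""
    health = elbv2.describe_target_health(TargetGroupArn=TARGET_GROUP_ARN)
    unhealthy = [
        t for t in health["TargetHealthDescriptions"]
        if t["TargetHealth"]["State"] != "healthy"
    ]
    if not unhealthy:
        return {"unhealthy": 0}

    lines = [
        f"{t['Target']['Id']}:{t['Target'].get('Port', '')} -> "
        f"{t['TargetHealth']['State']} "
        f"({t['TargetHealth'].get('Reason', 'n/a')}: {t['TargetHealth'].get('Description', '')})"
        for t in unhealthy
    ]
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Unhealthy ELB targets detected",
        Message="\n".join(lines),
    )
    return {"unhealthy": len(unhealthy)}
```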
We recommend Grafana running on EC2 with Prometheus for monitoring individual workloads running within a stack. EC2, RDS, container, EKS, and ECS metrics are collected by Prometheus and visualized via dashboards within Grafana.
In this particular case, we use a Lambda function to pipe logs from the Lambda application to an Amazon Elasticsearch cluster.
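A condensed sketch of that log-shipping function, assuming it is invoked by a CloudWatch Logs subscription filter on the application’s log group (the endpoint and index names are placeholders, and the final SigV4-signed _bulk request to the Elasticsearch domain is abbreviated to a comment):

```python
import base64
import gzip
import json

# Placeholders; the real function posts to the domain's /_bulk endpoint
# with a SigV4-signed HTTPS request.
ES_ENDPOINT = "https://example-domain.us-east-1.es.amazonaws.com"
ES_INDEX = "lambda-app-logs"


def lambda_handler(event, context):
    """Decode a CloudWatch Logs subscription payload into Elasticsearch bulk actions."""
    payload = json.loads(
        gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    )
    bulk_lines = []
    for log_event in payload["logEvents"]:
        bulk_lines.append(json.dumps({"index": {"_index": ES_INDEX}}))
        bulk_lines.append(json.dumps({
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
        }))
    bulk_body = "\n".join(bulk_lines) + "\n"
    # post_signed(ES_ENDPOINT + "/_bulk", bulk_body)  # SigV4-signed POST, omitted here
    return {"events": len(payload["logEvents"])}
```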
Operational Enablement
Enabling the client to manage and maintain the DevOps pipeline after handover is of the utmost importance. Our goal is to minimize the maintenance required through automation: every member of the development team should be able to simply push code, follow the development process, and know that their applications will be tested and rapidly deployed.
Training and handover are always included in scope. This process includes the development of documentation specific to the customer workload. It outlines the development lifecycle from source control and branching all the way through deployment.
We document how to version the IaC modules and templates that were developed and how to push out updates to the infrastructure.
We provide architecture diagrams that outline the branching strategy and Git workflow.
Lastly, we schedule a video conference and do a hands-on session with the client, where we go over how to push application updates through the development, staging, and production environments, and review the development workflow and branching strategy.
We show the client how to troubleshoot a failed pipeline build within CodeBuild, and where to find all relevant logs for the build and test stages within CodePipeline. Once a CI/CD automation pipeline is properly set up, the majority of DevOps-related troubleshooting will be found within the CodeBuild logs and fixed at the application layer.
During this video conference we outline common troubleshooting scenarios that the client will run into and show them how to effectively troubleshoot the workload.
We go over each component of the infrastructure and CI/CD pipeline that was developed with the client and allow them time to ask any questions.
Deployment Testing and Validation
Deployments are tested and validated through a promotion strategy. The only branch that deploys automatically without approval is the development branch, which is deployed to the isolated development environment. At this point the team QAs and validates application functionality and approves a promotion to the staging environment. A pull request is submitted to source control and merged into staging, and workloads are then deployed to the staging environment. After testing and validation in staging, a pull request is submitted from staging into master and merged. The master branch triggers a build and deployment to production via CodeBuild and CodePipeline.
Version Control
All code assets are version controlled within GitHub.
Application Workload and Telemetry
CloudWatch application logging is integrated by default into all of our container and serverless workloads. We include this as an in-scope item for all DevOps projects. It provides a centralized system where error logs are captured, which aids operational troubleshooting.
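Because Lambda streams anything written through Python’s standard logging module to CloudWatch Logs, the integration requires nothing beyond a configured logger. A minimal illustration (not the client’s actual handler):

```python
import logging

# Lambda forwards records written through the standard logging module
# to the function's CloudWatch Logs log group.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    logger.info("Report generation started")
    try:
        # ... fetch CRM data and build the report ...
        logger.info("Report generation finished")
    except Exception:
        # The stack trace lands in CloudWatch Logs for troubleshooting.
        logger.exception("Report generation failed")
        raise
```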
Security: Identity and Access Management
Access Requirements Defined
To discover access requirements, we look at the organizational units within the client’s business that need access to the DevOps infrastructure. We typically identify developers, systems engineers, security engineers, and stakeholders, and we follow previously defined best practices for each of these groups.
IAM groups are created for each of these organizational units and least-privilege access is applied to each. Each group is granted access only to what it actually requires.
No processes deployed to AWS infrastructure make use of static AWS credentials. All instances that call other AWS services use IAM roles. The only case where static AWS credentials are used to call AWS services is when a third-party integration cannot make use of assumed roles.
Each APN partner and user of the platform logs into AWS with a unique IAM user or federated login. No root access is permitted. A CloudWatch alarm triggers an SNS email notification any time the root user logs in.
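A sketch of that alarm, assuming CloudTrail is delivering events to a CloudWatch Logs log group (the log group, namespace, and topic names are placeholders):

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Placeholders for the CloudTrail log group and the alerting topic.
LOG_GROUP = "CloudTrail/DefaultLogGroup"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"

# Emit a metric data point whenever a root-user event appears in CloudTrail.
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="RootAccountUsage",
    filterPattern='{ $.userIdentity.type = "Root" && $.userIdentity.invokedBy NOT EXISTS '
                  '&& $.eventType != "AwsServiceEvent" }',
    metricTransformations=[{
        "metricName": "RootAccountUsageCount",
        "metricNamespace": "Security",
        "metricValue": "1",
    }],
)

# Alarm (and email via SNS) on any occurrence within a five-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="root-account-usage",
    Namespace="Security",
    MetricName="RootAccountUsageCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[TOPIC_ARN],
)
```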
Security IT / Operations:
Components which require encryption:
Lambda environment variables: these are encrypted at rest using KMS.
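If a variable is additionally encrypted with a customer-managed KMS key (the console’s encryption helpers), the function decrypts it at runtime. A minimal sketch, assuming a hypothetical SLACK_WEBHOOK_URL variable holding the base64-encoded ciphertext:

```python
import base64
import os
import boto3

kms = boto3.client("kms")

# Hypothetical variable name; the value is the base64-encoded ciphertext
# produced when the variable is encrypted with a customer-managed key.
ENCRYPTED_WEBHOOK = os.environ["SLACK_WEBHOOK_URL"]

# Decrypt once per container, outside the handler, so warm invocations reuse it.
DECRYPTED_WEBHOOK = kms.decrypt(
    CiphertextBlob=base64.b64decode(ENCRYPTED_WEBHOOK),
    EncryptionContext={
        "LambdaFunctionName": os.environ["AWS_LAMBDA_FUNCTION_NAME"]
    },
)["Plaintext"].decode("utf-8")


def lambda_handler(event, context):
    # Use DECRYPTED_WEBHOOK to post the report to Slack.
    return {"statusCode": 200}
```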
AWS API Integration
The AWS CLI is used for all programmatic access.
Big Data Reliability
Deployment Automation
The deployment process is fully automated. When we merge a change from development into the master branch in GitHub, CodePipeline is triggered. CodePipeline first runs CodeBuild, which compiles application dependencies via pip and requirements.txt, then creates an artifact and a CloudFormation template that drives the deployment of the serverless function via CloudFormation. We use change sets, which CodePipeline then executes automatically.
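For reference, the change-set step that CodePipeline performs is roughly equivalent to the following boto3 calls (the stack name and template URL are placeholders):

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Placeholders for the stack and the packaged template produced by CodeBuild.
STACK_NAME = "lead-report-service"
TEMPLATE_URL = "https://example-artifacts.s3.amazonaws.com/packaged-template.yaml"

# Stage the proposed changes as a change set ...
cloudformation.create_change_set(
    StackName=STACK_NAME,
    ChangeSetName="pipeline-deploy",
    TemplateURL=TEMPLATE_URL,
    Capabilities=["CAPABILITY_IAM"],
)

# ... wait until it is ready, then execute it to deploy the update.
waiter = cloudformation.get_waiter("change_set_create_complete")
waiter.wait(StackName=STACK_NAME, ChangeSetName="pipeline-deploy")
cloudformation.execute_change_set(
    StackName=STACK_NAME, ChangeSetName="pipeline-deploy"
)
```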
Availability Requirements
RTO: Application reports run 3x daily, at 8PM, 2PM, and 9PM.
The application can be down for a maximum of 14 hours without causing significant harm to the business.
RPO: 24 Hours
Data is backed up every 24 hours, so the worst case scenario is that we lose a day.
Adapts to Changes in Big Data Demand
This application uses Lambda, which scales in response to demand. Reports run only 3x daily, which does not warrant the use of provisioned concurrency.
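The schedule itself is a CloudWatch Events (EventBridge) cron rule. A sketch, assuming the three run times are expressed in UTC (the rule name, function ARN, and hour values are placeholders or assumptions to adjust to the client’s time zone):

```python
import boto3

events = boto3.client("events")
awslambda = boto3.client("lambda")

# Placeholders for the rule and the reporting function.
RULE_NAME = "lead-report-schedule"
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:lead-report-generator"

# Run at 2PM, 8PM, and 9PM UTC (assumed hours; adjust for the real time zone).
rule_arn = events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression="cron(0 14,20,21 * * ? *)",
)["RuleArn"]

events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "report-function", "Arn": FUNCTION_ARN}],
)

# Allow the rule to invoke the function.
awslambda.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="allow-eventbridge-schedule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)
```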
Cost Optimization
Cost Modelling
We deployed the workload into a development Lambda environment to test it. We ran the workload, recorded the execution time, and estimated the cost using the AWS Pricing Calculator. We multiplied this by 90 because the report runs 90 times per month (three runs per day). This particular function falls within the AWS Free Tier.
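A back-of-the-envelope version of that estimate; the duration, memory size, and prices below are illustrative assumptions rather than measured values:

```python
# Illustrative Lambda cost estimate; all inputs are assumptions, not measurements.
RUNS_PER_MONTH = 90                    # 3 runs per day x 30 days
DURATION_SECONDS = 30                  # assumed execution time per run
MEMORY_GB = 0.125                      # assumed 128 MB memory allocation
PRICE_PER_GB_SECOND = 0.0000166667     # on-demand Lambda compute price (us-east-1)
PRICE_PER_REQUEST = 0.20 / 1_000_000   # on-demand Lambda request price

gb_seconds = RUNS_PER_MONTH * DURATION_SECONDS * MEMORY_GB
monthly_cost = gb_seconds * PRICE_PER_GB_SECOND + RUNS_PER_MONTH * PRICE_PER_REQUEST

# ~337.5 GB-seconds and 90 requests per month sit well inside the Free Tier
# (400,000 GB-seconds and 1,000,000 requests), so the effective cost is $0.
print(f"{gb_seconds:.1f} GB-seconds, ${monthly_cost:.4f}/month before Free Tier")
```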
We found that the client was losing $1,095,000 per year and that this solution would cost $10,000 in design and implementation. This amounts to a 108x ROI in the first year.
Looking to implement DevOps with Big Data? Contact one of our Big Data scientists today.