|Case Study Title||TalkMap EKS Container Migration|
|Case Study Short Description||How Triumph Tech helped Discourse.ai, an AI- and ML-powered business insights company, use DevOps and containers to accelerate the development process, automate security scanning and compliance, and migrate off their legacy infrastructure.|
|Problem / Statement Definition||Discourse.ai needed to modernize quickly by migrating off their existing legacy infrastructure. They required increased scalability and multiple mirrored environments to support workload promotion through the stages of their DevOps software development life cycle (SDLC).
Discourse.ai provides a near-real-time business intelligence service that must scale quickly with demand. They are also at the forefront of the latest AI and ML advancements and need the flexibility to innovate rapidly.
|Proposed Solution and Architecture||Triumph Tech used containers along with Amazon Elastic Kubernetes Service (EKS) to provide a step change in Discourse.ai’s existing DevOps and SDLC process, empowering their development team to rapidly test, validate, and deploy new innovations while a high degree of automated testing and validation ensures quality.
Triumph selected AWS to provide the resources required to solve the problem. We used eksctl along with CloudFormation to deploy all of our resources. This included Dev, Test, QA, and Production environments, each consisting of:
● EKS Cluster
● A dedicated per-environment VPC with public and private subnets, enabling deployment of workloads across multiple Availability Zones and leveraging NAT Gateways with fixed Elastic IPs for use with allow lists.
● A least-privilege approach to Security Groups, separating ingress, control, and workload traffic.
● Tightly scoped IAM roles and supporting policies adhering to AWS best practices.
● EKS node groups with EC2 Auto Scaling for strong cost-performance.
● ML processing maximized with NVIDIA V100 Tensor Core GPUs in the P3 instance type, with EC2 Auto Scaling groups ensuring the best cost benefit.
● EC2 Classic Load Balancers to provide seamless integration with the Kubernetes deployment workflow.
● EFS, providing persistent volumes to the containers running inside the EKS cluster.
● KMS, used to ensure data at rest is always encrypted.
● AWS Backup, to protect critical data stored within EFS should the worst happen.
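As an illustration of the environment build described above, an eksctl ClusterConfig along these lines could describe one such environment. All names, instance counts, and sizes here are hypothetical placeholders, not values from the actual deployment:

```yaml
# Illustrative eksctl ClusterConfig sketch (placeholder values)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: discourse-dev          # hypothetical per-environment cluster name
  region: us-east-2
vpc:
  nat:
    gateway: Single            # NAT Gateway with a fixed Elastic IP
nodeGroups:
  - name: general
    instanceType: m5.large
    minSize: 2
    maxSize: 6
    privateNetworking: true    # workloads run in private subnets
  - name: ml-gpu
    instanceType: p3.2xlarge   # NVIDIA V100 Tensor Core GPUs
    minSize: 0
    maxSize: 4
    privateNetworking: true
```

A config file like this would be applied with `eksctl create cluster -f cluster.yaml`, which generates the underlying CloudFormation stacks, making the deployment repeatable across the Dev, Test, QA, and Production environments.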
Container builds and deployments throughout the SDLC are managed by an integrated, self-managed GitLab server hosted on EC2.
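A container build stage in a GitLab CI pipeline of this kind might look roughly like the following sketch; the use of GitLab's built-in registry variables is an assumption for illustration, not the customer's actual configuration:

```yaml
# .gitlab-ci.yml fragment: build and publish a container image (illustrative)
stages:
  - build

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind           # Docker-in-Docker for image builds
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```

Tagging each image with the commit SHA keeps every build traceable back to the source revision as it is promoted through the SDLC.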
|Outcomes of Project & Success Metrics||We did a deep dive into the legacy environment to understand the existing application stack and SDLC.
We discovered that the current architecture was hindering rapid innovation and needed to be migrated to a modernized infrastructure to give Discourse.ai much needed technical advantages.
Project success was therefore measured by migration into an environment that enables rapid innovation while providing automated testing in isolated environments. Deployments were automated, security scans were automated, and, most importantly, developers can release new innovations more easily and quickly, providing real ROI and growth opportunity.
|Describe TCO Analysis Performed||TCO analysis was performed by collecting data from the AWS Billing Console.|
|Lessons Learned||Automated solutions are the optimal way to reduce time to market.
Eksctl combined with CloudFormation offers repeatable deployments for multiple customer environments.
EC2 Auto Scaling offers the flexibility needed to keep business insights near real time whilst keeping costs under control by aligning capacity with demand.
|Summary Of Customer Environment||The environment is cloud native: the entire stack runs on Amazon Web Services and is deployed in the us-east-2 region.|
Deployments are tested and validated through a promotion strategy. The only branch which automatically deploys without approval is the development branch, which is deployed to the isolated staging environment. At this point, the team will QA and validate application functionality and approve a promotion to the production environment. A pull request is submitted to source control and merged into Master. Workloads are then deployed to the production environment.
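The branch-based promotion flow described above can be expressed with GitLab CI `rules`, sketched here with hypothetical job names, namespaces, and manifest paths:

```yaml
# .gitlab-ci.yml fragment: branch-based promotion strategy (illustrative)
deploy-staging:
  stage: deploy
  script:
    - kubectl apply -f k8s/ --namespace staging
  rules:
    - if: '$CI_COMMIT_BRANCH == "development"'   # auto-deploys without approval

deploy-production:
  stage: deploy
  script:
    - kubectl apply -f k8s/ --namespace production
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'        # runs only after the approved merge
```

Because the production job is keyed to the master branch, the pull-request merge itself acts as the approval gate for promotion.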
In both the staging and production environments, security scanning is automated via a vulnerability scanner called Trivy. If a critical issue is found, Trivy will produce a non-zero exit code and the build will fail, so no container image artifact is published to EKS. The client must patch the vulnerability, and the vulnerability scan must pass, before the image can be published to EKS.
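A Trivy scan job enforcing this gate could be sketched as follows; the stage name and image reference are illustrative assumptions:

```yaml
# .gitlab-ci.yml fragment: fail the pipeline on critical vulnerabilities (illustrative)
security-scan:
  stage: test
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  script:
    # --exit-code 1 makes Trivy return non-zero when CRITICAL findings exist,
    # which fails the build and blocks publication of the image
    - trivy image --exit-code 1 --severity CRITICAL "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```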
All code assets are version controlled within GitHub.
All CloudFormation assets are stored within AWS CodeCommit.
CloudWatch application logging is integrated by default into all of our container and serverless workloads. We include this as an “in scope” item for all modernization and migration projects. This provides a centralized system where error logs are captured, aiding operational troubleshooting.
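One common way to ship container logs from EKS to CloudWatch is a Fluent Bit output section along these lines; the log group name and match pattern are hypothetical placeholders, and this is a sketch of the general pattern rather than the project's actual configuration:

```
[OUTPUT]
    Name              cloudwatch_logs
    Match             kube.*
    region            us-east-2
    log_group_name    /eks/application-logs
    log_stream_prefix pod-
    auto_create_group On
```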