Next-generation data ingestion platform on AWS for the world’s leading health and security services company
About the Company
The client is the world’s leading health and security services firm, with nearly two-thirds of Fortune Global 500 companies as its clients. The company provides multi-cultural health, security, and logistics services and support from over 1,000 locations in 85 countries.
On its journey to develop competitive products and services, the company wanted to design and build a modern data ingestion capability and revamp its legacy ETL, data storage, and data consumption platforms. It needed a partner who could help design and develop a next-generation data ingestion platform.
One of the goals was to design and develop a decision engine by consolidating and streamlining content ingestion tools, processes, and operations while ensuring data integrity and quality. The company’s customer data arrived over multiple ingress protocols and had to be ingested into a cloud-native enterprise data lake. The company also wanted to design and integrate high-performance APIs for ingestion and consumption, serving analytics and visualization tools, third-party integrations, and web and mobile applications.
What we did
Data Governance Model
Data Catalog, Data Integration and Ingestion
Enterprise Data Lake
AWS Data Migration
Cloud Managed Services
Our client was poised to embrace the public cloud for the first time with the help of CompuGain experts. AWS was the recommended data ingestion platform for flexibility, reliability, and scalability.
Our team divided the solution architecture into three distinct parts:
- Ingress mechanism: secure API and SFTP.
- Data pipeline: serverless ETL pipeline.
- Data storage: Elasticsearch, a cloud-native data lake, and an application database for consumption.
Amazon Aurora with PostgreSQL 11 compatibility was the relational database chosen for data storage. It provided out-of-the-box capabilities for the non-functional requirements, allowing the team to focus on the business problem at hand.
- We designed and set up multi-region (US Northeast region and AWS France region) and multi-Availability Zone infrastructure to ensure data security, availability, and GDPR compliance.
- We created a data governance framework covering multiple environments and IAM.
- We designed and implemented an AWS Glue job that performs complex interactions with Elasticsearch and inserts, updates, and retrieves data in the employee RDBMS.
- We used AWS Data Wrangler within the Python Glue job, with SQLAlchemy for object-relational mapping (ORM).
- We worked closely with the business owners to redesign and migrate the updated schema and reference data tables, and to refactor the application consumption queries, from the legacy SQL Server to Aurora PostgreSQL 11.
- We implemented multiple secure ingress push/pull mechanisms that process numerous file formats, record structures, and data types. Ingested data becomes available for consumption with little to no tweaking.
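The ingestion steps above can be illustrated with a minimal sketch. It is not the code delivered in this engagement: it uses only the Python standard library, with an in-memory SQLite database standing in for Aurora PostgreSQL, instead of AWS Glue, AWS Data Wrangler, or SQLAlchemy. The `employee` table, its three columns, the file extensions, and all function names are assumptions made for illustration. The sketch shows how files in different formats might be parsed into one common record shape and then upserted, so that a later delivery for the same employee updates the existing row.

```python
import csv
import io
import json
import os
import sqlite3

# Hypothetical common record shape shared by every ingested format.
FIELDS = ("employee_id", "full_name", "country")


def parse_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Map a (hypothetical) file extension to its parser.
PARSERS = {".csv": parse_csv, ".jsonl": parse_json_lines}


def normalize(record):
    """Coerce one raw record into the common shape, trimming whitespace."""
    return tuple(str(record[field]).strip() for field in FIELDS)


def make_db():
    """Create an in-memory SQLite database (stand-in for Aurora PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "employee_id TEXT PRIMARY KEY, full_name TEXT, country TEXT)"
    )
    return conn


def ingest(conn, filename, text):
    """Parse one delivered file, upsert its records, and return the row count."""
    suffix = os.path.splitext(filename)[1].lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"unsupported file format: {filename}")
    rows = [normalize(record) for record in parser(text)]
    # INSERT OR REPLACE is SQLite's upsert; on PostgreSQL the equivalent
    # would be INSERT ... ON CONFLICT (employee_id) DO UPDATE.
    conn.executemany(
        "INSERT OR REPLACE INTO employee (employee_id, full_name, country) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    db = make_db()
    ingest(db, "batch1.csv", "employee_id,full_name,country\nE1, Ada Lovelace ,GB\n")
    ingest(db, "batch2.jsonl", '{"employee_id": "E2", "full_name": "Grace Hopper", "country": "US"}\n')
    for row in db.execute("SELECT * FROM employee ORDER BY employee_id"):
        print(row)
```

Keeping the parsers in a lookup table means a new file format only needs one more parser function; the normalization and storage steps stay unchanged.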
Results and Outcomes
With the new architecture, the client was able to embrace the AWS cloud, establish multi-region cloud landing zones, and develop and productize a solution for employees and customers in less than four months.
improvement in the quality of data ingested
million client employee data ingested for on-boarding 4500 enterprise clients
compliance for the architecture and design
adherence to data privacy, security and compliance requirements
- Cloud-native digital transformation foundation setup for the Enterprise
- More competitive client products and services, built on the next-generation data ingestion platform
- Data available 24x7x365 with automatic backups
- Multiple parallel data ingestion streams across clients and ingress protocols with high availability
- Custom database for onboarding and authenticating clients through Okta, serving all applications
- Consolidated and streamlined content ingest tools, processes, and operations
- Ability to design and integrate high-performance APIs for ingestion and consumption
- Available across the world in accordance with region-specific compliance regulations
- Improved geospatial support through PostgreSQL 11 and PostGIS