Designing Big Data Solutions with AWS Cloud

Northbay Solutions (northbaysolutions.com) is a premier US-based consulting partner specializing in advanced big data solutions on AWS (Amazon Web Services). Northbay Solutions began its offshore consulting in Lahore, Pakistan, and Auxenta later joined as its second offshore consulting partner for AWS Big Data consultancy.

The Challenge

The challenge was to learn and adopt new technologies in a short time span in order to serve clients as a consulting partner.

How we helped

Nearly 10 members of our cloud practice obtained AWS certification and completed rigorous AWS Big Data training with the help of experts from Northbay USA and Northbay Pakistan. During this period the team built expertise in the core AWS services and, above all, in the AWS Big Data services: Redshift, Lambda, Kinesis Streams/Firehose, Data Pipeline, EMR, ECS, Data Lake design on S3, CloudFormation, AWS Glue, API Gateway, IAM with STS, and notification services via SNS, along with real-time streaming with Apache Spark and Apache Flink.

Auxenta has been assisting the Northbay Pakistan offshore team in multiple engagements.

Data Warehousing Optimization: We were engaged to produce an optimization report analyzing a real data warehouse belonging to a US health care product client. The warehouse ran on Amazon Redshift, and the Auxenta team applied various data warehouse optimization techniques to identify gaps in the client's design and usage.

Data Lake Design: We worked with clients specializing in financial research, marketing and investment management on Data Lake design. Their inability to handle terabytes of data ingested at a rapid pace was the key reason these clients investigated a Data Lake design. AWS was the preferred cloud option, and the proposed solution combined multiple AWS Big Data services such as S3, Kinesis Firehose, Lambda, Step Functions, and EMR with Spark/Flink.

The Data Lake Design helped these organizations:
- Support any workload regardless of volume, velocity or variety of data
- Use a variety of descriptive, predictive and prescriptive analytics for business insights

The Solution

  • The basic Data Lake architecture design on AWS comprises five main components. Those components and the related AWS service offerings are listed below.

    1. Data Ingestion (Kinesis, Direct Connect, Snowball, Database Migration Service)
    2. Catalogue and Search (DynamoDB, Elasticsearch)
    3. Protect and Secure (IAM, STS, CloudWatch, CloudTrail, KMS)
    4. Access and User Interface (API Gateway, IAM, Cognito)
    5. Processing and Analytics (Machine Learning, QuickSight, EMR, Redshift)


Figure 1: The AWS Data Lake Design (Source: AWS Documentation)
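
To make the first two components more concrete, the following is a minimal Python sketch of how ingestion and cataloguing relate to each other. It is an illustration only, not the design delivered to any client: the dictionary stands in for the role S3 plays, the list stands in for the role DynamoDB or Elasticsearch plays, and the names `ingest`, `search`, `object_store`, `catalogue` and the date-partitioned key layout are all assumptions made for this example.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins: a dict plays the object store's role
# and a list plays the catalogue's role. Neither is a real AWS client.
object_store = {}
catalogue = []


def ingest(source, payload, received_at):
    """Store the raw payload unchanged and record a catalogue entry for it."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    # Date-partitioned key: an assumed layout, chosen only for illustration.
    key = f"raw/{source}/{received_at:%Y/%m/%d}/{digest[:12]}.json"
    object_store[key] = body
    catalogue.append(
        {
            "key": key,
            "source": source,
            "size_bytes": len(body),
            "fields": sorted(payload),  # top-level field names only
            "ingested_at": received_at.isoformat(),
        }
    )
    return key


def search(field):
    """Return the keys of all objects whose payload contained the field."""
    return [entry["key"] for entry in catalogue if field in entry["fields"]]


ts = datetime(2020, 1, 2, tzinfo=timezone.utc)
trade_key = ingest("trades", {"ticker": "ABC", "price": 10.5}, ts)
click_key = ingest("clicks", {"user": "u1", "page": "/home"}, ts)
matching_keys = search("ticker")  # contains only trade_key
```

The point of the sketch is that the raw object is never reshaped on the way in; only the catalogue entry describes it, which is what lets very different data sources share one store.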

Benefits to the Clients

With the help of the Data Lake design approach, the clients were able to:

1. Quickly ingest data without adhering to a pre-defined schema
2. Analyze all their data from a centralized location
3. Run ad-hoc analysis by applying schemas on read, yielding business insights more quickly
4. Improve time to value and reduce TCO
5. Gain the assurance of building on the most complete platform for Big Data
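
The idea of applying a schema on read can be shown with a short Python sketch. It assumes raw events are stored as JSON lines exactly as they arrived; the sample records, the `PRICE_SCHEMA` mapping and the `read_with_schema` helper are hypothetical names invented for this example and are not taken from any client solution.

```python
import json

# Raw events kept exactly as ingested; records need not share a structure.
RAW_LINES = [
    '{"ticker": "ABC", "price": "10.5", "venue": "X"}',
    '{"ticker": "XYZ", "price": 7}',
    '{"ticker": "DEF"}',
]

# A schema here is just field name -> converter, chosen by the reader
# at query time rather than enforced when the data was written.
PRICE_SCHEMA = {"ticker": str, "price": float}
TICKER_SCHEMA = {"ticker": str}


def read_with_schema(lines, schema):
    """Yield one dict per line that fits the schema; skip lines that do not."""
    for line in lines:
        record = json.loads(line)
        try:
            row = {name: cast(record[name]) for name, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # a field is missing or its value cannot be converted
        yield row


price_rows = list(read_with_schema(RAW_LINES, PRICE_SCHEMA))
ticker_rows = list(read_with_schema(RAW_LINES, TICKER_SCHEMA))
```

The same stored lines answer two different questions: `price_rows` keeps the two records that carry a usable price, while `ticker_rows` keeps all three. No data had to be reloaded or migrated to support the second view, which is the practical meaning of ingesting without a pre-defined schema.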