This diagram shows a typical batch processing solution on AWS with Amazon S3, AWS Lambda, Amazon EMR and Amazon Redshift:
- Amazon S3 is used to store staging data extracted from source systems, whether on premises or in the cloud.
- AWS Lambda is used to register the arrival of data in S3 buckets with the ETL framework and to trigger the batch process.
- Amazon EMR is then used to transform the data (for example, with aggregations) and load the results.
- Amazon Redshift is then used to store the transformed data.
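The Lambda step above can be sketched as follows. This is a minimal illustration, assuming the function is subscribed to S3 event notifications; `start_batch_job` is a hypothetical placeholder for whatever call registers the arrival with your ETL framework or submits the batch job, and it is not part of any AWS API.

```python
from urllib.parse import unquote_plus


def extract_arrivals(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    arrivals = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        arrivals.append((bucket, key))
    return arrivals


def start_batch_job(bucket, key):
    """Hypothetical placeholder: register the arrival and trigger the batch.

    A real implementation would call the ETL framework or submit a job
    to the processing cluster here.
    """
    return {"status": "submitted", "input": f"s3://{bucket}/{key}"}


def lambda_handler(event, context, trigger=start_batch_job):
    """Lambda entry point: trigger one batch job per newly arrived object."""
    return [trigger(bucket, key) for bucket, key in extract_arrivals(event)]
```

Passing the trigger as a parameter keeps the event parsing testable without any AWS dependency; Lambda itself only supplies `event` and `context`.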
This pattern follows the traditional ETL approach. You can also change it to an ELT pattern by loading the staged data first and running the transformations directly in Redshift. Amazon EMR can be replaced with other data processing services.
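For the ELT variant, one possible shape is to load the staged files into Redshift with a `COPY` statement and then aggregate in SQL. The sketch below only builds the two statements; the table names, the `order_date` and `amount` columns, the S3 URI, and the IAM role ARN are all illustrative assumptions, and the statements would still need to be run through whichever Redshift client you use.

```python
def build_elt_statements(staging_table, target_table, s3_uri, iam_role_arn):
    """Build a load statement and a transform statement for an ELT run.

    Identifiers are interpolated directly, so this is only suitable for
    trusted, hard-coded names; it is a sketch, not production code.
    """
    # Step 1 (load): copy raw staged CSV files from S3 into a staging table.
    copy_sql = (
        f"COPY {staging_table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV IGNOREHEADER 1;"
    )
    # Step 2 (transform): aggregate inside Redshift (hypothetical columns).
    transform_sql = (
        f"INSERT INTO {target_table} (order_date, total_amount) "
        f"SELECT order_date, SUM(amount) FROM {staging_table} "
        f"GROUP BY order_date;"
    )
    return [copy_sql, transform_sql]
```

A call such as `build_elt_statements("staging_orders", "daily_totals", "s3://staging-bucket/orders/", "arn:aws:iam::123456789012:role/ExampleRole")` returns the statements in the order they should run: load first, then transform.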