aws
17 items tagged with "aws"
Articles
AWS CDK Python - Add Environment Variables for CodeBuild Pipeline
PySpark - Read Parquet Files in S3
This code snippet provides an example of reading Parquet files located in S3 buckets on AWS (Amazon Web Services). The bucket used holds the New York City taxi trip record data; the object location is `s3a://ursa-labs-taxi-data/2009/01/data.parquet`. To run the script, we need to set up a dependency on the Hadoop AWS package, for example `org.apache.hadoop:hadoop-aws:3.3.0`. This can easily be done by passing a configuration argument to `spark-submit`: `spark-submit --conf spark.jars.packages=org.apache.hadoop:hadoop-aws:3.3.0`. It can also be done via `SparkConf`: `conf.set('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.3.0')`. Use temporary AWS credentials: in this code snippet, `AnonymousAWSCredentialsProvider` is used. If the bucket is not public, we can use `TemporaryAWSCredentialsProvider` instead: `conf.set('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider')`, then set `spark.hadoop.fs.s3a.access.key`, `spark.hadoop.fs.s3a.secret.key` and `spark.hadoop.fs.s3a.session.token` to your temporary credentials. If you have used the AWS CLI or SAML tools to cache local credentials (`~/.aws/credentials`), you don't need to specify the access keys, assuming the cached credentials have access to the S3 bucket you are reading from.
AWS EMR Debug - Container release on a *lost* node
EMR - Expected schema-specific part at index : s3:
AWS Certified Cloud Practitioner Notes
Diagrams
AWS Elastic Path Based Listener
This diagram shows how to create path-based routing on an ELB with ECS. Requests for different paths are routed to different services in ECS. Reference: Achieve path-based routing on an Application Load Balancer | AWS re:Post (repost.aws)
PySpark Reading from S3
This diagram is used as an article feature image; it depicts reading data from an S3 bucket via PySpark.
AWS EMR Read and Write with S3
This diagram shows a typical EMR application that reads and writes data with S3. Reference: EMR File System (EMRFS) - Amazon EMR
AWS ETL Solution with Glue Diagram
This diagram shows one example of using AWS Glue to crawl, catalog and transform data stored in S3. Data landing in the raw bucket is scanned by a Glue Crawler and its metadata is stored in the Glue Catalog. A Glue ETL job loads the raw data, performs transformations and stores the processed data in the curated bucket. The processed files are scanned by another Glue Crawler, and the processed data is then queried by Amazon Athena. The data can be further used in reporting and dashboards.
AWS Streaming Processing Diagrams
This diagram is used as the feature image for the AWS streaming processing diagram series.
AWS Batch Processing Diagrams
This diagram is used as the feature image for the AWS batch processing diagram series.
AWS Big Data Lambda Architecture for Streaming Analytics
This diagram shows a typical Lambda-architecture streaming solution on AWS with Amazon Kinesis, AWS Glue, Amazon S3, Amazon Athena and Amazon QuickSight: Amazon Kinesis - captures streaming data via Data Firehose, then transforms and analyzes it using Data Analytics; the analytics results are delivered to another Data Firehose stream; for batch processing, the captured streaming data can also be loaded directly into an S3 bucket. Amazon S3 - stores the raw streaming data and the batch-processed data. AWS Glue - transforms batch data in S3 and stores the processed data in another bucket for consumption. Amazon Athena - used to query data in S3 via SQL. Amazon QuickSight - data visualization tool. References: AWS IoT Streaming Processing Solution Diagram, AWS IoT Streaming Processing Solution Diagram w Glue
AWS IoT Streaming Processing Solution Diagram w Glue
This diagram shows a typical streaming processing solution on AWS with Amazon Kinesis, AWS Glue, Amazon S3, Amazon Athena and Amazon QuickSight: Amazon Kinesis - captures streaming data via Data Firehose and loads it into S3. Amazon S3 - stores the raw streaming data and the batch-processed data. AWS Glue - transforms batch data in S3 and stores the processed data in another bucket for consumption. Amazon Athena - used to query data in S3 via SQL. Amazon QuickSight - data visualization tool. A similar solution diagram using streaming transformation: AWS IoT Streaming Processing Solution Diagram.
AWS IoT Streaming Processing Solution Diagram
This diagram shows a typical streaming processing solution on AWS with Amazon Kinesis, Amazon S3, Amazon Athena and Amazon QuickSight: Amazon Kinesis - captures streaming data via Data Firehose, then transforms and analyzes it using Data Analytics; the analytics results are delivered to another Data Firehose stream. Amazon S3 - the processed streaming data is stored in Amazon S3. Amazon Athena - used to query data in S3 via SQL. Amazon QuickSight - data visualization tool.
AWS Batch Processing Solution Diagram (using AWS Glue)
This diagram shows a typical batch processing solution on AWS with Amazon S3, AWS Lambda, AWS Glue and Amazon Redshift: Amazon S3 is used to store staging data extracted from source systems on-premises or in the cloud. AWS Lambda is used to register data arrival in S3 buckets with ETL frameworks and trigger the batch process. AWS Glue is then used to integrate the data (merging, sorting, filtering, aggregations and other transformations) and load it. Amazon Redshift is then used to store the transformed data. This diagram is forked from AWS Batch Processing Solution Diagram
AWS Batch Processing Solution Diagram
This diagram shows a typical batch processing solution on AWS with Amazon S3, AWS Lambda, Amazon EMR and Amazon Redshift: Amazon S3 is used to store staging data extracted from source systems on-premises or in the cloud. AWS Lambda is used to register data arrival in S3 buckets with ETL frameworks and trigger the batch process. Amazon EMR is then used to transform the data (for example, aggregations) and load it. Amazon Redshift is then used to store the transformed data. This pattern follows the traditional ETL approach; you can also change it to an ELT pattern and do the transformations directly in Redshift. Amazon EMR can be replaced with many other products.
Kontext Cloud Diagram Example
This diagram is created for testing purposes, to validate whether the software can draw diagrams with Azure, GCP and AWS product SVG icons correctly.