- Step 1: Create an Amazon EC2 Key Pair.
- Step 2: Launch an Amazon EMR Cluster.
- Step 3: Connect to the Master Node.
- Step 4: Load Data into HDFS.
- Step 5: Copy Data to DynamoDB.
- Step 6: Query the Data in the DynamoDB Table.
- Step 7: (Optional) Clean Up.
The basic difference between MongoDB and DynamoDB is that DynamoDB is a fully managed NoSQL database service available only on AWS, whereas MongoDB is database software that you can install and operate yourself. Even though both are NoSQL databases, they differ considerably in installation, maintenance, performance, and other areas.
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. With DynamoDB, you can create database tables that can store and retrieve any amount of data and serve any level of request traffic.
A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility.
DynamoDB charges per GB of disk space a table consumes. The first 25 GB consumed per month is free, and prices start at $0.25 per GB-month thereafter.
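The storage charge described above is simple arithmetic. The sketch below is a minimal estimate that assumes the two figures quoted (25 GB free per month, then a flat $0.25 per GB-month). Real prices vary by region and table class, and read/write request charges are ignored entirely.

```python
# Assumed figures, taken from the text above: first 25 GB per month free,
# then a flat $0.25 per GB-month. Real prices vary by region and table class.
FREE_TIER_GB = 25.0
PRICE_PER_GB_MONTH = 0.25


def monthly_storage_cost(gb_stored: float) -> float:
    """Estimate one month's DynamoDB storage charge in USD (storage only)."""
    billable_gb = max(0.0, gb_stored - FREE_TIER_GB)  # the free tier covers the first 25 GB
    return billable_gb * PRICE_PER_GB_MONTH


if __name__ == "__main__":
    for gb in (10, 25, 100):
        print(f"{gb} GB -> ${monthly_storage_cost(gb):.2f}")
```

Under these assumptions, a table holding 100 GB for a month is billed for 75 GB, i.e. 75 × $0.25 = $18.75.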
Amazon DynamoDB is based on the principles of Dynamo, a progenitor of NoSQL, and brings the power of the cloud to the NoSQL database world. It offers customers high availability, reliability, and incremental scalability, with no limits on dataset size or request throughput for a given table.
Linear Scalability
DynamoDB supports automatic sharding and load balancing. This allows applications to store ever-growing amounts of data transparently. The linear scalability of DynamoDB suits applications that need to handle growing datasets and IOPS requirements.
DynamoDB is the serverless NoSQL database offering from AWS. Being serverless makes DynamoDB a natural candidate for serverless microservices, since it fits the patterns and practices used when designing serverless architectures on AWS.
DynamoDB is an Amazon Web Services database service that supports key-value and document data models. It gives users auto scaling, in-memory caching (through DynamoDB Accelerator, DAX), and backup and restore options for their internet-scale applications.
675 companies reportedly use Amazon DynamoDB in their tech stacks, including Netflix, Amazon, and medium.com.
You can use DynamoDB Streams to capture the data modification events in DynamoDB tables.
A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table. When a stream is enabled on a table, DynamoDB Streams captures a time-ordered sequence of item-level modifications and stores this information in a log for up to 24 hours.
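To make "time-ordered log with 24-hour retention" concrete, here is a toy in-memory model. It is not the DynamoDB Streams API: the `ToyStream` and `StreamRecord` names are hypothetical, and only the event names (`INSERT`, `MODIFY`, `REMOVE`) mirror what real stream records carry. The sketch assumes changes are appended in time order.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Retention window taken from the text: stream records are kept for up to 24 hours.
RETENTION = timedelta(hours=24)


@dataclass
class StreamRecord:
    sequence_number: int  # increases with every change, preserving order
    timestamp: datetime
    event_name: str  # "INSERT", "MODIFY", or "REMOVE"
    keys: dict


class ToyStream:
    """In-memory model of a time-ordered change log; not the DynamoDB Streams API."""

    def __init__(self) -> None:
        self._records: deque = deque()
        self._next_seq = 1

    def append(self, event_name: str, keys: dict, now: datetime) -> StreamRecord:
        # Assumes changes arrive in time order, so the oldest record is always on the left.
        record = StreamRecord(self._next_seq, now, event_name, keys)
        self._next_seq += 1
        self._records.append(record)
        self._expire(now)
        return record

    def read(self, now: datetime) -> list:
        self._expire(now)
        return list(self._records)

    def _expire(self, now: datetime) -> None:
        # Drop records older than the retention window.
        while self._records and now - self._records[0].timestamp > RETENTION:
            self._records.popleft()


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    stream = ToyStream()
    stream.append("INSERT", {"id": "42"}, t0)
    stream.append("MODIFY", {"id": "42"}, t0 + timedelta(hours=1))
    # 25 hours later the INSERT has aged out; the MODIFY is exactly 24 hours old and kept.
    print([r.event_name for r in stream.read(t0 + timedelta(hours=25))])
```

A deque fits here because new changes are appended on the right and expired ones are trimmed from the left.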
SQL constraints are rules used to limit the type of data that can go into a table, in order to maintain the accuracy and integrity of the data in that table. Constraints can be divided into two types: column-level constraints, which limit only the data in a single column, and table-level constraints, which apply to the table as a whole.
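The difference between the two kinds of constraint shows up in where they are declared. A column-level constraint is written next to one column, and a table-level constraint is written separately and may involve several columns. The sketch below uses SQLite only because it ships with Python; the `products` table and its columns are hypothetical, and constraint support differs between databases.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY,               -- column-level: unique row id
            name       TEXT NOT NULL,                     -- column-level: value required
            price      REAL NOT NULL CHECK (price >= 0),  -- column-level: checks one column
            discount   REAL NOT NULL DEFAULT 0,
            CHECK (discount <= price)                     -- table-level: compares two columns
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO products (product_id, name, price, discount) VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

For instance, `(3, "gadget", 5.0, 9.0)` passes every column-level check but is rejected by the table-level `CHECK (discount <= price)`.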
- Step 1: Create a cluster.
- Step 2: Download the data files.
- Step 3: Upload the files to an Amazon S3 bucket.
- Step 4: Create the sample tables.
- Step 5: Run the COPY commands.
- Step 6: Vacuum and analyze the database.
- Step 7: Clean up your resources.
- Summary.
- Step 1: Create a test data set.
- Step 2: Establish a baseline.
- Step 3: Select sort keys.
- Step 4: Select distribution styles.
- Step 5: Review compression encodings.
- Step 6: Recreate the test data set.
- Step 7: Retest system performance after tuning.
- Step 8: Evaluate the results.
Redshift doesn't enforce primary key, foreign key, or uniqueness constraints, though Amazon says "primary keys and foreign keys are used as planning hints and they should be declared if your ETL process or some other process in your application enforces their integrity."
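Because Redshift only uses these keys as planning hints, the quote implies that integrity checks belong in the ETL process. A minimal sketch of such checks, run on rows before they are loaded, might look like this. The helper names and the `customer_id`/`order_id` fields are hypothetical, and this is not a Redshift feature.

```python
from collections import Counter


def duplicate_keys(rows: list, key: str) -> list:
    """Return key values that appear more than once (would violate a primary key)."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def orphaned_rows(child_rows: list, fk: str, parent_rows: list, pk: str) -> list:
    """Return child rows whose foreign key has no matching parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


if __name__ == "__main__":
    customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
    orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 3}]
    print(duplicate_keys(customers, "customer_id"))  # [2]
    print(orphaned_rows(orders, "customer_id", customers, "customer_id"))
```

If either check returns a non-empty result, the batch would be fixed or rejected before the load, since Redshift itself would accept it.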
DynamoDB can store data whose typing and structure differ from traditional relational data. Apart from the primary key, which must be defined when the table is created, DynamoDB tables have no fixed schema: each item (row) can have its own set of attributes.
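The following toy model illustrates the "schema only for the key" idea: every item must carry the key attributes, and anything else is free-form. `SchemalessTable` is a hypothetical class, not a DynamoDB client. It does mimic one documented behavior, namely that writing an item with an existing key replaces the whole item, as PutItem does.

```python
from typing import Optional


class SchemalessTable:
    """Toy model: only the key attributes are required; everything else is free-form."""

    def __init__(self, key_attributes: tuple) -> None:
        self.key_attributes = key_attributes
        self.items: dict = {}

    def put_item(self, item: dict) -> None:
        missing = [a for a in self.key_attributes if a not in item]
        if missing:
            raise ValueError(f"missing key attribute(s): {missing}")
        key = tuple(item[a] for a in self.key_attributes)
        # Like DynamoDB's PutItem, a write with an existing key replaces the whole item.
        self.items[key] = dict(item)

    def get_item(self, *key) -> Optional[dict]:
        return self.items.get(tuple(key))


if __name__ == "__main__":
    users = SchemalessTable(("user_id",))
    users.put_item({"user_id": "u1", "name": "Ana"})
    users.put_item({"user_id": "u2", "email": "b@example.com", "tags": ["admin"]})
    print(users.get_item("u2"))
```

The two items share only `user_id`; the table never checks the other attributes.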
In terms of compute options and configurations, Reserved Instances and On-Demand Instances are the same. The only difference between the two is that a Reserved Instance is one you commit to (“reserve”) for a fixed term, and in return you receive a discount on the On-Demand base price.
Import the file
The simple bit, loading the CSV file from S3 into Redshift, takes a single command:
COPY <table_name> FROM 's3://<bucket_name>/<csv_file>' CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>' CSV <other_options>;
And that is basically it.
Below are the steps you can follow to generate a sequence number using an Amazon Redshift stored procedure (SP):
- Create a sequence number table if it does not already exist.
- Get the max(seq_num) value and assign it to a variable.
- Increment the variable by 1.
- Insert the updated value into seq_table.
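The four steps can be traced in a short sketch. It is not a Redshift stored procedure (those are written in PL/pgSQL). It runs the same logic against SQLite from Python's standard library, reusing the `seq_table` and `seq_num` names from the steps. One caveat applies to the approach itself: reading the maximum and inserting are separate statements, so two concurrent callers could receive the same number unless access to the table is serialized.

```python
import sqlite3


def next_sequence_number(conn: sqlite3.Connection) -> int:
    with conn:  # commits on success, rolls back on error
        # Step 1: create the sequence table if it does not already exist.
        conn.execute("CREATE TABLE IF NOT EXISTS seq_table (seq_num INTEGER)")
        # Step 2: get the current maximum (0 when the table is empty).
        (current,) = conn.execute(
            "SELECT COALESCE(MAX(seq_num), 0) FROM seq_table"
        ).fetchone()
        # Step 3: increment it by 1.
        new_value = current + 1
        # Step 4: insert the updated value into seq_table.
        conn.execute("INSERT INTO seq_table (seq_num) VALUES (?)", (new_value,))
    return new_value


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print([next_sequence_number(conn) for _ in range(3)])  # [1, 2, 3]
```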
Partition key: This simple primary key consists of a single attribute, referred to as the “partition key.” Internally, DynamoDB uses the key value as input to a hash function to determine where the item is stored.
Partition key and sort key: This key, known as the “Composite Primary Key,” consists of two attributes, the partition key and the sort key. Items that share a partition key are stored together, ordered by sort key value.
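Both key types can be sketched in a few lines. DynamoDB's real hash function and partition count are internal and not exposed, so this sketch makes two hypothetical choices: SHA-256 as the hash and a fixed `NUM_PARTITIONS = 4`. The second helper shows the composite-key layout, with items grouped by partition key and ordered by sort key.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages partition counts itself


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index via a hash (SHA-256 here, for illustration only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def group_by_composite_key(items: list) -> dict:
    """Group (partition_key, sort_key, payload) tuples by partition key, ordered by sort key."""
    groups = defaultdict(list)
    for partition_key, sort_key, payload in items:
        groups[partition_key].append((sort_key, payload))
    return {pk: sorted(rows, key=lambda row: row[0]) for pk, rows in groups.items()}


if __name__ == "__main__":
    print(partition_for("user#1"))
    items = [
        ("user#1", "2024-03-01", "b"),
        ("user#1", "2024-01-15", "a"),
        ("user#2", "2024-02-10", "c"),
    ]
    print(group_by_composite_key(items))
```

The same key always hashes to the same partition, which is what lets a lookup by partition key go straight to one location.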