This topic assumes you have a good understanding of cloud computing in AWS, and that you are proficient with provisioning and using services within AWS. Ideally, you will also have some background understanding of Big Data. There are a large number of AWS Big Data services available, and this course is designed to provide the initial core concepts required for each of these services and to assist you in passing the AWS Big Data Specialty exam.

Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data so you can get timely insights and react quickly to new information. With Amazon Kinesis, you can ingest real-time data such as application logs, website clickstreams, and IoT telemetry data into your databases, data lakes, and data warehouses, or build your own real-time applications using this data. Amazon Kinesis enables you to process and analyze data as it arrives and respond in real time, instead of having to wait until all your data is collected before processing can begin. When choosing a big data processing solution from the available AWS service offerings, it is important to determine whether you need the latency of response from the process to be in seconds, minutes, or hours. This will typically drive the decision on which AWS service is best for that processing pattern or use case. Amazon Kinesis is primarily designed to deliver processing oriented around real-time streaming.

Amazon Kinesis can continuously capture and store terabytes of data per hour from hundreds of thousands of sources, such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events. Amazon Kinesis provides three different solution capabilities. Amazon Kinesis Streams enables you to build custom applications that process or analyze streaming data for specialized needs.

Amazon Kinesis Firehose enables you to load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. Amazon Kinesis Analytics enables you to write standard SQL queries against streaming data.
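To give a feel for what "standard SQL on streaming data" looks like, here is a minimal, hypothetical sketch of the SQL a Kinesis Analytics application might run against an in-application stream, supplied as the ApplicationCode when creating the application with boto3. The stream, column, and application names are assumptions for illustration only, and the input/output mappings are omitted.

```python
import boto3

# Standard SQL that Kinesis Analytics runs continuously against an
# in-application stream. Stream and column names here are hypothetical.
application_code = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker VARCHAR(4), avg_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM ticker, AVG(price)
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY ticker, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
"""

client = boto3.client("kinesisanalytics")

# Create the analytics application; the source stream and destination
# would be attached separately before the application is started.
client.create_application(
    ApplicationName="clickstream-analytics-demo",
    ApplicationDescription="Example SQL over streaming data",
    ApplicationCode=application_code,
)
```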

Kinesis Streams

Amazon Kinesis Streams is based on a platform-as-a-service style architecture, where you determine the throughput capacity you require, and the architecture and components are automatically provisioned, installed, and configured for you. You have no need, or ability, to change the way these architectural components are deployed. Unlike some of the other Amazon big data services, which have a container that the service sits within, for example, a DB instance within Amazon RDS, Amazon Kinesis doesn’t.
The container is effectively the combination of the account and the region you provision the Kinesis Streams within. An Amazon Kinesis Stream is an ordered sequence of data records. A record is the unit of data in an Amazon Kinesis Stream, and each record in the stream is composed of a sequence number, a partition key, and a data blob. A data blob is the data of interest your data producer adds to a stream.

The data records in the stream are distributed into shards. A shard is the base throughput unit of an Amazon Kinesis Stream. One shard provides a capacity of one megabyte per second of data input and two megabytes per second of data output, and can support up to one thousand PUT records per second. You specify the number of shards needed when you create a stream, so the data capacity of your stream is a function of the number of shards that you specify for that stream; the total capacity of the stream is the sum of the capacities of its shards. If your data rate changes, you can increase or decrease the number of shards allocated to your stream. Producers continuously push data to Kinesis Streams, and consumers process the data in real time.
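As a minimal sketch of this model, the following boto3 calls create a stream with two shards, have a producer put a record keyed by a partition key, and then rescale the shard count as the data rate grows. The stream name and record contents are hypothetical.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Create a stream with two shards: roughly 2 MB/s in, 4 MB/s out,
# and up to 2,000 PUT records per second of capacity.
kinesis.create_stream(StreamName="clickstream-demo", ShardCount=2)
kinesis.get_waiter("stream_exists").wait(StreamName="clickstream-demo")

# A producer pushes a record; the partition key determines which shard it lands on.
kinesis.put_record(
    StreamName="clickstream-demo",
    Data=json.dumps({"page": "/home", "user": "u-123"}).encode("utf-8"),
    PartitionKey="u-123",
)

# If the data rate increases, scale the stream by increasing the shard count.
kinesis.update_shard_count(
    StreamName="clickstream-demo",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)
```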

Kinesis Firehose

With Kinesis Firehose, you do not need to write applications or manage resources. You configure your data producers to send data to Kinesis Firehose, and it automatically delivers the data to the destination that you specify. You can also configure Amazon Kinesis Firehose to transform your data before data delivery. Unlike some of the other Amazon big data services which have a container that the service sits within, for example, a DB instance within Amazon RDS, Amazon Kinesis Firehose doesn’t.
The container is effectively the combination of the account and the region you provision the Kinesis Firehose delivery streams within. The delivery stream is the underlying entity of Kinesis Firehose. You use Kinesis Firehose by creating a Kinesis Firehose delivery stream and then sending data to it, which means each delivery stream is effectively defined by the target system that receives the streamed data. Data producers send records to Kinesis Firehose delivery streams. For example, a web server sending log data to a Kinesis Firehose delivery stream is a data producer.
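As a minimal sketch of the producer side, assuming a delivery stream named "web-logs-demo" already exists, a producer simply puts records onto the delivery stream and Kinesis Firehose handles buffering and delivery to the configured destination:

```python
import json
import boto3

firehose = boto3.client("firehose")

# A producer sends a record to an existing delivery stream; Firehose buffers it
# and delivers it to the destination configured on the stream (e.g. S3).
log_line = json.dumps({"status": 200, "path": "/index.html"}) + "\n"
firehose.put_record(
    DeliveryStreamName="web-logs-demo",
    Record={"Data": log_line.encode("utf-8")},
)
```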

One of the great things about AWS is that they always try to make things easy for you. So when you go to create a new Amazon Kinesis Firehose definition in the console, there are a number of pre-built destinations that help you stream data into an AWS big data storage solution. As you can see, you can select one of the three data services currently available as a target: S3, Redshift, or Elasticsearch. Selecting one of these destinations will present additional parameter options for you to complete to assist in creating the data flow.
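The same delivery stream can also be defined programmatically. The following is a minimal, hypothetical boto3 sketch targeting S3; the stream name, bucket, IAM role, and buffering values are assumptions, and they correspond to the parameters the console prompts you for when you pick a destination.

```python
import boto3

firehose = boto3.client("firehose")

# Create a delivery stream that targets S3. The bucket and IAM role below are
# hypothetical; the role must allow Firehose to write to the bucket.
firehose.create_delivery_stream(
    DeliveryStreamName="web-logs-demo",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-web-logs-bucket",
        "Prefix": "raw/",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
)
```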