Kontext's Project
Diagrams
Azure Static Website with Custom Domain via Azure Storage account and CDN
This diagram shows how to set up a static website on Azure using an Azure Storage account and Azure CDN. An Azure Storage account supports the static website capability and also allows you to bind a custom domain to access the content. For HTTP only, you can just use the built-in Azure Storage account capabilities; for HTTPS, you have to use Azure Front Door or Azure CDN (classic) services. For HTTPS, you can add a custom domain to the Azure CDN endpoint. As part of that setup, you can also choose between a CDN-managed certificate and your own certificate, as shown in the screenshot (2022092341926-image.png).
References: Quickstart - Create an Azure CDN profile and endpoint | Microsoft Learn; Integrate a static website with Azure CDN - Azure Storage | Microsoft Learn
Kafka Offset Explained
This diagram illustrates the concept of offset in Kafka. Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition. It also denotes the position of the consumer in the partition. There are two notions of position or offset relevant to the user of the consumer:
Current offset - The position of the consumer gives the offset of the next record that will be given out. It will be one larger than the highest offset the consumer has read in that partition. It automatically advances every time the consumer receives messages in a call to poll(Duration).
Committed offset - The committed position is the last offset that has been stored securely. If the process fails and restarts, this is the offset that the consumer will recover to.
In the Kafka consumer client, offsets can be committed automatically at a periodic interval. This is the default behavior and is configurable via enable.auto.commit. Alternatively, offsets can be committed manually by calling one of the commit APIs (e.g. commitSync and commitAsync).
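The difference between the two offsets can be modelled with a toy consumer in plain Python. This is only an illustrative sketch: `ToyConsumer` and its `poll`, `commit` and `restart` methods are hypothetical names that mimic the behaviour described above, not the Kafka client API.

```python
class ToyConsumer:
    """Toy model of one consumer reading one partition (not the Kafka client)."""

    def __init__(self, records):
        self.records = records  # the list index plays the role of the record offset
        self.position = 0       # current offset: the next record to hand out
        self.committed = 0      # committed offset: where a restart resumes

    def poll(self, max_records=2):
        batch = self.records[self.position:self.position + max_records]
        self.position += len(batch)  # advances automatically on every poll
        return batch

    def commit(self):
        self.committed = self.position  # store the current position

    def restart(self):
        self.position = self.committed  # recover to the last committed offset


consumer = ToyConsumer(["a", "b", "c", "d", "e"])
first = consumer.poll()     # reads offsets 0 and 1
consumer.commit()           # committed offset is now 2
second = consumer.poll()    # reads offsets 2 and 3, but does not commit
consumer.restart()          # simulated crash and restart
replayed = consumer.poll()  # offsets 2 and 3 are delivered again
```

In this model the records read after the last commit are delivered a second time after the restart, which is why the commit strategy matters for duplicate handling.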
Session Window Function in Streaming Analytics
In a streaming analytics context, a session window function is usually used to group events that arrive at similar times and to filter out periods of time in which there are no events. An event can belong to only one session window. A session window function usually takes at least two parameters:
Window timeout - A session window starts when the first event occurs. If a new event occurs within the timeout (measured from the last ingested event), it is included in the window; otherwise the window closes at the timeout.
Max window duration - If events keep arriving, the window closes at the max window duration and a new window starts.
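The two parameters can be illustrated with a small plain-Python sketch. The function name `session_windows` and the exact boundary rules (a gap strictly larger than the timeout closes the window; an event at or beyond the max duration starts a new one) are assumptions made for illustration, and real streaming engines may apply different rules.

```python
def session_windows(timestamps, timeout, max_duration):
    """Group event timestamps (in seconds) into session windows."""
    windows = []
    current = []
    for t in sorted(timestamps):
        gap_too_long = current and t - current[-1] > timeout
        window_too_long = current and t - current[0] >= max_duration
        if gap_too_long or window_too_long:
            windows.append(current)  # close the running window
            current = []
        current.append(t)            # each event lands in exactly one window
    if current:
        windows.append(current)
    return windows


sessions = session_windows([0, 2, 4, 15, 16, 18, 20, 22, 24, 26],
                           timeout=5, max_duration=10)
```

Here the first window closes because of the quiet period between 4 and 15, while the second closes because events kept arriving past the max duration.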
Sliding Window Function in Streaming Analytics
In a streaming analytics context, a sliding window function segments a data stream into time segments that can overlap with each other. The window content only changes when an event enters or exits the window. Events can belong to more than one sliding window. Each window must include at least one event; otherwise the window is omitted from the output.
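A rough plain-Python sketch of this behaviour follows. It assumes each window covers the interval (end - size, end] and that a window is evaluated only at the points in time where an event enters or exits; the name `sliding_windows` and these boundary rules are illustrative assumptions, not a specific engine's definition.

```python
def sliding_windows(timestamps, size):
    """Return (window_end, events) for each point where the window content changes."""
    events = sorted(timestamps)
    # Content changes when an event enters (at t) or exits (at t + size).
    change_points = sorted(set(events) | {t + size for t in events})
    result = []
    for end in change_points:
        content = [t for t in events if end - size < t <= end]
        if content:  # windows without any event are omitted
            result.append((end, content))
    return result


windows = sliding_windows([1, 5, 12], size=10)
```

The event at 5 appears in three of the resulting windows, which shows how one event can belong to more than one sliding window.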
Hopping Window Function in Streaming Analytics
In a streaming analytics context, a hopping window function segments a data stream into time segments that can overlap with each other. Events can belong to more than one window. This diagram shows a hopping window function with 20 seconds as the window duration and 10 seconds as the hop size. If the hop size is the same as the window duration, it generates the same results as a tumbling window function.
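The window assignment can be sketched in plain Python. The sketch assumes integer timestamps in seconds, windows that start at multiples of the hop size counted from 0, and half-open intervals [start, end); `hopping_windows` is a hypothetical helper name.

```python
def hopping_windows(t, size=20, hop=10):
    """Return the (start, end) windows, with start <= t < end, that contain time t."""
    windows = []
    start = (t // hop) * hop  # latest window that starts at or before t
    while start > t - size:   # step back through earlier windows that still cover t
        if start >= 0:
            windows.append((start, start + size))
        start -= hop
    return sorted(windows)


overlapping = hopping_windows(25)                   # [(10, 30), (20, 40)]
like_tumbling = hopping_windows(25, size=20, hop=20)  # [(20, 40)]
```

With a hop size equal to the window duration, every timestamp maps to exactly one window, which matches the tumbling case.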
Tumbling Window Function in Streaming Analytics
In a streaming analytics context, a tumbling window function segments a data stream into distinct time segments and then performs aggregations against them. An event can belong to only one tumbling window, and tumbling windows do not overlap with each other. This diagram shows a tumbling window function with 20 seconds as the window duration.
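As a minimal sketch, a count aggregation over 20-second tumbling windows can be written in plain Python. It assumes integer timestamps in seconds and half-open windows [start, end) aligned to 0; `tumbling_counts` is a hypothetical name.

```python
from collections import Counter


def tumbling_counts(timestamps, size=20):
    """Count events per tumbling window, keyed by (start, end)."""
    counts = Counter()
    for t in timestamps:
        start = (t // size) * size  # each timestamp maps to exactly one window
        counts[(start, start + size)] += 1
    return dict(sorted(counts.items()))


counts = tumbling_counts([3, 12, 19, 20, 41])
```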
Slowly Changing Dimension (SCD) Type 4
This diagram shows how a slowly changing dimension type 4 table is implemented. customer_number is the business key of the customer table while customer_id is a surrogate key. Customer 10001 is changing first_name from Kontext to Context. SCD Type 4 uses a separate history table to track the historical changes. This method is similar to change data capture or audit table implementations.
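A minimal sketch of the Type 4 pattern, using plain Python dictionaries in place of database tables. The column names are spelled with underscores here as an assumption, and `apply_scd4`, `archived_date` and the date value are hypothetical.

```python
def apply_scd4(current, history, customer_number, changes, change_date):
    """Archive the previous version in the history table, then overwrite the current row."""
    row = current[customer_number]
    history.append({**row, "archived_date": change_date})  # copy of the old version
    row.update(changes)  # the current table only holds the latest values


customer = {10001: {"customer_id": 1, "customer_number": 10001, "first_name": "Kontext"}}
customer_history = []
apply_scd4(customer, customer_history, 10001, {"first_name": "Context"}, "2022-01-01")
```

The current table stays small and fast to query, while the full change trail lives in the history table.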
Slowly Changing Dimension (SCD) Type 3
This diagram shows how a slowly changing dimension type 3 table is implemented. customer_number is the business key of the customer table while customer_id is a surrogate key. Customer 10001 is changing first_name from Kontext to Context. SCD Type 3 adds a new attribute to keep the current value. The drawback is that it can keep only the previous and current values. For SCD Type 3, a surrogate ID is not necessary.
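The following plain-Python sketch illustrates the idea and its drawback. The added column name `current_first_name` and the helper `apply_scd3` are hypothetical, and the value "Ctx" is made up to show a second change.

```python
def apply_scd3(row, new_first_name):
    """Keep exactly two versions: the previous value and the current value."""
    latest = row.get("current_first_name", row["first_name"])
    row["first_name"] = latest                  # now holds the previous value
    row["current_first_name"] = new_first_name  # added attribute holds the current value
    return row


row = {"customer_number": 10001, "first_name": "Kontext"}
apply_scd3(row, "Context")  # previous: Kontext, current: Context
apply_scd3(row, "Ctx")      # previous: Context, current: Ctx; Kontext is lost
```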
Slowly Changing Dimension (SCD) Type 2
This diagram shows how a slowly changing dimension type 2 table is implemented. customer_number is the business key of the customer table while customer_id is a surrogate key. Customer 10001 is changing first_name from Kontext to Context. SCD Type 2 tracks the history of changes. It is usually implemented with an effective_date column to indicate the date from which the change becomes effective and an is_current flag to indicate the currently active record. It can also be implemented using effective_from_date and effective_to_date: when effective_to_date is NULL or equal to a high date such as 9999-12-31, the record is the latest version. For performance and other considerations, it can also be implemented with a combination of effective_from_date, effective_to_date and is_current.
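A minimal sketch of the combined variant (from date, to date and current flag), using a list of dictionaries in place of a table. Column names are spelled with underscores as an assumption; `apply_scd2`, the example dates and the way the new surrogate key is generated (maximum plus one) are hypothetical.

```python
HIGH_DATE = "9999-12-31"


def apply_scd2(rows, customer_number, changes, change_date):
    """Expire the current version of a customer and append a new current version."""
    old = next(r for r in rows
               if r["customer_number"] == customer_number and r["is_current"])
    old["effective_to_date"] = change_date  # close the old version
    old["is_current"] = False
    new = {**old, **changes,
           "customer_id": max(r["customer_id"] for r in rows) + 1,  # new surrogate key
           "effective_from_date": change_date,
           "effective_to_date": HIGH_DATE,
           "is_current": True}
    rows.append(new)
    return rows


rows = [{"customer_id": 1, "customer_number": 10001, "first_name": "Kontext",
         "effective_from_date": "2021-01-01", "effective_to_date": HIGH_DATE,
         "is_current": True}]
apply_scd2(rows, 10001, {"first_name": "Context"}, "2022-01-01")
```

Both versions share the business key 10001 but have different surrogate keys, which is why a surrogate key is needed for this type.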
Slowly Changing Dimension (SCD) Type 1
This diagram shows how a slowly changing dimension type 1 table is implemented. SCD Type 1 simply overwrites old data with new data without keeping a history. customer_number is the business key of the customer table while customer_id is a surrogate key. Customer 10001 is changing first_name from Kontext to Context. For SCD Type 1, a surrogate ID is not necessary.
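In plain Python, with a dictionary keyed by the business key standing in for the table, the overwrite could look like this. The helper name `apply_scd1` is hypothetical and the column names are spelled with underscores as an assumption.

```python
def apply_scd1(table, customer_number, changes):
    """Overwrite attributes in place; the old values are not kept anywhere."""
    table[customer_number].update(changes)
    return table


customers = {10001: {"customer_number": 10001, "first_name": "Kontext"}}
apply_scd1(customers, 10001, {"first_name": "Context"})
```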
Schema Merge Example
This diagram shows how schema merge is performed.
SQL Server Diagram
This diagram will be used in articles as a feature image.
Flatten Operation in Programming
This diagram depicts a flatten operation that transforms an array of arrays to a single array. It is language agnostic.
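Although the operation is language agnostic, a short Python sketch makes it concrete. It flattens one level only; `flatten` is a hypothetical helper name.

```python
from itertools import chain


def flatten(nested):
    """Flatten one level: an array of arrays becomes a single array."""
    return list(chain.from_iterable(nested))


flat = flatten([[1, 2], [3], [4, 5, 6]])  # [1, 2, 3, 4, 5, 6]
```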
Spark SQL Joins - Cross Join (Cartesian Product)
This diagram shows the Cross Join type in Spark SQL. It returns the Cartesian product of two tables (relations).
References: JOIN - Spark 3.2.1 Documentation (apache.org)
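The Cartesian product can be illustrated in plain Python, with lists of tuples standing in for the two relations. This models the semantics only and is not Spark code; the sample rows are made up.

```python
from itertools import product

left = [(1, "a"), (2, "b")]
right = [(1, "x"), (3, "y")]

# Every left row is paired with every right row; no join condition is used.
cross = [l + r for l, r in product(left, right)]
```

The result has len(left) * len(right) rows, four in this example.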
Spark SQL Joins - Left Anti Join
This diagram shows the Left Anti Join type in Spark SQL. An anti join returns values from the left relation that have no match with the right. It is also called left anti join.
References: JOIN - Spark 3.2.1 Documentation (apache.org)
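A plain-Python model of the anti join semantics, assuming the first element of each tuple is the join key. This is not Spark code and the sample rows are made up.

```python
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (3, "z")]

right_keys = {key for key, _ in right}

# Keep only left rows whose key has no match on the right;
# columns from the right relation are not returned.
left_anti = [row for row in left if row[0] not in right_keys]
```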
Spark SQL Joins - Left Semi Join
This diagram shows the Left Semi Join type in Spark SQL. A semi join returns values from the left relation that have a match with the right. It is also called left semi join.
References: JOIN - Spark 3.2.1 Documentation (apache.org)
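A plain-Python model of the semi join semantics, assuming the first element of each tuple is the join key. This is not Spark code and the sample rows are made up.

```python
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (3, "z")]

right_keys = {key for key, _ in right}

# Keep left rows that have at least one match on the right.
# Each left row appears at most once, and no right columns are returned.
left_semi = [row for row in left if row[0] in right_keys]
```

Key 3 matches two rows on the right, yet the left row (3, "c") is returned only once, which is the main difference from an inner join.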
Spark SQL Joins - Full Outer Join
This diagram shows the Full Join type in Spark SQL. It returns all values from both relations, appending NULL values on the side that does not have a match. It is also called full outer join.
References: JOIN - Spark 3.2.1 Documentation (apache.org)
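A plain-Python model of the full outer join, with `None` standing in for NULL. The output layout (key, left value, right value) and the helper name `full_outer_join` are assumptions for illustration; this is not Spark code.

```python
def full_outer_join(left, right):
    """Join (key, value) tuples on key, keeping unmatched rows from both sides."""
    rows = []
    matched_right = set()
    for lk, lv in left:
        matches = [(i, rv) for i, (rk, rv) in enumerate(right) if rk == lk]
        if matches:
            for i, rv in matches:
                matched_right.add(i)
                rows.append((lk, lv, rv))
        else:
            rows.append((lk, lv, None))  # no match on the right
    for i, (rk, rv) in enumerate(right):
        if i not in matched_right:
            rows.append((rk, None, rv))  # no match on the left
    return rows


full = full_outer_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")])
```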
Spark SQL Joins - Right Outer Join
This diagram shows the Right Join type in Spark SQL. It returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match. It is also called right outer join.
References: JOIN - Spark 3.2.1 Documentation (apache.org)
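A plain-Python model of the right outer join, with `None` standing in for NULL. The output layout (key, left value, right value) and the helper name `right_outer_join` are assumptions for illustration; this is not Spark code.

```python
def right_outer_join(left, right):
    """Keep every right row; fill the left value with None when nothing matches."""
    rows = []
    for rk, rv in right:
        matches = [lv for lk, lv in left if lk == rk]
        for lv in matches or [None]:
            rows.append((rk, lv, rv))
    return rows


right_joined = right_outer_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")])
```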
Spark SQL Joins - Left Outer Join
This diagram shows the Left Join type in Spark SQL. It returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match. It is also called left outer join.
References: JOIN - Spark 3.2.1 Documentation (apache.org)
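A plain-Python model of the left outer join, with `None` standing in for NULL. The output layout (key, left value, right value) and the helper name `left_outer_join` are assumptions for illustration; this is not Spark code.

```python
def left_outer_join(left, right):
    """Keep every left row; fill the right value with None when nothing matches."""
    rows = []
    for lk, lv in left:
        matches = [rv for rk, rv in right if rk == lk]
        for rv in matches or [None]:
            rows.append((lk, lv, rv))
    return rows


left_joined = left_outer_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")])
```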
Spark SQL Joins - Inner Join
This diagram shows the Inner Join type in Spark SQL. It returns rows that have matching values in both tables (relations).
References: JOIN - Spark 3.2.1 Documentation (apache.org)
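A plain-Python model of the inner join semantics, assuming the first element of each tuple is the join key and an output layout of (key, left value, right value). This is not Spark code and the sample rows are made up.

```python
left = [(1, "a"), (2, "b")]
right = [(1, "x"), (3, "y")]

# Only pairs of rows whose keys match on both sides are returned.
inner = [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]
```

Keys 2 and 3 each exist on one side only, so neither appears in the result.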