
Commit 46fc95c

Merge pull request #42 from morsapaes/sql-cookbook_maintenance
Repo maintenance - Round #3
2 parents 86d2caf + f8f7df7 commit 46fc95c

26 files changed: +38 -26 lines changed

aggregations-and-analytics/01/01_group_by_window.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Aggregating Time Series Data
 
-:bulb: This example will show how to aggregate time-series data in real-time using a `TUMBLE` window.
+> :bulb: This example will show how to aggregate time-series data in real-time using a `TUMBLE` window.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
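
For reference, a minimal sketch of the `TUMBLE` aggregation this recipe covers; the `log_time` event-time column is an assumption:

```sql
-- Count log lines per minute; log_time is an assumed event-time
-- attribute declared on server_logs.
SELECT
  TUMBLE_START(log_time, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS log_cnt
FROM server_logs
GROUP BY TUMBLE(log_time, INTERVAL '1' MINUTE);
```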

aggregations-and-analytics/02/02_watermarks.md (+3 -1)

@@ -1,6 +1,8 @@
 # 02 Watermarks
 
-:bulb: This example will show how to use `WATERMARK`s to work with timestamps in records.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.10%2B-lightgrey)
+
+> :bulb: This example will show how to use `WATERMARK`s to work with timestamps in records.
 
 The source table (`doctor_sightings`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
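
A `WATERMARK` is declared in the table DDL. A sketch with an assumed schema and illustrative faker expressions:

```sql
-- sighting_time becomes the event-time attribute; the watermark
-- lets records arrive up to 15 seconds out of order. The schema
-- and faker expressions are assumptions.
CREATE TABLE doctor_sightings (
  doctor        STRING,
  sighting_time TIMESTAMP(3),
  WATERMARK FOR sighting_time AS sighting_time - INTERVAL '15' SECOND
) WITH (
  'connector' = 'faker',
  'fields.doctor.expression' = '#{dr_who.the_doctors}',
  'fields.sighting_time.expression' = '#{date.past ''15'',''SECONDS''}'
);
```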

aggregations-and-analytics/03/03_group_by_session_window.md (+1 -1)

@@ -1,6 +1,6 @@
 # 03 Analyzing Sessions in Time Series Data
 
-:bulb: This example will show how to aggregate time-series data in real-time using a `SESSION` window.
+> :bulb: This example will show how to aggregate time-series data in real-time using a `SESSION` window.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
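
A sketch of the `SESSION` window this recipe describes; `user_id` and `log_time` are assumed columns:

```sql
-- One result row per user session, where a session closes after
-- 10 seconds of inactivity.
SELECT
  user_id,
  SESSION_START(log_time, INTERVAL '10' SECOND) AS session_start,
  COUNT(*) AS request_cnt
FROM server_logs
GROUP BY user_id, SESSION(log_time, INTERVAL '10' SECOND);
```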

aggregations-and-analytics/04/04_over.md (+1 -1)

@@ -1,6 +1,6 @@
 # 04 Rolling Aggregations on Time Series Data
 
-:bulb: This example will show how to calculate an aggregate or cumulative value based on a group of rows using an `OVER` window. A typical use case are rolling aggregations.
+> :bulb: This example will show how to calculate an aggregate or cumulative value based on a group of rows using an `OVER` window. A typical use case are rolling aggregations.
 
 The source table (`temperature_measurements`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
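
A sketch of a rolling aggregation over an `OVER` window; the column names on `temperature_measurements` are assumptions:

```sql
-- Rolling one-minute average temperature per city, emitted with
-- every new measurement.
SELECT
  measurement_time,
  city,
  temperature,
  AVG(temperature) OVER last_minute AS avg_temperature_last_minute
FROM temperature_measurements
WINDOW last_minute AS (
  PARTITION BY city
  ORDER BY measurement_time
  RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW
);
```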

aggregations-and-analytics/05/05_top_n.md (+3 -1)

@@ -1,6 +1,8 @@
 # 05 Continuous Top-N
 
-:bulb: This example will show how to continuously calculate the "Top-N" rows based on a given attribute, using an `OVER` window and the `ROW_NUMBER()` function.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.9%2B-lightgrey)
+
+> :bulb: This example will show how to continuously calculate the "Top-N" rows based on a given attribute, using an `OVER` window and the `ROW_NUMBER()` function.
 
 The source table (`spells_cast`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
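
A sketch of the continuous Top-N pattern; `wizard` and `spell` are assumed columns on `spells_cast`:

```sql
-- Top 2 wizards per spell by number of casts. The outer filter on
-- the ROW_NUMBER() result is what Flink recognizes as a Top-N query.
SELECT wizard, spell, times_cast
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY spell
                            ORDER BY times_cast DESC) AS row_num
  FROM (
    SELECT wizard, spell, COUNT(*) AS times_cast
    FROM spells_cast
    GROUP BY wizard, spell
  )
)
WHERE row_num <= 2;
```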

aggregations-and-analytics/06/06_dedup.md (+1 -1)

@@ -1,6 +1,6 @@
 # 06 Deduplication
 
-:bulb: This example will show how you can identify and filter out duplicates in a stream of events.
+> :bulb: This example will show how you can identify and filter out duplicates in a stream of events.
 
 There are different ways that duplicate events can end up in your data sources, from human error to application bugs. Regardless of the origin, unclean data can have a real impact in the quality (and correctness) of your results. Suppose that your order system occasionally generates duplicate events with the same `order_id`, and that you're only interested in keeping the most recent event for downstream processing.
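
A sketch of the deduplication pattern; `order_id` comes from the recipe text, while the `orders` table and `order_time` column are assumptions:

```sql
-- Keep only the most recent event per order_id; a duplicate with a
-- newer order_time retracts and replaces the previous result.
SELECT order_id, order_time
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY order_id
                            ORDER BY order_time DESC) AS row_num
  FROM orders
)
WHERE row_num = 1;
```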

aggregations-and-analytics/07/07_chained_windows.md (+1 -1)

@@ -1,6 +1,6 @@
 # 07 Chained (Event) Time Windows
 
-:bulb: This example will show how to efficiently aggregate time series data on two different levels of granularity.
+> :bulb: This example will show how to efficiently aggregate time series data on two different levels of granularity.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
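
A sketch of chaining two windows via a view; `browser` and `log_time` are assumed columns. `TUMBLE_ROWTIME` preserves the event-time attribute so the one-minute results can be windowed again:

```sql
-- First level: counts per browser per minute, keeping a rowtime.
CREATE TEMPORARY VIEW logs_1m AS
SELECT
  TUMBLE_ROWTIME(log_time, INTERVAL '1' MINUTE) AS window_time,
  browser,
  COUNT(*) AS cnt
FROM server_logs
GROUP BY browser, TUMBLE(log_time, INTERVAL '1' MINUTE);

-- Second level: roll the pre-aggregated minutes up to 5 minutes.
SELECT
  TUMBLE_START(window_time, INTERVAL '5' MINUTE) AS window_start,
  browser,
  SUM(cnt) AS cnt_5m
FROM logs_1m
GROUP BY browser, TUMBLE(window_time, INTERVAL '5' MINUTE);
```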

aggregations-and-analytics/08/08_match_recognize.md (+1 -1)

@@ -1,6 +1,6 @@
 # 08 Detecting patterns with MATCH_RECOGNIZE
 
-:bulb: This example will show how you can use Flink SQL to detect patterns in a stream of events with `MATCH_RECOGNIZE`.
+> :bulb: This example will show how you can use Flink SQL to detect patterns in a stream of events with `MATCH_RECOGNIZE`.
 
 A common (but historically complex) task in SQL day-to-day work is to identify meaningful sequences of events in a data set — also known as Complex Event Processing (CEP). This becomes even more relevant when dealing with streaming data, as you want to react quickly to known patterns or changing trends to deliver up-to-date business insights. In Flink SQL, you can easily perform this kind of tasks using the standard SQL clause [`MATCH_RECOGNIZE`](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/match_recognize.html).
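
A sketch of the clause, detecting users who downgrade from a premium to a basic subscription; the table and columns are hypothetical:

```sql
-- PATTERN matches one or more premium events followed by a basic
-- one; DEFINE gives each pattern variable its predicate.
SELECT *
FROM subscriptions
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY event_time
  MEASURES LAST(PREMIUM.subscription_type) AS last_premium_type
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (PREMIUM+ BASIC)
  DEFINE
    PREMIUM AS PREMIUM.subscription_type IN ('premium', 'platinum'),
    BASIC AS BASIC.subscription_type = 'basic'
);
```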

aggregations-and-analytics/09/09_cdc_materialized_view.md (+1 -1)

@@ -2,7 +2,7 @@
 
 ![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
 
-:bulb: This example will show how you can use Flink SQL and Debezium to maintain a materialized view based on database changelog streams.
+> :bulb: This example will show how you can use Flink SQL and Debezium to maintain a materialized view based on database changelog streams.
 
 In the world of analytics, databases are still mostly seen as static sources of data — like a collection of business state(s) just sitting there, waiting to be queried. The reality is that most of the data stored in these databases is continuously produced and is continuously changing, so...why not _stream_ it?
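
A sketch of the idea, with a hypothetical changelog-backed table: the `debezium-json` format interprets Kafka records as INSERT/UPDATE/DELETE changes, so a query over the table behaves like a continuously maintained materialized view:

```sql
-- Table, topic, and schema are assumptions for illustration.
CREATE TABLE shipments (
  shipment_id INT,
  order_id    INT,
  status      STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'dbserver1.inventory.shipments',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'debezium-json'
);

-- The aggregate tracks the source database as it changes.
SELECT status, COUNT(*) AS shipment_cnt
FROM shipments
GROUP BY status;
```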

foundations/01/01_create_table.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Creating Tables
 
-:bulb: This example will show how to create a table using SQL DDL.
+> :bulb: This example will show how to create a table using SQL DDL.
 
 Flink SQL operates against logical tables, just like a traditional database.
 However, it does not maintain tables internally but always operates against external systems.
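
A minimal DDL sketch using the built-in `datagen` connector (referenced in the fourth recipe); the schema is illustrative:

```sql
-- The WITH clause binds the logical table to an external system;
-- datagen simply generates random rows for the declared schema.
CREATE TABLE orders (
  order_id   BIGINT,
  total      DOUBLE,
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'datagen'
);
```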

foundations/02/02_insert_into.md (+1 -1)

@@ -1,6 +1,6 @@
 # 02 Inserting Into Tables
 
-:bulb: This recipe shows how to insert rows into a table so that downstream applications can read them.
+> :bulb: This recipe shows how to insert rows into a table so that downstream applications can read them.
 
 As outlined in [the first recipe](../01/01_create_table.md) Flink SQL operates on tables, that are stored in external systems.
 To publish results of a query for consumption by downstream applications, you write the results of a query into a table.
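
A sketch of publishing query results into a sink table; both table names are hypothetical:

```sql
-- Continuously writes the aggregate into orders_per_status, where
-- downstream applications can read it.
INSERT INTO orders_per_status
SELECT status, COUNT(*) AS order_cnt
FROM orders
GROUP BY status;
```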

foundations/03/03_temporary_table.md (+3 -1)

@@ -1,6 +1,8 @@
 # 03 Working with Temporary Tables
 
-:bulb: This example will show how and why to create a temporary table using SQL DDL.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
+
+> :bulb: This example will show how and why to create a temporary table using SQL DDL.
 
 Non-temporary tables in Flink SQL are stored in a catalog, while temporary tables only live within the current session (Apache Flink CLI) or script (Ververica Platform).
 You can use a temporary table instead of a regular (catalog) table, if it is only meant to be used within the current session or script.
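
The DDL is identical to a catalog table except for the `TEMPORARY` keyword; schema and connector here are illustrative:

```sql
-- Lives only for the current session or script; nothing is
-- persisted in the catalog.
CREATE TEMPORARY TABLE clicks (
  user_id  BIGINT,
  url      STRING,
  click_ts TIMESTAMP(3)
) WITH (
  'connector' = 'datagen'
);
```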

foundations/04/04_where.md (+1 -1)

@@ -1,6 +1,6 @@
 # 04 Filtering Data
 
-:bulb: This example will show how to filter server logs in real-time using a standard `WHERE` clause.
+> :bulb: This example will show how to filter server logs in real-time using a standard `WHERE` clause.
 
 The table it uses, `server_logs`, is backed by the [`faker` connector](https://github.com/knaufk/flink-faker) which continuously generates rows in memory based on Java Faker expressions and is convenient for testing queries.
 As such, it is an alternative to the built-in `datagen` connector used for example in [the first recipe](../01/01_create_table.md).
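
A sketch of the filter, assuming an integer `status_code` column on `server_logs`:

```sql
-- Keep only client-error log lines (4xx status codes).
SELECT *
FROM server_logs
WHERE status_code >= 400 AND status_code < 500;
```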

foundations/05/05_group_by.md (+1 -1)

@@ -1,6 +1,6 @@
 # 05 Aggregating Data
 
-:bulb: This example will show how to aggregate server logs in real-time using the standard `GROUP BY` clause.
+> :bulb: This example will show how to aggregate server logs in real-time using the standard `GROUP BY` clause.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
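
A sketch of a continuous aggregation; `browser` is an assumed column:

```sql
-- Each incoming log line updates the count for its browser group;
-- the result is an updating table, not a one-shot answer.
SELECT browser, COUNT(*) AS request_cnt
FROM server_logs
GROUP BY browser;
```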

foundations/06/06_order_by.md (+1 -1)

@@ -1,6 +1,6 @@
 # 06 Sorting Tables
 
-:bulb: This example will show how you can sort a table, particularly unbounded tables.
+> :bulb: This example will show how you can sort a table, particularly unbounded tables.
 
 Flink SQL supports `ORDER BY`.
 Bounded Tables can be sorted by any column, descending or ascending.
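
A sketch of sorting an unbounded table; on streams the leading sort key must be an ascending time attribute, here an assumed `log_time`:

```sql
-- Valid on an unbounded table because log_time is an event-time
-- attribute sorted in ascending order.
SELECT *
FROM server_logs
ORDER BY log_time;
```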

foundations/07/07_views.md (+3 -1)

@@ -1,6 +1,8 @@
 # 07 Encapsulating Logic with (Temporary) Views
 
-:bulb: This example will show how you can use (temporary) views to reuse code and to structure long queries and scripts.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
+
+> :bulb: This example will show how you can use (temporary) views to reuse code and to structure long queries and scripts.
 
 `CREATE (TEMPORARY) VIEW` defines a view from a query.
 **A view is not physically materialized.**
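
A sketch of factoring a filter into a temporary view and reusing it; `status_code` and `browser` are assumed columns:

```sql
-- The view only stores the query, not its results.
CREATE TEMPORARY VIEW successful_requests AS
SELECT *
FROM server_logs
WHERE status_code >= 200 AND status_code < 300;

SELECT browser, COUNT(*) AS ok_requests
FROM successful_requests
GROUP BY browser;
```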

foundations/08/08_statement_sets.md (+3 -1)

@@ -1,6 +1,8 @@
 # 08 Writing Results into Multiple Tables
 
-:bulb: In this recipe, you will learn how to use [Statement Sets](https://docs.ververica.com/user_guide/sql_development/sql_scripts.html#sql-statements) to run multiple `INSERT INTO` statements in a single, optimized Flink Job.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.13%2B-lightgrey)
+
+> :bulb: In this recipe, you will learn how to use [Statement Sets](https://docs.ververica.com/user_guide/sql_development/sql_scripts.html#sql-statements) to run multiple `INSERT INTO` statements in a single, optimized Flink Job.
 
 Many product requirements involve outputting the results of a streaming application to two or more sinks, such as [Apache Kafka](https://docs.ververica.com/user_guide/sql_development/connectors.html#apache-kafka) for real-time use cases, or a [Filesystem](https://docs.ververica.com/user_guide/sql_development/connectors.html#file-system) for offline ones.
 Other times, two queries are not the same but share some extensive intermediate operations.
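
A sketch of the statement set syntax with hypothetical sink tables; everything between `BEGIN` and `END` is planned as a single job:

```sql
BEGIN STATEMENT SET;

-- Real-time consumers read from the Kafka-backed sink.
INSERT INTO kafka_sink
SELECT * FROM server_logs;

-- Offline consumers read the same data from files.
INSERT INTO filesystem_sink
SELECT * FROM server_logs;

END;
```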

joins/01/01_regular_joins.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Regular Joins
 
-:bulb: This example will show how you can use joins to correlate rows across multiple tables.
+> :bulb: This example will show how you can use joins to correlate rows across multiple tables.
 
 Flink SQL supports complex and flexible join operations over continuous tables.
 There are several different types of joins to account for the wide variety of semantics queries may require.
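
A sketch of a regular streaming join; both inputs are kept in state indefinitely so results stay correct as either side changes. Table and column names are hypothetical:

```sql
-- Matches are emitted (and updated) as rows arrive on either side.
SELECT o.order_id, o.total, c.country
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id;
```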

joins/02/02_interval_joins.md (+1 -1)

@@ -1,6 +1,6 @@
 # 02 Interval Joins
 
-:bulb: This example will show how you can perform joins between tables with events that are related in a temporal context.
+> :bulb: This example will show how you can perform joins between tables with events that are related in a temporal context.
 
 ## Why Interval Joins?
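
A sketch of an interval join; the time bound lets Flink expire state instead of keeping both sides forever. Names are hypothetical:

```sql
-- Only pair rows whose timestamps are at most three days apart.
SELECT o.order_id, s.shipment_id
FROM orders o
JOIN shipments s
  ON o.order_id = s.order_id
 AND s.shipment_time BETWEEN o.order_time
                         AND o.order_time + INTERVAL '3' DAY;
```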

joins/03/03_kafka_join.md (+1 -1)

@@ -1,6 +1,6 @@
 # 03 Temporal Table Join between a non-compacted and compacted Kafka Topic
 
-:bulb: In this recipe, you will see how to correctly enrich records from one Kafka topic with the corresponding records of another Kafka topic when the order of events matters.
+> :bulb: In this recipe, you will see how to correctly enrich records from one Kafka topic with the corresponding records of another Kafka topic when the order of events matters.
 
 Temporal table joins take an arbitrary table (left input/probe site) and correlate each row to the corresponding row’s relevant version in a versioned table (right input/build side).
 Flink uses the SQL syntax of ``FOR SYSTEM_TIME AS OF`` to perform this operation.
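
A sketch of the syntax, assuming a hypothetical `transactions` stream enriched against a versioned `currency_rates` table:

```sql
-- Each transaction is joined with the rate that was valid at its
-- own event time, not the latest rate.
SELECT
  t.transaction_id,
  t.amount * r.rate AS amount_eur
FROM transactions t
JOIN currency_rates FOR SYSTEM_TIME AS OF t.transaction_time AS r
  ON t.currency = r.currency;
```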

joins/04/04_lookup_joins.md (+1 -1)

@@ -1,6 +1,6 @@
 # 04 Lookup Joins
 
-:bulb: This example will show how you can enrich a stream with an external table of reference data (i.e. a _lookup_ table).
+> :bulb: This example will show how you can enrich a stream with an external table of reference data (i.e. a _lookup_ table).
 
 ## Data Enrichment
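
A sketch of a lookup join, assuming `users` is backed by an external system such as a JDBC database and `proc_time` is declared `AS PROCTIME()` on the stream; names are hypothetical:

```sql
-- Each incoming order triggers a point lookup against the external
-- users table at processing time.
SELECT o.order_id, u.user_name
FROM orders AS o
JOIN users FOR SYSTEM_TIME AS OF o.proc_time AS u
  ON o.user_id = u.user_id;
```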

joins/05/05_star_schema.md (+1 -1)

@@ -1,6 +1,6 @@
 # 05 Real Time Star Schema Denormalization (N-Way Join)
 
-:bulb: In this recipe, we will de-normalize a simple star schema with an n-way temporal table join.
+> :bulb: In this recipe, we will de-normalize a simple star schema with an n-way temporal table join.
 
 [Star schemas](https://en.wikipedia.org/wiki/Star_schema) are a popular way of normalizing data within a data warehouse.
 At the center of a star schema is a **fact table** whose rows contain metrics, measurements, and other facts about the world.
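
A sketch of the n-way temporal join, with hypothetical fact and dimension tables:

```sql
-- Each dimension is joined as of the fact row's time attribute,
-- denormalizing the star schema into one wide stream.
SELECT
  f.fact_id,
  u.user_name,
  p.product_name
FROM fact_events AS f
JOIN dim_users FOR SYSTEM_TIME AS OF f.proc_time AS u
  ON f.user_id = u.user_id
JOIN dim_products FOR SYSTEM_TIME AS OF f.proc_time AS p
  ON f.product_id = p.product_id;
```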

joins/06/06_lateral_join.md (+1 -1)

@@ -1,6 +1,6 @@
 # 06 Lateral Table Join
 
-:bulb: This example will show how you can correlate events using a `LATERAL` join.
+> :bulb: This example will show how you can correlate events using a `LATERAL` join.
 
 A recent addition to the SQL standard is the `LATERAL` join, which allows you to combine
 the power of a correlated subquery with the expressiveness of a join.
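
A sketch of a lateral join; the subquery may reference columns of the preceding table, so it is evaluated per outer row. Tables are hypothetical:

```sql
-- For each state, pull the matching city statistics.
SELECT s.state, c.city, c.population
FROM states AS s,
LATERAL (
  SELECT city, population
  FROM city_stats
  WHERE city_stats.state = s.state
) AS c;
```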

other-builtin-functions/01/01_date_time.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Working with Dates and Timestamps
 
-:bulb: This example will show how to use [built-in date and time functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/systemFunctions.html#temporal-functions) to manipulate temporal fields.
+> :bulb: This example will show how to use [built-in date and time functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/systemFunctions.html#temporal-functions) to manipulate temporal fields.
 
 The source table (`subscriptions`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
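
A sketch using a couple of the built-in temporal functions; the column names on `subscriptions` are assumptions:

```sql
-- Format a timestamp and compute a duration between two dates.
SELECT
  id,
  DATE_FORMAT(start_date, 'yyyy-MM') AS start_month,
  TIMESTAMPDIFF(DAY, start_date, end_date) AS duration_days
FROM subscriptions;
```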

other-builtin-functions/02/02_union-all.md (+1 -1)

@@ -1,6 +1,6 @@
 # 02 Building the Union of Multiple Streams
 
-:bulb: This example will show how you can use the set operation `UNION ALL` to combine several streams of data.
+> :bulb: This example will show how you can use the set operation `UNION ALL` to combine several streams of data.
 
 See [our documentation](https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/#set-operations)
 for a full list of fantastic set operations Apache Flink supports.
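
A sketch of `UNION ALL` over two streams with identical schemas; table names are hypothetical:

```sql
-- Merge two click streams into one, tagging each row's origin.
SELECT user_id, click_ts, 'newsletter' AS source FROM newsletter_clicks
UNION ALL
SELECT user_id, click_ts, 'homepage' AS source FROM homepage_clicks;
```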

udfs/01/01_python_udfs.md (+3 -1)

@@ -1,6 +1,8 @@
 # 01 Extending SQL with Python UDFs
 
-:bulb: This example will show how to extend Flink SQL with custom functions written in Python.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
+
+> :bulb: This example will show how to extend Flink SQL with custom functions written in Python.
 
 Flink SQL provides a wide range of [built-in functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/systemFunctions.html) that cover most SQL day-to-day work. Sometimes, you need more flexibility to express custom business logic or transformations that aren't easily translatable to SQL: this can be achieved with [User-Defined Functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/udfs.html) (UDFs).
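
A sketch of registering and calling a Python scalar function from SQL DDL; the module path, function, and table are hypothetical:

```sql
-- The implementation lives in a Python module shipped with the job;
-- SQL only registers and invokes it.
CREATE TEMPORARY FUNCTION to_fahrenheit
AS 'python_udfs.to_fahrenheit'
LANGUAGE PYTHON;

SELECT city, to_fahrenheit(temperature_celsius) AS temperature_fahr
FROM temperature_measurements;
```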
