
Commit 46fc95c

Merge pull request #42 from morsapaes/sql-cookbook_maintenance
Repo maintenance - Round #3
2 parents 86d2caf + f8f7df7 commit 46fc95c

26 files changed: +38 -26 lines changed

aggregations-and-analytics/01/01_group_by_window.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Aggregating Time Series Data
 
-:bulb: This example will show how to aggregate time-series data in real-time using a `TUMBLE` window.
+> :bulb: This example will show how to aggregate time-series data in real-time using a `TUMBLE` window.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
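
For reference, a minimal sketch of the `TUMBLE` aggregation this recipe covers; the `log_time` event-time column is an assumption:

```sql
-- Count log lines per minute; log_time is an assumed event-time
-- attribute declared on server_logs.
SELECT
  TUMBLE_START(log_time, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS log_cnt
FROM server_logs
GROUP BY TUMBLE(log_time, INTERVAL '1' MINUTE);
```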

aggregations-and-analytics/02/02_watermarks.md (+3 -1)

@@ -1,6 +1,8 @@
 # 02 Watermarks
 
-:bulb: This example will show how to use `WATERMARK`s to work with timestamps in records.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.10%2B-lightgrey)
+
+> :bulb: This example will show how to use `WATERMARK`s to work with timestamps in records.
 
 The source table (`doctor_sightings`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
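
A `WATERMARK` is declared in the table DDL. A sketch with an assumed schema and illustrative faker expressions:

```sql
-- sighting_time becomes the event-time attribute; the watermark
-- lets records arrive up to 15 seconds out of order. The schema
-- and faker expressions are assumptions.
CREATE TABLE doctor_sightings (
  doctor        STRING,
  sighting_time TIMESTAMP(3),
  WATERMARK FOR sighting_time AS sighting_time - INTERVAL '15' SECOND
) WITH (
  'connector' = 'faker',
  'fields.doctor.expression' = '#{dr_who.the_doctors}',
  'fields.sighting_time.expression' = '#{date.past ''15'',''SECONDS''}'
);
```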

aggregations-and-analytics/03/03_group_by_session_window.md (+1 -1)

@@ -1,6 +1,6 @@
 # 03 Analyzing Sessions in Time Series Data
 
-:bulb: This example will show how to aggregate time-series data in real-time using a `SESSION` window.
+> :bulb: This example will show how to aggregate time-series data in real-time using a `SESSION` window.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
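
A sketch of the `SESSION` window this recipe describes; `user_id` and `log_time` are assumed columns:

```sql
-- One result row per user session, where a session closes after
-- 10 seconds of inactivity.
SELECT
  user_id,
  SESSION_START(log_time, INTERVAL '10' SECOND) AS session_start,
  COUNT(*) AS request_cnt
FROM server_logs
GROUP BY user_id, SESSION(log_time, INTERVAL '10' SECOND);
```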

aggregations-and-analytics/04/04_over.md (+1 -1)

@@ -1,6 +1,6 @@
 # 04 Rolling Aggregations on Time Series Data
 
-:bulb: This example will show how to calculate an aggregate or cumulative value based on a group of rows using an `OVER` window. A typical use case are rolling aggregations.
+> :bulb: This example will show how to calculate an aggregate or cumulative value based on a group of rows using an `OVER` window. A typical use case are rolling aggregations.
 
 The source table (`temperature_measurements`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
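
A sketch of a rolling aggregation over an `OVER` window; the column names on `temperature_measurements` are assumptions:

```sql
-- Rolling one-minute average temperature per city, emitted with
-- every new measurement.
SELECT
  measurement_time,
  city,
  temperature,
  AVG(temperature) OVER last_minute AS avg_temperature_last_minute
FROM temperature_measurements
WINDOW last_minute AS (
  PARTITION BY city
  ORDER BY measurement_time
  RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW
);
```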

aggregations-and-analytics/05/05_top_n.md (+3 -1)

@@ -1,6 +1,8 @@
 # 05 Continuous Top-N
 
-:bulb: This example will show how to continuously calculate the "Top-N" rows based on a given attribute, using an `OVER` window and the `ROW_NUMBER()` function.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.9%2B-lightgrey)
+
+> :bulb: This example will show how to continuously calculate the "Top-N" rows based on a given attribute, using an `OVER` window and the `ROW_NUMBER()` function.
 
 The source table (`spells_cast`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
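
A sketch of the continuous Top-N pattern; `wizard` and `spell` are assumed columns on `spells_cast`:

```sql
-- Top 2 wizards per spell by number of casts. The outer filter on
-- the ROW_NUMBER() result is what Flink recognizes as a Top-N query.
SELECT wizard, spell, times_cast
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY spell
                            ORDER BY times_cast DESC) AS row_num
  FROM (
    SELECT wizard, spell, COUNT(*) AS times_cast
    FROM spells_cast
    GROUP BY wizard, spell
  )
)
WHERE row_num <= 2;
```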

aggregations-and-analytics/06/06_dedup.md (+1 -1)

@@ -1,6 +1,6 @@
 # 06 Deduplication
 
-:bulb: This example will show how you can identify and filter out duplicates in a stream of events.
+> :bulb: This example will show how you can identify and filter out duplicates in a stream of events.
 
 There are different ways that duplicate events can end up in your data sources, from human error to application bugs. Regardless of the origin, unclean data can have a real impact in the quality (and correctness) of your results. Suppose that your order system occasionally generates duplicate events with the same `order_id`, and that you're only interested in keeping the most recent event for downstream processing.
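
A sketch of the deduplication pattern; `order_id` comes from the recipe text, while the `orders` table and `order_time` column are assumptions:

```sql
-- Keep only the most recent event per order_id; a duplicate with a
-- newer order_time retracts and replaces the previous result.
SELECT order_id, order_time
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY order_id
                            ORDER BY order_time DESC) AS row_num
  FROM orders
)
WHERE row_num = 1;
```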

aggregations-and-analytics/07/07_chained_windows.md (+1 -1)

@@ -1,6 +1,6 @@
 # 07 Chained (Event) Time Windows
 
-:bulb: This example will show how to efficiently aggregate time series data on two different levels of granularity.
+> :bulb: This example will show how to efficiently aggregate time series data on two different levels of granularity.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
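
A sketch of chaining two windows via a view; `browser` and `log_time` are assumed columns. `TUMBLE_ROWTIME` preserves the event-time attribute so the one-minute results can be windowed again:

```sql
-- First level: counts per browser per minute, keeping a rowtime.
CREATE TEMPORARY VIEW logs_1m AS
SELECT
  TUMBLE_ROWTIME(log_time, INTERVAL '1' MINUTE) AS window_time,
  browser,
  COUNT(*) AS cnt
FROM server_logs
GROUP BY browser, TUMBLE(log_time, INTERVAL '1' MINUTE);

-- Second level: roll the pre-aggregated minutes up to 5 minutes.
SELECT
  TUMBLE_START(window_time, INTERVAL '5' MINUTE) AS window_start,
  browser,
  SUM(cnt) AS cnt_5m
FROM logs_1m
GROUP BY browser, TUMBLE(window_time, INTERVAL '5' MINUTE);
```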

aggregations-and-analytics/08/08_match_recognize.md (+1 -1)

@@ -1,6 +1,6 @@
 # 08 Detecting patterns with MATCH_RECOGNIZE
 
-:bulb: This example will show how you can use Flink SQL to detect patterns in a stream of events with `MATCH_RECOGNIZE`.
+> :bulb: This example will show how you can use Flink SQL to detect patterns in a stream of events with `MATCH_RECOGNIZE`.
 
 A common (but historically complex) task in SQL day-to-day work is to identify meaningful sequences of events in a data set — also known as Complex Event Processing (CEP). This becomes even more relevant when dealing with streaming data, as you want to react quickly to known patterns or changing trends to deliver up-to-date business insights. In Flink SQL, you can easily perform this kind of tasks using the standard SQL clause [`MATCH_RECOGNIZE`](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/match_recognize.html).
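
A sketch of the clause, detecting users who downgrade from a premium to a basic subscription; the table and columns are hypothetical:

```sql
-- PATTERN matches one or more premium events followed by a basic
-- one; DEFINE gives each pattern variable its predicate.
SELECT *
FROM subscriptions
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY event_time
  MEASURES LAST(PREMIUM.subscription_type) AS last_premium_type
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (PREMIUM+ BASIC)
  DEFINE
    PREMIUM AS PREMIUM.subscription_type IN ('premium', 'platinum'),
    BASIC AS BASIC.subscription_type = 'basic'
);
```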

aggregations-and-analytics/09/09_cdc_materialized_view.md (+1 -1)

@@ -2,7 +2,7 @@
 
 ![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
 
-:bulb: This example will show how you can use Flink SQL and Debezium to maintain a materialized view based on database changelog streams.
+> :bulb: This example will show how you can use Flink SQL and Debezium to maintain a materialized view based on database changelog streams.
 
 In the world of analytics, databases are still mostly seen as static sources of data — like a collection of business state(s) just sitting there, waiting to be queried. The reality is that most of the data stored in these databases is continuously produced and is continuously changing, so...why not _stream_ it?
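
A sketch of the idea, with a hypothetical changelog-backed table: the `debezium-json` format interprets Kafka records as INSERT/UPDATE/DELETE changes, so a query over the table behaves like a continuously maintained materialized view:

```sql
-- Table, topic, and schema are assumptions for illustration.
CREATE TABLE shipments (
  shipment_id INT,
  order_id    INT,
  status      STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'dbserver1.inventory.shipments',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'debezium-json'
);

-- The aggregate tracks the source database as it changes.
SELECT status, COUNT(*) AS shipment_cnt
FROM shipments
GROUP BY status;
```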

foundations/01/01_create_table.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Creating Tables
 
-:bulb: This example will show how to create a table using SQL DDL.
+> :bulb: This example will show how to create a table using SQL DDL.
 
 Flink SQL operates against logical tables, just like a traditional database.
 However, it does not maintain tables internally but always operates against external systems.
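
A minimal DDL sketch using the built-in `datagen` connector (referenced in the fourth recipe); the schema is illustrative:

```sql
-- The WITH clause binds the logical table to an external system;
-- datagen simply generates random rows for the declared schema.
CREATE TABLE orders (
  order_id   BIGINT,
  total      DOUBLE,
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'datagen'
);
```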

foundations/02/02_insert_into.md (+1 -1)

@@ -1,6 +1,6 @@
 # 02 Inserting Into Tables
 
-:bulb: This recipe shows how to insert rows into a table so that downstream applications can read them.
+> :bulb: This recipe shows how to insert rows into a table so that downstream applications can read them.
 
 As outlined in [the first recipe](../01/01_create_table.md) Flink SQL operates on tables, that are stored in external systems.
 To publish results of a query for consumption by downstream applications, you write the results of a query into a table.
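
A sketch of publishing query results into a sink table; both table names are hypothetical:

```sql
-- Continuously writes the aggregate into orders_per_status, where
-- downstream applications can read it.
INSERT INTO orders_per_status
SELECT status, COUNT(*) AS order_cnt
FROM orders
GROUP BY status;
```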

foundations/03/03_temporary_table.md (+3 -1)

@@ -1,6 +1,8 @@
 # 03 Working with Temporary Tables
 
-:bulb: This example will show how and why to create a temporary table using SQL DDL.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
+
+> :bulb: This example will show how and why to create a temporary table using SQL DDL.
 
 Non-temporary tables in Flink SQL are stored in a catalog, while temporary tables only live within the current session (Apache Flink CLI) or script (Ververica Platform).
 You can use a temporary table instead of a regular (catalog) table, if it is only meant to be used within the current session or script.
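
The DDL is identical to a catalog table except for the `TEMPORARY` keyword; schema and connector here are illustrative:

```sql
-- Lives only for the current session or script; nothing is
-- persisted in the catalog.
CREATE TEMPORARY TABLE clicks (
  user_id  BIGINT,
  url      STRING,
  click_ts TIMESTAMP(3)
) WITH (
  'connector' = 'datagen'
);
```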

foundations/04/04_where.md (+1 -1)

@@ -1,6 +1,6 @@
 # 04 Filtering Data
 
-:bulb: This example will show how to filter server logs in real-time using a standard `WHERE` clause.
+> :bulb: This example will show how to filter server logs in real-time using a standard `WHERE` clause.
 
 The table it uses, `server_logs`, is backed by the [`faker` connector](https://github.com/knaufk/flink-faker) which continuously generates rows in memory based on Java Faker expressions and is convenient for testing queries.
 As such, it is an alternative to the built-in `datagen` connector used for example in [the first recipe](../01/01_create_table.md).
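
A sketch of the filter, assuming an integer `status_code` column on `server_logs`:

```sql
-- Keep only client-error log lines (4xx status codes).
SELECT *
FROM server_logs
WHERE status_code >= 400 AND status_code < 500;
```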

foundations/05/05_group_by.md (+1 -1)

@@ -1,6 +1,6 @@
 # 05 Aggregating Data
 
-:bulb: This example will show how to aggregate server logs in real-time using the standard `GROUP BY` clause.
+> :bulb: This example will show how to aggregate server logs in real-time using the standard `GROUP BY` clause.
 
 The source table (`server_logs`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
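
A sketch of a continuous aggregation; `browser` is an assumed column:

```sql
-- Each incoming log line updates the count for its browser group;
-- the result is an updating table, not a one-shot answer.
SELECT browser, COUNT(*) AS request_cnt
FROM server_logs
GROUP BY browser;
```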

foundations/06/06_order_by.md (+1 -1)

@@ -1,6 +1,6 @@
 # 06 Sorting Tables
 
-:bulb: This example will show how you can sort a table, particularly unbounded tables.
+> :bulb: This example will show how you can sort a table, particularly unbounded tables.
 
 Flink SQL supports `ORDER BY`.
 Bounded Tables can be sorted by any column, descending or ascending.
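
A sketch of sorting an unbounded table; on streams the leading sort key must be an ascending time attribute, here an assumed `log_time`:

```sql
-- Valid on an unbounded table because log_time is an event-time
-- attribute sorted in ascending order.
SELECT *
FROM server_logs
ORDER BY log_time;
```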

foundations/07/07_views.md (+3 -1)

@@ -1,6 +1,8 @@
 # 07 Encapsulating Logic with (Temporary) Views
 
-:bulb: This example will show how you can use (temporary) views to reuse code and to structure long queries and scripts.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
+
+> :bulb: This example will show how you can use (temporary) views to reuse code and to structure long queries and scripts.
 
 `CREATE (TEMPORARY) VIEW` defines a view from a query.
 **A view is not physically materialized.**
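
A sketch of factoring a filter into a temporary view and reusing it; `status_code` and `browser` are assumed columns:

```sql
-- The view only stores the query, not its results.
CREATE TEMPORARY VIEW successful_requests AS
SELECT *
FROM server_logs
WHERE status_code >= 200 AND status_code < 300;

SELECT browser, COUNT(*) AS ok_requests
FROM successful_requests
GROUP BY browser;
```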

foundations/08/08_statement_sets.md (+3 -1)

@@ -1,6 +1,8 @@
 # 08 Writing Results into Multiple Tables
 
-:bulb: In this recipe, you will learn how to use [Statement Sets](https://docs.ververica.com/user_guide/sql_development/sql_scripts.html#sql-statements) to run multiple `INSERT INTO` statements in a single, optimized Flink Job.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.13%2B-lightgrey)
+
+> :bulb: In this recipe, you will learn how to use [Statement Sets](https://docs.ververica.com/user_guide/sql_development/sql_scripts.html#sql-statements) to run multiple `INSERT INTO` statements in a single, optimized Flink Job.
 
 Many product requirements involve outputting the results of a streaming application to two or more sinks, such as [Apache Kafka](https://docs.ververica.com/user_guide/sql_development/connectors.html#apache-kafka) for real-time use cases, or a [Filesystem](https://docs.ververica.com/user_guide/sql_development/connectors.html#file-system) for offline ones.
 Other times, two queries are not the same but share some extensive intermediate operations.
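
A sketch of the statement set syntax with hypothetical sink tables; everything between `BEGIN` and `END` is planned as a single job:

```sql
BEGIN STATEMENT SET;

-- Real-time consumers read from the Kafka-backed sink.
INSERT INTO kafka_sink
SELECT * FROM server_logs;

-- Offline consumers read the same data from files.
INSERT INTO filesystem_sink
SELECT * FROM server_logs;

END;
```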

joins/01/01_regular_joins.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Regular Joins
 
-:bulb: This example will show how you can use joins to correlate rows across multiple tables.
+> :bulb: This example will show how you can use joins to correlate rows across multiple tables.
 
 Flink SQL supports complex and flexible join operations over continuous tables.
 There are several different types of joins to account for the wide variety of semantics queries may require.
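
A sketch of a regular streaming join; both inputs are kept in state indefinitely so results stay correct as either side changes. Table and column names are hypothetical:

```sql
-- Matches are emitted (and updated) as rows arrive on either side.
SELECT o.order_id, o.total, c.country
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id;
```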

joins/02/02_interval_joins.md (+1 -1)

@@ -1,6 +1,6 @@
 # 02 Interval Joins
 
-:bulb: This example will show how you can perform joins between tables with events that are related in a temporal context.
+> :bulb: This example will show how you can perform joins between tables with events that are related in a temporal context.
 
 ## Why Interval Joins?
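
A sketch of an interval join; the time bound lets Flink expire state instead of keeping both sides forever. Names are hypothetical:

```sql
-- Only pair rows whose timestamps are at most three days apart.
SELECT o.order_id, s.shipment_id
FROM orders o
JOIN shipments s
  ON o.order_id = s.order_id
 AND s.shipment_time BETWEEN o.order_time
                         AND o.order_time + INTERVAL '3' DAY;
```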

joins/03/03_kafka_join.md (+1 -1)

@@ -1,6 +1,6 @@
 # 03 Temporal Table Join between a non-compacted and compacted Kafka Topic
 
-:bulb: In this recipe, you will see how to correctly enrich records from one Kafka topic with the corresponding records of another Kafka topic when the order of events matters.
+> :bulb: In this recipe, you will see how to correctly enrich records from one Kafka topic with the corresponding records of another Kafka topic when the order of events matters.
 
 Temporal table joins take an arbitrary table (left input/probe site) and correlate each row to the corresponding row’s relevant version in a versioned table (right input/build side).
 Flink uses the SQL syntax of ``FOR SYSTEM_TIME AS OF`` to perform this operation.
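
A sketch of the syntax, assuming a hypothetical `transactions` stream enriched against a versioned `currency_rates` table:

```sql
-- Each transaction is joined with the rate that was valid at its
-- own event time, not the latest rate.
SELECT
  t.transaction_id,
  t.amount * r.rate AS amount_eur
FROM transactions t
JOIN currency_rates FOR SYSTEM_TIME AS OF t.transaction_time AS r
  ON t.currency = r.currency;
```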

joins/04/04_lookup_joins.md (+1 -1)

@@ -1,6 +1,6 @@
 # 04 Lookup Joins
 
-:bulb: This example will show how you can enrich a stream with an external table of reference data (i.e. a _lookup_ table).
+> :bulb: This example will show how you can enrich a stream with an external table of reference data (i.e. a _lookup_ table).
 
 ## Data Enrichment
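
A sketch of a lookup join, assuming `users` is backed by an external system such as a JDBC database and `proc_time` is declared `AS PROCTIME()` on the stream; names are hypothetical:

```sql
-- Each incoming order triggers a point lookup against the external
-- users table at processing time.
SELECT o.order_id, u.user_name
FROM orders AS o
JOIN users FOR SYSTEM_TIME AS OF o.proc_time AS u
  ON o.user_id = u.user_id;
```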

joins/05/05_star_schema.md (+1 -1)

@@ -1,6 +1,6 @@
 # 05 Real Time Star Schema Denormalization (N-Way Join)
 
-:bulb: In this recipe, we will de-normalize a simple star schema with an n-way temporal table join.
+> :bulb: In this recipe, we will de-normalize a simple star schema with an n-way temporal table join.
 
 [Star schemas](https://en.wikipedia.org/wiki/Star_schema) are a popular way of normalizing data within a data warehouse.
 At the center of a star schema is a **fact table** whose rows contain metrics, measurements, and other facts about the world.
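
A sketch of the n-way temporal join, with hypothetical fact and dimension tables:

```sql
-- Each dimension is joined as of the fact row's time attribute,
-- denormalizing the star schema into one wide stream.
SELECT
  f.fact_id,
  u.user_name,
  p.product_name
FROM fact_events AS f
JOIN dim_users FOR SYSTEM_TIME AS OF f.proc_time AS u
  ON f.user_id = u.user_id
JOIN dim_products FOR SYSTEM_TIME AS OF f.proc_time AS p
  ON f.product_id = p.product_id;
```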

joins/06/06_lateral_join.md (+1 -1)

@@ -1,6 +1,6 @@
 # 06 Lateral Table Join
 
-:bulb: This example will show how you can correlate events using a `LATERAL` join.
+> :bulb: This example will show how you can correlate events using a `LATERAL` join.
 
 A recent addition to the SQL standard is the `LATERAL` join, which allows you to combine
 the power of a correlated subquery with the expressiveness of a join.
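
A sketch of a lateral join; the subquery may reference columns of the preceding table, so it is evaluated per outer row. Tables are hypothetical:

```sql
-- For each state, pull the matching city statistics.
SELECT s.state, c.city, c.population
FROM states AS s,
LATERAL (
  SELECT city, population
  FROM city_stats
  WHERE city_stats.state = s.state
) AS c;
```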

other-builtin-functions/01/01_date_time.md (+1 -1)

@@ -1,6 +1,6 @@
 # 01 Working with Dates and Timestamps
 
-:bulb: This example will show how to use [built-in date and time functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/systemFunctions.html#temporal-functions) to manipulate temporal fields.
+> :bulb: This example will show how to use [built-in date and time functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/systemFunctions.html#temporal-functions) to manipulate temporal fields.
 
 The source table (`subscriptions`) is backed by the [`faker` connector](https://github.com/knaufk/flink-faker), which continuously generates rows in memory based on Java Faker expressions.
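
A sketch using a couple of the built-in temporal functions; the column names on `subscriptions` are assumptions:

```sql
-- Format a timestamp and compute a duration between two dates.
SELECT
  id,
  DATE_FORMAT(start_date, 'yyyy-MM') AS start_month,
  TIMESTAMPDIFF(DAY, start_date, end_date) AS duration_days
FROM subscriptions;
```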

other-builtin-functions/02/02_union-all.md (+1 -1)

@@ -1,6 +1,6 @@
 # 02 Building the Union of Multiple Streams
 
-:bulb: This example will show how you can use the set operation `UNION ALL` to combine several streams of data.
+> :bulb: This example will show how you can use the set operation `UNION ALL` to combine several streams of data.
 
 See [our documentation](https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/queries/#set-operations)
 for a full list of fantastic set operations Apache Flink supports.
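
A sketch of `UNION ALL` over two streams with identical schemas; table names are hypothetical:

```sql
-- Merge two click streams into one, tagging each row's origin.
SELECT user_id, click_ts, 'newsletter' AS source FROM newsletter_clicks
UNION ALL
SELECT user_id, click_ts, 'homepage' AS source FROM homepage_clicks;
```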

udfs/01/01_python_udfs.md (+3 -1)

@@ -1,6 +1,8 @@
 # 01 Extending SQL with Python UDFs
 
-:bulb: This example will show how to extend Flink SQL with custom functions written in Python.
+![Twitter Badge](https://img.shields.io/badge/Flink%20Version-1.11%2B-lightgrey)
+
+> :bulb: This example will show how to extend Flink SQL with custom functions written in Python.
 
 Flink SQL provides a wide range of [built-in functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/systemFunctions.html) that cover most SQL day-to-day work. Sometimes, you need more flexibility to express custom business logic or transformations that aren't easily translatable to SQL: this can be achieved with [User-Defined Functions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/udfs.html) (UDFs).
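
A sketch of registering and calling a Python scalar function from SQL DDL; the module path, function, and table are hypothetical:

```sql
-- The implementation lives in a Python module shipped with the job;
-- SQL only registers and invokes it.
CREATE TEMPORARY FUNCTION to_fahrenheit
AS 'python_udfs.to_fahrenheit'
LANGUAGE PYTHON;

SELECT city, to_fahrenheit(temperature_celsius) AS temperature_fahr
FROM temperature_measurements;
```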
