Commit 238b4d8

docs: add basic and advanced documentation and refactor main readme (#72)

1 parent: a44468f
23 files changed: +508 −266 lines

.markdownlint.yaml (+1 −1)

````diff
@@ -1,3 +1,3 @@
 ---
 MD013:
-  line_length: 120
+  line_length: 400
````

README.md (+60 −151)

````diff
@@ -1,186 +1,86 @@
 # Data Factory - Testing Framework
 
-A test framework that allows you to write unit and functional tests for Data Factory
-pipelines against the git-integrated JSON resource files.
-
-Currently supporting:
-
-* [Fabric Data Factory](https://learn.microsoft.com/en-us/fabric/data-factory/)
-* [Azure Data Factory v2](https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?tabs=data-factory)
-
-Planned:
-
-* [Azure Synapse Analytics](https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext&tabs=data-factory/)
-
-## Disclaimer
-
-This unit test framework is not officially supported.
-It is currently in an experimental state and has not been tested with every single data factory resource.
-It should support all activities out-of-the-box but has not been thoroughly tested;
-please report any issues in the issues section and include an example of the pipeline that is not working as expected.
-
-If there is a lot of interest in this framework, we will continue to improve it and move it to a production-ready state.
+A stand-alone test framework that allows you to write unit tests for Data Factory pipelines on [Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/data-factory/) and [Azure Data Factory](https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?tabs=data-factory).
 
 ## Features
 
-Goal: validate that the evaluated pipeline configuration with its expressions behaves as expected at runtime.
-
-1. Evaluate expressions with their functions and arguments instantly by using the framework's internal expression parser.
-2. Test a pipeline or activity against any state to assert the expected outcome.
-   A state can be configured with pipeline parameters, global parameters, variables and activity outputs.
-3. Simulate a pipeline run and evaluate the execution flow and outcome of each activity.
-4. Dynamically supports all activity types with all their attributes.
-
-> Pipelines and activities are not executed on any Data Factory environment;
-> the evaluation of the pipeline configuration is validated locally.
-> This is different from the "validation" functionality present in the UI,
-> which only validates the syntax of the pipeline configuration.
-
-## Why
-
-Data Factory does not support unit testing out of the box.
-The only way to validate your changes is through manual testing or running e2e tests against a deployed data factory.
-These tests are great to have, but miss the following benefits that unit tests, such as those written with this framework, provide:
-
-* Shift left with immediate feedback on changes - evaluate any individual data factory resource
-  (pipelines, activities, triggers, datasets, linked services etc.), including (complex) expressions
-* Test individual resources (e.g. an activity) against many different input values to cover more scenarios.
-* Fewer issues in production - due to the fast nature of writing and running unit tests,
-  you will write more tests in less time and therefore have higher test coverage.
-  This means more confidence in new changes, fewer risks of breaking existing features (regression tests),
-  and thus far fewer issues in production.
-
-> Even though Data Factory is UI-driven and writing unit tests might not seem in its nature:
-> how can you be confident that your changes will work as expected,
-> and that existing pipelines will not break, without writing unit tests?
-
-## Getting started
-
-### Start writing tests with a Dev Container
-
-To get started writing tests, refer to the following [README](./examples/README.md)
-
-### Start writing tests without a Dev Container
-
-1. Set up an empty Python project with your favorite testing library
-2. Install the dotnet runtime from [here](https://dotnet.microsoft.com/en-us/download/dotnet/8.0).
-   Using only the runtime and not the SDK should be sufficient.
-   This is required to run some expression functions on dotnet just like in Data Factory.
-3. Set up an empty Python project with your favorite testing library.
-   More information:
-   [docs_Setup](/docs/environment_setup/unit_test_setup.md)
-
-4. Install the package using your preferred package manager:
+The framework evaluates pipeline and activity definitions, which can then be asserted. It does so by providing the following features:
 
-   Pip: `pip install data-factory-testing-framework`
+1. Evaluate expressions by using the framework's internal expression parser. It supports all the functions and arguments that are available in the Data Factory expression language.
+2. Test an activity with a specific state and assert the evaluated expressions.
+3. Test a pipeline run by verifying the execution flow of activities for specific input parameters and assert the evaluated expressions of each activity.
 
-5. Create a folder in your project and copy the JSON files with the pipeline definitions locally.
+> The framework does not support running the actual pipeline. It only gives you the ability to test the pipeline and activity definitions.
 
-   More information:
-   [Json Guidance](/docs/environment_setup/json_pipeline_files.md)
+### High-level example
 
-6. Start writing tests
+Given a `WebActivity` with a `typeProperties.url` property containing the following expression:
 
-## Features - Examples
-
-The samples seen below are the _only_ code that you need to write! The framework will take care of the rest.
+```datafactoryexpression
+@concat(pipeline().globalParameters.BaseUrl, variables('Path'))
+```
 
-1. Evaluate activities (e.g. a WebActivity that calls the Azure Batch API)
+A simple test to validate that the concatenation works as expected could look like this:
 
 ```python
 # Arrange
-activity: Activity = pipeline.get_activity_by_name("Trigger Azure Batch Job")
+activity = pipeline.get_activity_by_name("webactivity_name")
 state = PipelineRunState(
     parameters=[
         RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
-        RunParameter(RunParameterType.Pipeline, "JobId", "123"),
     ],
     variables=[
-        PipelineRunVariable("JobName", "Job-123"),
+        PipelineRunVariable("Path", "/some-path"),
     ])
-state.add_activity_result("Get version", DependencyCondition.SUCCEEDED, {"Version": "version1"})
 
 # Act
 activity.evaluate(state)
 
 # Assert
-assert "https://example.com/jobs" == activity.type_properties["url"].value
-assert "POST" == activity.type_properties["method"].value
-body = activity.type_properties["body"].get_json_value()
-assert "123" == body["JobId"]
-assert "Job-123" == body["JobName"]
-assert "version1" == body["Version"]
+assert "https://example.com/some-path" == activity.type_properties["url"].value
 ```
 
-2. Evaluate pipelines and test the flow of activities given a specific input
+## Why
 
-```python
-# Arrange
-pipeline: PipelineResource = test_framework.repository.get_pipeline_by_name("batch_job")
-
-# Runs the pipeline with the provided parameters
-activities = test_framework.evaluate_pipeline(pipeline, [
-    RunParameter(RunParameterType.Pipeline, "JobId", "123"),
-    RunParameter(RunParameterType.Pipeline, "ContainerName", "test-container"),
-    RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
-])
-
-set_variable_activity: Activity = next(activities)
-assert set_variable_activity is not None
-assert "Set JobName" == set_variable_activity.name
-assert "JobName" == set_variable_activity.type_properties["variableName"]
-assert "Job-123" == set_variable_activity.type_properties["value"].value
-
-get_version_activity = next(activities)
-assert get_version_activity is not None
-assert "Get version" == get_version_activity.name
-assert "https://example.com/version" == get_version_activity.type_properties["url"].value
-assert "GET" == get_version_activity.type_properties["method"]
-get_version_activity.set_result(DependencyCondition.SUCCEEDED, {"Version": "version1"})
-
-create_batch_activity = next(activities)
-assert create_batch_activity is not None
-assert "Trigger Azure Batch Job" == create_batch_activity.name
-assert "https://example.com/jobs" == create_batch_activity.type_properties["url"].value
-assert "POST" == create_batch_activity.type_properties["method"]
-body = create_batch_activity.type_properties["body"].get_json_value()
-assert "123" == body["JobId"]
-assert "Job-123" == body["JobName"]
-assert "version1" == body["Version"]
-
-with pytest.raises(StopIteration):
-    next(activities)
-```
-
-> See the [Examples](/examples) folder for more samples
-
-## Registering missing expression functions
-
-As the framework interprets expressions containing functions, these functions are implemented in the framework,
-but there may be bugs in some of them. You can override their implementation through:
+Data Factory does not support unit testing, nor testing of pipelines locally. Having integration and e2e tests running on an actual Data Factory instance is great, but having unit tests on top of them provides additional means of quick iteration, validation and regression testing. Unit testing with the _Data Factory Testing Framework_ has the following benefits:
 
-```python
-FunctionsRepository.register("concat", lambda arguments: "".join(arguments))
-FunctionsRepository.register("trim", lambda text, trim_argument: text.strip(trim_argument[0]))
-```
+* Runs locally with immediate feedback
+* Easier to cover a lot of different scenarios and edge cases
+* Regression testing
+
+## Concepts
+
+The following pages go deeper into different topics and concepts of the framework to help in getting you started.
+
+### Basic
+
+1. [Repository setup](docs/basic/repository_setup.md)
+2. [Installing and initializing the framework](docs/basic/installing_and_initializing_framework.md)
+3. [State](docs/basic/state.md)
+4. [Activity testing](docs/basic/activity_testing.md)
+5. [Pipeline testing](docs/basic/pipeline_testing.md)
+
+> If you are not that experienced with Python, you can follow the [Getting started](docs/basic/getting_started.md) guide to get started with the framework.
+
+### Advanced
 
-## Tips
+1. [Debugging your activities and pipelines](docs/advanced/debugging.md)
+2. [Development workflow](docs/advanced/development_workflow.md)
+3. [Overriding expression functions](docs/advanced/overriding_expression_functions.md)
+4. [Framework internals](docs/advanced/framework_internals.md)
 
-1. After parsing a data factory resource file, you can use the debugger to easily discover which classes are actually
-   initialized so that you can cast them to the correct type.
+## Examples
 
-## Recommended development workflow for Azure Data Factory v2
+More advanced examples demonstrating the capabilities of the framework:
 
-* Use ADF Git integration
-* Use the UI to create a feature branch, build the initial pipeline, and save it to the feature branch
-* Pull the feature branch locally
-* Start writing unit and functional tests, run them locally for immediate feedback, and fix bugs
-* Push changes to the feature branch
-* Test the new features manually through the UI in a sandbox environment
-* Create a PR, which will run the tests in the CI pipeline
-* Approve the PR
-* Merge to main and start deploying to dev/test/prod environments
-* Run e2e tests after each deployment to validate all happy flows work on that specific environment
+Fabric:
+
+1. [Batch job example](examples/fabric/batch_job/README.md)
+
+Azure Data Factory:
+
+1. [Copy blobs example](examples/data_factory/copy_blobs/README.md)
+2. [Batch job example](examples/data_factory/batch_job/README.md)
 
 ## Contributing
 
@@ -196,6 +96,15 @@ This project has adopted the [Microsoft Open Source Code of Conduct](https://ope
 For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
 contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
 
+## Disclaimer
+
+This unit test framework is not officially supported.
+It is currently in an experimental state and has not been tested with every single data factory resource.
+It should support all activities out-of-the-box but has not been thoroughly tested;
+please report any issues in the issues section and include an example of the pipeline that is not working as expected.
+
+If there is a lot of interest in this framework, we will continue to improve it and move it to a production-ready state.
+
 ## Trademarks
 
 This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
````
docs/advanced/debugging.md (+3)

````diff
@@ -0,0 +1,3 @@
+# Debugging
+
+As the framework dynamically parses and interprets data factory resource files, it can be challenging to identify which objects you are working with. It is recommended to use the debugger during development of your tests to get a better idea of which activities are being returned and to understand the structure of the activity and its properties.
````

docs/advanced/development_workflow.md (+12)

````diff
@@ -0,0 +1,12 @@
+# Recommended development workflow for Azure Data Factory v2
+
+* Use ADF Git integration
+* Use the UI to create a feature branch, build the initial pipeline, and save it to the feature branch
+* Pull the feature branch locally
+* Start writing unit and functional tests, run them locally for immediate feedback, and fix bugs
+* Push changes to the feature branch
+* Test the new features manually through the UI in a sandbox environment
+* Create a PR, which will run the tests in the CI pipeline
+* Approve the PR
+* Merge to main and start deploying to dev/test/prod environments
+* Run e2e tests after each deployment to validate all happy flows work on that specific environment
````

docs/advanced/framework_internals.md (+3)

````diff
@@ -0,0 +1,3 @@
+# Framework internals
+
+This page will be used to document the internals of the testing framework: its architecture, design decisions, and implementation details.
````
docs/advanced/overriding_expression_functions.md (+9)

````diff
@@ -0,0 +1,9 @@
+# Overriding expression functions
+
+The framework interprets expressions containing functions. These functions are implemented within the framework and might contain bugs.
+You can override their implementation as illustrated below:
+
+```python
+FunctionsRepository.register("concat", lambda arguments: "".join(arguments))
+FunctionsRepository.register("trim", lambda text, trim_argument: text.strip(trim_argument[0]))
+```
````
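The override in the snippet above works because registering a function under an existing name replaces the previous implementation. The sketch below imitates that behaviour with a minimal standalone registry: the class name mirrors the framework's `FunctionsRepository`, but this implementation (including the `resolve` helper) is an assumption for illustration, not the framework's actual code.

```python
# Hypothetical standalone registry imitating the override mechanism.
class FunctionsRepository:
    _functions: dict = {}

    @classmethod
    def register(cls, name, implementation):
        # Last registration wins, which is what makes overriding possible.
        cls._functions[name] = implementation

    @classmethod
    def resolve(cls, name):
        return cls._functions[name]


# Default implementation, using the same signature as the docs snippet
# (a single argument holding the list of evaluated expression arguments):
FunctionsRepository.register("concat", lambda arguments: "".join(arguments))

# Override it, e.g. to work around a bug or to stub out behaviour in a test:
FunctionsRepository.register("concat", lambda arguments: "|".join(arguments))

print(FunctionsRepository.resolve("concat")(["a", "b"]))  # a|b
```

Overrides applied this way are global to the registry, so in a test suite it is worth restoring the original implementation after the test that needs the override.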
