Proposed dialect testing re-org #1699
mtoy-googly-moogly started this conversation in Developing Malloy
Reducing Test Debt
No shame, but there is some debt in our test setup. I'd like to take a pass at reducing it.
Requirements
Test Data
Test data is magically assumed to exist in the test database. The parquet files currently hiding in `test/data/duckdb` should be the source of truth for the test data. There should be a script for loading the test data into each dialect, much like the loader script for duckdb. If we are going to load SQL into postgres, for example, then there should be a parquet-to-SQL converter, so that the parquet files contain the truth.
This is important because the test data is going to change (see next item), and we need to make that process as painless as possible.
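As a sketch of what that converter could look like, assuming DuckDB as the parquet reader and naive string-literal quoting (the paths, table name, and quoting strategy here are all illustrative, not the real loader):

```typescript
// Hypothetical parquet-to-SQL sketch: read a parquet file with DuckDB and
// emit INSERT statements that Postgres (or another dialect) can load.
import * as duckdb from "duckdb";

const db = new duckdb.Database(":memory:");

function parquetToSql(parquetPath: string, tableName: string): Promise<string> {
  return new Promise((resolve, reject) => {
    db.all(`SELECT * FROM read_parquet('${parquetPath}')`, (err, rows) => {
      if (err) return reject(err);
      const statements = rows.map((row) => {
        const columns = Object.keys(row).join(", ");
        // Naive quoting: every value becomes a quoted literal and the target
        // database coerces it to the column type; NULLs pass through as-is.
        const values = Object.values(row)
          .map((v) => (v === null ? "NULL" : `'${String(v).replace(/'/g, "''")}'`))
          .join(", ");
        return `INSERT INTO ${tableName} (${columns}) VALUES (${values});`;
      });
      resolve(statements.join("\n"));
    });
  });
}

// e.g. emit SQL for one table (path and table name are illustrative):
parquetToSql("test/data/duckdb/aircraft.parquet", "aircraft").then(console.log);
```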
Reduce Test Data Footprint
Tests which need to run queries are written against some sample data, and that data needs to be pared down. I would love for the total size of the test database to be under a megabyte, so it would not be onerous to load multiple copies of it into memory in a parallel test run.

The main offender is the FAA database. We could certainly trim the number of rows in its tables to produce a much smaller data set. There would then have to be some work on tests, because some of the aggregate computation answers will change.
The other large data set is a copy of some data from Google Analytics, which we use to write most of our tests of deeply nested data. We also need to trim some rows out of this data set, but we should keep using it.

Then there are several other databases which maybe don't need to be in tests at all; we could possibly re-code those tests to run against one of the two data sets listed above.
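One possible way to do the trimming, as a sketch: DuckDB can sample a parquet file and write a smaller one back out. The file names and row count here are illustrative, and naive independent sampling would break join relationships between tables, so a real script would need to sample the parent table first and filter the child tables against it:

```sql
-- Hypothetical: pare one table down by sampling inside DuckDB,
-- which reads and writes parquet natively.
COPY (
  SELECT * FROM read_parquet('test/data/duckdb/flights.parquet')
  USING SAMPLE 1000 ROWS
)
TO 'flights_small.parquet' (FORMAT PARQUET);
```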
Fix Test Taxonomy
Tests fall into one of three categories.

CI

I propose combining the tests which always run exactly once per CI run into one npm run target:

```
npm run test-core
```

...and adding this target, which ONLY runs the third kind of test, against all databases (or a subset, if `MALLOY_TEST_DATABASES` is used):

```
npm run test-dialect
```

Then the overall `npm test` would just do both of the above. However, CI will probably break out individual dialects in order to get some parallel testing, so the CI script will probably do:

```
npm run test-core
env MALLOY_TEST_DATABASE=dialect_name npm run test-dialect
```

This will make our tests as slow as the slowest dialect, but would enable us to run the dialects in parallel.
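For example, a CI script might fan the dialect runs out in parallel (a sketch; the dialect list and shell backgrounding are assumptions, and a real setup would more likely use the CI system's job matrix):

```sh
# Hypothetical CI sketch: run the core tests once, then one dialect
# test job per database, in parallel.
npm run test-core

for dialect in duckdb bigquery postgres; do
  env MALLOY_TEST_DATABASE=$dialect npm run test-dialect &
done
wait  # total wall-clock time is that of the slowest dialect
```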
Changes to current location of tests
`test/src/databases` is a fine place for all the dialect tests to live. Currently we always load all files for all dialects, and ask that each file quietly respect `MALLOY_TEST_DATABASES`. This is an acceptable approach, but the way tests are sorted inside of `databases` is a little bit debt-ridden and makes it hard for a dialect author to know what tests they need to write.

The plan would still be to load ALL of these tests for all dialects, but use our magic technology to avoid running tests for dialects not under test. There could also be an advanced plan which only loads the test files relevant to the set of dialects under test, which we could add at some future date.
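That "quietly respect" convention might look roughly like this (a sketch: the helper name and the duckdb default are assumptions, not the actual test utilities):

```typescript
// Hypothetical sketch: each spec file declares the dialect it covers, and
// dialects not listed in MALLOY_TEST_DATABASES are registered as skipped.
const databases = (process.env["MALLOY_TEST_DATABASES"] ?? "duckdb").split(",");

// Returns a real describe for dialects under test, describe.skip otherwise,
// so the suite still shows up in reporting without running any queries.
function describeForDialect(dialect: string) {
  return databases.includes(dialect) ? describe : describe.skip;
}

describeForDialect("postgres")("postgres | nesting", () => {
  test("nest one level", async () => {
    // ...open a postgres runtime here and run the query under test...
  });
});
```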
The tests inside here fall into a number of categories. I propose we formalize these by moving tests around to match this structure:

- `all/` -- Tests which every dialect is required to pass
- `feature-name/feature-name.spec.ts` (e.g. `nesting/nest.spec.ts`) -- A test which all dialects that support that feature must pass
- `feature-name/dialect-name.spec.ts` (e.g. `nesting/nest(bigquery,duckdb).spec.ts`) -- A test which one (or more) dialects use to test a particular feature