Proposed dialect testing re-org #1699
mtoy-googly-moogly started this conversation in Developing Malloy
Reducing Test Debt
No shame, but there is some debt in our test setup. I'd like to take a pass at reducing it.
Requirements
Test Data
Test data is magically assumed to exist in the test database. The parquet files currently hiding in `test/data/duckdb` should be the source of truth for the test data. There should be a script for loading the test data into each dialect, much like the loader script for duckdb. If we are going to load SQL into postgres, for example, then there should be a parquet-to-SQL converter, so that the parquet files contain the truth.
This is important because the test data is going to change (see next item), and we need to make that process as painless as possible.
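As a sketch of what that converter could look like, assuming DuckDB as the parquet reader and naive string-literal quoting (the paths, table name, and quoting strategy here are all illustrative, not the real loader):

```typescript
// Hypothetical parquet-to-SQL sketch: read a parquet file with DuckDB and
// emit INSERT statements that Postgres (or another dialect) can load.
import * as duckdb from "duckdb";

const db = new duckdb.Database(":memory:");

function parquetToSql(parquetPath: string, tableName: string): Promise<string> {
  return new Promise((resolve, reject) => {
    db.all(`SELECT * FROM read_parquet('${parquetPath}')`, (err, rows) => {
      if (err) return reject(err);
      const statements = rows.map((row) => {
        const columns = Object.keys(row).join(", ");
        // Naive quoting: every value becomes a quoted literal and the target
        // database coerces it to the column type; NULLs pass through as-is.
        const values = Object.values(row)
          .map((v) => (v === null ? "NULL" : `'${String(v).replace(/'/g, "''")}'`))
          .join(", ");
        return `INSERT INTO ${tableName} (${columns}) VALUES (${values});`;
      });
      resolve(statements.join("\n"));
    });
  });
}

// e.g. emit SQL for one table (path and table name are illustrative):
parquetToSql("test/data/duckdb/aircraft.parquet", "aircraft").then(console.log);
```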
Reduce Test Data Footprint
Tests which need to run queries are written against some sample data, and that data needs to be pared down. I would love for the total size of the test database to be under a megabyte, so it would not be onerous to load multiple copies of it into memory in a parallel test run.

The main offender is the FAA database. We could certainly trim the number of rows in its tables to produce a much smaller data set. There would then have to be some work on tests, because some of the aggregate computation answers will change.
The other large data set is a copy of some data from Google Analytics, which we use to write most of our tests of deeply nested data. We also need to trim some rows out of this data set, but we should keep using it.

Then there are several other databases which maybe don't need to be in tests at all; we could possibly re-code those tests to run against one of the two data sets listed above.
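One possible way to do the trimming, as a sketch: DuckDB can sample a parquet file and write a smaller one back out. The file names and row count here are illustrative, and naive independent sampling would break join relationships between tables, so a real script would need to sample the parent table first and filter the child tables against it:

```sql
-- Hypothetical: pare one table down by sampling inside DuckDB,
-- which reads and writes parquet natively.
COPY (
  SELECT * FROM read_parquet('test/data/duckdb/flights.parquet')
  USING SAMPLE 1000 ROWS
)
TO 'flights_small.parquet' (FORMAT PARQUET);
```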
Fix Test Taxonomy
Tests fall into one of three categories.

CI

I propose combining the tests which always run exactly once per CI run into one npm run target:

```
npm run test-core
```

...and adding this target, which ONLY runs the third kind of test, against all databases (or a subset, if `MALLOY_TEST_DATABASES` is used):

```
npm run test-dialect
```

Then the overall `npm test` would just do both of the above. However, CI will probably break out individual dialects in order to get some parallel testing, so the CI script will probably do:

```
npm run test-core
env MALLOY_TEST_DATABASE=dialect_name npm run test-dialect
```

This will make our tests as slow as the slowest dialect, but would enable us to run the dialects in parallel.
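For example, a CI script might fan the dialect runs out in parallel (a sketch; the dialect list and shell backgrounding are assumptions, and a real setup would more likely use the CI system's job matrix):

```sh
# Hypothetical CI sketch: run the core tests once, then one dialect
# test job per database, in parallel.
npm run test-core

for dialect in duckdb bigquery postgres; do
  env MALLOY_TEST_DATABASE=$dialect npm run test-dialect &
done
wait  # total wall-clock time is that of the slowest dialect
```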
Changes to current location of tests
`test/src/databases` is a fine place for all the dialect tests to live. Currently we always load all files for all dialects, and ask that each file quietly respect `MALLOY_TEST_DATABASES`. This is an acceptable approach, but the way tests are sorted inside of `databases` is a little bit debt-ridden and makes it hard for a dialect author to know what tests they need to write.

The plan would still be to load ALL of these tests for all dialects, but use our magic technology to avoid running tests for dialects not under test. There could also be an advanced plan which only loads the test files relevant to the set of dialects under test, which we could add at some future date.
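That "quietly respect" convention might look roughly like this (a sketch: the helper name and the duckdb default are assumptions, not the actual test utilities):

```typescript
// Hypothetical sketch: each spec file declares the dialect it covers, and
// dialects not listed in MALLOY_TEST_DATABASES are registered as skipped.
const databases = (process.env["MALLOY_TEST_DATABASES"] ?? "duckdb").split(",");

// Returns a real describe for dialects under test, describe.skip otherwise,
// so the suite still shows up in reporting without running any queries.
function describeForDialect(dialect: string) {
  return databases.includes(dialect) ? describe : describe.skip;
}

describeForDialect("postgres")("postgres | nesting", () => {
  test("nest one level", async () => {
    // ...open a postgres runtime here and run the query under test...
  });
});
```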
The tests inside here fall into a number of categories. I propose we formalize these by moving tests around to match this structure:

- `all/` -- Tests which every dialect is required to pass
- `feature-name/feature-name.spec.ts` (e.g. `nesting/nest.spec.ts`) -- A test which all dialects that support that feature must pass
- `feature-name/dialect-name.spec.ts` (e.g. `nesting/nest(bigquery,duckdb).spec.ts`) -- A test which one (or more) dialects use to test a particular feature