Skip to content

gigapi/duckdb-gigapi-extension

Repository files navigation

GigAPI DuckDB Extension

This extension provides transparent, metadata-driven query support for GigAPI

Overview

The gigapi extension seamlessly rewrites DuckDB SQL queries using GigAPI metadata indices.

  • If an index is found, the extension dynamically rewrites the query to read the specific data files (e.g., Parquet files on S3) relevant to the query's time range and other filters.
  • If no index is found, the query is passed on to DuckDB's default planner, allowing you to work with regular tables as usual.

Configuration

To use this extension, you must first configure a secret in DuckDB to store the connection details for your Redis instance. The extension will look for a redis type secret with the name gigapi.

Creating the Secret

You can create the secret using the following SQL command. Replace the values for host, port, and password with your Redis instance's details.

CREATE SECRET gigapi (
    TYPE redis,
    HOST 'localhost',
    PORT '6379',
    PASSWORD 'your-password'
);

Parameters:

  • TYPE: Must be redis.
  • HOST: The hostname or IP address of your Redis server. (Default: 'localhost')
  • PORT: The port number for your Redis server. (Default: '6379')
  • PASSWORD: The password for your Redis server. (Optional)

Usage: gigapi() Table Function

The primary way to use the extension is via the gigapi() table function. You pass a complete SQL query as a string to this function. The extension will then rewrite it using the metadata from Redis and execute it.

Example

-- Load the extension
INSTALL gigapi FROM community;
LOAD gigapi;

-- Create the Redis secret for the GigAPI backend
CREATE SECRET gigapi (
    TYPE redis,
    HOST '127.0.0.1',
    PORT '6379',
    PASSWORD ''
);

-- Use the gigapi() table function to run a query
SELECT * FROM gigapi('SELECT * FROM my_measurement WHERE time > now() - interval ''1 hour''');

Behind the scenes, the extension will perform the following steps:

  1. Parse the inner SELECT query.
  2. Check Redis for a key named giga:idx:ts:my_measurement.
  3. If the key exists, extract the time range from the WHERE clause.
  4. Fetch the relevant list of data files from the Redis sorted set.
  5. Rewrite the query to be SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet', ...]) WHERE time > now() - interval '1 hour'.
  6. Pass the rewritten query to the DuckDB planner for execution.

Transparent Query Hijacking with GIGAPI

As a more powerful alternative to the gigapi() table function, you can use the GIGAPI keyword at the beginning of any query. This will trigger the same query rewriting logic but allows you to use standard SQL syntax without wrapping your query in a string.

How it Works

Any query prefixed with GIGAPI will be automatically intercepted by the extension's query planner. The planner then rewrites the query based on the metadata found in Redis, just like the gigapi() function does.

Example

-- The same query, but using the GIGAPI keyword for transparent hijacking.
GIGAPI SELECT * FROM my_measurement WHERE time > now() - interval '1 hour';

Behind the scenes, the extension performs the same steps as the gigapi() table function, rewriting the query to read from specific data files before execution. If a query is not prefixed with GIGAPI, it will be handled by DuckDB's default planner.

Developer Information

Dry Run Function

For debugging and development, the extension provides a scalar function gigapi_dry_run(sql_query) that shows you how a query would be rewritten without actually connecting to Redis. It uses a dummy list of Parquet files in its place.

Example:

SELECT gigapi_dry_run('SELECT * FROM my_table WHERE value > 10');

Output:

SELECT * FROM read_parquet(['dummy/file1.parquet', 'dummy/file2.parquet']) WHERE ("value" > 10)

About

Experimental Extension for GigaPI Catalog

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published