This extension provides transparent, metadata-driven query support for GigAPI
The gigapi
extension seamlessly rewrites DuckDB SQL queries using GigAPI metadata indices.
- If an index is found, the extension dynamically rewrites the query to read the specific data files (e.g., Parquet files on S3) relevant to the query's time range and other filters.
- If no index is found, the query is passed on to DuckDB's default planner, allowing you to work with regular tables as usual.
To use this extension, you must first configure a secret in DuckDB to store the connection details for your Redis instance. The extension will look for a redis
type secret with the name gigapi
.
You can create the secret using the following SQL command. Replace the values for host
, port
, and password
with your Redis instance's details.
CREATE SECRET gigapi (
TYPE redis,
HOST 'localhost',
PORT '6379',
PASSWORD 'your-password'
);
Parameters:
TYPE
: Must beredis
.HOST
: The hostname or IP address of your Redis server. (Default: 'localhost')PORT
: The port number for your Redis server. (Default: '6379')PASSWORD
: The password for your Redis server. (Optional)
The primary way to use the extension is via the gigapi()
table function. You pass a complete SQL query as a string to this function. The extension will then rewrite it using the metadata from Redis and execute it.
-- Load the extension
INSTALL gigapi FROM community;
LOAD gigapi;
-- Create the Redis secret for the GigAPI backend
CREATE SECRET gigapi (
TYPE redis,
HOST '127.0.0.1',
PORT '6379',
PASSWORD ''
);
-- Use the gigapi() table function to run a query
SELECT * FROM gigapi('SELECT * FROM my_measurement WHERE time > now() - interval ''1 hour''');
Behind the scenes, the extension will perform the following steps:
- Parse the inner
SELECT
query. - Check Redis for a key named
giga:idx:ts:my_measurement
. - If the key exists, extract the time range from the
WHERE
clause. - Fetch the relevant list of data files from the Redis sorted set.
- Rewrite the query to be
SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet', ...]) WHERE time > now() - interval '1 hour'
. - Pass the rewritten query to the DuckDB planner for execution.
As a more powerful alternative to the gigapi()
table function, you can use the GIGAPI
keyword at the beginning of any query. This will trigger the same query rewriting logic but allows you to use standard SQL syntax without wrapping your query in a string.
Any query prefixed with GIGAPI
will be automatically intercepted by the extension's query planner. The planner then rewrites the query based on the metadata found in Redis, just like the gigapi()
function does.
-- The same query, but using the GIGAPI keyword for transparent hijacking.
GIGAPI SELECT * FROM my_measurement WHERE time > now() - interval '1 hour';
Behind the scenes, the extension performs the same steps as the gigapi()
table function, rewriting the query to read from specific data files before execution. If a query is not prefixed with GIGAPI
, it will be handled by DuckDB's default planner.
For debugging and development, the extension provides a scalar function gigapi_dry_run(sql_query)
that shows you how a query would be rewritten without actually connecting to Redis. It uses a dummy list of Parquet files in its place.
Example:
SELECT gigapi_dry_run('SELECT * FROM my_table WHERE value > 10');
Output:
SELECT * FROM read_parquet(['dummy/file1.parquet', 'dummy/file2.parquet']) WHERE ("value" > 10)