Skip to content

Introduce helper to dump Babelfish operator class defined over built-… #355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Deepesh125
Copy link
Contributor

@Deepesh125 Deepesh125 commented May 2, 2024

Description

Extension changes - babelfish-for-postgresql/babelfish_extensions#2542

There is a bug (described below) in Postgres due to which it does not dump user-defined operator class over built-in data
types. This commit fixed that issue by introducing helper function which will dump operator class and all of its members
appropriately.

Task: BABEL-3385
Signed-off-by: Dipesh Dhameliya [email protected]

Issue details

Let's assume that we want to create certain operator between int (Oid = 23) and numeric (Oid = 1700). And we want to add this operator to existing operator family of integer (intger_ops (oid = 1976)) so that optimiser chooses index scan in certain situation over sequential scan. For this, one would follow following script to create operator class -

CREATE SCHEMA uds;

CREATE FUNCTION uds.int4_numeric_cmp (int, numeric)
RETURNS int
AS
$$
  Select numeric_cmp($1::numeric, $2)
$$
LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION uds.int4_numeric_eq (int, numeric)
RETURNS boolean
AS
$$
  Select numeric_eq($1::numeric, $2)
$$
LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

CREATE OPERATOR uds.= (
LEFTARG = int,
RIGHTARG = numeric,
FUNCTION = uds.int4_numeric_eq,
COMMUTATOR = =,
NEGATOR = <>,
RESTRICT = eqsel,
JOIN = eqjoinsel
);

CREATE OPERATOR CLASS uds.int_numeric FOR TYPE int4
USING btree FAMILY integer_ops AS
OPERATOR 3 uds.= (int, numeric),
FUNCTION 1 uds.int4_numeric_cmp(int, numeric);

This CREATE OPERATOR CLASS would create unique entries in pg_amop for each of the member operators and will create entry in pg_amproc for each of the support function.

postgres=# select oid, opcname, opcfamily opcintype from pg_opclass where opcname = 'int_numeric';
  oid  |   opcname   | opcintype
-------+-------------+-----------
 16390 | int_numeric |      1976
(1 row)

postgres=# select * from pg_amop where amoplefttype = 23 and amoprighttype = 1700;
  oid  | amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily
-------+------------+--------------+---------------+--------------+-------------+---------+------------+----------------
 16403 |       1976 |           23 |          1700 |            3 | s           |   16389 |        403 |              0
(1 row)

postgres=# select * from pg_amproc where amproclefttype = 23 and amprocrighttype = 1700;
  oid  | amprocfamily | amproclefttype | amprocrighttype | amprocnum |        amproc
-------+--------------+----------------+-----------------+-----------+----------------------
 16404 |         1976 |             23 |            1700 |         1 | uds.int4_numeric_cmp
(1 row)

But PG does not store dependency of pg_amop or pg_amproc entry on pg_opclass or pg_opfamily . Reason for this is explain below.

postgres=#  select * from pg_depend where classid = 'pg_catalog.pg_amop'::pg_catalog.regclass and objid = 16403;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    2602 | 16403 |        0 |       2617 |    16389 |           0 | a
(1 row)

^^ it just stores dependency on pg_operator (oid = 2617)

postgres=# select * from pg_depend where classid = 'pg_catalog.pg_amproc'::pg_catalog.regclass and objid = 16404;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    2603 | 16404 |        0 |       1255 |    16385 |           0 | a
(1 row)

^^ it just stores dependency on pg_proc (oid = 1255)

This decision of not storing dependency of pg_amop on pg_opclass or pg_opfamily is causing two issues -

  1. PG will not clean up entries in pg_amop in case operator class is deleted.
postgres=# DROP OPERATOR CLASS uds.int_numeric USING btree;
DROP OPERATOR CLASS

postgres=# select oid, opcname, opcfamily opcintype from pg_opclass where opcname = 'int_numeric';
 oid | opcname | opcintype
-----+---------+-----------
(0 rows)

postgres=# select * from pg_amop where amoplefttype = 23 and amoprighttype = 1700;
  oid  | amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily
-------+------------+--------------+---------------+--------------+-------------+---------+------------+----------------
 16403 |       1976 |           23 |          1700 |            3 | s           |   16389 |        403 |              0
(1 row)

postgres=# select * from pg_amproc where amproclefttype = 23 and amprocrighttype = 1700;
  oid  | amprocfamily | amproclefttype | amprocrighttype | amprocnum |        amproc
-------+--------------+----------------+-----------------+-----------+----------------------
 16404 |         1976 |             23 |            1700 |         1 | uds.int4_numeric_cmp
(1 row)

postgres=# CREATE OPERATOR CLASS uds.int_numeric FOR TYPE int4
postgres-# USING btree FAMILY integer_ops AS
postgres-# OPERATOR 3 uds.= (int, numeric),
postgres-# FUNCTION 1 uds.int4_numeric_cmp(int, numeric);
ERROR:  duplicate key value violates unique constraint "pg_amop_fam_strat_index"
DETAIL:  Key (amopfamily, amoplefttype, amoprighttype, amopstrategy)=(1976, 23, 1700, 3) already exists.
  1. PG will not dump operator class as per expectation which means we will loose index scan after upgrade. Expectation is that, it would dump operator class with its member operators and functions but its not doing it. I am also attaching dump file for the reference.
postgres=# select oid, opcname, opcfamily opcintype from pg_opclass where opcname = 'int_numeric';
  oid  |   opcname   | opcintype
-------+-------------+-----------
 16406 | int_numeric |      1976
(1 row)

postgres=# select * from pg_amproc where amproclefttype = 23 and amprocrighttype = 1700;
 oid | amprocfamily | amproclefttype | amprocrighttype | amprocnum | amproc
-----+--------------+----------------+-----------------+-----------+--------
(0 rows)

postgres=# select * from pg_amop where amoplefttype = 23 and amoprighttype = 1700;
 oid | amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily
-----+------------+--------------+---------------+--------------+-------------+---------+------------+----------------
(0 rows)

why does not PG store dependency of pg_amop or pg_amproc entry on pg_opclass or pg_opfamily ?
The answer lies with the invention of btadjustmembers (code link). Here, Postgres is not trying to register dependency of pg_amop entry on int_numeric (Operator class) but it rather tries to store dependency on integer_ops (oid = 1976 / Builtin operator family for integer datatype). And Oid = 1976 is pinned object and since Postgres does not register dependency on pinned object, recordDependencyOn will not add dependency entry.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the PostgreSQL license, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…in data types (babelfish-for-postgresql#346)

Extension changes - babelfish-for-postgresql/babelfish_extensions#2460

There is a bug (described below) in Postgres due to which it does not dump user-defined operator class over built-in data
types. This commit fixed that issue by introducing helper function which will dump operator class and all of its members
appropriately.

Task: BABEL-3385
Signed-off-by: Dipesh Dhameliya [email protected]
@Deepesh125 Deepesh125 merged commit be2bf4a into babelfish-for-postgresql:BABEL_3_X_DEV__PG_15_X May 2, 2024
@Deepesh125 Deepesh125 deleted the jira-babel-3385-3x branch May 2, 2024 10:43
Deepesh125 added a commit to babelfish-for-postgresql/babelfish_extensions that referenced this pull request May 2, 2024
Today in Babelfish, Index scan would not be chosen by optimiser if there is any filter that involves operator between int and numeric even though index is created on int column. For example,

```
create table test_int_numeric(a int);

-- insert 1M rows of data
INSERT INTO test_int_numeric (a) SELECT generate_series(1, 1000000);

CREATE INDEX test_int_numeric_idx on test_int_numeric(a);
```

Here, Index scan would not be chosen for following query as operator class and operators between int and numeric does not exist in native Postgres.

```
select count(*) from test_int_numeric where a = 5.0;
```

This commit fixes this issue by appropriately defining operator class and operators between int and numeric.

This changes also requires engine changes (babelfish-for-postgresql/postgresql_modified_for_babelfish#355) in order to dump these Babelfish defined operator class correctly during pg_upgrade.

Task: BABEL-3385
Signed-off-by: Dipesh Dhameliya <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants