Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation from PG endpoint for logical database #3769

ahmed-shameem · 2025-05-19T06:29:12Z

Description

If a Babelfish table has a column whose type is anything other than text/ntext (e.g. varchar, nvarchar) and whose collation differs from the Babelfish database/server-level collation, a query executed from the PSQL endpoint fails with an error saying it cannot determine which collation to use.

If the same query is executed from the TDS endpoint, it runs successfully.

This is because, when the query is executed from the PSQL endpoint, the collation of varchar is the server-level collation. During collation resolution at the parent node, the collations of the left and right operands do NOT match, and neither of them matches CLUSTER_COLLATION_OID (which returns DEFAULT_COLLATION_OID because the dialect is PG).

The query works from the TDS endpoint because there the collation of varchar is the database-level collation. During collation resolution at the parent node, the collations of the left and right operands again do NOT match, but the collation of the right operand matches CLUSTER_COLLATION_OID (which returns the database-level collation because the dialect is TSQL).
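The two endpoint behaviours described above can be reproduced with a small standalone model. This is a sketch, not the Babelfish code: the OID values, the `Dialect` enum, and the helpers `cluster_collation_before_fix` and `resolve_collation` are hypothetical. The rule "if one side equals the cluster collation, the other side wins" is modeled on how PostgreSQL lets a non-default implicit collation override the default one; the real logic lives in `merge_collation_state` and may differ in detail.

```c
/* Placeholder OID type and values; real OIDs come from the catalogs. */
typedef unsigned int Oid;

#define INVALID_OID  0u
#define DEFAULT_COLL 100u /* stands in for DEFAULT_COLLATION_OID */
#define SERVER_COLL  200u /* Babelfish server-level collation */
#define DB_COLL      300u /* Babelfish database-level collation */
#define CS_AS_COLL   400u /* e.g. BBF_Unicode_General_CS_AS on the column */

typedef enum { DIALECT_PG, DIALECT_TSQL } Dialect;

/* Hypothetical model of CLUSTER_COLLATION_OID() before the fix. */
static Oid cluster_collation_before_fix(Dialect dialect)
{
    return dialect == DIALECT_TSQL ? DB_COLL : DEFAULT_COLL;
}

/*
 * Hypothetical resolution of two implicit collations at a parent node.
 * Equal collations pass through; if one side equals the cluster collation,
 * the other side wins; otherwise it is a conflict, reported as INVALID_OID
 * ("could not determine which collation to use").
 */
static Oid resolve_collation(Oid left, Oid right, Oid cluster_coll)
{
    if (left == right)
        return left;
    if (right == cluster_coll)
        return left;
    if (left == cluster_coll)
        return right;
    return INVALID_OID;
}
```

From the PSQL endpoint the varchar operand carries `SERVER_COLL` while the cluster collation is `DEFAULT_COLL`, so nothing matches and resolution fails; from the TDS endpoint the varchar operand carries `DB_COLL`, which equals the cluster collation, so the column's collation wins.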

This commit handles the case where the connection comes from DMS by checking the dialect and the psql_logical_babelfish_db_name GUC. If the connection comes from DMS, we return the Babelfish server-level collation even though the dialect is PG.
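The intended effect of the fix can be sketched as a hypothetical model of `CLUSTER_COLLATION_OID()` that takes the dialect and whether `psql_logical_babelfish_db_name` is set. The function name, OID values, and `logical_db_name_set` flag are illustrative assumptions, not the actual implementation.

```c
#include <stdbool.h>

/* Placeholder OID type and values; real OIDs come from the catalogs. */
typedef unsigned int Oid;

#define DEFAULT_COLL 100u /* stands in for DEFAULT_COLLATION_OID */
#define SERVER_COLL  200u /* Babelfish server-level collation */
#define DB_COLL      300u /* Babelfish database-level collation */

typedef enum { DIALECT_PG, DIALECT_TSQL } Dialect;

/*
 * Hypothetical model of CLUSTER_COLLATION_OID() after the fix.
 * logical_db_name_set mirrors whether psql_logical_babelfish_db_name
 * has been set on a PG-dialect connection (the DMS case).
 */
static Oid cluster_collation_after_fix(Dialect dialect, bool logical_db_name_set)
{
    if (dialect == DIALECT_TSQL)
        return DB_COLL;      /* TSQL behaviour is unchanged */
    if (logical_db_name_set)
        return SERVER_COLL;  /* PG endpoint on a logical Babelfish database */
    return DEFAULT_COLL;     /* plain PG connection is unchanged */
}
```

Because the varchar operand from the PSQL endpoint already carries the server-level collation, returning `SERVER_COLL` here lets it match the cluster collation, so the column's non-default collation can win instead of raising a conflict.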

Issues Resolved

BABEL-5077
Signed-off-by: Shameem Ahmed [email protected]

Test Scenarios Covered

Use case based -
Boundary conditions -
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -

Check List

Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…defined using non-default collation for DMS Signed-off-by: Shameem Ahmed <[email protected]>

coveralls · 2025-05-19T07:04:10Z

Pull Request Test Coverage Report for Build 15438300366

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

14 of 18 (77.78%) changed or added relevant lines in 3 files are covered.
756 unchanged lines in 7 files lost coverage.
Overall coverage increased (+0.03%) to 75.308%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
contrib/babelfishpg_tsql/src/collation.c	3	7	42.86%

Files with Coverage Reduction	New Missed Lines	%
contrib/babelfishpg_common/src/geo_scan.l	3	93.85%
contrib/babelfishpg_tsql/src/iterative_exec.c	17	84.07%
contrib/babelfishpg_common/src/datetime.c	51	92.99%
contrib/babelfishpg_tds/src/backend/tds/tdsprotocol.c	60	81.06%
contrib/babelfishpg_tsql/src/collation.c	140	74.24%
contrib/babelfishpg_tds/src/backend/tds/tdstypeio.c	215	87.6%
contrib/babelfishpg_tsql/src/hooks.c	270	85.09%

Totals
Change from base Build 15065162691:	0.03%
Covered Lines:	48493
Relevant Lines:	64393

💛 - Coveralls

Signed-off-by: Shameem Ahmed <[email protected]>

contrib/babelfishpg_common/src/collation.c

test/JDBC/expected/psql_logical_babelfish_db.out

contrib/babelfishpg_common/src/collation.c

contrib/babelfishpg_common/src/babelfishpg_common.c

…with assign_guc_hook

contrib/babelfishpg_tsql/src/collation.c

Deepesh125 · 2025-05-30T09:20:00Z

contrib/babelfishpg_tsql/src/hooks.c

+	else if (sql_dialect == SQL_DIALECT_PG && pltsql_psql_logical_babelfish_db_name)
+	{
+		/* 
+		 * Check whether the type is collatable or not
+		 * If yes, return server level collation or return InvalidOid
+		 */
+		return OidIsValid(typtup->typcollation) ? CLUSTER_COLLATION_OID() : InvalidOid;
+	}


We can move this section out of this if condition, to just before the place where we handle the TEXT case. Also, add a comment explaining when and why we are doing this.

Updated in latest rev

Deepesh125 · 2025-05-30T09:23:22Z

contrib/babelfishpg_tsql/src/collation.c

+	 * No need to tell babelfishpg_common that logical db name has been set if dialect is TSQL
+	 * because the psql_logical_babelfish_db_name is to be used for PG dialect only
+	 */
+	if (sql_dialect == SQL_DIALECT_PG)


I am not 100% sure this is really needed. Can you please explain a use case/example that would break if we don't do this?

Without this check, is_logical_db_name_set gets set for the TSQL endpoint as well. For a foreign key constraint check, the dialect is switched to PG, so during collation resolution in merge_collation_state, CLUSTER_COLLATION_OID() returns the Babelfish server/db-level collation, whereas DEFAULT_COLLATION_OID is expected. This results in a collation conflict, and we get the following error when inserting a row:

1> INSERT INTO dms_db_babel_5077_t7(dms_db_babel_5077_t7_col2) VALUES ('unicode_cs_as');
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
could not determine which collation to use for string comparison

By setting is_logical_db_name_set for the PG dialect only, we ensure that in the foreign key constraint case CLUSTER_COLLATION_OID() still returns DEFAULT_COLLATION_OID (which was the original behaviour).
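The PG-only guard discussed above can be modeled as follows. This is a hypothetical sketch: `is_logical_db_name_set` and `set_logical_db_name_cache` are names from this thread, but their bodies here are assumptions, and `assign_logical_db_name` is an illustrative stand-in for the GUC assign hook. The real hook would have PostgreSQL's `GucStringAssignHook` signature `(const char *newval, void *extra)` and read the global `sql_dialect`; here the dialect is passed explicitly so the behaviour is easy to test. Treating NULL or an empty string as "unset" is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { DIALECT_PG, DIALECT_TSQL } Dialect;

/* Hypothetical stand-in for the cache kept in babelfishpg_common. */
static bool is_logical_db_name_set = false;

/* Hypothetical model of the callback; NULL or "" counts as unset (assumption). */
static void set_logical_db_name_cache(const char *newval)
{
    is_logical_db_name_set = (newval != NULL && newval[0] != '\0');
}

/*
 * Hypothetical model of the assign hook: only a PG-dialect session updates
 * the cache, so setting the GUC from a TSQL session leaves it untouched and
 * a later FK check (run under PG dialect) still sees DEFAULT_COLLATION_OID.
 */
static void assign_logical_db_name(const char *newval, Dialect sql_dialect)
{
    if (sql_dialect == DIALECT_PG)
        set_logical_db_name_cache(newval);
}
```

Note that the cache only records whether a name is present, not whether that logical database exists.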

The FKEY check will always be done under PG_DIALECT, so the same question remains: why is this needed? Can you show the whole example you are referring to?

This test will fail: https://github.com/babelfish-for-postgresql/babelfish_extensions/pull/3769/files#diff-baa40afc81f5544d3a1b28c72d43ab458fb930c373f4cc5687ecbd45d509726eR169

CREATE TABLE dms_db_babel_5077_t6(dms_db_babel_5077_t6_col1 text NULL, dms_db_babel_5077_t6_col2 [varchar](44) collate BBF_Unicode_General_CS_AS primary key);
GO
INSERT INTO dms_db_babel_5077_t6(dms_db_babel_5077_t6_col2) VALUES ('unicode_cs_as');
GO
CREATE TABLE dms_db_babel_5077_t7(dms_db_babel_5077_t7_col1 text NULL, dms_db_babel_5077_t7_col2 [varchar](44) collate BBF_Unicode_General_CS_AS references dms_db_babel_5077_t6(dms_db_babel_5077_t6_col2));
GO

1> INSERT INTO dms_db_babel_5077_t7(dms_db_babel_5077_t7_col2) VALUES ('unicode_cs_as');
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
could not determine which collation to use for string comparison

KushaalShroff · 2025-06-03T08:44:24Z

contrib/babelfishpg_tsql/src/collation.c

+		init_and_check_collation_callbacks();
+
+		/* let babelfishpg_common know that psql_logical_babelfish_db_name has been updated */
+		(*collation_callbacks_ptr->set_logical_db_name_cache) (newval);


Given this logic, shall we check for valid values of newval?

Are you indicating to check whether the logical database name exists or not?

IMO, that's redundant: even if the logical db name is incorrect, we won't find any relations, functions, views, procedures, etc. for that incorrect/invalid logical db.

Fix loading of data for TEXT column (NULL allowed) if primary key is …

5506250

…defined using non-default collation for DMS Signed-off-by: Shameem Ahmed <[email protected]>

ahmed-shameem requested review from Deepesh125 and KushaalShroff May 19, 2025 06:29

ahmed-shameem added 2 commits May 19, 2025 07:58

update test file to make single-db test pass

8982b05

Signed-off-by: Shameem Ahmed <[email protected]>

Remove psql_logical_babelfish_db from single-db run

1060357

Deepesh125 requested changes May 19, 2025

contrib/babelfishpg_common/src/collation.c (outdated, resolved)

test/JDBC/expected/psql_logical_babelfish_db.out (resolved)

KushaalShroff requested changes May 26, 2025

contrib/babelfishpg_common/src/collation.c (outdated, resolved)

Address review comments and introduce cache for psql_logical_db_name

68a3d6a

ahmed-shameem changed the title from "Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation for DMS" to "Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation" May 26, 2025

ahmed-shameem changed the title from "Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation" to "Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation from PG endpoint for logical database" May 26, 2025

KushaalShroff requested changes May 26, 2025

contrib/babelfishpg_common/src/collation.c (outdated, resolved)

contrib/babelfishpg_common/src/babelfishpg_common.c (outdated, resolved)

KushaalShroff requested changes May 26, 2025

contrib/babelfishpg_common/src/babelfishpg_common.c (outdated, resolved)

Shameem Ahmed added 2 commits May 28, 2025 06:16

Address review comments and introduce cache for psql_logical_db_name …

585b913

…with assign_guc_hook

Remove redundant header

788b315

Deepesh125 reviewed May 28, 2025

contrib/babelfishpg_tsql/src/collation.c (outdated, resolved)

Address review comment

b9569ee

Deepesh125 reviewed May 30, 2025

KushaalShroff requested changes Jun 3, 2025

Address review comment

170dcc8

Deepesh125 requested changes Jun 6, 2025
