Skip to content

Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation from PG endpoint for logical database #3769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: BABEL_4_X_DEV
Choose a base branch
from

Conversation

ahmed-shameem
Copy link
Contributor

Description

If the babelfish table has a column whose type is anything other than text/ntext (eg: varchar, nvarchar etc) and the collation of the column is NOT same as babelfish database/server level collation and if the query is executed from PSQL endpoint, it will throw error saying it can not decide which collation to use.

If the same query is executed from TDS endpoint, it runs successfully.

This is because when the query is executed from PSQL endpoint -- the collation of varchar is server level collation. During resolution of collation at the parent node, the collation of left and right operand does NOT match and none of their collation matches with CLUSTER_COLLATION_OID (this returns DEFAULT_COLLATION_OID as the dialect is PG)

The query is working from TDS endpoint -- the collation of varchar is database level collation. During resolution of collation at the parent node, the collation of left and right operand does NOT match and collation of right operand matches with CLUSTER_COLLATION_OID (this returns database level collation as the dialect is TSQL)

This commit handles the case when the connection is DMS by using the dialect and psql_logical_babelfish_db_name GUC. If the connection is DMS, even if it is PG dialect, we will return babelfish server level collation.

Issues Resolved

BABEL-5077
Signed-off-by: Shameem Ahmed [email protected]

Test Scenarios Covered

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…defined using non-default collation for DMS

Signed-off-by: Shameem Ahmed <[email protected]>
@coveralls
Copy link
Collaborator

coveralls commented May 19, 2025

Pull Request Test Coverage Report for Build 15438300366

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 14 of 18 (77.78%) changed or added relevant lines in 3 files are covered.
  • 756 unchanged lines in 7 files lost coverage.
  • Overall coverage increased (+0.03%) to 75.308%

Changes Missing Coverage Covered Lines Changed/Added Lines %
contrib/babelfishpg_tsql/src/collation.c 3 7 42.86%
Files with Coverage Reduction New Missed Lines %
contrib/babelfishpg_common/src/geo_scan.l 3 93.85%
contrib/babelfishpg_tsql/src/iterative_exec.c 17 84.07%
contrib/babelfishpg_common/src/datetime.c 51 92.99%
contrib/babelfishpg_tds/src/backend/tds/tdsprotocol.c 60 81.06%
contrib/babelfishpg_tsql/src/collation.c 140 74.24%
contrib/babelfishpg_tds/src/backend/tds/tdstypeio.c 215 87.6%
contrib/babelfishpg_tsql/src/hooks.c 270 85.09%
Totals Coverage Status
Change from base Build 15065162691: 0.03%
Covered Lines: 48493
Relevant Lines: 64393

💛 - Coveralls

@ahmed-shameem ahmed-shameem changed the title Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation for DMS Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation May 26, 2025
@ahmed-shameem ahmed-shameem changed the title Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation Fix loading of data for TEXT column (NULL allowed) if primary key is defined using non-default collation from PG endpoint for logical database May 26, 2025
Comment on lines 5978 to 5985
else if (sql_dialect == SQL_DIALECT_PG && pltsql_psql_logical_babelfish_db_name)
{
/*
* Check whether the type is collatable or not
* If yes, return server level collation or return InvalidOid
*/
return OidIsValid(typtup->typcollation) ? CLUSTER_COLLATION_OID() : InvalidOid;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can move this section out of this if condition to later before where we are handling case for TEXT. Also, add comment explaining when and why are we doing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in latest rev

* No need to tell babelfishpg_common that logical db name has been set if dialect is TSQL
* because the psql_logical_babelfish_db_name is to be used for PG dialect only
*/
if (sql_dialect == SQL_DIALECT_PG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure if this is really needed. Can you please explain use case/example which may break if we dont do thiss

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_logical_db_name_set for TSQL endpoint as well. Now, for foreign constraint the dialect gets set to PG, hence during collation resolution in merge_collation_state, CLUSTER_COLLATION_OID() returns babelfish server/db level collation. Whereas, it expects DEFAULT_COLLATION_OID. This results in collation conflict and we get the following error during insertion of row:

1> INSERT INTO dms_db_babel_5077_t7(dms_db_babel_5077_t7_col2) VALUES ('unicode_cs_as');
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
could not determine which collation to use for string comparison

By setting is_logical_db_name_set for PG dialect only, we ensure that for foreign constraint case as well we return DEFAULT_COLLATION_OID from CLUSTER_COLLATION_OID() (which was the behaviour initially)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FKEY check will always be done under PG_DIALECT so same question remains. Why is this needed? can you actually show whole example you are referring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE TABLE dms_db_babel_5077_t6(dms_db_babel_5077_t6_col1 text NULL, dms_db_babel_5077_t6_col2 [varchar](44) collate BBF_Unicode_General_CS_AS primary key);
GO

INSERT INTO dms_db_babel_5077_t6(dms_db_babel_5077_t6_col2) VALUES ('unicode_cs_as');
GO

CREATE TABLE dms_db_babel_5077_t7(dms_db_babel_5077_t7_col1 text NULL, dms_db_babel_5077_t7_col2 [varchar](44) collate BBF_Unicode_General_CS_AS references dms_db_babel_5077_t6(dms_db_babel_5077_t6_col2));
GO

1> INSERT INTO dms_db_babel_5077_t7(dms_db_babel_5077_t7_col2) VALUES ('unicode_cs_as');
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
could not determine which collation to use for string comparison

init_and_check_collation_callbacks();

/* let babelfishpg_common know that psql_logical_babelfish_db_name has been updated */
(*collation_callbacks_ptr->set_logical_db_name_cache) (newval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this logic, shall we check for valid values of newval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you indicating to check whether the logical database name exists or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, that's redundant as even if the logical db name is incorrect, we won't find any relations, functions, view, procedures etc for that incorrect/invalid logical db

* No need to tell babelfishpg_common that logical db name has been set if dialect is TSQL
* because the psql_logical_babelfish_db_name is to be used for PG dialect only
*/
if (sql_dialect == SQL_DIALECT_PG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FKEY check will always be done under PG_DIALECT so same question remains. Why is this needed? can you actually show whole example you are referring?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants