-
Notifications
You must be signed in to change notification settings - Fork 105
charindex support for varbinary and enhancement #3810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: BABEL_5_X_DEV
Are you sure you want to change the base?
charindex support for varbinary and enhancement #3810
Conversation
fc5bdd0
to
583acb1
Compare
Pull Request Test Coverage Report for Build 15526664620Details
💛 - Coveralls |
Signed-off-by: Herambh Shah <[email protected]>
583acb1
to
ef7b05e
Compare
/* Check for NULL pattern and throw error */ | ||
if (PG_ARGISNULL(0)) | ||
ereport(ERROR, | ||
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error code ERRCODE_NULL_VALUE_NOT_ALLOWED looks not match the err msg ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @forestkeeper for pointing this out. I believe we should change it to "NULL" instead of throwing error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as per our discussion for cases like charindex(NULL, varbinary) where it throws error, will try to handle it in analyzer.
/* Get function arguments */ | ||
expressionToFind = PG_GETARG_TEXT_PP(0); | ||
expressionToSearch = PG_GETARG_TEXT_PP(1); | ||
start_location = PG_GETARG_INT32(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have NULL check for 3rd arg as well. Else, while fetching the value like this, server may crash. Also add such test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updating the sys function to STRICT so that we wont have to handle it explicitly.
collation = PG_GET_COLLATION(); | ||
|
||
/* If no collation specified, use default */ | ||
if (!OidIsValid(collation)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this should not happen. We should have error condition here instead of assigning DEFAULT COLLATION. Let's throw error llike check_collation_set function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the "if" condition.
else | ||
{ | ||
/* Get substring starting from start_location */ | ||
text *substr = DatumGetTextP(DirectFunctionCall2(text_substr_no_len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use DatumGetTextPP instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
if (pos > 0) | ||
result = pos + start_location - 1; | ||
|
||
pfree(substr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have a sanity check whether substr is NULL or not. Also, what will be the impact on pos if substr is NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can skip this because DatumGetTextPP will never return NULL value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatumGetTextPP can return pointer to null. Here you are obtaining the substring using a function, what if that function returns NULL itself? There are case where it will happen
if (start_location <= 0) | ||
{ | ||
/* Use byteapos directly */ | ||
result = DatumGetInt32(DirectFunctionCall2(byteapos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments for tsql_charindex_char apply here as well
@@ -0,0 +1,200 @@ | |||
-- Basic syntax: CHARINDEX(substring, string [, start_location]) | |||
SELECT CHARINDEX('abc', 'xxabcxx'); -- Returns 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have test where 3rd arg is -ve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added !
GO | ||
|
||
-- Force case sensitivity | ||
SELECT CHARINDEX('ABC', 'xxabcxx' COLLATE Latin1_General_CS_AS); -- Returns 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test where collation of both args are different, we expect to throw error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add another test like select charindex('abc' collate DATABASE_DEFAULT, 'abc' collate latin1_general_ci_ai). This should also throw error.
Let's have explicit collation on both args with same collations as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
GO | ||
|
||
-- Special character | ||
SELECT CHARINDEX(' ', 'Hello World'); -- Returns 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check for wildcard characters like %, [], - etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes , added . Thanks
GO | ||
|
||
-- Special character | ||
SELECT CHARINDEX(' ', 'Hello World'); -- Returns 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expressionToSearch is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gives "0". Added more test case for spaces and empty string
SELECT CHARINDEX('Hello World', ''); -- return 0
SELECT CHARINDEX('', ''); -- return 0
SELECT CHARINDEX(' ', ' '); (10 spaces , 3 spaces). -- return 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly add tests for the same
GO | ||
SELECT CHARINDEX('abc', 'xxabcxx', 1); -- Returns 3 | ||
GO | ||
SELECT CHARINDEX('abc', 'xxabcxx', 4); -- Returns 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the 3rd arg is NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it gives NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly add tests for the same
/* Check for NULL pattern and throw error */ | ||
if (PG_ARGISNULL(0)) | ||
ereport(ERROR, | ||
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), | ||
errmsg("Argument data type is invalid for argument 2 of charindex function."))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we throwing invalid argument data type error when pattern is NULL, can you test following query
select charindex(cast(NULL as varbinary), 0x12345678)
go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed above condition.
But in sql server, this is the behavior, I observed for NULL with varbinary arguments -
CASE 1: charindex(NULL, var[binary] ) -> throws error
CASE 2: charindex(d , var[binary]) -> where col d is having NULL values -> gives NULL
when we provide constant NULL value in charindex function along with varbinary argument, it seems sql server throws error
But if NULL value is given through a column, it returns NULL.
It might be the case with other functions also.
if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) | ||
PG_RETURN_NULL(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The corresponding SQL function which uses this is already marked as STRICT which returns NULL on NULL argument, why do we need to check explicitly here as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we actually don't need it . I was following other function convection and thought should use it as a pre-check.
-- Not STRICT because VAR[BINARY] needs custom NULL handling | ||
CREATE OR REPLACE FUNCTION sys.charindex(expressionToFind VARBINARY, | ||
expressionToSearch VARBINARY, | ||
start_location INTEGER DEFAULT 0) | ||
RETURNS INTEGER AS 'babelfishpg_tsql', 'tsql_charindex_binary' | ||
LANGUAGE C IMMUTABLE PARALLEL SAFE; | ||
|
||
CREATE OR REPLACE FUNCTION sys.charindex(expressionToFind BINARY, | ||
expressionToSearch BINARY, | ||
start_location INTEGER DEFAULT 0) | ||
RETURNS INTEGER AS 'babelfishpg_tsql', 'tsql_charindex_binary' | ||
LANGUAGE C IMMUTABLE PARALLEL SAFE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you test charindex with UDTs created on binary and varbinary. For example,
declare @a varbinary_udt = 0x656667
declare @b varbinary_udt = 0x707172656667
select charindex(@a, @b)
go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
DECLARE | ||
exception_message text; | ||
BEGIN | ||
ALTER FUNCTION sys.charindex RENAME TO charindex_deprecated_in_5_3_0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use sys.charindex(PG_CATALOG.TEXT, PG_CATALOG.TEXT, INTEGER)
explicitly, else it will raise error 'function name "sys.charindex" is not unique' if we are trying to upgrade from a lower version also has this fix to this version. Although we had added the exception handling which will convert that error to warning, but its better not to throw such incorrect warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it completely as per your below comment.
DO $$ | ||
DECLARE | ||
exception_message text; | ||
BEGIN | ||
ALTER FUNCTION sys.charindex RENAME TO charindex_deprecated_in_5_3_0; | ||
EXCEPTION WHEN OTHERS THEN | ||
GET STACKED DIAGNOSTICS | ||
exception_message = MESSAGE_TEXT; | ||
RAISE WARNING '%', exception_message; | ||
END; | ||
$$; | ||
|
||
CALL sys.babelfish_drop_deprecated_object('function', 'sys', 'charindex_deprecated_in_5_3_0'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are only updating the body, language, strictness and parallel safety of the function and not the definition(i.e. the number of arguments or argument types or return type), i do not see any point in deprecating the function. so why are we deprecating this function ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohkk. I didn't know it before. I thought whenever we update any sys_functions, we need to add it in upgrade file also. Thanks for noticing it. I will remove it from upgrade file.
-- Create tables with specific collations | ||
CREATE TABLE charindex_tests.ChineseTest ( | ||
ID INT, | ||
Content NVARCHAR(100) COLLATE Chinese_PRC_CI_AS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets also test other string datatypes columns such as VARCHAR, CHAR, NCHAR, with specific collation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Thanks
( CAST('Test©®™' AS BINARY(50)), CAST('Special©Characters®™' AS VARBINARY(MAX)), 'Special characters' ); | ||
GO | ||
~~ROW COUNT: 4~~ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets also add tests for dependent objects(views, procedures, functions, computed columns, constraint) on charindex, to tests its behaviour in the upgrade scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added .
@@ -628,3 +628,4 @@ babel_test_numeric_int8_oper | |||
Test_cast_binary_bytea | |||
numeric_money | |||
MONEY_OBJECT | |||
charindex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add charindex to schedule files of other versions to test the charindex tests in upgrades from other versions as well.
Description
"This Pull Request includes two main enhancements to the CHARINDEX function:
The changes aim to enhance both functionality and performance of the CHARINDEX function."
Issues Resolved
[BABEL-3659] - CHARINDEX() returns incorrect result on [VAR]BINARY arguments
[BABEL-5912] - Enhancement: Charindex Function
Test Scenarios Covered
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.