Add lambda function and array related functions #3584


Merged: 36 commits into opensearch-project:main on Jun 10, 2025

Conversation

@xinyual (Contributor) commented Apr 27, 2025

Description

This PR adds lambda functions and array-related functions. Calcite does not provide all of the array-related functions we need (in particular, the lambda-based ones), so some are implemented by ourselves.
The logic for lambda is as follows:
We treat a lambda function as a new PPL expression and parse it regularly to construct a RexNode. To get the return type of a lambda expression, we first map the argument types in the CalciteContext. For example, for forall(array(1, 2, 3), x -> x > 0), we map x -> INTEGER.
reduce is an exception because its accumulator acc has a dynamic type.
Calcite/linq4j generates code according to the input types. For example, take reduce(array(1.0, 2.0, 3.0), 0, (acc, x) -> acc + x). Ideally, we would map acc -> INTEGER and x -> DOUBLE. But then the generated code for + would be plus(INTEGER acc, DOUBLE x); after the first application acc becomes a DOUBLE, and the next call would throw an exception. Thus, we assign ANY to acc and infer the actual return type in getReturnTypeInference.
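To make the failure mode concrete, here is a minimal plain-Java sketch (illustrative stand-in code, not the PR's generated code) of what goes wrong when the accumulator is pinned to INTEGER:

```java
public class FixedSignatureDemo {
  // Stand-in for code generated against a fixed plus(INTEGER acc, DOUBLE x) signature.
  static Object plus(Object acc, Object x) {
    return (Integer) acc + (Double) x; // assumes acc is always an Integer
  }

  public static void main(String[] args) {
    Object acc = 0;       // base value, as in reduce(array(1.0, 2.0, 3.0), 0, ...)
    acc = plus(acc, 1.0); // first application: acc is now a Double (1.0)
    acc = plus(acc, 2.0); // second application: throws ClassCastException,
                          // since a Double cannot be cast to Integer
  }
}
```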

The functions are aligned with https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-collection.md

TODO: nested objects (e.g. x -> x.a > 0) are not supported in lambdas currently. They will work automatically once we support nested objects.

For detailed implementation and description:

| Function | Arguments | Description | Return type | Implementation |
|---|---|---|---|---|
| ARRAY | `ARRAY(value1: ANY, value2: ANY, ...)` | Creates an array with the input values. Mixed types are currently not allowed; we infer a least restricted type, e.g. `array(1, "demo")` -> `["1", "demo"]`. | ARRAY | Wraps `SqlLibraryOperators.ARRAY` |
| ARRAY_LENGTH | `ARRAY_LENGTH(value: ARRAY)` | Returns the array length. | INTEGER | `SqlLibraryOperators.ARRAY_LENGTH` |
| FORALL | `forall(value: ARRAY, function: LAMBDA)` | Checks whether every element of the array satisfies the lambda function. The lambda must return BOOLEAN. | BOOLEAN | Implemented by ourselves; no matching built-in Calcite operator |
| EXISTS | `exists(value: ARRAY, function: LAMBDA)` | Checks whether at least one element of the array satisfies the lambda function. The lambda must return BOOLEAN. | BOOLEAN | Implemented by ourselves; no matching built-in Calcite operator |
| FILTER | `filter(value: ARRAY, function: LAMBDA)` | Filters the elements of the array by the lambda function. The lambda must return BOOLEAN. | ARRAY | Implemented by ourselves; no matching built-in Calcite operator |
| TRANSFORM | `transform(value: ARRAY, function: LAMBDA)` | Transforms the elements of the array one by one using the lambda. It can also accept a two-argument lambda like `(x, i) -> x + i`, where `i` is the index of the element in the array. | ARRAY | Implemented by ourselves; no matching built-in Calcite operator |
| REDUCE | `reduce(value: ARRAY, base_value: ANY, acc_function: LAMBDA)` or `reduce(value: ARRAY, base_value: ANY, acc_function: LAMBDA, reduce_function: LAMBDA)` | First applies `acc_function` to every element, accumulating into `acc`; then applies `reduce_function` to `acc` if present. `acc_function` has the form `(acc, x) -> ...`; `reduce_function` has the form `(acc) -> ...`. | ANY, according to the lambda function | Implemented by ourselves; no matching built-in Calcite operator |
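As a rough illustration of the intended semantics, here is a plain-Java sketch using java.util.stream as a stand-in for the generated linq4j code (not the PR's implementation):

```java
import java.util.List;
import java.util.stream.Collectors;

public class CollectionSemanticsDemo {
  public static void main(String[] args) {
    List<Integer> array = List.of(1, 2, 3);                     // array(1, 2, 3)

    boolean forall = array.stream().allMatch(x -> x > 0);       // forall -> true
    boolean exists = array.stream().anyMatch(x -> x < 0);       // exists -> false
    List<Integer> filtered = array.stream()
        .filter(x -> x % 2 == 0).collect(Collectors.toList());  // filter -> [2]
    List<Integer> doubled = array.stream()
        .map(x -> x * 2).collect(Collectors.toList());          // transform -> [2, 4, 6]
    int sum = array.stream().reduce(0, (acc, x) -> acc + x);    // reduce -> 6

    System.out.println(forall + " " + exists + " " + filtered + " " + doubled + " " + sum);
  }
}
```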

Related Issues

Resolves #3575

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

xinyual added 11 commits April 23, 2025 16:49
Comment on lines 63 to 76
```java
switch (targetType) {
  case DOUBLE:
    List<Object> unboxed =
        IntStream.range(0, args.length - 1)
            .mapToObj(i -> ((Number) args[i]).doubleValue())
            .collect(Collectors.toList());
    return unboxed;
  case FLOAT:
    List<Object> unboxedFloat =
        IntStream.range(0, args.length - 1)
            .mapToObj(i -> ((Number) args[i]).floatValue())
            .collect(Collectors.toList());
    return unboxedFloat;
```
Member:

Could you explain why this special logic is needed?

Contributor (author):

We need to convert it internally. Otherwise, Calcite will cast directly, e.g. DOUBLE to INTEGER, which will raise an exception.
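The underlying Java behavior, as a minimal sketch (illustrative code, not from the PR): a boxed numeric value cannot be cast across wrapper types, but converting through Number works.

```java
public class UnboxDemo {
  public static void main(String[] args) {
    Object boxed = 1;                            // an Integer element of the array argument
    // double bad = (Double) boxed;              // would throw ClassCastException
    double ok = ((Number) boxed).doubleValue();  // the conversion the snippet above performs
    System.out.println(ok);                      // prints 1.0
  }
}
```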

```java
import org.apache.calcite.sql.type.SqlTypeName;
import org.opensearch.sql.expression.function.ImplementorUDF;

public class ArrayFunctionImpl extends ImplementorUDF {
```
Member:

Can't we reuse SqlLibraryOperators.ARRAY? Again, for any newly added function, please add a reason to the PR description explaining why it must be implemented by ourselves.

Contributor (author):

Already updated the implementation: it now wraps SqlLibraryOperators.ARRAY.

```java
import org.apache.calcite.sql.type.SqlReturnTypeInference;
import org.opensearch.sql.expression.function.ImplementorUDF;

public class ExistsFunctionImpl extends ImplementorUDF {
```
Member:

Can't we reuse SqlLibraryOperators.ARRAY_CONTAINS? Please check all SqlLibraryOperators.ARRAY_* operators first.

Contributor (author):

Confirmed. All SqlLibraryOperators.ARRAY_* operators are for array-related functions unrelated to lambdas. We use SqlLibraryOperators.ARRAY_LENGTH.

xinyual added 9 commits May 30, 2025 10:55
```java
        arguments,
        node.getFuncName(),
        lambdaNode.getType());
lambdaNode = analyze(arg, lambdaContext);
```
Member:

Why analyze reduce twice?

Contributor (author) @xinyual commented Jun 5, 2025:

Reduce is a very special case since it can change the type of the accumulator. For example, in reduce([1.0, 2.0], 0, (acc, x) -> acc + x, acc -> acc * 10), the lambda (acc, x) -> acc + x first sees (INTEGER, DOUBLE) and then, during the iteration, (DOUBLE, DOUBLE). The current solution is to analyze once to find that the return type is DOUBLE, then use DOUBLE as the expected input type and cast the initial acc value to that type.
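In plain Java, the runtime effect of that analyze-once-then-cast approach looks roughly like this (an illustrative sketch, not the PR's Calcite code):

```java
import java.util.List;

public class ReduceTypeDemo {
  public static void main(String[] args) {
    List<Double> values = List.of(1.0, 2.0); // reduce([1.0, 2.0], 0, ...)
    double acc = 0;                          // INTEGER base value cast up front to DOUBLE,
                                             // the return type inferred for (acc, x) -> acc + x
    for (double x : values) {
      acc = acc + x;                         // every application now sees (DOUBLE, DOUBLE)
    }
    System.out.println(acc * 10);            // reduce_function acc -> acc * 10 => 30.0
  }
}
```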

Collaborator:

Is it necessary to detect a non-ANY type in the analyzing phase? What if the input list has type ARRAY?

Collaborator:

And does it make sense to infer the return type by using leastRestrictive(arg0.getComponentType(), arg1.getType()) instead of analyzing twice?

Contributor (author) @xinyual commented Jun 10, 2025:

> Is it necessary to detect a non-ANY type in the analyzing phase? What if the input list has type ARRAY?

ANY would block the implementation in two places: 1. the UDF part sometimes needs the type to choose an implementation; 2. ANY would also be a blocker for the type checker. For example, we use Calcite's multiply, which only supports numeric/interval * numeric/interval; ANY would throw an exception during type checking.

> And does it make sense to infer the return type by using leastRestrictive(arg0.getComponentType(), arg1.getType()) instead of analyzing twice?

leastRestrictive(arg0.getComponentType(), arg1.getType()) doesn't work here. For example, with acc = 0 and (acc, x) -> acc + length(x) * 1.0 over a string array, the lambda returns DOUBLE, so we need to cast the acc base value to DOUBLE; but leastRestrictive(INTEGER, STRING) wouldn't be DOUBLE.
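A small standalone Calcite sketch of this point (hypothetical demo code; the exact result of leastRestrictive for INTEGER and VARCHAR depends on the type system, but it is not DOUBLE):

```java
import java.util.List;

import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeName;

public class LeastRestrictiveDemo {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RelDataType strType = typeFactory.createSqlType(SqlTypeName.VARCHAR);
    // Whatever this prints, it is not DOUBLE, which is the type that
    // (acc, x) -> acc + length(x) * 1.0 actually returns; only analyzing
    // the lambda body reveals that.
    System.out.println(typeFactory.leastRestrictive(List.of(intType, strType)));
  }
}
```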

Comment on lines +35 to +36
```java
 * @return We wrap it here to accept null since the original return type inference will generate
 *     non-nullable type
```
Member:

Does Spark's array accept null too? Why do we do this wrap?

Contributor (author):

Yes, Spark's array accepts null.


Version: 3.1.0

Usage: ``array(value1, value2, value3...)`` creates an array with the input values. Mixed types are currently not allowed; we infer a least restricted type, for example ``array(1, "demo")`` -> ``["1", "demo"]``.
Member:

Question: what is the reason to infer a least restricted type instead of throwing an exception?

Member:

Please add a Limitation: note after each Usage: to explain that these functions only work with plugins.calcite.enabled=true.

Contributor (author):

> Question: what is the reason to infer a least restricted type instead of throwing an exception?

This is aligned with Spark.

LantaoJin previously approved these changes Jun 7, 2025
Comment on lines 401 to 402
```java
// case DATETIME_INTERVAL ->
// SqlTypeName.INTERVAL_TYPES.stream().map(OpenSearchTypeFactory.TYPE_FACTORY::createSqlIntervalType).toList();
```
Member:

How does DATETIME_INTERVAL impact reduce?

Contributor (author):

It's an unnecessary change; I will revert it.

LantaoJin previously approved these changes Jun 9, 2025

```java
import org.opensearch.sql.expression.function.UDFOperandMetadata;

// TODO: Support array of mixture types.
public class ArrayFunctionImpl extends ImplementorUDF {
```
Collaborator:

It would be great to add a description and a simple example for each function. It should only show the functionality of the function and be as simple as possible; for example, array(1, 2, 3) -> [1, 2, 3] would be enough for the array function.

It will improve code readability for developers, as distinct from the docs for customers. You can do it later in another PR.

Contributor (author):

Already added one for each. Please check.

xinyual added 2 commits June 10, 2025 13:48
LantaoJin merged commit 122ae79 into opensearch-project:main on Jun 10, 2025. 22 checks passed.
Labels: calcite (calcite migration related)

4 participants