Skip to content

Add earliest and latest in condition function #3640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Jun 10, 2025

Conversation

xinyual
Copy link
Contributor

@xinyual xinyual commented May 20, 2025

Description

This pr add latest/earliest as two condition function, it will calculate the relative time according to relative string and current time.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]
#3639

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

xinyual added 4 commits May 20, 2025 13:39
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
@penghuo
Copy link
Collaborator

penghuo commented May 20, 2025

  • update docs please.
  • what is expectation of earliest and latest? Do we support it in where command?

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SPL doc, these functions are applied to _raw field, but in our impl it's just an alias of min/max?

@LantaoJin LantaoJin added the calcite calcite migration releated label May 22, 2025
Signed-off-by: xinyual <[email protected]>
@xinyual
Copy link
Contributor Author

xinyual commented May 22, 2025

  • update docs please.
  • what is expectation of earliest and latest? Do we support it in where command?

Add the doc.
This PR only add them as the aggregation. Will raise another PR as the boolean function.

xinyual added 7 commits May 29, 2025 17:39
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Comment on lines +639 to +640
"1970-01-18 20:22:32",
"2018-08-19 00:00:00"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid the current implementation may be wrong, the current implementation treat the earliest and latest as window function with default window frame. IMO, the behavior is odd. Please check again if they require to use other window frame instead default.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xinyual does spark-ppl support eventstats earliest(birthdate), latest(birthdate), if not, can we decide later, and only support it as condition function?
From user perspective, It is alias of min/max function, not impact usability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark-ppl only supports it as condition function. Currently I forbid them as window/aggregation in the latest code.


Usage: earliest(relative_string, field) returns true if the field is after transferring the relative_string.

relative_string:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does relative_string follow RELATIVE_TIMESTAMP or OpenSearch date math?

  2. relative_string should include a clear definition of the supported time modifiers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does relative_string follow RELATIVE_TIMESTAMP or OpenSearch date math?

Yes.

  1. relative_string should include a clear definition of the supported time modifiers.

@xinyual add link to https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-datetime.md#relative_timestamp here.

Comment on lines +58 to +62
ZonedDateTime candidateDatetime = ZonedDateTime.ofInstant(candidate, clock.getZone());
ZonedDateTime latest =
getRelativeZonedDateTime(
expression, ZonedDateTime.ofInstant(clock.instant(), clock.getZone()));
return latest.isAfter(candidateDatetime);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case all timestamp use same timezone, this PR actually does not have similar issue as #3725?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

xinyual added 8 commits June 5, 2025 17:13
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
LantaoJin
LantaoJin previously approved these changes Jun 7, 2025
Copy link
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once doc comment addressed.

@@ -61,4 +61,9 @@ private PPLOperandTypes() {}
public static final UDFOperandMetadata DATE_OR_TIMESTAMP_OR_STRING =
UDFOperandMetadata.wrap(
(CompositeOperandTypeChecker) OperandTypes.DATE_OR_TIMESTAMP.or(OperandTypes.STRING));
public static final UDFOperandMetadata STRING_DATE_OR_TIMESTAMP =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: change to STRING_OR_TIMESTAMP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already rename to STRING_TIMESTMAP, which means accept (STRING, TIMESTAMP)


Usage: earliest(relative_string, field) returns true if the field is after transferring the relative_string.

relative_string:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does relative_string follow RELATIVE_TIMESTAMP or OpenSearch date math?

Yes.

  1. relative_string should include a clear definition of the supported time modifiers.

@xinyual add link to https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-datetime.md#relative_timestamp here.

xinyual added 2 commits June 9, 2025 11:32
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
LantaoJin
LantaoJin previously approved these changes Jun 9, 2025
xinyual added 2 commits June 10, 2025 11:00
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
@xinyual xinyual changed the title Add earliest and latest in aggregation and window function Add earliest and latest in condition function Jun 10, 2025
@qianheng-aws qianheng-aws merged commit 246138c into opensearch-project:main Jun 10, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calcite calcite migration releated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants