Add splunk to ppl cheat sheet #3726

Status: open; wants to merge 5 commits into `main`.
Changed file: `docs/user/ppl/reference/splunk_to_ppl_cheat_sheet.md` (217 additions, 0 deletions)
# Splunk to OpenSearch PPL Cheat Sheet

This cheat sheet helps Splunk users transition to OpenSearch Piped Processing Language (PPL). It maps common Splunk Search Processing Language (SPL) commands to their PPL equivalents, with examples.

## Structure and Concepts

| Aspect | Splunk SPL | OpenSearch PPL | Notes |
|--------|------------|---------------|-------|
| Query structure | `search terms \| command` | `source = index \| command` | PPL requires explicit source at the beginning |
| Index reference | `index=name*` | `source=name*` | PPL uses `source` instead of `index` to name the data source; [PPL supports referring to multiple indices](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/general/identifiers.rst#multiple-indices) |
| Raw field | Special `_raw` field | A field you choose that holds the raw text, often `message` or `content` in log data | No default raw field |
| Time field | Special `_time` field | User-specified timestamp field | No default time field in PPL, must reference it explicitly |


## Command Reference

This table provides a mapping between Splunk SPL commands and their OpenSearch PPL equivalents:

| Splunk SPL | OpenSearch PPL | Purpose |
|------------|---------------|---------|
| search | search | Basic searching of data |
| stats | stats | Statistical aggregation of data |
| eval | eval | Calculate and create new fields |
| rex | parse | Extract fields using regular expressions |
| table | fields | Select specific fields to display |
| bucket | span | Group numeric or time values into buckets |
| spath | json_extract | Extract fields from structured data (JSON) |
| where | where | Filter results based on conditions |
| dedup | dedup | Remove duplicate results |
| sort | sort | Sort results by specified fields |
| rename | rename | Rename fields in results |
| appendcols | appendcols | Append columns from subsearch |
| lookup | lookup | Enrich data with lookups |
| join | join | Combine results from multiple sources |
| fields | fields | Include or exclude fields |
| fillnull | fillnull | Replace null values |
| eventstats | eventstats | Calculate statistics while preserving events |
| bin | span | Group numeric values into bins |
| head | head | Retrieve the first N results |
| trendline | trendline | Generate trend lines for data |
| top | top | Find the most common values |
| mvexpand | expand | Expand multi-value fields |
| rare | rare | Find the least common values |

## Example Query Conversions

**Simple search:**
- Splunk: `error failed status=500`
- PPL: ```source=`*` multi_match(['*'], 'error failed', operator='AND') status=500```

**Aggregation:**
- Splunk: `... | stats count BY host, status | sort -count`
- PPL: `... | stats count() as count by host, status | sort - count`

**Time-based query:**
- Splunk: `... earliest=-7d | timechart span=1h count BY host`
- PPL: `... | where earliest("-7d", timestamp) | stats count() by span(timestamp, 1h), host`

**Complex calculation:**
- Splunk: `... | eval mb=bytes/1024/1024 | stats avg(mb) AS avg_mb BY host | where avg_mb > 100`
- PPL: `... | eval mb=bytes/1024/1024 | stats avg(mb) as avg_mb by host | where avg_mb > 100`

## Basic Search Syntax

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Basic search | `error` | `multi_match(['*'], 'error')` | PPL requires an explicit `multi_match` call to find documents in which any field matches `error` |
| Multiple keyword search | `error failed` | `multi_match(['*'], 'error failed', operator='AND')` | SPL implicitly requires all terms; `multi_match` matches any term by default, so set `operator='AND'` to require both `error` and `failed` |
| Quoted phrases | `"error failed"` | `multi_match(['*'], 'error failed', type='phrase')` | `type='phrase'` matches documents in which a field contains the exact phrase "error failed" |
| Field value equals | `field = 404` | `field = 404` | Same syntax |
| Multiple values | `field IN (404, 503)` | `field in (404, 503)` | Same syntax |
| Field doesn't equal | `field != 404` | `field != 404` | Same syntax |
| Wildcard search | `field = value*` | `like(field, 'value%')` | PPL uses the `like()` function, with `%` as the wildcard |
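
Two of the rows above change matching behavior, not just syntax. The following plain-Python sketch (not PPL) models documents as dicts of strings; the helper names are hypothetical and the tokenizer is a crude stand-in for OpenSearch text analysis. It contrasts SPL's implicit AND with the default OR of `multi_match`, and translates an SPL `*` wildcard into a `like()` pattern. Escaping a literal `%` or `_` with a backslash is an assumption borrowed from SQL `LIKE`; confirm it against your PPL version.

```python
import re

def _tokens(doc: dict[str, str]) -> set[str]:
    """Lower-cased word tokens from every field (a rough stand-in for text analysis)."""
    return {tok for value in doc.values() for tok in re.findall(r"\w+", value.lower())}

def splunk_bare_terms(doc: dict[str, str], query: str) -> bool:
    """SPL `error failed`: every bare term must appear somewhere (implicit AND)."""
    terms = re.findall(r"\w+", query.lower())
    return all(term in _tokens(doc) for term in terms)

def multi_match_or(doc: dict[str, str], query: str) -> bool:
    """multi_match(['*'], 'error failed') with the default OR operator: any term matches."""
    terms = re.findall(r"\w+", query.lower())
    return any(term in _tokens(doc) for term in terms)

def splunk_wildcard_to_like(pattern: str) -> str:
    """Map SPL `*` to LIKE `%`, escaping characters that LIKE would treat specially."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch in "%_\\":
            out.append("\\" + ch)  # assumed backslash escape, as in SQL LIKE
        else:
            out.append(ch)
    return "".join(out)

docs = [
    {"message": "error: request failed", "status": "500"},
    {"message": "error: disk full", "status": "507"},
]
print([splunk_bare_terms(d, "error failed") for d in docs])  # [True, False]
print([multi_match_or(d, "error failed") for d in docs])     # [True, True]
print(splunk_wildcard_to_like("value*"))                     # value%
```

The second document is the one to watch: SPL rejects it because `failed` is absent, while an OR-style `multi_match` accepts it on `error` alone.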

## Field Selection and Manipulation

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Select fields | `... \| fields field1, field2` | `... \| fields field1, field2` | Same syntax |
| Exclude fields | `... \| fields - field3` | `... \| fields - field3` | Same syntax |
| Rename fields | `... \| rename field1 AS new_name` | `... \| rename field1 as new_name` | Same syntax |
| Calculate field | `... \| eval new_field=field1 + field2` | `... \| eval new_field = field1 + field2` | Same syntax |

## Filtering

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Filter results | `... \| where field > 100` | `... \| where field > 100` | Same syntax |
| Compound filter | `... \| where field1=200 OR field2=203` | `... \| where field1=200 or field2=203` | Same syntax |


## Aggregation

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Count | `... \| stats count` | `... \| stats count()` | PPL requires parentheses |
| Count by field | `... \| stats count BY field` | `... \| stats count() by field` | Similar syntax |
| Multiple aggregations | `... \| stats count, avg(field1) BY field2` | `... \| stats count(), avg(field1) by field2` | Similar syntax |
| Distinct count | `... \| stats dc(field)` | `... \| stats dc(field)` | Same syntax |
| Min/Max | `... \| stats min(field), max(field)` | `... \| stats min(field), max(field)` | Same syntax |
| Percentiles | `... \| stats perc95(field)` | `... \| stats percentile(field, 95)` | Different function syntax |
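
To make the grouping semantics concrete, here is a plain-Python sketch of `stats count(), avg(field), dc(field) by <fields>` over a list of dicts. The function name is hypothetical, and the output column names are an assumption that mimics PPL's habit of naming a column after its aggregation expression (which is why the earlier example aliases `count()` before sorting on it). On large data, engines may compute `dc()` approximately; this sketch is exact.

```python
from collections import defaultdict
from statistics import mean

def stats_by(events, by, value_field):
    """Group events by the `by` fields, then count rows and aggregate `value_field`."""
    groups = defaultdict(list)
    for event in events:
        groups[tuple(event.get(f) for f in by)].append(event)

    rows = []
    for key, members in groups.items():
        # avg() and dc() skip nulls here, as SQL-style aggregations typically do.
        values = [m[value_field] for m in members if m.get(value_field) is not None]
        row = dict(zip(by, key))
        row["count()"] = len(members)
        row[f"avg({value_field})"] = mean(values) if values else None
        row[f"dc({value_field})"] = len(set(values))
        rows.append(row)
    return rows

events = [
    {"host": "web-1", "bytes": 100},
    {"host": "web-1", "bytes": 300},
    {"host": "web-2", "bytes": 50},
    {"host": "web-2", "bytes": None},
]
for row in stats_by(events, ["host"], "bytes"):
    print(row)
```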

## Sorting and Limiting

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Sort ascending | `... \| sort field` | `... \| sort field` | Same syntax |
| Sort descending | `... \| sort - field` | `... \| sort - field` | Same syntax |
| Sort multiple | `... \| sort field1, -field2` | `... \| sort field1, - field2` | Same syntax |
| Limit results | `... \| head 10` | `... \| head 10` | Same syntax |
| Get last results | `... \| tail 10` | `... \| tail 10` | Same syntax |
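
A multi-key sort such as `sort field1, - field2` orders by the first key and breaks ties with the second. The plain-Python sketch below reproduces that ordering by relying on the stability of `list.sort()`: sorting by the last key first and the first key last gives a combined order with a direction per key. The `"-field"` convention mirrors the tables; the helper names are hypothetical and the sketch assumes no null values.

```python
def sort_rows(rows, keys):
    """Sort rows by keys such as ["field1", "-field2"]; a leading '-' means descending."""
    result = list(rows)
    # Python's sort is stable, so applying keys from last to first gives a
    # lexicographic order in which earlier keys take precedence.
    for key in reversed(keys):
        name = key.lstrip("- ")
        result.sort(key=lambda row: row[name], reverse=key.startswith("-"))
    return result

def head(rows, n=10):
    """`head n`: the first n rows of the current order."""
    return rows[:n]

rows = [
    {"field1": 1, "field2": 1},
    {"field1": 1, "field2": 3},
    {"field1": 0, "field2": 2},
]
print(head(sort_rows(rows, ["field1", "-field2"]), 2))
# [{'field1': 0, 'field2': 2}, {'field1': 1, 'field2': 3}]
```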

## Rex vs Parse

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Basic extraction | `... \| rex field=address "(?<streetNumber>\d+) (?<street>.+)"` | `... \| parse address "(?<streetNumber>\d+) (?<street>.+)"` | Similar syntax but different command name |
| Field specification | `... \| rex field=address ...` | `... \| parse address ...` | PPL names the source field directly, without the `field=` prefix |
| Default field (_raw) | `... \| rex "(?<streetNumber>\d+) (?<street>.+)"` | Not supported | PPL has no implicit `_raw` field, so the source field must always be named |
| Search and replace mode | `... \| rex mode=sed "s/\d+//g"` | Not supported | PPL has no equivalent of SPL's `rex mode=sed` search-and-replace mode |
| Field override | `... \| rex field=address "(?<address>.+)"` | `... \| parse address "(?<address>.+)"` | Both can extract into a field with the same name as the source field, overwriting it |
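
Both `rex` and `parse` add one field per named capture group. The plain-Python sketch below shows the effect on a single event. Python's `re` module spells named groups `(?P<name>...)`, so the helper first rewrites the `(?<name>...)` form used in the tables. The helper names are hypothetical, the match is unanchored (`re.search`), and leaving the event unchanged on a failed match is a choice of this sketch, not documented engine behavior.

```python
import re

def to_python_named_groups(pattern: str) -> str:
    """Rewrite `(?<name>...)` groups as Python's `(?P<name>...)`.

    Lookbehinds like `(?<=...)` and `(?<!...)` are untouched because a group
    name must start with a letter.
    """
    return re.sub(r"\(\?<([A-Za-z][A-Za-z0-9_]*)>", r"(?P<\1>", pattern)

def parse_field(event: dict, field: str, pattern: str) -> dict:
    """Return a copy of `event` with one new (or overwritten) field per named group."""
    match = re.search(to_python_named_groups(pattern), event.get(field) or "")
    if match is None:
        return dict(event)  # sketch choice: no new fields when nothing matches
    return {**event, **match.groupdict()}

event = {"address": "880 Holmes Lane"}
print(parse_field(event, "address", r"(?<streetNumber>\d+) (?<street>.+)"))
# {'address': '880 Holmes Lane', 'streetNumber': '880', 'street': 'Holmes Lane'}
print(parse_field(event, "address", r"\d+ (?<address>.+)"))
# {'address': 'Holmes Lane'}
```

The second call illustrates the field-override row: the named group reuses the source field's name, so its value replaces the original.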

## Date and Time Functions

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Current time | `... \| eval now=now()` | `... \| eval now=now()` | Same syntax, but SPL returns an epoch value while PPL returns the current date and time in 'YYYY-MM-DD hh:mm:ss' format |
| Relative time | `... \| eval date=relative_time(now(), "-1d@d")` | `... \| eval date = date_sub(now(), INTERVAL 1 DAY)` | Different function; `date_sub()` subtracts an interval but does not snap to the start of the day like `@d` |
| Format time | `... \| eval date=strftime(_time, "%Y-%m-%d")` | `... \| eval date=date_format(timestamp, "%Y-%m-%d")` | Different function |
| Time range filter | `earliest=-1d latest=now()` | `earliest("-1d", timestamp) and latest("now", timestamp)` | PPL supports the `earliest()` and `latest()` functions |
| Time extraction | `... \| eval hour=strftime(_time, "%H")` | `... \| eval hour = date_format(timestamp, '%H')` | PPL uses `date_format()` |
| Time bucket | `... \| bin _time span=5m \| stats count by _time` | `... \| stats count() by span(@timestamp, 5m)` | PPL uses `span()` |
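
The time rows combine three ideas: rendering an epoch value as a formatted timestamp, filtering to a relative window with `earliest()` and `latest()`, and bucketing with `span()`. The plain-Python sketch below walks through all three. UTC and epoch-aligned buckets are assumptions of the sketch, and the helper names are hypothetical. The `%Y`, `%m`, `%d`, and `%H` specifiers from the table mean the same thing in Python's `strftime`.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def format_epoch(epoch_seconds: float) -> str:
    """Render an epoch value (what SPL's now() returns) as 'YYYY-MM-DD hh:mm:ss' in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

def span_bucket(ts: datetime, minutes: int) -> datetime:
    """Floor an aware timestamp to the start of its bucket, as span(timestamp, 5m) groups it.

    Buckets are assumed to be aligned to the Unix epoch.
    """
    width = minutes * 60
    epoch = ts.timestamp()
    return datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)

def count_by_span(timestamps, now: datetime, days_back: int, minutes: int) -> Counter:
    """earliest("-<days_back>d") and latest("now"), then stats count() by span(<minutes>m)."""
    cutoff = now - timedelta(days=days_back)
    return Counter(span_bucket(t, minutes) for t in timestamps if cutoff <= t <= now)

now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
timestamps = [
    datetime(2024, 1, 8, 10, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 8, 10, 4, 59, tzinfo=timezone.utc),
    datetime(2024, 1, 8, 10, 7, tzinfo=timezone.utc),
    datetime(2023, 12, 20, 9, 0, tzinfo=timezone.utc),  # older than 7 days: filtered out
]
print(format_epoch(0))               # 1970-01-01 00:00:00
print(timestamps[2].strftime("%H"))  # 10
for bucket, n in sorted(count_by_span(timestamps, now, 7, 5).items()):
    print(bucket.strftime("%Y-%m-%d %H:%M"), n)
```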

## Dedup

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Deduplicate | `... \| dedup field1, field2` | `... \| dedup field1, field2` | Same syntax |
| Deduplicate with count | `... \| dedup 2 field1` | `... \| dedup 2 field1` | Same syntax |
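
`dedup 2 field1` keeps the first two events for each distinct value of `field1` and drops the rest. A plain-Python sketch of that rule follows. It also drops events whose key fields are null or missing, which is assumed to correspond to the default `keepempty=false` setting; the function name is hypothetical.

```python
from collections import defaultdict

def dedup(events, fields, keep=1):
    """Keep the first `keep` events for each distinct combination of `fields`, in input order."""
    seen = defaultdict(int)
    out = []
    for event in events:
        if any(event.get(f) is None for f in fields):
            continue  # null or missing key: dropped, assuming keepempty=false
        key = tuple(event[f] for f in fields)
        if seen[key] < keep:
            seen[key] += 1
            out.append(event)
    return out

events = [
    {"field1": "a", "n": 1},
    {"field1": "a", "n": 2},
    {"field1": "a", "n": 3},
    {"field1": "b", "n": 4},
    {"n": 5},
]
print([e["n"] for e in dedup(events, ["field1"], keep=2)])  # [1, 2, 4]
```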

## Lookup and Joins

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Basic lookup | `... \| lookup vendors product_id` | `... \| lookup vendors product_id` | Same syntax |
| Multiple mapping fields | `... \| lookup vendors product_id, category` | `... \| lookup vendors product_id, category` | Same syntax |
| Aliased field lookup | `... \| lookup vendors product AS id` | `... \| lookup vendors product as id` | PPL uses lowercase "as" |
| Lookup with append | Not applicable | `... \| lookup vendors product_id append dept as department` | PPL-specific feature |
| Lookup with replace | Not applicable | `... \| lookup vendors product_id replace dept as department` | PPL-specific feature |
| Inner join | `... \| join type=inner vendors [search index=vendors]` | `... \| inner join vendors` | Different syntax format |
| Left join | `... \| join type=left vendors [search index=vendors]` | `... \| left join vendors` | Different syntax format |
| Join with ON clause | `... \| join type=inner left=a right=b where a.id = b.id vendors` | `... \| inner join left=a right=b ON a.id = b.id vendors` | PPL uses "ON" instead of "where" |
| Append columns | `... \| appendcols [search source=other_index \| fields id, status]` | `... \| appendcols [source=other_index \| fields id, status]` | Similar syntax |
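
The difference between the inner and left join rows is what happens to a left-side row with no match. The plain-Python sketch below joins two lists of dicts on one equality key. The function name and sample data are hypothetical, and letting right-side values overwrite left-side ones on a name clash is a simplification; the `left=a right=b` aliases in the table exist to avoid such clashes.

```python
from collections import defaultdict

def join(left, right, on, how="inner"):
    """Equality join on `on`; how="inner" drops unmatched left rows, how="left" keeps them."""
    index = defaultdict(list)
    for row in right:
        if row.get(on) is not None:  # null keys never match
            index[row[on]].append(row)

    out = []
    for row in left:
        key = row.get(on)
        matches = index.get(key, []) if key is not None else []
        if matches:
            out.extend({**row, **match} for match in matches)
        elif how == "left":
            out.append(dict(row))  # unmatched left row, no right-side fields
    return out

orders = [{"id": 1, "product": "disk"}, {"id": 2, "product": "fan"}]
vendors = [{"id": 1, "vendor": "Acme"}]
print(join(orders, vendors, "id"))
# [{'id': 1, 'product': 'disk', 'vendor': 'Acme'}]
print(join(orders, vendors, "id", how="left"))
# [{'id': 1, 'product': 'disk', 'vendor': 'Acme'}, {'id': 2, 'product': 'fan'}]
```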

## Handling Null Values

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Basic null replacement | `... \| fillnull value=0 field1` | `... \| fillnull with 0 in field1` | Similar syntax but uses `with...in` format |
| Multiple fields | `... \| fillnull value="N/A" field1, field2, field3` | `... \| fillnull with 'N/A' in field1, field2, field3` | Similar syntax but uses `with...in` format |
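
`fillnull` only touches the listed fields and leaves non-null values alone. The following hypothetical plain-Python sketch shows that rule. Treating a missing field the same as a null one is an assumption; check how your data represents absent values.

```python
def fillnull(events, value, fields):
    """Sketch of `fillnull with <value> in f1, f2`: replace None or missing values in `fields`."""
    return [{**event, **{f: value for f in fields if event.get(f) is None}} for event in events]

events = [{"field1": None, "field2": 7}, {"field2": None}]
print(fillnull(events, "N/A", ["field1", "field2"]))
# [{'field1': 'N/A', 'field2': 7}, {'field2': 'N/A', 'field1': 'N/A'}]
```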

## Results Limiting

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| First N results | `... \| head 10` | `... \| head 10` | Same syntax |
| Last N results | `... \| tail 10` | `... \| tail 10` | Same syntax |
| Moving average | `... \| trendline sma5(value)` | `... \| trendline sma(5, value)` | PPL passes the window size as the first argument of `sma()` |
| Top values | `... \| top 10 field` | `... \| top 10 field` | Same syntax |
| Rare values | `... \| rare 10 field` | `... \| rare 10 field` | Same syntax |
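
To check these results by hand, the plain-Python sketch below computes a simple moving average over a five-point window and the most and least common values of a field. The function names are hypothetical. Emitting `None` until the window is full, and ordering ties by first appearance, are choices of this sketch rather than documented behavior of either engine.

```python
from collections import Counter

def sma(values, window):
    """Simple moving average of the last `window` values; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def top(values, n=10):
    """Most common values with their counts (ties keep first-seen order)."""
    return Counter(values).most_common(n)

def rare(values, n=10):
    """Least common values with their counts (ties keep first-seen order)."""
    return sorted(Counter(values).items(), key=lambda kv: kv[1])[:n]

print(sma([1, 2, 3, 4, 5, 6], 5))  # [None, None, None, None, 3.0, 4.0]
statuses = ["200", "404", "200", "500", "200", "404"]
print(top(statuses, 2))   # [('200', 3), ('404', 2)]
print(rare(statuses, 1))  # [('500', 1)]
```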

## String Functions

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| String concatenation | `... \| eval result=field1 + " " + field2` | `... \| eval result = concat(field1, ' ', field2)` | PPL requires `concat()` function |
| Substring | `... \| eval result=substr(field, 1, 5)` | `... \| eval result = substring(field, 1, 5)` | Different function name; both count positions from 1 |
| String length | `... \| eval length=len(field)` | `... \| eval length = length(field)` | Different function name |
| Lowercase | `... \| eval result=lower(field)` | `... \| eval result = lower(field)` | Same syntax |
| Uppercase | `... \| eval result=upper(field)` | `... \| eval result = upper(field)` | Same syntax |
| Replace | `... \| eval result=replace(field, "pattern", "replacement")` | `... \| eval result = replace(field, 'pattern', 'replacement')` | Same syntax |
| Trim whitespace | `... \| eval result=trim(field)` | `... \| eval result = trim(field)` | Same syntax |
| Contains (wildcard) | `... \| where like(field, "%pattern%")` | `... \| where like(field, '%pattern%')` | Same syntax; SPL's `eval` cannot assign a boolean result, so use `like()` in `where` or inside `if()` |
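
Splunk's `substr()` and PPL's `substring()` count positions from 1, whereas Python slicing counts from 0. This hypothetical helper shows the translation from a 1-based start and a length to a Python slice. Rejecting a start below 1 is a simplification; negative starts are not handled.

```python
from typing import Optional

def substring(s: str, start: int, length: Optional[int] = None) -> str:
    """1-based substring: `start` 1 is the first character; `length` None means "to the end"."""
    if start < 1:
        raise ValueError("positions are 1-based; use 1 for the first character")
    begin = start - 1  # convert to Python's 0-based index
    return s[begin:] if length is None else s[begin:begin + length]

print(substring("helloworld", 1, 5))  # hello
print(substring("helloworld", 6))     # world
```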

## Conditional Functions

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| If condition | `... \| eval result=if(field > 100, "High", "Low")` | `... \| eval result = if(field > 100, 'High', 'Low')` | Same syntax |
| Case statement | `... \| eval grade=case(field > 90, "A", field > 80, "B", 1==1, "C")` | `... \| eval grade = case(field > 90, 'A', field > 80, 'B' else 'C')` | PPL uses an `else` clause for the default instead of a `1==1` catch-all condition |
| NULL check | `... \| eval result=if(isnull(field), "Missing", field)` | `... \| eval result = if(isnull(field), 'Missing', field)` | Same syntax |
| Not-null check | `... \| eval result=if(isnotnull(field), field, "Default")` | `... \| eval result = if(isnotnull(field), field, 'Default')` | Same syntax |
| Coalesce (first non-null) | `... \| eval result=coalesce(field1, field2, "default")` | `... \| eval result = coalesce(field1, field2, 'default')` | Same syntax |
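
`case()` returns the value for the first condition that holds, falling back to the default, and `coalesce()` returns its first non-null argument. Below is a plain-Python sketch of both rules (names hypothetical), applied to the grading example from the table. Unlike the query functions, Python evaluates every condition up front here, which does not change the result for side-effect-free conditions.

```python
def case(*branches, default=None):
    """Return the value paired with the first true condition, else `default`."""
    for condition, value in branches:
        if condition:
            return value
    return default

def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    return next((v for v in values if v is not None), None)

def grade(field):
    # 'A' above 90, 'B' above 80, otherwise 'C'
    return case((field > 90, "A"), (field > 80, "B"), default="C")

print([grade(score) for score in (95, 85, 70)])  # ['A', 'B', 'C']
print(coalesce(None, "backup", "default"))       # backup
```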

## Math Functions

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| Addition | `... \| eval sum=field1 + field2` | `... \| eval sum = field1 + field2` | Same syntax |
| Subtraction | `... \| eval diff=field1 - field2` | `... \| eval diff = field1 - field2` | Same syntax |
| Multiplication | `... \| eval product=field1 * field2` | `... \| eval product = field1 * field2` | Same syntax |
| Division | `... \| eval quotient=field1 / field2` | `... \| eval quotient = field1 / field2` | Same syntax |
| Modulo | `... \| eval remainder=field1 % field2` | `... \| eval remainder = field1 % field2` | Same syntax |
| Absolute value | `... \| eval result=abs(field)` | `... \| eval result = abs(field)` | Same syntax |
| Round | `... \| eval result=round(field, 2)` | `... \| eval result = round(field, 2)` | Same syntax |
| Ceiling | `... \| eval result=ceiling(field)` | `... \| eval result = ceil(field)` | Different function name |
| Floor | `... \| eval result=floor(field)` | `... \| eval result = floor(field)` | Same syntax |
| Power | `... \| eval result=pow(field, 2)` | `... \| eval result = pow(field, 2)` | Same syntax |
| Square root | `... \| eval result=sqrt(field)` | `... \| eval result = sqrt(field)` | Same syntax |

## Other Functions

| Operation | Splunk SPL | OpenSearch PPL | Notes |
|-----------|------------|---------------|-------|
| MD5 hash | `... \| eval hash=md5("string")` | `... \| eval hash = md5('string')` | Same syntax |
| SHA1 hash | `... \| eval hash=sha1("string")` | `... \| eval hash = sha1('string')` | Same syntax |
| JSON extraction | `... \| spath input=data path=user.name output=username` | `... \| eval username = json_extract(data, '$.user.name')` | Different approach |
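
`spath path=user.name` and `json_extract(data, '$.user.name')` both walk a path of object keys. The plain-Python sketch below resolves only the simple `$.key.key` form. Array indexes, wildcards, and the exact return type of PPL's `json_extract` are outside its scope, and the function name merely mirrors the PPL function for readability.

```python
import json

def json_extract(data: str, path: str):
    """Resolve a dotted path such as '$.user.name' against a JSON string; None if absent."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = json.loads(data)
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

data = '{"user": {"name": "alice", "id": 7}}'
print(json_extract(data, "$.user.name"))   # alice
print(json_extract(data, "$.user.email"))  # None
```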