An automated Site Reliability Engineering (SRE) tool that monitors application logs, detects errors, and automates incident management by creating Jira tickets.
- Continuous log monitoring for error detection
- Intelligent detection of on-call employees
- Automatic Jira ticket creation and assignment to appropriate personnel
You can see an end-to-end execution demo in experimental/exp.ipynb.
The script main.py runs continuously, periodically scanning the log file for new errors.
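The periodic scan can be pictured with a minimal sketch like the one below. It is not the code in main.py: the function names `read_new_errors` and `monitor`, the byte-offset bookkeeping, and the plain "ERROR" substring match are all assumptions made for illustration.

```python
import time
from pathlib import Path


def read_new_errors(log_path, offset=0, marker="ERROR"):
    """Return (error_lines, new_offset) for complete lines appended after offset.

    Hypothetical helper: matching on the substring "ERROR" is an assumption,
    not necessarily how the project detects errors.
    """
    path = Path(log_path)
    if not path.exists():
        return [], offset
    if path.stat().st_size < offset:
        offset = 0  # file was truncated or rotated, so start over
    with path.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume only complete lines; a partial trailing line waits for the next scan.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    errors = [line for line in text.splitlines() if marker in line]
    return errors, offset + end


def monitor(log_path, interval_seconds=60, handle_error=print, max_cycles=None):
    """Scan the log every interval_seconds and pass each new error line to handle_error."""
    offset = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        errors, offset = read_new_errors(log_path, offset)
        for line in errors:
            handle_error(line)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

Tracking a byte offset means each scan reads only what was appended since the previous one, so an error line is reported once. In the real tool, `handle_error` would presumably be the step that looks up the on-call employee and opens the Jira ticket.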
To get started:
- Copy .env.example to .env and fill in the necessary configuration values.
Additional configuration options:
- MONITORING_INTERVAL: Time interval (in seconds) between log checks (default: 60)
- LOG_FILE_PATH: Path to the log file to monitor
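As a rough illustration of how these two options might be read at startup, the sketch below pulls them from environment variables using only the standard library. The names `MonitorConfig` and `load_config` are hypothetical, and treating LOG_FILE_PATH as required is an assumption. The project may load .env differently (for example through a dotenv-style loader).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorConfig:
    monitoring_interval: int  # seconds between log checks
    log_file_path: str        # path to the log file to monitor


def load_config(env=None):
    """Build a MonitorConfig from a mapping of environment variables.

    Hypothetical helper: the validation rules here are assumptions.
    """
    env = os.environ if env is None else env
    raw_interval = env.get("MONITORING_INTERVAL", "60")  # documented default: 60
    try:
        interval = int(raw_interval)
    except ValueError:
        raise ValueError(
            f"MONITORING_INTERVAL must be a whole number of seconds, got {raw_interval!r}"
        ) from None
    if interval <= 0:
        raise ValueError("MONITORING_INTERVAL must be positive")
    log_path = env.get("LOG_FILE_PATH")
    if not log_path:
        # Assumption: no default path is documented, so fail early.
        raise ValueError("LOG_FILE_PATH is not set")
    return MonitorConfig(monitoring_interval=interval, log_file_path=log_path)
```

For example, `load_config({"LOG_FILE_PATH": "output/logs.log"})` falls back to the 60-second interval and points the monitor at the provided sample log.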
If you don’t have a log file to test with:
- Use the helper script utils/random_log_generator.py to generate synthetic logs.
- Or, simply try with the provided sample log file: output/logs.log
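For a sense of what generating synthetic logs can look like, here is a small, self-contained sketch. It does not reproduce utils/random_log_generator.py: the line format (timestamp, level, message), the level weights, and the sample messages are all invented for illustration.

```python
import random
from datetime import datetime, timedelta

# Hypothetical levels and messages; the real helper script may use different ones.
LEVELS = ["INFO", "WARNING", "ERROR"]
MESSAGES = {
    "INFO": ["Request handled", "Health check passed"],
    "WARNING": ["Slow response detected", "Retrying upstream call"],
    "ERROR": ["Database connection failed", "Unhandled exception in worker"],
}


def generate_log_lines(count, seed=None, start=None):
    """Return `count` synthetic log lines with increasing timestamps."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    current = start or datetime(2024, 1, 1)
    lines = []
    for _ in range(count):
        current += timedelta(seconds=rng.randint(1, 30))
        level = rng.choices(LEVELS, weights=[8, 2, 1])[0]  # errors are the rarest
        message = rng.choice(MESSAGES[level])
        stamp = current.strftime("%Y-%m-%d %H:%M:%S")
        lines.append(f"{stamp} {level} {message}")
    return lines
```

Writing the result of `"\n".join(generate_log_lines(200))` plus a trailing newline to a file gives the monitor something to scan.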
This project was developed as part of the course CS 595 - TCPS: MLOps for Generative AI.
Special thanks to Professor Santosh Nukavarapu for such an interesting project!