Description
Performance: Optimize uWSGI PSGI Configuration for K8s Deployment
NOTE: This was generated by giving Claude.ai our current config and asking it to create this ticket.
Summary
We need to analyze and optimize the current uWSGI configuration to reduce slow response times and improve overall performance in our containerized Kubernetes environment.
Current Configuration
[uwsgi]
master = true
workers = 20
die-on-term = true
need-app = true
vacuum = true
disable-logging = true
listen = 1024
post-buffering = 4096
buffer-size = 65535
early-psgi = true
perl-no-die-catch = true
max-worker-lifetime = 3600
max-requests = 1000
reload-on-rss = 300
harakiri = 60
Performance Issues
- Slow response times observed
- Need data-driven optimization approach
- Current settings may not be optimal for our workload
Action Items (Priority Order)
1. Enable Performance Monitoring (CRITICAL - Do First)
Why: Need baseline metrics before making changes
Implementation:
- Add a stats socket to the uWSGI config:
stats = 127.0.0.1:9191
stats-http = true
- Deploy and verify stats endpoint accessibility
- Install uwsgitop for monitoring: pip install uwsgitop
Success Criteria: Can access uWSGI stats via curl http://localhost:9191
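A quick way to verify from outside the pod, assuming curl is available in the container image and uwsgitop was installed as above; <pod-name> is a placeholder as in the Monitoring Commands section:
# dump the raw stats JSON
kubectl exec <pod-name> -- curl -s http://127.0.0.1:9191
# or watch workers interactively
kubectl exec -it <pod-name> -- uwsgitop 127.0.0.1:9191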
2. Collect Baseline Performance Data (HIGH)
Why: Establish current performance patterns before optimization
Tasks:
- Monitor for 24-48 hours and collect:
  - Worker CPU/memory usage: kubectl top pods
  - uWSGI worker stats: uwsgitop 127.0.0.1:9191
  - Container resource limits vs actual usage
  - Response time patterns
  - Worker restart frequency
Success Criteria: Have documented baseline metrics
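A minimal collection sketch, assuming the stats endpoint from step 1 is live and curl is available in the container; the pod name, interval, and output file are placeholders:
# snapshot uWSGI stats and pod resource usage every 5 minutes
while true; do
  date >> baseline.log
  kubectl exec <pod-name> -- curl -s http://127.0.0.1:9191 >> baseline.log
  kubectl top pod <pod-name> >> baseline.log
  sleep 300
done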
3. Enable Targeted Logging (HIGH)
Why: Identify specific bottlenecks without overwhelming logs
Implementation:
- Temporarily replace disable-logging = true with:
log-slow = 1000
log-4xx = true
log-5xx = true
logto = /tmp/uwsgi.log
Success Criteria: Can identify slow requests and error patterns
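Once the logging options above are deployed, slow-request and error lines can be pulled from the pod-local log. This assumes the /tmp/uwsgi.log path from the config and that the image contains tar (required by kubectl cp):
# inspect recent entries in place
kubectl exec <pod-name> -- tail -n 200 /tmp/uwsgi.log
# copy the log out for offline analysis
kubectl cp <pod-name>:/tmp/uwsgi.log ./uwsgi-slow.log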
4. Right-size Worker Count (MEDIUM-HIGH)
Why: Worker count is the most impactful setting; too many workers can hurt performance
Analysis needed:
- Check container CPU allocation
- Monitor CPU usage per worker
- Test worker count = (2 × CPU cores) + 1 as a starting point
Implementation:
- If 4 CPU cores are allocated, try workers = 9
- Monitor performance impact
- Adjust based on CPU utilization patterns
Success Criteria: Workers fully utilize available CPU without thrashing
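One way to pick the starting point, assuming the application container is the first container in the pod spec; the jsonpath reads the CPU limit and the (2 × CPU cores) + 1 rule is then applied by hand:
# read the container's CPU limit (e.g. "4" or "4000m")
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources.limits.cpu}'
# a limit of 4 cores gives workers = (2 * 4) + 1 = 9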
5. Optimize Memory Settings (MEDIUM)
Why: Frequent worker restarts hurt performance
Current issues:
- reload-on-rss = 300 may be too aggressive
- max-worker-lifetime = 3600 combined with max-requests = 1000 causes frequent restarts
Tasks:
- Monitor actual worker RSS usage
- Test increased settings:
reload-on-rss = 512
max-requests = 5000
max-worker-lifetime = 7200
Success Criteria: Reduced worker restart frequency
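Per-worker RSS can be sampled with the same ps approach used under Monitoring Commands; the awk average below is a rough convenience and assumes the standard ps aux column layout (RSS in KB in column 6):
# per-worker RSS snapshot
kubectl exec <pod-name> -- ps aux | grep '[u]wsgi'
# rough average RSS in MB across uwsgi processes (includes the master)
kubectl exec <pod-name> -- ps aux | grep '[u]wsgi' | awk '{sum+=$6; n++} END {if (n) printf "%.0f MB avg\n", sum/n/1024}'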
6. Tune Request Handling (MEDIUM)
Why: Better request processing and timeout handling
Implementation:
- Analyze typical request duration
- Adjust harakiri = 120 if requests legitimately take >60s
- Consider increasing listen = 2048 if seeing connection drops
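Before raising listen, note that the kernel silently caps the socket backlog at net.core.somaxconn, so the sysctl on the node or pod may also need attention. A quick check of the effective cap inside the pod's network namespace:
# current backlog ceiling; listen values above this are truncated
kubectl exec <pod-name> -- cat /proc/sys/net/core/somaxconn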
7. Optimize Buffer Settings (LOW-MEDIUM)
Why: The current buffer-size = 65535 may be oversized
Tasks:
- Monitor buffer utilization in stats
- Test with buffer-size = 32768
- Adjust based on actual usage patterns
8. Load Testing & Validation (LOW)
Why: Validate improvements under controlled conditions
Implementation:
- Set up load testing environment
- Compare before/after metrics
- Document optimal configuration
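A sketch of a comparison run, assuming ApacheBench (ab) is available and that a representative staging endpoint exists; the URL and request counts are placeholders:
# 10,000 requests at concurrency 50 against a staging endpoint
ab -n 10000 -c 50 http://staging.example.internal/health
# rerun with identical parameters after each config change and compare
# "Time per request" and the percentile table in the output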
Monitoring Commands
# uWSGI stats
curl http://localhost:9191
# Resource usage
kubectl top pods <pod-name>
kubectl describe pod <pod-name>
# Worker memory
kubectl exec <pod-name> -- ps aux | grep uwsgi
# Connection monitoring
kubectl exec <pod-name> -- netstat -an | grep :80 | sort | uniq -c
Red Flags to Watch For
- High worker churn (frequent restarts)
- Listen queue overflows in stats
- Consistent high CPU with low throughput
- Memory usage steadily climbing
- Regular harakiri timeouts
Success Metrics
- Reduced 95th percentile response time
- Decreased worker restart frequency
- Improved resource utilization efficiency
- Fewer timeout errors
Notes
- Make changes incrementally
- Monitor each change for 24+ hours before next adjustment
- Keep rollback plan ready
- Document all changes and their impact
Priority: High
Estimate: 1-2 sprints
Dependencies: Monitoring tools, load testing capability