Simulation & Safety
One of Inflight's core principles is that no change should reach production without validation. The simulation engine lets you test configuration changes before deployment, reducing risk and building confidence.
Built on Real Production Data
Unlike load testing or chaos engineering—which can only safely run in lower environments—Inflight simulations are built from your actual production metrics. Your APM is already collecting data from real users, real traffic patterns, and real workload characteristics. Inflight uses that data to model what's actually happening in production, not what a synthetic test thinks might happen.
Real Traffic Patterns
Simulations reflect your actual load distribution, peak hours, and usage patterns—not idealized test scenarios.
Zero Production Impact
All simulation happens offline using collected metrics. Nothing touches your running services.
No Environment Limitations
Load tests and chaos experiments are limited to staging. Inflight gives you production-grade insights without the risk.
Continuous Learning
As your production behavior evolves, the simulation models automatically update to reflect current reality.
Why Simulate?
Traditional performance optimization is risky. You change a setting, deploy to production, and hope for the best. If something goes wrong, you roll back and try again. This approach is time-consuming, stressful, and can impact your users.
Without Simulation
- Trial and error in production
- Unexpected outages
- Wasted engineering time
- User-facing impact
With Simulation
- Validate before deploying
- Understand predicted impact
- Deploy with confidence
- Protect your users
How Simulation Works
The Inflight simulator uses your historical metrics to model your application's behavior. When you request a recommendation or want to test a change, here's what happens:
Baseline Capture
The simulator establishes a baseline by analyzing your current performance metrics and resource utilization patterns from the metrics store.
Model Calibration
Using your historical data, the simulator calibrates models specific to your application. Automatic drift detection keeps the models accurate, and the backtester validates them against holdout data.
Fidelity Selection
The system automatically selects the appropriate fidelity mode: STATISTICAL for simple changes, HYBRID for moderate complexity, FULL SIMULATION for complex scenarios, or DEGRADED when data is insufficient.
Change Application
The proposed configuration change is applied to the model using discrete event simulation with advanced statistical methods and stochastic process modeling.
Safety Evaluation
The platform-aware safety framework validates the change against Kubernetes limits, Cloud Run quotas, memory/CPU thresholds, and GC stability. The governance system tracks parameter changes.
Report Generation
A comprehensive report is generated with a safety verdict, predicted impacts, confidence intervals, model fingerprints, and a full audit trail with provenance.
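To make the flow concrete, here is a minimal Python sketch of those six steps. Every function and field name is hypothetical, and the numbers are placeholders, not output from the Inflight engine.

```python
# A minimal, hypothetical sketch of the six steps above; not the Inflight API.
def simulate_change(metrics, proposed_change):
    # 1. Baseline capture (stand-in values; real values come from `metrics`)
    baseline = {"p99_ms": 240, "heap_after_gc_pct": 68}
    # 2. Model calibration (window length and fingerprint are placeholders)
    model = {"calibration_window_days": 30, "fingerprint": "sha256:demo"}
    # 3. Fidelity selection based on change complexity (simplified)
    fidelity = "HYBRID" if len(proposed_change) > 1 else "STATISTICAL"
    # 4. Change applied to the model produces predictions (stubbed here)
    predicted = {"p99_ms": 205, "heap_after_gc_pct": 74, "win_probability": 0.93}
    # 5. Safety evaluation against thresholds (simplified)
    verdict = "APPROVED" if predicted["win_probability"] >= 0.90 else "WARNING"
    # 6. Report generation
    return {"baseline": baseline, "model": model, "fidelity": fidelity,
            "predicted": predicted, "verdict": verdict}

print(simulate_change({"source": "your-apm"}, {"max_heap_mb": 2048}))
```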
Multi-Fidelity Simulation Engine
The simulator automatically selects the optimal fidelity mode based on change complexity, data availability, and time constraints:
STATISTICAL
Fast statistical models for simple changes with sufficient historical data. Uses regression analysis and distribution fitting. Sub-second response times.
HYBRID
Combines statistical models with targeted simulation for moderate complexity. Balances accuracy and speed. Typical response in 1-5 seconds.
FULL SIMULATION
Complete discrete event simulation with advanced statistical methods and stochastic process modeling. Highest accuracy for complex changes.
DEGRADED
Fallback mode when time or data constraints prevent full evaluation. Provides best-effort predictions with reduced confidence scores.
Automatic Fidelity Escalation
When initial predictions have low confidence, the system automatically escalates to higher fidelity modes. The Advisor can trigger fidelity escalation when comparing candidates to ensure reliable recommendations.
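A rough sketch of how mode selection and escalation could be wired together is shown below; the thresholds, the mode names as Python strings, and the confidence cutoff are illustrative assumptions, not the engine's actual decision logic.

```python
# Hypothetical fidelity selection with automatic escalation on low confidence.
FIDELITY_ORDER = ["STATISTICAL", "HYBRID", "FULL_SIMULATION"]

def initial_fidelity(change_complexity, history_days):
    if history_days < 7:
        return "DEGRADED"                 # not enough data for calibrated models
    if change_complexity == "simple":
        return "STATISTICAL"
    if change_complexity == "moderate":
        return "HYBRID"
    return "FULL_SIMULATION"

def simulate_with_escalation(run, change_complexity, history_days, min_confidence=0.8):
    mode = initial_fidelity(change_complexity, history_days)
    if mode == "DEGRADED":
        return run(mode)                  # best effort, reduced confidence
    while True:
        result = run(mode)
        if result["confidence"] >= min_confidence or mode == FIDELITY_ORDER[-1]:
            return result
        mode = FIDELITY_ORDER[FIDELITY_ORDER.index(mode) + 1]   # escalate

# Toy runner: confidence improves as fidelity increases.
demo = {"STATISTICAL": 0.7, "HYBRID": 0.85, "FULL_SIMULATION": 0.95, "DEGRADED": 0.4}
print(simulate_with_escalation(lambda m: {"mode": m, "confidence": demo[m]},
                               "simple", history_days=30))
```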
Model Calibration & Backtesting
The simulator continuously learns from production data to maintain prediction accuracy:
Continuous Calibration
Models are calibrated from production metrics using batch processing and hot overrides. Cross-validation on holdout sets ensures generalization. SHA256 fingerprinting tracks model integrity.
Drift Detection
Continuous monitoring detects distribution drift between calibration and production data. Automatic recalibration triggers when drift exceeds thresholds.
Backtester Validation
Rolling-origin validation tests calibrated models on fresh historical data. KPIs (coverage, sharpness, accuracy) are tracked for scientific validation and guardrail engines.
Governance System
Parameter management with hierarchical priors. Full audit trail for scientific validation. Metric binding audits ensure prediction-metric alignment. Revision tracking with rollback capability.
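As an illustration of the drift-detection idea, the sketch below compares a calibration sample with a recent production sample using a two-sample Kolmogorov-Smirnov statistic and flags recalibration when it exceeds a threshold. The choice of statistic and the 0.2 cutoff are assumptions for the example, not the engine's internals.

```python
# Hypothetical drift check between calibration and recent production data.
def ks_statistic(a, b):
    a, b = sorted(a), sorted(b)
    values = sorted(set(a) | set(b))
    cdf = lambda xs, v: sum(1 for x in xs if x <= v) / len(xs)
    return max(abs(cdf(a, v) - cdf(b, v)) for v in values)

def needs_recalibration(calibration_sample, production_sample, threshold=0.2):
    return ks_statistic(calibration_sample, production_sample) > threshold

calibration = [110, 120, 118, 125, 130, 122, 119]   # e.g., p95 latency (ms)
production  = [150, 160, 158, 165, 170, 162, 159]   # recent window has shifted
print(needs_recalibration(calibration, production))  # True -> recalibrate
```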
Safety Verdicts
Every simulation produces a safety verdict that helps you decide whether to proceed with the change:
APPROVED
All safety thresholds met. Win probability exceeds the required threshold for your service tier. Safe to deploy.
Typical Criteria:
- Win probability meets or exceeds service-tier threshold (e.g., ≥90% for critical services)
- All critical thresholds pass (CPU, memory, GC pause times, error rates)
- Risk categories show OK status across memory, GC stability, and availability
- Similar configurations have succeeded in comparable services
WARNING
Win probability below threshold or some risk factors detected. Change may work but carries risk—validate in staging first.
Typical Criteria:
- Win probability below required threshold for service tier
- One or more risk categories show HIGH status
- Some thresholds approaching limits under peak load simulation
- Limited calibration data reduces prediction confidence
REJECT
Critical issues detected. The configuration change will not remediate the root cause and may cause service unavailability.
Typical Criteria:
- Critical risk categories detected (memory leaks, severe GC issues)
- Thresholds exceeded by unsafe margins (e.g., 100% heap after GC)
- Predicted service unavailability or health check failures
- Root cause analysis required before any configuration changes
Understanding the Simulation Report
Each simulation produces a comprehensive guidance report with multiple layers of analysis to help you make informed decisions:
Win Probability & Service Tier Thresholds
Every simulation calculates a win probability—the likelihood that the proposed change will achieve its intended outcome without negative side effects. This probability is compared against a threshold based on your service tier:
- Critical-tier services: ≥90% win probability required
- Standard-tier services: ≥80% win probability required
- Development-tier services: ≥70% win probability required
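The sketch below shows one way the tier thresholds and risk categories could combine into a verdict; the function, the category names used as dictionary keys, and the exact rules are illustrative assumptions rather than Inflight's rules engine.

```python
# Hypothetical mapping from win probability and risk categories to a verdict.
TIER_THRESHOLDS = {"critical": 0.90, "standard": 0.80, "development": 0.70}

def safety_verdict(win_probability, risk_categories, tier="standard"):
    # risk_categories: e.g., {"memory": "OK", "gc_stability": "HIGH", ...}
    if "CRITICAL" in risk_categories.values():
        return "REJECT"
    if (win_probability >= TIER_THRESHOLDS[tier]
            and all(status == "OK" for status in risk_categories.values())):
        return "APPROVED"
    return "WARNING"

print(safety_verdict(0.93, {"memory": "OK", "gc_stability": "OK",
                            "throughput": "OK", "availability": "OK"},
                     tier="critical"))   # APPROVED
```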
Safety Verdict Summary
The report breaks down risk assessment into specific categories, each rated as OK, HIGH, or CRITICAL:
Memory Risk
Heap usage post-GC, memory leak detection, allocation patterns
GC Stability
Pause time predictions (p99), collection frequency, stop-the-world events
Throughput
Request handling capacity, degradation predictions under load
Availability
Health check predictions, circuit breaker triggers, service stability
Critical Thresholds
Each report evaluates your metrics against critical thresholds, showing a pass, warning, or fail status for each. Thresholds are configurable based on your service requirements and SLOs.
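For example, a threshold configuration might look something like the following; the metric names, units, and limits are placeholder defaults, not Inflight's shipped configuration.

```python
# Hypothetical critical-threshold configuration and pass/warn/fail evaluation.
CRITICAL_THRESHOLDS = {
    "heap_after_gc_pct":   {"warn": 70,  "fail": 85},    # % of max heap
    "gc_pause_p99_ms":     {"warn": 200, "fail": 500},
    "cpu_utilization_pct": {"warn": 75,  "fail": 90},
    "error_rate_pct":      {"warn": 1.0, "fail": 5.0},
}

def evaluate(metric, value):
    limits = CRITICAL_THRESHOLDS[metric]
    if value >= limits["fail"]:
        return "fail"
    return "warn" if value >= limits["warn"] else "pass"

print(evaluate("gc_pause_p99_ms", 240))   # warn
```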
What Gets Predicted
The simulator predicts impact across multiple dimensions:
Performance Metrics
- Response time changes (p50, p95, p99)
- Throughput impact
- Error rate predictions
- Latency distribution shifts
Resource Utilization
- Memory consumption changes
- CPU utilization impact
- GC pause time predictions
- Thread/goroutine counts
Stability Indicators
- OOM risk assessment
- Throttling probability
- Container restart likelihood
- Resource contention risks
Confidence Scores
- Model confidence level
- Data quality indicators
- Prediction uncertainty ranges
- Historical correlation strength
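One possible way these predicted dimensions could be grouped in a report payload is sketched below; the field names and sample values are assumptions for illustration, not the actual report schema.

```python
# Illustrative grouping of predicted impact, resource, stability, and
# confidence fields; not Inflight's report format.
from dataclasses import dataclass, field

@dataclass
class PredictedImpact:
    latency_delta_ms: dict = field(
        default_factory=lambda: {"p50": -3.0, "p95": -12.0, "p99": -28.0})
    throughput_delta_pct: float = 2.5
    error_rate_delta_pct: float = 0.0
    memory_delta_mb: float = 180.0
    cpu_delta_pct: float = -4.0
    gc_pause_p99_ms: float = 145.0
    oom_risk: str = "LOW"
    restart_likelihood: str = "LOW"
    confidence: float = 0.88                 # model confidence for these values
    p99_uncertainty_ms: tuple = (-35.0, -20.0)  # prediction range for p99 change

print(PredictedImpact())
```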
Evidence-Based Recommendations
Unlike black-box optimization tools, Inflight shows its work. Every recommendation includes:
Clear Reasoning
Why this change is being recommended based on your specific metrics and workload patterns.
Data Citations
Links to the actual metric data that informed the recommendation, so you can verify the analysis.
Trade-off Analysis
When a change improves one metric but may affect another, you'll see the full picture.
Rollback Guidance
Clear instructions for how to revert if the change doesn't perform as expected in production.
Customizable Analysis
Simulation reports aren't static templates—they're dynamically generated using LLM prompts that you can customize and extend to meet your specific needs.
Dynamic Report Generation
Each report is generated fresh based on your current metrics and context—not pulled from a fixed template. The AI analyzes your specific situation every time.
Custom Prompts
Create additional prompts to extract deeper insights tailored to your domain. Ask about specific failure modes, compliance requirements, or team-specific concerns.
Extended Analysis
Go beyond the standard report with custom queries: "What would happen under 2x traffic?" or "How does this affect our SLA commitments?"
Team-Specific Views
Configure different analysis perspectives for different stakeholders—detailed technical breakdowns for engineers, risk summaries for ops, cost implications for leadership.
Your Analysis, Your Way
The power of LLM-driven analysis means you're not limited to what we anticipated. If you need insights we didn't think of, create a prompt for it. The simulation data is available for whatever analysis your team requires.
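As a sketch of what a custom prompt might look like, the example below wraps a 2x-traffic question around a completed simulation report; the prompt text, the `run_custom_analysis` helper, and the `llm` callable are all hypothetical.

```python
# Hypothetical custom analysis prompt run against a completed simulation.
CUSTOM_PROMPT = """
Using the simulation results and the last 30 days of production metrics,
estimate how this change behaves under 2x peak traffic. Call out any SLA
commitments (for example, p99 < 300 ms, error rate < 0.1%) that would be at
risk, and summarize the result for an on-call engineer in three bullet points.
"""

def run_custom_analysis(llm, simulation_report, prompt=CUSTOM_PROMPT):
    # The report data is passed as context so the model grounds its answer
    # in the same evidence the standard report used.
    return llm(prompt=prompt, context=simulation_report)

# Usage (with whatever LLM callable your deployment wires in):
# answer = run_custom_analysis(my_llm, report)
```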
Using the Simulator
You can interact with the simulator in two ways:
Automatic Simulation
Every AI recommendation automatically includes simulation results. When the Advisor suggests a change, it's already been validated.
Manual What-If
You can also manually test any configuration change you're considering. Enter the proposed settings and see the predicted impact before committing.
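A manual what-if interaction might look roughly like the following; both the request and response structures are illustrative assumptions, not the documented interface.

```python
# Hypothetical shape of a manual what-if request and the response you would
# review before deciding to deploy.
what_if_request = {
    "service": "checkout-api",
    "proposed_settings": {
        "jvm.max_heap_mb": 2048,
        "jvm.gc_collector": "G1",
        "k8s.cpu_limit_millicores": 1500,
    },
}

what_if_response = {
    "verdict": "APPROVED",
    "win_probability": 0.92,
    "predicted": {"p99_latency_delta_ms": -28, "heap_after_gc_pct": 71},
    "fidelity": "HYBRID",
}

print(what_if_request["service"], "->", what_if_response["verdict"])
```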
Calibration & Simulation Fidelity
The accuracy of simulation predictions depends on how well the models are calibrated to your specific application behavior. Inflight continuously tracks calibration quality and reports it with every simulation.
Calibration Window
Predictions are based on a configurable window of historical data—typically 7-30 days. Longer windows provide more stable models; shorter windows capture recent behavior changes more quickly.
Simulation Fidelity
Each report includes a fidelity indicator showing how well the simulation models match observed reality. Fidelity may be reduced when calibration data is limited or workload patterns are highly variable.
Calibration Breaches
If your actual production behavior deviates significantly from the calibration data, the system detects this and may reduce confidence scores or flag the need for recalibration.
Confidence Scoring
Every prediction includes a confidence percentage reflecting data quality, model fit, and how closely the proposed scenario matches validated patterns from your calibration data.
Evidence Provenance
Each simulation report cites the evidence source—including which APM provided the data, the fidelity mode used, and the calibration window. This transparency lets you understand exactly what informed each prediction.
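For illustration, a provenance block attached to a report might look like this; the field names and values are examples, not the exact schema.

```python
# Illustrative evidence-provenance record accompanying a simulation report.
evidence_provenance = {
    "apm_source": "your-apm",            # which APM supplied the metrics
    "calibration_window_days": 14,
    "fidelity_mode": "HYBRID",
    "model_fingerprint": "sha256:demo",  # placeholder fingerprint
    "calibration_breaches": 0,
    "confidence": 0.88,
}

print(evidence_provenance["fidelity_mode"], evidence_provenance["confidence"])
```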
How It Compares
There are other ways to test configuration changes, but each has significant limitations that Inflight addresses:
| Approach | Data Source | Environment | Risk |
|---|---|---|---|
| Load Testing | Synthetic traffic | Lower environments only | Low, but doesn't reflect production |
| Chaos Engineering | Live traffic | Usually staging, sometimes production | High—intentionally causes disruption |
| A/B Config Testing | Real traffic | Production | Medium—exposes some users to changes |
| Inflight Simulation | Real production metrics | Offline (any environment) | Zero—no impact on running services |
Inflight is the only approach that gives you production-grade insights without requiring production access or risking production stability.
Understanding Limitations
While simulation significantly reduces deployment risk, it's important to understand its limitations:
- Predictions are based on historical patterns. Unprecedented workload spikes may not be accurately modeled.
- Complex interactions between multiple simultaneous changes may have emergent effects.
- External dependencies (databases, third-party APIs) are modeled based on observed behavior, not internal state.
We always recommend validating significant changes in staging environments when possible, and monitoring closely after production deployment.