Remediation at Scale
Remediation Decision Tree
Types of Remediation
Common Remediation Actions
| Package | Target Issue | Risk Level | Recovery | Mode |
|---|---|---|---|---|
| Clear Temp Files | Low disk space | Low | 2-10 GB freed | Auto |
| Restart Service | Hung services, memory leaks | Medium | Immediate | Manual |
| Set Power Plan | CPU throttling | Low | Immediate | Auto |
| Kill Process | Resource-hogging app | High | Immediate | Manual |
| Deep Disk Cleanup | Severe disk exhaustion | Medium | 5-20 GB freed | Manual |
Staged Rollout Strategy
Phase 1: Pilot (10-20 endpoints)
Deploy to your testing group. Verify success, check for side effects. Wait 1-2 hours before proceeding.
Phase 2: Early Adopters (100-200 endpoints)
Single department. Monitor health scores for 4-8 hours. Confirm no regressions.
Phase 3: Full Deployment (remaining)
All remaining endpoints. Monitor 24 hours. Have a rollback plan ready.
Always test on a pilot group first. Even well-intentioned scripts can have unintended consequences. Use a computer group labeled "Pilot - Remediation Testing" for this purpose.
Deploying a Tanium Action
Select Package
Choose from built-in Performance packages or custom ones your team created.
Define Target
Computer group, manual list, or live question results (e.g., "disk free < 5 GB").
Set Schedule
Run immediately, at a specific time, or on a recurring schedule.
Configure Options
Timeout, reissue to late-arriving endpoints, success/failure criteria.
Deploy & Monitor
Watch real-time status: succeeded, failed, pending. Investigate failures.
When Remediation Is Not Enough
Knowing when NOT to remediate is as important as knowing how. If two different remediations fail, escalate rather than continuing to experiment on production endpoints.
Simulated: Deploy a Remediation Action
Before vs. After Remediation
Compare health scores and key metrics before and after deploying the CRM service restart:
Scenario: 200 Endpoints with High Memory Usage
Customer Service department: 200 endpoints with memory above 90% for two weeks. Health scores dropped 75 → 52. Users report application freezes. Root cause: CRM update introduced a memory leak (4 GB+ after 8 hours).
What is the best remediation strategy?
Correct: B. Combine an immediate workaround (scheduled CRM restart) with a long-term fix (vendor escalation). Adding RAM (A) delays the symptom, not the leak. Rolling back without coordination (C) bypasses change management. Telling users to reboot (D) is unacceptable for a known IT issue.
Knowledge Check
1. What is the recommended approach before deploying a new remediation package to 1,000 endpoints?
Correct: B. Always test on a pilot group (10-20 endpoints), verify success without side effects, then expand in phases to minimize risk.
2. Which type of remediation is best for a well-understood, low-risk issue like clearing temp files?
Correct: B. For well-understood, low-risk fixes, automated remediation resolves issues before users notice, reduces ticket volume, and frees engineers for complex problems.
3. An endpoint consistently scores below 40 despite multiple remediation attempts. What is the most appropriate next step?
Correct: C. When software remediation has been exhausted, the issue is likely hardware-related or requires a fresh OS. Tanium Performance data provides the evidence needed to justify the escalation.