Module 1 — Lesson 7 of 8

Remediation at Scale

From clearing temp files on a single laptop to restarting services across thousands of endpoints — learn how Tanium enables fast, safe remediation at any scale.
📚 Overview
🔧 Deep Dive
🛠 Hands-On
Check
🚀
4
Steps: Detect → Analyze → Fix → Verify
💻
5
Common Remediation Types
📊
3
Phased Rollout Stages
15 sec
Tanium Action Delivery

Remediation Decision Tree

Alert / Issue Detected RCA Complete? Know the root cause? Yes Choose Remediation Type Automated? Manual? Self-service? Deploy in Phases Pilot → Early Adopters → Full No Go back to Lesson 6: RCA Verify: Health scores recovered?
0
Endpoints remediated in minutes
0
Hours saved vs. manual per 1K devices
0%
First-attempt success rate target

Types of Remediation

Automated
Runs without human intervention. "If disk < 10%, run cleanup." Best for low-risk, well-understood fixes.
👤
Manual
Engineer reviews the situation and clicks Deploy. For higher-risk fixes or unclear root causes.
🙌
Self-Service
User gets a pop-up: "Reboot to improve performance?" Reduces ticket volume.

Common Remediation Actions

Tanium Console — Remediation Packages
Available Packages
Custom
History
Package Target Issue Risk Level Recovery Mode
Clear Temp Files Low disk space Low 2-10 GB freed Auto
Restart Service Hung services, memory leaks Medium Immediate Manual
Set Power Plan CPU throttling Low Immediate Auto
Kill Process Resource-hogging app High Immediate Manual
Deep Disk Cleanup Severe disk exhaustion Medium 5-20 GB freed Manual

Staged Rollout Strategy

Phase 1: Pilot (10-20 endpoints)

Deploy to your testing group. Verify success, check for side effects. Wait 1-2 hours before proceeding.

Phase 2: Early Adopters (100-200 endpoints)

Single department. Monitor health scores for 4-8 hours. Confirm no regressions.

Phase 3: Full Deployment (remaining)

All remaining endpoints. Monitor 24 hours. Have a rollback plan ready.

Pro Tip

Always test on a pilot group first. Even well-intentioned scripts can have unintended consequences. Use a computer group labeled "Pilot - Remediation Testing" for this purpose.

Deploying a Tanium Action

Select Package

Choose from built-in Performance packages or custom ones your team created.

Define Target

Computer group, manual list, or live question results (e.g., "disk free < 5 GB").

Set Schedule

Run immediately, at a specific time, or on a recurring schedule.

Configure Options

Timeout, reissue to late-arriving endpoints, success/failure criteria.

Deploy & Monitor

Watch real-time status: succeeded, failed, pending. Investigate failures.

When Remediation Is Not Enough

🚧
Hardware
Failing HDD, bad RAM, overheating CPU. Tanium data confirms diagnosis for replacement order.
💾
Reimage
OS corrupted, bloated, or persistently infected. Schedule reimage via Tanium Deploy.
🔄
Refresh
6+ year device scoring <40 despite all remediation. Use data to justify budget request.
Key Takeaway

Knowing when NOT to remediate is as important as knowing how. If two different remediations fail, escalate rather than continuing to experiment on production endpoints.

Simulated: Deploy a Remediation Action

Tanium Console — Action Status: Restart CRM Service
Status
Logs
Details
187
Succeeded
8
Failed
5
Pending
200
Total Targeted
Elapsed: 2m 14s — Package: Restart-CRM-Service.ps1 — Target: Customer Service Group

Before vs. After Remediation

Compare health scores and key metrics before and after deploying the CRM service restart:

52
Health Score Before
78
Health Score After
91%
Memory Before
62%
Memory After

Scenario: 200 Endpoints with High Memory Usage

Customer Service department: 200 endpoints with memory above 90% for two weeks. Health scores dropped 75 → 52. Users report application freezes. Root cause: CRM update introduced a memory leak (4 GB+ after 8 hours).

What is the best remediation strategy?

Correct: B. Combine an immediate workaround (scheduled CRM restart) with a long-term fix (vendor escalation). Adding RAM (A) delays the symptom, not the leak. Rolling back without coordination (C) bypasses change management. Telling users to reboot (D) is unacceptable for a known IT issue.

Knowledge Check

1. What is the recommended approach before deploying a new remediation package to 1,000 endpoints?

Correct: B. Always test on a pilot group (10-20 endpoints), verify success without side effects, then expand in phases to minimize risk.

2. Which type of remediation is best for a well-understood, low-risk issue like clearing temp files?

Correct: B. For well-understood, low-risk fixes, automated remediation resolves issues before users notice, reduces ticket volume, and frees engineers for complex problems.

3. An endpoint consistently scores below 40 despite multiple remediation attempts. What is the most appropriate next step?

Correct: C. When software remediation has been exhausted, the issue is likely hardware-related or requires a fresh OS. Tanium Performance data provides the evidence needed to justify the escalation.