Module 1 — Lesson 7 of 8

Remediation at Scale

From clearing temp files on a single laptop to restarting services across thousands of endpoints — learn how Tanium enables fast, safe remediation at any scale.

📚 Overview

🔧 Deep Dive

🛠 Hands-On

✅ Check

🚀

Steps: Detect → Analyze → Fix → Verify

💻

Common Remediation Types

📊

Phased Rollout Stages

⏱

15 sec

Tanium Action Delivery

Remediation Decision Tree

Endpoints remediated in minutes

Hours saved vs. manual per 1K devices

First-attempt success rate target

Types of Remediation

⚡

Automated

Runs without human intervention. "If disk < 10%, run cleanup." Best for low-risk, well-understood fixes.

👤

Manual

Engineer reviews the situation and clicks Deploy. For higher-risk fixes or unclear root causes.

🙌

Self-Service

User gets a pop-up: "Reboot to improve performance?" Reduces ticket volume.

Common Remediation Actions

Tanium Console — Remediation Packages

Available Packages

Custom

History

Package	Target Issue	Risk Level	Recovery	Mode
Clear Temp Files	Low disk space	Low	2-10 GB freed	Auto
Restart Service	Hung services, memory leaks	Medium	Immediate	Manual
Set Power Plan	CPU throttling	Low	Immediate	Auto
Kill Process	Resource-hogging app	High	Immediate	Manual
Deep Disk Cleanup	Severe disk exhaustion	Medium	5-20 GB freed	Manual

Staged Rollout Strategy

Phase 1: Pilot (10-20 endpoints)

Deploy to your testing group. Verify success, check for side effects. Wait 1-2 hours before proceeding.

Phase 2: Early Adopters (100-200 endpoints)

Single department. Monitor health scores for 4-8 hours. Confirm no regressions.

Phase 3: Full Deployment (remaining)

All remaining endpoints. Monitor 24 hours. Have a rollback plan ready.

Pro Tip

Always test on a pilot group first. Even well-intentioned scripts can have unintended consequences. Use a computer group labeled "Pilot - Remediation Testing" for this purpose.

Deploying a Tanium Action

Select Package

Choose from built-in Performance packages or custom ones your team created.

Define Target

Computer group, manual list, or live question results (e.g., "disk free < 5 GB").

Set Schedule

Run immediately, at a specific time, or on a recurring schedule.

Configure Options

Timeout, reissue to late-arriving endpoints, success/failure criteria.

Deploy & Monitor

Watch real-time status: succeeded, failed, pending. Investigate failures.

Deep Dive: Building Custom Remediation Packages ▼

Write the Script

PowerShell (Windows) or Bash (macOS/Linux). Must be idempotent — running twice should be safe.

Test Locally

Run on a test endpoint manually to verify behavior and no side effects.

Create Package

Admin > Packages > Create. Upload script, define parameters, set timeout.

Add Metadata

Description, author, date, prerequisites. Critical for other team members.

Pilot Test

Deploy to your pilot group via Tanium action. Verify all endpoints succeed.

Promote to Prod

Once validated, the package is available for production use.

When Remediation Is Not Enough

🚧

Hardware

Failing HDD, bad RAM, overheating CPU. Tanium data confirms diagnosis for replacement order.

💾

Reimage

OS corrupted, bloated, or persistently infected. Schedule reimage via Tanium Deploy.

🔄

Refresh

6+ year device scoring <40 despite all remediation. Use data to justify budget request.

Key Takeaway

Knowing when NOT to remediate is as important as knowing how. If two different remediations fail, escalate rather than continuing to experiment on production endpoints.

Simulated: Deploy a Remediation Action

Tanium Console — Action Status: Restart CRM Service

Status

Logs

Details

187

Succeeded

Failed

Pending

200

Total Targeted

Elapsed: 2m 14s — Package: Restart-CRM-Service.ps1 — Target: Customer Service Group

Before vs. After Remediation

Compare health scores and key metrics before and after deploying the CRM service restart:

Health Score Before

Health Score After

91%

Memory Before

62%

Memory After

Scenario: 200 Endpoints with High Memory Usage

Customer Service department: 200 endpoints with memory above 90% for two weeks. Health scores dropped 75 → 52. Users report application freezes. Root cause: CRM update introduced a memory leak (4 GB+ after 8 hours).

What is the best remediation strategy?

Add more RAM to all 200 endpoints. Deploy a Tanium action to restart the CRM process on a schedule (e.g., during lunch), while escalating to the vendor to fix the memory leak. Roll back the CRM update without notifying the application team. Tell users to restart their computers more frequently.

Correct: B. Combine an immediate workaround (scheduled CRM restart) with a long-term fix (vendor escalation). Adding RAM (A) delays the symptom, not the leak. Rolling back without coordination (C) bypasses change management. Telling users to reboot (D) is unacceptable for a known IT issue.

← Previous: Root Cause Analysis Next: Real-World Scenarios →

DEX Training

Remediation at Scale

Remediation Decision Tree

Types of Remediation

Common Remediation Actions

Staged Rollout Strategy

Phase 1: Pilot (10-20 endpoints)

Phase 2: Early Adopters (100-200 endpoints)

Phase 3: Full Deployment (remaining)

Deploying a Tanium Action

Select Package

Define Target

Set Schedule

Configure Options

Deploy & Monitor

Write the Script

Test Locally

Create Package

Add Metadata

Pilot Test

Promote to Prod

When Remediation Is Not Enough

Simulated: Deploy a Remediation Action

Before vs. After Remediation

Scenario: 200 Endpoints with High Memory Usage

Knowledge Check