Module 1 — Lesson 5 of 8

Alerting & Thresholds

Shift from reactive firefighting to proactive detection by configuring intelligent alerts and fine-tuning thresholds in Tanium Performance.
  • Severity Levels: 3
  • Common Threshold Metrics: 5
  • Target False Positive Rate: <10%
  • Alerts/Day per 5K Endpoints: <20

Alert Processing Flow

Metric Data (CPU, Disk, RAM...) → Threshold Check (>90% for 10 min?) → Alert Fires (severity assigned) → Notification (Email, SNOW, Slack) → Act (fix it)
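The flow above can be sketched end to end in a few lines. This is an illustrative model only (the function and field names are invented for this sketch; Tanium's alert engine is not exposed as Python):

```python
# Illustrative sketch of the alert processing flow:
# metric sample -> threshold + persistence check -> severity -> notification.

def check_threshold(value, threshold=90, minutes_sustained=0, persistence=10):
    """The condition must exceed the threshold for the full persistence window."""
    return value > threshold and minutes_sustained >= persistence

def notify(alert):
    # Stand-in for email / ServiceNow / Slack delivery
    print(f"[{alert['severity']}] {alert['metric']} at {alert['value']}%")

def process_sample(metric, value, minutes_sustained):
    """Run one metric sample through the pipeline; return the alert or None."""
    if not check_threshold(value, minutes_sustained=minutes_sustained):
        return None
    alert = {"metric": metric, "value": value, "severity": "Critical"}
    notify(alert)
    return alert

process_sample("CPU Utilization", 94, minutes_sustained=12)  # sustained: fires
process_sample("CPU Utilization", 92, minutes_sustained=2)   # brief spike: no alert
```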
Key Concept

Proactive alerting is the bridge between monitoring (knowing what is happening) and action (fixing it). Without alerts, dashboards are just pretty pictures nobody watches 24/7.

[Animated counters: alerts per day in an untuned environment vs. alerts per day after proper tuning, and the percentage reduction achieved by threshold tuning]

Alert Severity Levels

  • 🚨 Critical: Immediate action. User cannot work. Auto-page on-call + P2 SNOW incident.
  • ⚠️ Warning: Attention within hours. Degraded but functional. P3 SNOW incident + email.
  • 🛈 Info: No immediate action. Trend analysis & capacity planning. Dashboard only.

Common Threshold Configurations

Tanium Console — Performance > Alert Rules
| Rule Name | Metric | Condition | Persistence | Severity | Status |
| --- | --- | --- | --- | --- | --- |
| High CPU Sustained | CPU Utilization | > 90% | 10 minutes | Critical | ● Enabled |
| Low Disk Space | Disk Free % | < 10% | Immediate | Critical | ● Enabled |
| High Memory | Memory Utilization | > 95% | 5 minutes | Warning | ● Enabled |
| Slow Boot Time | Boot Duration | > 120 sec | Immediate | Warning | ● Enabled |
| Stale Reboot | System Uptime | > 30 days | Immediate | Info | ● Enabled |
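The table above maps naturally onto rules-as-data. The structure below is a teaching sketch, not Tanium's internal format (the field names are invented for illustration):

```python
import operator

# Illustrative encoding of the alert rules table as plain data.
ALERT_RULES = [
    {"name": "High CPU Sustained", "metric": "CPU Utilization",
     "op": ">", "threshold": 90, "persistence_min": 10, "severity": "Critical"},
    {"name": "Low Disk Space", "metric": "Disk Free %",
     "op": "<", "threshold": 10, "persistence_min": 0, "severity": "Critical"},
    {"name": "High Memory", "metric": "Memory Utilization",
     "op": ">", "threshold": 95, "persistence_min": 5, "severity": "Warning"},
    {"name": "Slow Boot Time", "metric": "Boot Duration",
     "op": ">", "threshold": 120, "persistence_min": 0, "severity": "Warning"},
    {"name": "Stale Reboot", "metric": "System Uptime",
     "op": ">", "threshold": 30, "persistence_min": 0, "severity": "Info"},
]

OPS = {">": operator.gt, "<": operator.lt}

def matches(rule, value):
    """True when a metric reading violates the rule's condition."""
    return OPS[rule["op"]](value, rule["threshold"])

# A 2% disk-free reading violates the Low Disk Space rule
assert matches(ALERT_RULES[1], 2)
```

Treating rules as data makes it easy to review, duplicate, and diff them, which pays off during the weekly volume reviews recommended later in this lesson.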

Persistence Windows

A persistence window is the duration a condition must remain true before the alert fires. Without it, a brief CPU spike from opening Excel triggers a Critical alert.

  • 92%, brief spike (2 min): no alert
  • 94%, sustained (12 min): alert fires
  • 35%, normal baseline: no alert
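The three cases above can be simulated with a simple consecutive-breach counter, assuming one metric sample per minute (a sketch of the concept, not Tanium's implementation):

```python
def sustained_breach(samples, threshold=90, persistence=10):
    """Return True if the metric stays above `threshold` for `persistence`
    consecutive samples (one sample per minute in this sketch)."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= persistence:
            return True
    return False

brief_spike = [35] * 5 + [92] * 2 + [35] * 5   # only 2 min above 90
sustained   = [35] * 3 + [94] * 12             # 12 min above 90
baseline    = [35] * 15                        # never breaches

assert not sustained_breach(brief_spike)  # no alert
assert sustained_breach(sustained)        # alert fires
assert not sustained_breach(baseline)     # no alert
```

Note that the counter resets to zero the moment the metric dips below the threshold, which is why the brief Excel-launch spike never fires.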

Notification Channels

Email Notifications

Simplest channel. Configure SMTP, add distribution groups. Best for Warning and Info alerts.

ServiceNow Integration

Auto-create incidents with severity-to-priority mapping, affected CI, and diagnostic details in work notes.
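The severity-to-priority mapping follows directly from the severity table earlier in this lesson (Critical → P2, Warning → P3, Info → dashboard only). A minimal sketch:

```python
# Mapping from alert severity to ServiceNow incident priority,
# per the severity table in this lesson. `None` means no incident is created.
SEVERITY_TO_SNOW = {
    "Critical": "P2",   # auto-page on-call + P2 SNOW incident
    "Warning":  "P3",   # P3 SNOW incident + email
    "Info":     None,   # dashboard only
}

def snow_priority(severity):
    """Return the ServiceNow priority for an alert severity, or None."""
    return SEVERITY_TO_SNOW.get(severity)
```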

Webhook / API

Send alert payloads to Slack, Teams, PagerDuty, or custom automation. Full alert context in JSON.
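A webhook payload carrying full alert context might be assembled as below. The field names are illustrative; the receiving system (Slack, Teams, PagerDuty) defines its own schema, so consult its documentation for the real format:

```python
import json

def build_alert_payload(hostname, metric, value, severity, duration):
    """Assemble an illustrative JSON payload for a webhook receiver."""
    return json.dumps({
        "hostname": hostname,
        "severity": severity,
        "metric": metric,
        "value": value,
        "duration": duration,
        "text": f"{severity}: {metric} at {value} on {hostname} for {duration}",
    })

payload = build_alert_payload("CAEI-778432", "CPU Utilization", "99%",
                              "CRITICAL", "45 min")

# Delivery is then a single HTTP POST, e.g. with the requests library:
#   requests.post(webhook_url, data=payload,
#                 headers={"Content-Type": "application/json"})
```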

Alert Fatigue: The Silent Killer

Warning Signs
  • Team members dismiss alerts without investigating
  • Alert email folders have thousands of unread messages
  • Critical incidents discovered by users, not alerts
  • Engineers auto-archive alert emails
  • New hires told "just ignore most of those"
Prevention

Start with 5-10 high-confidence rules. Review volume weekly. Require action for every alert — if the response is always "nothing to do," eliminate or downgrade it. Use suppression windows during maintenance.
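A suppression window boils down to a time-range check: hold back any alert whose timestamp falls inside a scheduled maintenance window. A sketch of the idea (Tanium exposes this through the console, not Python; the window values are made up):

```python
from datetime import datetime

def suppressed(alert_time, windows):
    """True if the alert falls inside any maintenance window (start, end)."""
    return any(start <= alert_time <= end for start, end in windows)

# Hypothetical overnight patch window, 02:00-06:00
patch_window = [(datetime(2024, 6, 5, 2, 0), datetime(2024, 6, 5, 6, 0))]

assert suppressed(datetime(2024, 6, 5, 2, 30), patch_window)     # held back
assert not suppressed(datetime(2024, 6, 5, 9, 0), patch_window)  # delivered
```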

Simulated Alert Dashboard

Review the following alert dashboard. Identify which alerts need immediate action and which are noise.

Tanium Console — Active Alerts
| Hostname | Severity | Metric | Value | Duration | Timestamp |
| --- | --- | --- | --- | --- | --- |
| CAEI-445521 | CRITICAL | Disk Free % | 2% | 3 hours | 08:12 AM |
| CAEI-778432 | CRITICAL | CPU Utilization | 99% | 45 min | 10:30 AM |
| CAEI-332198 | WARNING | Memory | 89% | 12 min | 10:55 AM |
| CAEI-990015 | WARNING | Boot Time | 142 sec | N/A | 09:01 AM |
| CAEI-112847 | INFO | Uptime | 34 days | N/A | 07:00 AM |
| CAEI-556310 | INFO | Uptime | 21 days | N/A | 07:00 AM |

Scenario: Alert Storm After Patch Deployment

Wednesday morning. Your team deployed a cumulative Windows update to 2,000 endpoints last night. You arrive to find 347 Warning alerts and 28 Critical alerts for high CPU. Alerts started at 2:00 AM and are still coming in. ServiceNow has 375 auto-generated incidents.

What is the best first step?

Correct: C. Post-patch, it is common for TrustedInstaller and Windows Modules Installer to spike CPU temporarily. Verify the cause, suppress non-critical alerts for the maintenance window, and monitor for resolution within 1-2 hours. Rolling back (A) is premature, disabling alerts (B) leaves you blind, and bulk-closing tickets (D) destroys the audit trail.

Exercise: Configure an Alert Rule

Walk through the 9-step process to create an alert rule:

1. Navigate: Performance > Alerts in the left-hand menu
2. Create: Click "Create Alert Rule" to open the rule builder
3. Select Metric: Choose CPU, memory, disk, boot time, or network latency
4. Define Condition: Set operator (greater than, less than) and threshold value
5. Set Persistence: How long the condition must remain true (1 min – 24 hours)
6. Choose Severity: Critical, Warning, or Informational
7. Define Scope: All endpoints, a computer group, or an OS type
8. Notifications: Select channels: email, ServiceNow, webhook
9. Save & Enable: Rules can be enabled, disabled, or duplicated anytime

Knowledge Check

1. What is the primary purpose of a persistence window in an alert rule?

Correct: B. A persistence window requires the condition to remain true for a specified duration, filtering out momentary spikes and ensuring only sustained, actionable conditions trigger notifications.

2. Which of the following is a sign of alert fatigue in your team?

Correct: C. Auto-archiving alert emails is a clear sign the team has given up on processing them, meaning critical alerts are likely buried in the noise.

3. A Warning alert should typically result in which of the following actions?

Correct: B. Warning alerts indicate a degraded but functional state. They warrant a medium-priority ticket and attention within hours — not the immediate response of a Critical, nor the passive logging of an Informational.