Module 2 — Lesson 3 of 8

Data Collection

How Investigate gathers real-time and historical data from endpoints — processes, network, files, performance events, and more.

Data collection is the foundation of every investigation. Investigate uses two complementary streams -- real-time queries from live endpoints and historical data from TDS -- to give you both the current state and the timeline leading up to the issue.

Data Flow: Live vs. Cached

[Data flow diagram] Live endpoints stream real-time data (processes, connections, files, registry, events) through the Tanium Client. The TDS cache stores historical snapshots with 30-90 day retention and serves as the fallback for offline endpoints. Investigate enriches and correlates both streams to power the investigation: timelines, process trees, comparison views, resource charts, the evidence chain, and remediation actions.
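The live-vs.-cached decision above can be sketched in a few lines. This is a hypothetical illustration, not Tanium's actual API: the `live_sources` and `tds_cache` structures, the `get_endpoint_data` function, and the 90-day constant are all assumptions made for the example.

```python
from datetime import datetime, timedelta

TDS_RETENTION = timedelta(days=90)  # assumed upper bound of the 30-90 day retention window

def get_endpoint_data(endpoint, query, live_sources, tds_cache):
    """Prefer a live query; fall back to the TDS cache for offline endpoints."""
    if endpoint in live_sources:                      # endpoint is online: query it directly
        return {"source": "live", "data": live_sources[endpoint][query]}
    cached = tds_cache.get((endpoint, query))         # offline: use the last snapshot
    if cached and datetime.now() - cached["ts"] <= TDS_RETENTION:
        return {"source": "tds_cache", "data": cached["data"]}
    return {"source": "none", "data": None}           # nothing within the retention window

# Example data (invented for illustration)
live = {"LAPTOP-01": {"processes": ["msedge.exe"]}}
cache = {("LAPTOP-02", "processes"):
         {"ts": datetime.now() - timedelta(days=5), "data": ["OUTLOOK.EXE"]}}
```

With this setup, querying `LAPTOP-01` returns live data, while `LAPTOP-02` (offline) falls back to its cached snapshot.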

6 Data Types Collected

⚙ Processes -- PID, parent, command line, CPU, memory, path
🌐 Network -- TCP/UDP connections, IPs, ports, owning process
📁 File System -- Existence, size, dates, hashes, permissions
📝 Registry -- Keys, values, config, startup entries (Windows)
🔒 Security -- Event logs, auth events, privilege changes
💻 Drivers -- Loaded drivers, versions, signing status, errors

Simulated: Process Data Collection

Tanium Investigate — Process List (Live Query)
PID    Process Name          CPU %   Memory     Status
4821   msedge.exe            12.3%   1,847 MB   Normal
6104   OUTLOOK.EXE            8.1%     623 MB   Normal
7392   CRMAgent.exe          67.4%   3,214 MB   High
1204   OneDrive.exe           3.2%     312 MB   Normal
 892   Teams.exe              5.7%     489 MB   Normal
2048   csfalconservice.exe    1.1%     156 MB   Normal

CRMAgent.exe immediately stands out: 67.4% CPU and 3.2 GB RAM -- a clear anomaly worth investigating.
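The triage step above can be expressed as a simple filter over the returned process list. The data structure and the thresholds (50% CPU, 3 GB RAM) are assumptions for illustration; real triage thresholds depend on your environment.

```python
# Snapshot of the live query above (values copied from the lesson's table)
processes = [
    {"pid": 4821, "name": "msedge.exe",          "cpu": 12.3, "mem_mb": 1847},
    {"pid": 6104, "name": "OUTLOOK.EXE",         "cpu": 8.1,  "mem_mb": 623},
    {"pid": 7392, "name": "CRMAgent.exe",        "cpu": 67.4, "mem_mb": 3214},
    {"pid": 1204, "name": "OneDrive.exe",        "cpu": 3.2,  "mem_mb": 312},
    {"pid": 892,  "name": "Teams.exe",           "cpu": 5.7,  "mem_mb": 489},
    {"pid": 2048, "name": "csfalconservice.exe", "cpu": 1.1,  "mem_mb": 156},
]

def flag_anomalies(procs, cpu_limit=50.0, mem_limit_mb=3072):
    """Flag any process exceeding assumed triage thresholds (50% CPU or 3 GB RAM)."""
    return [p["name"] for p in procs if p["cpu"] > cpu_limit or p["mem_mb"] > mem_limit_mb]

flag_anomalies(processes)  # → ['CRMAgent.exe']
```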

Data Enrichment

Raw data alone is not actionable. Investigate enriches it by combining sources:

Cross-Source Correlation

A process using 4 GB today that used only 500 MB yesterday = likely memory leak

Endpoint Context

Overlay hardware model, OS version, department -- spot patterns across groups

Process Lineage

Map parent-child relationships: who spawned the problem process?

Timeline Alignment

Software install at 2:30 PM + CPU spike at 2:32 PM = cause and effect
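Two of the enrichment patterns above -- the memory-leak comparison and timeline alignment -- reduce to short checks. These helper functions, their names, and the growth factor and time window are hypothetical; they sketch the reasoning, not Investigate's internal logic.

```python
from datetime import datetime

def looks_like_leak(yesterday_mb, today_mb, growth_factor=4.0):
    """Cross-source correlation: flag sustained growth, e.g. 500 MB -> 4 GB."""
    return today_mb >= yesterday_mb * growth_factor

def likely_cause(install_time, spike_time, window_minutes=5):
    """Timeline alignment: an event just before a spike suggests cause and effect."""
    delta_min = (spike_time - install_time).total_seconds() / 60
    return 0 <= delta_min <= window_minutes

# The lesson's examples: 500 MB -> 4 GB, and an install at 2:30 PM with a spike at 2:32 PM
looks_like_leak(500, 4096)                                                  # → True
likely_cause(datetime(2026, 3, 2, 14, 30), datetime(2026, 3, 2, 14, 32))   # → True
```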

File Download Capability

📄 App Logs -- Download application logs for detailed analysis
⚙ Config Files -- Compare against known-good configurations
💣 Crash Dumps -- Download minidumps for WinDbg analysis
🔎 Suspect Files -- Retrieve unknowns for hash comparison
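The hash-comparison step for suspect files can be done with the standard library once a file has been downloaded. This sketch assumes you maintain a set of known-good SHA-256 hashes; the function names are invented for the example.

```python
import hashlib

def sha256_of(path):
    """Hash a downloaded suspect file in chunks (dumps and logs can be large)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known_good(path, known_good_hashes):
    """Compare a retrieved file against an assumed reference set of trusted hashes."""
    return sha256_of(path) in known_good_hashes
```

A file whose hash appears in no trusted set warrants the deeper analysis described above.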
Security Note

File downloads are governed by RBAC and all actions are logged in the Tanium audit trail -- full accountability for who downloaded what, from where, and when.

Exercise: Match the Data Type to Its Source

For each data type, select whether it comes from a live query, cached data (TDS), or both.

A. Current list of running processes with real-time CPU usage

Correct! Real-time CPU usage per process requires a live query. Cached data can show a past process list but not current utilization.
Not quite. Current CPU usage per process is inherently real-time data -- it changes every second. A live query is required.

B. Installed software inventory (applications and versions)

Correct! Software inventory is available from both: live query when online, cached from TDS when offline.
Not quite. Software inventory can come from either source. Investigate prefers live data when the endpoint is online and falls back to the TDS cache when it is offline.

C. CPU utilization trend over the past 7 days

Correct! A 7-day trend can only come from historical data stored in TDS. A live query only shows the current moment.
Not quite. Historical trends require stored data. TDS records CPU samples over time to build timeline visualizations spanning days or weeks.

D. Hardware specifications (model, RAM, CPU type)

Correct! Hardware specs rarely change, so cached data is almost always accurate. Live confirms current state after upgrades.
Not quite. Hardware specs are available from both sources. Since hardware rarely changes, cached data is typically just as accurate.

E. Active network connections with remote IP addresses

Correct! Active network connections are ephemeral -- they change constantly. A live query is needed to see current connections.
Not quite. Active network connections are transient. Cached data quickly becomes stale and irrelevant for connections.

✍ Knowledge Check

1. What is the primary advantage of combining real-time and historical data in Investigate?

Correct! Combining live and historical data lets you compare "now" vs. "before" to identify what changed.
Not quite. The key benefit is comparison -- seeing current state alongside the historical baseline reveals what changed.

2. Why is process-level resource consumption data valuable?

Correct! Process-level data pinpoints which specific process is responsible, giving you an actionable remediation target.
Not quite. The value is specificity -- knowing which process was responsible for resource consumption gives you a clear target for remediation.
Mercury Insurance — Digital Workplace Team
DEX Specialization Training © 2026