Module 2 — Lesson 3 of 8

Data Collection

How Investigate gathers real-time and historical data from endpoints — processes, network, files, performance events, and more.

Data collection is the foundation of every investigation. Investigate uses two complementary streams -- real-time queries from live endpoints and historical data from TDS -- to give you both the current state and the timeline leading up to the issue.

Data Flow: Live vs. Cached

[Data flow diagram] Live endpoints stream real-time data (processes, connections, files, registry, events) through the Tanium Client. The TDS cache stores historical snapshots with 30-90 day retention and serves as the fallback for offline endpoints. Investigate enriches and correlates both streams to power the investigation: timelines, process trees, comparison views, resource charts, the evidence chain, and remediation actions.
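The live-vs.-cached decision above can be sketched in a few lines. This is a hypothetical illustration, not Tanium's actual API: the `live_sources` and `tds_cache` structures, the `get_endpoint_data` function, and the 90-day constant are all assumptions made for the example.

```python
from datetime import datetime, timedelta

TDS_RETENTION = timedelta(days=90)  # assumed upper bound of the 30-90 day retention window

def get_endpoint_data(endpoint, query, live_sources, tds_cache):
    """Prefer a live query; fall back to the TDS cache for offline endpoints."""
    if endpoint in live_sources:                      # endpoint is online: query it directly
        return {"source": "live", "data": live_sources[endpoint][query]}
    cached = tds_cache.get((endpoint, query))         # offline: use the last snapshot
    if cached and datetime.now() - cached["ts"] <= TDS_RETENTION:
        return {"source": "tds_cache", "data": cached["data"]}
    return {"source": "none", "data": None}           # nothing within the retention window

# Example data (invented for illustration)
live = {"LAPTOP-01": {"processes": ["msedge.exe"]}}
cache = {("LAPTOP-02", "processes"):
         {"ts": datetime.now() - timedelta(days=5), "data": ["OUTLOOK.EXE"]}}
```

With this setup, querying `LAPTOP-01` returns live data, while `LAPTOP-02` (offline) falls back to its cached snapshot.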

6 Data Types Collected

⚙ Processes -- PID, parent, command line, CPU, memory, path
🌐 Network -- TCP/UDP connections, IPs, ports, owning process
📁 File System -- Existence, size, dates, hashes, permissions
📝 Registry -- Keys, values, config, startup entries (Windows)
🔒 Security -- Event logs, auth events, privilege changes
💻 Drivers -- Loaded drivers, versions, signing status, errors

Simulated: Process Data Collection

Tanium Investigate — Process List (Live Query)
PID    Process Name          CPU %   Memory     Status
4821   msedge.exe            12.3%   1,847 MB   Normal
6104   OUTLOOK.EXE            8.1%     623 MB   Normal
7392   CRMAgent.exe          67.4%   3,214 MB   High
1204   OneDrive.exe           3.2%     312 MB   Normal
 892   Teams.exe              5.7%     489 MB   Normal
2048   csfalconservice.exe    1.1%     156 MB   Normal

CRMAgent.exe immediately stands out: 67.4% CPU and 3.2 GB RAM -- a clear anomaly worth investigating.
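The triage step above can be expressed as a simple filter over the returned process list. The data structure and the thresholds (50% CPU, 3 GB RAM) are assumptions for illustration; real triage thresholds depend on your environment.

```python
# Snapshot of the live query above (values copied from the lesson's table)
processes = [
    {"pid": 4821, "name": "msedge.exe",          "cpu": 12.3, "mem_mb": 1847},
    {"pid": 6104, "name": "OUTLOOK.EXE",         "cpu": 8.1,  "mem_mb": 623},
    {"pid": 7392, "name": "CRMAgent.exe",        "cpu": 67.4, "mem_mb": 3214},
    {"pid": 1204, "name": "OneDrive.exe",        "cpu": 3.2,  "mem_mb": 312},
    {"pid": 892,  "name": "Teams.exe",           "cpu": 5.7,  "mem_mb": 489},
    {"pid": 2048, "name": "csfalconservice.exe", "cpu": 1.1,  "mem_mb": 156},
]

def flag_anomalies(procs, cpu_limit=50.0, mem_limit_mb=3072):
    """Flag any process exceeding assumed triage thresholds (50% CPU or 3 GB RAM)."""
    return [p["name"] for p in procs if p["cpu"] > cpu_limit or p["mem_mb"] > mem_limit_mb]

flag_anomalies(processes)  # → ['CRMAgent.exe']
```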

Data Enrichment

Raw data alone is not actionable. Investigate enriches it by combining sources:

Cross-Source Correlation

A process using 4 GB today that used only 500 MB yesterday = likely memory leak

Endpoint Context

Overlay hardware model, OS version, department -- spot patterns across groups

Process Lineage

Map parent-child relationships: who spawned the problem process?

Timeline Alignment

Software install at 2:30 PM + CPU spike at 2:32 PM = cause and effect
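Two of the enrichment patterns above -- the memory-leak comparison and timeline alignment -- reduce to short checks. These helper functions, their names, and the growth factor and time window are hypothetical; they sketch the reasoning, not Investigate's internal logic.

```python
from datetime import datetime

def looks_like_leak(yesterday_mb, today_mb, growth_factor=4.0):
    """Cross-source correlation: flag sustained growth, e.g. 500 MB -> 4 GB."""
    return today_mb >= yesterday_mb * growth_factor

def likely_cause(install_time, spike_time, window_minutes=5):
    """Timeline alignment: an event just before a spike suggests cause and effect."""
    delta_min = (spike_time - install_time).total_seconds() / 60
    return 0 <= delta_min <= window_minutes

# The lesson's examples: 500 MB -> 4 GB, and an install at 2:30 PM with a spike at 2:32 PM
looks_like_leak(500, 4096)                                                  # → True
likely_cause(datetime(2026, 3, 2, 14, 30), datetime(2026, 3, 2, 14, 32))   # → True
```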

File Download Capability

📄 App Logs -- Download application logs for detailed analysis
⚙ Config Files -- Compare against known-good configurations
💣 Crash Dumps -- Download minidumps for WinDbg analysis
🔎 Suspect Files -- Retrieve unknowns for hash comparison
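The hash-comparison step for suspect files can be done with the standard library once a file has been downloaded. This sketch assumes you maintain a set of known-good SHA-256 hashes; the function names are invented for the example.

```python
import hashlib

def sha256_of(path):
    """Hash a downloaded suspect file in chunks (dumps and logs can be large)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known_good(path, known_good_hashes):
    """Compare a retrieved file against an assumed reference set of trusted hashes."""
    return sha256_of(path) in known_good_hashes
```

A file whose hash appears in no trusted set warrants the deeper analysis described above.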
Security Note

File downloads are governed by RBAC and all actions are logged in the Tanium audit trail -- full accountability for who downloaded what, from where, and when.

Exercise: Match the Data Type to Its Source

For each data type, select whether it comes from a live query, cached data (TDS), or both.

A. Current list of running processes with real-time CPU usage

Correct! Real-time CPU usage per process requires a live query. Cached data can show a past process list but not current utilization.
Not quite. Current CPU usage per process is inherently real-time data -- it changes every second. A live query is required.

B. Installed software inventory (applications and versions)

Correct! Software inventory is available from both: live query when online, cached from TDS when offline.
Not quite. Software inventory can come from either source. Investigate prefers live data when the endpoint is online and falls back to the TDS cache when it is offline.

C. CPU utilization trend over the past 7 days

Correct! A 7-day trend can only come from historical data stored in TDS. A live query only shows the current moment.
Not quite. Historical trends require stored data. TDS records CPU samples over time to build timeline visualizations spanning days or weeks.

D. Hardware specifications (model, RAM, CPU type)

Correct! Hardware specs rarely change, so cached data is almost always accurate. Live confirms current state after upgrades.
Not quite. Hardware specs are available from both sources. Since hardware rarely changes, cached data is typically just as accurate.

E. Active network connections with remote IP addresses

Correct! Active network connections are ephemeral -- they change constantly. A live query is needed to see current connections.
Not quite. Active network connections are transient. Cached data quickly becomes stale and irrelevant for connections.

✍ Knowledge Check

1. What is the primary advantage of combining real-time and historical data in Investigate?

Correct! Combining live and historical data lets you compare "now" vs. "before" to identify what changed.
Not quite. The key benefit is comparison -- seeing current state alongside the historical baseline reveals what changed.

2. Why is process-level resource consumption data valuable?

Correct! Process-level data pinpoints which specific process is responsible, giving you an actionable remediation target.
Not quite. The value is specificity -- knowing which process was responsible for resource consumption gives you a clear target for remediation.
Mercury Insurance — Digital Workplace Team
DEX Specialization Training © 2026