Splunk Architecture & Log Ingestion
Understanding the Splunk Data Pipeline
Master the foundation of Splunk architecture. Learn how Forwarders collect logs, Indexers process and store data, and Search Heads query enterprise-scale security events. Understand the complete data flow in modern SOC environments. Build expertise in log ingestion, data normalization, and centralized log management.
Splunk Architecture Overview
Components & Their Roles in Enterprise SOC
Log Ingestion Fundamentals
Types of Security Logs & Data Normalization
Types of Security Logs
Modern organizations generate diverse log types, each carrying security value:
- Firewall Logs: Connection attempts, blocked traffic, rule matches. Essential for network perimeter security
- IDS/IPS Logs: Intrusion detection/prevention events, attack signatures detected. Critical for threat detection
- Antivirus/EDR Logs: Malware detections, endpoint events, process execution. Reveals endpoint compromise indicators
- Server/OS Logs: Authentication events, process execution, configuration changes. Shows system activity and lateral movement
- Application Logs: Login attempts, data access, transactions. Application-specific security events
- Cloud Service Logs: AWS CloudTrail, Azure Activity Log, O365 audit logs. Security visibility into cloud infrastructure
- Proxy/Web Logs: URL access attempts, blocked categories, SSL inspection data. Web traffic security
- DNS Logs: Domain lookups, DNS sinkhole hits. Reveals command & control communication attempts
Data Normalization Awareness
Raw logs from different sources use different formats: firewall logs are structured differently from IDS logs, which differ again from antivirus logs. Normalization is therefore critical for security analytics.
Normalization Process:
- Parse: Extract structured fields from raw log text. Identify source IP, destination IP, port, action, etc.
- Standardize: Map disparate field names to common names. A firewall's "src_ip", an IDS's "source_ip", and a server's "remote_host" all normalize to a single "src_ip" field
- Enrich: Add context. Source IP → geolocation, threat intelligence lookups, internal/external classification
- Store: Save parsed, standardized, enriched data in indexed format for fast searching
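The parse → standardize → enrich steps above can be sketched in a few lines of Python. The field map, network prefixes, and sample log lines are illustrative assumptions, not Splunk's actual internal schema:

```python
# Minimal sketch of the parse -> standardize -> enrich pipeline.
# FIELD_MAP contents and log formats are assumptions for illustration.
import re

# Standardize: map source-specific field names onto a common schema
FIELD_MAP = {
    "source_ip": "src_ip",    # IDS convention
    "remote_host": "src_ip",  # server convention
    "src_ip": "src_ip",       # firewall convention
}

INTERNAL_NETS = ("10.", "192.168.")  # simplified RFC 1918 check


def parse(raw_log: str) -> dict:
    """Parse: extract key=value pairs from a raw log line."""
    return dict(re.findall(r"(\w+)=(\S+)", raw_log))


def normalize(event: dict) -> dict:
    """Standardize field names, then enrich with source-zone context."""
    out = {FIELD_MAP.get(k, k): v for k, v in event.items()}
    if "src_ip" in out:
        out["src_zone"] = (
            "internal" if out["src_ip"].startswith(INTERNAL_NETS) else "external"
        )
    return out


firewall = normalize(parse("action=blocked src_ip=10.0.0.5 dest_port=445"))
ids = normalize(parse("sig=ET_SCAN source_ip=203.0.113.9 severity=high"))

print(firewall)  # both events now share the "src_ip" field name
print(ids)
```

Once both events carry the same "src_ip" field, a single query can match either one, which is the whole point of the standardize step.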
Why Normalization Matters: Once data is normalized, analysts can write a single search across all log sources. A query for "src_ip=1.2.3.4" searches firewall, IDS, antivirus, and server logs simultaneously, enabling comprehensive threat hunting.
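In SPL, such a cross-source search might look like the sketch below. The index names are assumptions; in a real deployment they depend on how the Splunk admin has organized data:

```
index=firewall OR index=ids OR index=edr OR index=os_logs src_ip=1.2.3.4
| stats count BY index, sourcetype, action
```

One normalized field name lets a single `stats` summarize hits from every source at once.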
Importance of Timestamp Accuracy
Timestamps are critical for threat investigation. When investigating an incident, the timeline is crucial: what happened first? What followed? Inaccurate timestamps break timeline analysis.
Timestamp Challenges:
- Time Zone Mismatch: If a source system reports local time without timezone information, Splunk can misinterpret event times. Normalize all sources to UTC
- Clock Skew: If a source system's clock is inaccurate, events appear out of order or duplicated
- Batch Delays: Some logs arrive in batches hours later. The timestamp should reflect when the event occurred, not when it was logged
- Parsing Errors: If Splunk can't extract a timestamp, it falls back to the time of collection, which may be inaccurate
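Several of these challenges are addressed in Splunk's `props.conf` on the parsing tier. The stanza below is an illustrative sketch for a hypothetical sourcetype, showing the settings that control timestamp extraction and timezone handling:

```
# props.conf -- illustrative stanza; "acme:firewall" is a made-up sourcetype
[acme:firewall]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%z
MAX_TIMESTAMP_LOOKAHEAD = 25
TZ = UTC
```

An explicit `TIME_FORMAT` avoids parsing errors, and `TZ` pins sources that log local time without an offset, preventing the timezone-mismatch problem above.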
Enterprise Use Cases
Splunk Architecture in Action
Centralized Logging at Enterprise Scale
A Fortune 500 company with 100,000+ endpoints, 500+ servers, cloud infrastructure, and network appliances generates 10+ terabytes of logs daily.
Without Splunk: Logs are scattered across systems, and the security team has no unified view. Threat investigation requires manually accessing firewalls, servers, cloud consoles, and antivirus dashboards, taking hours or days for a simple incident.
With Splunk Architecture: Forwarders deployed globally collect all logs → centralized to Splunk Indexers → normalized, enriched, indexed. A single search queries 10 TB of logs in seconds, and an analyst investigates an incident in minutes. 10x faster incident response.
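The forwarder-to-indexer leg of this flow is configured in `outputs.conf` on each Universal Forwarder. A minimal sketch, with placeholder hostnames (port 9997 is the conventional Splunk receiving port):

```
# outputs.conf on a Universal Forwarder -- hostnames are placeholders
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
```

Listing multiple indexers lets the forwarder load-balance across them, so ingestion continues if one indexer goes down.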
Incident Investigation Workflows
Real incident scenario: potential data exfiltration detected. The team needs to reconstruct the attacker's actions.
Investigation Timeline:
- Step 1 - Initial Detection: Alert triggered: suspicious file download detected on endpoint. EDR shows PID, filename, hash
- Step 2 - Endpoint Investigation: Search antivirus/EDR logs for this filename and hash across all endpoints. Reveals 50 compromised endpoints
- Step 3 - Network Investigation: Search firewall/proxy logs for communication from compromised endpoints. Identifies C2 domain
- Step 4 - Account Investigation: Search authentication logs for logins from affected endpoints. Identifies compromised user accounts
- Step 5 - Data Access Investigation: Search application logs for data accessed by compromised accounts. Identifies exfiltrated data scope
- Step 6 - Timeline Reconstruction: Correlate all logs by timestamp. Build complete attack narrative from compromise to exfiltration
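A few of the steps above can be sketched as SPL searches. Index names, sourcetypes, and field names are assumptions for illustration, and `<hash>` stands for the hash from the initial alert:

```
Step 2 - endpoint scope: which hosts saw this file hash?
  index=edr file_hash=<hash>
  | stats dc(host) AS endpoints_affected values(host) AS hosts

Step 3 - network: where did the compromised hosts connect?
  index=proxy OR index=firewall
      [search index=edr file_hash=<hash> | fields host]
  | stats count BY dest_domain

Step 4 - accounts: who logged in from affected endpoints?
  index=os_logs sourcetype=auth
      [search index=edr file_hash=<hash> | fields host | rename host AS src_host]
  | stats values(user) BY src_host
```

The bracketed subsearches pass the list of affected hosts from the EDR data into the network and authentication searches, which is how one normalized field chains the investigation steps together.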
Result: Within 30 minutes, the incident scope is fully understood, containment actions are executed, and the breach is contained. Without Splunk, this investigation would take days.
External Learning Resources
Official Splunk Documentation & References
Official Splunk Documentation
- Splunk Admin Manual: Comprehensive guide to Splunk deployment, architecture, configuration, and administration
- Splunk Search Manual: Complete reference for search language (SPL) and analytics
- Indexing & Processing: Deep dive into Indexer architecture, data parsing, field extraction, and storage
- Data Forwarding Guide: Forwarder deployment, configuration, and best practices
- Data Inputs & Outputs: Configure data collection, parsing rules, and output destinations
Splunk Security Certifications
After completing this course, consider official Splunk certifications:
- Splunk Core Certified User: Foundational certification covering basic Splunk concepts and searching
- Splunk Enterprise Certified Admin: Advanced certification covering deployment, administration, and optimization
- Splunk Security Expert: Security-focused certification for threat detection and SOC operations
Ready for the Next Module?
Continue Your Splunk Mastery Journey
Module 2: Search Processing Language & Data Analysis
Advance your SPL expertise and threat hunting skills