MMNA
Security Academy 2026
πŸ“š MODULE 1 OF 3
πŸ›οΈ FOUNDATION MODULE

Splunk Architecture & Log Ingestion

Understanding the Splunk Data Pipeline

Master the foundation of Splunk architecture. Learn how Forwarders collect logs, Indexers process and store data, and Search Heads query enterprise-scale security events. Understand the complete data flow in modern SOC environments. Build expertise in log ingestion, data normalization, and centralized log management.

Splunk Architecture Overview

Components & Their Roles in Enterprise SOC

πŸ“Š Splunk Data Flow Pipeline
DATA SOURCES
Firewalls, IDS/IPS, servers, endpoints, cloud services, applications, proxies generating security events
↓
FORWARDERS
Collect logs at the source and route them to Indexers. Lightweight collectors deployed at data source points
↓
INDEXERS
Process incoming data, parse events, extract fields, store indexed data. Central processing & storage nodes
↓
SEARCH HEADS
Query indexed data, create searches, build dashboards, generate alerts. User-facing analytics platform
πŸ“€
Forwarders (Data Collection)
Lightweight agents deployed on data sources. Forwarders read logs and send them to Indexers. Universal Forwarders ship data with minimal processing, while Heavy Forwarders can parse, filter, and route data before forwarding. Together they enable scalable, distributed log collection.
βš™οΈ
Indexers (Processing & Storage)
Process incoming data from Forwarders: parse events, extract fields, and build an index for fast search. Indexed data is stored on disk, organized by time. Indexers are the backbone of Splunk, handling ingestion, processing, and storage, and can be clustered for redundancy and scale.
πŸ”
Search Heads (Analytics)
Query indexed data from Indexers. Run searches in real time or on a schedule, create dashboards for operational visibility, and generate alerts when conditions are met. Search Heads are the user-facing interface where analysts perform threat hunting and build security analytics.
πŸ’‘ Architecture Principle: Splunk scales by distributing components. Forwarders distribute collection, Indexers distribute processing/storage, Search Heads distribute analytics. This architecture enables ingesting terabytes daily across geographically distributed organizations.
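The three-tier flow above can be sketched as a chain of functions. This is a conceptual illustration only; the function names and event shapes are assumptions for teaching, not Splunk internals:

```python
# Conceptual sketch of the Forwarder -> Indexer -> Search Head pipeline.
# All names and data shapes here are illustrative, not Splunk internals.

def forward(raw_lines):
    """Forwarder: collect raw log lines and ship them onward."""
    return [{"raw": line} for line in raw_lines]

def index(events):
    """Indexer: parse each event and extract searchable fields."""
    for event in events:
        action, src_ip = event["raw"].split()
        event["action"] = action
        event["src_ip"] = src_ip
    return events

def search(indexed, **criteria):
    """Search Head: query indexed events by field values."""
    return [e for e in indexed
            if all(e.get(k) == v for k, v in criteria.items())]

indexed = index(forward(["BLOCK 10.0.0.5", "ALLOW 10.0.0.9"]))
hits = search(indexed, action="BLOCK")
```

Each stage can be scaled independently, which mirrors how real deployments add Forwarders, Indexers, or Search Heads as volume grows.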

Log Ingestion Fundamentals

Types of Security Logs & Data Normalization

πŸ” Types of Security Logs

Modern organizations generate diverse log types, each carrying security value:

  • Firewall Logs: Connection attempts, blocked traffic, rule matches. Essential for network perimeter security
  • IDS/IPS Logs: Intrusion detection/prevention events, attack signatures detected. Critical for threat detection
  • Antivirus/EDR Logs: Malware detections, endpoint events, process execution. Reveals endpoint compromise indicators
  • Server/OS Logs: Authentication events, process execution, configuration changes. Shows system activity and lateral movement
  • Application Logs: Login attempts, data access, transactions. Application-specific security events
  • Cloud Service Logs: AWS CloudTrail, Azure Activity Log, O365 audit logs. Security in cloud infrastructure
  • Proxy/Web Logs: URL access attempts, blocked categories, SSL inspection data. Web traffic security
  • DNS Logs: Domain lookups, DNS sinkhole hits. Reveals command & control communication attempts
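To make the diversity concrete, here is a minimal sketch of extracting structured fields from one raw log line. The key=value format shown is a hypothetical firewall log layout, assumed for illustration:

```python
import re

# Hypothetical firewall log line; the format is an assumption for illustration.
raw = "2024-05-01T12:00:00Z action=blocked src=198.51.100.7 dst=10.0.0.12 dport=445"

# Extract each key=value pair into a dict of structured, searchable fields.
pattern = re.compile(r"(\w+)=(\S+)")
fields = dict(pattern.findall(raw))
```

An IDS or antivirus log would need a different pattern for its own layout, which is exactly why the normalization step described next matters.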

πŸ“‹ Data Normalization Awareness

Raw logs from different sources use different formats: firewall logs are structured differently from IDS logs, which in turn differ from antivirus logs. Normalization is therefore critical for security analytics.

Normalization Process:

  • Parse: Extract structured fields from raw log text. Identify source IP, destination IP, port, action, etc.
  • Standardize: Map disparate field names to common names. Firewall "src_ip", IDS "source_ip", and server "remote_host" all normalize to "src_ip"
  • Enrich: Add context. Source IP β†’ geolocation, threat intelligence lookups, internal/external classification
  • Store: Save parsed, standardized, enriched data in indexed format for fast searching

Why Normalization Matters: Once data is normalized, analysts can write a single search across all log sources. A query for "src_ip=1.2.3.4" searches firewall, IDS, antivirus, and server logs simultaneously, enabling comprehensive threat hunting.
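The standardize step can be sketched as a simple field-name mapping. The mapping table and field names below are illustrative assumptions, not a real Splunk configuration:

```python
# Map source-specific field names onto one common schema so a single
# search covers every log source. Field names here are illustrative.
FIELD_MAP = {
    "firewall": {"src_ip": "src_ip"},
    "ids":      {"source_ip": "src_ip"},
    "server":   {"remote_host": "src_ip"},
}

def normalize(source_type, event):
    """Rename an event's fields according to its source's mapping."""
    mapping = FIELD_MAP[source_type]
    return {mapping.get(k, k): v for k, v in event.items()}

events = [
    normalize("firewall", {"src_ip": "1.2.3.4", "action": "block"}),
    normalize("ids",      {"source_ip": "1.2.3.4", "sig": "scan"}),
    normalize("server",   {"remote_host": "1.2.3.4", "user": "bob"}),
]

# One query now matches all three sources at once.
hits = [e for e in events if e["src_ip"] == "1.2.3.4"]
```

After normalization, one condition matches all three sources, which is the property the paragraph above describes.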

⏰ Importance of Timestamp Accuracy

Timestamps are critical for threat investigation. When investigating an incident, the timeline is crucial: what happened first? What followed? Inaccurate timestamps break timeline analysis.

Timestamp Challenges:

  • Time Zone Mismatch: If a system logs in a different time zone, Splunk can misinterpret event times. Normalize all timestamps to UTC
  • Clock Skew: If a source system's clock is inaccurate, events appear out of order or duplicated
  • Batch Delays: Some logs arrive in batches hours later. The timestamp should reflect when the event occurred, not when it was logged
  • Parsing Errors: If Splunk cannot extract a timestamp, it falls back to a less reliable time such as the time of ingestion
🎯 Best Practice: Configure Splunk with accurate timestamp extraction and UTC normalization. Validate timestamps during log ingestion. During incident investigation, accurate timeline reconstruction can be the difference between detecting an attack and missing it.
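UTC normalization is straightforward to sketch with the standard library. The local offset (UTC+5:30) and the log's time format are assumptions for the example:

```python
from datetime import datetime, timezone, timedelta

# A raw timestamp from a source system assumed to log in local time (UTC+5:30).
local_tz = timezone(timedelta(hours=5, minutes=30))
raw = "2024-05-01 14:30:00"

# Parse the naive timestamp, attach its source time zone, then convert to UTC.
local_dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=local_tz)
utc_dt = local_dt.astimezone(timezone.utc)
```

Storing every event in UTC means events from systems in different time zones sort into one consistent timeline.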

Enterprise Use Cases

Splunk Architecture in Action

🏒 Centralized Logging at Enterprise Scale

Fortune 500 company with 100,000+ endpoints, 500+ servers, cloud infrastructure, network appliances. Generates 10+ terabytes of logs daily.

Without Splunk: Logs are scattered across systems, and the security team has no unified view. Threat investigation requires manually accessing firewalls, servers, cloud consoles, and antivirus dashboards; even a simple incident takes hours or days to investigate.

With Splunk Architecture: Forwarders deployed globally collect all logs β†’ centralized to Splunk Indexers β†’ normalized, enriched, indexed. A single search queries 10 TB of logs in seconds, and an analyst investigates an incident in minutes: 10x faster incident response.

πŸ” Incident Investigation Workflows

Real incident scenario: potential data exfiltration detected. The team needs to reconstruct the attacker's actions.

Investigation Timeline:

  • Step 1 - Initial Detection: Alert triggered: suspicious file download detected on endpoint. EDR shows PID, filename, hash
  • Step 2 - Endpoint Investigation: Search antivirus/EDR logs for the file's hash across all endpoints. Reveals 50 compromised endpoints
  • Step 3 - Network Investigation: Search firewall/proxy logs for communication from compromised endpoints. Identifies C2 domain
  • Step 4 - Account Investigation: Search authentication logs for logins from affected endpoints. Identifies compromised user accounts
  • Step 5 - Data Access Investigation: Search application logs for data accessed by compromised accounts. Identifies exfiltrated data scope
  • Step 6 - Timeline Reconstruction: Correlate all logs by timestamp. Build complete attack narrative from compromise to exfiltration

Result: Within 30 minutes, the incident scope is fully understood, containment actions are executed, and the breach is contained. Without Splunk, this investigation would take days.
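Step 6, timeline reconstruction, amounts to merging events from every source and sorting by timestamp. The events, sources, and timestamps below are invented for illustration:

```python
# Merge events from several log sources and sort by timestamp to rebuild
# the attack narrative. All events here are invented for illustration.
edr  = [("2024-05-01T09:02:00Z", "edr", "suspicious file download")]
fw   = [("2024-05-01T09:05:00Z", "firewall", "connection to C2 domain")]
auth = [("2024-05-01T09:01:00Z", "auth", "login from compromised host")]

# ISO-8601 timestamps sort chronologically as plain strings.
timeline = sorted(edr + fw + auth)

for ts, source, msg in timeline:
    print(f"{ts} [{source}] {msg}")
```

The sorted output reads as the attack narrative: the suspicious login precedes the file download, which precedes the C2 connection.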

🎯 Key Takeaway: Splunk architecture enables fast, comprehensive incident investigation. Centralized, normalized logging combined with powerful search capabilities transform incident response from reactive/slow to proactive/fast.

External Learning Resources

Official Splunk Documentation & References

πŸ“š Official Splunk Documentation

πŸŽ“ Splunk Security Certifications

After completing this course, consider official Splunk certifications:

πŸŽ“
Verified Certificate Notice
Complete all 3 modules of this course to unlock your Verified Cyber Security Certificate from MONEY MITRA NETWORK ACADEMY

Unique Certificate ID β€’ QR Verification β€’ Digital Credential

Ready for the Next Module?

Continue Your Splunk Mastery Journey

Module 2: Search Processing Language & Data Analysis
Advance your SPL expertise and threat hunting skills