MMNA
Security Academy 2026
📚 MODULE 2 OF 3
🔍 ANALYTICS MODULE

Search Processing Language (SPL) & Data Analysis

Master Splunk Queries & Security Analytics

Learn Splunk's query language to transform raw logs into actionable intelligence. Master data filtering, aggregation, and anomaly detection. Build searches for threat detection, incident investigation, and threat hunting. Understand search logic, piping commands, and statistical analysis. Create sophisticated security analytics queries that reveal threats hidden in massive datasets.

Introduction to SPL

Splunk's Powerful Query Language

🔤 What is Search Processing Language (SPL)?

SPL is Splunk's domain-specific query language. Like SQL for databases, SPL is the language for querying Splunk indexes. SPL enables security analysts to search, filter, extract, transform, and analyze events in vast log datasets. SPL is both simple for basic searches and sophisticated enough for complex analytics.

SPL Philosophy: "Start simple, then pipe." Every search starts with a basic query, then pipes results through commands that progressively refine data. Each pipe command takes output from the previous command, processes it, and passes results forward.

📌 Basic SPL Anatomy
A simple SPL search demonstrating core structure:
index=main sourcetype=firewall action=deny | stats count by dest_ip | where count>100
Breakdown:
index=main sourcetype=firewall: Search firewall logs
action=deny: Only denied connections
| stats count by dest_ip: Count denies per destination IP
| where count>100: Filter results: only IPs with 100+ denies

Result: Reveals potential attackers attempting mass scans

💡 Query Logic Awareness (High-Level)

Understanding SPL query logic prevents common mistakes and helps build powerful searches:

  • Search vs. Command Phase: Everything before the first pipe (|) is the search phase, which filters events from indexes. Everything after the first pipe is the command phase, which processes the matching events
  • Piping Flow: Output from one command becomes the input to the next. If command 1 returns 1,000 events, command 2 processes those 1,000 events
  • Order Matters: index=main sourcetype=firewall | stats count filters in the search phase and then aggregates, which is efficient. index=main | search sourcetype=firewall | stats count retrieves every event in the index first and filters in the command phase, which is inefficient
  • Performance Principle: Filter early, process small datasets. Put the most restrictive filters first in the search phase to reduce events before commands run
  • Field Requirements: Commands that reference fields require those fields to exist in events. If a field is not extracted or parsed, the command returns no results
💡 Key Insight: Efficient SPL thinking means filter aggressively in search phase to minimize events reaching command phase. A search returning 1 million events will be slow. A search returning 10,000 events will be fast, even with complex commands.
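A sketch of the filter-early principle (the index, sourcetype, and field names are illustrative, not from a specific environment):

```spl
Efficient: restrict in the search phase, then aggregate
index=main sourcetype=firewall action=deny dest_port=22 | stats count by src_ip

Inefficient: retrieve everything, then filter in the command phase
index=main | search sourcetype=firewall action=deny dest_port=22 | stats count by src_ip
```

Both searches return the same result, but the first lets the indexers discard non-matching events before any command runs, so the command phase only ever sees the denied SSH traffic.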

Data Analysis Concepts

Filtering, Aggregation & Anomaly Detection

🎯 Filtering & Aggregation Mindset

Security analysis fundamentally relies on filtering and aggregation. Filtering focuses on relevant data. Aggregation reveals patterns.

Filtering: Remove noise. In 10TB of logs, 90% is normal traffic. Filtering removes normal traffic, leaving 1TB of interesting events. SPL filtering:

  • Search Phase Filters: index=main sourcetype=firewall action=deny—filters at index level before processing
  • Command Filters: search action=blocked | where response_code=401—filters after extraction
  • Conditions: status>=400, bytes>1000000, protocol!=dns—complex conditions narrow results

Aggregation: Find patterns. After filtering to relevant events, aggregate to see patterns:

  • Count: How many login failures? stats count by username—reveals brute force attempts
  • Sum: Total data transferred? stats sum(bytes) by src_ip—reveals data exfiltration
  • Average: Typical response time? stats avg(response_time) by endpoint—identifies slow systems
  • Min/Max: Timing boundaries? stats min(_time), max(_time) by session—reveals session duration
📊 Filtering + Aggregation Example
Find users with suspicious login patterns:
index=main sourcetype=authentication status=failed | stats count by user, src_ip | where count>20 | sort - count
What happens:
✓ Filter: Failed authentication events only
✓ Aggregate: Count failures per user + source IP combination
✓ Condition: Keep users with 20+ failures (suspicious)
✓ Sort: Highest failure counts first

Result: Reveals potential brute force attacks by showing users/IPs with excessive failed logins

🔍 Identifying Anomalies in Logs (Conceptual)

Anomalies reveal threats. Threats deviate from normal behavior. SPL enables anomaly detection:

  • Volume Anomalies: Unexpected spike in events. Normal: 100 login attempts/hour. Anomaly: 10,000 attempts/hour suggests brute force
  • Behavior Anomalies: User behavior changed. Normal: User logs in 9-5 EST. Anomaly: Login attempt 3am JST (unusual time/location) suggests compromise
  • Protocol Anomalies: Unusual protocol usage. Normal: DNS port 53. Anomaly: DNS port 443 (likely data exfiltration)
  • Geographic Anomalies: Impossible travel. User in NYC at 1pm, London at 2pm (physically impossible) suggests account compromise
  • Pattern Anomalies: Deviation from normal patterns. Normal: File access patterns. Anomaly: User accessing files they never accessed before suggests lateral movement

Conceptual Approach: Establish baseline (normal behavior), then detect deviations. SPL enables this through historical comparisons:

index=main earliest=-7d@d latest=now | bin _time span=1h | stats count by _time | eventstats avg(count) as baseline, stdev(count) as sd | where count > baseline + 2*sd
This search buckets seven days of events into hourly counts, computes the average and standard deviation across those hours, and keeps only the hours whose counts exceed the baseline by more than two standard deviations. Those hours are the anomalies.
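Baselines are often more useful when computed per hour of day, so that quiet overnight hours are not compared against daytime peaks. A hedged sketch of that refinement (the sourcetype and span are illustrative assumptions):

```spl
index=main sourcetype=authentication earliest=-7d@d latest=now
| bin _time span=1h
| stats count by _time
| eval hour=strftime(_time, "%H")
| eventstats avg(count) as baseline, stdev(count) as sd by hour
| where count > baseline + 2*sd
```

With the by hour clause, a 3am spike is judged against other 3am hours from the past week, not against the much larger 2pm baseline.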
💡 Anomaly Detection Principle: Normal activity creates baseline. Threats create deviations. By comparing current activity to baseline, SPL enables automated anomaly detection that reveals threats hiding in noise.

Threat Detection with SPL

Behavioral Analysis & Alert Refinement

🎯 Behavior-Based Search Awareness

Signature-based detection (looking for known malware hashes) is reactive. Behavior-based detection (looking for suspicious actions) is proactive. SPL enables behavior-based threat detection:

Examples of Behavioral Detection:

  • Privilege Escalation: Detect accounts gaining admin privileges unexpectedly. Search for Account Operators group additions, sudoers modifications, UAC elevation on suspicious accounts
  • Lateral Movement: Detect attackers moving across network. Search for connections to unusual ports/protocols, failed authentication attempts spanning multiple systems
  • Data Exfiltration: Detect data leaving network. Search for large outbound data transfers to external IPs, DNS queries to malicious domains, unusual protocol usage (DNS for data transfer)
  • Command & Control Communication: Detect C2 callbacks. Search for connections to known malicious IPs, regular beaconing patterns, encrypted traffic on suspicious ports
  • Living Off The Land: Detect abuse of legitimate tools (PowerShell, cmd.exe, PsExec). Search for execution patterns, process chains, parent/child relationships indicating abuse
🔍 Behavioral Detection: Privilege Escalation
Detect suspicious account privilege elevation:
index=main sourcetype=security event_id=4732 NOT account_name IN (admin, svc_account, deploy_bot) | stats count by account_name, group, source_account | where count>2
What this detects:
✓ Event 4732 = a member was added to a security-enabled local group
✓ Exclude service accounts (normal)
✓ Track: who was added to what group, by whom
✓ Alert on >2 additions (suspicious behavior)

Threat Indication: Attacker escalating privileges, adding compromised account to admin groups
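Data exfiltration, listed above, can be sketched the same way. This hedged example flags internal hosts sending large volumes to any single external IP (the index, sourcetype, field names, and 1 GB threshold are illustrative assumptions, not from a specific environment):

```spl
index=main sourcetype=firewall direction=outbound NOT (dest_ip=10.0.0.0/8 OR dest_ip=172.16.0.0/12 OR dest_ip=192.168.0.0/16)
| stats sum(bytes) as total_bytes by src_ip, dest_ip
| where total_bytes > 1073741824
| sort - total_bytes
```

Excluding the private address ranges keeps internal backups and replication out of the results, so what remains is data actually leaving the network.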

⚠️ Alert Refinement Concepts

Raw alerts generate noise. A firewall generating 1000 alerts/day with 990 false positives wastes analyst time. Alert refinement reduces false positives:

  • Baseline Tuning: Adjust thresholds based on environment. Alert on 100+ failed logins (catches brute force), but baseline for support team (runs password resets) might be 500+. Different rules for different contexts
  • Whitelist Exclusions: Legitimate activity triggering false alerts. Exclude known vulnerability scanners, backup systems, maintenance scripts, known software behaviors
  • Context Filtering: Consider environment. Alert on outbound port 445 (SMB) in secure network, but not in guest network. Alert on admin logins at 3am, but not during maintenance windows
  • Correlation Rules: Reduce false positives through correlation. Single failed login = normal. 100 failed logins from same IP = alert. Correlated events reduce noise
  • Time-Based Rules: Different rules for different times. Alerts during business hours should be aggressive (catch intrusions fast). After-hours rules might be less aggressive (fewer users, less activity)
⚡ Alert Refinement: Reducing False Positives
Alert on failed logins, but reduce false positives:
index=main sourcetype=authentication status=failed NOT user IN (guest, kiosk_user, test_account) NOT (src_ip=10.0.0.0/8 OR src_ip=192.168.0.0/16) | stats count by user, src_ip | where count>50 AND count<1000
Refinements applied:
✓ Exclude known test accounts (reduce noise)
✓ Exclude internal IP ranges (internal failures are normal)
✓ Threshold: 50+ failures (catches attacks)
✓ Cap: <1000 (prevents alerts from obviously noisy sources)

Result: Alert only on suspicious patterns—external attackers, not internal chatter
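The correlation principle above (a single failure is normal, a cluster is not) can also be sketched as a password-spray detector. One IP failing against many different accounts in a short window is a far stronger signal than any single failed login (the sourcetype, window, and thresholds are illustrative assumptions):

```spl
index=main sourcetype=authentication status=failed
| bin _time span=5m
| stats count as failures, dc(user) as distinct_users by src_ip, _time
| where failures>30 AND distinct_users>5
```

Requiring both high failure volume and many distinct usernames per source IP filters out the user who simply forgot a password.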
💡 Alert Refinement Truth: A perfect alert (catches all threats with zero false positives) is impossible. The goal is a 95%+ true positive rate and a false positive rate under 5%. Iteratively tune thresholds, whitelists, and rules to approach this goal.

Enterprise SOC Benefits

SPL Enables Faster, Better Security Operations

Faster Investigations
Manual investigation: access the firewall → check IDS → check antivirus → search servers = 2-3 hours. SPL: a single search queries all systems simultaneously = 5 minutes. Investigations run 20-30x faster, allowing rapid threat containment.
Reduced False Positives
Tuned SPL rules reduce noise. Instead of 1000 alerts/day with 99% false positive rate, SPL rules achieve 95%+ true positive rate. Analysts focus on real threats, not chasing false alerts. Alert fatigue decreases, team morale increases.
📊 Comprehensive Analytics
SPL enables analytics impossible with manual tools. Correlate 10TB+ logs across disparate systems. Find patterns humans can't spot. Behavioral analysis, anomaly detection, threat hunting—all enabled by SPL queries analyzing enterprise-scale data.
🎯 Proactive Threat Hunting
Reactive alerts catch obvious threats. SPL enables proactive threat hunting—searching for indicators of compromise that haven't triggered alerts yet. Sophisticated searches find advanced threats hiding in logs before they cause damage.
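One common hunting sketch looks for C2-style beaconing that no alert has fired on yet: many connections to the same destination at near-identical intervals. This hedged example (the index, sourcetype, and thresholds are illustrative assumptions) measures the gap between successive connections per source/destination pair:

```spl
index=main sourcetype=firewall direction=outbound
| streamstats current=f last(_time) as next_time by src_ip, dest_ip
| eval gap=next_time-_time
| stats count, avg(gap) as avg_gap, stdev(gap) as jitter by src_ip, dest_ip
| where count>20 AND jitter<5
```

Low jitter across 20+ connections suggests an automated process calling home on a timer rather than a human browsing.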

📈 Real-World Impact

Before SPL (Manual Investigation):

  • Incident reported
  • Analyst manually accesses 5+ systems
  • Hours spent correlating information
  • Threat often spreads during investigation
  • Containment delayed, damage extensive

With SPL (Automated Investigation):

  • Incident reported
  • Analyst runs single SPL search
  • Incident scope determined in minutes
  • Immediate identification of compromise breadth
  • Rapid containment, minimal damage

Business Impact: Reduced breach damage, faster recovery, lower incident costs. MTTR (Mean Time To Response) drops from hours to minutes. Security posture dramatically improves.

External Learning References

Official Splunk SPL Documentation


🎓 Splunk Learning Resources

Continue your learning after this module:

🎓 Verified Certificate Notice
Complete all 3 modules of this course to unlock your Verified Cyber Security Certificate from MONEY MITRA NETWORK ACADEMY

Unique Certificate ID • QR Verification • Digital Credential

Ready for the Final Module?

Complete Your Splunk Mastery Journey

Module 3: SIEM Dashboards, Alerts & SOC Reporting
Master operational dashboards, real-time monitoring, and enterprise alerting