Master detection engineering, threat hunting methodologies, behavioral analytics, and advanced investigation techniques for enterprise cloud security operations.
Understanding the two pillars of threat detection: signatures and behavioral analysis.
Detects known attacks by pattern matching. Example: "Block if we see malware hash X" or "Alert on a known ransomware file-extension pattern." Fast, reliable, with very few false positives. Problem: it only catches known threats; new malware goes undetected.
Detects suspicious behavior regardless of whether we've seen it before. Example: "Alert if user logs in from 2 different continents 10 minutes apart" or "Alert if we see 100 failed logins in 5 minutes." Catches zero-days but has false positives to tune.
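The contrast between the two approaches can be sketched in Python. The blocklist hash and the failed-login threshold below are illustrative placeholders, not real threat intelligence:

```python
import hashlib

# Hypothetical blocklist of known-bad file hashes (signature-based).
# The entry below is the SHA-256 of an empty file, used as a stand-in.
KNOWN_BAD_SHA256 = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def signature_match(file_bytes: bytes) -> bool:
    """Signature-based: exact hash lookup -- fast, but only known threats."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_SHA256

def behavioral_match(failed_logins_in_5_min: int, threshold: int = 100) -> bool:
    """Behavior-based: flags an unusual rate regardless of prior sightings."""
    return failed_logins_in_5_min >= threshold
```

A brand-new malware sample fails the hash lookup but can still trip the behavioral threshold, which is exactly the trade-off the table below summarizes.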
Cloud introduces detection complexity: massive log volume, legitimate user mobility (multi-continent access is normal), API-first architectures that invite API-based attacks, and cloud-native threat patterns (storage account enumeration, privilege escalation). Traditional on-prem rules don't transfer.
| Aspect | Signature-Based | Behavior-Based |
|---|---|---|
| Detection Speed | Instant (hash/pattern matching) | Delayed (requires baseline learning) |
| Known Threats | Excellent detection rate | Often detects (the behavior itself is suspicious) |
| Zero-Day Threats | No detection (no signature exists) | Can detect unusual behavior |
| False Positives | Very low (binary match/no match) | Higher (requires tuning to environment) |
| Tuning Required | Minimal (signatures from threat intel) | Significant (must understand baseline) |
| Cloud-Native Fit | Partial (doesn't handle user mobility) | Better (adapts to dynamic cloud behavior) |
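The "two continents, 10 minutes apart" behavioral rule mentioned earlier can be sketched as an impossible-travel check, assuming each login carries a timestamp and coordinates (the event shape and the speed threshold are illustrative assumptions):

```python
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def impossible_travel(login_a, login_b, max_speed_kmh=1000):
    """Flag two logins whose implied travel speed exceeds any airliner.

    Each login is a (datetime, latitude, longitude) tuple.
    """
    (t1, lat1, lon1), (t2, lat2, lon2) = sorted([login_a, login_b])
    hours = (t2 - t1).total_seconds() / 3600
    if hours == 0:
        return True  # simultaneous logins from two places
    return haversine_km(lat1, lon1, lat2, lon2) / hours > max_speed_kmh
```

Note the table's "Tuning Required" row applies here: frequent flyers and VPN egress points will trip this rule, so it needs the exclusion tuning discussed later.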
Hypothesis-driven investigation to find threats that automated detection missed.
How to design detection rules that catch threats without overwhelming analysts.
Good alerts are SPECIFIC. "Failed login" alerts are too vague (thousands daily, mostly noise). Better: "5+ failed logins followed by successful login from new IP within 1 hour" (credential attack pattern). Specific logic reduces false positives.
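The specific pattern above ("5+ failed logins followed by a successful login from a new IP within 1 hour") could be sketched like this; the event shape, names, and thresholds are assumptions for illustration, not a particular product's API:

```python
from datetime import datetime, timedelta

def credential_attack(events, known_ips, window=timedelta(hours=1), min_failures=5):
    """Detect >= min_failures failed logins followed by a success from a
    previously unseen IP, all inside the window (credential-attack pattern).

    events: list of (timestamp, outcome, source_ip) tuples sorted by time,
    where outcome is "failure" or "success".
    """
    for i, (ts, outcome, ip) in enumerate(events):
        if outcome != "success" or ip in known_ips:
            continue  # only a success from an unfamiliar IP is interesting
        failures = sum(
            1 for t, o, _ in events[:i]
            if o == "failure" and ts - t <= window
        )
        if failures >= min_failures:
            return True
    return False
```

A plain "any failed login" rule would fire on every event in the list; requiring the full sequence is what keeps the alert specific.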
Tune rules to YOUR environment. Users legitimately travel? Add exclusions for known travel IP ranges. Corporate VPN in use? Exclude its egress IPs. Service principals run automated jobs at night? Don't alert on expected off-hours access. Context = reduced false positives.
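These exclusions might be expressed as a simple suppression check; the IP ranges and principal names below are hypothetical placeholders you would replace with values from your own environment:

```python
from ipaddress import ip_address, ip_network

# Illustrative tuning lists -- populate from YOUR environment.
TRAVEL_IP_RANGES = [ip_network("203.0.113.0/24")]    # known travel/VPN egress
EXCLUDED_PRINCIPALS = {"svc-backup", "svc-etl"}      # service principals with night jobs

def should_alert(principal: str, source_ip: str) -> bool:
    """Suppress alerts that match documented, expected behavior."""
    if principal in EXCLUDED_PRINCIPALS:
        return False  # known automated job
    ip = ip_address(source_ip)
    if any(ip in net for net in TRAVEL_IP_RANGES):
        return False  # known travel or VPN egress range
    return True
```

Keeping the exclusions as data (rather than burying them in rule logic) makes the tuning auditable and easy to revisit.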
Monitor: How many incidents per week? What % are true positives? How long does each take to investigate? If a rule generates 100 incidents/week but only 2 are real, redesign it. If it misses obvious attacks, increase sensitivity. Iterate continuously.
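That iteration loop can be sketched as a rule-health check; the classification thresholds below are illustrative, not prescriptive:

```python
def rule_health(incidents_per_week: int, true_positives: int,
                avg_minutes_to_investigate: float) -> str:
    """Classify a detection rule from its weekly triage stats."""
    precision = true_positives / incidents_per_week if incidents_per_week else 0.0
    analyst_hours = incidents_per_week * avg_minutes_to_investigate / 60

    if precision < 0.05:
        return "redesign"   # e.g. 100 incidents/week with only 2 real
    if precision < 0.5 or analyst_hours > 20:
        return "tune"       # too noisy or too expensive to triage
    return "healthy"
```

Tracking these numbers per rule over time shows whether your tuning is actually moving precision up and analyst load down.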
Deepen your threat detection and hunting knowledge with official Microsoft documentation
You've mastered threat detection engineering and advanced hunting methodologies.