Benchmark Datasets for Aggregation & Anomaly-Based Elastic Security Detections #186

Summary

To benchmark aggregation-based detections, threshold rules, and statistical/time-series anomaly detection in Elastic Security, we should evaluate against established open datasets across multiple domains:

  • 🌐 Network traffic
  • 🖥️ Host / system logs
  • 👤 User behavior (UEBA)
  • ☁️ Cloud audit logs
  • 📈 Time-series anomaly benchmarks

This issue proposes publicly available datasets suitable for:

  • 📊 Aggregation-based detection testing (threshold rules)
  • 📈 Statistical baselining and deviation detection
  • 📉 Time-series anomaly detection benchmarking
  • 🧪 Precision / recall evaluation using labeled data

🌐 Network Datasets


1️⃣ CSE-CIC-IDS2018

Type: Network traffic + labeled attack flows
Best For: Threshold detection, aggregation benchmarks, brute-force detection, exfiltration testing

🔗 https://www.unb.ca/cic/datasets/ids-2018.html


2️⃣ UNSW-NB15

Type: Network intrusion dataset

🔗 https://research.unsw.edu.au/projects/unsw-nb15-dataset


3️⃣ CTU-13 Botnet Dataset

Type: Botnet traffic captures

🔗 https://www.stratosphereips.org/datasets-ctu13


4️⃣ UGR'16 Dataset

Type: ISP-scale NetFlow dataset

🔗 https://nesg.ugr.es/nesg-ugr16/


☁️ Cloud / Audit Log Datasets

Cloud datasets are particularly useful for benchmarking:

  • API call frequency thresholds
  • Privilege escalation detection
  • Rare IAM activity
  • Geographic login anomalies
  • Cross-account access detection
  • Aggregation-based misuse detection

5️⃣ AWS Open Data Registry (CloudTrail & Related Logs)

Type: Public AWS datasets including CloudTrail-style audit logs

🔗 https://registry.opendata.aws/

Why Use It

  • Real-world cloud activity logs
  • API call records with timestamps and principals
  • Suitable for (see the query sketch below):
    • terms aggregation on userIdentity
    • API call count thresholds
    • Rare service usage detection
    • Geographic anomaly detection
    • Privilege escalation analysis
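
A minimal sketch of the first two bullets above, assuming the logs are already ECS-normalized and indexed under a hypothetical pattern logs-cloudtrail-*; the endpoint, index name, and threshold are all placeholders:

```python
from elasticsearch import Elasticsearch

# Hypothetical per-hour threshold for the "API call count" rule.
API_CALL_THRESHOLD = 1000

# terms aggregation keyed on the principal; min_doc_count drops buckets
# below the threshold, so only heavy hitters come back.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {
        "calls_per_principal": {
            "terms": {
                "field": "user.name",
                "size": 100,
                "min_doc_count": API_CALL_THRESHOLD,
            }
        }
    },
}

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
# body= is the 7.x-style call; newer clients accept the same keys as kwargs.
resp = es.search(index="logs-cloudtrail-*", body=query)
for bucket in resp["aggregations"]["calls_per_principal"]["buckets"]:
    print(f"{bucket['key']}: {bucket['doc_count']} calls in the last hour")
```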

6️⃣ Rhino Security Labs – CloudGoat (Cloud Attack Scenarios)

Type: Open-source cloud attack simulation environment

🔗 https://github.com/RhinoSecurityLabs/cloudgoat

Why Use It

  • Simulated AWS attack scenarios
  • Generates realistic CloudTrail logs
  • Good for:
    • IAM privilege escalation detection
    • Misconfigured policy detection
    • Cross-account access anomaly detection
    • Threshold rule testing in cloud environments

7️⃣ Azure AD / Microsoft Audit Log Samples

Type: Publicly available Azure AD / M365 audit log samples

Example reference:
🔗 https://learn.microsoft.com/en-us/azure/active-directory/reports-monitoring/

Why Use It

  • Authentication logs
  • Role assignment logs
  • API access logs
  • Useful for (see the aggregation sketch below):
    • Failed login aggregation rules
    • Rare role assignment detection
    • Impossible travel detection
    • Privilege grant spike detection
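
As a sketch, the failed-login bullet can be exercised with a filtered date_histogram plus a per-user terms sub-aggregation. Field names assume ECS-normalized sign-in logs; the index pattern logs-azuread-* is a placeholder:

```python
# Hourly failed-login counts per account over the last day, useful for
# tuning threshold rules before turning them on.
failed_logins = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"event.category": "authentication"}},
                {"term": {"event.outcome": "failure"}},
                {"range": {"@timestamp": {"gte": "now-24h"}}},
            ]
        }
    },
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},
            "aggs": {"per_user": {"terms": {"field": "user.name", "size": 50}}},
        }
    },
}
```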

🖥️ Host / User Behavior Datasets


8️⃣ CERT Insider Threat Dataset

Type: Insider threat simulation

🔗 https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508099


9️⃣ LANL Authentication Dataset

Type: Enterprise authentication logs

🔗 https://csr.lanl.gov/data/auth/


📈 Time-Series / Anomaly Detection Benchmarks


🔟 Numenta Anomaly Benchmark (NAB)

Type: Labeled real and synthetic time series with marked anomaly windows

🔗 https://github.com/numenta/NAB
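
A loading sketch, assuming a local clone of the repo; the CSV layout (timestamp,value columns) and labels/combined_windows.json follow NAB's documented structure, and the series name below is just one example file:

```python
import csv
import json
from datetime import datetime

SERIES = "realAWSCloudwatch/ec2_cpu_utilization_825cc2.csv"  # example series

# Each NAB data file is a two-column CSV: timestamp, value.
with open(f"NAB/data/{SERIES}") as f:
    rows = [(datetime.fromisoformat(r["timestamp"]), float(r["value"]))
            for r in csv.DictReader(f)]

# combined_windows.json maps each series path to its anomaly windows.
with open("NAB/labels/combined_windows.json") as f:
    windows = json.load(f)[SERIES]  # list of [start, end] timestamp pairs

def in_anomaly_window(ts: datetime) -> bool:
    """True if a timestamp falls inside any labeled anomaly window."""
    return any(datetime.fromisoformat(s) <= ts <= datetime.fromisoformat(e)
               for s, e in windows)
```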


Proposed Elastic Benchmark Plan

Step 1: Normalize to ECS

Map each dataset's fields to the Elastic Common Schema (ECS); a mapping sketch follows the list:

  • @timestamp
  • source.ip
  • destination.ip
  • user.name
  • event.action
  • event.category
  • cloud.account.id
  • cloud.provider
  • network.bytes
  • process.name
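
For CloudTrail-style input, the mapping could look like the sketch below. The left-hand keys are the ECS fields above; the right-hand paths follow CloudTrail's published JSON field names, and anything dataset-specific should be treated as a template:

```python
# Sketch: one raw CloudTrail record -> flat dict of ECS fields.
def cloudtrail_to_ecs(raw: dict) -> dict:
    identity = raw.get("userIdentity") or {}
    return {
        "@timestamp": raw.get("eventTime"),
        "source.ip": raw.get("sourceIPAddress"),
        "user.name": identity.get("userName") or identity.get("arn"),
        "event.action": raw.get("eventName"),
        "event.category": "api",  # assumption: coarse bucket for audit APIs
        "cloud.account.id": raw.get("recipientAccountId"),
        "cloud.provider": "aws",
    }
```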

Step 2: Detection Categories

Aggregation-Based Rules

  • API call count > threshold (CloudTrail)
  • Failed login count > threshold
  • Outbound byte sum > threshold (sketched below)
  • Rare IAM role usage
  • Rare process parent-child relationships
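
The byte-sum bullet needs a metric aggregation rather than a document count. A sketch using a sum sub-aggregation plus a bucket_selector pipeline aggregation to keep only sources over a placeholder 100 MB threshold:

```python
BYTES_THRESHOLD = 100 * 1024 * 1024  # placeholder: 100 MB per hour

exfil_query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {
        "per_source": {
            "terms": {"field": "source.ip", "size": 1000},
            "aggs": {
                "total_bytes": {"sum": {"field": "network.bytes"}},
                # Pipeline agg: discard buckets below the threshold.
                "over_threshold": {
                    "bucket_selector": {
                        "buckets_path": {"bytes": "total_bytes"},
                        "script": f"params.bytes > {BYTES_THRESHOLD}",
                    }
                },
            },
        }
    },
}
```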

Statistical / Time-Series Detection

  • Volume spikes vs baseline (see the z-score sketch below)
  • Cardinality deviation
  • Hourly/weekly seasonal anomalies
  • Population analysis (user vs peer group)
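
As a reference point against Elastic ML, a plain trailing-window z-score over hourly counts is enough to exercise the volume-spike case; counts_by_hour would come from a date_histogram like the ones above, and the window and z values are illustrative:

```python
from statistics import mean, stdev

def flag_spikes(counts_by_hour, window=168, z=3.0):
    """Yield (hour_index, count) pairs that sit more than z standard
    deviations above the trailing `window`-hour mean (168 h = one week,
    which also absorbs weekly seasonality)."""
    for i in range(window, len(counts_by_hour)):
        baseline = counts_by_hour[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and counts_by_hour[i] > mu + z * sigma:
            yield i, counts_by_hour[i]
```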

Step 3: Evaluation Metrics

  • Precision
  • Recall
  • False positive rate
  • Detection latency
  • Threshold sensitivity sweep
  • Anomaly score distribution
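
A sketch of the scoring side, given aligned per-event ground-truth labels and alert flags; a threshold sensitivity sweep is then just this function evaluated across a range of rule thresholds:

```python
def evaluate(labels, alerts):
    """Confusion-matrix metrics for aligned boolean label/alert sequences."""
    tp = sum(l and a for l, a in zip(labels, alerts))
    fp = sum(a and not l for l, a in zip(labels, alerts))
    fn = sum(l and not a for l, a in zip(labels, alerts))
    tn = sum(not l and not a for l, a in zip(labels, alerts))
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```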

Goal

Create a reproducible benchmark framework for evaluating:

  • Elasticsearch aggregation performance
  • Elastic Security threshold rules
  • EQL-based detection logic
  • Elastic ML anomaly detection
  • Cloud-specific detection engineering

Please comment if additional cloud providers (GCP, OCI, etc.) or SaaS audit logs should be included.
