Summary
To benchmark aggregation-based detections, threshold rules, and statistical/time-series anomaly detection in Elastic Security, we should evaluate against established open datasets across multiple domains:
- 🌐 Network traffic
- 🖥️ Host / system logs
- 👤 User behavior (UEBA)
- ☁️ Cloud audit logs
- 📈 Time-series anomaly benchmarks
This issue proposes publicly available datasets suitable for:
- 📊 Aggregation-based detection testing (threshold rules)
- 📈 Statistical baselining and deviation detection
- 📉 Time-series anomaly detection benchmarking
- 🧪 Precision / recall evaluation using labeled data
🌐 Network Datasets
1️⃣ CSE-CIC-IDS2018
Type: Network traffic + labeled attack flows
Best For: Threshold detection, aggregation benchmarks, brute-force detection, exfiltration testing
🔗 https://www.unb.ca/cic/datasets/ids-2018.html
2️⃣ UNSW-NB15
Type: Network intrusion dataset
🔗 https://research.unsw.edu.au/projects/unsw-nb15-dataset
3️⃣ CTU-13 Botnet Dataset
Type: Botnet traffic captures
🔗 https://www.stratosphereips.org/datasets-ctu13
4️⃣ UGR'16 Dataset
Type: ISP-scale NetFlow dataset
🔗 https://nesg.ugr.es/nesg-ugr16/
☁️ Cloud / Audit Log Datasets
Cloud datasets are particularly useful for benchmarking:
- API call frequency thresholds
- Privilege escalation detection
- Rare IAM activity
- Geographic login anomalies
- Cross-account access detection
- Aggregation-based misuse detection
5️⃣ AWS Open Data Registry (CloudTrail & Related Logs)
Type: Public AWS datasets including CloudTrail-style audit logs
🔗 https://registry.opendata.aws/
Why Use It
- Real-world cloud activity logs
- API call records with timestamps and principals
- Suitable for:
  - `terms` aggregation on `userIdentity`
  - API call count thresholds
  - Rare service usage detection
  - Geographic anomaly detection
  - Privilege escalation analysis
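The `terms`-aggregation use case above can be sketched as an Elasticsearch query body. This is a minimal sketch, assuming the logs are already ECS-mapped with the principal in `user.name`; the field name, window, and threshold are illustrative, not taken from any shipped rule.

```python
# Minimal sketch (assumed ECS field names, illustrative threshold):
# count API calls per principal in the last hour, keeping only buckets
# whose doc count exceeds the threshold via a bucket_selector pipeline agg.
def api_call_count_query(window: str = "now-1h", min_calls: int = 100) -> dict:
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": window}}},
        "aggs": {
            "by_principal": {
                "terms": {"field": "user.name", "size": 50},
                "aggs": {
                    "high_volume": {
                        "bucket_selector": {
                            "buckets_path": {"calls": "_count"},
                            "script": f"params.calls > {min_calls}",
                        }
                    }
                },
            }
        },
    }

body = api_call_count_query(min_calls=500)
```

The same body can be pasted into a `POST <index>/_search` request or used as the starting point for a threshold rule's query.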
6️⃣ Rhino Security Labs – CloudGoat (Cloud Attack Scenarios)
Type: Open-source cloud attack simulation environment
🔗 https://github.com/RhinoSecurityLabs/cloudgoat
Why Use It
- Simulated AWS attack scenarios
- Generates realistic CloudTrail logs
- Good for:
- IAM privilege escalation detection
- Misconfigured policy detection
- Cross-account access anomaly detection
- Threshold rule testing in cloud environments
7️⃣ Azure AD / Microsoft Audit Log Samples
Type: Publicly available Azure AD / M365 audit log samples
Example reference:
🔗 https://learn.microsoft.com/en-us/azure/active-directory/reports-monitoring/
Why Use It
- Authentication logs
- Role assignment logs
- API access logs
- Useful for:
- Failed login aggregation rules
- Rare role assignment detection
- Impossible travel detection
- Privilege grant spike detection
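A failed-login aggregation rule of the kind listed above can be prototyped offline before wiring it into a detection rule. The sketch below assumes ECS field names (`event.category`, `event.outcome`, `user.name`); the events are toy data, not drawn from any of the datasets.

```python
from datetime import datetime, timedelta

# Offline sketch of a failed-login threshold rule: flag any user with
# `threshold` or more authentication failures inside a sliding window.
def flag_failed_logins(events, window_minutes=10, threshold=5):
    span = timedelta(minutes=window_minutes)
    per_user = {}
    for e in events:
        if (e.get("event.category") == "authentication"
                and e.get("event.outcome") == "failure"):
            per_user.setdefault(e["user.name"], []).append(e["@timestamp"])
    flagged = set()
    for user, stamps in per_user.items():
        stamps.sort()
        for i in range(len(stamps)):
            j = i
            # count failures inside the window starting at stamps[i]
            while j < len(stamps) and stamps[j] - stamps[i] <= span:
                j += 1
            if j - i >= threshold:
                flagged.add(user)
                break
    return flagged

# Toy data: "alice" fails 6 times in 5 minutes, "bob" twice an hour apart.
base = datetime(2024, 1, 1, 12, 0)
events = (
    [{"user.name": "alice", "event.category": "authentication",
      "event.outcome": "failure", "@timestamp": base + timedelta(minutes=i)}
     for i in range(6)]
    + [{"user.name": "bob", "event.category": "authentication",
        "event.outcome": "failure", "@timestamp": base + timedelta(hours=i)}
       for i in range(2)]
)
flagged = flag_failed_logins(events)
```

In production the same logic maps to a `terms` aggregation on `user.name` with a date-range filter; the offline form is useful for sweeping thresholds against labeled data.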
🖥️ Host / User Behavior Datasets
8️⃣ CERT Insider Threat Dataset
Type: Insider threat simulation
🔗 https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508099
9️⃣ LANL Authentication Dataset
Type: Enterprise authentication logs
🔗 https://csr.lanl.gov/data/auth/
📈 Time-Series / Anomaly Detection Benchmarks
🔟 Numenta Anomaly Benchmark (NAB)
🔗 https://github.com/numenta/NAB
Proposed Elastic Benchmark Plan
Step 1: Normalize to ECS
Map dataset fields to:
- `@timestamp`
- `source.ip`
- `destination.ip`
- `user.name`
- `event.action`
- `event.category`
- `cloud.account.id`
- `cloud.provider`
- `network.bytes`
- `process.name`
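As a sketch of Step 1, a single CSE-CIC-IDS2018 flow record could be mapped to the ECS fields above like this. The source column names ("Timestamp", "Src IP", etc.) are assumptions about the CSV layout and should be verified against the downloaded files.

```python
# Hypothetical mapping of one flow record into ECS field names.
# Source column names are assumed, not confirmed against the dataset.
def flow_to_ecs(flow: dict) -> dict:
    return {
        "@timestamp": flow["Timestamp"],
        "source.ip": flow["Src IP"],
        "destination.ip": flow["Dst IP"],
        # total bytes = forward + backward payload lengths
        "network.bytes": int(flow["TotLen Fwd Pkts"]) + int(flow["TotLen Bwd Pkts"]),
        "event.category": ["network"],
        "event.action": "network-flow",
    }

doc = flow_to_ecs({
    "Timestamp": "2018-02-14 12:00:01",
    "Src IP": "10.0.0.5",
    "Dst IP": "192.168.1.9",
    "TotLen Fwd Pkts": "1200",
    "TotLen Bwd Pkts": "800",
})
```

In practice this step would run in an ingest pipeline or a small ETL script before indexing, so every dataset feeds the same rule set.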
Step 2: Detection Categories
Aggregation-Based Rules
- API call count > threshold (CloudTrail)
- Failed login count > threshold
- Outbound byte sum > threshold
- Rare IAM role usage
- Rare process parent-child relationships
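The outbound-byte-sum rule in the list above reduces to a sum-per-entity threshold; a minimal offline sketch (toy data, illustrative limit) is:

```python
from collections import defaultdict

# Sketch of an outbound byte-sum threshold rule: sum `network.bytes`
# per source IP and flag hosts above a limit (possible exfiltration).
def exfil_candidates(flows, byte_limit):
    totals = defaultdict(int)
    for f in flows:
        totals[f["source.ip"]] += f["network.bytes"]
    return {ip for ip, total in totals.items() if total > byte_limit}

hosts = exfil_candidates(
    [{"source.ip": "10.0.0.5", "network.bytes": 600_000},
     {"source.ip": "10.0.0.5", "network.bytes": 500_000},
     {"source.ip": "10.0.0.9", "network.bytes": 10_000}],
    byte_limit=1_000_000,
)
```

The Elasticsearch equivalent is a `terms` aggregation on `source.ip` with a `sum` sub-aggregation on `network.bytes` and a `bucket_selector` on the limit.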
Statistical / Time-Series Detection
- Volume spikes vs baseline
- Cardinality deviation
- Hourly/weekly seasonal anomalies
- Population analysis (user vs peer group)
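For the statistical category, the simplest baseline-deviation check is a z-score over historical counts. The sketch below is only an illustration of the idea; Elastic ML uses considerably more sophisticated models (seasonality, population modeling).

```python
import statistics

# Minimal baseline sketch: flag hours whose event count deviates more
# than `z_max` standard deviations from the historical mean.
def spike_hours(hourly_counts, z_max=3.0):
    mean = statistics.fmean(hourly_counts)
    stdev = statistics.pstdev(hourly_counts)
    if stdev == 0:
        return []  # flat baseline, nothing to flag
    return [i for i, c in enumerate(hourly_counts)
            if abs(c - mean) / stdev > z_max]

# 23 quiet hours followed by a spike: only the last hour is flagged.
anomalies = spike_hours([10] * 23 + [100])
```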
Step 3: Evaluation Metrics
- Precision
- Recall
- False positive rate
- Detection latency
- Threshold sensitivity sweep
- Anomaly score distribution
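Given labeled data (which CSE-CIC-IDS2018, UNSW-NB15, and CERT all provide), the core metrics above can be computed by comparing flagged entities against ground truth. A minimal sketch:

```python
# Sketch of Step 3: precision, recall, and false positive rate from
# a set of flagged entities, the labeled malicious set, and the full
# population of evaluated entities.
def detection_metrics(flagged: set, malicious: set, population: set) -> dict:
    tp = len(flagged & malicious)
    fp = len(flagged - malicious)
    fn = len(malicious - flagged)
    tn = len(population - flagged - malicious)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

m = detection_metrics(
    flagged={"alice", "bob"},
    malicious={"alice", "carol"},
    population={"alice", "bob", "carol", "dave"},
)
```

Running this across a sweep of rule thresholds yields the threshold-sensitivity curve; detection latency and score distributions need per-event timestamps and scores and are left out of this sketch.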
Goal
Create a reproducible benchmark framework for evaluating:
- Elasticsearch aggregation performance
- Elastic Security threshold rules
- EQL-based detection logic
- Elastic ML anomaly detection
- Cloud-specific detection engineering
Please comment if additional cloud providers (GCP, OCI, etc.) or SaaS audit logs should be included.