
Add flag evaluation metrics via OTel counter and OpenFeature Hook#11040

Open
typotter wants to merge 16 commits into master from typo/evaluations-logging


@typotter (Contributor) commented Apr 2, 2026

What Does This Do

Records a feature_flag.evaluations OTel counter metric on every flag evaluation via an OpenFeature finallyAfter hook. The hook captures all evaluation paths, including type mismatches that occur above the provider level in the OpenFeature SDK pipeline.

Creates a dedicated SdkMeterProvider with an OtlpHttpMetricExporter that sends metrics directly to the DD Agent's OTLP endpoint (/v1/metrics). This sidesteps the agent's OTel class shading (the agent relocates io.opentelemetry.api.* to datadog.trace.bootstrap.otel.api.*), which makes GlobalOpenTelemetry calls from the published dd-openfeature jar resolve to the unshaded no-op provider instead of the agent's shim.

Metric attributes:

| Attribute | When present | Value |
|---|---|---|
| feature_flag.key | Always | Flag key |
| feature_flag.result.variant | Always | Variant key (empty string if null) |
| feature_flag.result.reason | Always | Reason, lowercased |
| error.type | On error | ErrorCode, lowercased |
| feature_flag.result.allocation_key | When present | Allocation key from flag metadata |
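A minimal sketch of how the attribute rules in the table could be applied to one evaluation result. The class name `EvalAttributesSketch` and the plain `String` parameters are assumptions for illustration (the real hook receives OpenFeature evaluation details and an `ErrorCode` enum); this is not the PR's `FlagEvalHook` code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper mirroring the attribute table; not the PR's implementation.
final class EvalAttributesSketch {
  private EvalAttributesSketch() {}

  static Map<String, String> build(
      String flagKey, String variant, String reason, String errorCode, String allocationKey) {
    Map<String, String> attrs = new LinkedHashMap<>();
    attrs.put("feature_flag.key", flagKey);
    // A null variant is reported as an empty string rather than omitted.
    attrs.put("feature_flag.result.variant", variant == null ? "" : variant);
    // The table says reason is always present, so this sketch assumes it is non-null.
    attrs.put("feature_flag.result.reason", reason.toLowerCase(Locale.ROOT));
    // error.type is only added when the evaluation produced an error code.
    if (errorCode != null) {
      attrs.put("error.type", errorCode.toLowerCase(Locale.ROOT));
    }
    // allocation_key is only added when the flag metadata carried one.
    if (allocationKey != null && !allocationKey.isEmpty()) {
      attrs.put("feature_flag.result.allocation_key", allocationKey);
    }
    return attrs;
  }
}
```

For a type mismatch with no variant, this yields `feature_flag.result.variant=""`, `error.type=type_mismatch`, and no allocation key.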

New files: FlagEvalMetrics.java, FlagEvalHook.java, FlagEvalMetricsTest.java, FlagEvalHookTest.java
Modified files: Provider.java (adds getProviderHooks()), ProviderTest.java, build.gradle.kts

Motivation

Evaluation metrics allow tracking how many times flags are evaluated, with which results, across sessions. This is the Java implementation of the evaluation logging spec (FFL-1942), matching the existing Python (dd-trace-py#17029) and Go (dd-trace-go#4489) implementations.

System tests: 11/17 pass. The 6 remaining failures are pre-existing DDEvaluator gaps (reason mapping, parse error codes) addressed in separate PRs (#11036, #10971).

References:

Additional Notes

  • OTel SDK dependencies (opentelemetry-sdk-metrics, opentelemetry-exporter-otlp) are compileOnly: applications must include them on the classpath for metrics to flow. Metrics fall back to a silent no-op when they are absent.
  • Export interval: 10s (matching the Go SDK and the EVALLOG.4 spec)
  • Endpoint resolution follows the OTel spec: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT → OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics → http://localhost:4318/v1/metrics
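The endpoint fallback chain can be sketched as a small resolver. This assumes the environment lookup is passed in as a function (the PR itself reads variables through `ConfigHelper.env()`, since `System.getenv()` is forbidden in this project); the class name `OtlpMetricsEndpointSketch` and the trailing-slash handling are illustrative assumptions.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical resolver for the OTLP metrics endpoint fallback chain.
final class OtlpMetricsEndpointSketch {
  static final String DEFAULT_ENDPOINT = "http://localhost:4318/v1/metrics";

  private OtlpMetricsEndpointSketch() {}

  static String resolve(Function<String, String> env) {
    // 1. Signal-specific endpoint: used as-is, no path appended.
    String signal = env.apply("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT");
    if (signal != null && !signal.isEmpty()) {
      return signal;
    }
    // 2. Generic endpoint: treated as a base URL, so /v1/metrics is appended.
    String base = env.apply("OTEL_EXPORTER_OTLP_ENDPOINT");
    if (base != null && !base.isEmpty()) {
      // Avoid a double slash when the base URL ends with '/'.
      String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
      return trimmed + "/v1/metrics";
    }
    // 3. Default: the local Datadog Agent's OTLP/HTTP receiver.
    return DEFAULT_ENDPOINT;
  }
}
```

For example, with only `OTEL_EXPORTER_OTLP_ENDPOINT=http://agent:4318` set, `resolve(Map.of(...)::get)` returns `http://agent:4318/v1/metrics`, while a signal-specific value is returned unchanged.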

Contributor Checklist

  • Format the title according to the contribution guidelines
  • Assign the type: and (comp: or inst:) labels
  • Avoid using close, fix, or any linking keywords when referencing an issue

Jira ticket: FFL-1942

@typotter typotter added the type: feature request, tag: ai generated (largely based on code generated by an AI or LLM), and comp: openfeature labels Apr 2, 2026

pr-commenter bot commented Apr 8, 2026

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | typo/evaluations-logging |
| git_commit_date | 1776183543 | 1776183132 |
| git_commit_sha | f89a0b26cc | 93af7a8 |
| release_version | 1.62.0-SNAPSHOT~9f89a0b26cc | 1.62.0-SNAPSHOT~93af7a8bd4 |

Matching parameters:

| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1776184837 | 1776184837 |
| ci_job_id | 1594357445 | 1594357445 |
| ci_pipeline_id | 107638013 | 107638013 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-0-tm8gb8ws 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux | Linux runner-zfyrx7zua-project-304-concurrent-0-tm8gb8ws 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| module | Agent | Agent |
| parent | None | None |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 13 unstable metrics.

Startup time reports for insecure-bank
Chart: insecure-bank - global startup overhead (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc); the per-variant values are listed in the results tables that follow.
  • baseline results
| Module | Variant | Duration | Δ vs. tracing |
|---|---|---|---|
| Agent | tracing | 1.055 s | - |
| Agent | iast | 1.225 s | 170.503 ms (16.2%) |
| Total | tracing | 8.838 s | - |
| Total | iast | 9.579 s | 741.341 ms (8.4%) |
  • candidate results
| Module | Variant | Duration | Δ vs. tracing |
|---|---|---|---|
| Agent | tracing | 1.056 s | - |
| Agent | iast | 1.225 s | 168.551 ms (16.0%) |
| Total | tracing | 8.845 s | - |
| Total | iast | 9.579 s | 733.931 ms (8.3%) |
Per-module startup breakdown for insecure-bank (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc):

| Module (tracing) | Baseline | Candidate |
|---|---|---|
| crashtracking | 1.228 ms | 1.209 ms |
| BytebuddyAgent | 632.642 ms | 632.853 ms |
| AgentMeter | 29.478 ms | 29.328 ms |
| GlobalTracer | 248.626 ms | 248.492 ms |
| AppSec | 32.06 ms | 32.1 ms |
| Debugger | 59.155 ms | 59.319 ms |
| Remote Config | 597.5 µs | 592.601 µs |
| Telemetry | 8.069 ms | 8.096 ms |
| Flare Poller | 6.769 ms | 8.14 ms |

| Module (iast) | Baseline | Candidate |
|---|---|---|
| crashtracking | 1.227 ms | 1.227 ms |
| BytebuddyAgent | 801.631 ms | 802.325 ms |
| AgentMeter | 11.402 ms | 11.419 ms |
| GlobalTracer | 239.371 ms | 239.338 ms |
| IAST | 25.892 ms | 25.864 ms |
| AppSec | 28.785 ms | 30.229 ms |
| Debugger | 63.178 ms | 62.446 ms |
| Remote Config | 1.164 ms | 536.498 µs |
| Telemetry | 12.846 ms | 11.477 ms |
| Flare Poller | 3.415 ms | 3.558 ms |
Startup time reports for petclinic
Chart: petclinic - global startup overhead (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc); the per-variant values are listed in the results tables that follow.
  • baseline results
| Module | Variant | Duration | Δ vs. tracing |
|---|---|---|---|
| Agent | tracing | 1.056 s | - |
| Agent | appsec | 1.249 s | 192.973 ms (18.3%) |
| Agent | iast | 1.243 s | 186.487 ms (17.7%) |
| Agent | profiling | 1.187 s | 131.236 ms (12.4%) |
| Total | tracing | 11.171 s | - |
| Total | appsec | 11.125 s | -45.163 ms (-0.4%) |
| Total | iast | 11.297 s | 126.302 ms (1.1%) |
| Total | profiling | 11.169 s | -2.048 ms (-0.0%) |
  • candidate results
| Module | Variant | Duration | Δ vs. tracing |
|---|---|---|---|
| Agent | tracing | 1.055 s | - |
| Agent | appsec | 1.247 s | 191.742 ms (18.2%) |
| Agent | iast | 1.23 s | 174.759 ms (16.6%) |
| Agent | profiling | 1.182 s | 127.217 ms (12.1%) |
| Total | tracing | 11.069 s | - |
| Total | appsec | 11.064 s | -5.094 ms (-0.0%) |
| Total | iast | 11.267 s | 198.434 ms (1.8%) |
| Total | profiling | 11.051 s | -17.949 ms (-0.2%) |
Per-module startup breakdown for petclinic (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc):

| Module (tracing) | Baseline | Candidate |
|---|---|---|
| crashtracking | 1.231 ms | 1.217 ms |
| BytebuddyAgent | 632.767 ms | 631.1 ms |
| AgentMeter | 29.43 ms | 29.416 ms |
| GlobalTracer | 249.195 ms | 248.948 ms |
| AppSec | 31.894 ms | 32.057 ms |
| Debugger | 60.094 ms | 60.149 ms |
| Remote Config | 596.779 µs | 609.218 µs |
| Telemetry | 8.1 ms | 8.163 ms |
| Flare Poller | 6.717 ms | 7.34 ms |

| Module (appsec) | Baseline | Candidate |
|---|---|---|
| crashtracking | 1.226 ms | 1.214 ms |
| BytebuddyAgent | 661.92 ms | 661.223 ms |
| AgentMeter | 12.056 ms | 12.078 ms |
| GlobalTracer | 249.511 ms | 248.932 ms |
| IAST | 24.618 ms | 24.574 ms |
| AppSec | 184.913 ms | 183.915 ms |
| Debugger | 65.792 ms | 65.715 ms |
| Remote Config | 629.984 µs | 597.85 µs |
| Telemetry | 8.582 ms | 8.69 ms |
| Flare Poller | 3.559 ms | 3.566 ms |

| Module (iast) | Baseline | Candidate |
|---|---|---|
| crashtracking | 1.236 ms | 1.235 ms |
| BytebuddyAgent | 814.562 ms | 805.386 ms |
| AgentMeter | 11.66 ms | 11.449 ms |
| GlobalTracer | 242.232 ms | 240.501 ms |
| IAST | 26.233 ms | 26.718 ms |
| AppSec | 31.186 ms | 30.881 ms |
| Debugger | 60.677 ms | 61.309 ms |
| Remote Config | 525.706 µs | 525.674 µs |
| Telemetry | 13.537 ms | 11.911 ms |
| Flare Poller | 3.55 ms | 3.514 ms |

| Module (profiling) | Baseline | Candidate |
|---|---|---|
| crashtracking | 1.187 ms | 1.188 ms |
| BytebuddyAgent | 693.378 ms | 690.17 ms |
| AgentMeter | 9.128 ms | 9.065 ms |
| GlobalTracer | 207.772 ms | 206.849 ms |
| AppSec | 32.631 ms | 32.53 ms |
| Debugger | 65.809 ms | 65.561 ms |
| Remote Config | 569.285 µs | 579.628 µs |
| Telemetry | 7.871 ms | 7.846 ms |
| Flare Poller | 3.572 ms | 3.594 ms |
| ProfilingAgent | 94.043 ms | 93.834 ms |
| Profiling | 94.604 ms | 94.39 ms |

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | typo/evaluations-logging |
| git_commit_date | 1776183543 | 1776183132 |
| git_commit_sha | f89a0b26cc | 93af7a8 |
| release_version | 1.62.0-SNAPSHOT~9f89a0b26cc | 1.62.0-SNAPSHOT~93af7a8bd4 |

Matching parameters:

| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1776185317 | 1776185317 |
| ci_job_id | 1594357448 | 1594357448 |
| ci_pipeline_id | 107638013 | 107638013 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-1-6mf245k6 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux | Linux runner-zfyrx7zua-project-304-concurrent-1-6mf245k6 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |

Summary

Found 3 performance improvements and 1 performance regression! Performance is the same for 17 metrics, 15 unstable metrics.

| scenario | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p95 | Δ mean throughput |
|---|---|---|---|
| scenario:load:insecure-bank:iast:high_load | worse: [+52.116µs; +125.929µs] or [+2.001%; +4.836%] | unsure: [+26.944µs; +388.657µs] or [+0.354%; +5.108%] | unstable: [-183.362op/s; +89.425op/s] or [-13.393%; +6.532%] |
| scenario:load:petclinic:profiling:high_load | better: [-973.056µs; -488.906µs] or [-5.230%; -2.628%] | unsure: [-1539.062µs; -170.761µs] or [-5.128%; -0.569%] | unstable: [-16.683op/s; +35.433op/s] or [-6.757%; +14.351%] |
| scenario:load:petclinic:no_agent:high_load | better: [-2.950ms; -1.817ms] or [-15.737%; -9.693%] | better: [-5.137ms; -2.407ms] or [-16.339%; -7.656%] | unstable: [+3.383op/s; +60.304op/s] or [+1.390%; +24.769%] |

| scenario | candidate mean agg_http_req_duration_p50 | candidate mean agg_http_req_duration_p95 | candidate mean throughput | baseline mean agg_http_req_duration_p50 | baseline mean agg_http_req_duration_p95 | baseline mean throughput |
|---|---|---|---|---|---|---|
| scenario:load:insecure-bank:iast:high_load | 2.693ms | 7.817ms | 1322.156op/s | 2.604ms | 7.609ms | 1369.125op/s |
| scenario:load:petclinic:profiling:high_load | 17.874ms | 29.160ms | 256.281op/s | 18.605ms | 30.014ms | 246.906op/s |
| scenario:load:petclinic:no_agent:high_load | 16.362ms | 27.666ms | 275.312op/s | 18.745ms | 31.438ms | 243.469op/s |
Request duration reports for petclinic
Chart: petclinic - request duration [CI 0.99] (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc); the per-variant values are listed in the results tables that follow.
  • baseline results
| Variant | Request duration [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 19.173 ms [18.977 ms, 19.368 ms] | - |
| appsec | 18.798 ms [18.609 ms, 18.987 ms] | -374.583 µs (-2.0%) |
| code_origins | 17.942 ms [17.768 ms, 18.116 ms] | -1.23 ms (-6.4%) |
| iast | 17.878 ms [17.699 ms, 18.057 ms] | -1.294 ms (-6.8%) |
| profiling | 18.902 ms [18.712 ms, 19.091 ms] | -271.316 µs (-1.4%) |
| tracing | 17.913 ms [17.733 ms, 18.093 ms] | -1.26 ms (-6.6%) |
  • candidate results
| Variant | Request duration [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 16.942 ms [16.778 ms, 17.106 ms] | - |
| appsec | 18.954 ms [18.763 ms, 19.144 ms] | 2.012 ms (11.9%) |
| code_origins | 18.036 ms [17.859 ms, 18.213 ms] | 1.094 ms (6.5%) |
| iast | 17.909 ms [17.734 ms, 18.085 ms] | 967.383 µs (5.7%) |
| profiling | 18.207 ms [18.029 ms, 18.385 ms] | 1.265 ms (7.5%) |
| tracing | 18.279 ms [18.098 ms, 18.46 ms] | 1.337 ms (7.9%) |
Request duration reports for insecure-bank
Chart: insecure-bank - request duration [CI 0.99] (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc); the per-variant values are listed in the results tables that follow.
  • baseline results
| Variant | Request duration [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 1.252 ms [1.239 ms, 1.264 ms] | - |
| iast | 3.344 ms [3.294 ms, 3.393 ms] | 2.092 ms (167.2%) |
| iast_FULL | 6.204 ms [6.139 ms, 6.269 ms] | 4.952 ms (395.7%) |
| iast_GLOBAL | 3.638 ms [3.584 ms, 3.692 ms] | 2.386 ms (190.6%) |
| profiling | 2.439 ms [2.415 ms, 2.462 ms] | 1.187 ms (94.8%) |
| tracing | 1.922 ms [1.906 ms, 1.938 ms] | 670.48 µs (53.6%) |
  • candidate results
| Variant | Request duration [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 1.247 ms [1.235 ms, 1.26 ms] | - |
| iast | 3.465 ms [3.414 ms, 3.515 ms] | 2.217 ms (177.8%) |
| iast_FULL | 6.197 ms [6.133 ms, 6.26 ms] | 4.95 ms (396.9%) |
| iast_GLOBAL | 3.612 ms [3.561 ms, 3.663 ms] | 2.365 ms (189.6%) |
| profiling | 2.315 ms [2.292 ms, 2.338 ms] | 1.068 ms (85.6%) |
| tracing | 1.928 ms [1.912 ms, 1.944 ms] | 681.116 µs (54.6%) |

Dacapo

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | typo/evaluations-logging |
| git_commit_date | 1776183642 | 1776183132 |
| git_commit_sha | f89a0b26cc | 93af7a8 |
| release_version | 1.62.0-SNAPSHOT~9f89a0b26cc | 1.62.0-SNAPSHOT~93af7a8bd4 |

Matching parameters:

| Parameter | Baseline | Candidate |
|---|---|---|
| application | biojava | biojava |
| ci_job_date | 1776185152 | 1776185152 |
| ci_job_id | 1594357450 | 1594357450 |
| ci_pipeline_id | 107638013 | 107638013 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| kernel_version | Linux runner-zfyrx7zua-project-304-concurrent-0-49kqbmud 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux | Linux runner-zfyrx7zua-project-304-concurrent-0-49kqbmud 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |

Summary

Found 1 performance improvement and 0 performance regressions! Performance is the same for 11 metrics, 0 unstable metrics.

| scenario | Δ mean execution_time | candidate mean execution_time | baseline mean execution_time |
|---|---|---|---|
| scenario:dacapo:tomcat:appsec | better: [-1.427ms; -1.082ms] or [-37.408%; -28.371%] | 2.559ms | 3.813ms |
Execution time for tomcat
Chart: tomcat - execution time [CI 0.99] (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc); the per-variant values are listed in the results tables that follow.
  • baseline results
| Variant | Execution time [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 1.5 ms [1.488 ms, 1.511 ms] | - |
| appsec | 3.813 ms [3.594 ms, 4.033 ms] | 2.314 ms (154.3%) |
| iast | 2.294 ms [2.224 ms, 2.363 ms] | 793.94 µs (52.9%) |
| iast_GLOBAL | 2.344 ms [2.274 ms, 2.414 ms] | 844.332 µs (56.3%) |
| profiling | 2.12 ms [2.064 ms, 2.175 ms] | 619.957 µs (41.3%) |
| tracing | 2.097 ms [2.043 ms, 2.151 ms] | 597.476 µs (39.8%) |
  • candidate results
| Variant | Execution time [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 1.502 ms [1.49 ms, 1.514 ms] | - |
| appsec | 2.559 ms [2.504 ms, 2.614 ms] | 1.057 ms (70.4%) |
| iast | 2.282 ms [2.213 ms, 2.351 ms] | 779.955 µs (51.9%) |
| iast_GLOBAL | 2.352 ms [2.282 ms, 2.422 ms] | 850.068 µs (56.6%) |
| profiling | 2.112 ms [2.057 ms, 2.167 ms] | 609.942 µs (40.6%) |
| tracing | 2.098 ms [2.045 ms, 2.152 ms] | 596.529 µs (39.7%) |
Execution time for biojava
Chart: biojava - execution time [CI 0.99] (candidate=1.62.0-SNAPSHOT~93af7a8bd4, baseline=1.62.0-SNAPSHOT~9f89a0b26cc); the per-variant values are listed in the results tables that follow.
  • baseline results
| Variant | Execution time [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 15.681 s [15.681 s, 15.681 s] | - |
| appsec | 14.64 s [14.64 s, 14.64 s] | -1.041 s (-6.6%) |
| iast | 18.443 s [18.443 s, 18.443 s] | 2.762 s (17.6%) |
| iast_GLOBAL | 18.058 s [18.058 s, 18.058 s] | 2.377 s (15.2%) |
| profiling | 15.445 s [15.445 s, 15.445 s] | -236.0 ms (-1.5%) |
| tracing | 14.892 s [14.892 s, 14.892 s] | -789.0 ms (-5.0%) |
  • candidate results
| Variant | Execution time [CI 0.99] | Δ vs. no_agent |
|---|---|---|
| no_agent | 14.941 s [14.941 s, 14.941 s] | - |
| appsec | 14.948 s [14.948 s, 14.948 s] | 7.0 ms (0.0%) |
| iast | 18.341 s [18.341 s, 18.341 s] | 3.4 s (22.8%) |
| iast_GLOBAL | 17.978 s [17.978 s, 17.978 s] | 3.037 s (20.3%) |
| profiling | 15.038 s [15.038 s, 15.038 s] | 97.0 ms (0.6%) |
| tracing | 14.942 s [14.942 s, 14.942 s] | 1.0 ms (0.0%) |

typotter added 5 commits April 9, 2026 09:29
Record a `feature_flag.evaluations` OTel counter on every flag evaluation
using an OpenFeature `finallyAfter` hook. The hook captures all evaluation
paths including type mismatches that occur above the provider level.

Attributes: feature_flag.key, feature_flag.result.variant,
feature_flag.result.reason, error.type (on error),
feature_flag.result.allocation_key (when present).

Counter is a no-op when DD_METRICS_OTEL_ENABLED is false or
opentelemetry-api is absent from the classpath.

Replace GlobalOpenTelemetry.getMeterProvider() with a dedicated
SdkMeterProvider + OtlpHttpMetricExporter that sends metrics
directly to the DD Agent's OTLP endpoint (default :4318/v1/metrics).

This avoids the agent's OTel class shading issue where the agent
relocates io.opentelemetry.api.* to datadog.trace.bootstrap.otel.api.*,
making GlobalOpenTelemetry calls from the dd-openfeature jar hit the
unshaded no-op provider instead of the agent's shim.

Requires opentelemetry-sdk-metrics and opentelemetry-exporter-otlp
on the application classpath. Falls back to no-op if absent.

System tests: 11/17 pass. 6 failures are pre-existing DDEvaluator
gaps (reason mapping, parse errors, type mismatch strictness).

- Add explicit null guard for details in FlagEvalHook.finallyAfter()
- Add OTEL_EXPORTER_OTLP_ENDPOINT generic env var fallback with
  /v1/metrics path appended (per OTel spec fallback chain)
- Add comments clarifying signal-specific vs generic endpoint behavior

When the OTel SDK jars are not on the application classpath,
loading FlagEvalMetrics fails because field types reference
OTel SDK classes (SdkMeterProvider). This propagated as an
uncaught NoClassDefFoundError from the Provider constructor,
crashing provider initialization.

Fix:
- Change meterProvider field type from SdkMeterProvider to
  Closeable (always on classpath), use local SdkMeterProvider
  variable inside try block
- Catch NoClassDefFoundError in Provider constructor when
  creating FlagEvalMetrics
- Null-safe getProviderHooks() and shutdown() when metrics
  is null

FlagEvalHook references FlagEvalMetrics in its field declaration.
On JVMs that eagerly verify field types during class loading,
constructing FlagEvalHook outside the try/catch could throw
NoClassDefFoundError if OTel classes failed to load. Moving it
inside the try block ensures both metrics and hook are null-safe
when OTel is absent.
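The guard described in these two commits can be modeled roughly as follows: the metrics object and the hook that references it are created inside one try block, and hook lookup and shutdown tolerate null. `MetricsStub`, `HookStub`, and `GuardedProviderSketch` are hypothetical stand-ins for `FlagEvalMetrics`, `FlagEvalHook`, and `Provider`. The sketch catches `LinkageError`, the superclass of `NoClassDefFoundError`, which is what a later commit in this PR widens the Provider catch to.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for FlagEvalMetrics (closed on provider shutdown).
interface MetricsStub extends AutoCloseable {
  @Override
  void close();
}

// Hypothetical stand-in for FlagEvalHook: its field type references the metrics class.
final class HookStub {
  final MetricsStub metrics;

  HookStub(MetricsStub metrics) {
    this.metrics = metrics;
  }
}

final class GuardedProviderSketch {
  private final MetricsStub metrics; // null when OTel classes could not be loaded
  private final HookStub hook; // built in the same try block, so also null on failure

  GuardedProviderSketch(Supplier<MetricsStub> metricsFactory) {
    MetricsStub m = null;
    HookStub h = null;
    try {
      m = metricsFactory.get();
      h = new HookStub(m); // constructed inside the try, not after it
    } catch (LinkageError e) {
      // OTel classes missing or incompatible: run without metrics instead of failing init.
      if (m != null) {
        m.close(); // release a half-initialized metrics object
      }
      m = null;
    }
    this.metrics = m;
    this.hook = h;
  }

  List<HookStub> getProviderHooks() {
    return hook == null ? Collections.emptyList() : Collections.singletonList(hook);
  }

  void shutdown() {
    if (metrics != null) {
      metrics.close();
    }
  }
}
```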

@typotter typotter force-pushed the typo/evaluations-logging branch from 4cb7bab to 69c5529 April 9, 2026 15:30

Documents the published artifact setup, evaluation metrics
dependencies (opentelemetry-sdk-metrics, opentelemetry-exporter-otlp),
OTLP endpoint configuration, metric attributes, and requirements.

@typotter typotter marked this pull request as ready for review April 9, 2026 17:41
@typotter typotter requested a review from a team as a code owner April 9, 2026 17:41
@typotter typotter requested review from leoromanovsky and sameerank and removed request for a team April 9, 2026 17:41

System.getenv() is forbidden by the project's forbiddenApis rules.
Replace with ConfigHelper.env() which is the approved way to read
environment variables. Add config-utils as compileOnly dependency.

@sameerank sameerank left a comment


Thanks for helping with this! I agree it was a good idea to break out the system test fixes into separate PRs to keep this one brief and focused.

.setUnit(METRIC_UNIT)
.setDescription(METRIC_DESC)
.build();
} catch (NoClassDefFoundError | Exception e) {
Member


Wouldn't it be better to just let the error flow to the Provider class since it's already capturing the exception?

Contributor Author


Catching and logging here lets the Metrics driver still operate as a no-op.
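A rough sketch of the no-op behavior this reply describes: initialization failures are caught inside the metrics class, the counter stays null, and `record()` quietly does nothing. `CounterStub` is a hypothetical stand-in for an OTel `LongCounter`, and `System.err` stands in for the slf4j logger the PR uses; `RuntimeException` replaces the diff's `Exception` only because a `Supplier` cannot throw checked exceptions.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an OTel LongCounter.
interface CounterStub {
  void add(long value, Map<String, String> attributes);
}

final class NoOpSafeMetricsSketch {
  private final CounterStub counter; // stays null when initialization failed

  NoOpSafeMetricsSketch(Supplier<CounterStub> init) {
    CounterStub c = null;
    try {
      c = init.get();
    } catch (NoClassDefFoundError | RuntimeException e) {
      // Log and continue; System.err stands in for the slf4j logger.
      System.err.println("Flag evaluation metrics disabled: " + e);
    }
    this.counter = c;
  }

  boolean isEnabled() {
    return counter != null;
  }

  // Records one evaluation; a no-op when the counter could not be created.
  void record(Map<String, String> attributes) {
    if (counter == null) {
      return;
    }
    try {
      counter.add(1, attributes);
    } catch (RuntimeException e) {
      // A metrics failure must never break flag evaluation (the PR logs these at debug level).
    }
  }
}
```

Catching at this level keeps the object usable, so callers need no null checks; the Provider-level catch then only has to cover the case where the class itself cannot load.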

Member

@manuel-alvarez-alvarez manuel-alvarez-alvarez left a comment


LGTM, just left a couple of minor comments.

- Remove transitive openfeature-sdk dep from README setup section
- Import ErrorCode at top of FlagEvalHook instead of inline FQN
- Add Options.evaluationLogging(boolean) — default true per EVALLOG.12
- When disabled: no metrics, no hook, no error
- When enabled + OTel SDK missing: log.error with instructions to
  add deps or disable, degrade to no-op (matches Go/Python pattern)
- When enabled + OTel init failure: log.error with message, degrade
- Remove silent catch: FlagEvalMetrics now logs at error level for both
  NoClassDefFoundError and other init failures

The OTel SDK defaults to DELTA temporality for counters. The Datadog
agent converts OTLP delta monotonic sums to rate metrics by dividing
by the export interval (10s). Five evaluations in under 1s produce
~0.5, which rounds to zero in the points payload.

Force CUMULATIVE temporality on the OtlpHttpMetricExporter so the
agent receives an absolute count rather than a rate, making
test_ffe_eval_metric_count reliable.
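To make the temporality difference concrete, here is a toy model (not the OTel SDK) of one counter exported with DELTA and with CUMULATIVE temporality. The divide-by-interval step that turns a delta into a rate is the agent behavior as described in the commit message above; the sketch only reproduces the arithmetic (5 evaluations / 10 s = 0.5 per second).

```java
// Illustrative model only: how the same counter increments look under each temporality.
final class TemporalitySketch {
  private long total; // running count since process start
  private long lastExported; // total at the previous DELTA export

  void add(long n) {
    total += n;
  }

  // CUMULATIVE: every export reports the running total.
  long exportCumulative() {
    return total;
  }

  // DELTA: every export reports only what was added since the previous export.
  long exportDelta() {
    long delta = total - lastExported;
    lastExported = total;
    return delta;
  }
}
```

After five evaluations the first delta export is 5, and 5 / 10.0 gives the 0.5 per-second figure from the commit; a second delta export reports 0, while a cumulative export keeps reporting 5.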

- Remove exporterIsConfiguredWithCumulativeTemporalityForCounters
  test (tested OTel SDK, not our code; the integration test is the
  real regression guard)
- Fix Provider catch block comment to reflect that FlagEvalMetrics
  may not have logged if we reach this point
- Include exception in log.error calls for NoClassDefFoundError and
  general Exception to aid debugging
- Reword InMemoryMetricReader comment for precision
- Add debug log to FlagEvalMetrics.record() catch block so metric
  recording failures are visible in debug logs
- Widen Provider catch from NoClassDefFoundError to LinkageError to
  cover IncompatibleClassChangeError and other classloader issues
  from incompatible OTel SDK versions
- Add slf4j logger to Provider and log at error level when the
  fallback catch fires

The Provider catch is defense-in-depth for when FlagEvalMetrics
class itself can't load (OTel API absent entirely). The detailed
error message is logged inside FlagEvalMetrics when it CAN load
but SDK init fails. Using error level here caused the openfeature
smoke test to fail (it asserts no ERROR entries in application logs).

@typotter
Contributor Author

/merge


gh-worker-devflow-routing-ef8351 bot commented Apr 14, 2026


2026-04-14 14:55:56 UTC ℹ️ Start processing command /merge


2026-04-14 14:56:03 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 2h (p90).


2026-04-14 15:38:10 UTC ℹ️ MergeQueue: Readding this merge request to the queue because another merge request processed with yours failed. No action is needed from your side.


2026-04-14 16:01:23 UTC ℹ️ MergeQueue: Retrying because a high priority merge request needed to be processed first. No action is needed from your side.


2026-04-14 16:01:27 UTC ⚠️ MergeQueue: This merge request build was cancelled

tyler.potter@datadoghq.com cancelled this merge request build

Evaluation metrics are always attempted. If the OTel SDK is absent,
the provider degrades gracefully with a warning. There is no user-
facing toggle to disable metrics — this matches the Go and Python
SDKs which also always attempt metrics.

Labels

comp: openfeature · tag: ai generated (largely based on code generated by an AI or LLM) · type: feature request


3 participants