ZOOKEEPER-5036: Add Timeline Metrics Provider for external metrics collection systems#2371
ZOOKEEPER-5036: Add Timeline Metrics Provider for external metrics collection systems#2371himanshumaurya09876 wants to merge 2 commits intoapache:masterfrom
Conversation
037f537 to
1778c05
Compare
|
/build |
There was a problem hiding this comment.
Many thanks, looks really nice already. 👍
I really like the clean separation of concerns: Provider/Sink/Snapshot are well-decomposed. 💪
Can you please add unit tests for the new classes?
Also, in order to be able to load the metrics provider when running from the source clone, we have to add it to the classpath. See:
| <parent> | ||
| <groupId>org.apache.zookeeper</groupId> | ||
| <artifactId>zookeeper-metrics-providers</artifactId> | ||
| <version>3.9.3</version> |
There was a problem hiding this comment.
Shouldn't this match with other projects to refer to 3.10.0-SNAPSHOT?
| <version>3.9.3</version> | |
| <version>3.10.0-SNAPSHOT</version> |
This also seems to cause Maven compilation problem.
| /** | ||
| * Returns all counter metrics in this snapshot. | ||
| * | ||
| * @return an unmodifiable view of the counters map |
There was a problem hiding this comment.
This claims to return "an unmodifiable view" but return the raw HashMap directly. Should be wrapped with Collections.unmodifiableMap() or remove this claim here?
The same applies to getGauges(), getSummaries().
| * <p>This context reuses existing metric implementations from zookeeper-server | ||
| * (SimpleCounter, AvgMinMaxCounter, etc.) to ensure consistent behavior with | ||
| * other metrics providers.</p> |
There was a problem hiding this comment.
I like that you reused the existing metric classes (SimpleCounter, AvgMinMaxCounter, etc.) 👍
| LOG.warn("Timeline sink class not found: {}. Timeline metrics will be disabled. " | ||
| + "To enable Timeline metrics, ensure the sink implementation JAR is available on the classpath. " | ||
| + "ZooKeeper will continue to operate normally without Timeline metrics.", sinkClassName); | ||
| this.sink = null; |
There was a problem hiding this comment.
As I see when the sink class isn't found, the provider logs a warning but continues. It can be like that but PrometheusMetricsProvider throws an Exception when misconfigured and ZooKeeper will fail to start. Would it maybe make sense to be consistent in behavior? What do you all think?
There was a problem hiding this comment.
Good point about consistency. I deliberately chose the "log warning and continue" approach for Timeline metrics because:
- Different Architecture: Prometheus requires an embedded HTTP server (Jetty) to function - if that fails, the whole feature is broken. Timeline just needs a scheduler and pushes metrics out, so even without a sink, ZooKeeper's core functionality isn't affected.
- External Dependency Model: Timeline's sink is intentionally designed to be loaded from external JARs (not compile-time dependencies). This allows ZooKeeper to ship without any Timeline-specific dependencies, and users add the sink JAR only when needed. If we fail hard on missing sink, ZooKeeper won't start even though the user might not care about Timeline metrics at all.
- Graceful Degradation: The current behavior allows a deployment to work even if:
- The sink JAR isn't deployed yet
- There's a temporary classpath issue
- Timeline metrics are configured but not currently needed
There was a problem hiding this comment.
Many thanks, I completely see your points. And I really like that ZK not need to depend on Timeline-specific dependencies. 👍
I was just thinking that if I explicitly enable the timeline metrics provider but forgot to add the sink to the classpath then it would be harder for me to realize that the metrics are not sent and I have a problem when there is only a warn in the log compared to when ZK fails to start. 🤔
But I can see that having graceful degradation can be more useful.
9b8cb87 to
cf9fb2b
Compare
…llection systems - Unit tests and configs
cf9fb2b to
70f68bb
Compare
Added a new Timeline Metrics Provider that enables ZooKeeper to export metrics to external Timeline-based metrics collection systems such as Apache Ambari Metrics Collector and YARN Timeline Service.
Validated it using Ambari Metrics, I was able to receive Zookeeper Metrics in Ambari and visualise it on Grafana dashboards.
Jira :- ZOOKEEPER-5036