[observability] Diagnose NodeClockNotSynchronising on ACP Ubuntu nodes by jing2uo · Pull Request #485 · alauda/knowledge

jing2uo · 2026-04-24T23:19:07Z

Adds a new ACP KB article.

coderabbitai · 2026-04-24T23:19:13Z

Walkthrough

This PR adds a new troubleshooting guide documenting how to diagnose and resolve the NodeClockNotSynchronising alert on ACP Ubuntu nodes. The guide explains the upstream Prometheus alert metric, provides diagnostic commands using chrony and promtool, and recommends configuration changes for persistent timing data logging.

Changes

NodeClockNotSynchronising Troubleshooting Documentation

Layer / File(s)	Summary
NodeClockNotSynchronising diagnostic guide `docs/en/solutions/Diagnose_NodeClockNotSynchronising_on_ACP_Ubuntu_nodes.md`	New document explaining the alert meaning from node-exporter's `node_timex_sync_status` metric, concrete chrony/chronyc diagnostic commands for DNS, connectivity, and service status issues, promtool steps to validate kernel `STA_UNSYNC` state, and chrony configuration changes to enable CSV logging for post-recovery analysis.
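For readers who want the alert semantics spelled out, here is a minimal Python sketch of the condition, assuming the upstream node-mixin form of the rule (`min_over_time(node_timex_sync_status[5m]) == 0 and node_timex_maxerror_seconds >= 16`). The rule shipped with ACP may use different thresholds or selectors, and the rule's `for:` duration is not modelled. The function name and arguments are hypothetical.

```python
from typing import Sequence

# Threshold taken from the upstream node-mixin rule; assumed, not confirmed for ACP.
MAXERROR_THRESHOLD_SECONDS = 16.0


def clock_not_synchronising(sync_status_window: Sequence[float],
                            maxerror_seconds: float) -> bool:
    """Return True when the assumed alert expression holds for one node.

    sync_status_window: node_timex_sync_status samples from the last 5 minutes.
    maxerror_seconds:   the current node_timex_maxerror_seconds sample.
    """
    if not sync_status_window:
        # No samples in the window: the expression yields no series, so no alert.
        return False
    # min_over_time(...) == 0: the node reported "unsynchronised" at least once.
    was_unsynced = min(sync_status_window) == 0
    return was_unsynced and maxerror_seconds >= MAXERROR_THRESHOLD_SECONDS
```

Both halves must hold: a node that briefly reports `0` but keeps a small maximum error does not satisfy the expression.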

Estimated Code Review Effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A guide was born, so clear and bright,
For clocks that dance but lose the light,
With chronyc spells and promtool's sight,
We sync the nodes—all will be right! ✨⏰

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and specifically describes the main change: adding a troubleshooting guide for the NodeClockNotSynchronising alert on ACP Ubuntu nodes, which matches the documentation file added.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/solutions/Diagnose_NodeClockNotSynchronising_on_ACP_Ubuntu_nodes.md`:
- Line 47: Change the sentence that currently says the metric is “independent of
whether Prometheus is currently scraping” to clearly separate metric semantics
from query freshness: state that the metric value itself is produced by
adjtimex(2) and that a value of 0 corresponds to the STA_UNSYNC bit, but that
using promtool query instant (or any Prometheus query) will only return a sample
if the target has been scraped recently — results can be stale or missing even
though the kernel state exists. Update the wording around the
adjtimex(2)/STA_UNSYNC mention and add a short clause about scrape freshness and
staleness behavior for promtool query instant.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ee6d99a-6a01-4b3c-9ce4-dcdba85eafed

📥 Commits

Reviewing files that changed from the base of the PR and between 181f78f and e06b64e.

📒 Files selected for processing (1)

docs/en/solutions/Diagnose_NodeClockNotSynchronising_on_ACP_Ubuntu_nodes.md

coderabbitai · 2026-05-17T02:42:11Z

+
+## Diagnostic Steps
+
+Confirm first that the kernel itself reports the clock as unsynchronised — that is the actual condition the alert is reacting to. The metric is produced from `adjtimex(2)` so its value is independent of whether Prometheus is currently scraping; a value of `0` corresponds to the `STA_UNSYNC` bit being set. Reference syntax for the in-cluster query (substitute the installed prometheus pod name and node label):


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify scrape-dependence of the queried value.

At Line 47, saying the value is “independent of whether Prometheus is currently scraping” is misleading for operators using promtool query instant; the returned sample still depends on successful recent scrapes (or can be stale/missing). Please reword to separate metric semantics from query freshness.
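The distinction can be sketched in Python. This assumes the JSON body returned by the Prometheus HTTP API (`/api/v1/query`) rather than the text that `promtool query instant` prints, and the function name and return labels are hypothetical. An empty result vector means no sample fell inside the lookback window (5 minutes by default), which says nothing about the kernel state itself.

```python
import json


def classify_sync_sample(response_text: str) -> str:
    """Classify the JSON body of an instant query for node_timex_sync_status."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        return "query-failed"
    result = body["data"]["result"]
    if not result:
        # Nothing inside the lookback window: the target was not scraped
        # recently or the series was marked stale. Kernel state is unknown.
        return "no-recent-sample"
    # Each vector element carries value = [evaluation_timestamp, "string value"].
    # The timestamp is the query evaluation time, not the scrape time.
    value = float(result[0]["value"][1])
    if value == 0:
        return "unsynchronised-at-last-scrape"
    return "synchronised-at-last-scrape"
```

Because the timestamp in an instant-vector result is the evaluation time, it cannot be used to judge how old the sample is. Querying `timestamp(node_timex_sync_status)` is one way to see when the sample was actually scraped.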

✏️ Proposed wording

-Confirm first that the kernel itself reports the clock as unsynchronised — that is the actual condition the alert is reacting to. The metric is produced from `adjtimex(2)` so its value is independent of whether Prometheus is currently scraping; a value of `0` corresponds to the `STA_UNSYNC` bit being set. Reference syntax for the in-cluster query (substitute the installed prometheus pod name and node label):
+Confirm first that the kernel itself reports the clock as unsynchronised — that is the actual condition the alert is reacting to. The metric is produced from `adjtimex(2)` and a value of `0` corresponds to the `STA_UNSYNC` bit being set; however, the value returned by `promtool query instant` reflects the most recent successfully scraped sample. Reference syntax for the in-cluster query (substitute the installed prometheus pod name and node label):
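The metric semantics in this wording (a value of `0` when `STA_UNSYNC` is set) can be illustrated with a small Python sketch. It follows the description in the guide; the exact check inside node-exporter may differ between versions, so treat the mapping as an assumption. The function name is hypothetical, and `0x0040` is the `STA_UNSYNC` value from `<linux/timex.h>`.

```python
STA_UNSYNC = 0x0040  # "clock unsynchronized" flag from <linux/timex.h>


def sync_status_from_timex(status_word: int) -> int:
    """Map a timex status word to the 0/1 metric value described in the guide."""
    # Bit set   -> kernel considers the clock unsynchronised -> metric value 0.
    # Bit clear -> kernel considers the clock synchronised   -> metric value 1.
    return 0 if status_word & STA_UNSYNC else 1
```

For example, a status word of `0x2001` (PLL updates enabled, nanosecond resolution) maps to `1`, while `0x0041` maps to `0` because the unsynchronised bit is set alongside the PLL flag.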


jing2uo requested review from leizhuc and vivindu-1213 April 24, 2026 23:19

jing2uo temporarily deployed to translate April 24, 2026 23:19 — with GitHub Actions Inactive

jing2uo requested review from shibalu and zhhray April 24, 2026 23:19

jing2uo requested a review from chinaran April 24, 2026 23:19

jing2uo temporarily deployed to translate May 2, 2026 12:51 — with GitHub Actions Inactive

jing2uo temporarily deployed to translate May 2, 2026 16:32 — with GitHub Actions Inactive

[observability] Troubleshooting the NodeClockNotSynchronising alert

59629c3

jing2uo force-pushed the kb/2026-02/troubleshooting-the-nodeclocknotsynchron branch from ae9cefb to 59629c3 on May 2, 2026 16:47

jing2uo temporarily deployed to translate May 2, 2026 16:47 — with GitHub Actions Inactive

[observability] Diagnose NodeClockNotSynchronising on ACP Ubuntu nodes

e06b64e

jing2uo changed the title ~~[observability] Troubleshooting the NodeClockNotSynchronising alert~~ [observability] Diagnose NodeClockNotSynchronising on ACP Ubuntu nodes May 17, 2026

jing2uo temporarily deployed to translate May 17, 2026 02:40 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 17, 2026

[observability] Diagnose NodeClockNotSynchronising on ACP Ubuntu nodes #485

jing2uo wants to merge 2 commits into main from kb/2026-02/troubleshooting-the-nodeclocknotsynchron
