docs(platform): add guides related managed apps backups #536

Open
androndo wants to merge 1 commit into main from feat/apps-backups-guides

Conversation

@androndo
Contributor

@androndo androndo commented May 12, 2026

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guides for application backup and recovery of managed databases (Postgres, MariaDB, ClickHouse, FoundationDB), covering one-off and scheduled backups, status checking, and in-place or copy restores.
    • Added administrator guide for configuring the new backup framework using BackupClass and driver-specific strategies.
    • Deprecated legacy chart-level backup configurations across database applications with guidance to migrate to the new framework.
    • Clarified scope of VM and application backup documentation.

Review Change Stack

@androndo androndo requested review from kvaps and lllamnyp as code owners May 12, 2026 13:02
@netlify

netlify Bot commented May 12, 2026

Deploy Preview for cozystack ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 237d627 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cozystack/deploys/6a04c71d32b7c20008a9bb45 |
| 😎 Deploy Preview | https://deploy-preview-536--cozystack.netlify.app |

@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

Warning

Rate limit exceeded

@androndo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 58 minutes and 22 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cb5ff950-b31e-45d0-9f7a-a5fe48afdccd

📥 Commits

Reviewing files that changed from the base of the PR and between 4efa0e7 and 237d627.

📒 Files selected for processing (8)
  • content/en/docs/next/applications/backup-and-recovery.md
  • content/en/docs/next/applications/clickhouse.md
  • content/en/docs/next/applications/foundationdb.md
  • content/en/docs/next/applications/mariadb.md
  • content/en/docs/next/applications/postgres.md
  • content/en/docs/next/operations/services/managed-app-backup-configuration.md
  • content/en/docs/next/operations/services/velero-backup-configuration.md
  • content/en/docs/next/virtualization/backup-and-recovery.md
📝 Walkthrough

This pull request adds comprehensive documentation for Cozystack's backup and recovery framework. It introduces two new guides—one for tenants and one for cluster administrators—covering the BackupClass-driven workflow for data-only backups across Postgres, MariaDB, ClickHouse, and FoundationDB. The changes also update existing application docs with deprecation warnings for legacy chart-level backup configurations.

Changes

Backup and Recovery Framework Documentation

| Layer / File(s) | Summary |
|---|---|
| Tenant-facing backup and recovery guide<br>content/en/docs/next/applications/backup-and-recovery.md | New comprehensive guide walks tenants through discovering available BackupClass resources, provisioning S3 buckets, extracting and configuring per-driver credentials, creating one-off and scheduled backups (BackupJob and Plan), checking backup status, and restoring either in place or to a pre-provisioned copy using RestoreJob. Includes per-database credential setup (Postgres, MariaDB, ClickHouse, FoundationDB), operational caveats, limitations (data-only scope, FoundationDB single-backup constraint, ClickHouse sidecar dependency), and troubleshooting guidance. |
| Admin backup configuration guide<br>content/en/docs/next/operations/services/managed-app-backup-configuration.md | New admin-facing guide explains how cluster administrators configure data-only backups using BackupClass and driver-specific backup strategies. Covers prerequisites (controllers, S3-compatible storage, upstream operators), the BackupJob lifecycle, and detailed per-driver setup sections with example manifests and required tenant Secrets for Postgres (CNPG barman), MariaDB (mariadb-operator dumps), ClickHouse (Altinity sidecar), and FoundationDB (blob credentials with backup_agent). Concludes with apply/verification steps and handoff guidance for tenants. |
| Deprecation notices and scope clarification<br>content/en/docs/next/applications/postgres.md, content/en/docs/next/applications/mariadb.md, content/en/docs/next/applications/foundationdb.md, content/en/docs/next/applications/clickhouse.md, content/en/docs/next/operations/services/velero-backup-configuration.md, content/en/docs/next/virtualization/backup-and-recovery.md | Adds deprecation warnings to Postgres, MariaDB, FoundationDB, and ClickHouse application docs indicating that chart-level backup.* configuration and legacy restore flows are superseded by the new BackupClass framework. Updates Velero and VM backup documentation to clarify scope boundaries: Velero handles VM-level snapshots (HelmRelease/CRs/PVCs) while the new framework handles operator-native data-only backups for managed applications. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Poem

🐰 In the warren of backups, we hop and we store,
With BackupClass magic and S3 doors,
Each database bundled, from Postgres to Found,
Now tenants and admins can rest safe and sound!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding documentation guides related to managed application backups across multiple files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/apps-backups-guides

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new data-only backup and recovery framework for managed databases (Postgres, MariaDB, ClickHouse, and FoundationDB) in Cozystack, transitioning from legacy chart-level configurations to a centralized system using BackupClass and strategy-based resources. It provides detailed documentation for both tenants and administrators. Reviewers suggested several improvements, including standardizing FoundationDB account names for better multi-tenancy, enhancing security by using non-root users in backup pods, and clarifying S3 endpoint terminology to prevent configuration errors.

ENDPOINT_FULL=$(jq -r .spec.secretS3.endpoint /tmp/bucket.json)
ENDPOINT_HOSTPORT=${ENDPOINT_FULL#http://}
ENDPOINT_HOSTPORT=${ENDPOINT_HOSTPORT#https://}
ACCOUNT_NAME="${ACCESS_KEY}@${ENDPOINT_HOSTPORT}"

medium

Using the dynamic $ACCESS_KEY as part of the ACCOUNT_NAME makes it difficult for administrators to pre-configure a cluster-scoped BackupClass, as they would need to know each tenant's access key in advance.

It is recommended to use a fixed, descriptive account name (e.g., fdb-backup) and ensure it matches the accountName parameter defined by the administrator in the BackupClass. This allows the same BackupClass to be used by multiple tenants with their own credentials.

Suggested change
-ACCOUNT_NAME="${ACCESS_KEY}@${ENDPOINT_HOSTPORT}"
+ACCOUNT_NAME="fdb-backup@${ENDPOINT_HOSTPORT}"
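
A minimal sketch of the suggested derivation, run end to end. The endpoint value below is a sample standing in for the `jq` lookup in the reviewed snippet, `example-ns` is a hypothetical namespace, and `fdb-backup` is only the reviewer's example name; whatever prefix is chosen has to match the `accountName` the administrator sets in the BackupClass.

```shell
#!/bin/sh
# Sample value standing in for:
#   jq -r .spec.secretS3.endpoint /tmp/bucket.json
# (example-ns is a hypothetical namespace)
ENDPOINT_FULL="http://seaweedfs-s3.example-ns.svc:8333"

# Strip an optional http:// or https:// scheme, leaving host:port.
ENDPOINT_HOSTPORT=${ENDPOINT_FULL#http://}
ENDPOINT_HOSTPORT=${ENDPOINT_HOSTPORT#https://}

# Fixed account prefix (the reviewer's example), independent of the
# tenant's access key, so one BackupClass can serve several tenants.
ACCOUNT_PREFIX="fdb-backup"
ACCOUNT_NAME="${ACCOUNT_PREFIX}@${ENDPOINT_HOSTPORT}"

echo "$ACCOUNT_NAME"
```

Because the prefix no longer depends on `$ACCESS_KEY`, the only per-environment part of the name is the endpoint host and port.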

compression: gzip
```

The `endpoint` is **path-style without scheme** (e.g. `seaweedfs-s3.<seaweedfs-namespace>.svc:8333` for the default in-cluster SeaweedFS — substitute the namespace where SeaweedFS is deployed in your environment). Drop the `tls` block entirely when the endpoint serves a publicly-trusted certificate.

medium

The term "path-style" in the context of S3 usually refers to the addressing mode (e.g., s3.amazonaws.com/bucket vs bucket.s3.amazonaws.com). Here, it seems to be used to describe the endpoint format (host and port without scheme). To avoid confusion, it might be clearer to explicitly state that the endpoint should contain only the host and port.

Suggested change
-The `endpoint` is **path-style without scheme** (e.g. `seaweedfs-s3.<seaweedfs-namespace>.svc:8333` for the default in-cluster SeaweedFS — substitute the namespace where SeaweedFS is deployed in your environment). Drop the `tls` block entirely when the endpoint serves a publicly-trusted certificate.
+The `endpoint` should be provided as **host:port without scheme** (e.g. `seaweedfs-s3.<seaweedfs-namespace>.svc:8333` for the default in-cluster SeaweedFS — substitute the namespace where SeaweedFS is deployed in your environment). Drop the `tls` block entirely when the endpoint serves a publicly-trusted certificate.
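
To make the "host:port without scheme" wording concrete, here is a small sketch of a pre-flight check a guide reader could run on the value before putting it into a manifest. The function name `is_hostport` is hypothetical, and the check is deliberately simple: it does not handle bracketed IPv6 literals.

```shell
#!/bin/sh
# Return 0 when $1 looks like host:port with no scheme and no path.
is_hostport() {
  case "$1" in
    *://* | */*) return 1 ;;   # a scheme or a path component is present
  esac
  host=${1%:*}                 # text before the last colon
  port=${1##*:}                # text after the last colon
  [ "$host" != "$1" ] || return 1                  # no colon at all
  [ -n "$host" ] && [ -n "$port" ] || return 1     # empty host or port
  case "$port" in
    *[!0-9]*) return 1 ;;      # port must be numeric
  esac
  return 0
}

for candidate in \
  "seaweedfs-s3.example-ns.svc:8333" \
  "http://seaweedfs-s3.example-ns.svc:8333"; do
  if is_hostport "$candidate"; then
    echo "ok:       $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```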

cpu: 200m
memory: 256Mi
securityContext:
runAsUser: 0

medium

Running as root (runAsUser: 0) is generally discouraged for security reasons and should not be necessary for the FoundationDB backup agent. It is recommended to use the default FoundationDB user ID 4059 to adhere to the principle of least privilege.

Suggested change
-runAsUser: 0
+runAsUser: 4059

kind: FoundationDB
name: foundationdb-data-strategy
parameters:
accountName: "<api_key>@<endpoint-host>:<port>"

medium

Instead of using a placeholder that implies a dynamic per-tenant value (which is hard to manage in a cluster-scoped resource), it is better to use a fixed account name that matches the instructions provided to tenants in their guide.

Suggested change
-accountName: "<api_key>@<endpoint-host>:<port>"
+accountName: "fdb-backup@<endpoint-host>:<port>"

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
content/en/docs/next/applications/backup-and-recovery.md (1)

36-43: ⚡ Quick win

Consider adding a language identifier to the fenced code block.

The output example is missing a language identifier. Adding text or console would satisfy linters and improve rendering:

-```
+```text
 NAME                      AGE
 postgres-data-backup      14m
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/en/docs/next/applications/backup-and-recovery.md` around lines 36 -
43, The fenced code block in the backup-and-recovery.md example lacks a language
identifier; update the triple-backtick fence around the output listing (the
block showing NAME / AGE entries like "postgres-data-backup" and "velero") to
include a language such as "text" or "console" (e.g., change ``` to ```text) so
linters/renderers recognize it as plain output.
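
One way to find every such block before a linter does is sketched below. This is an illustrative helper rather than part of the PR or of any linter: the function name `find_unlabelled_fences` is hypothetical, and it assumes fences are plain three-backtick lines that are never nested. The fence marker is built with an octal escape so the script itself contains no literal fence.

```shell
#!/bin/sh
# Build the three-backtick marker without typing it literally.
FENCE=$(printf '\140\140\140')

# Print "file:line" for every opening fence that has no language identifier.
find_unlabelled_fences() {
  awk -v fence="$FENCE" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # tolerate indented fences
      if (index(line, fence) != 1) next     # not a fence line
      if (in_block) { in_block = 0; next }  # closing fence
      in_block = 1                          # opening fence
      if (line == fence) print FILENAME ":" FNR
    }
  ' "$1"
}

# Sample input: one unlabelled block followed by one labelled block.
sample=$(mktemp)
printf '%s\n' "$FENCE" 'NAME   AGE' "$FENCE" "${FENCE}text" 'ok' "$FENCE" > "$sample"

result=$(find_unlabelled_fences "$sample")
echo "$result"
```

Pointing the function at `content/en/docs/next/applications/backup-and-recovery.md` instead of the sample file would list the fences that still need `text` or `console`.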

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f871a89b-ab72-46ba-b44a-b1b590bb3620

📥 Commits

Reviewing files that changed from the base of the PR and between f466d34 and 4efa0e7.

📒 Files selected for processing (8)
  • content/en/docs/next/applications/backup-and-recovery.md
  • content/en/docs/next/applications/clickhouse.md
  • content/en/docs/next/applications/foundationdb.md
  • content/en/docs/next/applications/mariadb.md
  • content/en/docs/next/applications/postgres.md
  • content/en/docs/next/operations/services/managed-app-backup-configuration.md
  • content/en/docs/next/operations/services/velero-backup-configuration.md
  • content/en/docs/next/virtualization/backup-and-recovery.md

@androndo androndo force-pushed the feat/apps-backups-guides branch from 4efa0e7 to 0b9ead0 Compare May 13, 2026 18:46
Signed-off-by: Andrey Kolkov <androndo@gmail.com>
@androndo androndo force-pushed the feat/apps-backups-guides branch from 0b9ead0 to 237d627 Compare May 13, 2026 18:46
@lllamnyp
Member

NOT LGTM

Reviewing against main. Single commit 237d627 adds two new guides plus cross-link callouts in four existing application docs.

Primary blocker — tenant guide is unrunnable as a tenant

content/en/docs/next/applications/backup-and-recovery.md instructs tenant users to do two things their roles do not permit:

  1. Read the BucketInfo Secret (lines 78–83):
    `kubectl -n tenant-user get secret bucket-db-backups-backup -o jsonpath='{.data.BucketInfo}'`
  2. Create per-driver credential Secrets (lines 92–95, 100–102, 107–110, 135–137):
    `kubectl -n tenant-user create secret generic my-postgres-cnpg-backup-creds …`
    `kubectl -n tenant-user create secret generic my-mariadb-mariadb-backup-creds …`
    `kubectl -n tenant-user create secret generic my-mariadb-mariadb-backup-ca …`
    `kubectl -n tenant-user create secret generic my-fdb-fdb-backup-creds …`

Neither is permitted by the aggregated tenant roles. Verified against packages/system/cozystack-basics/templates/clusterroles.yaml in cozystack/cozystack:

  • cozy:tenant:base and cozy:tenant:view:base grant nothing on the core API group's secrets resource.
  • cozy:tenant:use:base adds core.cozystack.io/tenantsecrets: get/list/watch — that is TenantSecret, not bare Secret.
  • cozy:tenant:admin:base escalates write verbs only on apps.cozystack.io/* resources; still nothing on secrets.
  • cozy:tenant:super-admin:base is apps.cozystack.io/*: * and kubevirt.io/virtualmachines: * — again, no secrets.

Meanwhile the COSI flow in packages/apps/bucket/templates/bucketclaim.yaml materialises <release>-<user> as a plain v1.Secret via BucketAccess.credentialsSecretName. That Secret carries BucketInfo — and tenants have no read verb on it.

So the tenant guide is broken at step one for every driver except ClickHouse (which routes through chart values, not a hand-rolled Secret). A user following this guide on a real cluster sees 403 on the very first kubectl get secret and never reaches the BackupJob.
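
The denial can be checked without touching any Secret by asking the API server directly. The sketch below wraps `kubectl auth can-i` with impersonation; the function names are hypothetical, and the identity passed in has to be a subject that is actually bound to one of the tenant roles on the cluster under test.

```shell
#!/bin/sh
# Ask the API server whether an impersonated identity may perform a verb
# on a resource. kubectl auth can-i exits 0 for "yes" and non-zero for "no".
# Usage: tenant_can_i <namespace> <identity> <verb> <resource>
tenant_can_i() {
  kubectl auth can-i "$3" "$4" --namespace "$1" --as "$2"
}

# Report on the two operations the tenant guide relies on:
# reading the bucket Secret and creating credential Secrets.
# Usage: report_secret_access <namespace> <identity>
report_secret_access() {
  for verb in get create; do
    if tenant_can_i "$1" "$2" "$verb" secrets >/dev/null 2>&1; then
      echo "$verb secrets: allowed"
    else
      echo "$verb secrets: denied"
    fi
  done
}
```

For example, `report_secret_access tenant-user system:serviceaccount:tenant-user:example-sa`, where `example-sa` is a placeholder ServiceAccount name. If the role analysis above holds, both lines should come back as denied.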

This is not a wording fix. The doc as written exposes a missing piece in the implementation: there is no tenant-visible path from a Bucket application to per-driver backup credentials. The Bucket app is itself a tenant-managed resource, so the most natural shape is one where a tenant who has provisioned a Bucket can point a strategy / BackupClass at it (or at a TenantSecret-shaped projection of its credentials) and never has to materialise or even read a bare Secret themselves. The exact mechanism — strategy/controller bridging BucketInfo to the operator-expected key shape, a tenant-visible projection from the Bucket app, or something else entirely — is your call. Whatever shape you land on, the corresponding text in this PR (the `Read the bucket credentials` block and every `kubectl create secret` snippet) needs to be replaced to match.

Verification

End-to-end test should run the tenant guide as a tenant ServiceAccount (one of `cozy:tenant:admin` / `cozy:tenant:use`), impersonated via `--as`, against a real cluster, and complete a Postgres / MariaDB / FoundationDB backup and restore without ever using a cluster-admin kubeconfig. A reproducible CI variant lives in `cozystack/cozystack/examples/backups//run-all.sh`; those scripts currently assume cluster-admin and would need a parallel `run-all-as-tenant.sh` variant under the chosen ServiceAccount.
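
As a rough starting point for such a variant, every `kubectl` call in the guide could be routed through one impersonating wrapper, so that no step silently falls back to the cluster-admin identity. The function name, variable names, and default ServiceAccount below are hypothetical and are not taken from the existing example scripts.

```shell
#!/bin/sh
# Hypothetical defaults; substitute a ServiceAccount that is bound to
# cozy:tenant:admin or cozy:tenant:use on the test cluster.
TENANT_NAMESPACE="${TENANT_NAMESPACE:-tenant-user}"
TENANT_IDENTITY="${TENANT_IDENTITY:-system:serviceaccount:tenant-user:example-sa}"

# Run any kubectl command in the tenant namespace under the impersonated
# identity. All arguments are passed through unchanged.
kubectl_as_tenant() {
  kubectl --namespace "$TENANT_NAMESPACE" --as "$TENANT_IDENTITY" "$@"
}
```

Replaying the first guide step as `kubectl_as_tenant get secret bucket-db-backups-backup` would then surface the 403 described above instead of succeeding under a cluster-admin kubeconfig.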

Secondary issues (correctness)

These are real, but each one sits inside text that will move or vanish when the primary blocker is addressed. Worth folding into the rewrite rather than spot-patching now.

  1. Alert callouts in four autogenerated files will be wiped on next sync. `content/en/docs/next/applications/{postgres,clickhouse,mariadb,foundationdb}.md` each carry the `Autogenerated content. Don't edit this file directly` footer and a matching `_include/.md` stub; bodies are rebuilt from upstream READMEs by `hack/update_apps.sh`. The first `make update-apps` invoked by the upstream `cozystack/cozystack` `tags.yaml` workflow will delete the four warning blocks. The recommended-flow text the callouts add is already present in the upstream READMEs (verified for all four), so the safest move is to drop the in-place edits and rely on the link from the new tenant guide.
  2. FoundationDB strategy template tells admins to run `backup_agent` as root. `managed-app-backup-configuration.md:264` sets `runAsUser: 0`. The canonical example (`cozystack/cozystack/examples/backups/foundationdb/01-create-strategy.sh`) uses `runAsUser: 4059` to match the FoundationDB process UID; `backup_agent` doesn't need root. Change to `4059` (and add `runAsGroup: 4059`).
  3. Bucket-readiness wait is too narrow. `backup-and-recovery.md:65` waits only on the `HelmRelease`; the upstream example (`examples/backups/clickhouse/03-create-bucket.sh`) additionally waits for the `bucketclaim` (`.status.bucketReady=true`) and the `bucketaccess` (`.status.accessGranted=true`). The last of these, the `bucketaccess` wait, is the one that guarantees credentials are populated. Likely moot in any shape where the tenant doesn't hand-read the Secret, but flagging in case the rewrite still surfaces a wait step.
  4. Fabricated validation error string. `backup-and-recovery.md:139` quotes `accountName is required` as the controller's error text; that exact string does not exist anywhere under `internal/backupcontroller/`. Either quote the real message produced by the strategy controller when `parameters.accountName` is empty, or drop the parenthetical.
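
For item 3, if a wait step survives the rewrite, the two additional waits could look roughly like the sketch below. The status fields are the ones named above; the function name, the resource names passed in, and the default timeout are assumptions, and the upstream script may phrase the waits differently.

```shell
#!/bin/sh
# Wait until the bucket is provisioned and its access has been granted.
# Usage: wait_for_bucket <namespace> <bucketclaim> <bucketaccess> [timeout]
wait_for_bucket() {
  ns=$1
  claim=$2
  access=$3
  timeout=${4:-300s}   # assumed default, adjust to the environment

  # First the claim must report the bucket as ready ...
  kubectl --namespace "$ns" wait "bucketclaim/$claim" \
    --for=jsonpath='{.status.bucketReady}'=true --timeout="$timeout" &&
  # ... then the access object must report that credentials were granted.
  kubectl --namespace "$ns" wait "bucketaccess/$access" \
    --for=jsonpath='{.status.accessGranted}'=true --timeout="$timeout"
}
```

These two waits would run after the existing `HelmRelease` wait, not instead of it.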

Out of scope / fine as-is

  • `velero-backup-configuration.md` and `virtualization/backup-and-recovery.md` cross-link callouts — not autogenerated (no `_include` stub, no autogen footer), safe to edit in place. Link targets resolve.
  • Menu weights (4 in `applications/`, 31 in `operations/services/`) slot in cleanly next to the existing entries.
  • CRD kinds (`CNPG`, `MariaDB`, `Altinity`, `FoundationDB`) and plurals (`cnpgs`, `mariadbs`, `altinities`, `foundationdbs`) match the definitions in `packages/system/backupstrategy-controller/definitions/`.
  • The `backups.cozystack.io/owned-by.BackupJobName` label key in the troubleshooting section matches `OwningJobNameLabel` in `api/backups/v1alpha1/backupjob_types.go` and is applied by every strategy controller.
  • `BackupClass` cluster scope is correct per the CRD.
  • The GitHub link to `examples/backups/clickhouse/01-create-strategy.sh` resolves.
