DAOS-18613 container: Reduce false -DER_NO_HDLs #18082

Open
liw wants to merge 1 commit into master from liw/pool-hdl-lookup

@liw liw commented Apr 22, 2026

If a request (e.g., a CONT_OID_ALLOC) arrives while the local pool is still recovering pool handles, the request's ds_pool_hdl_lookup call may return a false -DER_NO_HDL. This patch lets the client retry the operation in this case, by looking up the ds_pool and checking if the handle recovery is done.
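
The server-side check described above might be sketched roughly as follows. This is not the code in the patch: `struct sk_pool`, `sk_pool_hdl_check`, the `hdls_recovered` flag, and the `SK_*` status codes are all hypothetical stand-ins, and the status the real patch returns to trigger a client retry is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder status codes; real DAOS values come from daos_errno.h. */
enum {
	SK_OK        = 0,
	SK_NO_HDL    = -1, /* stands in for -DER_NO_HDL */
	SK_TRY_AGAIN = -2  /* hypothetical retryable status (assumption) */
};

/* Hypothetical, heavily simplified stand-in for a ds_pool. */
struct sk_pool {
	bool                hdls_recovered; /* true once handle recovery is done */
	const unsigned int *hdls;           /* handles known so far */
	size_t              nhdls;
};

/*
 * Look up a handle. A miss is only reported as "no such handle" when
 * recovery has finished; otherwise the caller is asked to retry.
 */
static int
sk_pool_hdl_check(const struct sk_pool *pool, unsigned int hdl)
{
	size_t i;

	assert(pool != NULL);
	for (i = 0; i < pool->nhdls; i++)
		if (pool->hdls[i] == hdl)
			return SK_OK;

	return pool->hdls_recovered ? SK_NO_HDL : SK_TRY_AGAIN;
}
```

The point of the sketch is only the last line: the same lookup miss maps to two different results depending on whether handle recovery has completed.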

The cont_oid_alloc_complete function currently retries only upon certain crt errors. This patch refactors the function so that it also retries upon certain daos errors.
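
As a rough illustration of the retry decision described above, a completion callback could classify both the transport-level result and the operation-level result. The names below (`struct rs_reply`, `rs_should_retry`, and the `RS_*` codes) are hypothetical; the actual set of retryable crt and daos errors is defined by the patch, not by this sketch.

```c
#include <stdbool.h>

/* Placeholder codes; the real crt and daos error values differ. */
enum {
	RS_OK                 = 0,
	RS_TRANSPORT_TIMEDOUT = -100, /* assumed retryable transport error */
	RS_TRANSPORT_UNREACH  = -101, /* assumed retryable transport error */
	RS_OP_TRY_AGAIN       = -200, /* assumed retryable operation error */
	RS_OP_NO_HDL          = -201  /* genuine failure, not retried */
};

/* Hypothetical reply: one status from the transport, one from the operation. */
struct rs_reply {
	int transport_rc;
	int op_rc;
};

/* Decide whether the client should resend the request. */
static bool
rs_should_retry(const struct rs_reply *reply)
{
	/* A transport failure means the operation status is not meaningful. */
	if (reply->transport_rc != RS_OK)
		return reply->transport_rc == RS_TRANSPORT_TIMEDOUT ||
		       reply->transport_rc == RS_TRANSPORT_UNREACH;

	/* The RPC was delivered; retry only on the retryable operation status. */
	return reply->op_rc == RS_OP_TRY_AGAIN;
}
```

Keeping the decision in one predicate makes it easy to extend the retryable set without touching the resend path.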

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

github-actions Bot commented Apr 22, 2026

Ticket title is 'daos_test/suite.py:DaosCoreTest.test_daos_drain_simple - DRAIN16: 0x16 != 0 daos_drain_simple.c:890: error: Failure!'
Status is 'In Progress'
Labels: '2.6.4-aurora.p5,ci_2.6_daily,pr_test,scrubbed_2.8'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-18613

@github-actions github-actions Bot added the "priority" label (Ticket has high priority, automatically managed) Apr 22, 2026
@liw liw force-pushed the liw/pool-hdl-lookup branch from 68aee8a to 905a570 Compare April 23, 2026 00:20
Base automatically changed from liw/cont_op_in to master April 23, 2026 23:56
@liw liw changed the title from "DAOS-18613 pool: Reduce false -DER_NO_HDLs" to "DAOS-18613 container: Reduce false -DER_NO_HDLs" Apr 24, 2026
@liw liw force-pushed the liw/pool-hdl-lookup branch from 905a570 to 05a3787 Compare April 24, 2026 00:05
@daosbuild3
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18082/5/execution/node/1184/log

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18082/5/testReport/

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18082/6/execution/node/1374/log

If a request (e.g., a CONT_OID_ALLOC) arrives when the local pool is
still recovering pool handles, the request's ds_pool_hdl_lookup call may
return a false -DER_NO_HDL. This patch lets the client retry the
operation in this case, by looking up the ds_pool and checking if the
handle recovery is done.

The cont_oid_alloc_complete function only retries upon certain crt
errors. This patch has to refactor the function a bit to also retry upon
certain daos errors.

Features: container
Signed-off-by: Li Wei <liwei@hpe.com>
@liw liw force-pushed the liw/pool-hdl-lookup branch from d75c4b7 to 979e957 Compare May 11, 2026 00:28
@liw liw marked this pull request as ready for review May 11, 2026 04:57
@liw liw requested review from a team as code owners May 11, 2026 04:57
@daosbuild3
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18082/7/execution/node/1384/log

Labels

priority Ticket has high priority (automatically managed)

2 participants