Summary
The serverless inference endpoint at https://inference.do-ai.run/v1/responses accepts the OpenAI Responses API request/response shape (including the store, background, and previous_response_id fields), but the underlying response storage appears to be unimplemented:
- store: true is silently accepted (no error, HTTP 200, valid resp_… id returned)
- The returned id cannot be retrieved via GET /v1/responses/{id} (the endpoint is not routed and returns a DigitalOcean "Maintenance" HTML page)
- Passing that id as previous_response_id on a subsequent POST /v1/responses returns not_found_error
The SDK docs page for responses.create describes previous_response_id as "Previous response ID (for multi-turn conversations)" — so the documentation contract suggests this should work end-to-end.
Tested 2026-05-20 against model: kimi-k2.6.
Reproduction
# 1. Create a response with store:true
RESP1=$(curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $DO_MODEL_ACCESS_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model":"kimi-k2.6","input":"Remember: secret word is purple-marmot-42. Reply OK.","store":true}')
RESP1_ID=$(echo "$RESP1" | jq -r '.id')
echo "Got id: $RESP1_ID"
# → e.g. resp_a08a095f73d4b840 (HTTP 200, full payload returned)

# 2. Try to retrieve it
curl -sS -w '\nHTTP_CODE=%{http_code}\n' \
  "https://inference.do-ai.run/v1/responses/$RESP1_ID" \
  -H "Authorization: Bearer $DO_MODEL_ACCESS_KEY" | head -5
# → <!DOCTYPE html>
#   <title>DigitalOcean - Maintenance</title>
#   ...

# 3. Try to chain it
curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $DO_MODEL_ACCESS_KEY" \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"kimi-k2.6\",\"input\":\"What was the secret word I gave you?\",\"previous_response_id\":\"$RESP1_ID\"}"
# → {"error":{"code":null,"message":"Response with id 'resp_a08a095f73d4b840' not found.",
#   "param":null,"type":"not_found_error"}, ...}
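Because step 2 returns an HTML page rather than a JSON error, any caller that pipes the body straight into jq fails with a parse error instead of a clean "not supported" signal. A minimal guard, as a sketch only (classify_body is a hypothetical helper name, not part of any SDK), could look at the first non-whitespace character before parsing:

```shell
# classify_body BODY: print "json", "html", or "other" based on the first
# non-whitespace character of BODY. Hypothetical helper for illustration.
classify_body() {
  first=$(printf '%s' "$1" | tr -d ' \t\r\n' | cut -c1)
  case "$first" in
    '{'|'[') echo json ;;   # looks like a JSON object or array
    '<')     echo html ;;   # e.g. the "Maintenance" page from step 2
    *)       echo other ;;
  esac
}

classify_body '<!DOCTYPE html><title>DigitalOcean - Maintenance</title>'
classify_body '{"error":{"type":"not_found_error"}}'
```

This is only a heuristic; checking the Content-Type response header would be a more robust signal where it is available.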
What the create-response payload returns
The POST response includes all the OpenAI Responses API fields, which suggests the schema is generated from the OpenAI spec while the storage path behind it isn't wired up:
keys = [background, completed_at, created_at, error, frequency_penalty, id,
incomplete_details, instructions, max_output_tokens, max_tool_calls,
metadata, model, object, output, parallel_tool_calls, presence_penalty,
previous_response_id, prompt_cache_key, reasoning, safety_identifier,
service_tier, status, store, temperature, text, tool_choice, tools,
top_logprobs, top_p, truncation, usage]
What I'd expect
One of:
- Implement the storage: store: true actually persists the response so it can be retrieved via GET /v1/responses/{id} and referenced via previous_response_id. This is the contract the SDK docs imply.
- Document the gap clearly: call out in the docs that previous_response_id / store / GET /v1/responses/{id} are not yet supported on serverless inference, and that callers should manage conversation state client-side by including the full message history each turn. Ideally, store: true should return an error or warning rather than being silently accepted.
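For reference, the client-side workaround mentioned in the second option might look roughly like the sketch below. It assumes the endpoint accepts input as an array of {role, content} items, as the OpenAI Responses API does; I have not verified that against inference.do-ai.run. HISTORY, append_turn, and build_payload are hypothetical names used only for illustration.

```shell
# Sketch: keep conversation state client-side instead of relying on
# previous_response_id. ASSUMPTION: `input` may be an array of
# {role, content} items (true for the OpenAI Responses API, unverified here).

HISTORY='[]'   # JSON array holding every turn so far

# append_turn ROLE TEXT: add one message to the history
append_turn() {
  HISTORY=$(printf '%s' "$HISTORY" | jq -c --arg role "$1" --arg content "$2" \
    '. + [{role: $role, content: $content}]')
}

# build_payload MODEL: print a request body carrying the full history,
# with store disabled since nothing is persisted server-side anyway
build_payload() {
  jq -c -n --arg model "$1" --argjson input "$HISTORY" \
    '{model: $model, input: $input, store: false}'
}

append_turn user "Remember: secret word is purple-marmot-42. Reply OK."
append_turn assistant "OK"
append_turn user "What was the secret word I gave you?"
build_payload "kimi-k2.6"
```

The printed payload could then be piped to the same curl call used in the reproduction with -d @-, and the assistant text from each reply appended via append_turn before the next turn.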
Impact
I was planning to chain multi-pass agentic verification via previous_response_id based on the SDK docs. Because storage doesn't actually work, the only option is to keep conversation state client-side, which would have been useful to know upfront.
Environment
- Endpoint: https://inference.do-ai.run/v1/responses
- Model: kimi-k2.6 (also reproduces on kimi-k2.5, same response shape)
- Auth: DO_MODEL_ACCESS_KEY (model-access key from Gradient → Serverless Inference)
- Date observed: 2026-05-20
Happy to provide a saved trace or run additional probes if useful.