Responses API `previous_response_id` returns not_found_error — storage appears unimplemented on inference.do-ai.run

## Summary

The serverless inference endpoint at `https://inference.do-ai.run/v1/responses` accepts the OpenAI Responses API request/response *shape* (including the `store`, `background`, and `previous_response_id` fields), but the underlying response **storage appears to be unimplemented**:

- `store: true` is silently accepted (no error, HTTP 200, valid `resp_…` id returned)
- The returned id cannot be retrieved via `GET /v1/responses/{id}` (endpoint is not routed — returns a DigitalOcean "Maintenance" HTML page)
- Passing that id as `previous_response_id` on a subsequent `POST /v1/responses` returns `not_found_error`

The [SDK docs page for `responses.create`](https://gradientai-sdk.digitalocean.com/api/python/resources/responses/methods/create) describes `previous_response_id` as *"Previous response ID (for multi-turn conversations)"* — so the documentation contract suggests this should work end-to-end.

Tested 2026-05-20 against `model: kimi-k2.6`.

## Reproduction

```bash
# 1. Create a response with store:true
RESP1=$(curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $DO_MODEL_ACCESS_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model":"kimi-k2.6","input":"Remember: secret word is purple-marmot-42. Reply OK.","store":true}')

RESP1_ID=$(echo "$RESP1" | jq -r '.id')
echo "Got id: $RESP1_ID"
# → e.g. resp_a08a095f73d4b840  (HTTP 200, full payload returned)

# 2. Try to retrieve it
curl -sS -w '\nHTTP_CODE=%{http_code}\n' \
  "https://inference.do-ai.run/v1/responses/$RESP1_ID" \
  -H "Authorization: Bearer $DO_MODEL_ACCESS_KEY" | head -5
# → <!DOCTYPE html>
#   <title>DigitalOcean - Maintenance</title>
#   ...

# 3. Try to chain it
curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $DO_MODEL_ACCESS_KEY" \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"kimi-k2.6\",\"input\":\"What was the secret word I gave you?\",\"previous_response_id\":\"$RESP1_ID\"}"
# → {"error":{"code":null,"message":"Response with id 'resp_a08a095f73d4b840' not found.",
#    "param":null,"type":"not_found_error"}, ...}
```

## What the create-response payload returns

The POST response includes all the OpenAI Responses API fields, suggesting the request schema is generated from the OpenAI spec but the actual storage path isn't wired:

```
keys = [background, completed_at, created_at, error, frequency_penalty, id,
        incomplete_details, instructions, max_output_tokens, max_tool_calls,
        metadata, model, object, output, parallel_tool_calls, presence_penalty,
        previous_response_id, prompt_cache_key, reasoning, safety_identifier,
        service_tier, status, store, temperature, text, tool_choice, tools,
        top_logprobs, top_p, truncation, usage]
```

## What I'd expect

One of:

1. **Implement the storage** — `store: true` actually persists the response so it can be retrieved via `GET /v1/responses/{id}` and referenced via `previous_response_id`. This is the contract the SDK docs imply.
2. **Document the gap clearly** — call out in the docs that `previous_response_id` / `store` / `GET /v1/responses/{id}` are not yet supported on serverless inference, and that callers should manage conversation state client-side by including the full message history each turn. Ideally `store: true` should return an error or warning rather than silently accepting it.

## Impact

I was planning to chain multi-pass agentic verification via `previous_response_id` based on the SDK docs. Because storage doesn't actually work, the only option is to keep conversation state client-side — would have been useful to know upfront.

## Environment

- Endpoint: `https://inference.do-ai.run/v1/responses`
- Model: `kimi-k2.6` (also reproduces on `kimi-k2.5` — same response shape)
- Auth: `DO_MODEL_ACCESS_KEY` (model-access key from Gradient → Serverless Inference)
- Date observed: 2026-05-20

Happy to provide a saved trace or run additional probes if useful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Responses API `previous_response_id` returns not_found_error — storage appears unimplemented on inference.do-ai.run #96

Summary

Reproduction

What the create-response payload returns

What I'd expect

Impact

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Responses API previous_response_id returns not_found_error — storage appears unimplemented on inference.do-ai.run #96

Description

Summary

Reproduction

What the create-response payload returns

What I'd expect

Impact

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Responses API `previous_response_id` returns not_found_error — storage appears unimplemented on inference.do-ai.run #96