🐛(backend) manage race condition between GET and PATCH content#2271
🐛(backend) manage race condition between GET and PATCH content#2271
Conversation
When a PATCH and a GET on the content endpoint are made at the same time for different users a race condition can happen and the metadata returned by the S3 head_object can be outdated when the object is fetched leading to an error raised because the Content-Length header does not match the size of the response body. To avoid this, we no longer used head_object followed bu get_object, we have to manage everything in one call with the get_object. The get_object also accepts as parameters an etag or last-modified header and will return a 304 if the content has not changed, so we can use this to not return the entire body if this one has not changed.
7bde43e to
bae3015
Compare
jmaupetit
left a comment
There was a problem hiding this comment.
I only have minor comments, else 👏
| if code in ("304", "PreconditionFailed", "NotModified"): | ||
| return drf_response.Response(status=status.HTTP_304_NOT_MODIFIED) | ||
| if code in ("NoSuchKey", "404"): | ||
| return StreamingHttpResponse(b"", content_type="text/plain", status=200) |
There was a problem hiding this comment.
What about using the amazing match/case syntax? 😍
| if code in ("304", "PreconditionFailed", "NotModified"): | |
| return drf_response.Response(status=status.HTTP_304_NOT_MODIFIED) | |
| if code in ("NoSuchKey", "404"): | |
| return StreamingHttpResponse(b"", content_type="text/plain", status=200) | |
| match code: | |
| case "304" | "PreconditionFailed" | "NotModified": | |
| return drf_response.Response(status=status.HTTP_304_NOT_MODIFIED) | |
| case "NoSuchKey" | "404": | |
| return StreamingHttpResponse(b"", content_type="text/plain", status=200) | |
| case _: | |
| raise |
There was a problem hiding this comment.
I thought it was not allowed by the Geneva Convention to use a switch in Python.
| # Refresh the metadata cache so future conditional requests can | ||
| # check them earlier |
There was a problem hiding this comment.
Maybe this would be good enough?
| # Refresh the metadata cache so future conditional requests can | |
| # check them earlier | |
| # Refresh the metadata cache |
| def _stream(file_key): | ||
| with default_storage.open(file_key, "rb") as f: | ||
| while chunk := f.read(8192): | ||
| while chunk := body.read(8192): |
There was a problem hiding this comment.
Do you think the chunk size should be configurable?
There was a problem hiding this comment.
Also to avoid using a while loop, you can use iter_chunks:
| while chunk := body.read(8192): | |
| for chunk in body.iter_chunks(chunk_size): |
There was a problem hiding this comment.
the iter_chunks is doing the same thing. I have no strong opinion. I would prefer to have a solution to have an async generator.
| while chunk := body.read(8192): | ||
| yield chunk | ||
| finally: | ||
| body.close() |
There was a problem hiding this comment.
Are you sure the try/finally pattern is required here to close the http response stream (as you return a generator)?
There was a problem hiding this comment.
Nop I don't really know. I keep it from the previous implementation.
|
|
||
| def _request(**headers): | ||
| """Build a request with the given HTTP headers.""" | ||
| return APIRequestFactory().get("/", headers=headers) |
There was a problem hiding this comment.
Why not declaring a pytest fixture instead?
| assert if_none_match == '"deadbeef"' | ||
| assert if_modified_since_dt == dt.datetime( | ||
| 2015, 10, 21, 7, 28, 0, tzinfo=dt.timezone.utc | ||
| ) |
There was a problem hiding this comment.
All tests here could be configured as one single test using indirect parametrization of the proposed fixture.
Purpose
When a PATCH and a GET on the content endpoint are made at the same time
for different users a race condition can happen and the metadata returned
by the S3 head_object can be outdated when the object is fetched leading
to an error raised because the Content-Length header does not match the
size of the response body. To avoid this, we no longer used head_object
followed bu get_object, we have to manage
everything in one call with the get_object. The get_object also accepts
as parameters an etag or last-modified header and will return a 304 if
the content has not changed, so we can use this to not return the entire
body if this one has not changed.
Proposal