Proposal: Enable retries for non-idempotent operations using client provided "logical request" tokens. #209

Open
johanste opened this issue Jul 9, 2020 · 8 comments

@johanste
Contributor

johanste commented Jul 9, 2020

Enable retries for non-idempotent operations using client provided "logical request" tokens.

Not all HTTP methods are guaranteed to be idempotent. This presents a challenge when a request fails or the client receives no response: is the request safe to retry or not?

A client-provided idempotency token/key/value can be used by a service to detect duplicated messages. A client would provide the same value for the idempotency token for each retried request.

Across some Microsoft services, x-ms-client-request-id [1] has been used for this purpose.

Unfortunately, many Azure services have been using the same header for a different purpose (arbitrary correlation between client and server side events/telemetry).

Changing the meaning of the existing header would be a massively breaking change for clients that assumed that the header had no semantic value for the service.

Other services have similar constructs to facilitate safe retries of non-idempotent requests: AWS [2], Stripe [3]

Header name

A new optional Idempotency-Token header is introduced. Its value MUST be a GUID (using the canonical text representation: all lower case, without curly braces).

Example:

Idempotency-Token: 475a5eef-de54-4bd1-97a1-f28d0f0146e0
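A minimal sketch of how a client might produce a value in this format, and how either side might validate one. The function names are hypothetical and not part of the proposal; the only assumption is that Python's `uuid.uuid4()` is an acceptable GUID source, since its string form is already lower case with hyphens and no braces.

```python
import re
import uuid

# Canonical form described in the proposal: lower-case hex digits in
# 8-4-4-4-12 groups, without curly braces.
_CANONICAL_GUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def new_idempotency_token() -> str:
    """Return a fresh token (hypothetical helper, not a real SDK API)."""
    return str(uuid.uuid4())


def is_valid_idempotency_token(value: str) -> bool:
    """Check that a header value uses the canonical GUID text form."""
    return _CANONICAL_GUID.fullmatch(value) is not None
```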

Service guidance

  • Services SHOULD support the Idempotency-Token header for non-idempotent requests.

  • Services that do not understand or support the Idempotency-Token header MUST ignore the header and process the request as normal. This is consistent with the general "ignore headers you don't understand" guidance for services.

  • Services MUST be able to detect duplicate requests made within 24 hours of each other. Services MAY prune accepted Idempotency-Token header values after 24 hours.

  • Services SHOULD repeat the original response when a duplicated request is detected.

  • The service should check the Idempotency-Token for duplicates before it performs conditional request checks.

This applies to what are normally idempotent operations as well. For example, a retried DELETE request without an Idempotency-Token may succeed the first time and return a 404 on subsequent retries, whereas it would have continued to return success (200/204) with an Idempotency-Token.
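The service guidance above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not a proposed implementation: `IdempotencyCache`, `handle`, and the `(status, body)` response shape are hypothetical, the store is a single-process dictionary, and it does not address the crash and concurrency cases a real service has to handle.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

RETENTION_SECONDS = 24 * 60 * 60  # the 24-hour window from the proposal

Response = Tuple[int, str]  # (status code, body); hypothetical shape


@dataclass
class StoredResponse:
    response: Response
    stored_at: float


class IdempotencyCache:
    """Replays the original response for a repeated token."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._seen: Dict[str, StoredResponse] = {}

    def handle(self, token: Optional[str],
               operation: Callable[[], Response]) -> Response:
        now = self._clock()
        # Tokens older than the retention window MAY be pruned.
        self._seen = {t: s for t, s in self._seen.items()
                      if now - s.stored_at < RETENTION_SECONDS}
        if token is None:
            return operation()  # no header: process as normal
        stored = self._seen.get(token)
        if stored is not None:
            return stored.response  # duplicate: repeat the original response
        response = operation()
        self._seen[token] = StoredResponse(response, now)
        return response
```

With a DELETE handler that returns 204 on the first call and 404 afterwards, two calls to `handle` with the same token both return 204, while a call without a token after the first delete returns 404.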

Client guidance

  • A client MUST NOT rely on the service supporting the Idempotency-Token header when retrying requests unless the service is explicitly documented as supporting it.

  • A client MUST NOT change request (query, body or header) parameters between requests using the same Idempotency-Token value.

  • A client MUST NOT reuse the same idempotency token value for a different logical request within 24 hours.
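A rough sketch of a retry loop that follows the client guidance: one token is generated per logical request and the same headers and body are sent on every attempt. The `send` callable is a hypothetical stand-in for whatever transport the client uses, and treating `ConnectionError` as the retryable failure is an assumption made for illustration. Retrying this way is only safe against a service that is documented as supporting the header.

```python
import uuid
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body); hypothetical shape


def send_with_retries(send: Callable[[Dict[str, str], bytes], Response],
                      body: bytes,
                      max_attempts: int = 3) -> Response:
    """Send one logical request, reusing the same token on every retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generated once per logical request, never per attempt.
    headers = {"Idempotency-Token": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            # Pass a copy so the transport cannot alter headers between attempts.
            return send(dict(headers), body)
        except ConnectionError as error:
            last_error = error  # no response received; retry unchanged
    raise last_error
```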

References

Footnotes

  1. https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-ncnbi/817da997-30d2-4cd3-972f-a0073e4e98f7

  2. https://aws.amazon.com/blogs/aws/new-amazon-ec2-feature-idempotent-instance-creation/

  3. https://stripe.com/docs/api/idempotent_requests

@JeffreyRichter
Contributor

I think you should limit this to POST operations only as PUT, DELETE, GET, HEAD, OPTIONS, & TRACE are supposed to be idempotent already as per the HTTP specification. If you want to be able to send this header for these other methods, then the service should just ignore it. But, I think it better to just limit this to POST.

To implement this robustly, the service needs to update its table of GUIDs AND perform the POST operation as an atomic/transacted unit. In other words, the service is not fault tolerant if it adds the GUID to its table and then crashes before performing the POST operation. If this happens, the client retries, the service thinks the operation is already done, and never actually creates anything. The reverse is also problematic (but perhaps less so): the service performs the POST operation and then crashes before it adds the GUID to its table. Now, the service doesn't know the operation was done and if the client retries, the operation is performed twice - this is similar to 2 PUTs in a row.
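One way to picture the atomic unit described here is a single database transaction that records the token and performs the create together. The sketch below uses SQLite purely for illustration; the `tokens` and `orders` tables and the function names are hypothetical, and a real service would likely need the same guarantee from whatever store it already uses.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (token TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
    )
    conn.commit()
    return conn


def create_order(conn: sqlite3.Connection, token: str, item: str) -> bool:
    """Record the token and create the order in one transaction.

    Returns True if the order was created, False if the token was already
    recorded (a duplicate request).
    """
    try:
        # The connection context manager commits if the block succeeds and
        # rolls back if it raises, so both inserts land or neither does.
        with conn:
            conn.execute("INSERT INTO tokens (token) VALUES (?)", (token,))
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        return True
    except sqlite3.IntegrityError:
        return False  # primary-key conflict: token seen before
```

Because the token row and the order row are committed together, a failure between the two statements leaves neither behind, so a retry is processed as a fresh request rather than being mistaken for a completed one.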

Having a time limit is great (like 24 hours) because the guid table shouldn't grow without bounds. Of course, this will increase latency and for some services, this may cause the table to become quite big. We need something here; maybe we can reduce the 24hr limit to 12hr or something - the exact time limit can be debated later. I think it best if each service can't decide its own time limit.

@johanste
Contributor Author

johanste commented Jul 9, 2020

@JeffreyRichter, PATCH is missing from your list of idempotent methods above - I assume that this was an oversight? I'd be fine with limiting the usage to methods that are defined as not being guaranteed idempotent in the HTTP spec. The client-visible delta between honoring it and ignoring it from an idempotent operation is (mostly) negligible.

And, yes, "cleaning up torn operations" (the service had time to register that the operation started, but somehow it never marked it as completed) needs some design work. This is also true if a client makes two (almost) concurrent identical requests with the same idempotency token value where the service has not completed the first request.

@johanste
Contributor Author

johanste commented Jul 9, 2020

Regarding the time limit, I'd like to specify a minimum value at least across Azure. And, yes, I picked 24 hours somewhat at random - it's long enough that the vast majority of "normal"/intended usage is covered. But that would likely be true for 12hrs as well...

@tg-msft
Member

tg-msft commented Jul 9, 2020

A client MUST NOT change request (query, body or header) parameters between requests using the same Idempotency-Token value.

If we're using Shared Key for Storage, we'll sign for the Authorization header on each retry in case pathological settings push a request beyond the 15 minute window.

@johanste
Contributor Author

johanste commented Jul 9, 2020

A client MUST NOT change request (query, body or header) parameters between requests using the same Idempotency-Token value.

If we're using Shared Key for Storage, we'll sign for the Authorization header on each retry in case pathological settings push a request beyond the 15 minute window.

Yes. I was considering excluding the Authorization header from this requirement. This may be the right thing to do. But interestingly enough, the OASIS Repeatable Requests spec (that I was just made aware of) seems to have the same requirement of headers not changing (and thus the same limitation). I'll follow up with the authors there to see if they have any thoughts.

@garethj-msft
Member

Seems like if we're putting a requirement on the server to store significant data for 12/24 hours (returning the response for a repeat), then the server should be expected to throw a 4xx if the input parameters have changed, because adding a hash of the input is minimal extra burden. Agree that the auth header needs to be exempted. Seems like we should also say something about the server exposing that it supports idempotency tokens in a standard way in its OPTIONS response, to make it a matter of machine-readable record rather than documentation.
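A sketch of the "hash of the input" idea, assuming SHA-256 over the method, URL, non-exempt headers, and body. Everything here is illustrative: the function names, the `"new"`/`"replay"`/`"mismatch"` labels, and the choice to exempt only `Authorization` are assumptions, and a service using signed requests might need to exempt other per-attempt headers as well.

```python
import hashlib
import json
from typing import Dict

# Headers allowed to differ between retries (assumption for illustration).
EXEMPT_HEADERS = {"authorization"}


def request_fingerprint(method: str, url: str,
                        headers: Dict[str, str], body: bytes) -> str:
    """Hash the parts of a request that must not change between retries."""
    stable_headers = sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower() not in EXEMPT_HEADERS
    )
    prefix = json.dumps([method.upper(), url, stable_headers]).encode("utf-8")
    return hashlib.sha256(prefix + b"\n" + body).hexdigest()


def classify(seen: Dict[str, str], token: str, fingerprint: str) -> str:
    """Return "new", "replay", or "mismatch" for an incoming request.

    `seen` maps each token to the fingerprint stored with its first request.
    A service could answer "mismatch" with a 4xx response.
    """
    stored = seen.get(token)
    if stored is None:
        seen[token] = fingerprint
        return "new"
    return "replay" if stored == fingerprint else "mismatch"
```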

@johanste
Contributor Author

@garethj-msft, agreed - if it is feasible for the service to determine changes by the client, returning a 400 response if the client changed values would indeed be preferable.

Also agreed on OPTIONS including a signal on the service having/not having the capability.

Are there any suggestions on whether services should reject requests with a retry request id header (loosely using the OASIS terminology) if they don't support the header?

@garethj-msft
Member

I think if it supports the facility for some resources but not others it should 4xx. But given most servers won't understand the header, they will just ignore it. So this is really a behavior targeted at removing surprise for users who have had success with some resources and expect that the server will behave uniformly. When it doesn't behave uniformly, that is pretty surprising and thus worthy of an error IMO.


4 participants