After walking through the most important moments in the history of AWS, we presented several ways to adjust the configuration settings of an AWS client. In this post, we continue along the same line and discuss how to handle failed requests.
We know it sounds like a boring topic, and we know AWS has countless other subjects that are more engaging and more helpful in the context of your application. Still, we have noticed that these low-level details are not sufficiently publicized, since many AWS users focus only on integrating a web service into their application to meet an immediate need. And the users who are now familiar with these concepts often learned them only after being burnt by default configuration values.
When you use web services, you interact with a lot of components and standards, from network cables and switches to DNS servers, from routing algorithms to different encoding and serialization models, and each of them increases the chance that something goes wrong during the lifetime of a request. Often, the best way to handle a failed call is to retry it. Of course, there are many reasons a call may fail, and the right action depends on the root cause.
For example, many AWS services impose hard limits on how often specific operations can be called, and if that rate is exceeded, a throttling exception is thrown. In this case, retrying the call is a short-term solution that may solve the problem, but beyond that you should take action to avoid similar situations in the future. Suppose you are using Amazon Simple Email Service and you are throttled when you call the verifyEmailAddress/verifyDomainIdentity operations. In this case, we advise you to revise your application logic, since it is not normal to call these operations so frequently, and even if the calls succeed, you will soon reach the maximum number of verified identities. In the same scenario, if your sendEmail calls are throttled, you should request an increase of your sending limits.
The AWS client can be configured to retry throttled operations in two different ways:
- Via the already familiar ClientConfiguration object (by default, this flag is set to false)
ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setUseThrottleRetries(true);
- Via a system property that you pass when you start the JVM
For each operation, every AWS service documents a list of errors and their associated HTTP status codes. These error codes (and the associated exceptions in the SDKs) help users determine whether the abnormal situation was caused by the client or by the service not operating normally. It is very important to clarify this, because it determines which actions should be taken next:
- If the error (usually with a 4XX status code) is caused by the user input (missing or duplicate fields, incorrect data format), the actual setup (non-existent resources, wrong credentials), or insufficient privileges, the result will be the same every time you repeat the call. In this case, you have to correct your request before retrying it.
- On the other hand, if the error reports that the service is unavailable (usually with a 5XX status code) or another unwanted situation occurred (timeout, broken connection), it is a good idea to retry your call.
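To make the distinction above concrete, here is a minimal sketch in plain Java of how such a decision could look. It does not use the AWS SDK: the ServiceError class, its fields, and the isRetryable helper are hypothetical names we introduce only for illustration, and a real application would inspect the SDK's own exception types instead.

```java
import java.io.IOException;

class RetryDecision {

    /** Hypothetical stand-in for a service error carrying an HTTP status code. */
    static class ServiceError extends RuntimeException {
        private final int statusCode;
        private final boolean throttled;

        ServiceError(String message, int statusCode, boolean throttled) {
            super(message);
            this.statusCode = statusCode;
            this.throttled = throttled;
        }

        int getStatusCode() { return statusCode; }
        boolean isThrottled() { return throttled; }
    }

    /** Returns true when repeating the same call has a chance to succeed. */
    static boolean isRetryable(Throwable error) {
        if (error instanceof IOException) {
            // Timeouts and broken connections are transient by nature.
            return true;
        }
        if (error instanceof ServiceError) {
            ServiceError e = (ServiceError) error;
            if (e.isThrottled()) {
                // Throttling is reported as a client error, but a later retry may pass.
                return true;
            }
            // 5XX: the service itself failed, so the same request may work later.
            return e.getStatusCode() >= 500 && e.getStatusCode() <= 599;
        }
        // Anything else (including other 4XX errors) needs a corrected request.
        return false;
    }
}
```

With this sketch, a 503 response or a socket timeout is retried, while a 400 caused by a malformed request is not.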
Each AWS SDK comes with support for retrying failed operations. This mechanism can be configured via the ClientConfiguration object by specifying the maximum number of retries and the retry policy. Besides retrying immediately, you can choose to delay subsequent calls (exponential backoff), so that the wait grows progressively longer between successive failed attempts.
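To illustrate what exponential backoff means, here is a small sketch in plain Java. It is not the SDK's implementation: the BackoffRetry class, its method names, and the parameter values are assumptions made for this example. The delay doubles after every failed attempt and is capped at a maximum value.

```java
import java.util.concurrent.Callable;

class BackoffRetry {

    /** Delay before the retry that follows failed attempt number `attempt` (0-based). */
    static long delayMillis(int attempt, long baseMillis, long maxDelayMillis) {
        // base * 2^attempt; the shift is capped so large attempt counts cannot overflow it.
        long factor = 1L << Math.min(attempt, 30);
        return Math.min(maxDelayMillis, baseMillis * factor);
    }

    /**
     * Runs the operation and retries it up to maxRetries times.
     * For brevity, every exception is retried; a real wrapper should
     * first check that the error is worth retrying.
     */
    static <T> T call(Callable<T> operation, int maxRetries,
                      long baseMillis, long maxDelayMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted, surface the last failure
                }
                Thread.sleep(delayMillis(attempt, baseMillis, maxDelayMillis));
            }
        }
    }
}
```

With a base of 100 ms and a cap of 20 seconds, the waits in this sketch are 100 ms, 200 ms, 400 ms, 800 ms and so on, until they reach the cap.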
If you don’t want to use the SDK retry mechanism, you are free to build your own wrapper to handle these operations. This can be useful if you want enhanced logging and monitoring that the SDK does not provide. For example, if you use Java, consider using a proxy mechanism to handle these operations. The aspect-oriented programming paradigm also solves the problem in a very elegant way.
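As a rough sketch of the proxy idea, the example below uses the JDK's java.lang.reflect.Proxy to wrap any interface with retry and logging. The EmailSender interface and the RetryingProxy class are hypothetical names chosen for illustration, not part of the AWS SDK, and for brevity the sketch retries every failure immediately, without a delay.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Hypothetical service interface, used only to demonstrate the wrapper. */
interface EmailSender {
    String send(String to);
}

class RetryingProxy implements InvocationHandler {
    private final Object target;
    private final int maxRetries;

    private RetryingProxy(Object target, int maxRetries) {
        this.target = target;
        this.maxRetries = maxRetries;
    }

    /** Wraps target so that every call made through iface is retried on failure. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, int maxRetries) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new RetryingProxy(target, maxRetries));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        for (int attempt = 0; ; attempt++) {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Reflection wraps the real failure; unwrap it first.
                Throwable cause = e.getCause();
                // This is the place for custom logging and monitoring.
                System.err.println("Call to " + method.getName()
                        + " failed (attempt " + (attempt + 1) + "): " + cause);
                if (attempt >= maxRetries) {
                    throw cause; // retries exhausted
                }
            }
        }
    }
}
```

A caller would obtain the wrapped object with something like `EmailSender sender = RetryingProxy.wrap(realSender, EmailSender.class, 3);` and then use it exactly like the original one, which keeps the retry logic out of the business code.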
Even though this post did not cover all the details, we hope we made it clear that failures are unavoidable in a distributed environment and that, in some situations, retrying the failed calls is a good solution.
Please don’t hesitate to send us your thoughts and to share these ideas with anyone who is interested in this topic!