About S3 Batch Operations

Update: On April 30, AWS announced that S3 Batch Operations is generally available. Most of what we wrote below is still valid.

Last year at re:Invent, AWS announced a very interesting feature: S3 Batch Operations. It was one of the most promising announcements we heard, and we were eager to test it.

What S3 Batch Operations is not

Well, our first thought was that this feature would let you put and get multiple objects in a single call in order to reduce your bill. You know… part of the S3 pricing model is the number of requests you make. SQS pricing, for example, is based on requests, but SQS gives you a mechanism to send and receive up to 10 messages in a single call. Yes, it's your responsibility to retry the messages that fail, but in the end you have a way to reduce your costs. If you're interested, we have a post on this subject.
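To make the SQS comparison concrete, here is a minimal Python sketch of the batching idea. It assumes the request and response shapes of boto3's `send_message_batch` (an `Entries` list of at most 10 items, each with an `Id` and a `MessageBody`; a response with `Successful` and `Failed` lists keyed by `Id`). The helper names `to_batches` and `failed_entries` are hypothetical, and the code only builds and inspects payloads, so it runs without AWS credentials.

```python
from typing import Any, Dict, List

MAX_BATCH_SIZE = 10  # SQS accepts at most 10 entries per SendMessageBatch call


def to_batches(bodies: List[str]) -> List[List[Dict[str, str]]]:
    """Split message bodies into SendMessageBatch-style entry lists of up to 10."""
    batches = []
    for start in range(0, len(bodies), MAX_BATCH_SIZE):
        chunk = bodies[start:start + MAX_BATCH_SIZE]
        # Each entry needs an Id that is unique within its own batch.
        batches.append(
            [{"Id": str(i), "MessageBody": body} for i, body in enumerate(chunk)]
        )
    return batches


def failed_entries(
    batch: List[Dict[str, str]], response: Dict[str, Any]
) -> List[Dict[str, str]]:
    """Return the entries of a batch that the response reports as failed.

    A batch call can partially succeed, so retrying these is up to the caller.
    """
    failed_ids = {item["Id"] for item in response.get("Failed", [])}
    return [entry for entry in batch if entry["Id"] in failed_ids]


if __name__ == "__main__":
    bodies = [f"message-{n}" for n in range(25)]
    batches = to_batches(bodies)
    # 25 messages need 3 requests instead of 25.
    print(len(bodies), "messages ->", len(batches), "requests")
    # With boto3 you would send each batch with:
    #   response = sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
    # Made-up response, for illustration only:
    example_response = {
        "Successful": [{"Id": "0"}],
        "Failed": [{"Id": "1", "SenderFault": False, "Code": "ExampleError"}],
    }
    print("to retry:", failed_entries(batches[0], example_response))
```

The request count drops roughly tenfold, which is the saving we were hoping to find in S3 as well; the price is the extra bookkeeping for partial failures.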

What S3 Batch Operations is

Compared with what we described above, S3 Batch Operations is something else entirely: a way to run a single operation on a list of objects stored in S3. That operation can be a predefined one (for example, copying objects from one bucket to another) or a Lambda function, which gives you a lot of flexibility.
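For the Lambda option, the service invokes your function with a batch of tasks and expects a result code for each task. Below is a minimal Python sketch, assuming the invocation schema version 1.0 documented at GA (a `tasks` list whose items carry `taskId`, `s3Key` and `s3BucketArn`, answered with `results` whose `resultCode` is `Succeeded`, `TemporaryFailure` or `PermanentFailure`). The per-object work in `process_object` is a hypothetical placeholder.

```python
from typing import Any, Dict


def process_object(bucket: str, key: str) -> str:
    """Hypothetical per-object work; replace with your own logic (e.g. boto3 calls)."""
    if not key:
        raise ValueError("empty key")
    return f"processed s3://{bucket}/{key}"


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    results = []
    for task in event["tasks"]:
        # The bucket arrives as an ARN, e.g. "arn:aws:s3:::my-bucket".
        bucket = task["s3BucketArn"].split(":::")[-1]
        key = task["s3Key"]
        try:
            result_string = process_object(bucket, key)
            result_code = "Succeeded"
        except Exception as exc:
            # Report this task as failed instead of failing the whole invocation.
            result_code = "PermanentFailure"
            result_string = str(exc)
        results.append(
            {"taskId": task["taskId"], "resultCode": result_code, "resultString": result_string}
        )
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }


if __name__ == "__main__":
    # Made-up event following the assumed 1.0 schema, for local testing.
    sample_event = {
        "invocationSchemaVersion": "1.0",
        "invocationId": "example-invocation-id",
        "job": {"id": "example-job-id"},
        "tasks": [
            {"taskId": "t1", "s3Key": "photos/cat.jpg", "s3VersionId": None,
             "s3BucketArn": "arn:aws:s3:::example-bucket"},
        ],
    }
    print(lambda_handler(sample_event, None))
```

Catching exceptions per task keeps one bad object from failing the rest of the batch; `TemporaryFailure` would be the choice for errors worth retrying.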

How to access it

As of this writing, the feature is not officially launched. If you want to use it, you have to request access to the preview. We were lucky: after several days, we received an email confirming our access.

What we liked

  • In the AWS console, this feature is integrated very nicely. For example, before launching a job, you are advised to check what costs that job could generate. You probably know this is something we have complained about before: if you set a lifecycle rule, you can get big surprises on your bill.
  • As always, integration with other services (such as IAM) makes your work easier.

What can be improved

  • Artifacts are difficult to use: the email confirming our access included a link to download a JAR containing the new APIs. Pretty rudimentary: no Maven, nothing. Surely this will be fixed once the feature is GA.
  • Documentation exists, but it lacks concrete examples. We are still trying to create a job using the Java API, and the exceptions are not very informative: they usually come back with empty messages.
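Since concrete examples were what we missed most, here is a minimal sketch of the parameters a job needs. It is written in Python rather than Java and assumes the request shape of boto3's `s3control` `create_job` operation after GA (`Operation`, `Manifest`, `Report`, `Priority`, `RoleArn`, `ConfirmationRequired`, `ClientRequestToken`). The function name and all account IDs, ARNs and the ETag are hypothetical placeholders, and the code only builds the request, so it runs without AWS access.

```python
import uuid
from typing import Any, Dict


def build_create_job_request(
    account_id: str,
    function_arn: str,
    manifest_arn: str,
    manifest_etag: str,
    report_bucket_arn: str,
    role_arn: str,
    priority: int = 10,
) -> Dict[str, Any]:
    """Build keyword arguments for s3control.create_job with a Lambda operation."""
    return {
        "AccountId": account_id,
        # Require a confirmation in the console before the job runs.
        "ConfirmationRequired": True,
        "Operation": {"LambdaInvoke": {"FunctionArn": function_arn}},
        "Manifest": {
            # A CSV manifest where each line is "bucket,key".
            "Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket", "Key"]},
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "batch-reports",
            "ReportScope": "AllTasks",
        },
        "Priority": priority,
        "RoleArn": role_arn,
        # Idempotency token so a retried request does not create a second job.
        "ClientRequestToken": str(uuid.uuid4()),
    }


if __name__ == "__main__":
    request = build_create_job_request(
        account_id="111122223333",
        function_arn="arn:aws:lambda:us-east-1:111122223333:function:example-fn",
        manifest_arn="arn:aws:s3:::example-manifests/manifest.csv",
        manifest_etag="example-etag",
        report_bucket_arn="arn:aws:s3:::example-reports",
        role_arn="arn:aws:iam::111122223333:role/example-batch-role",
    )
    # With credentials configured, the request could then be submitted:
    #   import boto3
    #   boto3.client("s3control").create_job(**request)
    print(request["Operation"])
```

Keeping the request in a plain dictionary makes it easy to inspect before submitting, which helps when the service answers with an uninformative error.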

Overall, this feature looks interesting, even if it's not what we dreamed of. There are certainly cases where it's useful.

You know that your thoughts are welcome here!

Happy cloud computing!