Parallel processing in AWS

Let’s start by imagining the following scenario: a component produces information that has to be consumed by several other services, ideally in parallel. In addition, if a consumer is down, the information should be kept in a buffer until that consumer is back up.

What options does AWS provide?

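  • SNS is a good choice for broadcasting a message, but it does not persist it: a consumer that is down when the message is published can miss it
  • SQS persists messages, but each message in a queue is meant to be processed by a single consumer, so one queue cannot deliver the same message to several services in parallel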
  • Kinesis could be a choice, but it’s too complicated for our scenario.

So, what is the solution in that case?

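We can send the message to an SNS topic and subscribe one SQS queue per consumer to that topic. SNS delivers a copy of each message to every subscribed queue, and each queue holds its copy until its consumer reads it. Having the SNS topic in front also gives us the flexibility to add other types of consumers in the future, such as Lambda functions, API endpoints, etc.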
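As a rough illustration, the wiring described above could be sketched in Python as follows. This is a minimal sketch, not a complete setup: the function name fan_out and the topic and queue names are made up for the example, the clients are assumed to be boto3 SNS and SQS clients, and the queue-level permissions discussed later in this post are left out.

```python
def fan_out(sns, sqs, topic_name, queue_names):
    """Create one SNS topic and subscribe one SQS queue per consumer to it.

    `sns` and `sqs` are assumed to be boto3 clients, e.g. boto3.client("sns").
    Returns the topic ARN and a dict mapping queue name -> subscription ARN.
    """
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]

    subscriptions = {}
    for name in queue_names:
        # Each consumer gets its own queue, which acts as its private buffer.
        queue_url = sqs.create_queue(QueueName=name)["QueueUrl"]

        # SNS subscribes queues by ARN, not by URL, so look the ARN up.
        attrs = sqs.get_queue_attributes(
            QueueUrl=queue_url, AttributeNames=["QueueArn"]
        )
        queue_arn = attrs["Attributes"]["QueueArn"]

        sub = sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
        subscriptions[name] = sub["SubscriptionArn"]

    return topic_arn, subscriptions


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   sns, sqs = boto3.client("sns"), boto3.client("sqs")
#   topic_arn, subs = fan_out(sns, sqs, "order-events", ["billing", "shipping"])
#   sns.publish(TopicArn=topic_arn, Message="order created")
```

The producer only ever publishes to the topic, so adding a consumer later means adding one more queue and one more subscription, without touching the producer.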

What are the advantages of this solution?

  • Both the broadcast and the persistence requirements are met
  • You can add a large number of consumers
  • Consumers don’t have to be of the same type: we can have queues, Lambda functions, email addresses, API endpoints, etc
  • Delivery retries are handled by AWS, according to your setup.

Now let’s look at the possible downsides of this solution:

  • Extra cost added by the intermediate SNS layer
  • Setup is not trivial (at least in our opinion), since each queue needs an access policy that allows the topic to send messages to it, but this process is documented.
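To make the queue-level permissions more concrete, here is a hedged sketch of the kind of access policy a queue needs so that the topic can deliver to it. The helper names sns_to_sqs_policy and policy_attribute and the statement id are invented for this example; the policy shape (the SNS service principal, the sqs:SendMessage action, and a condition on aws:SourceArn) follows the pattern described in the AWS documentation, but check it against your own account setup before relying on it.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Build a queue access policy that lets one SNS topic send to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTopicToSend",  # arbitrary label chosen for this sketch
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict the permission to messages coming from this topic only.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


def policy_attribute(queue_arn, topic_arn):
    """Return the attributes dict expected by SQS set_queue_attributes."""
    return {"Policy": json.dumps(sns_to_sqs_policy(queue_arn, topic_arn))}


# Hypothetical usage with a boto3 SQS client (requires AWS credentials):
#   sqs.set_queue_attributes(
#       QueueUrl=queue_url,
#       Attributes=policy_attribute(queue_arn, topic_arn),
#   )
```

Without a policy like this, the subscription can exist and look healthy while no message ever reaches the queue, which is usually the first thing to check when the fan-out seems silent.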

This may not be a perfect solution, and in a future post we’ll come back with a different approach that could be more scalable. Until then, we’d like to hear your opinion on this idea and which other topics you’d like us to discuss here.