In AWS DynamoDB, data are organized into tables. Even though DynamoDB is schemaless, every entry inserted into a table must contain the primary key defined for that table. Why is the primary key so important? Because DynamoDB uses it to distribute entries across the table's partitions. According to the official documentation, a table is split into multiple partitions based on two criteria:
• Partition size – one partition cannot hold more than 10 GB of data. Number of partitions = (table size in GB / 10), rounded up
• Provisioned capacity – one partition cannot serve more than 3,000 read capacity units or 1,000 write capacity units. Number of partitions = (read capacity units / 3000 + write capacity units / 1000), rounded up
The total number of partitions is the larger of the two values computed above.
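The two rules above can be sketched in a few lines of Python. This is a minimal sketch, assuming the per-partition limits described in the older DynamoDB documentation (10 GB of data, 3,000 read capacity units and 1,000 write capacity units per partition) and rounding up in both formulas. The constant and function names are hypothetical helpers, not part of any AWS SDK.

```python
import math

# Assumed legacy per-partition limits from the older DynamoDB documentation.
MAX_PARTITION_SIZE_GB = 10
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000


def partitions_by_size(table_size_gb: float) -> int:
    """Partitions needed to hold the data, rounded up."""
    return math.ceil(table_size_gb / MAX_PARTITION_SIZE_GB)


def partitions_by_capacity(rcu: int, wcu: int) -> int:
    """Partitions needed to serve the provisioned throughput, rounded up."""
    return math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION)


def estimate_partitions(table_size_gb: float, rcu: int, wcu: int) -> int:
    """Total partitions: the larger of the two estimates, never fewer than one."""
    return max(1, partitions_by_size(table_size_gb), partitions_by_capacity(rcu, wcu))


if __name__ == "__main__":
    print(estimate_partitions(12, rcu=0, wcu=100))     # 2 (size dominates)
    print(estimate_partitions(94, rcu=0, wcu=100))     # 10 (size dominates)
    print(estimate_partitions(5, rcu=6000, wcu=2000))  # 4 (capacity dominates)
```

Taking the maximum means a small but heavily provisioned table and a large but lightly provisioned table can both end up with many partitions, for different reasons.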
A very important aspect to mention here: once a table has been split, there is no mechanism to merge the partitions back together.
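Because partitions are never merged, the partition count behaves like a high-water mark. The toy model below illustrates this under the same assumed legacy limits (10 GB, 3,000 RCU, 1,000 WCU per partition) and the assumption that provisioned writes are divided evenly across partitions. `PartitionTracker` is a hypothetical class, and the bulk-load scenario in the usage lines is purely illustrative.

```python
import math

# Assumed legacy per-partition limits from the older DynamoDB documentation.
MAX_PARTITION_SIZE_GB = 10
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000


class PartitionTracker:
    """Hypothetical model of a table whose partition count can grow but never shrink."""

    def __init__(self) -> None:
        self.partitions = 1

    def update(self, table_size_gb: float, rcu: int, wcu: int) -> int:
        """Apply a new size/capacity and return the resulting partition count."""
        needed = max(
            math.ceil(table_size_gb / MAX_PARTITION_SIZE_GB),
            math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION),
        )
        # More partitions needed: split. Fewer needed: nothing happens (no merge).
        self.partitions = max(self.partitions, needed)
        return self.partitions

    def write_capacity_per_partition(self, wcu: int) -> float:
        """Provisioned writes divided evenly across the current partitions."""
        return wcu / self.partitions


if __name__ == "__main__":
    table = PartitionTracker()
    print(table.update(5, rcu=0, wcu=5000))  # 5 (e.g. capacity raised for a bulk load)
    print(table.update(5, rcu=0, wcu=500))   # 5 (capacity lowered, partitions stay)
    print(table.write_capacity_per_partition(500))  # 100.0
```

In this model, temporarily raising capacity leaves the table with five partitions, so after scaling back down each partition gets only 100 of the 500 provisioned writes per second.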
You are probably wondering why these internal details are so critical. Simple: they affect your application's performance. Concretely, DynamoDB divides the provisioned capacity evenly across all partitions. For example, if your table is provisioned for 100 writes per second but holds 12 GB of data, it has two partitions, so each partition can handle at most 50 writes per second. This is why you can sometimes see throttled requests even when the total number of operations is under the provisioned capacity: the requests are concentrated on partition keys stored in a single partition. In this specific case, increasing the provisioned capacity can fix the issue, but you will effectively use only half of the capacity you provisioned. It can be even worse: a table holding 94 GB has 10 partitions, so a workload that hits a single partition can use only one tenth of the provisioned capacity.
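The 12 GB and 94 GB examples above can be reproduced with a small calculation. This is a simplified sketch, assuming the legacy limits (10 GB, 3,000 RCU, 1,000 WCU per partition), an even split of provisioned writes across partitions, and no bursting. The function names are hypothetical, and "hot" traffic means writes whose partition keys all live in the same partition.

```python
import math

# Assumed legacy per-partition limits from the older DynamoDB documentation.
MAX_PARTITION_SIZE_GB = 10
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000


def partition_count(table_size_gb: float, rcu: int, wcu: int) -> int:
    """Larger of the size-based and capacity-based estimates, rounded up."""
    by_size = math.ceil(table_size_gb / MAX_PARTITION_SIZE_GB)
    by_capacity = math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION)
    return max(1, by_size, by_capacity)


def write_capacity_per_partition(wcu: int, table_size_gb: float, rcu: int = 0) -> float:
    """Provisioned writes divided evenly across all partitions."""
    return wcu / partition_count(table_size_gb, rcu, wcu)


def is_throttled(hot_writes_per_second: float, wcu: int, table_size_gb: float) -> bool:
    """True if writes aimed at a single partition exceed that partition's share."""
    return hot_writes_per_second > write_capacity_per_partition(wcu, table_size_gb)


if __name__ == "__main__":
    print(write_capacity_per_partition(100, 12))  # 50.0
    print(write_capacity_per_partition(100, 94))  # 10.0
    # 80 writes/s is below the 100 provisioned, but above one partition's 50.
    print(is_throttled(80, 100, 12))  # True
    # Doubling the provisioned capacity avoids throttling, at twice the cost.
    print(is_throttled(80, 200, 12))  # False
```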
The official DynamoDB documentation has an entire section of best practices, and almost all of them revolve around a single idea: how to choose an efficient partition key, whether it is a simple primary key or a composite one that also includes a sort key. The right choice depends on the problem you are trying to solve: time-series data, the types of relationships between entities, and so on. According to the documentation, the ideal partition key has a unique value for each table entry. In most cases this is not possible, but we should at least make a conscious decision and understand the possible problems.
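The effect of key cardinality can be illustrated by hashing keys onto a fixed number of partitions. This is a hypothetical sketch: DynamoDB's real hash function is internal, so MD5 is used here only as a stand-in, and the "status" and "order id" keys are invented examples of a low-cardinality and a unique partition key.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition (MD5 is a stand-in for DynamoDB's internal hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys: list[str], num_partitions: int) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    partitions = 10
    # Low-cardinality key: every request uses one of two status values.
    status_keys = ["ACTIVE" if i % 2 else "INACTIVE" for i in range(10_000)]
    # High-cardinality key: a unique order id per request.
    order_keys = [f"order-{i}" for i in range(10_000)]

    print("status:", sorted(load_per_partition(status_keys, partitions).values()))
    print("order id:", sorted(load_per_partition(order_keys, partitions).values()))
```

With the status key, all traffic lands on at most two of the ten partitions, so at most two tenths of the provisioned capacity is usable; with unique order ids, the load spreads roughly evenly across all ten.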
We'll wrap up this post by saying that the best practices section of the DynamoDB documentation is a must-read before using this technology in production. As always, we kindly invite you to share ideas or questions in a comment below!