Monitoring record imports into Redshift

Let’s imagine that we have an application where new entries are constantly loaded into Redshift. Recently we published a blog post presenting some best practices for data imports, and another one describing a way to organize tables if your use case involves frequent data imports. But enough with the recap, […]

How to handle the Redshift non-idempotency problem when loading data

As you probably know, constraints in Redshift (uniqueness, primary key, foreign key, not null) are informational only. This means that if you insert an entry twice into a table with a defined primary key, that table will contain the entry twice. Now, let’s imagine the following scenario: your application follows the recommendations […]
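
Since the excerpt is truncated, the post's own solution isn't shown here; below is a minimal sketch of one common way to keep repeated loads from producing duplicates, using a staging table and a delete-then-insert step in a single transaction. The table and column names (events, events_staging, event_id) and the connection details are hypothetical placeholders.

```python
# Sketch: idempotent load into Redshift despite unenforced primary keys.
# Assumes new rows have already been loaded into events_staging.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="analytics",
    user="loader",
    password="secret",
)

with conn:  # psycopg2 commits the transaction on clean exit
    with conn.cursor() as cur:
        # Remove target rows that are about to be re-inserted, so re-running
        # the same load does not create duplicates.
        cur.execute("""
            DELETE FROM events
            USING events_staging
            WHERE events.event_id = events_staging.event_id;
        """)
        # Move the fresh rows from staging into the target table.
        cur.execute("INSERT INTO events SELECT * FROM events_staging;")
        # Clear staging for the next batch.
        cur.execute("DELETE FROM events_staging;")

conn.close()
```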

An efficient approach to organizing tables in Redshift

In a previous post, we presented several ideas for improving data loading into Redshift. Today we are going to discuss an approach to organizing data in Redshift that scales to billions of entries without affecting read performance. To summarize the entire flow and requirements in a few words: entries are copied from S3 several times per hour […]
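
As a rough illustration of the kind of load described above (the full flow is in the linked post), here is a short sketch of a COPY from S3 issued a few times per hour. The bucket, file, table, and IAM role names are hypothetical placeholders, not taken from the post.

```python
# Sketch: periodic COPY of a compressed batch file from S3 into Redshift.
import psycopg2

COPY_SQL = """
    COPY entries_batch
    FROM 's3://my-bucket/incoming/batch.gz'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    GZIP
    DELIMITER '|';
"""

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="analytics",
    user="loader",
    password="secret",
)

with conn:
    with conn.cursor() as cur:
        cur.execute(COPY_SQL)  # Redshift pulls the file directly from S3

conn.close()
```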

Tips for loading data into Redshift

The official description of AWS Redshift starts with: “a fast, fully managed, petabyte-scale data warehouse”. Our experience with Redshift confirms this description, with a single but very important caveat: in order to really see the advantages and the incredible power of this service, we had to put into action some solutions that at first seemed unimportant. […]