Connecting Redshift to Churney

Suela Isaj, Jonas Dahlbæk | September 18, 2023 · 11 min

Here you can read our guide for connecting your Redshift data warehouse with Churney.

In this guide, we will go over the steps for setting up permissions for Churney to access your Redshift cluster, using hashed views and minimal permissions. On a high level, Churney will need:

A Churney user for the Redshift cluster with read permissions only on the secure views

An s3 bucket for Churney to unload the data

Access permissions to connect to your Redshift cluster through a static IP

What kind of data is required?

The short answer is as much as possible. The long answer is that Churney requires data about:

  • Payments

  • Trials (if applicable)

  • User demographic (if available)

  • User activity

Additionally, we need to know the location (region) of your data warehouse.

Create hashed views

For context, Facebook has a guide on how to hash contact information for their conversion api: https://developers.facebook.com/docs/marketing-api/conversions-api/parameters/customer-information-parameters. Google has a similar guide for enhanced conversions using their ads api https://developers.google.com/google-ads/api/docs/conversions/enhance-conversions. Basically, we want to hash columns that contain data which would allow one to determine the identity of a user: First name, last name, birth day, street address, phone number, email address, ip address.

For this task, you can either ask Churney to help you create the hashed views, or decide to create them yourself. If you would like Churney’s help, follow up the procedure of creating secure views through a script, otherwise, jump into an example of creating secure views

Create secure views through a script

Open the query editor and set up the working environment, choose the correct cluster and database above:

Run the following query where <your_schema_name_here> belongs to the schema with the tables you would like to share with Churney.

SELECT 
   t.table_catalog, t.table_schema, t.table_name, c.column_name, c.data_type
FROM 
   information_schema.tables t inner join information_schema.columns c
ON t.table_catalog = c.table_catalog
   AND t.table_schema = c.table_schema
   AND t.table_name = c.table_name
WHERE t.table_schema = '<your_schema_name_here>'  
  ORDER BY t.table_name;

Export the result, either as json or csv and share it with Churney

We will come back to you with some create view scripts to automatically create the views and hash the data.

In the case of json fields, if there is private information in those fields like email, phone, last name, first name, etc, then we should skip the column from the view, and unpack the elements we need. The private information will also be included in the view for us, but we can modify the create view statement to unpack and hash the private field. In these cases, what we need from you is the name of the json column that has private information, and then the key that we should search for to unpack and hash. For the rest of the json fields that do not have any private user identifier, we can read them as they are.

Example of creating secure views 

Consider this setting, under database dev there is a schema named private with your raw data, containing sensitive data. We will need to create a new schema for Churney with secure views that read from private

  1. Create a new schema for Churney in the context database dev

    CREATE SCHEMA churney;

  2. Create secure views by hashing the sensitive information. For example, if the data contains names, emails, phone numbers, etc, like below in the table sensitive_data

    Then, we will hash these columns in the view:

    CREATE OR REPLACE VIEW AS dev.churney.sensitive_data as
    
     SELECT
         sha2(lower(trim(name)), 256) name,
         sha2(lower(trim(email)), 256) email,
         sha2(ltrim(trim(regexp_replace(phone_number, '[ +()-]', '')), '0'), 256) phone_number,
         sha2('+' || ltrim(trim(regexp_replace(phone_number, '[ +()-]', '')), '0'), 256) international_phone_number,
         sha2(to_char(birthday, 'YYYYMMDD'), 256) as birthday,
         date,
         cast("time" as text),
         bid_price,
         ask_price,
         bid_size,
         ask_size
    
    FROM dev.private.sensitive_data;

And the view should look like this:

Make sure the sensitive data is hashed correctly and the non-sensitive data is on plain text. Note that we cast “time” to text (in bold above), because Redshift does not support unloading time data in the parquet format

If the table doesn’t contain any private data, then the view can list the plan column names.

When you are done with creating all the views, it is time to create the Churney user

Create Churney user

3. Create a Churney user and grant access to the data. In the running example, you would replace <your-schema> with private and <churney-schema> with Churney

The below permissions mean that we are giving the user Churney access to select data from the view, but since the view refers to the raw data in private, we need to grant usage for private as well. Note that Churney can ONLY select from the view, and not from the raw data. The below permissions are detailed in the AWS documentation https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html

CREATE USER churney password disable; 

GRANT USAGE ON SCHEMA <your-schema> TO churney; 

GRANT USAGE ON SCHEMA <churney-schema> TO churney; 

GRANT SELECT ON ALL TABLES IN SCHEMA <churney-schema> TO churney; 

Allow access for the Churney IP to your redshift cluster 

Churney will give you a static ip, that you will use in the below step. For the purpose of this example, we will use 12.123.123.123

4. Go to https://<region-name>.console.aws.amazon.com/vpc/home?region=<region-name>#SecurityGroups: where region-name is the region of your cluster to create a VPC security group and click on Create security group 

5. Create a VPC security group in the VPC of your Redshift cluster and add an inbound rule for the the Churney static ip 12.123.123.123

Choose Type = Redshift and put the Churney static ip (in our example 12.123.123.123) in the box where the arrow is below. You will see the ip is added when it is added where the red box is.6. We will add this security group to the Redshift cluster. For that, go to your Redshift cluster and click on PropertiesAnd then go to Network and security settings and click on EditAnd finally add the new security group. Make sure you Turn on Publicly accessible

S3 bucket to unload into 

7. Create an s3 bucket in the region of your Redshift cluster that we can unload data into, so go to https://s3.console.aws.amazon.com/s3/buckets?region=<region-name>  where <region-name> is the same as your Redshift cluster region. Create a bucket with a self-explanatory name, like churney-redshift-unload

 

8. Add a lifecycle rule to the bucket, to delete the content after 7 days. Go go to the bucket you created above and click on ManagementMake sure you have the below selections: 

IAM Policies to allow for Churney to extract data from Redshift 

9. Create IAM policy ChurneyRedshiftS3 to allow Churney to access the S3 bucket you created in Step 7. Churney should have access to unload the data in this bucket. Replace bucket-name with the name of the bucket you chose in Step 7, in our example, that is churney-redshift-unload

To find where you create policies, go to https://{region-name}.console.aws.amazon.com/iamv2/home#/policies where region-name is the region of your Redshift cluster and click on Create Policy

{ 
  "Version": "2012-10-17", 
  "Statement": [ 
  { 
    "Effect": "Allow", 
    "Action": [ 
    "s3:ListBucket", 
    "s3:ListBucketMultipartUploads", 
    "s3:ListMultipartUploadParts", 
    "s3:PutObject", 
    "s3:GetObject", 
    "s3:GetBucketLocation" 
   ], 
  "Resource": [ 
    "arn:aws:s3:::<bucket-name>", 
    "arn:aws:s3:::<bucket-name>/*" 
   ] 
  } 
 ] 
} 

10. Create an IAM policy to allow Churney to unload data from Redshift. We will limit these permissions to the user you created before and to the database with the hashed views that you created for Churney.

To find where you create policies, go to https://<region-name>.console.aws.amazon.com/iamv2/home#/policies where region-name is the region of your Redshift cluster and click on Create Policy

{ 
   "Version": "2012-10-17", 
   "Statement": [
     { 
       "Effect": "Allow", 
       "Action": "redshift:DescribeClusters", 
       "Resource": "*" 
     }, 
     { 
       "Effect": "Allow", 
       "Action": "redshift:GetClusterCredentials", 
       "Resource": [ 
    "arn:aws:redshift:<region>:<account-id>:dbuser:<redshift-cluster>/churney",               "arn:aws:redshift:<region>:<account-id>:dbname:<redshift-cluster>/<database-name>" 
       ] 
     } 
  ] 
}

11. Create a role for Churney that will have the two policies created above and a trust relationship. Go to https://<region-name>.console.aws.amazon.com/iamv2/home#/roles where region-name is the region where Redshift is and click on Create role. For the purpose of the example, the name of the role is churney-redshift-role

Choose Custom Trust Policy and put the trust policy as in the code snippet below. The way we will use this trust policy is all described in this link and follows the best practices https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html. In id_1 and id_2 you will put two ids that you will get from Churney. These are the service accounts used from us to transfer data to our google cloud.

Attach the policies you created in Step 9 and Step 10

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "accounts.google.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "accounts.google.com:sub": [
                        "<id_1>",
                        "<id_2>"
                    ]
                }
            }
        }
    ]
}

12. What to share with Churney?

  • The endpoint that can be found when you go to the details of the cluster, as in the picture below. The endpoint is inside the red circle

  • The user that you created in Step 3

  • The port number (can be found at the endpoint, usually 5439)

  • The arn role of Step 11, you can find it again if you go to IAM, and then Roles, and click on the role name