Connecting Redshift to Churney
Here you can read our guide for connecting your Redshift data warehouse with Churney.
In this guide, we will go over the steps for setting up permissions for Churney to access your Redshift cluster, using hashed views and minimal permissions. At a high level, Churney will need:
• A Churney user for the Redshift cluster with read permissions only on the secure views
• An S3 bucket for Churney to unload the data into
• Permission to connect to your Redshift cluster from a static Churney IP
What kind of data is required?
The short answer is: as much as possible. The long answer is that Churney requires data about:
• Payments
• Trials (if applicable)
• User demographics (if available)
• User activity
Additionally, we need to know the location (region) of your data warehouse.
Create hashed views
For context, Facebook has a guide on how to hash contact information for their Conversions API: https://developers.facebook.com/docs/marketing-api/conversions-api/parameters/customer-information-parameters. Google has a similar guide for enhanced conversions using their Ads API: https://developers.google.com/google-ads/api/docs/conversions/enhance-conversions. In short, we want to hash columns that contain data which would allow one to determine the identity of a user: first name, last name, birth date, street address, phone number, email address, IP address.
For this task, you can either ask Churney to help you create the hashed views, or create them yourself. If you would like Churney's help, follow the procedure for creating secure views through a script below; otherwise, jump to the example of creating secure views.
Create secure views through a script
Open the query editor, set up the working environment, and choose the correct cluster and database.
Run the following query, where <your_schema_name_here> is the schema containing the tables you would like to share with Churney:
SELECT
    t.table_catalog, t.table_schema, t.table_name, c.column_name, c.data_type
FROM
    information_schema.tables t
    INNER JOIN information_schema.columns c
        ON t.table_catalog = c.table_catalog
        AND t.table_schema = c.table_schema
        AND t.table_name = c.table_name
WHERE t.table_schema = '<your_schema_name_here>'
ORDER BY t.table_name;
Export the result as JSON or CSV and share it with Churney.
We will come back to you with CREATE VIEW scripts that create the views and hash the data automatically.
In the case of JSON fields: if a field contains private information such as email, phone, last name, or first name, we should not expose the raw column in the view; instead, the CREATE VIEW statement can be modified to unpack the elements we need and hash the private ones. In these cases, what we need from you is the name of the JSON column that holds private information, and the key(s) we should look up to unpack and hash. JSON fields that do not contain any private user identifier can be read as they are.
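For illustration, here is a hedged sketch of what such a view could look like. The table dev.private.events, its JSON column payload, and the keys email and plan are hypothetical names for this example:

CREATE OR REPLACE VIEW dev.churney.events AS
SELECT
    event_id,
    event_time,
    -- hypothetical private key: unpack it and hash it with the same
    -- normalization used for the other hashed columns
    sha2(lower(trim(json_extract_path_text(payload, 'email'))), 256) AS email,
    -- hypothetical non-private key: unpack it as plain text
    json_extract_path_text(payload, 'plan') AS plan
FROM dev.private.events;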
Example of creating secure views
Consider the following setup: under the database dev there is a schema named private holding your raw data, which contains sensitive columns. We will need to create a new schema for Churney with secure views that read from private.
Create a new schema for Churney in the database dev:
CREATE SCHEMA churney;
Create secure views by hashing the sensitive information. For example, if the data contains names, emails, phone numbers, etc., as in the table sensitive_data, we will hash these columns in the view:
CREATE OR REPLACE VIEW dev.churney.sensitive_data AS
SELECT
    sha2(lower(trim(name)), 256) AS name,
    sha2(lower(trim(email)), 256) AS email,
    sha2(ltrim(trim(regexp_replace(phone_number, '[ +()-]', '')), '0'), 256) AS phone_number,
    sha2('+' || ltrim(trim(regexp_replace(phone_number, '[ +()-]', '')), '0'), 256) AS international_phone_number,
    sha2(to_char(birthday, 'YYYYMMDD'), 256) AS birthday,
    date,
    CAST("time" AS text) AS "time",
    bid_price,
    ask_price,
    bid_size,
    ask_size
FROM dev.private.sensitive_data;
Query the view and make sure the sensitive data is hashed correctly and the non-sensitive data is in plain text. Note that we cast "time" to text, because Redshift does not support unloading TIME data in the Parquet format.
If a table doesn't contain any private data, then the view can simply list the plain column names.
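For instance, a minimal sketch for a hypothetical table dev.private.transactions with no private columns:

-- no private data, so the columns are exposed as they are
CREATE OR REPLACE VIEW dev.churney.transactions AS
SELECT transaction_id, amount, currency, created_at
FROM dev.private.transactions;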
When you are done creating all the views, it is time to create the Churney user.
Create Churney user
3. Create a Churney user and grant it access to the data. In the running example, you would replace <your-schema> with private and <churney-schema> with churney.
The permissions below give the churney user access to select data from the views; since the views refer to the raw data in private, we also need to grant usage on private. Note that churney can ONLY select from the views, not from the raw data. These permissions are detailed in the AWS documentation: https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html
CREATE USER churney PASSWORD DISABLE;
GRANT USAGE ON SCHEMA <your-schema> TO churney;
GRANT USAGE ON SCHEMA <churney-schema> TO churney;
GRANT SELECT ON ALL TABLES IN SCHEMA <churney-schema> TO churney;
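As an optional sanity check, a superuser can impersonate the churney user and verify that the views are readable while the raw tables are not. A sketch using the running example:

SET SESSION AUTHORIZATION 'churney';
SELECT count(*) FROM dev.churney.sensitive_data;  -- should succeed
SELECT count(*) FROM dev.private.sensitive_data;  -- should fail with permission denied
RESET SESSION AUTHORIZATION;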
Allow access from the Churney IP to your Redshift cluster
Churney will give you a static IP, which you will use in the steps below. For the purpose of this example, we will use 12.123.123.123.
4. Go to https://<region-name>.console.aws.amazon.com/vpc/home?region=<region-name>#SecurityGroups: where <region-name> is the region of your cluster, and click on Create security group.
5. Create a VPC security group in the VPC of your Redshift cluster and add an inbound rule for the Churney static IP. Choose Type = Redshift and enter the Churney static IP (in our example, 12.123.123.123) as the source. The new rule will appear in the list of inbound rules once it is added.
6. Add this security group to the Redshift cluster. Go to your Redshift cluster and click on Properties, then go to Network and security settings and click on Edit. Finally, add the new security group and make sure you turn on Publicly accessible.
S3 bucket to unload into
7. Create an S3 bucket in the region of your Redshift cluster that we can unload data into. Go to https://s3.console.aws.amazon.com/s3/buckets?region=<region-name> where <region-name> is the same as your Redshift cluster region, and create a bucket with a self-explanatory name, like churney-redshift-unload.
8. Add a lifecycle rule to the bucket to delete its content after 7 days. Go to the bucket you created above, click on Management, and create a lifecycle rule that expires objects after 7 days.
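If you prefer to set this up programmatically, the following is a sketch of an equivalent lifecycle configuration, applied for example with the AWS CLI command aws s3api put-bucket-lifecycle-configuration; the rule ID is only a suggestion:

{
    "Rules": [
        {
            "ID": "expire-churney-unloads",
            "Status": "Enabled",
            "Filter": {},
            "Expiration": {
                "Days": 7
            }
        }
    ]
}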
IAM policies to allow Churney to extract data from Redshift
9. Create an IAM policy ChurneyRedshiftS3 to allow Churney to access the S3 bucket you created in Step 7. Churney should have access to unload the data into this bucket. Replace <bucket-name> with the name of the bucket you chose in Step 7; in our example, that is churney-redshift-unload.
To find where you create policies, go to https://<region-name>.console.aws.amazon.com/iamv2/home#/policies where <region-name> is the region of your Redshift cluster, and click on Create Policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket-name>",
                "arn:aws:s3:::<bucket-name>/*"
            ]
        }
    ]
}
10. Create an IAM policy to allow Churney to unload data from Redshift. We will limit these permissions to the user you created before and to the database with the hashed views that you created for Churney.
To find where you create policies, go to https://<region-name>.console.aws.amazon.com/iamv2/home#/policies where <region-name> is the region of your Redshift cluster, and click on Create Policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "redshift:DescribeClusters",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": [
                "arn:aws:redshift:<region>:<account-id>:dbuser:<redshift-cluster>/churney",
                "arn:aws:redshift:<region>:<account-id>:dbname:<redshift-cluster>/<database-name>"
            ]
        }
    ]
}
11. Create a role for Churney that has the two policies created above and a trust relationship. Go to https://<region-name>.console.aws.amazon.com/iamv2/home#/roles where <region-name> is the region where Redshift is, and click on Create role. For the purpose of the example, the name of the role is churney-redshift-role.
Choose Custom trust policy and paste the trust policy from the code snippet below. This setup follows the best practices described in https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html. In <id_1> and <id_2> you will put two IDs that you will get from Churney; these are the service accounts we use to transfer data to our Google Cloud.
Attach the policies you created in Step 9 and Step 10.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "accounts.google.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "accounts.google.com:sub": [
                        "<id_1>",
                        "<id_2>"
                    ]
                }
            }
        }
    ]
}
12. What to share with Churney?
• The cluster endpoint, which you can find on the details page of the cluster.
• The user that you created in Step 3.
• The port number (included in the endpoint, usually 5439).
• The ARN of the role from Step 11; you can find it again under IAM, then Roles, by clicking on the role name.
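To see how the pieces fit together: Churney's service accounts assume churney-redshift-role, obtain temporary database credentials via redshift:GetClusterCredentials, and unload the secure views into your bucket. Roughly, the UNLOAD statement looks like the sketch below; the S3 prefix and the placeholder credentials are illustrative:

-- unloads the view to Parquet files under the given S3 prefix
UNLOAD ('SELECT * FROM dev.churney.sensitive_data')
TO 's3://churney-redshift-unload/sensitive_data/'
ACCESS_KEY_ID '<temporary-access-key>'
SECRET_ACCESS_KEY '<temporary-secret-key>'
SESSION_TOKEN '<temporary-session-token>'
FORMAT AS PARQUET;

This is also why the views cast TIME columns to text: Redshift cannot unload TIME values to Parquet.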