
Suela Isaj is a Data and ML engineer at Churney. She holds a Joint PhD in Computer Science from Aalborg University and Université libre de Bruxelles in the field of spatial data extraction and entity resolution. Suela has previously worked as data engineer at Issuu, and data quality officer and risk system specialist at Raiffeisen Bank.

Suela Isaj

Data and ML Engineer


The short answer is as much as possible. The long answer is that Churney requires data about:

Additionally, we need to know the location (region) of your data warehouse.

To create the views, we would need to create queries to hash the PII columns.

For context, Facebook has a guide on how to hash contact information for their Conversions API (https://developers.facebook.com/docs/marketing-api/conversions-api/parameters/customer-information-parameters). Google has a similar guide for enhanced conversions using their Ads API (https://developers.google.com/google-ads/api/docs/conversions/enhance-conversions). Basically, we want to hash columns that contain data which would allow one to determine the identity of a user: first name, last name, birthday, street address, phone number, email address, etc.

If the raw data contains email, phone, birthday or other identifiers, exclude the raw columns from the view and include only their hashed versions.

CREATE OR REPLACE VIEW <your_project>.churney.sensitive_data_view as ( 
select
SHA2(LOWER(TRIM(email)), 256) as email,
SHA2(LOWER(TRIM(name)), 256) as name,
SHA2(LTRIM('0', regexp_replace(phone, '[()-+-]', '')), 256) as phone,
SHA2(date_format(birthday, 'yyyyMMdd'), 256) as birthday,
* EXCEPT(email, name, phone, birthday)
from <your_project>.raw_data.sensitive
)


If you want to test how the script behaves with some sample data:

with sensitive as
(
 select 1 as id,
 "abc" as name,
 "abc@abc.com" as email,
 current_date() as birthday,
 '+4511111111' as phone
)
select
SHA2(LOWER(TRIM(email)), 256) as email,
SHA2(LOWER(TRIM(name)), 256) as name,
SHA2(LTRIM('0', regexp_replace(phone, '[()-+-]', '')), 256) as phone,
SHA2(date_format(birthday, 'yyyyMMdd'), 256) as birthday,
* EXCEPT(email, name, phone, birthday)
from sensitive
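
To cross-check the output of the test query outside the warehouse, the same normalization and hashing can be reproduced locally. The following is a minimal Python sketch, assuming the semantics of the SQL above (trim spaces, lowercase, SHA-256 as a lowercase hex digest). The helper names are hypothetical and are not part of any Churney tooling.

```python
import hashlib
import re
from datetime import date


def sha256_hex(value: str) -> str:
    """Return the lowercase hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hash_text(value: str) -> str:
    # Mirrors SHA2(LOWER(TRIM(col)), 256): trim spaces, lowercase, then hash.
    return sha256_hex(value.strip(" ").lower())


def hash_phone(phone: str) -> str:
    # Mirrors regexp_replace(phone, '[()-+-]', ''): the class matches "(",
    # the range ")" to "+" (which also covers "*"), and a literal "-".
    cleaned = re.sub(r"[()*+\-]", "", phone)
    # Mirrors LTRIM('0', ...): drop leading zeros before hashing.
    return sha256_hex(cleaned.lstrip("0"))


def hash_birthday(birthday: date) -> str:
    # Mirrors date_format(birthday, 'yyyyMMdd').
    return sha256_hex(birthday.strftime("%Y%m%d"))


if __name__ == "__main__":
    # Same sample values as the test query above.
    print(hash_text("abc"))
    print(hash_text("abc@abc.com"))
    print(hash_phone("+4511111111"))
```

If the digests printed here match the columns returned by the test query for the same input values, the view hashes the data as intended.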

For this step, it is recommended that you use an email from your domain and maintain the user. You will not need to share these credentials with Churney, so this can be a developer-purpose email.

Go to your account -> Settings, and then Identity and access -> Users and create the new user

Edit the user we just created (in this example 

Then verify the email, login with the new user, and go to Settings -> Developer -> Access tokens

Generate a token for Churney and remember to remove the lifetime days. Note the token down; you will share it with Churney in a safe way.

Create a group for the churney user as below

And add the email address you created above to this group.

Churney will give you the name of their gs bucket for this step. 

Go to Catalog -> Storage Credentials -> Create credentials

Note down the service account you will see and share it with Churney. This is the service account that will access Churney’s Google Storage bucket to unload the data.

Go to the storage credential and grant the below permissions to yourself


CREATE EXTERNAL LOCATION IF NOT EXISTS `churney_external`
URL 'gs://<bucket_name>/'
WITH (STORAGE CREDENTIAL `churney-storage-credential`);

GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `churney_external` TO `Churney`;
GRANT READ FILES ON EXTERNAL LOCATION `churney_external` TO `Churney`;

Let’s assume that you would like to share the views under 

Churney will create external tables pointing at the gs bucket, so let’s create a schema for the 

Churney will maintain the exports here, so grant the permissions below

 schema, Churney only needs Data reader, so grant those permissions as below

Here you can read our guide for connecting your Databricks GCP data warehouse with Churney

Connecting Databricks GCP to Churney

Databricks GCP Connector




CREATE OR REPLACE VIEW <your_project>.churney.sensitive_data_view as ( 
select
SHA2(LOWER(TRIM(email)), 256) as email,
SHA2(LOWER(TRIM(name)), 256) as name,
SHA2(LTRIM('0', regexp_replace(phone, '[()-+-]', '')), 256) as phone,
SHA2(date_format(birthday, 'yyyyMMdd'), 256) as birthday,
* EXCEPT(email, name, phone, birthday)
from <your_project>.raw_data.sensitive
)


For this step, it is recommended that you use an email from your domain and maintain the user. You will not need to share these credentials with Churney, so this can be a developer-purpose email that you will be maintaining.

Go to your account -> Settings, and then Identity and access -> Users and create the new user:

 accept the invitation, so login with the new user, follow the invitation link and login. 

Then go to Settings -> Developer -> Access tokens

Create a group for the churney user as below:

And add the email address you created above to this group (in this example 

Let's create an s3 bucket for Churney. Make sure it is in the same region as your Databricks and public access is blocked.

In this example, we are creating a bucket named 

First, we need to create a role and assign policies for Churney in AWS:

Go to IAM -> Policies in your AWS account:

And create a policy for Churney to access the bucket we created previously

 with the name of the bucket you created above, in this example, with 

Then go to Roles to create a new role for churney with Custom trust policy:

 with the ids of the service accounts that Churney will give you.

Make sure you add the policy to the Churney role:

Go to your Databricks under Catalog -> Add a storage credential and create a storage credential, where you enter the arn of the churney role you added above. The output should look like below:

Now, go back to the role in AWS and update the trust policy as below:

CREATE EXTERNAL LOCATION IF NOT EXISTS `churney_external`
URL '<s3_url>'
WITH (STORAGE CREDENTIAL `churney_storage_credential`);


where the url contains your s3 bucket. In this example, the sql will look like below:

CREATE EXTERNAL LOCATION IF NOT EXISTS `churney_external`
URL 's3://churney-databricks-unload/'
WITH (STORAGE CREDENTIAL `churney_storage_credential`);
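
Since only the bucket URL and the credential name change between setups, the statement can be rendered from those two values. This is a small illustrative sketch; the function name and the URL check are assumptions made for the example, not something Databricks or Churney provides.

```python
def external_location_sql(url: str, credential: str,
                          location: str = "churney_external") -> str:
    """Render the CREATE EXTERNAL LOCATION statement for a bucket URL."""
    # Assumption: only s3:// and gs:// URLs are relevant for these guides.
    if not url.startswith(("s3://", "gs://")):
        raise ValueError(f"expected an s3:// or gs:// URL, got {url!r}")
    if not url.endswith("/"):
        url += "/"  # the examples in this guide use a trailing slash
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{location}`\n"
        f"URL '{url}'\n"
        f"WITH (STORAGE CREDENTIAL `{credential}`);"
    )


if __name__ == "__main__":
    print(external_location_sql("s3://churney-databricks-unload/",
                                "churney_storage_credential"))
```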

Churney will create external tables pointing at the s3 bucket, so let’s create a schema for the 

Churney will maintain the exports here, so grant the permissions below:

 schema, Churney only needs Data reader, so grant those permissions as below:

Here you can read our guide for connecting your Databricks AWS data warehouse with Churney

Connecting Databricks AWS to Churney

Databricks AWS Connector

Jonas is the head of data engineering at Churney. He previously worked as data engineering team lead at LiveIntent. Jonas holds a PhD in Mathematical Physics from Aarhus University.

Jonas Dahlbæk

Head of Data Engineering


Basically, we want to hash columns that contain data which would allow one to determine the identity of a user: first name, last name, birthday, street address, phone number, email address, IP address.

If we would like to share a users table with Churney, it is best to create a view that points to the table where you have the liberty to choose which columns you would like to share, and hash the private information according to Facebook's guide. Let’s suppose you would like to share the table 

 with Churney. To do so, you can create a new MaxCompute instance in the same region as your raw data and create a view like in this example:

CREATE OR REPLACE VIEW  churney.users_view as

SELECT
   sha2(lower(trim(name)), 256) name,
   sha2(lower(trim(email)), 256) email,
   created_at

FROM df_ew1_332.users;

What you will share with Churney will be the view in 

However, if the data that you are planning to share does not contain any private information, then you can skip the creation of views and share the original schema 

Go under RAM -> User and create a churney user with OpenAPIAccess

An access key and secret access key will be generated. Note them down and share them securely with Churney.

Go to DataWorks and run the following script to give permissions to Churney on the MaxCompute instance and on the specific tables/views you will share with Churney:

create role Churney;
grant READ, LIST, CreateInstance
     on project <your_project_name>
     to ROLE Churney;


grant SELECT, DESCRIBE ON TABLE users
TO ROLE Churney;


add user RAM$<your_alibaba_id>:<churney_user_id>;


grant Churney TO RAM$<your_alibaba_id>:<churney_user_id>;

If you check on your admin user, this is the account_id that goes in the red box

You can find this id if you go back to the user you created in the previous step and get the UID in the red box

Like in the screenshot below, create a new bucket for Churney and make sure to Block Public Access

Go to RAM -> Policies and create this policy to allow Churney to access the OSS bucket

 to the name of the bucket. In this example, that would be 

Go under RAM -> Roles and create a new role 

 where you would attach the policy above. Then go in the Trust policy and change it to this:

 - in the red circle in the screenshot above

. You can find that, as in the screenshot below, by going to your bucket and clicking on Overview

https://service.eu-west-1.maxcompute.aliyun.com/api

 of the user you created for Churney. If you didn’t note down the keys in the moment you created the user, you can generate new ones by going to the user and clicking Generate Access Key

Here you can read our guide for connecting your MaxCompute data warehouse with Churney. 

Connecting Maxcompute (Alibaba Cloud) to Churney

Alibaba Cloud Connector

We would need a role in AWS with the minimal permissions to access the data in Athena and export it to an S3 bucket. On a high level, we need access to:

The next steps explain the creation of the secure views for the data that you would like to share with Churney. If you would like Churney’s help to create the views on top of your tables and hash the private information, then follow 

Go to the query editor in Athena and choose the context, the data source and the database where your raw data lies.

 belongs to the schema with the tables you would like to share with Churney.

SELECT 
    t.table_catalog, 
    t.table_schema, 
    t.table_name, 
    c.column_name, 
    c.data_type
FROM information_schema.tables t 
INNER JOIN information_schema.columns c
ON t.table_catalog = c.table_catalog
    AND t.table_schema = c.table_schema
    AND t.table_name = c.table_name
WHERE t.table_schema = '<your_schema_name_here>'  
ORDER BY t.table_name;

Export the result by clicking on Download results as below, and share the CSV with Churney.
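
Before sharing the file, it can help to get a first impression of which columns are likely to need hashing. The sketch below is a rough heuristic only, assuming the downloaded CSV has a header row with the column names selected in the query above. The keyword list is hypothetical and should be adapted to your own naming conventions; a match on a column name does not prove that the column holds private data, and a miss does not prove that it does not.

```python
import csv
import io

# Hypothetical keyword list; extend it to match your own naming conventions.
PII_HINTS = ("email", "phone", "name", "birth", "address")


def likely_pii_columns(csv_text: str) -> list[tuple[str, str]]:
    """Return (table_name, column_name) pairs whose column name hints at PII."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        column = row["column_name"].lower()
        if any(hint in column for hint in PII_HINTS):
            flagged.append((row["table_name"], row["column_name"]))
    return flagged


if __name__ == "__main__":
    # Made-up sample in the shape of the exported result.
    sample = (
        "table_catalog,table_schema,table_name,column_name,data_type\n"
        "awsdatacatalog,mydatabase,user_activity,id,integer\n"
        "awsdatacatalog,mydatabase,user_activity,first_name,varchar\n"
        "awsdatacatalog,mydatabase,user_activity,email,varchar\n"
        "awsdatacatalog,mydatabase,user_activity,amount_spent,double\n"
    )
    for table, column in likely_pii_columns(sample):
        print(f"{table}.{column}")
```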

We will come back to you with some create view scripts to automatically create the views and hash the data.

In the case of JSON fields that contain private information such as email, phone, last name, or first name, we should leave the raw column out of the view and unpack only the elements we need. The private elements are still included in the view, but the create view statement is modified to unpack and hash them. In these cases, what we need from you is the name of the JSON column that has private information and the key that we should search for to unpack and hash. The remaining JSON fields, which do not contain any private user identifier, can be read as they are.

Churney wants to have access only to hashed sensitive data, and not to the raw data of the private information of the users. We will go over limiting our permissions in the next section, and in this section, we will guide you to create views on top of the raw data, which hash the private information.

2. Create hashed views on top of the raw data. Let’s suppose that the raw data lies in a database called 

 and the table with the user information is called 

, which contains private information like first_name, last_name, email, etc. In the code snippet below, we will create a view that hashes the private information and leaves the rest of the columns raw. Note that if your tables (e.g. event tables) do not contain any private information, your view can be just a select from the table.

CREATE OR REPLACE VIEW churney.user_activity AS 
SELECT
    id,
    LOWER(TO_HEX("sha256"("to_utf8"("trim"(first_name))))) first_name,
    LOWER(TO_HEX("sha256"("to_utf8"("trim"(last_name))))) last_name,
    LOWER(TO_HEX("sha256"("to_utf8"("trim"(email))))) email,
    amount_spent,
    activity_at
FROM
    mydatabase.user_activity;
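
A view like the one above can be spot-checked by looking at a few returned rows: every hashed column should hold a 64-character lowercase hex string, which is what LOWER(TO_HEX(sha256(...))) produces. The following is a minimal Python sketch of such a check, assuming the rows have already been fetched into dictionaries; the function names are hypothetical.

```python
import re

# A SHA-256 digest rendered as lowercase hex is exactly 64 characters long.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def looks_hashed(value: str) -> bool:
    """True if value has the shape of a lowercase hex SHA-256 digest."""
    return SHA256_HEX.fullmatch(value) is not None


def unhashed_columns(row: dict, hashed_columns: set) -> list:
    """Return the columns that should be hashed in `row` but do not look hashed.

    NULL values (None) are skipped, since hashing NULL yields NULL.
    """
    return sorted(
        col for col in hashed_columns
        if row.get(col) is not None and not looks_hashed(str(row[col]))
    )


if __name__ == "__main__":
    row = {
        "id": 1,
        "first_name": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
        "email": "abc@abc.com",  # raw value: the view forgot to hash this one
        "amount_spent": 12.5,
    }
    print(unhashed_columns(row, {"first_name", "last_name", "email"}))
```

This only checks the shape of the values; it cannot tell whether the input was normalized before hashing.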

After you are done with creating all the views that you would like to give Churney access to, do a simple test by running a select query to make sure everything you have hashed looks correct, e.g.

Choose a unique name for the bucket, and something self-explanatory, like 

choose the region of this bucket the same as your Athena region. 

, then the region of the bucket should be 

Make sure the rest of the settings are the default ones and they look like below

And the s3 bucket doesn’t have public access

to create a lifecycle rule that cleans old objects from the bucket, so Churney manages your storage carefully.
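
For readers who prefer to define the rule as configuration rather than through the console, a lifecycle rule of this kind can be sketched as the JSON document that S3 accepts for a bucket lifecycle configuration. The sketch below only builds that document. The 7-day expiry is an assumption borrowed from the Redshift bucket later on this page, and the rule ID is a made-up name.

```python
import json


def expire_after_days_rule(days: int, rule_id: str = "churney-cleanup") -> dict:
    """Build an S3 lifecycle configuration that expires every object after `days`."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,              # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # empty prefix: applies to the whole bucket
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Assumption: 7 days, as used for the Redshift unload bucket.
    print(json.dumps(expire_after_days_rule(7), indent=4))
```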

https://docs.aws.amazon.com/athena/latest/ug/workgroups-procedure.html

console.aws.amazon.com/athena/home?region={region-name}#/workgroups

https://us-east-1.console.aws.amazon.com/athena/home?region=us-east-1#/workgroups

Give the workgroup a unique name, we would suggest something simple like 

Expand the Additional Configurations section and fill them as below, where the default location of the query result of Churney would always be on our bucket, which is the bucket that was created in Step 1-3. In this example, we used 

Figure out the database parameters that you will need to provide. Churney will need the name of the table(s) in Athena to export from, the database, and the data source. You can find these parameters in Athena, for example, in the screenshot below, if the table that will contain the data is 

. Keep these parameters in mind because they will be needed for the steps below.

console.aws.amazon.com/iamv2/home?region={region-name}#/policies  

 is the name of the region where your Athena lies, for example if the region is 

https://us-east-1.console.aws.amazon.com/iamv2/home?region=us-east-1#/policies

Create a policy for Athena as below, where you need to replace all in bold: 

 with the region where Athena lies, for example 

 with the id of the Athena admin account. You will also need to provide the 

 which is the name you provided in Step 6 and 7

Create a policy for the S3 bucket where Churney will export the data. Replace 

  with the name of the bucket you chose in Step 1-3. Note that this is the bucket where Churney will unload the hashed data from the views you created above.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::<churney-bucket-name>/*",
                "arn:aws:s3:::<churney-bucket-name>"
            ]
        }
    ]
}
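
The only part of the policy above that changes between setups is the bucket name, so the document can be generated rather than edited by hand. This is a minimal sketch that reproduces the same statement; the function name and the example bucket name are hypothetical.

```python
import json

# The same actions as listed in the policy above.
S3_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucket",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:ListMultipartUploadParts",
]


def churney_bucket_policy(bucket_name: str) -> str:
    """Return the S3 policy JSON for the given unload bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }
    return json.dumps(policy, indent=4)


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(churney_bucket_policy("my-churney-bucket"))
```

Generating the document this way also guarantees that the JSON is well formed before it is pasted into the IAM policy editor.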


Create a policy for AWS Glue to access the hashed views. Replace 

 with the id of the Athena admin account. Replace 

 with the name of the database with the hashed views, so in this example, that would be 

Create a policy for AWS Glue to be able to read data from the hashed views. For this, we need access to the tables from which you created hashed views. Replace 

 with the name of the database as described in Step 10, and 

 with the names of the raw tables you created hashed views for. In our example, we had only 1 table 

Note that by listing all the tables, we limit access only to those tables that you would like to share with Churney.

Create a policy for the S3 bucket where Athena queries the data from. This is the location of the physical table data that Athena refers to. To find the bucket of the data, you can click on the three dots in the table you want to find the s3 bucket for, for example in the screenshot below the table is 

. In Location, highlighted in blue, you can see the s3 bucket. In this example, that is s3://one-time-athena-experiment/activity, so 

Note that we are adding a condition to limit the access to S3 only through Athena! This means that Churney cannot read the contents of your S3 bucket directly unless we access it through Athena.

It can happen that your tables originate from different S3 buckets, so make sure to check the physical S3 location of each table that you will share with Churney through views. Let’s suppose you have two different buckets, then the above code would look like:

 is the region where Athena is and click on Create role. For the purpose of the example, the name of the role is 

Choose a custom trust policy and add the trust policy as below. Churney will give you the ids needed for this step, so what you should fill in is 

 The way we will use this trust policy is all described in this link and follows the best practices 

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html

In the next step, you will be required to add permissions. Remember to add all the permissions we created in Steps 11, 12, 13, 14, and 15.

And the role should finally look like this, where you can see the policies and if you click on Trust relationships, you should be able to see the trust relationship

After you are done with the above steps, you will need to send these details to Churney:

1. The ARN of the role you created in Step 17. You can find that in this link 

console.aws.amazon.com/iamv2/home#/roles/details/<role-name>?section=permissions

 is the name of the region where Athena lies and 

 is the name of the role you created in Step 17. For our example, the link would be 

https://us-east-1.console.aws.amazon.com/iamv2/home#/roles/details/churney-role?section=permissions

 and the ARN is the one marked in red below

2. The details of the Athena data (the hashed views), which would be 

3. The name of the S3 bucket where Churney will unload the data; this is the bucket you created in Step 1.

4. The name of the workgroup you created above in Step 6.
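
The four items above can be collected in one place before they are sent over. The sketch below is purely illustrative: the field names are hypothetical and do not reflect a format that Churney requires, and the account id, bucket, workgroup and data source values are placeholders. The role ARN follows the standard IAM form arn:aws:iam::<account-id>:role/<role-name>.

```python
import json
from dataclasses import asdict, dataclass


def role_arn(account_id: str, role_name: str) -> str:
    """Build the ARN of an IAM role from the account id and the role name."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


@dataclass
class AthenaHandover:
    """Details to share with Churney (hypothetical field names)."""
    role_arn: str
    data_source: str
    database: str
    tables: list
    unload_bucket: str
    workgroup: str


if __name__ == "__main__":
    details = AthenaHandover(
        role_arn=role_arn("123456789012", "churney-role"),  # placeholder account id
        data_source="AwsDataCatalog",        # assumption: the default Athena catalog
        database="churney",
        tables=["user_activity"],
        unload_bucket="my-churney-bucket",   # placeholder bucket name
        workgroup="my-churney-workgroup",    # placeholder workgroup name
    )
    print(json.dumps(asdict(details), indent=4))
```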

Here you can read our guide for connecting your Athena data warehouse with Churney. 

Connecting Athena to Churney

Athena Connector

In this guide, we will go over the steps for setting up permissions for Churney to access your Redshift cluster, using hashed views and minimal permissions. On a high level, Churney will need:

A Churney user for the Redshift cluster with read permissions only on the secure views

An s3 bucket for Churney to unload the data

Access permissions to connect to your Redshift cluster through a static IP

For this task, you can either ask Churney to help you create the hashed views, or decide to create them yourself. If you would like Churney’s help, follow the procedure of 

Open the query editor and set up the working environment, choose the correct cluster and database above:

SELECT 
   t.table_catalog, t.table_schema, t.table_name, c.column_name, c.data_type
FROM 
   information_schema.tables t inner join information_schema.columns c
ON t.table_catalog = c.table_catalog
   AND t.table_schema = c.table_schema
   AND t.table_name = c.table_name
WHERE t.table_schema = '<your_schema_name_here>'  
  ORDER BY t.table_name;

Export the result, either as JSON or CSV, and share it with Churney.

Consider this setting: under the database dev there is a schema named private with your raw data, which contains sensitive data. We will need to create a new schema for Churney with secure views that read from private.

Make sure the sensitive data is hashed correctly and the non-sensitive data is in plain text. 

(in bold above), because Redshift does not support unloading time data in the parquet format

If the table doesn’t contain any private data, then the view can list the plain column names.

When you are done with creating all the views, it is time to create the Churney user

3. Create a Churney user and grant access to the data. In the running example, you would replace 

The below permissions mean that we are giving the user Churney access to select data from the view, but since the view refers to the raw data in private, we need to grant usage for private as well. Note that Churney can ONLY select from the view, and not from the raw data. The below permissions are detailed in the AWS documentation 

https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html

CREATE USER churney password disable; 

GRANT USAGE ON SCHEMA <your-schema> TO churney; 

GRANT USAGE ON SCHEMA <churney-schema> TO churney; 

GRANT SELECT ON ALL TABLES IN SCHEMA <churney-schema> TO churney; 

Allow access for the Churney IP to your redshift cluster 

Churney will give you a static IP that you will use in the step below. For the purpose of this example, we will use 12.123.123.123.

console.aws.amazon.com/vpc/home?region=<region-name>#SecurityGroups

 is the region of your cluster to create a VPC security group and click on 

5. Create a VPC security group in the VPC of your Redshift cluster and add an inbound rule for the Churney static IP 12.123.123.123

Choose Type = Redshift and put the Churney static IP (in our example 12.123.123.123) in the box where the arrow is below. Once added, the IP appears where the red box is.
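
Security group rules take their source as a CIDR block, so a single static IP is entered with a /32 suffix. The following short sketch validates an address and produces that form using Python's standard ipaddress module; the function name is hypothetical.

```python
import ipaddress


def single_host_cidr(ip: str) -> str:
    """Return the CIDR block that matches exactly one IP address."""
    address = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    prefix = 32 if address.version == 4 else 128
    return f"{address}/{prefix}"


if __name__ == "__main__":
    # The example static IP used in this guide.
    print(single_host_cidr("12.123.123.123"))
```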

We will add this security group to the Redshift cluster. For that, go to your Redshift cluster and click on Properties

And finally add the new security group. Make sure you 

7. Create an s3 bucket in the region of your Redshift cluster that we can unload data into, so go to 

https://s3.console.aws.amazon.com/s3/buckets?region=<region-name>

 is the same as your Redshift cluster region. Create a bucket with a self-explanatory name, like 

8. Add a lifecycle rule to the bucket, to delete the content after 7 days. Go to the bucket you created above and click on 

Make sure you have the below selections: 

IAM Policies to allow for Churney to extract data from Redshift 

9. Create IAM policy ChurneyRedshiftS3 to allow Churney to access the S3 bucket you created in Step 7. Churney should have access to unload the data in this bucket. Replace 

 with the name of the bucket you chose in Step 7, in our example, that is 

To find where you create policies, go to 

console.aws.amazon.com/iamv2/home#/policies

 is the region of your Redshift cluster and click on Create Policy

10. Create an IAM policy to allow Churney to unload data from Redshift. We will limit these permissions to the user you created before and to the database with the hashed views that you created for Churney.

 where region-name is the region of your Redshift cluster and click on Create Policy

Here you can read our guide for connecting your Redshift data warehouse with Churney. 
