This article introduces you to the current Data Bridge configuration options and the onboarding process.




How does Peak store data?

Peak uses two types of data store technology when storing data for use in the platform:


Data lake

A data lake stores both structured and unstructured data in its raw form.

Peak uses a data lake to store data from multiple sources and in multiple formats. 


Data lakes make storing data easy as data does not need to be formatted or structured in a particular way. They are also highly scalable and can accommodate growing volumes of data. However, data must still follow certain standards, such as basic metadata and date-tagging. 
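As an illustration of date-tagging, the sketch below builds an object key that records the source, table, and load date of a raw file. The key layout (`year=/month=/day=`) and the function name `date_tagged_key` are assumptions made for this example; the article does not specify the exact standards that Peak expects.

```python
from datetime import date


def date_tagged_key(source: str, table: str, day: date, filename: str) -> str:
    """Build an object key that tags a raw file with its source and load date.

    The partition layout used here is a hypothetical convention, not a
    documented Peak requirement.
    """
    return (
        f"{source}/{table}/"
        f"year={day:%Y}/month={day:%m}/day={day:%d}/"
        f"{filename}"
    )


print(date_tagged_key("crm", "orders", date(2024, 3, 7), "orders.csv"))
# prints "crm/orders/year=2024/month=03/day=07/orders.csv"
```

Keeping the date in the key means that files can be located by load date without opening them.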


Peak supports Amazon S3 as a data lake. A data lake must be configured before the platform can be used.


Data warehouse

A data warehouse stores structured, processed data.

Data warehouses are designed with a relatively strict structure to support robust querying and data analysis. They aid data governance by ensuring quality and security, and they can streamline data by removing redundancies.


At Peak, we use data warehouses to store tabular data, and to access it for analysis and decision making. Peak requires a data warehouse to be configured so that core features such as Data Sources and SQL Explorer can be used.




Why use Data Bridge?

Why would you choose Data Bridge over one of our data connectors?

  • Quicker onboarding
    You do not have to configure multiple data connectors or schedule feeds.
    Once the required policies and permissions have been defined, data is available for use in Peak as soon as you have stored it.

  • Increased security
    You have full control over how your data is accessed and who can access it.
    Data is not exposed to the public Internet. In the case of a customer managed Amazon S3 data lake, data is securely transferred between your AWS account and Peak’s AWS account via AWS PrivateLink.

  • No data duplication
    Data is not replicated across multiple locations, making your data easier to maintain and helping to ensure data integrity.

  • Flexibility
    You can store your data in any format and use it in any way you see fit.

  • Uphold your data localization laws
    Data is stored on your infrastructure, helping you to meet the data localization laws that apply to you.



Secure connection to your Amazon S3 data lake

This illustration shows how Peak securely connects to data within your S3 data lake.

Key:

  • Region
    This is the AWS region where your account is located.

  • Bucket Policy that gives limited access to Peak
    Peak assumes the IAM role that you provide. This policy is set on your S3 bucket and provides Peak with limited access to specific storage paths.

  • Amazon S3 data lake
    This is your Amazon S3 data lake.

  • Glue Catalog
    This is the optional AWS Glue Data Catalog, which provides an index to the location, schema, and runtime metrics of your data. It can assist data scientists with their queries, but it is not essential.

  • IAM Policy giving access to specific resources
    This is your IAM policy that specifies what Peak can access within your data lake.
    The policy contains the recommended permissions that Peak requires to help speed up the onboarding process. If preferred, you can also define your own permissions.

  • Cross account IAM role
    This enables you to grant Peak secure access to AWS resources in your account.
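To make the roles of the two policies in the key more concrete, here is a minimal sketch that builds an IAM permissions policy limited to one storage path and a trust policy for a cross account role. It is not the policy that Peak generates during onboarding. The bucket name, prefix, account ID, and the use of an external ID are all assumptions chosen for illustration.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Permissions policy: list and read objects under a single prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy: lets another AWS account assume the role.

    Requiring an external ID is an assumption here, not something the
    article states.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical values for illustration only.
print(json.dumps(read_only_prefix_policy("example-bucket", "peak/raw"), indent=2))
print(json.dumps(cross_account_trust_policy("111122223333", "example-external-id"), indent=2))
```

The permissions policy controls what can be accessed, while the trust policy controls who may assume the role. Keeping them separate lets you tighten either one independently.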




Data Bridge configuration options

When onboarding to Peak, Data Bridge enables you to choose how your data is stored and accessed by the platform:

  • Peak managed:
    The data lake or data warehouse that is used by your Peak organization is owned and managed by Peak and sits within the Peak data infrastructure.

  • Customer managed:
    The data lake or data warehouse that is used by your Peak organization is owned and managed by you and sits within your own data infrastructure.

You can choose between a fully Peak managed configuration or different combinations of Peak managed and customer managed.


Supported data stores and configurations

Currently, Peak supports the following data lakes and data warehouses:


  • Peak managed
    Data lake: Amazon S3
    Data warehouse: Amazon Redshift, Snowflake

  • Customer managed
    Data lake: Amazon S3
    Data warehouse: Snowflake


The following configuration options are available:


Peak managed Amazon S3 and Peak managed Amazon Redshift

This is the simplest configuration, the quickest for onboarding and the easiest for you to manage.

Peak owns and manages both the data lake and data warehouse. The Peak platform connects to your data source and ingests data into both stores.


Customer managed Amazon S3 and Peak managed Amazon Redshift

This configuration is suitable if you have your own data lake that you want to use with Peak.

You own and manage the data lake and Peak owns and manages the data warehouse within the Peak environment.

Currently, Amazon S3 is the only customer managed data lake that Peak supports for this configuration.


Peak managed Amazon S3 and Customer managed Snowflake

This configuration is suitable if you have a Snowflake data warehouse that you want to use with Peak.

You own and manage the Snowflake data warehouse and Peak owns and manages the data lake within the Peak environment.

After onboarding with this configuration, Peak will have read-only access to the schema containing your raw data and read-write access to a separate schema that Peak can then write data back to.
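The access split described above could be expressed in Snowflake with grants along the following lines. This is a sketch only: the database, schema, and role names are hypothetical, and the exact privileges that Peak needs on the write-back schema are an assumption.

```python
def schema_grants(database: str, raw_schema: str, writeback_schema: str, role: str) -> list[str]:
    """Return Snowflake GRANT statements for a read-only raw schema and a
    read-write schema for written-back data.

    All object names are supplied by the caller and are hypothetical.
    """
    raw = f"{database}.{raw_schema}"
    out = f"{database}.{writeback_schema}"
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        # Read-only access to the raw data.
        f"GRANT USAGE ON SCHEMA {raw} TO ROLE {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {raw} TO ROLE {role};",
        # Read-write access to the separate write-back schema.
        f"GRANT USAGE, CREATE TABLE ON SCHEMA {out} TO ROLE {role};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {out} TO ROLE {role};",
    ]


for statement in schema_grants("ANALYTICS", "RAW", "PEAK_OUTPUT", "PEAK_ROLE"):
    print(statement)
```

Keeping raw data and written-back data in separate schemas means that the role never needs write privileges on your source tables.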



Peak managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)

This configuration is suitable if you have a Snowflake data warehouse but do not want to share any of your details with Peak.

Peak owns and manages both the data lake and data warehouse within the Peak environment and you create a ‘share’ between your Snowflake data warehouse account and Peak.

Any data objects that you share with Peak are read-only: they cannot be deleted or modified, and table data cannot be added or changed.
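For orientation, creating a share in Snowflake generally involves statements like those produced by the sketch below. The share, database, schema, and table names and the consumer account identifier are hypothetical; use the account details that Peak provides during onboarding.

```python
def share_statements(
    share: str,
    database: str,
    schema: str,
    tables: list[str],
    consumer_account: str,
) -> list[str]:
    """Return Snowflake statements that create a share and expose tables to it.

    Objects granted to a share are read-only for the consuming account.
    """
    statements = [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    statements += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};"
        for table in tables
    ]
    # The consumer account identifier is a placeholder, not a real Peak account.
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};")
    return statements


for statement in share_statements(
    "PEAK_SHARE", "ANALYTICS", "RAW", ["ORDERS", "CUSTOMERS"], "EXAMPLE_ORG.EXAMPLE_ACCOUNT"
):
    print(statement)
```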

Customer managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)

This configuration is suitable if you have an Amazon S3 data lake and a Snowflake data warehouse but do not want to share any of your details with Peak.

Peak owns and manages the data warehouse within the Peak environment. You give Peak read-only access to your Amazon S3 data lake and create a ‘share’ between your Snowflake data warehouse and Peak.

Any data objects that you share with Peak are read-only: they cannot be deleted or modified, and table data cannot be added or changed.
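The five configuration options listed in this article can be summarized as a small lookup. The sketch below is illustrative only; the `Owner` enum and `is_supported` function are hypothetical names, and the set reflects only the combinations described in this article.

```python
from enum import Enum


class Owner(Enum):
    PEAK = "Peak managed"
    CUSTOMER = "Customer managed"


# Each entry: (data lake owner, warehouse technology, warehouse owner,
#              uses a read-only Snowflake share)
SUPPORTED = {
    (Owner.PEAK, "Amazon Redshift", Owner.PEAK, False),
    (Owner.CUSTOMER, "Amazon Redshift", Owner.PEAK, False),
    (Owner.PEAK, "Snowflake", Owner.CUSTOMER, False),
    (Owner.PEAK, "Snowflake", Owner.PEAK, True),
    (Owner.CUSTOMER, "Snowflake", Owner.PEAK, True),
}


def is_supported(
    lake_owner: Owner,
    warehouse: str,
    warehouse_owner: Owner,
    read_only_share: bool = False,
) -> bool:
    """Return True if the combination is one of the options in this article."""
    return (lake_owner, warehouse, warehouse_owner, read_only_share) in SUPPORTED


print(is_supported(Owner.CUSTOMER, "Amazon Redshift", Owner.PEAK))  # prints "True"
print(is_supported(Owner.CUSTOMER, "Snowflake", Owner.CUSTOMER))    # prints "False"
```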

Onboarding with Data Bridge

When you sign in to Peak for the first time, you will be prompted to connect to both a new data lake and a data warehouse. You will need to complete this process before you can start using Peak.
If you decide to let Peak manage your data lake or data warehouse, the onboarding process will be quicker as Peak holds all of the security credentials that are required to make a connection.
If you opt to manage your own data lake or data warehouse, there are a few more steps required so that Peak can be configured to securely access your data storage infrastructure.

Data lake onboarding

During this process, you will choose the type of data lake connection that you want to use and then provide some configuration details before saving.
Currently, Peak supports Amazon S3 data lakes.

You can choose between:
  • Peak managed
    You choose the data lake region where your data will be physically stored. Peak then creates and manages the data lake for your organization.
    This is the quickest process as Peak holds all of the security credentials that are required to make a connection.

  • Customer managed
    You configure your Amazon S3 data lake to work with your organization.
    During the process, you will need to create an IAM role in your AWS account so that Peak can connect to your S3 bucket. The Peak platform generates the IAM policy that you will need to use while creating the IAM role.
For a guide to data lake onboarding, see Connecting Peak to a data lake.

Data warehouse onboarding

During this process, you will choose the type of data warehouse that you want to use and then provide some configuration details before saving.
You can choose between:

  • Redshift data warehouse
    You choose the region where your data warehouse is physically located.
    Your data lake and data warehouse must be located in the same region.

  • Snowflake data warehouse
    You add your Snowflake cluster details which include your account credentials, region and database schema information. Once this is done, you add details of your data lake so that the two can be linked. Your data lake and data warehouse must be located in the same region.
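Because both warehouse options require the data lake and data warehouse to share a region, it can be worth checking this before you start onboarding. The sketch below is a hypothetical pre-flight check; it assumes that both regions are written as AWS region codes such as `eu-west-1`, which may not match how your warehouse provider labels regions.

```python
def check_same_region(lake_region: str, warehouse_region: str) -> None:
    """Raise ValueError if the data lake and data warehouse regions differ.

    Comparison ignores case and surrounding whitespace. Both arguments are
    assumed to be AWS region codes.
    """
    lake = lake_region.strip().lower()
    warehouse = warehouse_region.strip().lower()
    if lake != warehouse:
        raise ValueError(
            f"Data lake region '{lake}' does not match "
            f"data warehouse region '{warehouse}'."
        )


check_same_region("eu-west-1", "eu-west-1")  # passes silently
```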

For a guide to data warehouse onboarding, see: