To configure a Workflow for Amazon Personalize, you will need to:

  1. Create a new workflow and set a trigger schedule.
  2. Map your data from your Redshift table to the Personalize datasets (see below).
  3. Configure the model training parameters.
  4. Deploy an API endpoint.


This article describes how to use the Workflow Input step to input and map data from your Redshift table.




Input Workflow Steps

Input Workflow Steps are used to map your datasets to the Amazon Personalize application.

  • From your workflow, add an Input Step.
    Note: All workflows require a Trigger step before you can add other steps.
  • Configure it using the following guide.

Input Step Configuration

For details on how to complete the Input Step fields, see Input Step Configuration.


Data Mapping

Input data from Redshift must be mapped to the Personalize-specific dataset schemas so that the models can be trained.

The service uses three types of dataset:


Interactions Dataset

This dataset is always required.

It stores historical and real-time data from interactions between users and items. 

This could include contextual data such as impressions, clicks, user reviews or ratings.


Mapping the Interactions Dataset to your Redshift data fields

From the Interaction Dataset window, you can map data fields from your schemas in Redshift to the Interaction Dataset schema so that Amazon Personalize can parse your data.

  1. Choose a Schema and Table from the dropdown menus:
    • Schema: Lists all of the schemas that are available to use on the tenant.
    • Table: Lists all of the tables that are available for the selected schema.
  2. From the Field drop-downs, select the table field header that you want to map to each standard field.
  3. From the Data Type fields, select the appropriate data type for the field header.
    The data type of the selected attribute (as defined in the Redshift schema) is shown by default.
    If the selected attribute data type does not meet Amazon Personalize requirements, the system shows an error message with the required data type.
  4. If you are mapping an optional field that can have a null or categorical value, select the Null or Categorical checkbox as appropriate.
    Null and Categorical checkboxes are not enabled for mandatory fields.
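Amazon Personalize schemas are written in Avro JSON: a field that allows null values has a union type such as ["null", "string"], and a categorical string field carries "categorical": true. The following is a minimal sketch of how one Redshift column mapping could be turned into such a field and type-checked, assuming the workflow works along these lines. REDSHIFT_TO_AVRO and build_field are hypothetical names, and the Redshift-to-Avro type table is an illustrative assumption, not the product's actual rule set.

```python
# Hypothetical sketch: convert one Redshift column mapping into an
# Amazon Personalize (Avro JSON) schema field definition.

# Illustrative assumption: how Redshift column types might map to Avro types.
REDSHIFT_TO_AVRO = {
    "VARCHAR": "string",
    "CHAR": "string",
    "INTEGER": "int",
    "BIGINT": "long",
    "REAL": "float",
    "DOUBLE PRECISION": "double",
}


def build_field(name, redshift_type, required_type=None, mandatory=False,
                nullable=False, categorical=False):
    """Return an Avro field dict, raising ValueError for invalid mappings."""
    avro_type = REDSHIFT_TO_AVRO.get(redshift_type.upper())
    if avro_type is None:
        raise ValueError(f"{name}: unsupported Redshift type {redshift_type}")
    if required_type is not None and avro_type != required_type:
        # Comparable to the error shown when the column type does not match.
        raise ValueError(f"{name}: requires {required_type}, got {avro_type}")
    if mandatory and (nullable or categorical):
        # Null and Categorical are not available for mandatory fields.
        raise ValueError(f"{name}: mandatory fields cannot be null or categorical")
    if categorical and avro_type != "string":
        # In this sketch, only string fields may be categorical.
        raise ValueError(f"{name}: only string fields can be categorical")

    field = {"name": name, "type": ["null", avro_type] if nullable else avro_type}
    if categorical:
        field["categorical"] = True
    return field


print(build_field("USER_ID", "varchar", required_type="string", mandatory=True))
# {'name': 'USER_ID', 'type': 'string'}
print(build_field("DEVICE", "VARCHAR", nullable=True, categorical=True))
# {'name': 'DEVICE', 'type': ['null', 'string'], 'categorical': True}
```

Raising on the first problem keeps the sketch short; a real form would more likely collect every error and show them next to the affected fields.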


Understanding the fields

The dataset has the following sets of required fields, optional reserved keywords and optional metadata fields:


Interaction fields (required)

This data must be mapped.

An Interactions dataset stores historical and real-time data from interactions between users and items. 

To create a recommendation system using Amazon Personalize, you must at minimum create an Interactions dataset.

Field        Type
USER_ID      String
ITEM_ID      String
TIMESTAMP    Unix epoch format (seconds)
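In Amazon Personalize, these required fields are declared in an Avro-format JSON schema, with TIMESTAMP as a long holding Unix epoch seconds. The sketch below builds that schema with the standard library and converts a Python datetime (for example, a value read from a Redshift TIMESTAMP column) to epoch seconds. The helper name to_epoch_seconds is hypothetical, and treating timezone-naive values as UTC is an assumption.

```python
import json
from datetime import datetime, timezone

# Minimal Interactions schema in the Avro JSON format used by Amazon Personalize.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},  # Unix epoch seconds
    ],
    "version": "1.0",
}


def to_epoch_seconds(ts: datetime) -> int:
    """Convert a datetime to whole Unix epoch seconds (naive values treated as UTC)."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return int(ts.timestamp())


schema_json = json.dumps(INTERACTIONS_SCHEMA)
print(to_epoch_seconds(datetime(2023, 1, 1)))  # 1672531200
```

If you create a schema yourself with boto3, a JSON string like schema_json is what the schema parameter of the personalize client's create_schema method expects. The conversion can also be done in Redshift itself, for example with EXTRACT(EPOCH FROM your_timestamp_column).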


Reserved Keywords (optional)

Reserved keywords are optional, non-metadata fields. 

These fields are reserved because, when you use them, you must define them with their required data type.


EVENT_TYPE (String)

Used for Interactions datasets with one or more event types, such as both click and download.

EVENT_VALUE (Float)

Used for Interactions datasets that include value data for events, such as the percentage of a video a user watched. Define EVENT_VALUE with type float and, optionally, allow null values.

IMPRESSION (String)

Used for Interactions datasets with explicit impressions data.

Impressions are lists of items that were visible to a user when they interacted with (for example, clicked or watched) a particular item.

This data is used to train this recipe:

  • User-Personalization
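In the underlying schema, reserved keywords are ordinary fields whose names and types are fixed. The sketch below adds EVENT_TYPE, a nullable float EVENT_VALUE, and IMPRESSION to the required fields, and builds an IMPRESSION value from a list of item IDs. It assumes Amazon Personalize's convention of separating impression item IDs with a vertical bar (|); format_impression is a hypothetical helper.

```python
import json

# Required fields plus the optional reserved keywords.
fields = [
    {"name": "USER_ID", "type": "string"},
    {"name": "ITEM_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"},
    {"name": "EVENT_TYPE", "type": "string"},
    {"name": "EVENT_VALUE", "type": ["null", "float"]},  # float, optionally null
    {"name": "IMPRESSION", "type": "string"},
]

schema_json = json.dumps({
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": fields,
    "version": "1.0",
})


def format_impression(item_ids):
    """Join the item IDs a user saw into one pipe-separated IMPRESSION value."""
    for item_id in item_ids:
        if "|" in item_id:
            raise ValueError(f"item ID contains the separator: {item_id!r}")
    return "|".join(item_ids)


print(format_impression(["item-1", "item-7", "item-9"]))  # item-1|item-7|item-9
```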


  • The maximum total number of optional metadata fields you can add to an Interactions dataset, combined with the total number of distinct event types in your data, is 10. The metadata fields included in this count are the EVENT_TYPE and EVENT_VALUE fields, along with any custom metadata fields you add to your schema.
  • The maximum number of metadata fields, excluding reserved fields such as IMPRESSION, is 5.
  • Categorical values can have a maximum of 1,000 characters. Any interaction with a categorical value with more than 1,000 characters is dropped during a dataset import job and is not used in training.
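These limits can be checked before a dataset import job runs. The following minimal sketch encodes the three rules above as this article states them. check_interactions_limits is a hypothetical helper, and its inputs (the optional field names in your schema, the event types in your data, and your rows as dictionaries) are assumptions about how you would gather that information from Redshift.

```python
MAX_FIELDS_PLUS_EVENT_TYPES = 10
MAX_CUSTOM_METADATA = 5
MAX_CATEGORICAL_CHARS = 1000
RESERVED_OPTIONAL = {"EVENT_TYPE", "EVENT_VALUE", "IMPRESSION"}


def check_interactions_limits(optional_fields, event_types, rows, categorical_fields):
    """Return (problems, dropped_rows) for an Interactions dataset.

    optional_fields: optional field names in the schema (reserved and custom)
    event_types: EVENT_TYPE values found in the data
    rows: interactions as dicts; categorical_fields: names of categorical fields
    """
    problems = []

    # EVENT_TYPE, EVENT_VALUE and custom metadata fields count toward the
    # combined limit together with the number of distinct event types.
    counted = [f for f in optional_fields if f != "IMPRESSION"]
    combined = len(counted) + len(set(event_types))
    if combined > MAX_FIELDS_PLUS_EVENT_TYPES:
        problems.append(f"{combined} metadata fields plus event types (max 10)")

    custom = [f for f in optional_fields if f not in RESERVED_OPTIONAL]
    if len(custom) > MAX_CUSTOM_METADATA:
        problems.append(f"{len(custom)} custom metadata fields (max 5)")

    # Interactions with an over-long categorical value are dropped at import.
    dropped = [
        row for row in rows
        if any(len(row.get(f) or "") > MAX_CATEGORICAL_CHARS for f in categorical_fields)
    ]
    return problems, dropped


problems, dropped = check_interactions_limits(
    ["EVENT_TYPE", "EVENT_VALUE", "DEVICE"],
    ["click", "download", "click"],
    [{"DEVICE": "phone"}, {"DEVICE": "x" * 1001}],
    ["DEVICE"],
)
print(problems, len(dropped))  # [] 1
```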


Contextual metadata (optional)

This data is optional.

Contextual metadata is environmental interaction data collected at the time of an event. 

For example, if you collect data on the type of device that users access your website with, you can determine whether those users behave differently when browsing on a phone compared with a computer.

This data is used to train these recipes:

  • User-Personalization
  • Personalized-Ranking


Attribute    Type
LOCATION     String
DEVICE       String
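In an Avro schema, contextual metadata fields such as LOCATION and DEVICE are commonly declared as categorical strings. A minimal sketch, assuming both fields may be missing for some events (hence the nullable ["null", "string"] type); adjust the types to match your Redshift columns.

```python
import json

interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        # Contextual metadata recorded at the time of each event.
        {"name": "LOCATION", "type": ["null", "string"], "categorical": True},
        {"name": "DEVICE", "type": ["null", "string"], "categorical": True},
    ],
    "version": "1.0",
}

print(json.dumps(interactions_schema["fields"][-1]))
# {"name": "DEVICE", "type": ["null", "string"], "categorical": true}
```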



Item Dataset

This dataset is optional.

It stores metadata about your items and could include information such as price, genre, or availability.

From the Item Dataset window, you can map data fields from your schemas in Redshift to the Item Dataset schema so that Amazon Personalize can parse your data.


Mapping the Item Dataset to your Redshift data fields

  1. Choose a Schema and Table from the dropdown menus:
    • Schema: Lists all of the schemas that are available to use on the tenant.
    • Table: Lists all of the tables that are available for the selected schema.
  2. From the Field drop-downs, select the table field header that you want to map to each standard field.
  3. From the Data Type fields, select the appropriate data type for the field header.
    The data type of the selected attribute (as defined in the Redshift schema) is shown by default.
    If the selected attribute data type does not meet Amazon Personalize requirements, the system shows an error message with the required data type.
  4. If you are mapping an optional field that can have a null or categorical value, select the Null or Categorical checkbox as appropriate.
    Null and Categorical checkboxes are not enabled for mandatory fields.


Understanding the fields

The dataset has the following sets of required fields, optional reserved keywords and optional metadata fields:


Required item data

This data must be mapped.


Field        Type
ITEM_ID      String


During model training, Amazon Personalize considers a maximum of 750,000 items. 

If you import more than 750,000 items, Amazon Personalize decides which items to include in training, with an emphasis on including new items (items you recently added with no interactions) and existing items with recent interactions data.


Optional Items Metadata

An Items dataset stores metadata about your items. This might include information such as price, genre, or availability. An Items dataset is optional.

This data is used to train these recipes:

  • User-Personalization
  • Personalized-Ranking


Field        Type
category     String
brand        String
color        String


The metadata fields are shown here as an example.

At least one metadata field is required.

  • You can add up to 50 metadata fields.
  • Categorical values can have up to 1,000 characters.
  • Any item with a categorical value longer than 1,000 characters is dropped during a dataset import job and is not used in training.
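A minimal sketch of an Items schema built from the example fields above, with checks for the stated limits. Declaring category, brand, and color as categorical strings (and color as nullable) is an assumption for illustration; items_schema and oversized_items are hypothetical helpers.

```python
MAX_ITEM_METADATA = 50
MAX_CATEGORICAL_CHARS = 1000


def items_schema(metadata_fields):
    """Build an Items schema with at least 1 and at most 50 metadata fields."""
    if not 1 <= len(metadata_fields) <= MAX_ITEM_METADATA:
        raise ValueError("an Items schema needs between 1 and 50 metadata fields")
    return {
        "type": "record",
        "name": "Items",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [{"name": "ITEM_ID", "type": "string"}, *metadata_fields],
        "version": "1.0",
    }


def oversized_items(rows, categorical_fields):
    """Return IDs of items with a categorical value over 1,000 characters."""
    return [
        row["ITEM_ID"] for row in rows
        if any(len(row.get(f) or "") > MAX_CATEGORICAL_CHARS for f in categorical_fields)
    ]


schema = items_schema([
    {"name": "category", "type": "string", "categorical": True},
    {"name": "brand", "type": "string", "categorical": True},
    {"name": "color", "type": ["null", "string"], "categorical": True},
])
rows = [{"ITEM_ID": "a", "brand": "x" * 1001}, {"ITEM_ID": "b", "brand": "acme"}]
print(oversized_items(rows, ["brand"]))  # ['a']
```

Listing the affected item IDs, rather than silently filtering them, makes it easier to fix the source data in Redshift before the import.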


Reserved Keywords Metadata

This data is optional.


Field                 Type
CREATION_TIMESTAMP    UNIX epoch format (seconds)


Amazon Personalize uses creation timestamp data to calculate the age of an item and adjust recommendations accordingly.

If creation timestamp data is missing for one or more items, Amazon Personalize infers this information from interaction data, if any, and uses the timestamp of the item’s oldest interaction data as the item's creation timestamp. 

If an item has no interaction data, its creation timestamp is set as the timestamp of the latest interaction in the training set and Amazon Personalize considers it a new item.
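The fallback rules above can be expressed as a short function. This is a sketch of the documented behavior, not Amazon Personalize's implementation; infer_creation_timestamps is hypothetical, and it assumes timestamps are Unix epoch seconds and that interactions holds the training set as (item ID, timestamp) pairs.

```python
def infer_creation_timestamps(items, interactions):
    """Fill in missing CREATION_TIMESTAMP values.

    items: dict of item ID -> CREATION_TIMESTAMP, or None when missing
    interactions: iterable of (item ID, TIMESTAMP) pairs in the training set
    Returns (timestamps, new_items).
    """
    oldest = {}
    latest = None
    for item_id, ts in interactions:
        if item_id not in oldest or ts < oldest[item_id]:
            oldest[item_id] = ts  # oldest interaction per item
        if latest is None or ts > latest:
            latest = ts  # latest interaction in the whole training set

    timestamps, new_items = {}, set()
    for item_id, created in items.items():
        if created is not None:
            timestamps[item_id] = created
        elif item_id in oldest:
            timestamps[item_id] = oldest[item_id]
        else:
            # No interactions: use the latest interaction and treat as new.
            timestamps[item_id] = latest
            new_items.add(item_id)
    return timestamps, new_items


timestamps, new_items = infer_creation_timestamps(
    {"a": None, "b": None, "c": 1_600_000_000},
    [("a", 1_700_000_500), ("a", 1_700_000_100), ("c", 1_700_000_900)],
)
print(timestamps)  # {'a': 1700000100, 'b': 1700000900, 'c': 1600000000}
print(new_items)   # {'b'}
```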


User Dataset

This dataset is optional.

From the User Dataset window, you can map data fields from your schemas in Redshift to the User Dataset schema so that Amazon Personalize can parse your data.


Mapping the User Dataset to your Redshift data fields


  1. Choose a Schema and Table from the dropdown menus:
    • Schema: Lists all of the schemas that are available to use on the tenant.
    • Table: Lists all of the tables that are available for the selected schema.
  2. From the Field drop-downs, select the table field header that you want to map to each standard field.
  3. From the Data Type fields, select the appropriate data type for the field header.
    The data type of the selected attribute (as defined in the Redshift schema) is shown by default.
    If the selected attribute data type does not meet Amazon Personalize requirements, the system shows an error message with the required data type.
  4. If you are mapping an optional field that can have a null or categorical value, select the Null or Categorical checkbox as appropriate.
    Null and Categorical checkboxes are not enabled for mandatory fields.


Understanding the fields

The dataset has the following sets of required fields and metadata fields:


Required user data

This data must be mapped.


USER_ID (String)

At minimum, you must provide a user ID for each user.



Optional user data

A User Dataset stores metadata about your users. This might include information such as age, gender, or loyalty membership. A User Dataset is optional. 


AGE (String)

This data is used to train these recipes:

  • User-Personalization
  • Personalized-Ranking

GENDER (String)

This data is used to train these recipes:

  • User-Personalization
  • Personalized-Ranking

The metadata fields are shown here as an example.

At least one metadata field is required.

  • User metadata can include empty or null values.
  • You can add up to five metadata fields (excluding USER_ID).
  • Categorical values can have up to 1,000 characters.
  • Any user with a categorical value longer than 1,000 characters is dropped during a dataset import job and is not used in training.
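A minimal sketch of a Users schema with the example AGE and GENDER fields, plus a filter that mirrors the stated limits. Declaring both fields as nullable categorical strings (for example, age bands such as "25-34") is an assumption; users_schema and users_to_keep are hypothetical helpers.

```python
MAX_USER_METADATA = 5
MAX_CATEGORICAL_CHARS = 1000


def users_schema(metadata_fields):
    """Build a Users schema with 1 to 5 metadata fields, not counting USER_ID."""
    if not 1 <= len(metadata_fields) <= MAX_USER_METADATA:
        raise ValueError("a Users schema needs between 1 and 5 metadata fields")
    return {
        "type": "record",
        "name": "Users",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [{"name": "USER_ID", "type": "string"}, *metadata_fields],
        "version": "1.0",
    }


def users_to_keep(rows, categorical_fields):
    """Drop users whose categorical values exceed 1,000 characters; None is allowed."""
    return [
        row for row in rows
        if all(len(row.get(f) or "") <= MAX_CATEGORICAL_CHARS for f in categorical_fields)
    ]


schema = users_schema([
    {"name": "AGE", "type": ["null", "string"], "categorical": True},
    {"name": "GENDER", "type": ["null", "string"], "categorical": True},
])
rows = [
    {"USER_ID": "u1", "AGE": "25-34", "GENDER": None},
    {"USER_ID": "u2", "AGE": "y" * 1001, "GENDER": "f"},
]
print([row["USER_ID"] for row in users_to_keep(rows, ["AGE", "GENDER"])])  # ['u1']
```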