When you load data into Harmoni, it uses the source's inherent data dictionary to automatically map the source variables to the Harmoni variable types. However, some sources, such as Excel, do not include a data dictionary, so the source variables must be defined as the data is loaded. Harmoni helps you do this with a data wizard, and you can also insert key identifiers in the source data to aid the process.
In this article
- Load an Undefined Source
- Define Source Variables
- Map variables using key identifiers in source data
- Prediction Logic Algorithm
1. Load an Undefined Source
When loading data files that do not contain an inherent dictionary, e.g., Excel XLSX, comma-delimited CSV, or tab-delimited TXT, you follow the standard process for loading data files into Harmoni.
Learn more about creating a Harmoni project.
During the loading process, Harmoni runs a prediction algorithm to determine the best possible match to create Harmoni variable types from the source variables. Harmoni allows you to review the matches and then confirm or override the automated mapping.
You have the option to override the automated mapping in two ways:
- Using the data type option wizard.
- Inserting key identifiers in the source data.
2. Define Source Variables
After uploading the source you wish to include in your project, you need to confirm or override the variable mapping before adding the source to your project. To do this, click the source data tile, which opens the mapping wizard.
If you have multiple sources, you will need to define each source. A tile with an 'orange tinge' indicates that the source needs defining in order to be a viable project source.
Harmoni will remember the mapping for the same data source. However, each separate data source needs to be mapped independently.
Data Type Option Wizard Fields
Column Name
Column names correspond to the first row in your data file. The first row must contain headers with unique, non-blank descriptions.
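The header requirement can be checked before uploading. The following is a minimal sketch, not part of Harmoni: `check_headers` is a hypothetical helper that reads the first row of CSV text and reports blank or repeated headers.

```python
import csv
import io
from collections import Counter


def check_headers(csv_text: str) -> list[str]:
    """Return a list of problems found in the first row of CSV text.

    Hypothetical pre-upload check; Harmoni's own validation may differ.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    problems = []
    if not headers:
        problems.append("file has no header row")
    # Report every blank header by its 1-based column position.
    for position, name in enumerate(headers, start=1):
        if not name.strip():
            problems.append(f"column {position} has a blank header")
    # Report every non-blank header that occurs more than once.
    counts = Counter(name.strip() for name in headers if name.strip())
    for name, count in counts.items():
        if count > 1:
            problems.append(f"header '{name}' appears {count} times")
    return problems


if __name__ == "__main__":
    for problem in check_headers("id,age,age,\n1,34,35,x\n"):
        print(problem)
```

An empty result means the first row meets the stated requirement of unique, non-blank headers.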
Data Type
Harmoni will automatically map source variables into Harmoni types using a prediction algorithm.
Mapping Data Type
You have the option to override the automated mapping and map to the following types:
- Standard Axis
- Measure
- Verbatim
Include
You can select the variables that you wish to include or exclude in your project.
CONFIRM AND ADD: Confirm the definition and add the source to the project list, ready to create the project.
CONFIRM DEFINITION: Confirm the definition. You will need to add the source to the project list prior to creating the project.
RESET: If you change the mapping for the data type, you can use the reset button to change everything back to the original settings.
CANCEL: Cancel defining the sources and return to the Upload/Connect area.
- After naming your project, choose CREATE NEW.
- Locate your data and upload.
- Select the source you wish to define.
- Data Type Option Wizard will open.
- You can change the mapping of the variables and select the variables you wish to include/exclude from the project.
- When ready, click the Confirm and Add or Confirm Definition button.
- If required, add the source(s) you want to include in your project and create.
Change variable definitions after creating a project
There may be instances where, after loading your project, you decide that a variable would better suit your design and analysis as a different type. For example, if the variable "Exact Age" was initially mapped as a measure, changing it to a standard axis will give you the full distribution of elements in your project.
To redefine any variables, follow these steps:
- Switch the project to edit mode
- Click view/add sources
- Click Add/Remove to open the sources area
- From the 3-dot menu, select Define and the data type option wizard opens
- Find the source variable and change the mapping data type
- Select CONFIRM DEFINITION
- Ensure the data source(s) you need are selected and click OK to update the project
- If any of the changes will result in missing items in your project, you will receive a warning message. If you decide to continue, your project loads with the relevant changes.
Before the project loads, you will receive a warning message when the changes will cause items to be removed from your project. You can decide to go ahead or cancel.
- Switch to edit mode and navigate to the sources area
- Select Define from the three-dot menu on the source you wish to change
- In the Data Type Option wizard, change the mapping
- Click the Confirm Definition button.
- Click OK
Updating data sources
Should you need to update or replace your data, Harmoni will remember the mapping as long as it identifies the data as the same source (same name).
Harmoni will also identify new and missing variables.
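To preview which variables would be reported as new or missing, you can compare the column headers of the old and updated files yourself. This is an illustrative sketch only: `compare_variables` is a hypothetical helper, and it assumes variables are matched by exact header name, which the article does not state.

```python
def compare_variables(previous: list[str], updated: list[str]) -> dict[str, list[str]]:
    """Compare two header lists by exact name (an assumption, not Harmoni's documented rule)."""
    prev, upd = set(previous), set(updated)
    return {
        "new": sorted(upd - prev),        # in the updated file only
        "missing": sorted(prev - upd),    # in the previous file only
        "unchanged": sorted(prev & upd),  # present in both files
    }


if __name__ == "__main__":
    diff = compare_variables(["Age", "Region", "$Spend"], ["Age", "$Spend", "Gender"])
    print(diff["new"])      # ['Gender']
    print(diff["missing"])  # ['Region']
```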
3. Map variables using key identifiers
Before loading your data, you can change the column headers to include the following key identifiers to predetermine the data type.
- $ - any field starting with $ becomes a measure.
- $weight - any field starting with $weight becomes a weight. Please note that the wizard will identify it as a measure, but when the project loads it is mapped as a weight.
- & - any field starting with & becomes a text item.
Header names with key identifiers take precedence over the mapping algorithm.
When you load the data file that includes the key identifiers, you still see the Data Type Option wizard; however, Harmoni will read the predetermined mapping from your source. Viewing the wizard allows you to confirm the variable mapping is correct before you proceed with creating the project.
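The prefix rules above can be sketched as a small lookup, which is useful for checking your headers before upload. This is not Harmoni's code: `type_from_header` is a hypothetical helper, and it assumes the match is case-sensitive and that `$weight` is tested before the more general `$` prefix.

```python
from typing import Optional


def type_from_header(header: str) -> Optional[str]:
    """Return the type implied by a key identifier, or None if the header has none.

    Assumptions: case-sensitive prefixes; "$weight" is checked before "$"
    because every "$weight" header also starts with "$".
    """
    name = header.strip()
    if name.startswith("$weight"):
        return "weight"
    if name.startswith("$"):
        return "measure"
    if name.startswith("&"):
        return "text item"
    return None  # no key identifier: left to the prediction algorithm


if __name__ == "__main__":
    for header in ["$Spend", "$weight_final", "&Comments", "Region"]:
        print(header, "->", type_from_header(header))
```

Headers that return `None` carry no key identifier, so their type is decided by the automated mapping and can still be overridden in the wizard.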
- Insert key identifiers in the column headers in your data.
- Load an existing Project or create a new one, and from the Sources area select Upload.
- Locate your data source and click Open.
- In the Sources area, select the source you wish to add to your project.
- The Data Type Option Wizard opens with the predetermined mapping.
- You can change the mapping of the variables and select the variables you wish to include/exclude from the project.
- When ready, click the Confirm and Add or Confirm Definition button.
- With the data source you want to include in your project selected, click OK to load.
4. Prediction Logic Algorithm
Standard Axis
- Maximum element name length is less than 255 characters, and
- Distinct element count is less than 65535, and
- Distinct element count is less than 80% of the sample size (current sample size is less than 65535).
Measure
- All data points in the measure are numeric.
Verbatim
- Anything that is not predicted as Standard Axis or Measure, or
- Data matches a URL pattern, or
- Data matches a GUID pattern, or
- Data matches a Base64-like pattern.
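The conditions above can be approximated in a short sketch. It reads the first three conditions as the Standard Axis test, the numeric condition as the Measure test, and the remainder as the Verbatim test. Everything else is an assumption made for illustration: the function name `predict_type`, the order in which the tests run, the three regular expressions, and the choice to require every non-empty value to match a pattern. Harmoni's actual implementation is not published in this article and may differ.

```python
import re

# Hypothetical patterns; the article does not define the exact expressions.
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)
GUID_RE = re.compile(
    r"^\{?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\}?$"
)
BASE64_RE = re.compile(
    r"^(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)


def _is_number(value: str) -> bool:
    """True if Python's float() can parse the value."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def predict_type(values: list[str]) -> str:
    """Guess a type for one column from its sampled values (illustrative only)."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "Verbatim"
    # Assumed order: numeric test first, then patterns, then the axis limits.
    if all(_is_number(v) for v in sample):
        return "Measure"
    for pattern in (URL_RE, GUID_RE, BASE64_RE):
        if all(pattern.match(v) for v in sample):
            return "Verbatim"
    distinct = set(sample)
    if (
        max(len(v) for v in distinct) < 255
        and len(distinct) < 65535
        and len(distinct) < 0.8 * len(sample)
    ):
        return "Standard Axis"
    return "Verbatim"


if __name__ == "__main__":
    print(predict_type(["Male", "Female", "Male", "Female", "Male"]))
    print(predict_type(["1", "2.5", "3"]))
    print(predict_type(["Great service", "Too slow", "No comment"]))
```

In this sketch a column of free-text answers falls through to Verbatim because nearly every value is distinct, while a column with a few repeated labels passes the 80% distinct-count test and becomes a Standard Axis.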
Where to from here?
Learn more about data sources