Data may be collected at multiple levels. For example, one file may contain information about respondents while a separate file contains information about their consumption occasions. If this is the case, you can append data across multiple levels and specify the unit of count.
Append allows you to add new variables to respondents or cases within a project when information about the same respondents is captured in separate data sources.
1. Link Sources
You can append data sources when you create a project for the first time or append to a project that already contains the primary source. Regardless, the first step is to upload or connect to the data sources that are required in your project.
Once the sources are loaded, identify the primary or parent source and, using the three-dot menu on the source tile, select the Link option.
Selecting the Link option opens the append data wizard.
The first step is to link your sources based on their hierarchy. The source you have identified as the primary or parent source appears in the right column of the wizard. All other sources appear on the left.
Select each of the secondary sources you want to append to your primary source. Once they are selected, the right column shows the linking hierarchy of the sources.
Linking Hierarchy
After you select the secondary sources, they appear underneath the parent source as children.
In a hierarchical database model:
- The primary source or parent can have multiple child records.
- At the same time, each child can become the parent source for another child.
- However, each child record can only have one parent.
To create a different hierarchy, if that is what the data set requires, drag and drop the child source onto the new parent. In the example below, Occasions becomes the parent of Drinks.
- Ticking the orange box of the second child source (Drinks, on the left side) moves the source to the right side at the same level as the first child source (it sits directly under Occasions and is a child of Main).
- Dragging the second child source (Drinks) and dropping it onto the first child source (Occasions) creates another level in the hierarchy. The first child source (Occasions) is a parent to the second child source (Drinks). Drinks sits slightly indented below Occasions to indicate the relationship.
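The hierarchy rules above can be sketched in a few lines of Python. This is only an illustrative model, not Harmoni's internal representation: the class `SourceHierarchy` and its methods are hypothetical names, and the source names come from the example in this article. Storing a single parent per child enforces the rule that each child can only have one parent.

```python
# Hypothetical model of the linking hierarchy; not part of Harmoni itself.
class SourceHierarchy:
    def __init__(self, primary):
        self.primary = primary
        # Maps each source to its parent; one entry per source means one parent.
        self.parent_of = {primary: None}

    def link(self, child, parent):
        """Attach child under parent, or move it there if already linked."""
        if parent not in self.parent_of:
            raise ValueError(f"unknown parent source: {parent}")
        if child == self.primary:
            raise ValueError("the primary source cannot become a child")
        # Reject moves that would make a source its own ancestor.
        node = parent
        while node is not None:
            if node == child:
                raise ValueError("a source cannot be its own ancestor")
            node = self.parent_of[node]
        self.parent_of[child] = parent

    def children(self, parent):
        return [c for c, p in self.parent_of.items() if p == parent]

    def depth(self, source):
        level = 0
        while self.parent_of[source] is not None:
            source = self.parent_of[source]
            level += 1
        return level


hierarchy = SourceHierarchy("Main")
hierarchy.link("Occasions", "Main")     # like ticking the box: child of Main
hierarchy.link("Drinks", "Main")        # like ticking the box: also a child of Main
hierarchy.link("Drinks", "Occasions")   # like drag and drop: Drinks moves under Occasions

# Print the hierarchy with indentation to show the levels.
for source in hierarchy.parent_of:
    print("  " * hierarchy.depth(source) + source)
```

Moving Drinks under Occasions replaces its earlier parent rather than adding a second one, which mirrors how the indented view in the wizard shows a single position for each source.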
Unit of Count
When using append to link your data sources, you need to take into account what each record in each of the data files represents, whether it be a respondent (person), or an occasion, etc.
If the unit of count of the secondary source is:
- The same as the primary source, usually respondents, you don't need to enter any information.
  Note: The process for appending sources with the same unit of count is slightly different. Please refer to the Append Data Sources article if all sources have the same unit of count.
- Different from the primary source (e.g., occasions), you need to define the unit of count for each of the sources before proceeding to the next step.
To define the unit of count, enter the word that best describes the counts for each level. You can enter these counts in the plural; the option to provide a singular version of the count is available within the project tree when loading the project. Learn more about multi-level units of count.
In this example:
- People - for the main file which contains information about respondents.
- Occasions - for the file which contains the information about consumption occasions.
- Drinks - for the file which contains the information about the drinks consumed on each occasion.
If you have three levels of data, the append wizard won't allow you to continue to the next step until you define the relevant units of count.
If you have only two levels of data, however, the wizard allows you to proceed without entering the unit of count. It is critical to enter the unit of count if you are creating a multi-level database.
Link Sources Example
In this example, data has been collected at multiple levels.
- The main file contains information about the respondents (people). This is the primary or parent source.
- The occasions file contains information about the consumption occasions (occasions) of these people. This is a secondary or child source, where the parent is the main file.
- The drinks file contains information about the drinks consumed (drinks) in each of the consumption occasions. This is also a secondary or child source, but in this case, the parent is the occasions file.
After selecting the link option on the primary file, work through the wizard.
- Select the secondary source(s).
- Ensure they are in the correct hierarchy. If you need to create an additional level in the hierarchy, drag and drop the child source onto the relevant parent.
- Define the unit of count for each of the sources.
- Proceed to the Next step.
2. Define - Data Types for Delimited Sources
Harmoni automatically maps variables when data sources contain an inherent dictionary (metadata that guides interpretation). Learn more about Harmoni variable types and source dictionaries.
When this is the case, the append wizard displays a message indicating there are no delimited sources and that you can carry on to the next step.
3. Append - Select a Unique Identifier
When appending variables to existing respondents with the same unit of count (e.g., records in all sources represent respondents), you can match records based on their order or on a unique identifier.
However, when you have multiple units of count (i.e., the data is multi-level and you have correctly filled in the units of count), you only have the option to match based on unique identifiers. The unique identifier determines how the parent source is linked to the child source.
Harmoni will ask you to select the identifier of the primary or parent source as well as the identifier of the secondary or child source. Harmoni will then match the records based on these identifiers.
Unique Identifier
- A unique common identifier must exist across the sources. For example, at the parent level there is a unique respondent ID, and the same respondent ID must also be present at the child level. Although the ID is unique at the parent level, it is repeated at the child level when a respondent has multiple items, because each item must link back to the respondent at the parent level.
- The unique identifier:
  - Can be either a measure (numeric) or verbatim (text), but it must be the same type across the sources you want to append.
  - Must not have blanks; otherwise, Harmoni flags them as duplicates if more than one is found.
- If the identifier is unique and can be matched, the sources will append.
- If the identifier is not unique in the parent file (i.e., repeated across multiple records), Harmoni displays the warning: "We have found duplicated record identifiers in your sources." In this case, you will need to insert unique identifiers at the primary level and ensure IDs are matched correctly to the second level.
- Orphaned records are ignored.
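Before uploading, it can help to check the identifier columns against the rules above. The following is a minimal sketch, assuming the IDs have already been read into plain Python lists; the function `check_identifiers` and the report keys are hypothetical and are not part of Harmoni. The ID 130 in the sample data is an invented orphan added purely for illustration.

```python
from collections import Counter


def check_identifiers(parent_ids, child_ids):
    """Report identifier problems before appending (illustrative only)."""
    def is_blank(value):
        return value is None or str(value).strip() == ""

    parent_present = [v for v in parent_ids if not is_blank(v)]
    child_present = [v for v in child_ids if not is_blank(v)]
    parent_counts = Counter(parent_present)
    known_parents = set(parent_present)

    return {
        # Identifiers must be the same type (numeric or text) in both sources.
        "mixed_types": len({type(v) for v in parent_present + child_present}) > 1,
        # Blank identifiers in either source.
        "blank_count": (len(parent_ids) - len(parent_present))
        + (len(child_ids) - len(child_present)),
        # IDs repeated in the parent source, where they must be unique.
        "duplicate_parent_ids": [k for k, n in parent_counts.items() if n > 1],
        # Child IDs with no matching parent record (orphans).
        "orphaned_child_ids": list(
            dict.fromkeys(v for v in child_present if v not in known_parents)
        ),
    }


respondent_ids = [120, 121, 124]
occasion_ids = [120, 120, 121, 124, 124, 130]  # 130 has no parent record

report = check_identifiers(respondent_ids, occasion_ids)
print(report)
```

Repeated IDs in the child list are expected and are not reported, since several child records can link back to the same parent.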
Example of Respondent and Occasion level data files:
Note that the unique ID is not duplicated in the Respondent data file, but IDs are repeated in the Occasion data file because respondents 120 and 124 have multiple occasions.
If a third-level Drinks data file were added for drinks consumed at each occasion, the Occasion ID would become the linking ID between the Occasion and Drinks files. The Occasion IDs in the Drinks data file may be repeated if a respondent drank more than one drink at any of their occasions.
Selecting Unique Identifiers
The source you have identified as the primary or parent source appears in the left column of the wizard. Child sources display on the right.
When you have more than two levels (i.e., a secondary source becomes the parent of another source), use the parent source drop-down to choose the relevant source. In the same way, if you have multiple children to link to a parent source, use the child source drop-down to choose the relevant source.
An asterisk (*) in either the parent or the child source column indicates there are sources that still need to be linked. The wizard won't allow you to continue unless all unique identifiers have been defined.
Example
- The parent source (Main) will link to the child source (Occasions) through the unique identifier LinkID. LinkID exists in the occasion file to link back to its corresponding respondent.
- The parent source (Occasions) will link to the child source (Drinks) through the unique identifier LinkID2. LinkID2 exists in the drinks file to link back to its corresponding consumption occasion.
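To illustrate how the two identifiers chain the three levels together, here is a rough sketch in Python. It is not how Harmoni performs the append; the helper `attach_children`, the sample records, and the `Age`, `Place`, and `Drink` fields are all made up for illustration. It assumes LinkID is present in both Main and Occasions, and LinkID2 in both Occasions and Drinks.

```python
# Illustrative sample data; only LinkID and LinkID2 come from the article.
main = [
    {"LinkID": 120, "Age": 34},
    {"LinkID": 121, "Age": 52},
    {"LinkID": 124, "Age": 27},
]
occasions = [
    {"LinkID": 120, "LinkID2": "120-1", "Place": "Home"},
    {"LinkID": 120, "LinkID2": "120-2", "Place": "Cafe"},
    {"LinkID": 121, "LinkID2": "121-1", "Place": "Home"},
    {"LinkID": 124, "LinkID2": "124-1", "Place": "Work"},
    {"LinkID": 124, "LinkID2": "124-2", "Place": "Bar"},
]
drinks = [
    {"LinkID2": "120-1", "Drink": "Tea"},
    {"LinkID2": "120-1", "Drink": "Water"},
    {"LinkID2": "120-2", "Drink": "Coffee"},
    {"LinkID2": "121-1", "Drink": "Juice"},
    {"LinkID2": "124-1", "Drink": "Coffee"},
    {"LinkID2": "124-2", "Drink": "Beer"},
    {"LinkID2": "124-2", "Drink": "Water"},
    {"LinkID2": "999-1", "Drink": "Tea"},  # orphan: no matching occasion
]


def attach_children(parents, children, key, child_field):
    """Nest each child under the parent with the same key value.

    Children without a matching parent (orphans) are skipped.
    """
    by_id = {parent[key]: parent for parent in parents}
    for parent in parents:
        parent[child_field] = []
    for child in children:
        parent = by_id.get(child[key])
        if parent is not None:
            parent[child_field].append(child)
    return parents


# Link the lowest level first, then attach the result to the primary source.
attach_children(occasions, drinks, "LinkID2", "Drinks")
attach_children(main, occasions, "LinkID", "Occasions")

# Each level has its own unit of count.
counts = {
    "People": len(main),
    "Occasions": sum(len(p["Occasions"]) for p in main),
    "Drinks": sum(len(o["Drinks"]) for p in main for o in p["Occasions"]),
}
print(counts)
```

The same linked data gives a different total depending on the unit of count, which is why each level needs its own label when the sources are appended.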
If the identifier is unique and can be matched, the sources will append and, once the append process is complete, you will be taken back to the PROJECTS area.