Row-based file formats, also known as Entity-attribute-value data (EAV), organize data by record, keeping all the data associated with a record in the same row. Row based files are the traditional way of organizing data.

Project Owner

1. Data Structure

Row-based formats compatible with Harmoni require three data sets that are stored in tabular form in your media of choice, such as text files, SQL databases, AWS, Azure, GCP, NoSQL databases, to name a few.

Harmoni can directly connect to data in this structure via direct connections to a tabular based application - refer to Supported Data sources.

The three data sets required are:

a. ID table - contains the entity/respondent IDs

b. Dictionary table - defines the attributes

c. Data table - contains the data values

a. ID Table

The respondent ID table contains one field, which is a unique ID for every entity/respondent in the data. It is important that the data is sorted.

b. Dictionary Table

A tabular form of the dictionary. The order of items in the dictionary determines the order that the questions and answers appear when the dataset is loaded into Harmoni.

The dictionary has 3-5 fields:

Question Label
Answer Label
Question Type
- tv - text variable
- sr - single response categorical response
- mr - multiple response categorical response
- nv - numerical value
- wt - weight variable
- img - image variable
- coord - hotspot variable
Item/Question Label Description (optional)
Value/Answer Label Description (optional)

The last two fields in the dictionary are optional.

When there are no answer labels, i.e., for those questions with no pre-coded answers, e.g., tv, nv, wt, img, the answer label should be null or empty.

Following is a sample table illustrating a dictionary with all possible question types. In this example, question types with numerical answers (e.g., exact age), text answers (e.g., email address), weight (numerical value but marked as weight), single-response questions with pre-coded answers, and multiple-response questions with pre-coded answers, are coded as nv, tv, wt, sr, and mr, respectively. Any user-chosen codes are acceptable, so long as the codes are unique.

The dictionary table needs to be sorted by Question Label then Answer Label in the preferred order.

Question Label	Answer Label	Question Type	Question Label Description (optional)	Answer Label Description (optional)
Respondent ID (text)		tv
Q11	Male	sr	Gender	Males
Q11	Female	sr		Females
Marital Status	Married	sr
Marital Status	Widowed	sr
Marital Status	Divorced	sr
Marital Status	Separated	sr
Marital Status	Never married	sr
Weight		wt	Main Weight
Exact Age		nv
News Sources	Internet	mr	Sources Used
News Sources	Newspapers	mr	Sources Used
News Sources	News magazines	mr	Sources Used
News Sources	Television	mr	Sources Used
News Sources	Radio	mr	Sources Used
Uploaded Image	<url>	img
Hotspots	{“image”:”<url>”,”clicks”:[{x:14,y:12},{x:21,y:32}]}	coord

Using the two optional columns adjusts the display label for the items in Harmoni. The compulsory columns continue to populate the source label.

For Q11 in the table above, here is an example of the axis and element labels with and without the optional fields.

Row-based Dictionary table example.png

c. Data Table

The Data table contains the answers to each question asked of a respondent in an Entity-Attribute-Value (EAV) structure.

The table should be sorted by respondent ID then by question label.

Included in the Data are the following fields:

Respondent ID (the Entity)
Questions Label (the Attribute)
Answer (the Value)

For example:

Respondent ID	Question Label	Answer
0	Respondent Id (text)	0
0	Gender	Male
0	Marital Status	Divorced
1	Respondent id (text)	1
1	Gender	Male
1	Marital Status	Never married

So long as the respondent, dictionary, and data tables can be obtained, the actual data can be stored in any format the user chooses.

2. Data Creation

Typically, these 3 datasets will be populated using SQL commands. The following examples illustrate some sample SQL commands to obtain the respondent count, dictionary table, and data table.

1) caseIDs (respondent IDs)
SELECT distinct CaseIdCol FROM OurDataTable ORDER BY CaseIdCol;

2) Dictionary
SELECT DISTINCT ItemLabelCol, ValueLabelCol, ItemTypeCol, ItemLabelColDesc, ValueLabelColDesc FROM OurDictionaryTable order by ItemLabelColOrder, ElementLabelColOrder;

3) Data
SELECT DISTINCT CaseIdCol, ItemLabelCol, ValueLabelCol FROM OurDataTable ORDER BY CaseIdCol, ItemLabelCol;

Row-based Data storage.png

3. Relationships

This section focuses on relationships between datasets including linking and multi-level linking.

The CaseId table identifies the unit of count of a dataset.
The CaseIdCol value in the first field of the Data Table is used to identify each unique record in the Data and matches back to the CaseIDs table.
If you want to use the CaseIdCol values as a link, they must also exist in the Data and Dictionary tables. It is these entries that create the variables used for linking.
Links define the relationships between datasets. The link may be to a different unit of count (multi-level) or it may be to another dataset with the same unit of count (concatenated).
Datasets at the same level, but with different units of count are referred to as siblings, while datasets at the same level and with the same unit of count are concatenated.
The relationships determine the levels. The root level, by definition, has no parent; all other levels must have a parent.
Datasets that link into the root level are referred to as children, and their children are referred to as grand-children, etc. For example, a survey of respondents may capture data on store visits, and purchases made on those visits. Each Store Visit is a child of a Respondent and the parent of a Purchase, while each Purchase is a child of a Store and the grand-child of a Respondent.
The theory is the same for any dataset relationship; the caseIDs table identifies the unique records in the dataset, and variables in the Data table provide the linking variables.

Example of multi-level structure with siblings and concatenations:

Respondents is the master dataset
Each dataset has its own unique CaseIds table
Each dataset includes variables for each of the ids required for linking
The Respondent Weights dataset concatenates to Respondents using Respondent id
The Visits dataset links to Respondents using Respondent id
The Purchases dataset links to Visits using Visit id
The Transport dataset links to Visits using Visit id

Respondents

The master root level dataset (Respondents) has a caseIDs table that identifies each Respondent in the Data

Respondent ID
0
1

The caseIDs are represented in the Respondent ID variable of the Data table to provide a variable for other data sets to link into.

Respondent ID	Question Label	Answer
0	Respondent ID (text)	0
1	Respondent ID (text)	1
0	Gender	Male
1	Gender	Male
0	Marital Status	Divorced
1	Marital Status	Never married

The Dictionary table may or may not include labels, depending on whether the data table contains codes or labels.

Question Label	Answer Label	Answer Type	Question Label Description (optional)	Answer Label Description (optional)
Respondent ID		tv
Gender	Male	sr
Gender	Female	sr
Q3	1	sr	Marital Status	Married
Q3	2	sr	Marital Status	Widowed
Q3	3	sr	Marital Status	Divorced

Respondent Weights

The dataset for the Weights that are concatenated into the Respondents has its own caseIDs table that also identifies each Respondent in the Data that has a weight,

Respondent ID
0
1

The Respondent ID variable in the Weights Data table identifies the Respondent each Weight record is concatenated to.

Respondent ID	Question Label	Answer
0	Respondent ID	0
1	Respondent ID	1
0	Weight	0.87654321
1	Weight	0.12345678

The Dictionary table may or may not include labels, depending on whether the data table contains codes or labels.

Question Label	Answer Label	Answer Type
Respondent ID		tv
Weight		wt

Visits

The dataset for the Visits at the next level down (child) a caseIDs table that identifies each Visit in the Data.

Visit ID
11
12
13

In the Data Table, the caseIDs for the visit level are represented in the Visit ID variable of the Data table so any potential children can link into it. The inclusion of a Respondent ID variable provides the link to its parent

Visit ID	Question Label	Answer
11	Respondent ID	0
12	Respondent ID	0
13	Respondent ID	1
11	Visit ID	11
12	Visit ID	12
13	Visit ID	13
11	Location	Supermarket
12	Location	Convenience Store
13	Location	Supermarket

The Dictionary table may or may not include labels, depending on whether the data table contains codes or labels.

Question Label	Answer Label	Answer Type
Respondent ID		tv
Visit ID		tv
Location	Supermarket	sr
Location	Convenience Store	sr
Location	Supermarket	sr

Purchases

The Purchases dataset contains details of each item purchased during the Visit by a Respondent. The Purchase level is a grandchild of Respondents, and has a CaseIDs table identifying each unique Purchase made in a Visit by a Respondent

Purchase ID
0
1
2
3
4

This grandchild Data links to the child-level Visit data via the Visit ID and an optional Purchase ID field may be included for any great-grandchildren to link into. These are all in the Data Table.

Purchase ID	Question Label	Answer
0	Visit ID	11
1	Visit ID	11
2	Visit ID	12
3	Visit ID	12
4	Visit ID	12
5	Visit ID	13
0	Purchase ID	0
1	Purchase ID	1
2	Purchase ID	2
3	Purchase ID	3
4	Purchase ID	4
5	Purchase ID	5
0	Beverage	1
1	Beverage	2
2	Beverage	1
3	Beverage	3
4	Beverage	6
5	Beverage	4

The Dictionary table may or may not include labels, depending on whether the data table contains codes or labels.

Question Label	Answer Label	Answer Type	Answer Label Description (optional)
Visit ID		tv
Purchase ID		tv
Beverage	1	sr	Coffee
Beverage	2	sr	Tea
Beverage	3	sr	Energy Drink
Beverage	4	sr	Flavored Milk
Beverage	5	sr	Sports Drink
Beverage	6	sr	Bottled Water

Transport

Transport is at the same level as Purchases, i.e. they are siblings. It includes variables relating to all the transport options used to achieve a Visit by a Respondent, this links into the Visit data, and is at the same level as the Purchases.

The Transport grandchild level has a CaseIDs table identifying each unique Respondent-Visit-Transport:

Transport ID
100
101
102

The transport data links into the Visit dataset using the Visit ID variable, It has no children to link into it, so the unique Transport ID that exists in the CaseIDs table doesn’t need to exist as a variable with responses in the Data Table:

Transport ID	Question Label	Answer
100	Visit ID	11
101	Visit ID	12
102	Visit ID	13
100	Transport method	Bus
101	Transport method	Walk
102	Transport method	Car
100	Travel Companions	Family
101	Travel Companions	None
102	Travel Companions	Friend
102	Travel Companions	Family

The Dictionary table may or may not include labels, depending on whether the data table contains codes or labels.

Question Label	Answer Label	Answer Type
Visit ID		tv
Visit ID		tv
Location	Supermarket	sr
Location	Convenience Store	sr
Location	Supermarket	sr
Travel Companions	None	mr
Travel Companions	Friend	mr
Travel Companions	Family	mr
Travel Companions	Other	mr

4. General Consideration

The approach is the same for all linking, i.e., the ID needs to exist as a variable in both the parent and child Data to be used as a link.
Concatenating datasets with the same counting level is currently only available at the root level in Harmoni
Any ID that does not exist in the CaseIDs table is ignored, even if it appears in the data table.
Any variable not found in the dictionary table will be ignored, even if it exists in the data table.
Any variable listed in the dictionary table that has no corresponding response in the data table will be output with a count of zero.
There is no reasonable limit on the number of levels that can be linked together
There is no reasonable limit on the number of sibling datasets that can link into the same parent.
Img and Coord variable types are not currently supported by the FCF2X converter

Where to from here?

Learn more about Harmoni

In this article

1. Data Structure

a. ID Table

b. Dictionary Table

c. Data Table

2. Data Creation

3. Relationships

Respondents

Respondent Weights

Visits

Purchases

Transport

4. General Consideration

Where to from here?

Related articles