I have been arguing that the structure of the data is an overlooked but important aspect of any empirical investigation. Knowing the structure is part of “looking” at your data, a time-honored recommendation. This recommendation, however, is usually stated in conjunction with discussions about graphing data and ways to visualize relationships, trends, patterns, and anomalies in your data. Sometimes, you are told to print your data and physically look at it as if this should reveal something you did not know before. This type of inspection may help identify large missing value patterns or coding errors, although there are other, more efficient ways to gain this insight. Packages in R and a platform in JMP, for example, identify and portray missing value patterns. If the data set is large, especially with thousands of records and hundreds of variables, physically “looking” at your data is impractical to say the least. Visualization, whether by graphs or a physical inspection, while important, will not reveal anything about the type of structure I am concerned with: structure in which cases are subsets of larger entities, so that the cases are nested under those entities and there is a hierarchy. Knowledge of this structure can only come from knowing what your data represent and how they were collected. The nesting could result from cluster sampling or it could just be an inherent part of the data.

Examples of nested structures abound. The classic example used when nested or multilevel data are discussed is students in classes, which are in schools, which are in school districts. Here the hierarchical structure is inherent to the data. A hierarchical structure could also result, by the way, if cluster sampling is used to select schools in a large metropolitan district and then used again to select classes. Whichever way it arises, the nesting is self-evident, which is why this example is usually used to illustrate the concept. Examples for marketing and pricing include:

- Segments
- Stores
- Marketing regions
- States
- Neighborhoods
- Organization membership
- Brand loyalty

Consumers are nested in segments; they are nested in stores; they are nested in neighborhoods, and so forth. This nesting must be accounted for in modeling behavior. But how?

The basic OLS model, so familiar to all who have taken a Stat 101 course, is not appropriate. Sometimes analysts will add dummy variables to reflect group membership (or some other form of encoding of groups such as effects coding), but this does not adequately reflect the nesting, primarily because there may be key drivers that determine the higher-level components of the structure. For example, consider a grocery store chain with multiple locations in urban and suburban areas. Consumers are nested within the stores in their neighborhood. Those stores, and the neighborhoods they serve, have their own characteristics or attributes just as the consumers have their own. The stores in urban areas may, for instance, have smaller footprints because real estate is tighter in urban areas but have larger ones in suburban areas where real estate is more plentiful. Stores in Manhattan are smaller than comparable stores in the same chain located in Central Jersey. The store size determines the types of services that could be offered to customers, the size of stock maintained, the variety of products, and even the price points.
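As a rough sketch of what this nesting looks like in a data set, consumer-level records can carry a key that points to store-level attributes. Everything in the block below is hypothetical: the store IDs, field names (`area`, `sq_ft`, `spend`), and values are made up for illustration only.

```python
# Hypothetical store-level (higher-level) attributes, keyed by store ID.
stores = {
    "S1": {"area": "urban", "sq_ft": 12000},
    "S2": {"area": "suburban", "sq_ft": 45000},
}

# Hypothetical consumer-level (lower-level) records; each consumer is
# nested in exactly one store through the store_id key.
consumers = [
    {"consumer_id": 1, "store_id": "S1", "spend": 42.10},
    {"consumer_id": 2, "store_id": "S1", "spend": 18.75},
    {"consumer_id": 3, "store_id": "S2", "spend": 96.40},
    {"consumer_id": 4, "store_id": "S2", "spend": 61.00},
]


def attach_store_attributes(consumers, stores):
    """Return consumer records with their store's attributes merged in."""
    return [{**c, **stores[c["store_id"]]} for c in consumers]


nested = attach_store_attributes(consumers, stores)
for row in nested:
    print(row)
```

Every consumer in the same store shares the same store attributes, which is exactly the information a set of store dummy variables cannot explain on its own.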

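The basic OLS model will not handle these issues. An extension is needed: the parameters of the OLS model are themselves modeled, with those models reflecting the higher level in the structure, and they can be functions of the higher-level characteristics. In a simple case, a two-stage model for case $i$ nested in group $j$ could be specified with a Stage I model such as:

$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$

Notice how the parameters have a double subscript indicating that they vary by group. The Stage II model, which defines the parameters in terms of a group-level characteristic $Z_j$, could be:

$$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j}$$

The new parameters in the Stage II equations, the $\gamma$ terms, are called hyperparameters. The error terms in the Stage II equations are called macro errors because they are at the macro level. They could be specified as:

$$u_{0j} \sim N(0, \tau_{00}), \qquad u_{1j} \sim N(0, \tau_{11})$$

Notice the double subscripts on the variance terms. A covariance between the intercept and slope errors, $u_{0j}$ and $u_{1j}$, may exist and is represented by $\tau_{01}$. The group-specific parameters have two parts:

- A “fixed” part that is common across groups; and
- A “random” part that varies by group.

The underlying assumption is that the group-specific intercepts and slopes are random samples from a normally distributed population of intercepts and slopes.

A reduced form of the model, obtained by substituting the Stage II equations into the Stage I equation, is:

$$Y_{ij} = \gamma_{00} + \gamma_{01} Z_j + \gamma_{10} X_{ij} + \gamma_{11} Z_j X_{ij} + \left( u_{0j} + u_{1j} X_{ij} + \varepsilon_{ij} \right)$$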

Now this is a more complicated, and richer, model. Notice that the random component for the error is a composite of terms, not just one term as in a Stat 101 OLS model. A dummy variable approach to modeling the hierarchical structure would not include this composite error term, which means that the dummy variable approach is misspecified; it is just wrong. The correct specification has to reflect random variations at the Stage I level as well as at the Stage II level and, of course, any correlations between the two. Also notice that the composite error term contains an interaction between a macro error and the Stage I predictor variable, which violates the OLS Classical Assumptions: the error variance then changes with the predictor, and the errors for cases in the same group are correlated because they share the same macro errors. A dummy variable OLS specification would not capture this.
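A minimal simulation sketch of the two-stage data-generating process described above, using only the Python standard library. The names (`gamma00`, `tau01`, `z`, and so on) follow common multilevel notation and, like the default numeric values, are assumptions chosen for illustration; nothing here is estimated from real data. Here `tau00` and `tau11` are the macro-error variances, `tau01` is their covariance, and `sigma` is the Stage I error standard deviation.

```python
import math
import random


def simulate_two_stage(n_groups=5, n_per_group=4, seed=0,
                       gamma00=10.0, gamma01=2.0, gamma10=-1.5, gamma11=0.5,
                       tau00=1.0, tau11=0.25, tau01=0.1, sigma=1.0):
    """Simulate cases nested in groups with random intercepts and slopes."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 macro-error covariance matrix
    # [[tau00, tau01], [tau01, tau11]], used to draw correlated errors.
    l11 = math.sqrt(tau00)
    l21 = tau01 / l11
    l22 = math.sqrt(tau11 - l21 ** 2)
    rows = []
    for j in range(n_groups):
        z = rng.uniform(0.0, 1.0)            # group-level characteristic
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u0 = l11 * e1                        # intercept macro error
        u1 = l21 * e1 + l22 * e2             # slope macro error
        # Stage II: group-specific parameters = fixed part + random part.
        beta0 = gamma00 + gamma01 * z + u0
        beta1 = gamma10 + gamma11 * z + u1
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 10.0)       # Stage I predictor
            eps = rng.gauss(0.0, sigma)      # Stage I error
            # Reduced form: fixed part plus the composite error.
            fixed = gamma00 + gamma01 * z + gamma10 * x + gamma11 * z * x
            composite = u0 + u1 * x + eps
            rows.append({"group": j, "z": z, "x": x, "y": fixed + composite,
                         "beta0": beta0, "beta1": beta1, "eps": eps,
                         "composite_error": composite})
    return rows


rows = simulate_two_stage()
print(rows[0])
```

Each simulated `y` can be read either as the Stage I equation with group-specific parameters or as the reduced form with its composite error; the two are the same number, which is the point of the substitution.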

Many variations of this model are possible:

- Null Model: no explanatory variables;
- Intercept varying, slope constant;
- Intercept constant, slope varying; and
- Intercept varying, slope varying.

The Null Model is particularly important because it acts as a baseline model with no individual effects.
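As a sketch, the four variants can be written in reduced form. The notation is an assumption in the common multilevel style: $\gamma_{00}$ and $\gamma_{10}$ are the fixed intercept and slope, $u_{0j}$ and $u_{1j}$ are the intercept and slope macro errors for group $j$, and $\varepsilon_{ij}$ is the Stage I error. Group-level predictors are left out to keep the comparison simple.

$$
\begin{aligned}
\text{Null Model:} \quad & Y_{ij} = \gamma_{00} + u_{0j} + \varepsilon_{ij} \\
\text{Intercept varying, slope constant:} \quad & Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + \varepsilon_{ij} \\
\text{Intercept constant, slope varying:} \quad & Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{1j} X_{ij} + \varepsilon_{ij} \\
\text{Intercept varying, slope varying:} \quad & Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + u_{1j} X_{ij} + \varepsilon_{ij}
\end{aligned}
$$

Each variant simply switches a macro error on or off, so the Null Model is the special case with no predictor and only the intercept error retained.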

I will have more to say about this model and will give examples in future posts, so be sure to come back for more.