Comparative Survey Analysis

Laron K. Williams, University of Missouri


A central component of the big data revolution is the explosion of polling and public opinion data. The result is that scholars are now able to track how the public’s preferences and behaviors change over time and in different contexts. This is the third course in the Modeling and Analyzing Public Opinion track, which aims to give students the tools needed to use cumulative sources of public opinion data (such as the Comparative Study of Electoral Systems and Latinobarometer) or to assemble their own dataset from various sources. This course builds on the insights from the other two courses in the track, which teach how to design and implement surveys and how to analyze existing data. In order to analyze survey data across time and space, we first discuss proper procedures for harmonizing public opinion data so that the data are comparable across contexts. We then discuss a variety of obstacles that are unique to the process of combining surveys (such as missing data and varying question wording), and strategies for overcoming these obstacles. The course provides guidance on how to analyze public opinion data across time and space, either collectively or via meta-analysis. On the last day we explore methods for presenting basic patterns and visualizing the sources of public opinion change over time and in different contexts. Though it is not a prerequisite, we encourage students to have a particular research topic or existing data source in mind so that they can directly apply the topics learned in class to their own research.


This course runs January 27-31, 2020.




Day 1 – Introduction to Advanced Survey Analysis
This introductory seminar provides an overview of advanced survey analysis. We review existing public opinion data sources, whether context-specific sources (such as the Cooperative Congressional Election Study), cumulative sources (such as the Comparative Study of Electoral Systems), or repositories for separate surveys (such as the Roper Center for Public Opinion Research). We identify a variety of studies from the social sciences that illustrate the benefits of using aggregated public opinion.
Day 2 – Harmonizing public opinion data across time and space
If scholars intend to combine separate sources of polling data into a cumulative dataset, then harmonization is required. Harmonization is the process of making the question wording, response sets, and weighting consistent so that the data can be analyzed over time and in different contexts. We discuss appropriate protocols for ensuring that the data are comparable, and procedures for ensuring transparency and replicability.
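As a rough illustration of what harmonizing response sets and weights can involve, the sketch below rescales a trust item asked on two different scales onto a common 0-1 range and normalizes weights to a mean of 1 within each survey. The survey names, the `trust` item, the scale ranges, and the weight normalization rule are all hypothetical choices made for this example; they are not drawn from the course materials, and real harmonization protocols will usually involve more steps.

```python
from collections import defaultdict

# Hypothetical response scales for the same "trust" item in two surveys.
SCALES = {"survey_a": (1, 5), "survey_b": (0, 10)}


def rescale_to_unit(value, low, high):
    """Map a response on a [low, high] scale onto [0, 1]; None stays missing."""
    if value is None:
        return None
    if high <= low:
        raise ValueError("scale maximum must exceed scale minimum")
    if not low <= value <= high:
        raise ValueError(f"response {value} is outside the scale [{low}, {high}]")
    return (value - low) / (high - low)


def harmonize(records):
    """Return records with a common 0-1 trust scale and mean-1 weights per survey.

    Assumes every record has a positive numeric weight.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["source"]] += rec["weight"]
        counts[rec["source"]] += 1

    harmonized = []
    for rec in records:
        low, high = SCALES[rec["source"]]
        mean_weight = totals[rec["source"]] / counts[rec["source"]]
        harmonized.append(
            {
                "source": rec["source"],  # keep provenance for replicability
                "trust": rescale_to_unit(rec["trust"], low, high),
                "weight": rec["weight"] / mean_weight,
            }
        )
    return harmonized


records = [
    {"source": "survey_a", "trust": 3, "weight": 2.0},
    {"source": "survey_a", "trust": None, "weight": 1.0},
    {"source": "survey_b", "trust": 10, "weight": 0.5},
]
rows = harmonize(records)
```

Keeping the `source` field on every harmonized row is one simple way to preserve a transparent link back to the original survey.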
Day 3 – Inferential challenges
Even though the separate surveys might be designed properly so that the responses are representative of the underlying population of interest, this does not ensure that longitudinal or cross-sectional inferences will be accurate. In this class we identify inferential challenges that are unique to aggregated or combined public opinion datasets. These include inconsistent question wording, varying weighting procedures, and missing data, to name a few. While careful harmonization procedures can limit the harm of some problems, others require more targeted solutions.
Day 4 – Analyzing public opinion data across time and space
We begin by highlighting the wealth of additional inferences one can make from aggregated public opinion. We then divide the methods used to analyze aggregated public opinion into those that analyze all the varying data sources with one model (perhaps using multi-level modeling) and those that produce survey-specific estimates of effects and then perform meta-analysis. We discuss the appropriateness of one method over the other, based on the characteristics of the data and the inferences sought by the scholar.
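To make the second approach concrete, the following is a minimal sketch of one common way to combine survey-specific estimates: fixed-effect, inverse-variance pooling. It is offered only as an illustration under stated assumptions; the function name and the example numbers are hypothetical, and the course may well emphasize other estimators (for instance, random-effects models that allow effects to differ across surveys).

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Pool per-survey coefficients with inverse-variance weights.

    Assumes each survey estimates the same underlying effect and that the
    estimates are independent. Returns (pooled_estimate, pooled_std_error).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise surveys (smaller standard errors) receive larger weights.
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical coefficients on the same predictor from two separate surveys.
pooled, pooled_se = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
```

With equal standard errors, as in this toy input, the pooled estimate is simply the average of the two coefficients, and the pooled standard error is smaller than either survey's own.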
Day 5 – Data presentation and visualization
A recent trend in the social sciences centers on providing meaningful and easy-to-interpret quantities of interest from one’s model. This is especially important when using cumulative public opinion datasets since the importance of some variables (such as gender) might change over time or in different contexts. Also, meta-analysis often provides a large number of coefficients, which means that scholars must go beyond simply providing a table with coefficients. The final day builds on the insights from the other days and demonstrates how to calculate and provide visual depictions of novel public opinion patterns.
A background in multiple regression is required, and a basic understanding of survey design and analysis is suggested.