I’ll Zoom in

As schools and universities have moved to online-only instruction over the past few weeks, I find myself empathizing with many of my friends and former colleagues who were forced to rapidly convert their class plans to more remote-friendly formats.

It occurred to me that I could maybe lighten their load by joining a remote class session to talk about my work and how the material they’re covering can be applied in the real world. Work like mine requires a wide variety of skills: modeling, data-wrangling, visualization, even some trigonometry!

To that end, if you are teaching a collegiate statistics or data science class, and have some interest in bringing me in for 15 or 20 minutes, I would love to hear from you!


Flattening the curve

I am not an epidemiologist, and the following does not constitute a prescriptive recommendation about responding to an epidemic. My main intent is to provide code which you can use yourself to experiment. If you are interested in learning more about the coronavirus and COVID-19, I recommend this overview from Our World in Data, as well as the CDC’s recommendations.

I was interested in replicating the “Flatten the curve” plots which communicate so clearly the need to delay the spread of coronavirus. In the process of building up data for the plot, I wrote a very basic virus transmission simulator, to generate graphs of “the curve”, which everyone is encouraged to flatten.

The simulation works as follows: on day t, check how many people were infected as of day t-1, and let everyone in the population randomly encounter some number of other people. Some of those encounters will be with infected people, and some of that subset of encounters will lead to an individual contracting the virus. After some number of days with the virus, an individual realizes they are infected, and self-quarantines. (See also this approach, using NetLogo.)

My interest is in depicting how each of three dimensions — social distancing, germ-conscientiousness, and the availability of tests — works to slow the spread of the virus and “flatten the curve”.

To model social distancing, I alternately allow individuals to encounter 100 people each day, or 50 people. For health precautions such as handwashing, facemasks, germ-spreading conscientiousness, and not touching one’s face, I set the probability of viral transmission to 0.005 or 0.01. Finally, I assume that individuals will self-quarantine after seeing symptoms on the 14th day of infection — after the incubation period — or that they will self-quarantine after being tested, which I have arbitrarily set to happen after 7 days. I also run 25 iterations of each parameter combination, to give some sense of variance.
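Concretely, the full set of runs can be laid out as a parameter grid. The snippet below is a sketch with variable names of my own choosing, not the original script:

```r
# Sketch of the parameter combinations described above (names are mine)
params <- expand.grid(
  n_contacts = c(100, 50),      # social distancing halves daily encounters
  p_transmit = c(0.01, 0.005),  # hygiene halves per-encounter transmission
  detect_day = c(14, 7),        # quarantine on symptoms vs. after a test
  iteration  = 1:25             # 25 runs of each combination
)
nrow(params)  # 8 combinations x 25 iterations = 200 simulation runs
```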

The following graphs depict three different ways of displaying the same simulation results.

[Figure: ftc_distancing, simulation results grouped by social distancing]

As this and the subsequent plots make clear, implementing measures such as social distancing, hand-washing, and testing — under the extremely simplified conditions of this model simulation — can flatten the curve by reducing the peak number of new infections per day, even as the duration of the epidemic increases. Note that, while many other versions of this chart depict a horizontal line indicating the capacity of our healthcare system, I omit any such line to avoid appearing prescriptive. Please do not use this write-up to evaluate the relative merits of precautionary measures!

[Figure: ftc_washing, simulation results grouped by transmission probability]

Just like the implementation of social distancing — which in this simulation simply means fewer encounters with the rest of the population — precautions such as hand-washing and not touching your face can reduce the peak of the onset of infection, by reducing the probability of contracting the virus even if you are exposed to someone carrying it.

[Figure: ftc_quarantine, simulation results grouped by quarantine timing]

Finally, and interestingly, this simulation suggests that the use of tests to remove infected individuals from the general population earlier than they otherwise would be is not effective in flattening the curve unless done in concert with social distancing and good hygienic practices.

Obviously, this is drastically simplified, even for a model. There are many ways you could extend what I’ve started here, including:

  • Allowing variation within the population: instead of each individual encountering either 50 or 100 other people, they could encounter some random number, with modes around 50 and 100. One could also experiment with introducing large numbers of interactions, such as we might observe in a public sporting event. The same obviously goes for allowing variation in transmission probabilities, and the availability of/propensity to test.
  • Instead of at random, encounters could happen probabilistically within a network, in which proximate/more closely connected individuals are more likely to meet.
  • I have done nothing with mortality, and I’ve assumed that anyone who self-quarantines completely ceases to transmit the virus.

Update: I left this graph out of the first draft of this post, because I thought it was too potentially prescriptive and a little hard to read, but I think it does a good job of summarizing the results of the simulation:

[Figure: ftc_cumulative, cumulative infections averaged across simulations]

This is just the cumulative version of the curves plotted above, averaged across all simulations, which gives us some nice, smooth sigmoids. My interpretation of this plot is that halving the number of interpersonal encounters (distance) and halving the rate at which the virus transfers from an encounter with an infected individual (wash) have roughly similar effects (see the alignment between purple and orange, or gold and brown), perhaps unsurprisingly.

This plot reflects my earlier observation that test/~test (lime/teal) has a negligible effect, absent other measures, but I am astonished at the difference between test/~test (gray/pink) if applied in conjunction with both social distancing and hand-washing. I’m not really sure how to interpret what appears to be an order-of-magnitude difference.

Here is the code I used to simulate data and produce the plots in this post:
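As a compact stand-in for the full script, here is a minimal sketch of the core loop, using my own parameter names and approximating the random encounters with a draw against the expected infection probability:

```r
# A minimal sketch of the transmission simulation described above
simulate_curve <- function(pop_size = 10000, n_contacts = 100,
                           p_transmit = 0.01, detect_day = 14,
                           n_days = 200, n_seed = 1) {
  # infection_day[i] is the day individual i contracted the virus (NA = never)
  infection_day <- rep(NA_integer_, pop_size)
  infection_day[sample.int(pop_size, n_seed)] <- 0

  new_cases <- integer(n_days)
  for (t in seq_len(n_days)) {
    # Contagious today: infected as of day t - 1, not yet self-quarantined
    contagious <- !is.na(infection_day) &
      infection_day < t & (t - infection_day) <= detect_day
    share_contagious <- mean(contagious)

    # Each susceptible person meets n_contacts random others; each meeting
    # with a contagious person transmits with probability p_transmit
    p_today <- 1 - (1 - p_transmit * share_contagious)^n_contacts
    newly <- is.na(infection_day) & runif(pop_size) < p_today
    infection_day[newly] <- t
    new_cases[t] <- sum(newly)
  }
  data.frame(day = seq_len(n_days), new_cases = new_cases)
}

# Example: the "flattened" curve from halving daily encounters
flat <- simulate_curve(n_contacts = 50)
```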

Eat & Run Boston

Whenever I get the opportunity to travel, I attempt to discern the best dining options and look for promising places to run. I have my own preferences and process for each search, but I realized that I mostly live my normal life in the Boston area the same way: looking for good food and interesting places.

So, in the spirit of paying forward the many good recommendations I’ve received in the past, I made this post and accompanying map to highlight some of my favorite Boston-area places to see and eat.


Mapping Public Opinion: A Tutorial

At the upcoming 2012 summer meeting of the Society for Political Methodology, I will be presenting a poster on Isarithmic Maps of Public Opinion. Since last posting on the topic, I have made major improvements to the code and robustness of the modeling approach, and written a tutorial that illustrates the production of such maps.

This tutorial is in a very rough draft form, but I will post it here when it is finalized. (An earlier draft had some errors, and so I have taken it down.)

Isarithmic Maps of Public Opinion Data

As a follow-up to my isarithmic maps of county electoral data, I have experimented with extending the technique in two ways. First, where the electoral maps are based on data aggregated to the county level, I have sought to generalize the method to accept individual responses for which only zip code data is known. Second, since survey respondents are not distributed uniformly across the geographic area of the United States (tending to be concentrated in more populous states and around cities), I have attempted to convey a sense of uncertainty or data sparsity through transparency. Some early products of this experimentation can be seen below.

Party Identification

Isarithmic map of party identification from the 2008 CCES. Click to enlarge.
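For a sense of how the transparency idea can be implemented, here is a sketch in ggplot2. The data frame and column names are assumptions for illustration, not the code behind the map above:

```r
# `grid_df` is assumed to hold an interpolated surface: lon, lat, a
# smoothed mean party ID (pid_hat, 7-point scale), and a local count of
# nearby respondents (n_local) used to fade out sparse regions
library(ggplot2)

ggplot(grid_df, aes(x = lon, y = lat)) +
  geom_raster(aes(fill = pid_hat, alpha = n_local)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 4) +         # 4 = pure independent
  scale_alpha_continuous(range = c(0.1, 1)) +  # sparse areas go transparent
  coord_quickmap()
```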


Ideological extremity in social networks

Update: Make sure to read Joshua Brustein’s nice write-up of our research at the New York Times, as well as Dr. Seth Masket’s impressions.

At the upcoming meeting of the Midwest Political Science Association, Aaron King, Frank Orlando, and I will be presenting a paper that investigates the determinants of success in Senate primary elections. We are primarily interested in whether voters are best modeled as voting by ideological proximity, or whether primary electorates strategically select candidates who offer a better chance of victory in the general election. Essentially, we are trying to identify whether ideological extremity is an advantage or a hindrance to primary electoral success.

Click for PDF version of the network graph (may be slow to load)

Unfortunately, estimating the ideology of many of these candidates can be problematic, given that many have not, for example, cast a roll-call vote that could be used in a NOMINATE-like scaling. Absent a more explicitly political record, we turn to the social networking/microblogging site Twitter, and collect data on the connections between elected officials and the mass public of Twitter users.

We use a nonmetric multidimensional scaling algorithm to estimate a space which represents users’ Twitter behavior, and find that the second dimension of that space correlates very well with Poole and Rosenthal’s NOMINATE scores for Senators and Representatives. Our main results can be seen in the figure below, and the paper is now available for download here.

Click for PDF version of the estimate summary dotplot.
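To give a flavor of the scaling step, here is a sketch using isoMDS from the MASS package. The 0/1 `follows` matrix (one row per Twitter user, one column per elite account) is an assumption, and this is not the paper’s actual pipeline:

```r
library(MASS)

# Distance between elites: one minus the Jaccard similarity of their
# follower sets
shared <- crossprod(follows)    # elites x elites counts of shared followers
n_followers <- diag(shared)
union_size <- outer(n_followers, n_followers, "+") - shared
d <- as.dist(1 - shared / union_size)

fit <- isoMDS(d, k = 2)         # nonmetric multidimensional scaling
# fit$points[, 2] is the dimension we compare against NOMINATE scores
```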

Choropleth tutorial and regression coefficient plots

About two weeks ago, I gave a short talk at Duke, wherein I presented a brief tutorial on creating choropleth maps in R using ggplot2. Since the code is already written, and the data and shapefiles already hosted online, I thought I would share the tutorial more widely.

A .ZIP file containing all the files necessary to follow the tutorial is available at: http://goo.gl/UrvQo.

The script goes very briefly through the loading of shapefiles and presidential election returns, and ends with the production of the choropleth below.

Click to enlarge

I don’t get into further customization of the map, as there are other more authoritative and complete sources for that. Further, much more detailed instructions on reading shapefiles are available from CSISS and NCEAS.

Included at the very end of the script is a brief example of a regression coefficient plot, something like a ggplot2 version of coefplot() in Andrew Gelman’s arm package.

I decided to develop the example into a function that takes as input a list() of model objects, and returns a ggplot2 object, which can be further modified by the user if so desired.
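In outline, such a function can look like the sketch below (a minimal re-implementation rather than the script linked later in this post):

```r
library(ggplot2)

# Takes a list() of fitted models, returns a ggplot2 object the user can
# modify further (e.g., by adding a theme)
coef_plot <- function(models) {
  plot_data <- do.call(rbind, lapply(seq_along(models), function(i) {
    cf <- summary(models[[i]])$coefficients
    data.frame(model = paste("Model", i), term = rownames(cf),
               estimate = cf[, "Estimate"], se = cf[, "Std. Error"])
  }))
  ggplot(plot_data, aes(x = term, y = estimate, colour = model)) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_pointrange(aes(ymin = estimate - 2 * se, ymax = estimate + 2 * se),
                    position = position_dodge(width = 0.5)) +
    coord_flip()
}

# Example: three nested models of the same outcome
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coef_plot(list(m1, m2, m3))
```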

A coefficient plot comparing three models.

The script for the above plot can be found here. I also wrote a function that eschews arbitrarily discrete confidence bounds, instead attempting to suggest a sense of our confidence in the estimate without choosing a specific interval, since the difference between significance and insignificance is not itself significant. Code for the function is available here, and an example can be seen below.

Smoothed standard error coefficient plots
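The idea behind the smoothed version can be sketched by overplotting many nested normal-theory intervals, each nearly transparent, so that confidence fades continuously rather than cutting off at an arbitrary level (again a re-implementation, not the linked function):

```r
library(ggplot2)

smooth_coef_plot <- function(model, n_bands = 50) {
  cf <- summary(model)$coefficients
  est <- data.frame(term = rownames(cf),
                    estimate = cf[, "Estimate"], se = cf[, "Std. Error"])
  # One interval per band, from very wide (high confidence) to very narrow
  bands <- do.call(rbind, lapply(seq_len(n_bands), function(i) {
    z <- qnorm(1 - (i / (n_bands + 1)) / 2)
    transform(est, lo = estimate - z * se, hi = estimate + z * se)
  }))
  ggplot(bands, aes(y = term)) +
    geom_segment(aes(x = lo, xend = hi, yend = term),
                 alpha = 3 / n_bands, linewidth = 3) +
    geom_point(data = est, aes(x = estimate)) +
    geom_vline(xintercept = 0, linetype = "dashed")
}

smooth_coef_plot(lm(mpg ~ wt + hp + qsec, data = mtcars))
```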

High Dimension Visualization in Political Science

Last Friday, I gave a talk illustrating some examples of high-dimension visualization in Political Science. I structured the talk around three arbitrary categories of information visualization: infographics (factoid-packed, inefficient), statistical graphics (argument-making, minimal), and data displays (multidimensional, deep). The slides below are long on examples and short on text, but should be mostly self-explanatory.

Header image from BibliOdyssey.

Electoral Marimekko Plots

To be reductive, visual displays of quantitative information might be reasonably categorized on a continuum between “data display” and “statistical graphics.” By statistical graphics, I mean a plot that displays some summary of or relationship amongst several variables, likely having undergone some processing or analysis. This may be as simple as a scatterplot of a primary independent variable and the dependent variable, a boxplot, or a graphical regression table.

In this reductive scheme, then, “data displays” present variables in raw form — for use in exploratory data analysis, or perhaps just to offer the viewer access to all of the data. Where “statistical graphics” might be best served by simplicity and minimalism in design, such that a single idea might be conveyed clearly, “data displays” will tend to be inherently complex, and require effort from both the creator and viewer to parse meaning from the available information.

Where statistical graphics are ideal for presenting conclusions, data displays are useful for generating ideas, and optimally, permitting the relatively rapid identification of relationships between multiple variables. On top of this, I might add that many of the more well-regarded data displays of recent note offer macro-level insight as well as the opportunity to ascertain specific details (for this, interactivity is often valuable, as in the internet-classic New York Times box office visualization).

As several recent posts suggest, I am interested in finding ways to successfully and clearly convey multidimensional data, and have been focusing on political data as it varies across geopolitical units and time. Here I offer an approach which departs from the spatial basis of other recent efforts in favor of allowing the position of graphical objects to convey other variables.

County Vote Spinogram (Turnout), 1992

County Vote Marimekko Plot, 1992, sorted by votes cast. Click for slideshow.

This type of plot is called, variously, a spinogram, a mosaic plot, or a marimekko — and is not dissimilar from a treemap with a different organizational structure (other examples). The utility of this plot type is that it can spatially convey four numeric variables (x position, y position, height, width), and color can be added to incorporate up to three additional variables (R, G, B). Further, there is a straightforward geometric interpretation of each cell: the areas of each (in this case, width/state turnout × height/county proportion of state turnout) are directly comparable.

Unlike a stacked bar plot, the width of each column conveys information, permitting height to convey proportion rather than count. Further, columns and cells within columns can be sorted to express the ordering of variables of interest. In some ways, these can be seen as extreme reinterpretations of (Dorling) cartograms, in which not only the size and shape of political boundaries, but also their position, are distorted by other variables.
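The construction reduces to cumulative shares on each axis. Here is a sketch with dplyr and ggplot2; the input columns (state, county, votes, dem_share) are assumptions, and this is simpler than the code linked at the end of the post:

```r
library(dplyr)
library(ggplot2)

# Column widths: each state's share of the national vote
state_cols <- df %>%
  group_by(state) %>%
  summarise(state_votes = sum(votes)) %>%
  arrange(state_votes) %>%                    # less populous states at left
  mutate(w = state_votes / sum(state_votes),
         xmax = cumsum(w), xmin = xmax - w)

# Cell heights: each county's share of its state's vote
cells <- df %>%
  group_by(state) %>%
  arrange(votes, .by_group = TRUE) %>%        # less populous counties at bottom
  mutate(h = votes / sum(votes),
         ymax = cumsum(h), ymin = ymax - h) %>%
  ungroup() %>%
  left_join(state_cols, by = "state")

# Each cell's area is its county's share of the national vote
ggplot(cells) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                fill = dem_share), colour = "white", linewidth = 0.1) +
  scale_fill_gradient(low = "red", high = "blue")
```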

County Vote Spinogram (Dem 2PV), 1924

County Vote Marimekko Plot, 1924, sorted by Democratic share of the two-party vote. Click for slideshow.

In the plots above, cells are colored according to the strength of Democratic (blue), Republican (red), and other party (green) support, and counties whose turnout represents greater than 1% of the total turnout in an election are labeled.

I present two different layouts for the cells in each plot. The first arrays states left-to-right in order of the number of votes cast in an election, and sorts counties bottom-to-top in the same order. Thus, more populous states are on the right, and more populous counties are at the top of the plot. This arrangement allows the viewer to observe the effects of population density both within and across states, and may better facilitate tracking changes in county or state politics over time.

The second layout sorts states left-to-right, and counties bottom-to-top in order of the Democratic share of the two party vote (Dem Votes / (Dem Votes + Rep Votes)). Thus, more Democratic-leaning (relative to Republican) states are on the right, and counties that were more supportive of Democratic candidates are at the top. I believe that this arrangement makes it easier to discern overall trends in partisanship across time, as the total “sum” of red within a diagram is relatively easy to compare to the total “sum” of blue (and green).

I have attempted to make my R code fairly general, and it is available for download here, although it will obviously require some modifications for other applications. Our approaches differ, but another instructive example can be found at Learning R.

Isarithmic History of the Two-Party Vote

A few weeks ago, I shared a series of choropleth maps of U.S. presidential election returns, illustrating the relative support for Democratic, Republican, and third Party candidates since 1920. The granularity of these county level results led me to wonder whether it would be possible to develop an isarithmic map of presidential voting using the same data.

Isarithmic maps are essentially topographic or contour maps, wherein a third variable is represented in two dimensions by color, or by contour lines, indicating gradations. I had never seen such a map depicting political data, certainly not election returns, and thus sought to create one.

There is a trade-off between an isarithmic depiction and a choroplethic depiction, in which a third variable is shown within discrete political boundaries. Namely, though a politically-delineated presentation better facilitates connecting the variable of interest to the level at which it was measured, the superimposition of geographically arbitrary political boundaries may obscure more general regional patterns.

Election-year maps can be seen in a slideshow here (and compared to the three-color choropleth maps here). The isarithmic depiction does an excellent job of highlighting several broad patterns in modern U.S. political history.

[Image: 2008 isarithmic map]

First, it does a good job of depicting local “peaks” and “valleys” of partisan support clustered around urban areas. In the 2008 map, for example, Salt Lake City, Denver, Chicago, Miami, Memphis, and many other cities stand apart from their surrounding environs, highlighted by a relatively intense concentration of voters with distinct partisan leanings. In 1980, this method shows that though Reagan enjoyed broad support in California, the revolution was not felt in the Bay Area.

Comparison of these maps across time also underscores well-known political trends, but offers more resolution than state-level choropleths and greater clarity than county-level choropleths. Note the nearly inverted maps for 1924 and 2004, between which elections the Solid South went from solidly Democratic to solidly Republican. Interestingly, though that particular regional pattern has been remarkably consistent since 1984, the South favored a Democratic candidate as recently as 1980.

These patterns over time are even better observed in motion. Interpolating support between elections, I have generated a video in which these maps shift smoothly from one election year to the next. The result is the story of 20th century presidential politics on a grand scale, condensed into a little over a minute of data visualization.

The video can also be seen at YouTube (I recommend the “expanded” or “full screen” view), or at Vimeo. The images were rendered at 1280 x 720 pixels, to allow the video to be seen in HD.

This animated interpretation accentuates certain phenomena: the breadth and duration of support for Roosevelt, the shift from a Democratic to a Republican South, the move from an ostensibly east-west division to the contemporary coasts-versus-heartland division, and the stability of the latter.

More broadly, this video is a reminder that what constitutes “politics as usual” is always in flux, shifting sometimes abruptly. The landscape of American politics is constantly evolving, as members of the two great parties battle for electoral supremacy.

Appendix on creating the visualization

Using county-level presidential returns from the CQ Press Voting and Elections Collection, I associated each county’s support in a given election year for the Democratic and Republican candidates with an approximation of that county’s centroid in degrees latitude and longitude, using the shapefiles loaded with the package mapdata.

I then used simple linear interpolation to create a smoothed transition from election to election, creating 99 inter-electoral estimates of partisanship for each county. Using a custom function and the interp function from akima, I created a spatially smoothed image of interpolated partisanship at points other than the county centroids.

This resulted in inferred votes over the Gulf of Mexico, the Atlantic and Pacific Oceans, the Great Lakes, Canada and Mexico — so I had to clip any interpolated points outside of the U.S. border using the very handy pinpoly function from the spatialkernel package.

Finally, I created a custom color palette, a modification of the RdBu scheme from Colorbrewer, using colorRampPalette(), and plotted the interpolated data along with state borders using the excellent ggplot2.
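In outline, the steps above look something like this sketch (object and column names are mine, and the border-clipping step is elided):

```r
library(akima)

# A diverging palette modified from ColorBrewer's RdBu
pal <- colorRampPalette(c("#B2182B", "#F7F7F7", "#2166AC"))(256)

# `counties` is assumed to hold centroid coordinates plus the Democratic
# share in two consecutive elections, dem_y1 and dem_y2
n_frames <- 99
for (f in seq_len(n_frames)) {
  w <- f / (n_frames + 1)
  z <- (1 - w) * counties$dem_y1 + w * counties$dem_y2  # linear interpolation

  # Spatial smoothing: partisanship at points other than county centroids
  surf <- interp(x = counties$lon, y = counties$lat, z = z,
                 xo = seq(-125, -66, length.out = 400),
                 yo = seq(24, 50, length.out = 200))

  # ...clip grid points outside the U.S. border, then plot the frame with
  # ggplot2, colouring surf$z with pal
}
```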

I would like to note that I would have preferred using the Albers Equal Area Conic projection, but settled on the default Mercator projection, as drawing the Albers map with ggplot2 was prohibitively time-consuming, given that I was generating 2,201 individual frames.

Choropleth Maps of Presidential Voting

Having always appreciated the red and blue cartograms and cartographs of geographic electoral preferences, such as those made available by Mark Newman, I sought to produce similar maps, but include information about support for non-“state-sponsored” parties, and to extend the coverage back in time.

I was able to find county-level presidential election returns going as far back as 1920, thanks to the CQ Press Voting and Elections Collection (gated). I converted the proportion of the vote garnered by Democratic, Republican, and “Other” parties’ candidates to coordinates in three-dimensional RGB color space, and used shapefiles from the mapdata package to plot these results as choropleth maps with ggplot.
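The color conversion itself is a one-liner; a sketch, with vote-share vector names of my own (each in [0, 1]):

```r
# Republican share drives red, "other" share green, Democratic share blue
county_colors <- rgb(red = rep_share, green = other_share, blue = dem_share)
```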


It is interesting to observe these maps in a series, which gives historical context to the Red State/Blue State narrative. Most obviously, there is a significant shift in the geographic center of Democratic support, from a concentration in the southeast to the present equilibrium, localized on each coast and near the Great Lakes.

Among these 23 elections, landslide victories, such as Roosevelt over Landon in 1936, Johnson over Goldwater in 1964, Nixon over McGovern in 1972, and Reagan over Mondale in 1984, tend to stand out for their monochromaticity.

Also intriguing are the elections featuring substantial support for third-party candidates. Most of these were individuals who had a strong support base in a specific region of the country, such as La Follette in the northwest, and Thurmond and Wallace in the deep south. Ross Perot’s run in 1992 is unique here, as his relatively broad geographic base of support results in a map that runs the gamut to a greater degree than any other.

Click on the image below to see a full screen version of the slideshow above, or to download any of the individual maps as PNGs.

[Image: Goldwater map] Click for slideshow/download

K-Means Redistricting

U.S. Congressional districts are today drawn with the aim of maximizing the electoral advantage of the state’s majority party, subject to some constraints, including compactness (which can be measured in numerous ways) and a “one person, one vote” standard. What if, instead of minimizing population variance across districts, we aimed to minimize the mean distance between each resident and their district center?

To do so would be to employ something very much like k-means clustering, and doing so produces some interesting results.

Using the population and latitude and longitude coordinates of the centroid of each (2000) census tract (a block-level reproduction was deemed too computationally intensive for the present purposes), I produced a geospatial k-means clustering for several states. Each tract was represented by its centroid as a point, weighted by population (which required a custom function, as the default kmeans() function in R does not appear to permit weighted points).
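Such a function amounts to Lloyd’s algorithm with a weighted mean in the update step. Here is a minimal sketch, not necessarily the custom function used for these maps:

```r
# `xy` is an n x 2 matrix of tract centroids, `w` their populations
weighted_kmeans <- function(xy, w, k, max_iter = 100) {
  centers <- xy[sample.int(nrow(xy), k), , drop = FALSE]
  cl <- integer(nrow(xy))
  for (i in seq_len(max_iter)) {
    # Squared distance from every point to every center
    d2 <- outer(rowSums(xy^2), rowSums(centers^2), "+") - 2 * xy %*% t(centers)
    cl <- max.col(-d2)                  # assign each tract to nearest center
    new_centers <- centers
    for (j in seq_len(k)) {
      in_j <- cl == j
      if (any(in_j))                    # population-weighted centroid update
        new_centers[j, ] <- colSums(xy[in_j, , drop = FALSE] * w[in_j]) / sum(w[in_j])
    }
    if (max(abs(new_centers - centers)) < 1e-9) break
    centers <- new_centers
  }
  list(cluster = cl, centers = centers)
}
```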

Since each run of the k-means algorithm begins with a random set of points, I replicated the function several thousand times, attempting to find a maximum inverse Herfindahl-Hirschman index of district population — the “effective number of districts,” as it were. For North Carolina, as shown below, I was able to find a maximum END of 12.17 for thirteen districts, which is a fairly even distribution of population.
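The statistic itself is straightforward to compute; applied to the district populations reported below, it recovers the 12.17 figure:

```r
# Effective number of districts: inverse Herfindahl-Hirschman index of
# district population shares (equals k exactly when populations are equal)
effective_n <- function(pop) 1 / sum((pop / sum(pop))^2)

nc_pop <- c(398492, 398896, 423710, 525860, 533812, 537417, 618040,
            662092, 676221, 767249, 785668, 786448, 935408)
effective_n(nc_pop)  # ~12.17
```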

Click to enlarge

Interestingly, there is still substantially wider variation in population than would be permitted under the current system. The least populous district houses fewer than 400,000 individuals, and the most populous, nearly a million. These figures are much more extreme than the extant least- (Wyoming) and most- (Montana) populous districts.

Population by district:

#  Population
1  398492
4  398896
8  423710
10 525860
2  533812
13 537417
3  618040
6  662092
11 676221
12 767249
7  785668
5  786448
9  935408

However, the district boundaries (here hastily drawn by use of chull()) are not characterized by the ragged edges and elongated shapes often seen in the existing plans.

I was interested in what the k-means-based plan would do to district partisanship, and decided to use population density as a rough proxy for local party affiliation. The distribution of population per square mile for each North Carolina census tract is shown below, with a vertical line indicating the median.

I decided to characterize any tract with greater-than-median population density as Democratic, and less-dense tracts as Republican. This resulted in the following proportion of Democrats residing in each district as plotted above:

#  % Dem.
1  0.253
4  0.265
10 0.336
8  0.350
6  0.383
7  0.474
13 0.510
3  0.589
11 0.615
9  0.628
12 0.671
2  0.673
5  0.837

As the table indicates, full turnout under such a plan would result in the election of 6 Republicans and 7 Democrats. Below, I plot “Democratic” tracts in blue and “Republican” tracts in red, scaled according to their population. Urban centers are easily identifiable. Note the difference between this plan and the current actual plan, which draws a single elongated district (the twelfth) parallel to Interstate 85.

Click to enlarge

Below, I replicate the same process for the state of Texas, generating 32 districts. One problem with the k-means algorithm is that larger states, or those with greater variance in population density, tend to generate districts with wide variations in population and inequalities of representation. The Texas plan below depicts a district with fewer than 200,000 residents and one with over 2 million. The Effective Number of Districts (maximum after 100 attempts) is a mere 21.58. Interestingly, the district “partisanship” split is 22/10 majority Republican/Democrat — not far from the current 20/12 split. In this simulated redistricting, there are 10 districts in which the majority of residents live in higher-than-the-state-median density areas: four each in Houston and Dallas-Fort Worth, one each around San Antonio and Austin.

Click to enlarge

The slideshow below depicts the incremental steps of the weighted k-means algorithm toward convergence around alternate districts for Ohio, beginning with a set of random centers, and eventually minimizing collective distances from local centroids.

Finally, I used the same algorithm to investigate what the continental United States would look like if states were partitioned according to the k-means rule. Clicking on the image below will bring you to an interactive, scalable map of the U.S. with 48 alternate states and inferred partisanship. Instead of initializing with random centers, I started the k-means algorithm with the population centroids of the actual states, and allowed the algorithm to converge to a minimizing partition. Many of these alternative states are more compact but familiar versions of the originals, although this new plan does realize Plunkitt’s Fondest Dream.

Click to enlarge

Dimensionality in Congress

Update: A revised version of this paper, given as a poster at the 2011 Summer Meeting of the Society for Political Methodology, is available here (PDF).


In collaboration with Jacob Montgomery and John Aldrich, I am interested in understanding the relationship between observed (measured) and unobserved (true) dimensionality in Congress. In an ongoing project, we employ Monte Carlo simulations of legislative voting behavior, followed by dimensionality-reducing scaling techniques, to identify the parameters under which we might observe roll-call scalings similar to those we find in empirical data. Our findings suggest that the typical account of the dimensionality of ideology in Congress, “one-and-a-half dimensions,” may arise under a large variety of “true” dimensionality settings.

One of our papers, given at the 2010 APSA, is available for viewing or download here [PDF].

The slides from our paper presentation may be seen below.

A thousand words

While in Washington, DC for the 2010 APSA meeting, I gave an invited talk at the Optical Technology Division of the National Institute of Standards and Technology, on techniques for visualizing data with large numbers of observations in multiple dimensions. My thesis, in essence, is that the value of a graphic is a function of the degree to which it is necessary and clearly conveys information in an efficient manner. The slides, which consist primarily of visual examples, can be seen below.

Incidentally, the arc diagram of connections between Congressional Twitter users was featured in Miller-McCune magazine.

Regionalization via network-constrained clustering

I was interested in applications for a clustering algorithm that works along a network, identifying contiguous partitions, and thought that a good place to start would be identifying regional patterns in electoral preferences.

This project represents the early products of this inquiry. I chose county-level data, as counties are small enough to make “interesting” regions, and the presidential vote data was available back to 1920.

The poster linked below was given at the 2010 Political Networks Conference at Duke University, and describes the project in somewhat greater detail.

Click to embiggen.

I am particularly enamored of the Obama/McCain color-coded network graph, as an abstracted version of the red/blue/purple cartograms produced in the wake of recent U.S. national elections. I also like the 12-cluster solution (middle left), as the regions produced are large enough to be considered general, but appear to cluster around recognizable politico-geographic features. In general, I have been very pleased with the results produced by this network-constrained clustering algorithm.

Partisan structure in online social networks

As part of a continuing project which makes use of data from the social microblogging service Twitter, I presented a paper at the 2010 MPSA in which I derived inferences about elite partisanship and ideology from only the patterns of connections between Twitter users. That is, given only knowledge of which of Twitter’s millions of users were following a set of Congressmen and other political elites, I am able to accurately predict both the partisanship and ideology (as measured by NOMINATE) of those Congressional elites.

This is surprising, because it implies that Twitter users’ preferences, individually somewhat uninformative, actually contain interesting and reliable information when aggregated. The paper for the talk may be seen here [PDF], and the slides are embedded below.

There is massive potential for the use of Twitter (and other online sources) to aid in our understanding of mass political behavior, largely by virtue of the volume of voluntarily expressed sentiment and expression that can be found. I am currently pursuing this line of inquiry in collaboration with Aaron King and Frank Orlando.

Party control and political agendas

At the 2010 annual meeting of the Midwest Political Science Association, I presented a paper in which I used a time-series clustering algorithm to identify eras in Congress based on the substantive nature of the Congressional agenda. I found that it was possible to correctly identify changes in party control in the Senate and House, based only on the change points in the time series of agenda focus in multidimensional space, with a reasonable degree of accuracy. Further, I found that different majority parties had statistically distinguishable agenda patterns, and that full knowledge of the time and attention devoted to each major topic predicts, with a high degree of accuracy, the majority-holding party in both chambers.

The paper on which the talk was based can be found here.

The slides for the talk may be seen below. I found that in explaining the idea of clustering agendas over time, it was useful to make an analogy to identifying seasons based on weather patterns, as depicted on slides 12-15.

Racial attitudes and candidate evaluation

In November, 2009, Candis Watts and I were invited to the Center for the Study of Race, Ethnicity and Gender in the Social Sciences (REGSS) at Duke University, to present our work on the effect of racial attitudes on candidate evaluations. Our findings, generally, are that racial attitudes have a significant impact on voter evaluations of candidates, but only when activated by the presence of a major-party minority candidate.

The paper on which our talk was based can be read or downloaded here [PDF], and the slides presented in concert with our talk may be seen below.

Incidentally, the R script for the very nice correlation matrix ellipses can be found at the R Graph Gallery.