K-Means Redistricting
U.S. Congressional districts are today drawn with the aim of maximizing the electoral advantage of the state’s majority party, subject to some constraints, including compactness (which can be measured in numerous ways) and a “one person, one vote” standard. What if, instead of minimizing population variance across districts, we aimed to minimize the mean distance between each resident and their district center?
To do so would be to employ something very much like k-means clustering, and produces some interesting results.
Using the population and latitude and longitude coordinates of the centroid of each (2000) census tract (a block-level reproduction was deemed too computationally intensive for the present purposes), I produced a geospatial k-means clustering for several states. Each tract was represented by its centroid as a point, weighted by population (which required a custom function, as the default kmeans() function in R does not appear to permit weighted points.
Since each run of the k-means algorithm begins with a random set of points, I replicated the function several thousand times, attempting to find a maximum inverse Herfindahl-Hirschman index of district population — the “effective number of districts,” as it were. For North Carolina, as shown below, I was able to find a maximum END of 12.17 for thirteen districts, which is a fairly even distribution of population.
Click to enlarge
Interestingly, there is still substantially wider variation in population than would be permitted under the current system. The least populous district houses fewer than 400,000 individuals, and the most populous, nearly a million. These figures are much more extreme than the extant least- (Wyoming) and most- (Montana) populous districts.
Population by district:
# Population 1 398492 4 398896 8 423710 10 525860 2 533812 13 537417 3 618040 6 662092 11 676221 12 767249 7 785668 5 786448 9 935408
However, the district boundaries (here hastily drawn by use of chull()) are not characterized by the ragged edges and elongated shapes often seen in the existing plans.
I was interested in what the k-means-based plan would do to district partisanship, and decided to use population density as a rough proxy for local party affiliation. The distribution of population per square mile for each North Carolina census tract is shown below, with a vertical line indicating the median.
I decided to characterize any tract with greater-than-median population density as Democratic, and less-dense tracts as Republican. This resulted in the following proportion of Democrats residing in each district as plotted above:
# % Dem. 1 0.253 4 0.265 10 0.336 8 0.350 6 0.383 7 0.474 13 0.510 3 0.589 11 0.615 9 0.628 12 0.671 2 0.673 5 0.837
As the table indicates, full turnout under such a plan would result in the election of 6 Republicans and 7 Democrats. Below, I plot “Democratic” tracts in blue and “Republican” tracts in red, scaled according to their population. Urban centers are easily identifiable. Note the difference between this plan and the current actual plan, which draws a single elongated district (the twelfth) parallel to Interstate 85.
Click to enlarge
Below, I replicate the same process for the state of Texas, generating 32 districts. One problem with the k-means algorithm is that larger states, or those with greater variance in population density, tend to generate districts with wide variations in population and inequalities of representation. The Texas plan below depicts a district with fewer than 200,000 residents and one with over 2 million. The Effective Number of Districts (maximum after 100 attempts) is a mere 21.58. Interestingly, the the district “partisanship” split is 22/10 majority Republican/Democrat — not far from the current 20/12 split. In this simulated redistricting, there are 10 districts in which the majority of residents live in higher-than-the-state-median density areas: four each in Houston and Dallas-Fort Worth, one each around San Antonio and Austin.
Click to enlarge
The slideshow below depicts the incremental steps of the weighted k-means algorithm toward convergence around alternate districts for Ohio, beginning with set of random centers, and eventually minimizing collective distances from local centroids.
Finally, I used the same algorithm to investigate what a the continental United States would look like if states were partitioned according to the k-means rule. Clicking on the image below will bring you to an interactive, scalable map of the U.S. with 48 alternate states and inferred partisanship. Instead of initializing with random centers, I started the k-means algorithm with the population centroids of the actual states, and allowed the algorithm to converge to a minimizing partition. Many of these alternative states are more compact but familiar versions of the originals, although this new plan does realize Plunkitt’s Fondest Dream.
Click to enlarge
Comments
5 Responses to “K-Means Redistricting”Trackbacks
Check out what others are saying...-
[…] K-Means Redistricting « David B. Sparks […]
Beautiful images!
Whatever code you’ll publish, showing how to create them will probably be welcomed :)
Cheers,
Tal
I’ve written something very much like a weighted-k-means redistricting solver in C++
It turns out that weighted-k-means isn’t good enough to find adequate results for all congressional and state legislature district map solving problems across all states in the US. A more detailed solver is required sometimes, and can generally produce “better” results by the measure I’m optimizing for.
Results at http://bdistricting.com/
source at http://code.google.com/p/redistricter
Based on the population standard deviations your site reports, your method seems to do a much better job of approaching the one person, one vote standard.
K-means, it appears, would be more appropriate if our minimand was proximity to district center — but this seems a less significant target than equality in voting.
An interesting exercise, it’s nice to see the results. But I’m afraid that you’ll need to go down to at least the census block level to get anywhere near to equal distribution. Oddly, it is required that Congressional Districts be exactly equal in population, which means that in some cases an apartment building might be split between districts!
The general requirements that you would need to include are “compactness, contiguousness, geographical boundaries, or political subdivisions”, as described in this article on the Voting Rights Act. Your k-means method only addresses the first two requirements (and the fourth only as an exterior boundary). This interview with a legislator describes the process in Massachusetts and also mentions other considerations, such as “communities of interest” and, of course, keeping sitting representatives in separate districts that are somewhat similar to their current ones.
Finally, any redistricting must be able to take into account other legal requirements, specifically that minorities are fairly represented, e.g. by (near-)majority-minority districts. This is especially true in states like North Carolina and Texas which are still subject to the Voting Rights Act. The linked article explains the origin of that narrow I-85 district in NC that you mentioned.
So you’ll have to figure out how to bring all of these other factors into play. I think a small computing cluster might be necessary!