Blog Post 5

Kelly Olmos

2024/10/07

In this week’s blog post I will explore how demographics provide insight into the electorate and election outcomes.

## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   Democrat Republican
##   Democrat        746        336
##   Republican      346        660
##                                           
##                Accuracy : 0.6734          
##                  95% CI : (0.6528, 0.6935)
##     No Information Rate : 0.523           
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.3456          
##                                           
##  Mcnemar's Test P-Value : 0.7304          
##                                           
##             Sensitivity : 0.6832          
##             Specificity : 0.6627          
##          Pos Pred Value : 0.6895          
##          Neg Pred Value : 0.6561          
##              Prevalence : 0.5230          
##          Detection Rate : 0.3573          
##    Detection Prevalence : 0.5182          
##       Balanced Accuracy : 0.6729          
##                                           
##        'Positive' Class : Democrat        
## 

##Replicating Kim & Zilinsky (2023)

Kim & Zilinsky (2023) co-authored a paper where they concluded that demographic information (five key attributes) about a voter can accurately predict a voter’s choice with an accuracy of 63.9%.

The replication above corroborates Kim & Zilinsky’s findings. The 95% confidence interval ranges from 65.28% to 69.35%. I personally found this interesting because campaigns reach out to likely voters based on demographic information. It makes sense for a campaign with limited resources to make an educated guess on who to target based on a voter’s demographics. Not only that, but when election predictions are being made, the demographics of the electorate are frequently cited.

Demographic attributes like race are constantly in the headlines. In one CBS article, the headline reads “Kamala Harris turns to her faith in outreach to Black voters” One quote from the article highlights the importance of reaching out to Black voters, “Her focus underscores the importance for her in activating and persuading Black voters, the core of her party’s electorate, by going to a stronghold within the community.”

In a country where race impacts the quality of life and experience of the voter, it makes sense for race to indicate a partisan preference. Especially when we consider political polarization in a two-party system.

Call: glm(formula = voted_2020 ~ sii_age_range + sii_gender + sii_race + sii_education_level, family = binomial, data = tx_fl)

Log oddsProbabilities
Intercept-1.21411380.2289740
Age Range 30-390.78602560.6869773
Age Range 40-491.03057210.7370268
Age Range 50-641.20709430.7697844
Age Range 65-741.40146920.8024169
Age Range 75+1.04397840.7396169
Male-0.28274940.4297799
Expansive-0.34380690.4148850
Unknown11.57342120.8282708
African-American0.18721000.5466663
Hispanic-0.14630790.4634881
Native American0.27908250.5693213
Other0.70971480.6703381
Unknown2-0.62468100.3487176
Caucasian0.54521110.6330238
Some College or Higher0.73027510.6748656
Completed College1.02508560.7359620
Completed Graduate School0.67705790.6630817
Attended Vocational/Technical0.47252910.6159822
sii_gendercountpercentage
Expansive50.0021983
Female11323049.7832453
Male10762047.3167257
Unknown65912.8978307
sii_age_rangecountpercentage
18-294192518.43295
30-393933317.29334
40-493763416.54635
50-645389223.69442
65-742970313.05936
75+2495910.97359

##Analyzing the Texas Voter File

To explore what insights we can obtain from voter demographics, I will analyze the Texas voter file.

I ran a binomial logistic regression to estimate the relationship between whether a voter voted in 2020 and age range, race, gender, and education level. These are four out of the five attributes Kim and Zilinsky used for their random forest model. I don’t have information about each voter’s income in the file.

The binomial logistic regression models how the probability of success varies with the independent variables and helps determines whether the changes are statistically significant. In this case, success is defined as the person voting. The binomial logistic regression produces the logarithm of the odds as shown in the table. It’s not simple to interpret and visualize the logarithm of the odds so I converted into probabilities.

Essentially what I can takeaway from this regression is that the intercept is the baseline log-odds of voting for the reference group. In this case it is the age range 18-29, females, Asian voters, and the completed high school categories. The Unknowns in the table represent when the voter file had marked unknown race and gender for the voter. Since the unknowns do not represent a specific demographic I will ignore their associated values in my analysis.

The intercept of -1.21 is the log odds of voting for the reference groups, which are age group 18-29, female gender, Asian, and completed high school. The log-odds can be converted to probabilities by raising e to the log-odds and and dividing it by 1 + e raised to the value of the log-odds.

The age range odds and probabilities demonstrate that older citizens tend to vote in higher rates. In this sample of Texas voters, the age range with the highest turnout is 65 to 74. This is consistent with the available literature on age and voter turnout.

Males and the other genders in this Texas voter file have negative log-odds which means that every other category in this sample have lower odds of voting compared to female voters. This too is also consistent with the available data on voter turnout (Gender differences in voter turnout)

All races except Hispanic have a positive coefficient, indicating that Hispanics have lower odds and probabilities of voting than the baseline group (Asians). It is unclear what demographics are represented under the Other category but the next group with highest odds are Caucasian.

When looking at education, it is consistent with available data and literature that those who have completed college and graduate school are the groups of registered voters with the highest odds of voting.

I have also included some graphs that provide descriptive statistics about the Texas electorate. The majority of registered voters in Texas is Caucasian, making up about 53% of registered voters. The next largest group is Hispanic voters, making up just under 30% of registered voters.

Most of the Texas electorate completed high school and college. In regard to gender, there is an almost 50-50 split between female and male registered voters. The age range group leading in registered voters is 50-64, followed by 18-29. Although the 18-29 age range is known for being the age range with the lowest voter turnout.

##Simulation and Prediction for this Week

partytotal_electors
Democrat319
Republican219

My prediction for this week is based on a simple linear model that uses polling data, two-party lagged vote shares, and turnout data. 10,000 simulations are then run to account for variability in turnout. One of the benefits of simulating with turnout variability is that it provides a range of plausible turnout percentages which captures how changes in voter turnout across states can influence the election outcome. The final prediction is that Kamala Harris wins 319 electors and Trump wins 219. This model predicts that Harris will win in all the swing states which is an interesting result that I am hesitant about accepting but provokes thoughts about how to refine this model for future iterations.

##References

“Gender Differences in Voter Turnout.” Center for American Women and Politics, cawp.rutgers.edu/facts/voters/gender-differences-voter-turnout. Accessed 28 Oct. 2024.

Kim, Seo-young Silvia, and Jan Zilinsky. “Division Does Not Imply Predictability: Demographics Continue to Reveal Little about Voting and Partisanship.” Political Behavior, vol. 46, no. 1, 20 Aug. 2022, pp. 67–87, doi:10.1007/s11109-022-09816-z.

Navarro, Aaron. “Kamala Harris Turns to Her Faith in Outreach to Black Voters.” CBS News, CBS Interactive, www.cbsnews.com/news/kamala-harris-faith-black-voter-outreach/. Accessed 27 Oct. 2024.