Mapping California’s Politically Vulnerable Communities

Research Methodology

 

The UC Davis California Civic Engagement Project (CCEP) has created maps of California communities that illustrate the relationship between voter turnout rates and social, economic and health conditions, as measured by several indicators.  These maps appear to demonstrate a relationship between low voter turnout and poor community outcomes.  Created at the state and regional level, the maps cover Los Angeles County, the Bay Area, the Sacramento Region, San Diego County, and the San Joaquin Valley.  To make these maps, the CCEP utilized data from the American Community Survey, the California Department of Education, the California Office of Environmental Health Hazard Assessment, and the UC Davis Center for Regional Change’s Regional Opportunity Index (ROI).  The specific indicators and calculation methods used are described below.  For an overview of the “Mapping California’s Politically Vulnerable Communities Project,” please see the project’s page on the CCEP’s website.

 

Measures of Community Outcomes: Domains of Education, Economic, Health and Civic Access

Education

1.      High-school Graduation Rate

The high-school graduation rate is the percentage of the 9th grade cohort that graduated from high school four years later.  (Source: California Department of Education)

Calculation: The graduation rate indicator is the number of students in a 9th grade cohort that graduated from high school within four years, divided by the size of the cohort, multiplied by 100, and averaged over three years.  The graduation rate is calculated at the district level, combining data from all schools in unified and high school districts, including traditional schools, alternative schools, continuation high schools and community day schools.  Schools that closed prior to July 1 of the final school year included in the reference period are excluded from the calculation. This data is from the 2013 Regional Opportunity Index. Districts are mapped to census tracts using the MABLE/Geocorr12: Geographic Correspondence Engine.  If a tract is covered by more than one district, the value assigned to the tract is the weighted average of each district's graduation rate, where the weights are determined by the portion of the tract population covered by each district. This variable was calculated by the UC Davis Center for Regional Change Regional Opportunity Index 2013.
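The weighting step for tracts covered by more than one district can be sketched as follows. This is a minimal illustration, not the Regional Opportunity Index code; the function name, the district labels, and the example figures are hypothetical, and the population shares are assumed to come from a district-tract crosswalk such as the one described above.

```python
def tract_graduation_rate(district_rates, tract_pop_shares):
    """Population-weighted average of district graduation rates for one tract.

    district_rates:   district id -> graduation rate (percent)
    tract_pop_shares: district id -> share of the tract population in that district
    """
    total_share = sum(tract_pop_shares.values())
    if total_share <= 0:
        raise ValueError("tract has no population covered by any district")
    weighted = sum(district_rates[d] * share for d, share in tract_pop_shares.items())
    # Dividing by the total share keeps the result correct even if shares do not sum to 1.
    return weighted / total_share


# Hypothetical tract: 75% of residents live in district A, 25% in district B.
rate = tract_graduation_rate({"A": 80.0, "B": 90.0}, {"A": 0.75, "B": 0.25})
print(rate)
```

A tract that lies entirely within one district simply receives that district's rate.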

2.      College-Educated Adults

The percentage of college-educated adults is the percentage of adults age 25 and over who have completed a post-secondary certificate/degree. (American Community Survey, U.S. Census, Table B15002)

Calculation: The number of adults age 25 or older who have completed an associate degree or higher, divided by the number of adults age 25 or older, multiplied by 100.  The category of "associate degree" includes people whose highest degree is an associate degree, which generally requires 2 years of college level work and is either in an occupational program that prepares them for a specific occupation, or an academic program primarily in the arts and sciences. This variable was calculated by the UC Davis Center for Regional Change Regional Opportunity Index.

 

Economic Factors

1.      Poverty

The poverty rate measures the percentage of people with income under 200% of the federal poverty level (Source: American Community Survey, 5 Year Estimate 2009-2013, U.S. Census, Table C17002). For information about Poverty Thresholds for 2014 by Size of Family and Number of Related Children under 18 Years, see the document for 2014 here.

Calculation: The number of people in the tract (for whom poverty status was determined) with income under 200% of the federal poverty level, divided by the total population for whom poverty status was determined, multiplied by 100.

For example, a four-person family with two children earns an income of $30,000.  The poverty threshold for 2014 for a family of this size is $24,008. Their Ratio of Income to Poverty threshold is:

            30,000/24,008 = 1.25

Each person in this household is counted as having this Ratio of Income to Poverty.  This ratio indicates that the family earns an income above the poverty line and is therefore not in poverty.  This project, however, defines poverty as a Ratio of Income to Poverty that is under 2.00.  This is because the official poverty line does not take into account the high cost of living in California.

If this same family of four earned $48,016, its Ratio of Income to Poverty would be 2.00 (48,016/24,008), and this family would not be determined to be poor according to the definitions used in this project.

Note that this metric at times indicates a higher rate of poverty than would be found with a metric using median household income.  For example, if poverty is defined as a household income at or below a median household income of $48,875, a family of five with three children making less than $56,000 a year would qualify as “in poverty” by this project’s definition (Ratio of Income to Poverty = 56,000/28,252 = 1.98).  However, it would not qualify as “in poverty” by a definition that uses the median household income and does not adjust for household size.
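The worked examples above can be expressed as a short sketch. The function names and the POVERTY_MULTIPLE constant are illustrative and are not taken from the CCEP's own code; the thresholds are the 2014 figures quoted in the examples.

```python
POVERTY_MULTIPLE = 2.00  # this project counts a ratio under 2.00 as "in poverty"


def income_to_poverty_ratio(income, threshold):
    """Ratio of family income to the poverty threshold for that family size."""
    return income / threshold


def is_poor(income, threshold):
    """True when the ratio is strictly under 2.00 (200% of the poverty level)."""
    return income_to_poverty_ratio(income, threshold) < POVERTY_MULTIPLE


# Four-person family with two children, 2014 threshold of $24,008.
ratio_a = income_to_poverty_ratio(30_000, 24_008)   # about 1.25
poor_a = is_poor(30_000, 24_008)                    # counted as poor by this project
poor_b = is_poor(48_016, 24_008)                    # ratio of exactly 2.00 is not under 2.00

# Family of five with three children, threshold of $28,252.
poor_c = is_poor(56_000, 28_252)                    # ratio about 1.98
print(round(ratio_a, 2), poor_a, poor_b, poor_c)
```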

2.      Employment Rate

The employment rate is the percentage of civilian adults age 20-64 years who are employed.  (Source: American Community Survey, U.S. Census, Table B23001)

Calculation: The number of civilian adults age 20-64 years who are employed divided by the number of civilian adults age 20-64 years in the labor force, multiplied by 100.

 

Health

1.      Disadvantaged Communities

CalEnviroScreen 2.0 is a screening methodology that can be used to help identify California communities that are disproportionately burdened by, and vulnerable to, multiple sources of pollution.  Developed by the Office of Environmental Health Hazard Assessment (OEHHA) on behalf of the California Environmental Protection Agency (CalEPA), the score includes multiple measures of both pollution burdens and sensitive population characteristics.  The score does not represent a rate or ratio of any kind, but identifies disadvantaged communities by assigning values from 0 to 100 (100 being the most disadvantaged and 0 the least). The CalEnviroScreen 2.0 is available at the census tract level, and was last updated in October of 2014. Reports and supporting documents are available at: http://oehha.ca.gov/calenviroscreen.

2.      Teen Birth Rate

The teen birth rate is the three-year average of the percentage of all births that were to teens (Source: California Department of Public Health Birth Statistical Master Files; University of California Davis Center for Regional Change Regional Opportunity Index).

This variable was calculated by the UC Davis Center for Regional Change Regional Opportunity Index.

Calculation: Counting multiple births as one birth event, the indicator is the number of births to women under the age of 20 during the three-year reference period, divided by the total number of births in that same time period, multiplied by 100.  The indicator is inverted for the index by subtracting it from 100.  Birth records were geocoded to the census tract of the mother’s residence.  Approximately 4% of addresses could not be geocoded; these records were dropped.  Values in tracts with fewer than 25 births in that time period are considered unreliable and should be interpreted with caution. This data is from the 2013 Regional Opportunity Index.
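A minimal sketch of this calculation for a single tract follows. The function name, the returned field names, and the example counts are hypothetical; the 25-birth reliability cutoff and the inversion step are taken from the description above.

```python
def teen_birth_indicator(teen_births, total_births, min_births=25):
    """Teen birth share for one tract over the three-year reference period.

    teen_births:  birth events to women under age 20 (multiple births count once)
    total_births: all birth events in the tract over the same period
    """
    if total_births == 0:
        return None  # no births, so the rate is undefined
    rate = 100.0 * teen_births / total_births
    return {
        "rate": rate,                      # percent of births that were to teens
        "inverted": 100.0 - rate,          # form used in the index (higher is better)
        "reliable": total_births >= min_births,
    }


result = teen_birth_indicator(12, 200)
print(result)
```

Tracts that return "reliable": False correspond to the values that should be interpreted with caution.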

3.      Years of Potential Life Lost

Years of Potential Life Lost (YPLL) is measured by determining the three-year average of the years of potential life lost rate per 1,000 population under the age of 65.  This metric is used to represent areas with greater premature mortality. (Source: California Department of Public Health Death Statistical Master Files; Census 2010, SF1 Table P12; UC Davis Center for Regional Change Regional Opportunity Index)

Calculation: YPLL is a measure of premature death, or the number of years of life lost among those who died before a predetermined age.  The age of 65 was chosen to assess the number of prime working years lost, assuming an average retirement age of 65.  YPLL is calculated by subtracting the age at death from 65 for all deaths that occurred before the age of 65, ignoring those at or above age 65, and summing the results.  Death records were geocoded to the census tract using the decedent’s residential address.  Less than 4% of addresses could not be geocoded; these records were dropped. This variable was calculated by the UC Davis Center for Regional Change Regional Opportunity Index.

For this indicator, the YPLL was calculated over fixed-age categories, using the midpoint of the category’s age range to determine YPLL for that category, then multiplying by the number of deaths in that age range. For example, for deaths of persons between ages 40 and 45, the midpoint is 42.5, and the YPLL for each person in this group is 65 – 42.5 = 22.5.  This figure is multiplied by the annual average number of deaths in this age group over the three-year reference period to get the total YPLL for this age group. This step is repeated for all age groups, summing the YPLL over all age groups to arrive at the total YPLL.  The YPLL rate is this total divided by the population under the age of 65, multiplied by 1,000 to express the rate per 1,000 population.  This data is from the 2013 Regional Opportunity Index.
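The midpoint method can be sketched as below. This is an illustration under stated assumptions rather than the Regional Opportunity Index code: the function name, the age bands, and the death counts are hypothetical, and the sketch reads "rate per 1,000 population" as the total YPLL divided by the population under 65 and scaled up by 1,000.

```python
def ypll_rate(deaths_by_age_group, pop_under_65, years=3, cutoff=65):
    """YPLL rate per 1,000 population under the cutoff age.

    deaths_by_age_group: (low_age, high_age) -> deaths over the whole reference period
    pop_under_65:        population under the cutoff age
    years:               length of the reference period, used for the annual average
    """
    total_ypll = 0.0
    for (low, high), deaths in deaths_by_age_group.items():
        midpoint = (low + high) / 2
        if midpoint >= cutoff:
            continue  # deaths at or above the cutoff age contribute nothing
        annual_deaths = deaths / years
        total_ypll += (cutoff - midpoint) * annual_deaths
    return total_ypll / pop_under_65 * 1000


# Hypothetical tract: 6 deaths at ages 40-45 and 3 deaths at ages 60-65 over three years.
rate = ypll_rate({(40, 45): 6, (60, 65): 3}, pop_under_65=5000)
print(round(rate, 2))
```

In this example the 40-45 band contributes 22.5 x 2 = 45 years and the 60-65 band contributes 2.5 x 1 = 2.5 years, for a total of 47.5 years across 5,000 residents under 65.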

 

Civic Access

1.    Eligible Voters

Eligible voters are the percentage of population eligible to vote (i.e., who are adult U.S. citizens). (Source: American Community Survey, 5-Year Estimate 2009-2013, U.S. Census).

Calculation: The total population of adult citizens divided by the total population, multiplied by 100.

Latino Percent of Eligible Voters:

This represents the percentage of eligible voters (adult U.S. citizens) who are Latino (Source: American Community Survey, 5-Year Estimate 2009-2013, U.S. Census).

Calculation: The total population of Latino adult citizens divided by the total population of adult citizens, multiplied by 100.

Asian-American Percent of Eligible Voters:

This represents the percentage of eligible voters (i.e., adult U.S. citizens) who are Asian-American (Source: American Community Survey, 5 Year Estimate 2009-2013, U.S. Census).

Calculation: The total population of Asian-American adult citizens divided by the total population of adult citizens, multiplied by 100.

2.      Limited English Proficient

This represents the lowest quartile for the percentage of the population age 18-64 years that speaks only English or speaks English "well" or "very well" (Source: American Community Survey, 5 Year Estimate 2009-2013, U.S. Census, Table B16004).

Calculation: The number of people age 18-64 years who speak only English, plus those who speak English “well” and those who speak English “very well”, divided by the total population age 18-64, multiplied by 100 to convert the decimal to a percent. This percentage describes English proficiency in the census block groups. The lowest quartile for English proficiency identifies the areas with the least English proficiency (i.e., where there is limited English proficiency).

 

Map representation of variables

The precincts used to display registered voter turnout on the maps utilize the registration precincts created by the Statewide Database. These are geographic units created for statistical merging purposes that may not conform exactly to registration precincts as designated by the County Registrar of Voters. To learn more about the distinctions between these two representations of precincts, view methodology from the Statewide Database here.

Critical thresholds for demographic variables were selected by quartiles.  Quartiles separate the data into four equal groups according to the distribution of values of a particular variable.  For example, the map on Limited English Proficiency displays areas of ‘low English proficiency.’  The definition of ‘low’ is given by the lowest quartile of the data, that is, the lowest rate of English proficiency experienced by a quarter of the population for the selected region.  The threshold for all indicators was selected in this way, and the highest or lowest quartile of the data depended on the particular measure (e.g., the high rate of poverty was defined by the highest quartile of the data, and limited English proficiency was defined by the lowest quartile of the data for English proficiency).
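A quartile threshold of this kind can be sketched with the Python standard library. The function name and the example values are hypothetical, and the sketch assumes quartiles are taken across geographic units without population weighting and with inclusive interpolation; the CCEP's mapping software may compute the cut points differently.

```python
import statistics


def lowest_quartile_flags(values):
    """Return the first-quartile cut point and a flag for each value at or below it."""
    q1 = statistics.quantiles(values, n=4, method="inclusive")[0]
    return q1, [v <= q1 for v in values]


# Hypothetical English proficiency rates (percent) for eight block groups in one region.
proficiency = [98, 95, 91, 88, 84, 79, 72, 60]
q1, flags = lowest_quartile_flags(proficiency)
limited_english = [v for v, flagged in zip(proficiency, flags) if flagged]
print(q1, limited_english)
```

For a measure such as poverty, where the highest quartile marks disadvantage, the same approach would use the third cut point and flag values at or above it.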

 

Selection Criteria

Within each domain of data, we looked for measures that would allow us to assess performance in that domain, and that met the following three criteria: Availability and Currency; Geographic Scale; and Reliability.  The following sections provide more information on each of these criteria.

1.      Availability and Currency

The first criterion is that the data be readily available and updated regularly, so that users can track change over time.  For instance, we chose to use ACS data rather than decennial census data, as it is updated annually, even though it is slightly less reliable due to smaller sample sizes (approximately 1 of every 8 persons is sampled for ACS 5-year estimates, compared to 1 in 6 for Census long-form questions. See section on “Managing Census Estimates with High Margins of Error”).

Almost all of the indicators we selected are updated annually, if not more frequently.

2.    Geographic Scale

Variables are represented at the census tract or census block group level, allowing users to assess opportunity at a relatively small geographic scale.  Most of the census-sourced indicators are available at the tract level.

Data from the California Department of Education is school-based, and while the location of schools is known, the residential location of students who attend those schools is not known.

In order to allocate school-level data to census tracts, the three schools closest to the population-weighted center of each tract were identified, the mean of the indicator for those three schools was calculated, and that mean was assigned to the tract.  School proximity was determined by calculating the distance from the population-weighted tract center to the geocoded location of each school using ArcGIS software.  Note that this calculation is based on straight-line distance, and does not account for road placement or geographical features, which may lengthen actual travel distance.  Moreover, this does not account for school district boundaries, so it is possible that one or more schools included in the tract’s average are located in districts other than the one that covers the tract center.
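The nearest-three-schools allocation can be sketched as follows. This is not the ArcGIS workflow itself: the function name and coordinates are hypothetical, and the sketch assumes that tract centers and schools are already in a projected coordinate system where straight-line (Euclidean) distance is meaningful.

```python
import math


def tract_value_from_schools(tract_center, schools, k=3):
    """Mean indicator value of the k schools nearest to a tract's center.

    tract_center: (x, y) of the population-weighted tract center
    schools:      list of (x, y, indicator_value) tuples
    """
    nearest = sorted(
        schools,
        key=lambda s: math.dist(tract_center, (s[0], s[1])),  # straight-line distance
    )[:k]
    return sum(s[2] for s in nearest) / len(nearest)


# Hypothetical tract centered at the origin with four candidate schools.
schools = [(1, 0, 70.0), (0, 2, 80.0), (3, 0, 90.0), (10, 10, 10.0)]
value = tract_value_from_schools((0, 0), schools)
print(value)
```

The distant fourth school is ignored, so the tract receives the mean of the three closest schools regardless of which district each school belongs to.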

A slightly different tactic was used for the high school based indicators.  Because student mobility is much higher at the high-school level, it would be inappropriate to use the same method of calculating tract-level indicators from school-level data.  Instead, the district averages were calculated for all high schools in each district, and the district average was applied to all tracts that lie within the district.  Almost all tracts lie completely within a single district; in the few cases where this is not true, a weighted average of the district means was used for the districts that cover the tract.  The weights are derived from a district-tract crosswalk obtained from the University of Missouri’s MABLE/GeoCorr12 Version 1.2 and are simply the percent of the tract’s population covered by each school district in the tract.

3.   Reliability

Our third criterion is that the data be reasonably reliable, meaning that we have a relatively high level of confidence that the indicators are accurate representations of what is being measured.  Reliability can be an issue when tracking events in small populations or in surveys based on small sample sizes. In these cases, estimates may not be reliable due to sample error or random fluctuations across time or space.

For this reason, we use Census ACS 5-year estimates rather than 1- or 3-year estimates. This effectively increases the sample size by aggregating all responses over the 5-year period, increasing the sampling fraction from about 1/40 to about 1/8. The resulting estimates have smaller standard errors, though it is important to remember that the longer time frame tends to smooth out short-term fluctuations and may mask rapidly occurring changes.

 

Managing Census Estimates with High Margins of Error:

In certain circumstances the population estimates provided by the American Community Survey (ACS) have a large margin of error. The CCEP has defined a threshold that represents relatively high uncertainty regarding the population estimates, and displays these areas on the static maps using a light gray dot pattern.

The CCEP calculated the margin of error (MOE) for the derived estimates and proportions (e.g., the percent of the population that is employed) and calculated the coefficient of variation using the derived MOE.  The coefficient of variation (CV) provides a measure of the relative amount of sampling error that is associated with a sample estimate.  The CV is calculated as the ratio of the standard error (SE) for an estimate to the estimate itself, and is expressed as a percentage (see Reference 1).  The CCEP selected a CV of 40% as the level of relatively high sampling error, and displays these areas on the maps using a light gray dot pattern.  These areas are identified in the map legend as census tracts or census block groups with high margins of error.

See Appendix 3, page A-14, of the U.S. Census Bureau document (Reference 1) for the formula used to calculate margins of error for derived estimates, and pages A-14 and A-15 of the same appendix for the formula used to calculate the MOEs for the derived proportions.  The CV is calculated by dividing the MOE by 1.645 to get the SE, and then dividing the SE by the derived estimate and converting to percent.  See page A-13 of the same document for more details.
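The CV calculation described here can be sketched as below. The conversion from MOE to SE (dividing by 1.645) and the 40% cutoff come from the text; the proportion formula follows the commonly cited Census Bureau approximation and should be checked against Appendix 3 of the handbook before use. The function names and example figures are hypothetical, and treating a CV of exactly 40% as "high" is an assumption.

```python
import math

Z_90 = 1.645     # ACS margins of error are published at the 90% confidence level
HIGH_CV = 40.0   # CCEP cutoff for relatively high sampling error (percent)


def moe_of_proportion(num, moe_num, den, moe_den):
    """Approximate MOE of the proportion num/den, where num is a subset of den."""
    p = num / den
    radicand = moe_num ** 2 - (p ** 2) * moe_den ** 2
    if radicand < 0:
        # Fallback to the ratio form (plus sign) when the radicand is negative.
        radicand = moe_num ** 2 + (p ** 2) * moe_den ** 2
    return math.sqrt(radicand) / den


def coefficient_of_variation(estimate, moe):
    """CV in percent: standard error divided by the estimate, times 100."""
    se = moe / Z_90
    return se / estimate * 100


# Hypothetical block group: 400 of 1,000 people, with MOEs of 120 and 150.
p = 400 / 1000
moe_p = moe_of_proportion(400, 120, 1000, 150)
cv = coefficient_of_variation(p, moe_p)
high_error = cv >= HIGH_CV
print(round(moe_p, 4), round(cv, 1), high_error)
```

Areas where high_error is True would correspond to those shown with the light gray dot pattern.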

 

Methods for Index of Political Vulnerability as Visualized in the CCEP “Hot Spot Maps”

Hot Spot Maps identify communities with overlapping socioeconomic and environmental disadvantages, as well as low voter turnout, that together result in political vulnerability.  These maps present the CCEP’s Index of Political Vulnerability.  The index includes education, economic, and environmental health measures commonly associated with community disadvantage and related to low voter turnout. These maps provide a clear visual representation of how and where political vulnerability is most manifest in the Golden State.

Hot Spot Maps were created for each individual geographic region (Sacramento Region, Bay Area, Los Angeles Area, San Diego County, and the San Joaquin Central Valley).  The ESRI tool Hot Spot Analysis (Getis-Ord Gi*) was used to flag areas with statistically significant high and low values, for example, census block groups that have significantly high or low rates of poverty.  Statistical significance is generally defined as being greater than two standard deviations from the mean of local zones.  The distance that is considered a local zone was determined individually for each geographic region.  This tool was used on each of the nine variables to assign coded values to areas with significantly high or low values in each geographic region for each variable.  Areas without population were excluded from the analysis.  Variables were rescaled so that all characteristics indicating political vulnerability received a high score, while characteristics indicating lack of vulnerability received a low score.  Thus, for example, a census block group with a high rate of poverty received a high score, and a census block group with a low employment rate also received a high score.  All coded data layers were combined within a geographic region to create a single polygon layer, and the coded values were summed to create a composite score for each polygon.  Sliver polygons were excluded and remaining polygons were converted to centroids.
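The rescaling and summing steps can be illustrated with a simplified sketch. This does not reproduce the ESRI Hot Spot Analysis tool or the CCEP's actual coding scheme: the +1/0/-1 codes, the function names, and the z-scores are hypothetical, and the two-standard-deviation cutoff follows the general definition given above.

```python
def coded_value(z, high_is_vulnerable, threshold=2.0):
    """Code one variable for one area from its local z-score.

    Returns +1 when the value is significant in the direction that indicates
    vulnerability, -1 when significant in the opposite direction, 0 otherwise.
    """
    if abs(z) <= threshold:
        return 0
    points_to_vulnerability = (z > 0) == high_is_vulnerable
    return 1 if points_to_vulnerability else -1


def composite_score(z_scores, directions):
    """Sum the coded values across all variables for one area."""
    return sum(coded_value(z_scores[name], directions[name]) for name in z_scores)


# True means a high value signals vulnerability; False means a low value does.
directions = {"poverty": True, "employment": False, "graduation": False, "turnout": False}

# Hypothetical area: high poverty, low employment, high graduation, average turnout.
z_scores = {"poverty": 2.6, "employment": -2.4, "graduation": 2.5, "turnout": 0.5}
score = composite_score(z_scores, directions)
print(score)
```

Here high poverty and low employment each add to the score, a significantly high graduation rate subtracts from it, and the non-significant turnout value contributes nothing.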

The ESRI Inverse Distance Weighting (IDW) tool was used to create a continuous generalized data layer from the representative points that summarizes the degree of political vulnerability.  For example, an area that experiences a high rate of poverty, a high disadvantaged community score, a high teen birth rate, high premature mortality, a low registered voter turnout rate, a low employment rate, a low English proficiency rate, a low high school graduation rate, and a low rate of college education will have the highest score possible and appear as the most red/orange color on the map.  Given the different units of geographies available for these variables (census tracts, census block groups, and precincts) and the artificiality of discrete polygon borders, a more generalized visualization of these variables was preferred.
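The idea behind inverse distance weighting can be shown with a generic sketch. This is the textbook formula, not the ESRI tool, which has its own parameters (such as search radius and output cell size) that are not described here; the function name, the power of 2, and the example points are assumptions.

```python
import math


def idw(points, target, power=2.0):
    """Estimate a value at target as a distance-weighted mean of known points.

    points: list of (x, y, value) tuples, e.g. centroids with composite scores
    target: (x, y) location to estimate
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in points:
        distance = math.dist((x, y), target)
        if distance == 0:
            return float(value)  # target coincides with a known point
        weight = 1.0 / distance ** power  # nearer points receive larger weights
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical centroids with composite scores of 9 and 1.
centroids = [(0.0, 0.0, 9), (2.0, 0.0, 1)]
midway = idw(centroids, (1.0, 0.0))      # equally distant from both points
at_point = idw(centroids, (0.0, 0.0))    # exactly on the first centroid
print(midway, at_point)
```

Evaluating such a function over a regular grid of locations yields the kind of continuous surface that smooths over discrete polygon borders.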

 

References:

1.      U.S. Census Bureau, “A Compass for Understanding and Using American Community Survey Data,” Appendices 1-3. (https://www.census.gov/content/dam/Census/library/publications/2008/acs/ACSGeneralHandbook.pdf)