Below is a list of our public QSIDE datasets. You can read more about the context and methodology for data collection in the associated studies included below. All datasets are available in the file section on OSF. Please cite as referenced on the connected OSF repository. If you have any questions about the data or its uses, please contact tyrone.bass@qsideinstitute.org or chad@qsideinstitute.org.
Criminal-Legal System Justice
Federalist Society’s Public Engagements Dataset
The Federalist Society, a leading conservative legal organization, has played a significant role in shaping the American judiciary for decades. Despite its influence, comprehensive empirical data on the organization remains scarce. We address this gap by systematically documenting 20,205 public events hosted by the Society from 1984 to 2024, with substantive coverage from 2007 onward. This dataset is structured to facilitate analysis of event trends, co-speaking networks, and thematic shifts over time. To ensure data integrity, we performed validation, deduplication, and cleaning, resulting in a well-structured, high-quality dataset. You can read more about our analysis and collection methodology in the study linked below.
- Federalist Society’s Public Engagements Dataset
- Study: A structured dataset of the Federalist Society’s public engagements
Opportunity Youth Dataset
Opportunity youth—individuals aged 16 to 24 who are disconnected from education and employment due to significant barriers—constitute a sizable yet underserved demographic whose marginalization leads to substantial social and economic costs. This dataset centralizes fragmented data from sources such as the American Community Survey, the Adoption and Foster Care Analysis and Reporting System, FBI Crime Data, and Bureau of Justice Statistics incarceration reports to provide the foundational data to evaluate four key indicators—disconnected youth, youth in foster care, justice-impacted youth, and children with an incarcerated parent. You can read more about our analysis and collection methodology in the study linked below.
- Opportunity Youth Dataset
- Study: Using Data Science for Social Good: Mapping Opportunity Youth
- Visualization: DATA2LIFT-OpportunityYouth Dashboard
JUSTFAIR Dataset
In the United States, the public has a constitutional right to access criminal trial proceedings. In practice, it can be difficult or impossible for the public to exercise this right. We present JUSTFAIR: Judicial System Transparency through Federal Archive Inferred Records, a database of criminal sentencing decisions made in federal district courts. We have compiled this dataset from public sources including the United States Sentencing Commission, the Federal Judicial Center, the Public Access to Court Electronic Records system, and Wikipedia. With nearly 600,000 records from the years 2001–2018, JUSTFAIR is the first large-scale, free, public database that links information about defendants and their demographic characteristics with information about their federal crimes, their sentences, and, crucially, the identity of the sentencing judge. You can read more about our analysis and collection methodology in the study linked below.
- JUSTFAIR Dataset
- Study: JUSTFAIR: Judicial System Transparency through Federal Archive Inferred Records
- Visualization: JUDGEFAIR
Judicial Sentencing (Before and After COMPAS) Dataset
Judicial and carceral systems increasingly use criminal risk assessment algorithms to make decisions that affect individual freedoms. While the accuracy, fairness, and legality of these algorithms have come under scrutiny, their tangible impact on the American justice system remains almost completely unexplored. To fill this gap, we investigated the effect of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm on judges’ decisions to mandate confinement as part of criminal sentences in Broward County, Florida. This dataset compiles over ten thousand court records from periods before and after the implementation of COMPAS in Broward County.
- Judicial Sentencing (Before and After COMPAS) Dataset
- Study: Algorithms in Judges’ Hands: Incarceration and Inequity in Broward County, Florida
New York City Jails COVID Discharge Policy Dataset
During the early stages of the COVID-19 pandemic in 2020, Mayor Bill de Blasio ordered the release of individuals incarcerated in New York City jails who were at high risk of contracting the disease and at low risk of committing criminal re-offense. Using public information, we construct and analyze a database of nearly 350,000 incarceration episodes in the city jail system from 2014–2020, paying special attention to what happened during the week of March 23–29, 2020, immediately following the mayor’s order. You can read more about our analysis and collection methodology in the study linked below.
- New York City Jails COVID Discharge Policy Dataset
- Study: New York City jails: COVID discharge policy, data transparency, and reform
Educational Equity
Student-Teacher Race-Match Dataset
Same-race teachers improve academic outcomes for minoritized students, yet most never encounter a teacher who shares their racial background. Existing research highlights the benefits of race-matched instruction but often focuses on aggregate workforce diversity, obscuring students’ structural opportunities for contact. We introduce race-match sufficiency—the probability that a student has at least one same-race teacher—and estimate it using administrative data from 8,691 Texas public schools serving 5.4 million students. This repository contains both the data and code used to create our model. You can read more about our analysis and collection methodology in the study linked below.
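To make the definition of race-match sufficiency concrete, the following is a minimal illustrative sketch, not the model in the repository. It assumes, purely for illustration, that a student's teachers are drawn independently from a school's teaching staff, so the probability of at least one same-race teacher is one minus the probability of none. The function name `race_match_sufficiency` and its parameters are hypothetical.

```python
def race_match_sufficiency(same_race_share: float, n_teachers: int) -> float:
    """Probability that a student has at least one same-race teacher.

    Illustrative assumption (not taken from the study): each of the
    student's n_teachers is an independent draw from a staff in which
    a fraction same_race_share shares the student's race.
    """
    if not 0.0 <= same_race_share <= 1.0:
        raise ValueError("same_race_share must be between 0 and 1")
    if n_teachers < 0:
        raise ValueError("n_teachers must be non-negative")
    # P(at least one match) = 1 - P(no match among all teachers)
    return 1.0 - (1.0 - same_race_share) ** n_teachers


# Hypothetical example: 10% of teachers share the student's race,
# and the student encounters 6 teachers.
print(round(race_match_sufficiency(0.10, 6), 6))
```

Under this simplified assumption, sufficiency rises with both the share of same-race teachers and the number of teachers a student encounters, which is why aggregate workforce diversity alone does not determine a student's opportunity for contact.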
Mathematical Sciences Editorial Board Dataset
The dataset includes the gender representation of 435 editorial boards of journals in the mathematical sciences. Group variations within the editorships are identified by specific journals, subfields, publishers, and countries that significantly exceed or fall short of the overall average. To enable the creation of this dataset, we developed a semi-automated method for inferring gender that has an estimated accuracy of 97.5%. You can read more about our analysis and collection methodology in the study linked below.
- Mathematical Sciences Editorial Board Dataset
- Study: Gender Representation on Journal Editorial Boards in the Mathematical Sciences
- Visualization: Gender Makeup of Statistics Journal Editors
Signatories on Public Letters on Diversity in Mathematical Sciences Dataset
In its December 2019 edition, the Notices of the American Mathematical Society published an essay critical of the use of diversity statements in academic hiring. The publication of this essay prompted many responses, including three public letters circulated within the mathematical sciences community. Each letter was signed by hundreds of people and was published online, also by the American Mathematical Society. This dataset includes the demographics of each signatory, which we infer using a crowdsourcing approach. You can read more about our analysis and collection methodology in the study linked below.
- Signatories on Public Letters on Diversity in Mathematical Sciences Dataset
- Study: Comparing demographics of signatories to public letters on diversity in the mathematical sciences
- Visualization:
Temporal Dynamics of Faculty Hiring in Mathematics Dataset
University faculty hiring networks are known to be hierarchical and to exacerbate various types of inequity. Still, a detailed, historical understanding of hiring dynamics is lacking in many academic fields. This dataset focuses on the field of mathematics, including over 120,000 records from 150 institutions over seven decades to elucidate the temporal dynamics of hiring doctoral-granting (DG) faculty at the individual and departmental levels. You can read more about our analysis and collection methodology in the study linked below.
- Temporal Dynamics of Faculty Hiring in Mathematics Dataset
- Study: Temporal dynamics of faculty hiring in mathematics
Representation in the Arts
Artist Diversity in Museum Collections Dataset
This dataset comes from the first large-scale study of artist diversity in museums. While previous work has investigated the demographic diversity of museum staffs and visitors, the diversity of artists in their collections has remained unreported. By scraping the public online catalogs of 18 major U.S. museums, deploying a sample of 10,000 artist records comprising over 9,000 unique artists to crowdsourcing, and analyzing 45,000 responses, we infer artist genders, ethnicities, geographic origins, and birth decades. You can read more about our analysis and collection methodology in the study linked below.
Race and Gender Representation Across Creative Fields Dataset
This dataset comes from the first comprehensive, comparative, empirical study of intersecting identities across creative fields (art, fashion, film, and music). The data includes 4,700 creative contributors within contemporary art, high fashion, box office film, and popular music in the United States. The starting points for our study are contributors from four domains: contemporary artists whose works appear in certain high-profile museums; fashion designers whose clothing has been shown in major fashion shows; cast and crew of top-grossing box office films; and popular musicians who have appeared on top charts. The data was collected for the years 2018–2019. The data for contemporary artists comes from the Artist Diversity in Museum Collections Dataset. Demographic data was collected through crowdsourcing using Amazon Mechanical Turk. You can read more about our analysis and collection methodology in the study linked below.
- Race and Gender Representation Across Creative Fields Dataset
- Study: Race- and gender-based under-representation of creative contributors: art, fashion, film, and music
- Visualization:
Environmental Justice
Superfund Site Race and Remediation Dataset
Superfund is a federal program established in 1980 to manage the cleanup of hazardous waste sites across the United States. Given the health and economic costs borne by people living near these sites, any demographic disparities within the Superfund program are issues of environmental justice. This dataset includes 1,688 Superfund sites across the country and the demographic data of the areas surrounding them. You can read more about our analysis and collection methodology in the study linked below.
Health Equity
U.S. Firearm Markets Dataset
In spring 2020, firearm purchases surged across the United States as the COVID-19 pandemic triggered widespread disruption (Levine and McKnight, 2020). This spike—and a second in June 2020 following the George Floyd protests—was historically large and remarkably uniform, even across politically diverse states (Lang and Lang, 2020). These observations prompt a broader question: do local firearm markets behave independently or respond to shared national forces? This repository contains both the data and code used to conduct our principal component analysis (PCA). You can read more about our analysis and collection methodology in the study linked below.
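As a rough illustration of how PCA can address this question, the sketch below applies it to synthetic data rather than to the dataset itself. It assumes a hypothetical layout with one row per month and one column per state, and it is not the analysis code from the repository. If a single shared force drives most states, the first principal component should account for a large share of the total variance.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal component.

    X is assumed to have rows = time periods and columns = states
    (a hypothetical layout chosen for this illustration).
    """
    # Center and scale each column so no single state dominates.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular values of the standardized matrix give component variances.
    singular_values = np.linalg.svd(Xs, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic example: 10 "states" driven by one shared "national" factor
# plus independent local noise. All numbers here are made up.
rng = np.random.default_rng(0)
months, states = 120, 10
national = rng.normal(size=(months, 1))
loadings = rng.uniform(0.8, 1.2, size=(1, states))
local_noise = 0.3 * rng.normal(size=(months, states))
X = national @ loadings + local_noise

ratios = explained_variance_ratio(X)
print(ratios[:3])
```

In this synthetic setup the first component dominates because the shared factor was built in. With real data, a more even spread of variance across components would instead suggest that local markets behave more independently.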
