Getting Started
Accessing the Data
All analysis-ready datasets are CSV files in shared_resources/data/. They are not tracked in git (too large), but the documentation, codebooks, and quality reports are.
To get the data files, contact Laurence Wilse-Samson at NYU Wagner.
Quick Start (Python)
import pandas as pd
# Agricultural census — district-level panel
ag = pd.read_csv('shared_resources/data/ag_census/geography_panel_1948_1994.csv')
# TEBA mine recruiting — district panel
teba = pd.read_csv('shared_resources/data/teba_panel/district_recruiting_panel.csv')
# Strike incidents
strikes = pd.read_csv('shared_resources/data/strike_panel/strikes_master_csv.csv')
# Elections
elections = pd.read_csv('shared_resources/data/elections/elections_completed_converted/schoeman_elections.csv')
File Conventions
All CSVs use:
- UTF-8 encoding, comma delimiters, header row
- NA for missing values (never empty strings or zero)
- Metadata columns: _source_file, _category, _year, _converted_date
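The conventions above map fairly directly onto `pandas.read_csv` arguments. The following is a minimal sketch, not part of the project code: the helper name `load_csv` and the sample rows are hypothetical, and it assumes the metadata columns appear under exactly the names listed above.

```python
import io

import pandas as pd

# Metadata columns listed in the conventions above.
METADATA_COLS = ["_source_file", "_category", "_year", "_converted_date"]


def load_csv(path_or_buffer):
    """Read one CSV under the stated conventions (hypothetical helper).

    Returns (data, metadata): the analysis columns and the metadata columns
    as two DataFrames that share the same row index.
    """
    df = pd.read_csv(
        path_or_buffer,
        encoding="utf-8",
        sep=",",
        header=0,
        na_values=["NA"],       # the documented missing-value marker
        keep_default_na=False,  # other strings such as "N/A" stay as text
    )
    present = [c for c in METADATA_COLS if c in df.columns]
    return df.drop(columns=present), df[present]


# Tiny in-memory example with made-up values, standing in for a real file.
sample = io.StringIO(
    "district,output,_source_file,_category,_year,_converted_date\n"
    "A,10,src.xlsx,ag,1948,2026-03-01\n"
    "B,NA,src.xlsx,ag,1948,2026-03-01\n"
)
data, meta = load_csv(sample)
print(data)
print(meta.columns.tolist())
```

A plain `pd.read_csv(path)` also reads `NA` as missing by default; passing `keep_default_na=False` only narrows the set of missing-value markers to the documented one.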
Data Quality
All 1,814 CSV files were quality-checked in March 2026:
- 179 files had generic headers (col_0, col_1); all recovered from source Excel files
- 8 wide-format files trimmed from 256 columns to actual data width
- Duplicate files removed, backups preserved in _originals/ subfolders
See shared_resources/data/CSV_QUALITY_REPORT.md for the full audit.
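A check for the generic-header problem described above could be sketched as follows. This is an illustration only, not the script used for the March 2026 audit; the function name `generic_headers` and the sample rows are hypothetical, and the pattern assumes the placeholder names always take the form `col_<number>`.

```python
import csv
import io
import re

# Placeholder names such as col_0, col_1, ... (form taken from the text above).
GENERIC_HEADER = re.compile(r"col_\d+")


def generic_headers(csv_text_stream):
    """Return the header names that look like generic placeholders."""
    header = next(csv.reader(csv_text_stream), [])
    return [name for name in header if GENERIC_HEADER.fullmatch(name.strip())]


# Made-up examples: one clean header row, one with placeholder names.
good = io.StringIO("district,year,output\nA,1948,10\n")
bad = io.StringIO("col_0,col_1,output\nA,1948,10\n")
print(generic_headers(good))  # []
print(generic_headers(bad))   # ['col_0', 'col_1']
```

For files on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`.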
Key Panels
| Panel | File | Rows | Use |
|---|---|---|---|
| Ag census geography | ag_census/geography_panel_1948_1994.csv | 13,113 | District-level agricultural production |
| Long-run districts | ag_census/long_run_district_panel_1918_1994.csv | 15,525 | Extended 55-year panel |
| TEBA recruiting | teba_panel/district_recruiting_panel.csv | 1,674 | Mine labor supply by district |
| Strikes master | strike_panel/strikes_master_csv.csv | 2,679 | Strike incidents with firm matches |
| District crosswalk | district_crosswalk_master.csv | 398 | Links naming schemes across datasets |
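After obtaining the files, the row counts in the table can serve as a quick integrity check on a local copy. The sketch below rests on two assumptions not stated in the table: that "Rows" means data rows excluding the header, and that every path (including `district_crosswalk_master.csv`) is relative to `shared_resources/data/`. The function names are hypothetical.

```python
import csv
from pathlib import Path

# Row counts as listed in the Key Panels table.
EXPECTED_ROWS = {
    "ag_census/geography_panel_1948_1994.csv": 13113,
    "ag_census/long_run_district_panel_1918_1994.csv": 15525,
    "teba_panel/district_recruiting_panel.csv": 1674,
    "strike_panel/strikes_master_csv.csv": 2679,
    "district_crosswalk_master.csv": 398,
}


def count_rows(path):
    """Count data rows (header excluded) without loading the file into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def check_row_counts(data_dir="shared_resources/data", expected=EXPECTED_ROWS):
    """Return {relative_path: (expected, found)} for missing or mismatched files.

    `found` is None when the file does not exist locally.
    """
    problems = {}
    for rel_path, n_expected in expected.items():
        path = Path(data_dir) / rel_path
        n_found = count_rows(path) if path.exists() else None
        if n_found != n_expected:
            problems[rel_path] = (n_expected, n_found)
    return problems


# An empty result means every listed file is present with the expected count.
for rel_path, (n_expected, n_found) in check_row_counts().items():
    print(f"{rel_path}: expected {n_expected}, found {n_found}")
```

Counting with the `csv` module rather than by raw lines keeps quoted fields that contain line breaks from being counted as extra rows.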
Geographic Crosswalks
Linking datasets across different geographic coding schemes is a core challenge. See Geographic Crosswalks for available crosswalk files.
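As a rough illustration of how a crosswalk is typically applied, the sketch below joins two panels through a common district ID instead of matching raw names. Everything in it is invented for the example: the column names `source_name` and `district_id`, the district names, and the values. The actual layout of `district_crosswalk_master.csv` may differ.

```python
import pandas as pd

# Hypothetical crosswalk: maps each dataset's district spelling to a common ID.
crosswalk = pd.DataFrame(
    {
        "source_name": ["Northfield", "Northfield (Dist.)", "Eastvale"],
        "district_id": [101, 101, 202],
    }
)

# Two made-up panels that spell the same district differently.
ag = pd.DataFrame({"district": ["Northfield", "Eastvale"], "maize_tons": [120.0, 45.0]})
teba = pd.DataFrame(
    {"district": ["Northfield (Dist.)", "Eastvale", "Unknown"], "recruits": [300, 150, 20]}
)


def attach_id(df, name_col="district"):
    """Left-join the common district ID onto a panel, keeping a match indicator."""
    return df.merge(
        crosswalk,
        how="left",
        left_on=name_col,
        right_on="source_name",
        indicator=True,  # adds a _merge column: 'both' or 'left_only'
    )


ag_id = attach_id(ag)
teba_id = attach_id(teba)

# Districts with no crosswalk entry should be reviewed before any analysis.
unmatched = teba_id.loc[teba_id["_merge"] == "left_only", "district"].tolist()
print(unmatched)

# Link the two panels on the common ID rather than on the raw names,
# using only the rows that found a crosswalk entry.
ag_matched = ag_id[ag_id["_merge"] == "both"].drop(columns="_merge")
teba_matched = teba_id[teba_id["_merge"] == "both"].drop(columns="_merge")
linked = ag_matched.merge(teba_matched, on="district_id", suffixes=("_ag", "_teba"))
print(linked[["district_id", "maize_tons", "recruits"]])
```

Keeping the `_merge` indicator until the unmatched rows have been inspected makes silent row loss less likely when the naming schemes disagree.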