Getting Started

Accessing the Data

All analysis-ready datasets are CSV files in shared_resources/data/. The data files are too large to track in git, but the documentation, codebooks, and quality reports are tracked.

To get the data files, contact Laurence Wilse-Samson at NYU Wagner.

Quick Start (Python)

import pandas as pd

# Agricultural census — district-level panel
ag = pd.read_csv('shared_resources/data/ag_census/geography_panel_1948_1994.csv')

# TEBA mine recruiting — district panel
teba = pd.read_csv('shared_resources/data/teba_panel/district_recruiting_panel.csv')

# Strike incidents
strikes = pd.read_csv('shared_resources/data/strike_panel/strikes_master_csv.csv')

# Elections
elections = pd.read_csv('shared_resources/data/elections/elections_completed_converted/schoeman_elections.csv')

File Conventions

All CSVs use:

  • UTF-8 encoding, comma delimiters, header row
  • NA for missing values (never empty strings or zero)
  • Metadata columns: _source_file, _category, _year, _converted_date
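The conventions above map fairly directly onto pandas.read_csv arguments. The following is a minimal sketch that reads an inline sample instead of a real file. The district and maize_tons columns and all values are made up for illustration; only the four metadata column names come from the list above. Passing keep_default_na=False is a choice made here so that only the literal string NA is treated as missing.

```python
import io

import pandas as pd

# Metadata columns named in the file conventions
METADATA_COLS = ["_source_file", "_category", "_year", "_converted_date"]

# Hypothetical sample that follows the stated conventions
SAMPLE = """district,maize_tons,_source_file,_category,_year,_converted_date
Alpha,120,example.xlsx,ag_census,1950,2026-03-01
Beta,NA,example.xlsx,ag_census,1950,2026-03-01
Gamma,0,example.xlsx,ag_census,1950,2026-03-01
"""


def load_csv(path_or_buffer):
    """Read a CSV under the stated conventions and check the metadata columns."""
    df = pd.read_csv(
        path_or_buffer,
        encoding="utf-8",
        sep=",",
        header=0,
        na_values=["NA"],       # NA marks a missing value
        keep_default_na=False,  # treat nothing else as missing
    )
    missing = [c for c in METADATA_COLS if c not in df.columns]
    if missing:
        raise ValueError(f"missing metadata columns: {missing}")
    return df


df = load_csv(io.StringIO(SAMPLE))
n_missing = int(df["maize_tons"].isna().sum())  # rows recorded as NA
n_zero = int((df["maize_tons"] == 0).sum())     # true zeros stay distinct from NA
print(n_missing, n_zero)
```

Because zero is never used as a missing-value code, the NA count and the zero count can be interpreted separately.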

Data Quality

All 1,814 CSV files were quality-checked in March 2026:

  • 179 files had generic headers (col_0, col_1); the real headers were all recovered from the source Excel files
  • 8 wide-format files were trimmed from 256 columns to their actual data width
  • Duplicate files were removed, with backups preserved in _originals/ subfolders

See shared_resources/data/CSV_QUALITY_REPORT.md for the full audit.
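If new files are added later, a similar header check can be rerun on them. The sketch below flags generic column names. The pattern col_<number> is inferred from the two examples above (col_0, col_1) and is an assumption; the audit's actual method is described in the quality report, not here.

```python
import csv
import io
import re

# Assumed pattern for generic headers, inferred from col_0 and col_1
GENERIC_HEADER = re.compile(r"col_\d+")


def generic_header_columns(text_stream):
    """Return the header names in a CSV stream that look auto-generated."""
    reader = csv.reader(text_stream)
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in header if GENERIC_HEADER.fullmatch(name)]


# Hypothetical inline examples; real use would pass an open file handle
flagged = generic_header_columns(io.StringIO("col_0,col_1,_year\n1,2,1950\n"))
clean = generic_header_columns(io.StringIO("district,_year\nAlpha,1950\n"))
print(flagged)
print(clean)
```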

Key Panels

| Panel | File | Rows | Use |
|---|---|---|---|
| Ag census geography | ag_census/geography_panel_1948_1994.csv | 13,113 | District-level agricultural production |
| Long-run districts | ag_census/long_run_district_panel_1918_1994.csv | 15,525 | Extended 55-year panel |
| TEBA recruiting | teba_panel/district_recruiting_panel.csv | 1,674 | Mine labor supply by district |
| Strikes master | strike_panel/strikes_master_csv.csv | 2,679 | Strike incidents with firm matches |
| District crosswalk | district_crosswalk_master.csv | 398 | Links naming schemes across datasets |
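After receiving the data files, the row counts listed above offer a quick way to confirm that a local copy is complete. The sketch below makes two assumptions: that the Rows figures count data rows and exclude the header, and that the file paths are relative to shared_resources/data/. The function names are illustrative, not part of the repository.

```python
import csv
from pathlib import Path

# Row counts as listed in the Key Panels table
EXPECTED_ROWS = {
    "ag_census/geography_panel_1948_1994.csv": 13113,
    "ag_census/long_run_district_panel_1918_1994.csv": 15525,
    "teba_panel/district_recruiting_panel.csv": 1674,
    "strike_panel/strikes_master_csv.csv": 2679,
    "district_crosswalk_master.csv": 398,
}


def count_data_rows(path):
    """Count CSV records after the header; quoted line breaks are handled."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def row_count_mismatches(data_dir, expected=EXPECTED_ROWS):
    """Return {relative_path: (expected, actual)}; actual is None if the file is absent."""
    mismatches = {}
    for rel_path, n_expected in expected.items():
        path = Path(data_dir) / rel_path
        n_actual = count_data_rows(path) if path.exists() else None
        if n_actual != n_expected:
            mismatches[rel_path] = (n_expected, n_actual)
    return mismatches
```

Calling row_count_mismatches("shared_resources/data") returns an empty dict when every listed file is present with the listed number of rows.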

Geographic Crosswalks

Linking datasets across different geographic coding schemes is a core challenge. See Geographic Crosswalks for available crosswalk files.
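As an illustration of the linking problem, the sketch below joins two toy panels through a crosswalk with pandas. Every column name (district_id, ag_name, teba_name, maize_tons, recruits) and every value is hypothetical; the real crosswalk files may be structured differently, so check their codebooks before adapting this.

```python
import pandas as pd

# Hypothetical crosswalk: one row per district, one column per naming scheme
crosswalk = pd.DataFrame({
    "district_id": [1, 2, 3],
    "ag_name": ["Alpha", "Beta", "Gamma"],
    "teba_name": ["ALPHA DIST", "BETA DIST", "GAMMA DIST"],
})

# Hypothetical panels, each using its own district naming scheme
ag = pd.DataFrame({
    "ag_name": ["Alpha", "Beta", "Delta"],
    "year": [1950, 1950, 1950],
    "maize_tons": [120.0, 80.0, 15.0],
})
teba = pd.DataFrame({
    "teba_name": ["ALPHA DIST", "BETA DIST"],
    "year": [1950, 1950],
    "recruits": [40, 55],
})

# Attach the common ID; validate= raises if the crosswalk has duplicate names
ag_keyed = ag.merge(
    crosswalk[["district_id", "ag_name"]],
    on="ag_name", how="left", validate="m:1", indicator=True,
)

# Districts with no crosswalk entry need manual review before analysis
unmatched = ag_keyed.loc[ag_keyed["_merge"] == "left_only", "ag_name"].tolist()

ag_matched = (
    ag_keyed[ag_keyed["_merge"] == "both"]
    .drop(columns="_merge")
    .astype({"district_id": "int64"})  # the left join made this column float
)
teba_keyed = teba.merge(
    crosswalk[["district_id", "teba_name"]],
    on="teba_name", how="left", validate="m:1",
)

# Join the two panels on the shared district ID and year
linked = ag_matched.merge(teba_keyed, on=["district_id", "year"], how="inner")
print(unmatched)
print(linked[["district_id", "year", "maize_tons", "recruits"]])
```

Using indicator=True keeps unmatched districts visible instead of silently dropping them, which matters when naming schemes differ across sources.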