Data Catalog

All datasets live in shared_resources/data/ as UTF-8 CSV files.

District-Level Data

These datasets can be joined on magisterial district (279 districts, with boundaries stable from roughly 1960 to 1990).

| Dataset | Years | Unit | Key Variables | Page |
|---|---|---|---|---|
| Agricultural Census | 1949-1990 | District × year | Employment by race, wages, mechanization | Details |
| TEBA Mine Recruiting | 1953-1975 | District × year | Mine recruit counts by origin | Details |
| Population | 1904-1996 | District × census year | Population by race, urban/rural | Details |
| Forced Removals | 1960s-1980s | District (cross-section) | Removal events, population moved | Details |
| Elections | 1938-1992 | Constituency × year | Votes by party, turnout | Details |
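
As a rough illustration of the district join described above, the sketch below left-joins two district × year tables using only the standard library. The column names (`district`, `year`, `black_employment`, `recruits`), the district codes, and the values are all hypothetical stand-ins; the real files under shared_resources/data/ may use different names, so treat this as a pattern rather than the project's own loader.

```python
import csv
import io

def read_rows(handle):
    """Read an open CSV file handle into a list of row dicts."""
    return list(csv.DictReader(handle))

def join_on_keys(left, right, keys=("district", "year")):
    """Left-join two lists of row dicts on the given key columns."""
    # Index the right-hand table by its key tuple for O(1) lookups.
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        merged = dict(row)
        # Copy non-key columns from the match; unmatched rows gain nothing.
        merged.update({k: v for k, v in match.items() if k not in keys})
        joined.append(merged)
    return joined

# In-memory stand-ins for two district x year files (made-up values).
ag_csv = io.StringIO(
    "district,year,black_employment\nD001,1960,120\nD002,1960,80\n"
)
teba_csv = io.StringIO("district,year,recruits\nD001,1960,15\n")

panel = join_on_keys(read_rows(ag_csv), read_rows(teba_csv))
for row in panel:
    print(row)
```

For a cross-sectional table such as Forced Removals, the same function could be called with `keys=("district",)`. With real files, `open(path, encoding="utf-8", newline="")` would replace the `io.StringIO` objects.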

National & Sectoral Data

These datasets are not at district level; each uses a different unit of observation.

| Dataset | Years | Unit | Key Variables | Page |
|---|---|---|---|---|
| Strike Panel | 1978-1990 | Incident | Company, union, workers, outcome | Details |
| Manpower Survey | 1965-1994 | Sector-occupation-year | Employment by race and gender (165K rows) | Details |
| Mining Census | 1964-1987 | Mine or mineral type | Gold production, costs, profits | Details |
| Manufacturing Census | 1948-1985 | Industrial region | Output, employment by industry | Details |
| Industrial Wages | 1973-1988 | Industry-occupation-area | Gazetted minimum wage rates (825 files) | Details |

Financial & Macro

| Dataset | Years | Key Variables | Page |
|---|---|---|---|
| Sanctions & Macro | 1968-2015 | Gold price, exchange rates, bond yields, credit spreads | Details |
| Share Prices & Boards | 1973-2015 | JSE market values, director networks | Details |
| Input-Output Tables | 1967-1993 | Sector flow matrices (48 files) | Details |

Supplementary

| Dataset | Description | Page |
|---|---|---|
| Geography Reference | State of emergency, labour law dates, ag suitability | Details |
| Rural Property | Farm land transfers 1939-1960 (4,711 rows) | Details |
| CPS 1980 | Household survey microdata (84K records) | Details |
| Public Sector | Government employment 1920-1980 (44 files) | Details |

Total: ~1,900 CSV files across 24 categories.

Analysis-Ready Panels (April 2026)

The panel builder pipeline (shared_resources/scripts/panel_builders/) has been fully tested. All 14 scripts produce validated output.

| Panel | Rows | Coverage | Notes |
|---|---|---|---|
| Master District | 8,928 | 279 districts × 32 years | Joins ag census, TEBA, population, removals, geography |
| Ag Census Employment | 3,774 | 15 years (1949-1990) | Boone cleaned data (1961-1983) + oDesk fallback |
| Manufacturing | 2,054 | 7 years (1963-1985) | 275 districts, 33 industry categories |
| Mining (mine-level) | 696 | 17 years (1964-1987) | Gold mine production, costs, profits |
| Electoral | 3,123 | 10 elections (1943-1987) | Geocoded to constituency coordinates |
| Strikes | 3,956 | 1960-1991 | 51% geocoded to magisterial districts |
| TEBA Station | ~400 | 23 years (1953-1975) | Station-level recruiting volumes |
| Industry | 1,262 | 1965-1994 | Manpower survey + mining mine-year |

Output location: shared_resources/data/panels/
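
The Master District row count is consistent with a fully balanced panel: 279 districts × 32 years = 8,928 rows. A quick way to confirm that kind of claim for any panel file is sketched below. The column names `district` and `year` are assumptions, and the toy rows stand in for a real file from shared_resources/data/panels/, whose actual file names are not listed here.

```python
from collections import Counter

def check_balanced(rows, unit_col="district", time_col="year"):
    """Report whether every unit appears exactly once in every period."""
    units = {r[unit_col] for r in rows}
    periods = {r[time_col] for r in rows}
    counts = Counter((r[unit_col], r[time_col]) for r in rows)
    return {
        "n_units": len(units),
        "n_periods": len(periods),
        "n_rows": len(rows),
        "expected_rows": len(units) * len(periods),
        # Unit-period cells that occur more than once.
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        # Unit-period cells that never occur.
        "missing": [
            (u, t)
            for u in sorted(units)
            for t in sorted(periods)
            if (u, t) not in counts
        ],
    }

# Toy rows (2 districts x 2 years, with one cell deliberately missing).
toy = [
    {"district": "D001", "year": "1960"},
    {"district": "D001", "year": "1961"},
    {"district": "D002", "year": "1960"},
]
report = check_balanced(toy)
print(report)
```

If the master panel is balanced as its coverage column implies, the report would show 279 units, 32 periods, 8,928 rows, and empty `duplicates` and `missing` lists.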

Book Manuscript (April 2026)

The data infrastructure supports a complete ~67,000-word book manuscript (The Economic Contradictions of Apartheid) with:

  • 23 publication-quality figures (including homelands map from shapefiles, workforce composition, electoral event study)
  • 7 data tables plus balance/summary statistics tables in appendix
  • ~12 core regression specifications with extensive robustness checks
  • Data appendix with full variable definitions, covariate balance, and specification sensitivity analysis
  • Post-1994 empirical coda linking 1996 census to apartheid-era district characteristics

Key empirical findings:

  • The 1974 foreign labor shock tripled Black mine wages and drove national agricultural mechanization (+30% combines, +13% tractors, -10% Black employment)
  • The district-level difference-in-differences (DiD) estimate for combines is specification-sensitive: +0.373 in pooled OLS with province fixed effects (FE), but null (+0.017) in two-way fixed effects (TWFE) with district FE
  • The Eiselen Line electoral divergence is the lead finding: districts east of the line (where farmers competed with mines for Black labor) shifted +6.3pp toward right-wing parties by 1987 (p < 0.001)
  • The book honestly reports the specification sensitivity and reframes around the national wage shock → Eiselen electoral divergence narrative
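
To make the specification-sensitivity point concrete, the toy example below shows one generic way a pooled coefficient and a two-way fixed effects coefficient can diverge: when exposed districts differ permanently in levels, pooled OLS attributes that gap to treatment, while district and year effects absorb it. The data are entirely synthetic with a true effect of zero, and the pooled regression here has no fixed effects at all (a simplification of the province-FE specification). Whether this mechanism accounts for the gap reported above is not something the sketch establishes.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

def two_way_demean(rows, col):
    """Remove district and year means (valid for a balanced panel)."""
    grand = mean(r[col] for r in rows)
    unit_mean = {
        u: mean(r[col] for r in rows if r["district"] == u)
        for u in {r["district"] for r in rows}
    }
    time_mean = {
        t: mean(r[col] for r in rows if r["year"] == t)
        for t in {r["year"] for r in rows}
    }
    return [
        r[col] - unit_mean[r["district"]] - time_mean[r["year"]] + grand
        for r in rows
    ]

# Synthetic panel: districts A and B are exposed and start at higher levels.
# Outcome = district level + common year effect; the true effect is zero.
levels = {"A": 5.0, "B": 6.0, "C": 1.0, "D": 2.0}
exposed = {"A", "B"}
rows = []
for district, level in levels.items():
    for year in (0, 1):  # 0 = pre-period, 1 = post-period
        treat = 1.0 if (district in exposed and year == 1) else 0.0
        rows.append({"district": district, "year": year,
                     "treat": treat, "y": level + 1.0 * year})

pooled = ols_slope([r["treat"] for r in rows], [r["y"] for r in rows])
twfe = ols_slope(two_way_demean(rows, "treat"), two_way_demean(rows, "y"))
print(f"pooled OLS slope: {pooled:.3f}")
print(f"TWFE slope:       {twfe:.3f}")
```

In this toy setup the pooled slope is large and positive while the TWFE slope is zero up to floating-point error, even though no district's outcome responded to treatment.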

Data Quality

All production census data were validated in March-April 2026:

  • Agricultural census: Phases A-D complete (internal consistency, cross-year plausibility, Vision PDF verification, cross-panel reconciliation vs Boone). 99.1% cell match rate with Boone's independently cleaned panel.
  • Mining census: National totals cross-verified against Financial Statistics (independent publication). 0 discrepancies.
  • Manufacturing census: TABLE 1 (industry by district) is the cleanest census data type. One confirmed national total error in TABLE 1.2.
  • All corrections logged with backups in _originals/ subdirectories.
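
A cell match rate like the 99.1% figure above can be computed by comparing two panels cell by cell. The sketch below shows one plausible definition: the share of cells present in both panels whose values agree within a tolerance. The exact definition used in the validation is not stated here, so the keying scheme (district, year, variable), the tolerance, and the values are all illustrative assumptions.

```python
def cell_match_rate(panel_a, panel_b, tol=0.0):
    """Share of cells present in both panels whose values agree within tol."""
    shared = panel_a.keys() & panel_b.keys()
    if not shared:
        raise ValueError("no overlapping cells to compare")
    matches = sum(1 for key in shared if abs(panel_a[key] - panel_b[key]) <= tol)
    return matches / len(shared)

# Toy panels keyed by (district, year, variable); values are made up.
ours = {
    ("D001", 1961, "tractors"): 40.0,
    ("D001", 1961, "combines"): 5.0,
    ("D002", 1961, "tractors"): 12.0,
    ("D002", 1961, "combines"): 3.0,
}
reference = {
    ("D001", 1961, "tractors"): 40.0,
    ("D001", 1961, "combines"): 5.0,
    ("D002", 1961, "tractors"): 21.0,  # disagrees with `ours`
    ("D002", 1961, "combines"): 3.0,
    ("D003", 1961, "tractors"): 7.0,   # not in `ours`, so ignored
}

rate = cell_match_rate(ours, reference)
print(f"cell match rate: {rate:.1%}")
```

Restricting the comparison to shared cells keeps coverage differences between the panels from being counted as mismatches; a stricter variant could count cells missing from either side as failures.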

See Provenance & Validation for error rates, correction logs, and validation methodology.