Data Catalog
All datasets live in shared_resources/data/ as UTF-8 CSV files.
District-Level Data
These datasets can be joined on magisterial district (279 districts, stable ~1960-1990).
| Dataset | Years | Unit | Key Variables | Page |
|---|---|---|---|---|
| Agricultural Census | 1949-1990 | District × year | Employment by race, wages, mechanization | Details |
| TEBA Mine Recruiting | 1953-1975 | District × year | Mine recruit counts by origin | Details |
| Population | 1904-1996 | District × census year | Population by race, urban/rural | Details |
| Forced Removals | 1960s-1980s | District (cross-section) | Removal events, population moved | Details |
| Elections | 1938-1992 | Constituency × year | Votes by party, turnout | Details |
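
As a rough illustration of joining two district × year datasets, the sketch below left-joins a small agricultural extract onto a recruiting extract using only the standard library. The column names (`district`, `year`, `black_employment`, `mine_recruits`) and all values are hypothetical placeholders, not the actual schema of the files in shared_resources/data/.

```python
import csv
import io

# Hypothetical extracts; real column names and values will differ.
AG_CSV = """district,year,black_employment
Albany,1960,1200
Albany,1965,1100
Bethal,1960,5400
"""

TEBA_CSV = """district,year,mine_recruits
Albany,1960,310
Bethal,1960,95
Bethal,1965,120
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, keys=("district", "year")):
    """Left-join two row lists on `keys`; unmatched right-hand columns become None."""
    index = {tuple(r[k] for k in keys): r for r in right}
    right_cols = [c for c in right[0] if c not in keys] if right else []
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        merged = dict(row)
        for col in right_cols:
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


panel = left_join(read_rows(AG_CSV), read_rows(TEBA_CSV))
for row in panel:
    print(row)
```

For files on disk, `open(path, encoding="utf-8", newline="")` passed to `csv.DictReader` would replace the in-memory strings. A left join keeps every row of the first dataset, which makes gaps in the second one visible as `None` rather than silently dropping district-years.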
National & Sectoral Data
These datasets are not at district level; each has its own unit of observation.
| Dataset | Years | Unit | Key Variables | Page |
|---|---|---|---|---|
| Strike Panel | 1978-1990 | Incident | Company, union, workers, outcome | Details |
| Manpower Survey | 1965-1994 | Sector-occupation-year | Employment by race and gender (165K rows) | Details |
| Mining Census | 1964-1987 | Mine or mineral type | Gold production, costs, profits | Details |
| Manufacturing Census | 1948-1985 | Industrial region | Output, employment by industry | Details |
| Industrial Wages | 1973-1988 | Industry-occupation-area | Gazetted minimum wage rates (825 files) | Details |
Financial & Macro
| Dataset | Years | Key Variables | Page |
|---|---|---|---|
| Sanctions & Macro | 1968-2015 | Gold price, exchange rates, bond yields, credit spreads | Details |
| Share Prices & Boards | 1973-2015 | JSE market values, director networks | Details |
| Input-Output Tables | 1967-1993 | Sector flow matrices (48 files) | Details |
Supplementary
| Dataset | Description | Page |
|---|---|---|
| Geography Reference | State of emergency, labour law dates, ag suitability | Details |
| Rural Property | Farm land transfers 1939-1960 (4,711 rows) | Details |
| CPS 1980 | Household survey microdata (84K records) | Details |
| Public Sector | Government employment 1920-1980 (44 files) | Details |
Total: ~1,900 CSV files across 24 categories.
Analysis-Ready Panels (April 2026)
The panel builder pipeline (shared_resources/scripts/panel_builders/) has been fully tested. All 14 scripts produce validated output.
| Panel | Rows | Coverage | Notes |
|---|---|---|---|
| Master District | 8,928 | 279 districts × 32 years | Joins ag census, TEBA, population, removals, geography |
| Ag Census Employment | 3,774 | 15 years (1949-1990) | Boone cleaned data (1961-1983) + oDesk fallback |
| Manufacturing | 2,054 | 7 years (1963-1985) | 275 districts, 33 industry categories |
| Mining (mine-level) | 696 | 17 years (1964-1987) | Gold mine production, costs, profits |
| Electoral | 3,123 | 10 elections (1943-1987) | Geocoded to constituency coordinates |
| Strikes | 3,956 | 1960-1991 | 51% geocoded to magisterial districts |
| TEBA Station | ~400 | 23 years (1953-1975) | Station-level recruiting volumes |
| Industry | 1,262 | 1965-1994 | Manpower survey + mining mine-year |
Output location: shared_resources/data/panels/
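
The Master District row count is consistent with a balanced panel: 279 districts × 32 years = 8,928 rows. A minimal sketch of that kind of check follows, assuming each row is identified by a (district, period) key. The function name `check_balanced` and the synthetic keys are illustrative and are not taken from the panel builder scripts.

```python
from itertools import product


def check_balanced(keys, n_units, n_periods):
    """Return a list of problems found in a unit x period key list (empty if balanced)."""
    problems = []
    unique = set(keys)
    if len(unique) != len(keys):
        problems.append(f"{len(keys) - len(unique)} duplicate key(s)")
    units = {u for u, _ in unique}
    periods = {p for _, p in unique}
    if len(units) != n_units:
        problems.append(f"expected {n_units} units, found {len(units)}")
    if len(periods) != n_periods:
        problems.append(f"expected {n_periods} periods, found {len(periods)}")
    missing = len(units) * len(periods) - len(unique)
    if missing:
        problems.append(f"{missing} unit-period cell(s) missing")
    return problems


# Synthetic stand-in for the Master District keys.
# Period indices 0..31 are placeholders for the 32 panel years.
districts = [f"D{i:03d}" for i in range(279)]
periods = range(32)
keys = list(product(districts, periods))

print(len(keys))                      # 8928
print(check_balanced(keys, 279, 32))  # []
```

Dropping a single key from `keys` makes the function report one missing unit-period cell, which is the failure a simple row count alone can hide when a duplicate offsets a gap.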
Book Manuscript (April 2026)
The data infrastructure supports a complete ~67,000-word book manuscript (The Economic Contradictions of Apartheid) with:
- 23 publication-quality figures (including homelands map from shapefiles, workforce composition, electoral event study)
- 7 data tables plus balance/summary statistics tables in appendix
- ~12 core regression specifications with extensive robustness checks
- Data appendix with full variable definitions, covariate balance, and specification sensitivity analysis
- Post-1994 empirical coda linking 1996 census to apartheid-era district characteristics
Key empirical findings:
- The 1974 foreign labor shock tripled Black mine wages and drove national agricultural mechanization (+30% combines, +13% tractors, -10% Black employment)
- The district-level combines DiD is specification-sensitive: +0.373 in pooled OLS with province FE, null (+0.017) in TWFE with district FE
- The Eiselen Line electoral divergence is the lead finding: districts east of the line (where farmers competed with mines for Black labor) shifted +6.3pp toward right-wing parties by 1987 (p < 0.001)
- The book reports this specification sensitivity openly and reframes the argument around the national wage shock → Eiselen electoral divergence narrative
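
The two specifications behind the combines result can be written in generic textbook form. This is a sketch, not the manuscript's exact equations: $\text{Treat}_d$ (a district-level exposure indicator), $\text{Post}_t$ (an indicator for years after the 1974 shock), and the omission of controls are all assumptions made for illustration.

Pooled OLS with province fixed effects, where $p(d)$ is the province containing district $d$:

$$
y_{dt} = \alpha_{p(d)} + \gamma\,\text{Treat}_d + \lambda\,\text{Post}_t + \beta\,(\text{Treat}_d \times \text{Post}_t) + \varepsilon_{dt}
$$

Two-way fixed effects (TWFE) with district and year fixed effects:

$$
y_{dt} = \alpha_d + \lambda_t + \beta\,(\text{Treat}_d \times \text{Post}_t) + \varepsilon_{dt}
$$

In the TWFE form the district effects $\alpha_d$ absorb every time-invariant difference between districts, so $\beta$ is identified only from within-district changes over time. One common reading of an estimate that falls from +0.373 to +0.017 when moving to district effects is that the pooled coefficient was picking up persistent cross-district differences rather than a response to the shock.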
Data Quality
All production census data were validated in March-April 2026:
- Agricultural census: Phases A-D complete (internal consistency, cross-year plausibility, Vision PDF verification, cross-panel reconciliation vs Boone). 99.1% cell match rate with Boone's independently cleaned panel.
- Mining census: National totals cross-verified against Financial Statistics (independent publication). 0 discrepancies.
- Manufacturing census: TABLE 1 (industry by district) is the cleanest census data type. One confirmed national total error in TABLE 1.2.
- All corrections are logged, with backups in _originals/ subdirectories.
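
A cell match rate such as the 99.1% figure above can be computed by comparing two panels cell by cell. The sketch below shows one plausible definition: the share of overlapping (district, year, variable) cells whose values agree. How the reported rate was actually calculated is not stated here, and the function name and all values are made up for illustration.

```python
def cell_match_rate(panel_a, panel_b, tol=0.0):
    """Share of cells present in both panels whose values agree within `tol`.

    Each panel maps a (district, year, variable) key to a numeric value.
    Returns (rate, mismatched_keys); rate is None if no cells overlap.
    """
    shared = sorted(set(panel_a) & set(panel_b))
    if not shared:
        return None, []
    mismatches = [k for k in shared if abs(panel_a[k] - panel_b[k]) > tol]
    return 1 - len(mismatches) / len(shared), mismatches


# Made-up values for illustration only.
ours = {
    ("Albany", 1961, "tractors"): 410,
    ("Albany", 1961, "combines"): 35,
    ("Bethal", 1961, "tractors"): 1220,
    ("Bethal", 1961, "combines"): 98,
}
reference = {
    ("Albany", 1961, "tractors"): 410,
    ("Albany", 1961, "combines"): 35,
    ("Bethal", 1961, "tractors"): 1202,  # differs, e.g. a transposed digit
    ("Bethal", 1961, "combines"): 98,
}

rate, bad = cell_match_rate(ours, reference)
print(f"{rate:.1%}")  # 75.0%
print(bad)            # [('Bethal', 1961, 'tractors')]
```

Returning the mismatched keys alongside the rate gives a direct worklist of cells to check against the source PDFs.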
See Provenance & Validation for error rates, correction logs, and validation methodology.