# How Datasets Connect
The datasets in this archive use different geographic coding schemes, time periods, and units of observation. This page explains how they fit together.
## The Master District Panel
The core geographic unit is the magisterial district: 279 administrative areas that were stable from roughly 1960 to 1990. The panel-building scripts in `shared_resources/scripts/panel_builders/` assemble a master district-year panel by joining:
```text
Master Roster (279 districts, time-invariant)
├── Agricultural Census (employment, wages, mechanization by year)
├── TEBA Recruiting (mine recruit counts by year)
├── Population (race composition by census year)
├── Forced Removals (cumulative removal counts)
└── Geography Controls (ag suitability, policy implementation dates)
        ↓
Master District Panel (district × year)
```
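The join logic can be sketched in a few lines of Python. This is a minimal illustration, not the code in `panel_builders/`: it assumes each yearly source is already keyed by `(district_id, year)` and that time-invariant controls are keyed by `district_id`. The `build_panel` function, its parameters, and the toy values are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in types: the real panel builders read CSV files and
# may use different column names and data structures.
Record = Dict[str, Any]
YearlySource = Dict[Tuple[int, int], Record]  # keyed by (district_id, year)


def build_panel(
    roster: List[int],
    years: List[int],
    yearly_sources: Dict[str, YearlySource],
    static_controls: Dict[int, Record],
) -> List[Record]:
    """Left-join every source onto the full district x year grid.

    Every roster district appears in every year, even when a source has no
    observation for it; fields from that source are then simply absent.
    """
    panel: List[Record] = []
    for district_id in roster:
        for year in years:
            row: Record = {"district_id": district_id, "year": year}
            # Time-invariant controls (e.g. ag suitability) repeat in every year.
            row.update(static_controls.get(district_id, {}))
            # Prefix yearly fields with the source name to avoid collisions.
            for name, source in yearly_sources.items():
                for field, value in source.get((district_id, year), {}).items():
                    row[f"{name}_{field}"] = value
            panel.append(row)
    return panel


# Toy inputs for illustration only.
panel = build_panel(
    roster=[1, 2],
    years=[1970, 1971],
    yearly_sources={"teba": {(1, 1970): {"recruits": 120}}},
    static_controls={1: {"ag_suitability": 0.4}},
)
print(len(panel))  # 4
print(panel[0])
# {'district_id': 1, 'year': 1970, 'ag_suitability': 0.4, 'teba_recruits': 120}
```

Left-joining onto the full roster × year grid keeps a district-year with no TEBA observation in the panel instead of dropping it, which is what an inner join would do.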
Run the pipeline:

```bash
cd shared_resources/scripts/panel_builders
make all  # builds everything in dependency order + validates
```
## District ID Systems
Different datasets use different numbering for the same 279 districts:
| System | Ordering | Used By | Range |
|---|---|---|---|
| `district_id` | Geographic (1=Namaqualand, 2=Calvinia, ...) | Agricultural census, geography panel | 1-279 |
| `distid` | Alphabetical (1=Aberdeen, 2=Adelaide, ...) | TEBA panel | 1-279 |
The same district therefore carries two different IDs depending on the source. The crosswalk file `crosswalk_teba_geography_panel.csv` maps between them, and the panel-building scripts apply it automatically.
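Applying the crosswalk yourself (for example, when working with the TEBA data outside the pipeline) might look like the sketch below. It assumes, hypothetically, that the file has columns named `distid` and `district_id`; check the real header first. The helper names and the ID pairs in the example are illustrative only.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_crosswalk(csv_file: TextIO) -> Dict[int, int]:
    """Read a distid -> district_id mapping from a crosswalk CSV.

    Assumes (hypothetically) columns named `distid` and `district_id`.
    """
    mapping: Dict[int, int] = {}
    for row in csv.DictReader(csv_file):
        distid, district_id = int(row["distid"]), int(row["district_id"])
        if mapping.setdefault(distid, district_id) != district_id:
            raise ValueError(f"distid {distid} maps to more than one district_id")
    # Both schemes number the same districts, so the mapping must be one-to-one.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two distids map to the same district_id")
    return mapping


def recode_teba(rows: List[dict], crosswalk: Dict[int, int]) -> List[dict]:
    """Swap each TEBA row's alphabetical distid for the geographic district_id."""
    recoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "distid"}
        # A KeyError here flags a distid missing from the crosswalk.
        new_row["district_id"] = crosswalk[row["distid"]]
        recoded.append(new_row)
    return recoded


# Toy crosswalk: the ID pairs are made up for illustration.
xwalk = load_crosswalk(io.StringIO("distid,district_id\n1,57\n2,103\n"))
teba = [{"distid": 1, "year": 1975, "recruits": 310}]
print(recode_teba(teba, xwalk))
# [{'year': 1975, 'recruits': 310, 'district_id': 57}]
```

The one-to-one check guards against a malformed crosswalk silently merging two districts' TEBA counts into one geographic ID.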
## Linking Elections to Districts
Electoral constituencies don't map 1:1 to magisterial districts (urban areas had multiple constituencies per district; large rural districts sometimes shared a constituency). Three crosswalk files handle different delimitation periods:
| Crosswalk | Elections Covered |
|---|---|
| `crosswalk_electoral_magisterial_districts_1966.csv` | Pre-1974 elections |
| `crosswalk_electoral_magisterial_1974.csv` | 1974-1979 elections |
| `crosswalk_electoral_magisterial_districts_1980_clean.csv` | 1980-1989 elections |
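Because the mapping is many-to-many, moving constituency results onto districts needs both the right crosswalk for the election year and a rule for splitting constituencies. This sketch assumes a per-link weight (the share of a constituency assigned to a district); whether the actual crosswalk files carry such weights should be checked before relying on it. `crosswalk_for`, `allocate_to_districts`, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def crosswalk_for(election_year: int) -> str:
    """Pick the electoral crosswalk file covering an election year."""
    if election_year < 1974:
        return "crosswalk_electoral_magisterial_districts_1966.csv"
    if election_year <= 1979:
        return "crosswalk_electoral_magisterial_1974.csv"
    if election_year <= 1989:
        return "crosswalk_electoral_magisterial_districts_1980_clean.csv"
    raise ValueError(f"no electoral crosswalk covers {election_year}")


def allocate_to_districts(
    totals_by_constituency: Dict[str, float],
    links: List[Tuple[str, int, float]],
) -> Dict[int, float]:
    """Distribute constituency-level totals across magisterial districts.

    `links` holds hypothetical (constituency, district_id, weight) rows, where
    weight is the share of a constituency assigned to that district.
    """
    by_district: Dict[int, float] = defaultdict(float)
    for constituency, district_id, weight in links:
        by_district[district_id] += weight * totals_by_constituency.get(constituency, 0.0)
    return dict(by_district)


print(crosswalk_for(1977))  # crosswalk_electoral_magisterial_1974.csv
# Toy data: district 1 is urban (two constituencies); constituency "C" spans
# two rural districts and is split evenly between them.
votes = {"A": 1000.0, "B": 600.0, "C": 400.0}
links = [("A", 1, 1.0), ("B", 1, 1.0), ("C", 2, 0.5), ("C", 3, 0.5)]
print(allocate_to_districts(votes, links))  # {1: 1600.0, 2: 200.0, 3: 200.0}
```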
## Linking Strikes to Districts
Strike incidents have location strings (e.g., "Dunlop factory, Durban, Natal") rather than standardized district codes. The strike panel builder (`08_strike_panel.py`) tries to match these location names to districts, but coverage is partial; manual geocoding of the unmatched incidents would improve it.
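One simple matching strategy, shown here as a hypothetical sketch rather than what `08_strike_panel.py` actually does, is to split the location string on commas, look for an exact district-name match in each part, and fall back to fuzzy matching with Python's standard `difflib.get_close_matches` to absorb spelling variants. The district list in the example is a toy subset, not the roster.

```python
import difflib
from typing import List, Optional


def match_district(
    location: str, district_names: List[str], cutoff: float = 0.85
) -> Optional[str]:
    """Return the district whose name matches part of a location string."""
    lookup = {name.lower(): name for name in district_names}
    # Location strings run from specific to general ("site, town, province"),
    # so each comma-separated part is checked in order.
    parts = [p.strip().lower() for p in location.split(",") if p.strip()]
    for part in parts:
        if part in lookup:
            return lookup[part]
    # Fallback: tolerate small spelling differences between sources.
    for part in parts:
        close = difflib.get_close_matches(part, list(lookup), n=1, cutoff=cutoff)
        if close:
            return lookup[close[0]]
    return None  # candidate for manual geocoding


districts = ["Durban", "Pinetown", "Pietermaritzburg"]  # toy subset
print(match_district("Dunlop factory, Durban, Natal", districts))  # Durban
print(match_district("Textile mill, Pietermaritzberg, Natal", districts))  # Pietermaritzburg
print(match_district("Harbour works, Xyzzy", districts))  # None
```

The fuzzy fallback raises coverage at some risk of false matches; a higher `cutoff` makes it more conservative.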
## What Cannot Be Joined (Yet)
| Dataset | Geographic Level | Why Not District-Level |
|---|---|---|
| Mining Census | Individual mine | Mines can be mapped to districts via `nrc_member_mines.csv` (17 districts with gold mines), but most mining data uses broad regions (OFS, Witwatersrand, etc.) |
| Manufacturing Census | Industrial region | Regions don't correspond to magisterial districts; no crosswalk exists |
| Manpower Survey | National (by sector) | No geographic disaggregation: sector × occupation × race only |
| Industrial Wages | Industry × area | "Areas" are bargaining council jurisdictions, not magisterial districts |
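For the mining census, the district link only works at the mine level. A hypothetical sketch of aggregating mine-level rows to district-years, assuming a mine-name to `district_id` lookup built from `nrc_member_mines.csv` (its real column layout may differ) and row keys `mine`, `year`, and `value` chosen for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mines_to_districts(
    mine_rows: List[dict],
    mine_to_district: Dict[str, int],
) -> Tuple[Dict[Tuple[int, int], float], List[str]]:
    """Sum mine-level values to (district_id, year); report mines with no district.

    Row keys and the lookup's structure are hypothetical.
    """
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    unmatched: List[str] = []
    for row in mine_rows:
        district_id = mine_to_district.get(row["mine"])
        if district_id is None:
            # Record the mine rather than silently dropping the observation.
            unmatched.append(row["mine"])
            continue
        totals[(district_id, row["year"])] += row["value"]
    return dict(totals), unmatched


# Toy data for illustration only.
rows = [
    {"mine": "Mine A", "year": 1975, "value": 10.0},
    {"mine": "Mine B", "year": 1975, "value": 5.0},
    {"mine": "Mine C", "year": 1975, "value": 7.0},
]
totals, unmatched = mines_to_districts(rows, {"Mine A": 12, "Mine B": 12})
print(totals)     # {(12, 1975): 15.0}
print(unmatched)  # ['Mine C']
```

Returning the unmatched mines makes it visible how much of the data cannot be placed in a district.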
## Temporal Coverage
Not all datasets cover the same years. Here's where they overlap:
*Chart: coverage of Ag Census, TEBA, Population, Elections, Strikes, Manpower, Mining, Mfg Census, and Sanctions on a 1950-1995 timeline.*
The densest overlap is 1965-1985, when most datasets are available simultaneously.
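To find the overlap for a particular combination of datasets, a small helper can count coverage per year and return the longest run of years meeting a threshold. This is a sketch with toy year ranges (`A`, `B`, `C`), not the archive's actual coverage, and it assumes each dataset covers every year between its first and last year (not true of census-year data such as Population).

```python
from typing import Dict, Optional, Tuple


def coverage_counts(ranges: Dict[str, Tuple[int, int]]) -> Dict[int, int]:
    """Count how many datasets cover each year (ranges are inclusive)."""
    counts: Dict[int, int] = {}
    for start, end in ranges.values():
        for year in range(start, end + 1):
            counts[year] = counts.get(year, 0) + 1
    return counts


def densest_window(
    ranges: Dict[str, Tuple[int, int]], min_datasets: int
) -> Optional[Tuple[int, int]]:
    """Longest run of consecutive years covered by at least `min_datasets` datasets."""
    counts = coverage_counts(ranges)
    if not counts:
        return None
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # Scan one year past the end so a run reaching the last year is closed.
    for year in range(min(counts), max(counts) + 2):
        if counts.get(year, 0) >= min_datasets:
            if run_start is None:
                run_start = year
        elif run_start is not None:
            if best is None or (year - run_start) > (best[1] - best[0] + 1):
                best = (run_start, year - 1)
            run_start = None
    return best


toy = {"A": (1950, 1990), "B": (1958, 1980), "C": (1962, 1995)}  # toy ranges
print(densest_window(toy, 3))  # (1962, 1980)
print(densest_window(toy, 2))  # (1958, 1990)
```

Passing only the datasets an analysis needs, with `min_datasets` equal to how many there are, returns the window in which all of them are available.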