Agricultural Census
District-level agricultural production data from South African agricultural census publications.
Analysis-Ready Panel (April 2026)
| File | Rows | Description |
|---|---|---|
| panels/ag_census_district_panel.csv | 3,774 | Employment/wages/mechanization panel (15 years) |
| panels/master_district_panel.csv | 8,928 | Joined panel (ag + TEBA + population + removals + geography) |
| _boone_cleaned/data_census_ag_panel.csv | 3,097 | Boone's independently cleaned panel (authoritative for 1961-1983) |
| geography_panel_1948_1994.csv | 13,113 | Cleaned geography panel |
| long_run_district_panel_1918_1994.csv | 15,525 | Extended 55-year panel |
| COMPLETE_YEAR_COVERAGE.csv | 25 | Year coverage inventory |
The panel builder (shared_resources/scripts/panel_builders/01_ag_census_panel.py) uses Boone's cleaned data as the primary source for 11 years (1961-1983), with oDesk-digitized CSVs as fallback for years Boone doesn't cover. See BOONE_METHODOLOGY_ALIGNMENT.md for methodology comparison.
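The source-selection rule described above can be sketched as follows. This is a minimal illustration, not the code in 01_ag_census_panel.py: the row layout, the field names `year`, `district`, and `source`, and the toy values are all assumptions made for the example.

```python
def build_panel(boone_rows, odesk_rows):
    """Combine two lists of row dicts, preferring Boone for any year he covers.

    Hypothetical layout: each row is a dict with at least 'year' and 'district'.
    """
    boone_years = {row["year"] for row in boone_rows}

    # Boone's cleaned rows are taken as-is for every year he covers.
    panel = [{**row, "source": "boone"} for row in boone_rows]

    # oDesk rows are used only for years absent from Boone's data.
    panel += [
        {**row, "source": "odesk"}
        for row in odesk_rows
        if row["year"] not in boone_years
    ]
    return sorted(panel, key=lambda r: (r["year"], r["district"]))


# Toy rows with invented values, for illustration only.
boone = [{"year": 1961, "district": "District A", "tractors": 10}]
odesk = [
    {"year": 1961, "district": "District A", "tractors": 12},  # dropped: Boone covers 1961
    {"year": 1956, "district": "District A", "tractors": 7},   # kept: Boone has no 1956 data
]
panel = build_panel(boone, odesk)
```

Tagging each row with its source keeps the fallback decision visible in the output, which helps when a later discrepancy has to be traced back to one digitization or the other.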
Year-Specific Data
141 CSV files organized by census year:
| Folder | Years | Files | Notes |
|---|---|---|---|
| agriculture_1949_converted/ | 1949-50 | 5 | Validated (1-6% error) |
| agriculture_1951_converted/ | 1951-52 | 6 | |
| agriculture_1956_converted/ | 1956 | 2 | Validated (0-23% error) |
| agriculture_1961_converted/ | 1961-62 | 11 | Validated (0-23% error) |
| agriculture_1967_converted/ | 1967-68 | 3 | |
| agriculture_1970_converted/ | 1970-71 | 2 | |
| agriculture_1972_converted/ | 1972-73 | 16 | Validated (86% pass) |
| agriculture_1974_converted/ | 1974 | 6 | Validated; crisis start |
| agriculture_1975_converted/ | 1975 | 6 | Validated |
| agriculture_1976_converted/ | 1976 | 9 | Corrected (160 fixes, Dec 2025) |
| agriculture_1978_converted/ | 1978 | 5 | Wide format trimmed |
| agriculture_1979_converted/ | 1979 | 4 | |
| agriculture_1980_converted/ | 1980 | 4 | |
| agriculture_1981_converted/ | 1981 | 2 | Wide format trimmed |
| agriculture_1983_converted/ | 1983 | 19 | Wide format trimmed |
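Because the folders follow the agriculture_YYYY_converted naming pattern shown in the table, a file inventory per census year can be built with a few lines of standard-library Python. This is a sketch only: the `root` argument is a hypothetical path to wherever these folders live, and the helper names are made up for the example.

```python
import re
from pathlib import Path

# Matches folder names such as "agriculture_1949_converted".
FOLDER_PATTERN = re.compile(r"^agriculture_(\d{4})_converted$")


def census_year(folder_name):
    """Return the year encoded in a folder name, or None if it does not match."""
    match = FOLDER_PATTERN.match(folder_name)
    return int(match.group(1)) if match else None


def files_by_year(root):
    """Map census year -> sorted list of CSV paths found directly in each year folder."""
    inventory = {}
    for folder in sorted(Path(root).iterdir()):
        year = census_year(folder.name)
        if folder.is_dir() and year is not None:
            inventory[year] = sorted(folder.glob("*.csv"))
    return inventory
```

Note that the folder name carries only the first year of a split census year, so 1949-50 is keyed as 1949.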
Post-1983 Extractions (NEW 2026-03-28)
8 CSV files extracted from scanned bilingual PDFs via Claude Vision API. Extends the agricultural census panel beyond 1983.
1988 Census of Agriculture (Natal Development Region E, 119 pages):
| File | Rows | Contents |
|---|---|---|
| agriculture_1988_principal_farmers_units.csv | 353 | Principal stats, farmer demographics, unit sizes |
| agriculture_1988_employment_extracted.csv | 169 | Employment by race and worker type |
| agriculture_1988_employment_clean.csv | 169 | Cleaned, with correct column names (validated 2026-03-29) |
| agriculture_1988_remuneration_income.csv | 215 | Wages + gross income from products |
| agriculture_1988_production_animals.csv | 494 | Production volume/value + livestock |
| agriculture_1988_expenditure_assets_debts.csv | 291 | Expenditure, asset values, farming debts |
1990 Agricultural Survey (95 pages):
| File | Rows | Contents |
|---|---|---|
| agriculture_1990_principal_units.csv | 355 | Principal stats + unit sizes |
| agriculture_1990_employment_extracted.csv | 524 | Employment by race (mixed tables; use the clean version) |
| agriculture_1990_employment_clean.csv | 219 | Cleaned, with correct column names, employment only (validated 2026-03-29) |
| agriculture_1990_remuneration_income_expenditure.csv | 937 | Wages, income, expenditure, assets |
Pre-1949 Extractions (NEW 2026-03-29)
pre_1949_extracted/ contains 16 markdown extractions from pre-1949 agricultural census PDFs, produced with Gemini Vision. These censuses were previously unusable because of poor OCR quality in the original oDesk digitization.
| Census Year | Parts | Contents |
|---|---|---|
| 1921 | 1 | Agricultural-pastoral production |
| 1923-24 | 1 | Agricultural-pastoral production |
| 1926-27 | 1 | Agricultural-pastoral production |
| 1929-30/1934 | 1 | Agricultural-pastoral production |
| 1936-37 | 8 | Full census — livestock, crops, land use by district |
| 1938-39 | 3 | Agricultural-pastoral production |
| 1945-46 | 1 | Agricultural-pastoral production |
Also includes legacy CSV files (pre_1949_districts_improved.csv, pre_1949_districts_parsed.csv) from earlier parsing attempts.
Variables
Typical variables include:
- District identification (name, code, province)
- Employment: regular employees by race and gender, casual/seasonal workers
- Wages: annual wage bill by race category
- Machinery: tractors, combines, other equipment counts
- Land: area farmed, area irrigated
- Production: output by crop type
Provenance
- Source: South African agricultural census publications
- Digitization: oDesk contractors (2013-14), Excel format
- Conversion: ops_admin/scripts/digitization/batch_convert.py
- Source PDFs: raw_archives/01_production_censuses/agriculture/source_pdfs/
Corrections
Corrections are logged in CORRECTIONS_LOG.md. All corrections have originals backed up to _originals/.
| Date | File | Corrections | Source |
|---|---|---|---|
| Dec 2025 | 1976 Part5 | 214 (160 regular + 54 casual employment) | Chris Boone corrections |
| Dec 2025 | 1976 Part7 | 1 (Potgietersrus wage extra digit) | Stata do-file |
| Dec 2025 | 1974_3.1 | 5 severe (Piet Retief, Mount Currie, OFS districts) | Claude Vision PDF verification |
| Mar 2026 | 4 files | 4 cross-year outliers (Belfast, Gordonia, Knysna, Pietersburg) | Internal consistency checks |
Phase D reconciliation (April 2026) found 69 discrepancies against Boone's panel across 5 years, a 99.1% cell match rate. Of these, 4 were corrected in the source CSVs and 65 were resolved by adopting Boone's values in the panel builder. 1979 is the worst year, with 31 discrepancies, including a Wynberg/Bellville row displacement.
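A cell match rate of this kind can be computed by comparing the two panels field by field on their shared district-year keys. The sketch below shows one way to do that; it is not the reconciliation script itself, and the data layout (dicts keyed by `(district, year)`), the field names, and the toy values are all assumptions.

```python
def compare_cells(ours, reference, value_fields):
    """Compare two panels cell by cell on their shared (district, year) keys.

    `ours` and `reference` map (district, year) -> dict of field values.
    Returns (cells_compared, mismatches), where mismatches is a list of
    (key, field, our_value, reference_value) tuples.
    """
    compared = 0
    mismatches = []
    for key in sorted(ours.keys() & reference.keys()):
        for field in value_fields:
            a = ours[key].get(field)
            b = reference[key].get(field)
            if a is None or b is None:
                continue  # skip cells missing on either side
            compared += 1
            if a != b:
                mismatches.append((key, field, a, b))
    return compared, mismatches


def match_rate(compared, n_mismatches):
    """Percentage of compared cells that agree."""
    if compared == 0:
        return float("nan")
    return 100.0 * (compared - n_mismatches) / compared


# Toy panels with invented values, for illustration only.
ours = {
    ("District A", 1979): {"tractors": 5, "combines": 2},
    ("District B", 1979): {"tractors": 3, "combines": 1},
}
reference = {
    ("District A", 1979): {"tractors": 5, "combines": 2},
    ("District B", 1979): {"tractors": 4, "combines": 1},
}
compared, mismatches = compare_cells(ours, reference, ["tractors", "combines"])
rate = match_rate(compared, len(mismatches))
```

Returning the full mismatch list, rather than only the rate, makes it possible to spot patterns such as a row displacement, where every field in two adjacent districts disagrees at once.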
Key Empirical Finding: Specification Sensitivity
The agricultural census data supports a district-level difference-in-differences analysis of the effect of the 1974 mine labor shock on mechanization (combine adoption in open vs. closed recruiting districts). This result is specification-sensitive:
- Pooled OLS with province FE: +0.373 log-points (p < 0.05) — open districts adopted more combines post-1974
- TWFE with district FE: +0.017 (null) — the result disappears with district fixed effects
- Tractors placebo: +0.795 in pooled OLS (larger than combines), suggesting the pooled OLS captures pre-existing trends rather than a causal effect
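To read these coefficients as percentage changes, a log-point estimate can be converted with exp(b) - 1. The snippet below applies that conversion to the three estimates listed above. It assumes all three are coefficients on a log outcome; the text states "log-points" explicitly only for the first.

```python
import math


def log_points_to_percent(coef):
    """Convert a coefficient on a log outcome to an exact percentage change."""
    return 100.0 * (math.exp(coef) - 1.0)


# Coefficients as reported in the list above.
estimates = [
    ("Pooled OLS, combines", 0.373),
    ("TWFE, combines", 0.017),
    ("Pooled OLS, tractors (placebo)", 0.795),
]
for label, coef in estimates:
    print(f"{label}: {coef:+.3f} log-points = {log_points_to_percent(coef):+.1f}%")
```

For a small coefficient such as the TWFE estimate, the log-point value and the percentage change nearly coincide. For the larger pooled estimates they diverge: 0.373 log-points corresponds to roughly a 45% difference, not 37%.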
The national-level patterns are robust: combines +30%, tractors +13%, and Black employment -10% between 1972 and 1980. The wage shock was real and economically transformative, but it operated as a general equilibrium effect rather than a differential one between open and closed districts. The book manuscript reports this specification sensitivity openly and reframes around the Eiselen electoral divergence as the lead empirical finding. See Data Appendix D.0-D.3.