Agricultural Census

District-level agricultural production data from South African agricultural census publications.

Analysis-Ready Panel (April 2026)

| File | Rows | Description |
| --- | --- | --- |
| panels/ag_census_district_panel.csv | 3,774 | Employment/wages/mechanization panel (15 years) |
| panels/master_district_panel.csv | 8,928 | Joined panel (ag + TEBA + population + removals + geography) |
| _boone_cleaned/data_census_ag_panel.csv | 3,097 | Boone's independently cleaned panel (authoritative for 1961-1983) |
| geography_panel_1948_1994.csv | 13,113 | Cleaned geography panel |
| long_run_district_panel_1918_1994.csv | 15,525 | Extended 55-year panel |
| COMPLETE_YEAR_COVERAGE.csv | 25 | Year coverage inventory |

The panel builder (shared_resources/scripts/panel_builders/01_ag_census_panel.py) uses Boone's cleaned data as the primary source for 11 years (1961-1983), with oDesk-digitized CSVs as fallback for years Boone doesn't cover. See BOONE_METHODOLOGY_ALIGNMENT.md for methodology comparison.
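The primary/fallback rule described above can be sketched as follows. This is a minimal illustration, not the code in 01_ag_census_panel.py: the function name `build_panel`, the row layout, the `source` tag, and the sample values are all hypothetical, and the sketch assumes the fallback is decided per census year.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_panel(boone_rows: List[Row], odesk_rows: List[Row]) -> List[Row]:
    """Use Boone's rows for every year he covers; oDesk rows for the rest."""
    # Assumption: coverage is decided per census year, not per district or cell.
    boone_years = {row["year"] for row in boone_rows}
    panel = [dict(row, source="boone") for row in boone_rows]
    panel += [dict(row, source="odesk")
              for row in odesk_rows if row["year"] not in boone_years]
    return sorted(panel, key=lambda row: (row["year"], row["district"]))


# Made-up rows for illustration only (hypothetical district and counts).
boone_rows = [{"year": 1961, "district": "District A", "tractors": 10}]
odesk_rows = [{"year": 1961, "district": "District A", "tractors": 12},
              {"year": 1949, "district": "District A", "tractors": 3}]
panel = build_panel(boone_rows, odesk_rows)
```

Tagging each row with its source keeps the provenance visible downstream, which helps when a later reconciliation needs to know which digitization a value came from.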

Year-Specific Data

141 CSV files organized by census year:

| Folder | Years | Files | Notes |
| --- | --- | --- | --- |
| agriculture_1949_converted/ | 1949-50 | 5 | Validated (1-6% error) |
| agriculture_1951_converted/ | 1951-52 | 6 | |
| agriculture_1956_converted/ | 1956 | 2 | Validated (0-23% error) |
| agriculture_1961_converted/ | 1961-62 | 11 | Validated (0-23% error) |
| agriculture_1967_converted/ | 1967-68 | 3 | |
| agriculture_1970_converted/ | 1970-71 | 2 | |
| agriculture_1972_converted/ | 1972-73 | 16 | Validated (86% pass) |
| agriculture_1974_converted/ | 1974 | 6 | Validated; crisis start |
| agriculture_1975_converted/ | 1975 | 6 | Validated |
| agriculture_1976_converted/ | 1976 | 9 | Corrected (160 fixes, Dec 2025) |
| agriculture_1978_converted/ | 1978 | 5 | Wide format trimmed |
| agriculture_1979_converted/ | 1979 | 4 | |
| agriculture_1980_converted/ | 1980 | 4 | |
| agriculture_1981_converted/ | 1981 | 2 | Wide format trimmed |
| agriculture_1983_converted/ | 1983 | 19 | Wide format trimmed |

Post-1983 Extractions (NEW 2026-03-28)

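Eight CSV files were extracted from scanned bilingual PDFs via the Claude Vision API, extending the agricultural census panel beyond 1983. The tables below also list two cleaned employment files alongside the raw extractions.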

1988 Census of Agriculture (Natal Development Region E, 119 pages):

| File | Rows | Contents |
| --- | --- | --- |
| agriculture_1988_principal_farmers_units.csv | 353 | Principal stats, farmer demographics, unit sizes |
| agriculture_1988_employment_extracted.csv | 169 | Employment by race and worker type |
| agriculture_1988_employment_clean.csv | 169 | Cleaned: correct column names (validated 2026-03-29) |
| agriculture_1988_remuneration_income.csv | 215 | Wages + gross income from products |
| agriculture_1988_production_animals.csv | 494 | Production volume/value + livestock |
| agriculture_1988_expenditure_assets_debts.csv | 291 | Expenditure, asset values, farming debts |

1990 Agricultural Survey (95 pages):

| File | Rows | Contents |
| --- | --- | --- |
| agriculture_1990_principal_units.csv | 355 | Principal stats + unit sizes |
| agriculture_1990_employment_extracted.csv | 524 | Employment by race (mixed tables; use clean version) |
| agriculture_1990_employment_clean.csv | 219 | Cleaned: correct column names, employment only (validated 2026-03-29) |
| agriculture_1990_remuneration_income_expenditure.csv | 937 | Wages, income, expenditure, assets |

Pre-1949 Extractions (NEW 2026-03-29)

pre_1949_extracted/ contains 16 markdown extractions produced from pre-1949 agricultural census PDFs with Gemini Vision. These census years were previously unusable because of poor OCR quality in the original oDesk digitization.

| Census Year | Parts | Contents |
| --- | --- | --- |
| 1921 | 1 | Agricultural-pastoral production |
| 1923-24 | 1 | Agricultural-pastoral production |
| 1926-27 | 1 | Agricultural-pastoral production |
| 1929-30/1934 | 1 | Agricultural-pastoral production |
| 1936-37 | 8 | Full census: livestock, crops, land use by district |
| 1938-39 | 3 | Agricultural-pastoral production |
| 1945-46 | 1 | Agricultural-pastoral production |

The folder also includes legacy CSV files (pre_1949_districts_improved.csv, pre_1949_districts_parsed.csv) from earlier parsing attempts.

Variables

Typical variables include:

  • District identification (name, code, province)
  • Employment: regular employees by race and gender, casual/seasonal workers
  • Wages: annual wage bill by race category
  • Machinery: tractors, combines, other equipment counts
  • Land: area farmed, area irrigated
  • Production: output by crop type

Provenance

  • Source: South African agricultural census publications
  • Digitization: oDesk contractors (2013-14), Excel format
  • Conversion: ops_admin/scripts/digitization/batch_convert.py
  • Source PDFs: raw_archives/01_production_censuses/agriculture/source_pdfs/

Corrections

Corrections are logged in CORRECTIONS_LOG.md. All corrections have originals backed up to _originals/.

| Date | File | Corrections | Source |
| --- | --- | --- | --- |
| Dec 2025 | 1976 Part5 | 214 (160 regular + 54 casual employment) | Chris Boone corrections |
| Dec 2025 | 1976 Part7 | 1 (Potgietersrus wage extra digit) | Stata do-file |
| Dec 2025 | 1974_3.1 | 5 severe (Piet Retief, Mount Currie, OFS districts) | Claude Vision PDF verification |
| Mar 2026 | 4 files | 4 cross-year outliers (Belfast, Gordonia, Knysna, Pietersburg) | Internal consistency checks |
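The back-up-then-correct convention for _originals/ might look like the sketch below. It is an illustration only, not the project's correction tooling: the function name `apply_correction`, the `district` column, and the placement of _originals/ next to the corrected file are assumptions, and logging to CORRECTIONS_LOG.md is left out.

```python
import csv
import shutil
from pathlib import Path


def apply_correction(csv_path, district, column, new_value,
                     backup_dir="_originals"):
    """Back up csv_path once, then overwrite one cell; return rows changed."""
    csv_path = Path(csv_path)
    # Assumption: the backup folder sits beside the file being corrected.
    backup = csv_path.parent / backup_dir / csv_path.name
    backup.parent.mkdir(parents=True, exist_ok=True)
    if not backup.exists():  # keep the first, untouched copy
        shutil.copy2(csv_path, backup)

    with csv_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)

    changed = 0
    for row in rows:
        # "district" is a hypothetical column name for the district identifier.
        if row["district"] == district:
            row[column] = str(new_value)
            changed += 1

    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return changed
```

Copying only when no backup exists means a second correction to the same file never overwrites the pristine original with an already-edited version.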

Phase D reconciliation (April 2026) found 69 discrepancies against Boone's panel across 5 years, a 99.1% cell match rate. Of these, 4 were corrected in the source CSVs and 65 were resolved by adopting Boone's values in the panel builder. 1979 is the worst year (31 discrepancies, including a Wynberg/Bellville row displacement).
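A cell-level comparison of this kind can be sketched as below. The definition of the match rate used here (share of cells present in both panels that agree exactly) is an assumption, as are the names `compare_panels` and `adopt_boone`, the dictionary layout, and the sample values; the actual reconciliation logic lives in the panel builder.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]                 # (district, census year)
Panel = Dict[Key, Dict[str, float]]   # variable name -> value


def compare_panels(ours: Panel, boone: Panel, variables: List[str]):
    """Return (match_rate, discrepancies) over cells present in both panels."""
    compared = 0
    discrepancies = []
    for key in sorted(ours.keys() & boone.keys()):
        for var in variables:
            a, b = ours[key].get(var), boone[key].get(var)
            if a is None or b is None:
                continue  # cell missing on one side: not counted
            compared += 1
            if a != b:
                discrepancies.append((key, var, a, b))
    match_rate = 1 - len(discrepancies) / compared if compared else float("nan")
    return match_rate, discrepancies


def adopt_boone(ours: Panel, discrepancies) -> Panel:
    """Resolve discrepancies by taking Boone's value, leaving inputs untouched."""
    resolved = {key: dict(values) for key, values in ours.items()}
    for key, var, _ours_value, boone_value in discrepancies:
        resolved[key][var] = boone_value
    return resolved


# Made-up values for illustration only (hypothetical districts and counts).
ours = {("District A", 1979): {"tractors": 120, "combines": 8},
        ("District B", 1979): {"tractors": 95, "combines": 5}}
boone = {("District A", 1979): {"tractors": 120, "combines": 8},
         ("District B", 1979): {"tractors": 59, "combines": 5}}

rate, diffs = compare_panels(ours, boone, ["tractors", "combines"])
fixed = adopt_boone(ours, diffs)
```

Returning the list of discrepancies rather than only the rate makes it possible to review each mismatch before deciding whether to fix the source CSV or adopt the other panel's value.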

Key Empirical Finding: Specification Sensitivity

The agricultural census data supports a district-level difference-in-differences analysis of the effect of the 1974 mine labor shock on mechanization (combine adoption in open vs. closed recruiting districts). This result is specification-sensitive:

  • Pooled OLS with province FE: +0.373 log-points (p < 0.05) — open districts adopted more combines post-1974
  • TWFE with district FE: +0.017 (null) — the result disappears with district fixed effects
  • Tractors placebo: +0.795 in pooled OLS (larger than combines), suggesting the pooled OLS captures pre-existing trends rather than a causal effect
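One way to write the contrast between the first two specifications is sketched below. It is not the manuscript's exact notation: y_{dt} (log combine count in district d and census year t), Open_d (indicator for open recruiting districts), Post_t (indicator for post-1974 census years), and the inclusion of the Open_d and Post_t main effects in the pooled model are all assumptions. The authoritative specification is in Data Appendix D.0-D.3.

```latex
% Pooled OLS with province fixed effects \alpha_{p(d)}
y_{dt} = \alpha_{p(d)} + \gamma\,\mathrm{Open}_d + \lambda\,\mathrm{Post}_t
       + \beta\,(\mathrm{Open}_d \times \mathrm{Post}_t) + \varepsilon_{dt}

% TWFE with district fixed effects \alpha_d and year fixed effects \lambda_t
y_{dt} = \alpha_d + \lambda_t
       + \beta\,(\mathrm{Open}_d \times \mathrm{Post}_t) + \varepsilon_{dt}
```

In the second equation Open_d is absorbed by \alpha_d and Post_t by \lambda_t, so \beta is identified only from within-district changes over time.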

The national-level patterns are robust: combines +30%, tractors +13%, Black employment -10% between 1972 and 1980. The wage shock was real and economically transformative, but it operated as a general equilibrium effect rather than a differential one between open and closed districts. The book manuscript reports this specification sensitivity and reframes around the Eiselen electoral divergence as the lead empirical finding. See Data Appendix D.0-D.3.