Source Archives

Raw archival materials are organized in shared_resources/raw_archives/ under 11 numbered domains. The raw archives are read-only; analysis-ready outputs go to shared_resources/data/.

All 11 domains were systematically audited in March 2026. Each has a comprehensive README with coverage tables, gap analysis, and digitization priorities.
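
The layout described above (11 numbered domains, each with a README) lends itself to a quick automated check. The following is only a minimal sketch under stated assumptions: it assumes each domain directory name starts with its two-digit code (for example a hypothetical `01_production_censuses`) and that the README file is named `README.md`. Neither convention is confirmed here, and `audit_domains` is an illustrative name, not an existing project function.

```python
from pathlib import Path
import re

# Assumption: domain directories start with a two-digit code, e.g. "01_...".
DOMAIN_DIR = re.compile(r"^(\d{2})(?!\d)")


def audit_domains(raw_root, readme_name="README.md", n_domains=11):
    """Report missing domain directories and domains that lack a README.

    Only reads the directory tree; nothing under raw_root is modified.
    """
    root = Path(raw_root)
    found, missing_readme = [], []
    for entry in sorted(root.iterdir()):
        match = DOMAIN_DIR.match(entry.name)
        if not entry.is_dir() or match is None:
            continue  # skip stray files and non-domain folders
        code = match.group(1)
        found.append(code)
        if not (entry / readme_name).is_file():
            missing_readme.append(code)
    expected = [f"{i:02d}" for i in range(1, n_domains + 1)]
    return {
        "missing_domains": [c for c in expected if c not in found],
        "missing_readme": missing_readme,
    }
```

Calling `audit_domains("shared_resources/raw_archives")` would return two lists of domain codes; both should be empty if the layout matches the description.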

Archive Structure

| Code | Domain | Contents | Key Finding (2026-03 audit) |
| --- | --- | --- | --- |
| 01 | Production Censuses | Agricultural, manufacturing, mining census PDFs + contractor digitizations | Pre-1949 ag OCR unusable; 1988/1990 ag + Rural Property PDFs undigitized |
| 02 | Elections | Electoral results, delimitation reports, crosswalks | Two pipelines (oDesk + Stata); 1974/1977 gaps now filled |
| 03 | Geography | Maps, shapefiles, district classifications, rainfall | Primarily GIS reference; tabular data promoted to data/ |
| 04 | Population | Population censuses (1960-2001), household surveys, manpower survey | Manpower survey (165K rows, 1965-94) newly promoted |
| 05 | Labor | Strikes, wages, forced removals, unions, public sector employment | Strike WP3-32 (1978-84) not digitized; wages complete |
| 06 | Mining | TEBA/NRC archive | UJ books 1957-69 (37 PDFs) highest priority for panel extension |
| 07 | Industry | UNIDO, input-output tables, firm-level data, enterprise surveys | All structured (Excel/Stata); no digitization needed |
| 08 | Corporate | McGregor Who Owns Whom, Orbis, Who's Who | Fully digitized |
| 09 | Finance | Macro series (SARB, FRED, World Bank), inflation, CPI | All structured; no digitization needed |
| 10 | Social | SAIRR race relations surveys, education, health, opinion surveys | SAIRR is narrative, not tabular; demoted to low priority in the digitization queue |
| 11 | Policy | Legislation, Government Gazettes | Text documents, not tabular data |

Digitization Queue (updated March 2026)

| # | Domain | What | Status | Notes |
| --- | --- | --- | --- | --- |
| 1 | TEBA Archive | UJ books 1957-1969 (37 PDFs) | Pending | Extends recruiting panel to 1950s |
| 2 | Production Censuses | Rural Immovable Property (4 PDFs) | Complete | 4,711 rows, all 4 years (1939-1960) |
| 3 | Production Censuses | 1988 + 1990 agriculture censuses | Complete | 3,338 rows, all table types |
| 4 | Production Censuses | Pre-1949 agriculture re-digitization (8 PDFs) | Pending | Current OCR unusable |
| 5 | Labor | Strike Working Papers 3-32 (1978-1984) | Pending | 32 PDFs, extends pre-1984 strikes |
| 6 | TEBA Archive | Wage files (8 PDFs) | Pending | NRC wage schedules |
| 7 | Production Censuses | Manufacturing 1950-61 summary | Pending | Fills manufacturing gap |
| 8 | Social | SAIRR targeted time series | Low priority | Narrative; use as book source |

Completed Digitization

| Domain | Files | Method | Date |
| --- | --- | --- | --- |
| Industrial Wages | 825 CSVs | Claude Vision API | 2025-11 |
| Production Censuses (ag/mining/mfg) | 231 CSVs | oDesk contractors + batch_convert | 2013-14, converted 2025-11 |
| Elections | 33 CSVs | oDesk + Stata conversion | 2013-14, Stata converted 2026-03 |
| Public Sector Employment | 43 CSVs | Claude Vision API | 2025 |
| Manpower Survey | 1 CSV (165K rows) | Stata conversion | 2026-03 |
| Rural Immovable Property | 5 CSVs (4,711 rows) | Claude Vision API | 2026-03 |
| 1988 + 1990 Ag Census | 8 CSVs (3,338 rows) | Claude Vision API | 2026-03 |
| TEBA WNLA 1957-1969 | 9 CSVs (3,226 rows) | Claude Vision API | 2026-03 |
| I-O Tables | 48 CSVs | Excel conversion | 2026-03 |
| CPS 1980 | 2 CSVs (84K rows) | Stata conversion | 2026-03 |
| AMP Firm Management | 4 CSVs (19K rows) | Promoted | 2026-03 |

Total Archive

  • ~1,600 source PDFs across all domains
  • 1,900+ analysis-ready CSVs in shared_resources/data/
  • 100% provenance coverage: every CSV is traceable to its source
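
A provenance claim of this kind can be spot-checked mechanically. The sketch below is one possible approach, not the project's actual tooling: it assumes a hypothetical manifest CSV with columns `csv_path` (path relative to the data root) and `source_file`. How provenance is really recorded is not stated here, so the manifest format, the column names, and the function name `provenance_gaps` are all illustrative assumptions.

```python
import csv
from pathlib import Path


def provenance_gaps(data_root, manifest_path):
    """Return relative paths of CSVs under data_root with no documented source.

    Assumption: the manifest is a CSV with 'csv_path' and 'source_file'
    columns (hypothetical format, not confirmed by the archive docs).
    """
    data_root = Path(data_root)
    manifest = Path(manifest_path).resolve()
    with open(manifest, newline="", encoding="utf-8") as handle:
        documented = {
            row["csv_path"]
            for row in csv.DictReader(handle)
            if (row.get("source_file") or "").strip()  # ignore blank sources
        }
    on_disk = {
        p.relative_to(data_root).as_posix()
        for p in data_root.rglob("*.csv")
        if p.resolve() != manifest  # the manifest itself is not a data file
    }
    return sorted(on_disk - documented)
```

An empty list from `provenance_gaps("shared_resources/data", manifest)` would be consistent with full coverage; any returned paths are CSVs that still need a source entry.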