About the site
The Exploring Tanzanian Education site investigates Tanzania's education sector by visualizing contextual socio-economic data, education outcomes, and investment data to provide insight into the state of education in the country.
Methodology
Data work was carried out in OpenOffice and SQLite, joining attribute data with spatial data for visualization. All primary and secondary datasets other than the exam results were obtained, with corresponding region names, from Tanzania's Ministry of Education and Vocational Training, the National Bureau of Statistics, and the Ministry of Finance.
Primary School Leaving Examination (PSLE) and Certificate of Secondary Education Examination (CSEE) results were obtained from the National Examinations Council of Tanzania (NECTA). Primary exam results were listed by region, with gender and score categorizations; secondary exam results were listed by school code and name. The percentage of students passing (a grade of C or higher for primary, or a passing division of I, II, or III for secondary) was calculated for both the PSLE and the CSEE results. The primary exam results contained only region identifiers and no national school identifier code, so only region-level aggregations could be visualized.
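The pass-rate calculation described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual script: the function name, the full set of category labels (grades A to E, divisions I to IV and 0), and the counts are assumptions made up for the example. Only the passing thresholds come from the text.

```python
# Categories counted as passing, as stated in the methodology text.
PRIMARY_PASSING = {"A", "B", "C"}        # "C or higher"
SECONDARY_PASSING = {"I", "II", "III"}   # passing divisions


def pass_rate(counts, passing):
    """Return the percentage of candidates whose category is in `passing`.

    `counts` maps a grade or division label to a number of candidates.
    Returns None when there are no candidates at all.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    passed = sum(n for category, n in counts.items() if category in passing)
    return 100.0 * passed / total


# Hypothetical counts, invented purely for illustration.
primary_example = {"A": 10, "B": 40, "C": 150, "D": 500, "E": 300}
secondary_example = {"I": 5, "II": 15, "III": 30, "IV": 100, "0": 50}

print(pass_rate(primary_example, PRIMARY_PASSING))
print(pass_rate(secondary_example, SECONDARY_PASSING))
```

The same function serves both exams because only the set of passing categories differs.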
Exam results were obtained in PDF format, converted to HTML, and then parsed with a Python script that wrote the output to CSV. A list of all secondary schools in 2012, with unique codes and district names, was then joined to the school codes found in the CSEE results, and the results were aggregated by district. Only 75% of school codes matched between the 2012 school list and the 2011 exam results, which reduces the accuracy of the aggregate pass rates. Other issues arose from inconsistent district names. For example, Dodoma, Dodoma Urban, and Dodoma Rural were all present in the 2012 secondary school district list, but only the latter two are considered districts, so only Dodoma Urban and Dodoma Rural were mapped. Averages, minimums, and maximums were calculated for inclusion in the visualization.
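The join and district-level aggregation could look something like the sketch below. It is written under several assumptions: the school codes, record layout, and function names are hypothetical, and pooling candidates across schools within a district is one plausible reading of "aggregated by district" rather than the confirmed method. The match fraction is tracked so that the share of unmatched school codes is visible.

```python
from collections import defaultdict


def aggregate_by_district(results, school_districts):
    """Join exam results to districts by school code and pool pass rates.

    results: list of (school_code, candidates, passed) tuples.
    school_districts: dict mapping school_code -> district name.
    Returns (rates, match_fraction), where rates maps district -> percent
    passing and match_fraction is the share of result rows whose school
    code was found in the school list.
    """
    totals = defaultdict(lambda: [0, 0])  # district -> [candidates, passed]
    matched = 0
    for code, candidates, passed in results:
        district = school_districts.get(code)
        if district is None:
            continue  # school code missing from the school list
        matched += 1
        totals[district][0] += candidates
        totals[district][1] += passed
    rates = {d: 100.0 * p / c for d, (c, p) in totals.items() if c > 0}
    match_fraction = matched / len(results) if results else 0.0
    return rates, match_fraction


def summarize(rates):
    """Return the minimum, maximum, and mean of the district pass rates."""
    values = list(rates.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


# Hypothetical records, invented for illustration only.
results = [
    ("S0001", 100, 40),
    ("S0002", 100, 20),
    ("S0003", 80, 40),
    ("S0004", 60, 30),  # no entry in the school list below
]
school_districts = {
    "S0001": "Dodoma Urban",
    "S0002": "Dodoma Urban",
    "S0003": "Dodoma Rural",
}

rates, match_fraction = aggregate_by_district(results, school_districts)
summary = summarize(rates)
```

Unmatched rows are dropped rather than assigned to a guessed district, which mirrors the loss of accuracy the text describes.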
Data was matched to county administrative geographic data by region name. There is no official source of county administrative boundary geographic data. This site used county geographic data made publicly available online.
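A name-based match of this kind might be sketched in SQLite as shown below. The table names, column names, and values are hypothetical, and the trimming and lowercasing of names is an assumed precaution rather than something the text describes. A left join from the boundary table keeps every boundary, so names without a matching attribute row show up with an empty value instead of vanishing.

```python
import sqlite3


def join_by_region(indicator_rows, boundary_rows):
    """Join attribute rows to boundary rows on a normalized region name.

    indicator_rows: list of (region_name, value) tuples.
    boundary_rows: list of (region_name, geometry_id) tuples.
    Returns a list of (region_name, geometry_id, value) tuples, with value
    set to None where no attribute row matched the boundary name.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE indicators (region TEXT, value REAL)")
    conn.execute("CREATE TABLE boundaries (region TEXT, geometry_id INTEGER)")
    conn.executemany("INSERT INTO indicators VALUES (?, ?)", indicator_rows)
    conn.executemany("INSERT INTO boundaries VALUES (?, ?)", boundary_rows)
    rows = conn.execute(
        """
        SELECT b.region, b.geometry_id, i.value
        FROM boundaries AS b
        LEFT JOIN indicators AS i
          ON lower(trim(b.region)) = lower(trim(i.region))
        ORDER BY b.region
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical values, invented for illustration only.
indicators = [("Arusha", 61.5), ("dodoma ", 48.0)]
boundaries = [("Arusha", 1), ("Dodoma", 2), ("Mwanza", 3)]

joined = join_by_region(indicators, boundaries)
unmatched = [name for name, _, value in joined if value is None]
```

Listing the unmatched names is a quick way to spot spelling differences between the attribute and boundary sources before mapping.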
All maps were created with TileMill and hosted with MapBox. Color scales for each visualized indicator were chosen with a colorblind-safe color-picker tool. Photo on data page taken by Wendy Tanner.