Data Analysis Stack 2020


<- Back to Initiatives

Project Summary

Need: DRE has expressed, and others' observations confirm, that the DRE's data analysis processes exhibit characteristis of siloing, duplication, and reliance on manual tools which result in inefficiency and lowered responsiveness.
Purpose: To demonstrate a system that leverages technology to improve the DRE's data analysis and reporting processes.
Developer: Ms. Caitlyn Low, CMU/TCinGC Student Consultant 2020.
Local Counterpart: Edwel Ongrung, Palau MOE TCinGC Project Supervisor.
Project: Demonstration of a system composed of Data Source -> Data Pipeline -> Data Warehouse -> Data Analysis -> Reporting, using concepts of Business Intelligence, Data Analysis Stack, and Extract/Load/Transform.
Status: Successfully completed and turned over to MOE on August 14, 2020.
Next: First DRE should decide if they want to implement what has been demonstrated by the project. If they do, they should realize that there are many components to the effort. And each componet can easily represent several individual projects. At the least, DRE has two options:

  • Tackle it piece by piece while keeping the overview in mind (cheaper but takes a lot of time). In this case, there should be a project manager to oversee the effort and make sure the pieces fit together to reach the end goal.
  • Hire a systems consultant to build the system in one go (fast but can be very expensive). The two critical parts of getting a good result from a contract are 1) setting the appropriate and very clear specifications, and 2) having a competent inspector to ensure that the contractor's proposal is the best fit, and once contracted, that the contractor is meeting the specs. The DRE must ensure these are taken care of.
Background

The DRE has for a long time sought to improve the efficiency of its data analysis and reporting processes. Two examples are proved below to show the different aspects of the problem.

Example 1: Palau's report to the UNESCO Institute of Statistics (UIS).
External data, unclear work assignments, siloing, time intensive manual data processing and analysis

This is an external report which ends up being assigned to the DRE from time to time. The major issues are: certain categories of data and even certain key data are not MOE data so DRE does not have access to them; the indicators needed by UIS are different from indicators used by MOE so DRE ends up doing additional data collection; no one in the DRE has accountability for the report so it ends up being assigned by the supervisor to someone who has to start it from scratch. The results in a delayed and incomplete report and and a lack of consistency from year to year.

Example 2: MOE Statistics Digest
Informal or non-existent data collection process, siloing, time intensive manual data processing and analysis

This is an internal report using MOE data. It is made available to the public. The report needs data from many sources in the MOE, including school report cards, student registration and residence information, private school data (in terms of data, private schools can be considered internal as they are chartered by MOE), personnel data, curricular data, and others. The biggest issue is that everytime the report needs to be done, DRE has to go out to these data sources to ask for data. Since the the MOE has no formal reporting frameworks that ensure the data is collected, it becomes an imposition on the data sources (eg., schools which are busy with their own work and may view these collections as an imposition). If DRE manages to collect the data, the hard work of data entry (in many cases), cleaning, and verifying needs to take place. Then the manual work of analysing and reporting is assigned to someone to perform. That person does all the work on his laptop, generates the report, and submits it to his boss. The following year, all that has to be repeated even if it is the same person that is assigned. That is, each year is essentially a silo.

There are more examples but it should be obvious that the whole process from data source all the way to the publication of the report needs to be examined.

Even while these issues were apparent, in 2016 the DRE had to take on more data analysis tasks especially to monitor the progress of the new 10 year master plan and by extension, to monitor the progress and effectiveness of the MOE's annual management action plans.

We needed help in setting up a model of the situation so that we can clearly understand the steps in the process, the areas that need attention, and the kind of work needed to attend to those areas. Further, we needed help in exploring potential solutions.

Project Overview


Pics Who
Caitlyn Ms. Caitlyn Low is the developer for this project. She is a rising junior at Carnegie Mellon University and is working from home in California. Ms. Low's Resume.
Edwel Mr. Edwel Ongrung is the local counterpart for this project. Mr. Ongrung is the BEA's Administrative Services Manager and also the MOE's TCinGC projects supervisor.


The two will work on developing a clear model of the DRE's data analysis and reporting process and a demo of a possible solution for the DRE to consider.

Ms. Low successfully completed her projects and submitted her report to MOE management on August 14, 2020. She documented her model and solution demo in writing and on video for the DRE to review at their convenience.

Ms. Low worked with MOE to develop a clear understanding of the context. Then she developed and demonstrated a solution based on Data Pipeline, Data Analysis Stack, Data Visualization, and Business Intelligence concepts. Ms. Low first discussed the importance of the data source -> data pipeline -> data warehouse workflow, then demonstrated the benefits of the data warehouse -> analysis -> reporting using existing DRE data in an MOE database in combination with the Apache Superset business intelligence tool.

Current MOE Reporting Process

  • The analyzer/reporter repeats the whole process for each report
  • The workflow is: negotiate with data source -> extract data from data source -> clean the data -> put the date in tables that can be analyzed -> analyze the data -> format the analysis results for a report.
  • The analyzer/reporter has to figure out how he/she can improve the workflow so that his/her report is timely.

A Systemic Approach

  • New tools exist which can help the MOE. These are data visualization and business intelligence tools. Read More
  • The workflow in the current MOE reporting process can be organized as pieces of a system.
  • A system, The Data Analysis Stack (DAS), and its associated process, the Extract, Load, and Transform process, is recommended. It can help the MOE improve its reporting efficiency. Read More

Reports

  • Apache Superset, a data visualisation and business intelligence tool, is used to create statistics yearbook tables and graphs to demonstrate its capability to handle an MOE report. Below is a dashboard created from DRE's data. The report emulates tables and charts from the MOE Statistics Digest 2017.

Next steps

With the completion of the demo and its associated documentation, the DRE should have enough information to make a decision whether to pursue a systematic approache or not.

The systematic approach is composed of several individual components. Each component in turn can represent several projects. So take the systematic approach is not a simple undertaking. At the very least, the DRE has to decide on two options.

  • Tackle it piece by piece while keeping the overview in mind (cheaper but takes a lot of time). In this case, there should be a project manager to oversee the effort and make sure the pieces fit together to reach the end goal.
  • Hire a systems consultant to build the system in one go (fast but can be very expensive). The two critical parts of getting a good result from a contract are 1) setting the appropriate and very clear specifications, and 2) having a competent inspector to ensure that the contractor's proposal is the best fit, and once contracted, that the contractor is meeting the specs. The DRE must ensure these are taken care of.

Looking at the first option, to tackle it piece by piece, here are two example projects.

  • Extract, Load, Transform.
    • Presently being done "manually" using Microsoft Excel Spreadsheets, then inserting the data into the database
    • A project could be started to build tools to automiate the data clean up and automatically load the data into the database.
  • Data Analysis
    • Presently being done "manually" using Microsoft Excel spreadsheet. Then tables and charts are created, moved to a Microsoft Word document where the report is written. Then the document is converted to pdf and mailed out.
    • A project could be set up to use Apache Superset to grab the data from the database, generate a dashboard that has all the tables and charts, and publish it to the internet. Stakeholders then can view the reports online.

Please note that other components of the systematic approach will require top management involvement because they involve policy, protocols, job descriptions, etc., and work across units and even different agencies.

  • Data sources are frequently outside the jurisdiction of the DRE. Only top management can create policy and procedures that ensure that the data source puts data into the pipeline.
  • Data pipelines cross units within the MOE or even with outside agencies. Negotiations and MOUs may have to be done to ensure that the pipeline works and delivers data on time, something only top management can make into a reality.
  • The data warehouse needs to be built. Right now the MOE has an ad-hoc MySQL database in place that can act as a warehouse. If the systematic approach is taken, money will have to be spent to actually build the warehouse.
Project Detail

Coming soon ...

For now, the project's work website is here: project website.