Global Health
Transforming Community Health Data Ecosystem

The Living Goods Data Architecture in Kenya: A Blueprint for Modern BI
In our second installment of Data Portals, we look at how a modern data stack is helping save lives in Kenya, and how an architectural decision to move away from traditional ETL pipelines makes it easier to transform unpredictable data.
As a technical partner to the Ministry of Health, Living Goods plays a critical role in supporting the Kenyan government's Universal Health Coverage Policy for 2020-2030. It supports a network of over 4,000 community health workers (CHWs) reaching 1.6 million Kenyans across counties including Isiolo, Kisumu, Busia, and Vihiga. It also develops and manages the electronic Community Health Information System (eCHIS) foundation for 107,000 CHWs nationwide, which supports integrated healthcare for children, maternal health, family planning, and immunization interventions.
However, running similar operations at scale, including in other intervention countries such as Burkina Faso and Uganda, exposed a massive technical hurdle for Living Goods: fragmented, chaotic data.
Imagine the scene: In rural Busia, a field supervisor huddles with a CHW under the shade of an avocado tree. To understand the health of the community, the supervisor squints at the CHW's mobile phone screen, manually transcribing visit records into a physical logbook with a ballpoint pen.
This scene, captured during the early discovery phases of the project, illustrates the "data chaos" that once defined the Living Goods initiative discussed here. The organization's mission was frequently hamstrung by a persistent three-week reporting lag. With health data trapped in physical books, three weeks isn't just a technical delay; it is a barrier to providing life-saving care to a mother in labor or a child with malaria.
The challenges
The challenges were systemic:
- Technical and infrastructure: Data was collected in silos across multiple systems such as CHT, CommCare, and DHIS2, leaving it fragmented. The legacy system could not perform incremental syncing because the sources lacked a common update timestamp, forcing full dataset refreshes that caused network timeouts, server errors, and high latency. Data extraction was also unstable due to intermittent government server availability. Furthermore, unannounced form changes would break existing data pipelines.
- Data quality and modeling: The source data lacked both an updated_at timestamp and a unique ID for critical records such as pregnancies. This left the door open for some field workers to artificially inflate their performance metrics by duplicating pregnancy or visit records. It also produced inconsistent patient data with conflicting details, such as multiple Estimated Dates of Delivery (EDDs) or dates of birth, making it hard to track a mother's records over time. Additionally, the lack of a geographic and organizational hierarchy hindered the analysis of performance across different supervision levels.
- Reporting and analytics: Different teams (field, analytics, and product management) applied their own business logic when processing data, which produced inconsistent KPI reports and conflicting numbers. Reporting was slow and manual, with delays of up to three weeks for routine data requests and reconciliations. High data latency and broken dashboards eroded trust in the data, while the lack of self-service tools created heavy reliance on the central data team.
To move from this data crisis to a reliable single source of truth, Living Goods partnered with Ona Insights to overhaul its reporting infrastructure. After the initial design review, Ona Insights opted for an Extract, Load, Transform (ELT) approach instead of Extract, Transform, Load (ETL). The decision was driven by several key factors. Data volumes were large, reaching up to 30 gigabytes per county. Transforming nested JSON strings within an ETL pipeline was deemed too error-prone and time-intensive. Finally, transforming data before it landed in the warehouse would be too slow and complicated, and any unannounced form change would break the entire pipeline.
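To make the ELT argument concrete, here is a minimal Python sketch of the idea, not the project's actual pipeline. The load step stores raw JSON strings without inspecting them, so a changed form cannot break loading; the transform step runs afterwards and simply reports fields it does not recognize. The in-memory `warehouse` list, the `fields` envelope, and the field names (`patient_id`, `visit_date`, `edd`, `muac`) are all hypothetical stand-ins for illustration.

```python
import json
from typing import Any

# Hypothetical field names, for illustration only.
KNOWN_FIELDS = {"patient_id", "visit_date", "edd"}

def load_raw(warehouse: list[str], payloads: list[str]) -> None:
    """Load step: append raw JSON strings untouched, without parsing form fields."""
    warehouse.extend(payloads)

def transform(warehouse: list[str]) -> tuple[list[dict[str, Any]], set[str]]:
    """Transform step, run after loading: flatten known fields, collect unknown ones."""
    rows = []
    unexpected: set[str] = set()
    for raw in warehouse:
        fields = json.loads(raw).get("fields", {})
        # Missing fields become None instead of raising an error.
        rows.append({name: fields.get(name) for name in sorted(KNOWN_FIELDS)})
        unexpected |= set(fields) - KNOWN_FIELDS
    return rows, unexpected

warehouse: list[str] = []
load_raw(warehouse, [
    '{"form": "pregnancy", "fields": {"patient_id": "p1", "visit_date": "2024-01-05", "edd": "2024-08-01"}}',
    # A later form version drops "edd" and adds a field nobody announced.
    '{"form": "pregnancy", "fields": {"patient_id": "p2", "visit_date": "2024-01-06", "muac": 12.5}}',
])
rows, unexpected = transform(warehouse)
print(rows)
print("fields to review:", unexpected)
```

Because the raw payloads are already stored, the transform can be corrected and re-run later without going back to the source servers.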
The blueprint for data success
- Smart, Secure Extraction with Airbyte: To extract data from the CouchDB applications, the team built a custom Airbyte connector. Because the source system lacked a reliable tracking cursor, our engineers designed a solution that automatically generates a timestamp as data leaves CouchDB, finally enabling true incremental data loading. This optimization drastically reduced the reporting cycle, bringing data freshness from monthly to daily. Crucially, while the project scope did not include full patient anonymization in the raw data flow, the Airbyte connector was specifically engineered to strip out CHW passwords to prevent severe security vulnerabilities.
- On-Premise Scalability with ClickHouse: To comply with strict legal requirements regarding health data, Living Goods needed a system that could be hosted entirely within its own infrastructure. ClickHouse, a highly scalable, open-source data warehouse, was selected. Capable of seamlessly handling the 100+ GB of incoming data, ClickHouse serves as the robust destination where all raw, un-nested JSON data lands safely.
- Taming Complex Business Logic with dbt: Once the data is in ClickHouse, dbt (data build tool) takes over to handle the heavy lifting of business logic. The transformations solved several massive operational headaches:
  - The Pregnancy Puzzle: The most complex logic involved longitudinal maternal health tracking. The source apps did not generate a unique "Pregnancy ID" to link a mother's initial registration with her subsequent Antenatal (ANC) and Postnatal (PNC) visits. dbt was used to systematically compare timestamps and EDDs, filter out duplicates, and generate unique IDs to link a single pregnancy journey across multiple tables.
  - Non-Breaking Form Changes: To handle the Ministry of Health's unannounced form updates, dbt uses "snapshots." If a form's structure changes, the pipeline doesn't crash; instead, it triggers a warning to the data team to review the new fields.
  - Trust and Alerting: dbt automatically calculates data freshness and displays it directly on the final dashboards. If data goes stale, automated alerts are fired directly into a Slack channel.
- Secure, Role-Based Analytics in Power BI: The transformed data is ultimately visualized in Power BI. To resolve the issue of conflicting reports, dbt stitches the various county and country databases together into a single, holistic view. Power BI then uses Role-Based Access Control (RBAC) to ensure that field supervisors can only access data relevant to their specific supervision units, providing targeted, actionable insights. All data presented on these front-end dashboards is fully anonymized.
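The pregnancy-linking logic described above lives in dbt models, which are not shown here. The Python sketch below only illustrates the general idea of comparing dates and EDDs, dropping duplicates, and deriving a synthetic ID. The record keys (`mother_id`, `form`, `reported`, `edd`), the 45-day EDD tolerance, and the hash-based ID are all assumptions made for the example, as is the simplification that every record carries an EDD.

```python
import hashlib
from datetime import date

# Assumption: EDDs reported for the same pregnancy may drift by a few weeks.
EDD_TOLERANCE_DAYS = 45

def assign_pregnancy_ids(records: list[dict]) -> list[dict]:
    """Deduplicate records and attach a synthetic pregnancy_id to each one."""
    # Drop exact duplicates (same mother, form, report date and EDD).
    unique = {(r["mother_id"], r["form"], r["reported"], r["edd"]): r for r in records}
    ordered = sorted(unique.values(), key=lambda r: (r["mother_id"], r["reported"]))

    linked = []
    anchor: dict[str, date] = {}  # mother_id -> EDD anchoring her current pregnancy
    for r in ordered:
        mother = r["mother_id"]
        current = anchor.get(mother)
        if current is None or abs((r["edd"] - current).days) > EDD_TOLERANCE_DAYS:
            anchor[mother] = r["edd"]  # EDD too far away: treat as a new pregnancy
        key = f"{mother}|{anchor[mother].isoformat()}"
        pregnancy_id = hashlib.sha256(key.encode()).hexdigest()[:12]
        linked.append({**r, "pregnancy_id": pregnancy_id})
    return linked

records = [
    {"mother_id": "m1", "form": "registration", "reported": date(2024, 1, 10), "edd": date(2024, 8, 1)},
    {"mother_id": "m1", "form": "registration", "reported": date(2024, 1, 10), "edd": date(2024, 8, 1)},  # duplicate
    {"mother_id": "m1", "form": "anc", "reported": date(2024, 3, 2), "edd": date(2024, 8, 10)},
    {"mother_id": "m1", "form": "registration", "reported": date(2025, 6, 1), "edd": date(2026, 1, 15)},
    {"mother_id": "m2", "form": "registration", "reported": date(2024, 2, 1), "edd": date(2024, 8, 1)},
]
linked = assign_pregnancy_ids(records)
for row in linked:
    print(row["mother_id"], row["form"], row["reported"], row["pregnancy_id"])
```

Deriving the ID from the mother and the anchoring EDD keeps it stable across runs, so the same pregnancy gets the same ID each time the models are rebuilt. A production version would also need rules for records without an EDD, such as postnatal visits.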
From Chaos to Clarity
A comprehensive assessment of the initial situation and documentation, like the Design Review Report, provided a shared vision and planning basis. The project was managed using an Agile Governance and Technical Management Framework with a phased, sprint-based implementation approach to deliver the technical MVP while building internal capacity for long-term system management. A strict feature-branch workflow on GitHub was used to protect the production environment, requiring code reviews by at least two collaborators before merging.
Automated deployments ensured the live warehouse always matches the approved codebase. Additionally, the project focused on organizational design, recommending roles like Data Engineers and Data Analysts, and establishing data quality processes involving multiple stakeholders.
The shift from a near-monthly reporting cycle to daily freshness has transformed field operations. Near real-time data allows for "daily conversations" between managers and field teams. Rather than performing retroactive troubleshooting, supervisors can identify a lack of syncs or service gaps within 24 hours, fostering a culture of accountability and building the trust in data necessary for government co-financing.
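As a rough illustration of the 24-hour check mentioned here, the sketch below flags CHWs whose last sync is older than a threshold. It is a simplified stand-in, not the dashboard's actual logic; the `stale_chws` function, the CHW identifiers, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def stale_chws(last_sync: dict[str, datetime], now: datetime,
               max_age: timedelta = timedelta(hours=24)) -> list[str]:
    """Return the ids of CHWs whose most recent sync is older than max_age, sorted."""
    return sorted(chw for chw, ts in last_sync.items() if now - ts > max_age)

now = datetime(2024, 5, 2, 8, 0, tzinfo=timezone.utc)
last_sync = {
    "chw-001": datetime(2024, 5, 2, 6, 30, tzinfo=timezone.utc),   # synced this morning
    "chw-002": datetime(2024, 4, 30, 17, 0, tzinfo=timezone.utc),  # silent for over a day
    "chw-003": datetime(2024, 5, 1, 7, 59, tzinfo=timezone.utc),   # just past 24 hours
}
flagged = stale_chws(last_sync, now)
print(flagged)
```

The resulting list is the kind of input that could drive a daily conversation between a supervisor and the field team, or an automated alert.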
Takeaway 1: Moving from Monthly "Stitching" to Daily Freshness
Moving from a monthly, manual "stitching" of data to a daily automated system significantly improved efficiency and trust in the data. Daily updates now provide timely insights, allowing stakeholders to focus on understanding the implications of the data rather than verifying its accuracy.
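The daily cycle rests on the extraction-time timestamp described in the blueprint. The sketch below shows one way such a timestamp can serve as an incremental cursor for downstream processing: each document is stamped as it is extracted, and later runs only pick up rows stamped after the previous high-water mark. The `_extracted_at` column name and both functions are assumptions for illustration; the sketch does not reproduce the custom Airbyte connector and does not cover how changed documents are selected at the source.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional

def stamp(docs: Iterable[dict], clock: Callable[[], datetime]) -> list[dict]:
    """Extraction side: attach an _extracted_at timestamp to every outgoing document."""
    return [{**doc, "_extracted_at": clock()} for doc in docs]

def new_rows(landed: list[dict], high_water_mark: Optional[datetime]) -> list[dict]:
    """Downstream side: keep only rows stamped after the last processed batch."""
    if high_water_mark is None:
        return list(landed)  # first run: process everything
    return [row for row in landed if row["_extracted_at"] > high_water_mark]

day1 = datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)
day2 = datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc)

landed = stamp([{"_id": "a"}, {"_id": "b"}], clock=lambda: day1)
landed += stamp([{"_id": "c"}], clock=lambda: day2)

first_run = new_rows(landed, None)    # all three documents
second_run = new_rows(landed, day1)   # only the document extracted on day 2
print([row["_id"] for row in second_run])
```

Passing the clock in as a parameter keeps the example deterministic; a real extraction job would use the current time instead.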
Takeaway 2: Why "Load First, Ask Questions Later" (ELT)?
The Ona Insights team made a pivotal decision to move away from traditional ETL in favor of an ELT pattern. By loading raw JSON data into ClickHouse and adjusting transformations after the data is safely stored, we made the new system more resilient.
Takeaway 3: Empowering Local Supervisors with Role-Based Truth
Data is only as transformative as the person empowered to act on it. The new architecture utilizes Role-Based Access Control in Power BI to ensure that the right data reaches the right hands. This clarity has birthed the Sync Activities Dashboard, a tool that monitors the technical health of the mobile application.
Takeaway 4: A Foundation for National Scale and Advanced AI
This modernization effort is a direct contribution to Kenya's Universal Health Coverage (UHC) agenda. By creating access to clean, warehouse-grade data, Living Goods has also built a prerequisite for future AI and Machine Learning augmentation initiatives like predictive modeling to identify the key drivers of CHW success or predict disease outbreaks before they escalate.
Conclusion
Strategic policy-making and donor reporting require a "Single Source of Truth". With a total architectural pivot toward a modern Data Stack, Living Goods, with Ona Insights' support, helped bridge the gap between field engineering and high-level social impact. The solution wasn't merely a new set of charts; it redefined how data-driven stewardship can save lives.
By treating data architecture as a pillar of government stewardship rather than a temporary tech fix, Living Goods is ensuring that data-driven health is a permanent fixture of the Kenyan landscape.
As social-impact organizations worldwide struggle with fragmented information, the journey from "data books" to "data stacks" offers a vital lesson: engineering integrity is the prerequisite for scaling impact.
Is your organization ready to trade the chaos of manual reporting for the clarity of a modern data stack?