Overview
Our client is a healthcare and medical education organisation that runs competitive, survey-based learning programmes for clinicians, residents, and academic medical professionals across the United States. Each educational event series is structured as a race made up of individual laps: participants answer clinically oriented survey questions under time pressure, performance is ranked at both the question level and the cumulative lap level, and top performers are recognised and rewarded at the close of each event cycle. Before NeenOpal's engagement, every step of survey data collection, aggregation, scoring, and leaderboard generation was handled manually in Excel, an approach that introduced systematic risk of error and placed a significant operational burden on the internal team. NeenOpal designed and delivered an end-to-end automated pipeline, migrating the client from a fragmented, manual process to a fully governed, cloud-based analytics platform powering daily leaderboard refreshes across all active events.
0 — Manual Steps Per Reporting Cycle
90% — Faster Leaderboard Delivery
Customer Challenges
A Manual Process That Could Not Scale
Manual Excel Aggregation Was Error-Prone and Labour-Intensive
Survey responses from SurveyMonkey and Mailchimp were manually downloaded, combined, and scored in Excel by the internal team. The process was time-consuming, operationally fragile, and highly dependent on individual attention to detail: any inconsistency, such as a duplicate email entry or an incorrectly named survey, flowed directly into the leaderboard without validation.
No Unified View Across Multiple Concurrent Race Events
The client ran three concurrent educational race series, each comprising multiple laps and dozens of survey questions, with participants spread across different academic programmes and professional designations. Without a unified data store, the team had no consistent way to compare performance across events, track cumulative scores through a race, or generate programme-level rankings.
SurveyMonkey's API Data Structure Blocked Downstream Analysis
SurveyMonkey exposes response data through a REST API that requires a Premier licence and returns data in a format that cannot be directly loaded into reporting tools or spreadsheet environments. Without a dedicated extraction and transformation layer, the client had no automated mechanism to retrieve data at scale, apply scoring logic, and produce ranked outputs in a timely manner.
Data Quality Issues Undermined Trust in Results
The client's survey data contained recurring quality issues that, left unaddressed, would distort leaderboard rankings: incorrectly captured email addresses, inconsistent survey naming conventions, and duplicate responses that had to be resolved to the first submission. Without an automated validation layer, these issues had to be caught and corrected manually each cycle, adding overhead and risking errors reaching the published leaderboard.
Solutions
An End-to-End Automated Survey Analytics Platform
01.
SurveyMonkey API Integration and Python ETL Pipeline
A set of Python ETL scripts was built to connect to SurveyMonkey's REST API using a Premier licence access token, extract raw response data for all active surveys across the three race series, and load it into a structured relational database. The extraction logic ran on a daily incremental schedule, checking for new responses within a defined cutoff window and loading only net-new data while handling deduplication, zero-score assignment for unattempted questions, and standardised column naming.
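The incremental extraction and deduplication logic described above can be sketched as follows. The SurveyMonkey v3 bulk-responses endpoint is real, but the field names (`email`, `date_created`), the cutoff handling, and the first-submission rule shown here are illustrative assumptions, not the client's actual code:

```python
import json
import urllib.request
from datetime import datetime, timezone

API_BASE = "https://api.surveymonkey.com/v3"  # SurveyMonkey REST API v3

def fetch_responses(survey_id: str, access_token: str, page_size: int = 100) -> list[dict]:
    """Pull bulk responses for one survey (requires a Premier-tier access token)."""
    url = f"{API_BASE}/surveys/{survey_id}/responses/bulk?per_page={page_size}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]

def filter_incremental(responses: list[dict], cutoff: datetime, seen: set[str]) -> list[dict]:
    """Keep only net-new responses inside the daily cutoff window,
    resolving duplicate submissions to the first one received."""
    fresh = []
    # ISO-8601 timestamps sort chronologically as strings
    for r in sorted(responses, key=lambda r: r["date_created"]):
        created = datetime.fromisoformat(r["date_created"])
        key = r["email"].strip().lower()  # normalise respondent identity
        if created >= cutoff and key not in seen:
            seen.add(key)
            fresh.append(r)
    return fresh
```

In a daily run, `fetch_responses` would be called per active survey and its output passed through `filter_incremental` before loading, so only net-new, deduplicated rows reach the database.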
02.
Relational Data Model and Ranking Engine
A purpose-built relational data model was designed to support both question-level and cumulative lap-level leaderboards across all three race events, with six structured tables capturing raw survey metadata, individual question responses, cumulative scoring through each lap, and final summarised rankings per race and per academic programme. The two-factor scoring methodology, ranking by correctness first and response speed second, was codified into the ETL pipeline rather than applied post-hoc, ensuring every leaderboard update reflected a consistent, reproducible calculation.
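The two-factor scoring rule, correctness first and response speed second, can be illustrated with a minimal sketch. The field names and the `Response` structure are hypothetical simplifications of the pipeline's actual tables:

```python
from dataclasses import dataclass

@dataclass
class Response:
    participant: str
    correct: bool             # factor 1: correctness (scores 1, else 0)
    seconds_to_answer: float  # factor 2: elapsed time from open to submit

def rank_question(responses: list[Response]) -> list[tuple[str, int]]:
    """Rank one question: correct answers first, faster answers ahead of slower ones."""
    # Sort key: incorrect answers sink (not r.correct == True), ties broken by speed
    ordered = sorted(responses, key=lambda r: (not r.correct, r.seconds_to_answer))
    return [(r.participant, pos) for pos, r in enumerate(ordered, start=1)]
```

Codifying this rule inside the pipeline, rather than applying it in a spreadsheet afterwards, is what makes every leaderboard refresh reproduce the same calculation.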
03.
Cloud Data Warehouse Migration to Snowflake
The solution was initially built on an Amazon RDS MySQL database hosted on AWS, providing a structured environment for the pipeline's output tables and supporting early Tableau dashboard connections. As the client's analytics maturity grew, the data infrastructure was migrated to Snowflake, preserving all existing data models and table structures while unlocking improved query performance, more scalable data management, and tighter integration with the Tableau Cloud environment.
04.
Tableau Cloud Dashboard Suite
A Tableau Cloud dashboard suite was built and published on top of the warehouse, connecting directly to the output tables and refreshing daily to reflect the latest survey results. The dashboards provided management with a real-time leaderboard view by race, lap, and question, programme-level rankings across academic affiliations, and a dedicated top-ten performers view designed to support prize and reward decisions, with viewer and creator licences provisioned for stakeholders across the organisation.
05.
Automated Validation and Data Quality Controls
A multi-stage quality assurance process was implemented to ensure dashboard outputs could be trusted by management before being used for reward decisions. Automated validation checks compared pipeline outputs against the client's existing manual Excel reports, flagging discrepancies for review and covering critical business logic including correct score assignment, tie-breaking rule application, and handling of edge cases such as partial responses and duplicate submissions.
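A validation pass of this kind reduces to comparing two score sets and reporting every mismatch. The sketch below assumes scores keyed by participant; the report wording and function name are illustrative:

```python
def validate_against_baseline(pipeline: dict[str, int], manual: dict[str, int]) -> list[str]:
    """Compare automated scores against the manual Excel baseline and
    return a human-readable discrepancy report (empty when they agree)."""
    issues = []
    for participant in sorted(set(pipeline) | set(manual)):
        auto, base = pipeline.get(participant), manual.get(participant)
        if auto is None:
            issues.append(f"{participant}: in manual report only (score {base})")
        elif base is None:
            issues.append(f"{participant}: in pipeline output only (score {auto})")
        elif auto != base:
            issues.append(f"{participant}: pipeline={auto} vs manual={base}")
    return issues
```

An empty report means the automated run reproduced the manual baseline exactly; anything else is flagged for review before the leaderboard is published.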
Benefits
Complete Elimination of Manual Survey Scoring
Tasks that had previously required the internal team to download, combine, score, and validate Excel files after every survey cycle were entirely replaced by automated pipeline runs. This freed operational capacity, removed the risk of human error in score calculation and ranking, and reduced the time between survey completion and leaderboard publication from hours to minutes.
A Single Platform for Three Concurrent Race Events
By unifying data from three independent race series into a single, structured data model, the client gained a consistent and reliable view of performance across all active events for the first time. Management could access programme-level rankings, compare performance across laps within a race, and identify top performers across the entire educational programme without navigating separate files.
Daily Leaderboard Refresh Supporting Real-Time Decisions
The automated daily data pipeline ensured that management always accessed current leaderboard data when making reward decisions, without relying on a manual refresh or the availability of a specific team member. The incremental load architecture kept refresh times short while ensuring the Tableau dashboards reflected the latest state of every active race.
Trusted Data Through Automated Validation
The multi-stage validation process, which compared automated outputs against existing manual benchmarks before stakeholder sign-off, established a documented record of data accuracy that built and sustained confidence in the platform over time. The client could access the dashboards with the assurance that scores, rankings, and programme-level aggregations had been verified against a known baseline.
A Scalable Foundation Ready for Future Growth
The migration from MySQL to Snowflake positioned the client's analytics infrastructure for long-term growth, replacing a fixed-capacity relational database with a cloud warehouse designed to scale with increasing data volumes and more complex reporting requirements. The modular table architecture ensured that new survey types, additional race series, or expanded participant populations could be incorporated without restructuring the existing pipeline.
Conclusion
For a healthcare education organisation whose operational credibility depended on accurate, timely, and consistent leaderboard results, reliance on manual Excel-based scoring was both a risk and a constraint. By engineering a complete automated pipeline, from SurveyMonkey's API through a cloud data warehouse to daily-refreshed Tableau dashboards, NeenOpal eliminated that risk and replaced it with a governed, scalable analytics platform. The client moved from a process defined by manual effort and exposure to human error to one characterised by operational reliability, data integrity, and the organisational confidence to make daily reward decisions on the strength of the data. The subsequent migration to Snowflake secures the long-term value of the engagement as the client's programmes continue to scale.
FAQ
Frequently Asked Questions
Why was a custom ETL pipeline needed rather than using SurveyMonkey's built-in exports?
SurveyMonkey's native export functionality produces flat files that require manual download and significant post-processing before they can be used for ranking calculations. For an organisation running multiple concurrent race events with hundreds of participants across three series, manual exports would have maintained the same operational burden the client sought to eliminate. The custom Python ETL pipeline automated the extraction via SurveyMonkey's REST API, applied all scoring and ranking logic programmatically, and loaded clean, structured data into the warehouse on a daily schedule, removing the need for any manual intervention in the data collection and processing workflow.
How does the two-factor ranking methodology work?
The ranking system prioritises two variables for each survey question and lap. Correctness is assessed first: participants who answer correctly receive a score of one, while incorrect responses receive zero. Among those who answered correctly, ranking is then determined by response speed, measured as the time elapsed between when the survey question was opened and when the response was submitted. This means that a participant who answers correctly and earliest is ranked first, creating an incentive for both accuracy and engagement speed. At the lap level, cumulative scores and cumulative ranks across all questions in the lap are combined to produce a final overall ranking per race, with programme-level aggregations calculated separately for academic affiliation comparisons.
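Rolling per-question results up to a lap-level leaderboard might look like the sketch below. The tie-break (lower cumulative rank sum wins on equal scores) is an assumption consistent with the two-factor logic described above, not a documented detail of the client's engine:

```python
from collections import defaultdict

def lap_leaderboard(question_results: list[list[tuple[str, int, int]]]) -> list[str]:
    """question_results: one list per question of (participant, score, rank) tuples.
    Returns participants ordered by cumulative score (desc), then cumulative rank (asc)."""
    score_sum: dict[str, int] = defaultdict(int)
    rank_sum: dict[str, int] = defaultdict(int)
    for question in question_results:
        for participant, score, rank in question:
            score_sum[participant] += score
            rank_sum[participant] += rank
    return sorted(score_sum, key=lambda p: (-score_sum[p], rank_sum[p]))
```

Programme-level rankings would apply the same aggregation, grouped by academic affiliation instead of by individual.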
What is Snowflake and why was the infrastructure migrated to it?
Snowflake is a cloud-native data warehouse platform designed for scalable, high-performance analytics workloads. The client's initial infrastructure was built on Amazon RDS MySQL, which provided reliable relational database functionality for the early stages of the project. As the client's survey volume, participant base, and reporting requirements grew, Snowflake offered a more capable environment with better query performance at scale, more flexible data management, and tighter native integration with Tableau Cloud for dashboard refreshes. The migration preserved all existing data models and pipeline logic while positioning the client's analytics infrastructure to accommodate future growth without architectural rework.