Data Collection Challenges in Healthcare

March 18, 2019


Digital Transformation



We are often asked to visualise data and create digital dashboards for teams to analyse operations, plan activity and act based on accurate metrics.

Preparing data can take a substantial amount of time – time that organisations would prefer to invest in analysis is instead spent on collection and cleaning.

Start-ups are often at an advantage as they begin with a clean sheet – no legacy applications to support, no convoluted transcription from one format to another. They can go directly to modelling, charting, interpretation and action. However, established businesses that have grown up with a range of software applications, many of them refactored several times, often face major challenges with basic data collection before they get anywhere near model building or interpreting data through digital dashboards.

Healthcare is a good example: systems from different periods and different providers co-exist to support operational processes such as patient intake, laboratory results and pharmacy dispensing. Exchanging information between patient care teams that operate different systems, work to different SOPs and record different data is almost a recipe for making analysis difficult. Use cases supporting cross-silo metrics and analysis were never part of the original systems design brief, but they are necessary today for an integrated view, whether for data modelling, dashboarding or alerting. Business planning without an integrated data view is sub-optimal.

What challenges do healthcare providers face in gaining an integrated view of data?

Data Collection Overhead

Collecting data is part of healthcare. Robust procedures are in place to accurately record personal data and metrics from hundreds of patient-facing processes. However, few of these procedures were implemented with analytics and cross-silo dashboards in mind. Where existing collection matches the digital dashboard requirements, all is well; however, analysts may require extended data collection, more frequent collection, or additional tags and categories. The burden then falls on staff who are not focused on how the data will later be used: existing clinical roles are expected to absorb the additional load with no allowance for their other responsibilities. Clearly that is not a feasible approach for widespread adoption. Managing the collection, transcription and management of extended, clinician-sourced data streams is therefore an enabling step if data visualisation and digital dashboards are to take off.

What about manual record keeping and transcription? What about locally shared abbreviations and shortcuts? What about information passed through social communication channels rather than formal data interfaces?

Patient or Machine Collected Versus Clinician Collected 

Many organisations are on the cusp of a data explosion (some are already there), where formats, volumes and speed of acquisition have changed significantly over the last few years. Trackers, micro-controllers and sensors account for some of the variety and growth, but there are now patient-collected sources to account for too – apps that record health metrics either through direct measurement or user entry. The mix of sources and formats is greater than ever, and a massive increase in data volumes from Internet of Things (IoT) devices should be expected. McKinsey (1) estimate that between 2012 and 2016 the number of sensors used globally increased seven-fold to 30 billion. The pharmaceutical company Amgen (2) reported that robotics-based drug candidate screening in R&D generates 200,000 data points per day. These observations point to a future where data variety, pace and volume are the new normal.


What steps can organisations take to get their data in order and prepare for class-leading analysis and dashboarding capability? A data workflow that spans multiple sources and feeds analysis-ready data to modelling and charting tools is easy to write down as an action but harder to deliver. Here are a few of the topics we tend to look at.

Develop Quality Assured Data Pipelines

This means embedding quality into a continuous process, rather than checking for quality at the end of each batch. 

A data pipeline is a component within a data workflow that takes original data from an operational system and prepares it for analysis. The pipeline might involve many steps, such as cleaning data, normalising formats, queuing, aggregating and filtering. In essence, operational systems connect at one end and analytics connect at the other. In the middle sits the pipeline, which provides a quality-assured method of consistently bringing critical operational data to a digital dashboard.
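As a minimal sketch of the idea, the Python below chains cleaning, normalising, filtering and aggregating steps as generators, so each record passes through every check as it flows rather than being inspected at the end of a batch. The record fields (`ward`, `patient_id`, `wait_minutes`), the sample data and the 24-hour plausibility limit are hypothetical, chosen only for illustration; a production pipeline would also need queuing, logging and error reporting.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw records exported as strings by an operational system.
SAMPLE_RECORDS = [
    {"ward": " a1 ", "patient_id": "p1", "wait_minutes": "30"},
    {"ward": "A1", "patient_id": "p2", "wait_minutes": "50"},
    {"ward": "b2", "patient_id": "p3", "wait_minutes": "-5"},   # implausible, filtered out
    {"ward": "B2", "patient_id": "", "wait_minutes": "20"},     # missing ID, dropped
    {"ward": "b2", "patient_id": "p5", "wait_minutes": "15"},
]
REQUIRED = ("ward", "patient_id", "wait_minutes")
MAX_WAIT_MINUTES = 24 * 60  # illustrative plausibility limit


def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Trim whitespace and drop records with any required field missing or empty."""
    for rec in records:
        rec = {k: str(v).strip() for k, v in rec.items()}
        if all(rec.get(field) for field in REQUIRED):
            yield rec


def normalise(records: Iterable[dict]) -> Iterator[dict]:
    """Upper-case ward codes and parse wait times; skip non-numeric values."""
    for rec in records:
        try:
            wait = int(rec["wait_minutes"])
        except ValueError:
            continue
        yield {"ward": rec["ward"].upper(), "patient_id": rec["patient_id"], "wait_minutes": wait}


def filter_plausible(records: Iterable[dict]) -> Iterator[dict]:
    """Keep only wait times inside the plausibility range."""
    for rec in records:
        if 0 <= rec["wait_minutes"] <= MAX_WAIT_MINUTES:
            yield rec


def aggregate(records: Iterable[dict]) -> dict[str, float]:
    """Average wait time per ward: the analysis-ready output for a dashboard."""
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec["ward"]] += rec["wait_minutes"]
        counts[rec["ward"]] += 1
    return {ward: totals[ward] / counts[ward] for ward in totals}


def run_pipeline(raw: Iterable[dict]) -> dict[str, float]:
    """Operational data in at one end, dashboard metrics out at the other."""
    return aggregate(filter_plausible(normalise(clean(raw))))
```

With the sample records, `run_pipeline(SAMPLE_RECORDS)` returns `{"A1": 40.0, "B2": 15.0}`: the negative wait time and the record without a patient ID never reach the average. Keeping each stage as a separate function means a step can be tested or replaced without touching the others.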

In Data Visualisation projects, Wyoming usually spend at least as much time on pipelines as on charting. The skill is to detect outages, out-of-range data and other quality issues before they reach analysis, and to automate as much of the pipeline as possible. Consistent metrics that decision makers can trust will help move the needle.
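A simple quality gate for these checks might look like the following sketch. It assumes a hypothetical heart-rate feed with one reading per message, illustrative plausibility limits, and a 15-minute silence threshold for flagging an outage; real thresholds would be agreed with clinical and system owners. Out-of-range readings are quarantined rather than deleted so they can be reviewed, and a feed that never reports is treated as an outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    source: str          # hypothetical feed name, e.g. a ward monitor
    timestamp: datetime
    heart_rate: float    # beats per minute


HEART_RATE_RANGE = (20.0, 250.0)      # illustrative plausibility limits only
MAX_SILENCE = timedelta(minutes=15)   # assumed outage threshold


def check_quality(readings, expected_sources, now):
    """Return (accepted, quarantined, outages) for a window of readings."""
    low, high = HEART_RATE_RANGE
    accepted, quarantined = [], []
    last_seen: dict[str, datetime] = {}
    for r in readings:
        # Out-of-range values are set aside for review instead of reaching analysis.
        if low <= r.heart_rate <= high:
            accepted.append(r)
        else:
            quarantined.append(r)
        # Any message, valid or not, shows the feed is alive.
        if r.source not in last_seen or r.timestamp > last_seen[r.source]:
            last_seen[r.source] = r.timestamp
    # A source is in outage if it never reported or has been silent too long.
    outages = sorted(
        src for src in expected_sources
        if src not in last_seen or now - last_seen[src] > MAX_SILENCE
    )
    return accepted, quarantined, outages


NOW = datetime(2019, 3, 18, 12, 0)
SAMPLE = [
    Reading("ward_a", datetime(2019, 3, 18, 11, 55), 72.0),
    Reading("ward_a", datetime(2019, 3, 18, 11, 58), 400.0),  # out of range
    Reading("ward_b", datetime(2019, 3, 18, 11, 30), 65.0),   # silent for 30 minutes
]
EXPECTED = {"ward_a", "ward_b", "ward_c"}  # ward_c never reports
```

For the sample, the 400 bpm reading is quarantined and `ward_b` and `ward_c` are reported as outages, so a dashboard can show that those feeds are stale rather than silently displaying old figures.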

Adopt Common Data Interchange Standards

When two teams or two systems exchange information, there are many options. Excel or Comma Separated Values (CSV), perhaps? For automated data streams, and certainly for big data (high-volume, mixed-format data that arrives at pace), formats like Excel and CSV may need to be augmented.
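CSV carries no type information, so the receiving side has to agree what each column means. One way to augment a plain CSV exchange is to pair it with an explicit, shared schema that the receiver enforces, as in this sketch. The column names, converters and sample data are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical agreed schema: column name -> converter applied to the raw text.
SCHEMA = {
    "sample_id": str,
    "collected_on": date.fromisoformat,   # expects YYYY-MM-DD
    "glucose_mmol_l": float,
}


def read_typed_csv(text: str) -> list[dict]:
    """Parse CSV text and convert every column according to SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    # Row numbers assume one record per physical line (no embedded newlines).
    for row_no, row in enumerate(reader, start=2):
        try:
            rows.append({col: convert(row[col]) for col, convert in SCHEMA.items()})
        except (ValueError, TypeError) as exc:
            # TypeError covers short rows, where DictReader fills missing values with None.
            raise ValueError(f"row {row_no}: {exc}") from exc
    return rows


SAMPLE_CSV = "sample_id,collected_on,glucose_mmol_l\nS1,2019-03-18,5.4\nS2,2019-03-18,7.1\n"
```

A file with a missing column or a malformed date is rejected with a message naming the problem, instead of arriving in a dashboard as text that merely looks like a number or a date.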

Sometimes the exchange between two parties has a high degree of complexity, and a specification, or even an industry standard, is required. For example, the Standard Guide for Raw Material eData Transfer from Material Suppliers to Pharmaceutical & Biopharmaceutical Manufacturers, published by ASTM (3), uses an XML format (XML is an extensible data file format that allows very complicated data to be exchanged safely). An advantage of a standard is consistency across a wide set of collaborators, and XML allows for future extension without significant rework.
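To illustrate why XML suits extensible exchange, here is a sketch using Python's standard `xml.etree.ElementTree`. The message structure, element names and attribute names are invented for illustration and are not taken from the ASTM guide. Because the reader only looks for the elements it knows about, a supplier adding a new element (here a hypothetical `<certificate>`) does not break it.

```python
import xml.etree.ElementTree as ET

# Hypothetical message. Element and attribute names are illustrative only and
# are NOT taken from the ASTM guide.
SAMPLE_XML = """
<shipment supplier="ExampleChem" version="1.1">
  <material code="RM-001">
    <lot id="L123">
      <test name="assay" unit="%">99.2</test>
      <test name="water" unit="%">0.3</test>
      <certificate ref="C-9"/>
    </lot>
  </material>
</shipment>
"""


def parse_shipment(xml_text: str) -> list[dict]:
    """Flatten test results into one row per test for loading into analysis tools."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for material in root.findall("material"):
        for lot in material.findall("lot"):
            # Only <test> children are read; elements added in later versions are ignored.
            for test in lot.findall("test"):
                rows.append({
                    "material": material.get("code"),
                    "lot": lot.get("id"),
                    "test": test.get("name"),
                    "value": float(test.text),
                    "unit": test.get("unit"),
                })
    return rows
```

The nested structure (material, lot, test) travels intact between parties, and the flattening into rows happens only at the receiving end, where the analysis tools need it.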

Agreeing a specification can be a project in its own right but is essential for high quality data visualisation outputs.

Close any Disconnect Between Analysis and Collection

Analysts, modellers and executives often desire cross-silo views with drill-down and pivot capability, from sources broadcasting in near real-time. However, operational systems may poll sub-systems or third parties irregularly, there may be gaps in data collection and there may be delays in acquisition. Some teams in different locations may operate to different SOPs, with different sets of required fields and even different styles of encoding the same information. 
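Where sites encode the same information differently, one common approach is to map each site's field names and local abbreviations onto a shared schema before the data reaches a dashboard. The sketch below assumes two hypothetical sites and an illustrative code list; unrecognised codes are flagged rather than discarded so that the mapping tables can be extended over time.

```python
# Hypothetical per-site mappings from local field names to a common schema.
FIELD_MAP = {
    "site_north": {"pt_no": "patient_id", "bp_sys": "systolic_mmhg", "adm_src": "admission_source"},
    "site_south": {"PatientID": "patient_id", "SBP": "systolic_mmhg", "Source": "admission_source"},
}

# Hypothetical locally shared abbreviations mapped to one controlled vocabulary.
CODE_MAP = {
    "admission_source": {"A&E": "emergency", "ED": "emergency", "GP": "gp_referral", "GPR": "gp_referral"},
}

COMMON_FIELDS = ("patient_id", "systolic_mmhg", "admission_source")


def harmonise(site: str, record: dict) -> dict:
    """Rename a site's fields and translate its codes into the common schema."""
    renamed = {common: record.get(local) for local, common in FIELD_MAP[site].items()}
    for field, codes in CODE_MAP.items():
        value = renamed.get(field)
        if value is not None:
            # Unknown codes are flagged for review rather than silently dropped.
            renamed[field] = codes.get(value, f"UNMAPPED:{value}")
    # Every common field is present, as None where a site does not collect it.
    return {field: renamed.get(field) for field in COMMON_FIELDS}
```

For example, `harmonise("site_south", {"PatientID": "456", "SBP": "141", "Source": "GPR"})` returns `{"patient_id": "456", "systolic_mmhg": "141", "admission_source": "gp_referral"}`, the same shape a `site_north` record produces, so the two can be pivoted together.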

Depending on the challenge, there may be a case for refactoring or replacing source systems, but since this has operational impacts, the justification for analytics needs to be solid. However, the value of good data for planning cannot be overstated, and information is acknowledged as a key organisational asset.

We are careful to avoid large system build projects when we work on data visualisation, but sometimes the source systems are not fit for analysis purposes and tough choices need to be made.

(1) McKinsey Reference:

(2) Amgen Reference:

(3) ASTM Reference:

If you have a data challenge in healthcare, we can help. Whether you’re looking for practical solutions or strategic consultancy, get in touch.

About the author

Rob Innes

Director and Head of Consultancy
Rob is the Director and Head of Consultancy at Wyoming and has over 20 years’ experience in digital transformation, using data to unlock value for numerous finance, life science and manufacturing organisations. Originally from a technology background in which he built applications for supply chain integration, customer self-service and customer acquisition, Rob is now primarily involved in helping organisations to do more with data, generating insights that are accessible to users regardless of their level of data expertise.
